MUBen: Benchmarking the Uncertainty of Molecular Representation Models

Summary

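Large molecular representation models pre-trained on large-scale unlabeled data have been highly successful at predicting molecular properties.
However, these models tend to overfit the fine-tuning data, which can lead to overconfident predictions on test data that falls outside the training distribution.
Uncertainty quantification (UQ) methods can address this problem by improving the calibration of the models' predictions.
Many UQ approaches exist, but not all of them improve performance.
Some studies have applied UQ to improve pre-trained molecular models, yet how to choose a suitable backbone and UQ method for reliable molecular uncertainty estimation remains underexplored.
To fill this gap, we present MUBen, which evaluates a range of UQ methods on state-of-the-art backbone molecular representation models to investigate their capabilities.
By fine-tuning different backbones that take different molecular descriptors as input, using UQ methods from different categories, we critically assess the influence of architectural decisions and training strategies.
Our study offers insights for selecting UQ methods for backbone models and can facilitate research on uncertainty-critical applications in fields such as materials science and drug discovery.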
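
To make the notion of calibration concrete, the following is a minimal sketch of expected calibration error (ECE), a commonly used calibration metric for classifiers. The abstract does not say which metrics MUBen reports, so the choice of ECE, the function name expected_calibration_error, and the n_bins parameter are all assumptions made for illustration.

```python
from typing import List, Sequence, Tuple


def expected_calibration_error(
    probs: Sequence[float], labels: Sequence[int], n_bins: int = 10
) -> float:
    """Binary ECE: sample-weighted mean gap between confidence and accuracy.

    Hypothetical helper for illustration, not taken from MUBen.
    probs  -- predicted probability of the positive class for each sample
    labels -- ground-truth labels (0 or 1)
    """
    if len(probs) != len(labels) or len(probs) == 0:
        raise ValueError("probs and labels must be non-empty and equal length")

    bins: List[List[Tuple[float, float]]] = [[] for _ in range(n_bins)]
    for p, y in zip(probs, labels):
        pred = 1 if p >= 0.5 else 0
        # Confidence is the probability assigned to the predicted class.
        conf = p if pred == 1 else 1.0 - p
        correct = 1.0 if pred == y else 0.0
        # Clamp so that conf == 1.0 falls into the last bin.
        idx = min(int(conf * n_bins), n_bins - 1)
        bins[idx].append((conf, correct))

    total = len(probs)
    ece = 0.0
    for bucket in bins:
        if not bucket:
            continue
        avg_conf = sum(c for c, _ in bucket) / len(bucket)
        accuracy = sum(a for _, a in bucket) / len(bucket)
        # Each bin contributes in proportion to the samples it holds.
        ece += (len(bucket) / total) * abs(avg_conf - accuracy)
    return ece
```

Under these assumptions, a model that predicts 0.99 for four molecules of which only two are truly positive has an ECE of roughly 0.49, the kind of overconfidence described above, while a model that predicts 0.75 and is right three times out of four has an ECE of 0.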

Abstract (original)

Large molecular representation models pre-trained on massive unlabeled data have shown great success in predicting molecular properties. However, these models may tend to overfit the fine-tuning data, resulting in over-confident predictions on test data that fall outside of the training distribution. To address this issue, uncertainty quantification (UQ) methods can be used to improve the models’ calibration of predictions. Although many UQ approaches exist, not all of them lead to improved performance. While some studies have included UQ to improve molecular pre-trained models, the process of selecting suitable backbone and UQ methods for reliable molecular uncertainty estimation remains underexplored. To address this gap, we present MUBen, which evaluates different UQ methods for state-of-the-art backbone molecular representation models to investigate their capabilities. By fine-tuning various backbones using different molecular descriptors as inputs with UQ methods from different categories, we critically assess the influence of architectural decisions and training strategies. Our study offers insights for selecting UQ for backbone models, which can facilitate research on uncertainty-critical applications in fields such as materials science and drug discovery.

arXiv information

Authors	Yinghao Li, Lingkai Kong, Yuanqi Du, Yue Yu, Yuchen Zhuang, Wenhao Mu, Chao Zhang
Published	2023-10-02 16:44:32+00:00
arXiv page	arxiv_id(pdf)

Source, services used

arxiv.jp, Google
