Utilizing Large Language Models in an iterative paradigm with domain feedback for molecule optimization

Summary

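Molecule optimization is a key task in drug discovery: it improves the desired properties of a given molecule through chemical modification.
Large Language Models (LLMs) could simulate this task efficiently by taking optimization instructions in natural language, but using them directly gives limited performance.
This work makes LLMs usable in an iterative paradigm by proposing a simple yet highly effective domain feedback provider, $\text{Re}^3$DF.
When the modified molecule is chemically invalid, $\text{Re}^3$DF uses an external toolkit, RDKit, to handle the molecule hallucination.
Otherwise, it computes the desired properties of the modified molecule and compares them with those of the original molecule. This yields reliable domain feedback that states the correct direction and distance towards the objective, followed by a retrieved example, which guides the LLM to refine the modified molecule.
Experiments cover both single-property and multi-property objectives under two thresholds, and $\text{Re}^3$DF shows significant improvements in all of these settings.
In particular, for 20 single-property objectives, $\text{Re}^3$DF raises the Hit ratio by 16.95% under the loose (\texttt{l}) threshold and by 20.76% under the strict (\texttt{s}) threshold.
For 32 multi-property objectives, $\text{Re}^3$DF raises the Hit ratio by 6.04% and 5.25%, respectively.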
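
The feedback step described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names `rdkit_logp` and `domain_feedback`, the message wording, the choice of logP as the property, and the reading of the threshold as a minimum required change are all hypothetical. Only `Chem.MolFromSmiles` (which returns `None` for a SMILES string it cannot parse) and `Crippen.MolLogP` are actual RDKit calls.

```python
from typing import Callable, Optional


def rdkit_logp(smiles: str) -> Optional[float]:
    """Return the Crippen logP of a SMILES string, or None if RDKit cannot parse it."""
    # Imported lazily so the rest of the sketch can run without RDKit installed.
    from rdkit import Chem
    from rdkit.Chem import Crippen

    mol = Chem.MolFromSmiles(smiles)  # None signals a chemically invalid SMILES
    if mol is None:
        return None
    return Crippen.MolLogP(mol)


def domain_feedback(
    original: str,
    modified: str,
    prop: Callable[[str], Optional[float]],
    threshold: float,
    increase: bool = True,
    example: Optional[str] = None,
) -> str:
    """Build one round of feedback for the LLM (hypothetical wording).

    Assumption: the objective counts as met when the property moves in the
    desired direction by at least `threshold`.
    """
    new_value = prop(modified)
    if new_value is None:
        # Invalid molecule: report the hallucination instead of a property gap.
        return f"The molecule {modified} is not chemically valid. Please propose a valid molecule."

    old_value = prop(original)
    if old_value is None:
        raise ValueError("the original molecule must be valid")

    # Signed change: positive means the edit moved in the desired direction.
    gain = (new_value - old_value) if increase else (old_value - new_value)
    if gain >= threshold:
        return "success"

    direction = "increase" if increase else "decrease"
    shortfall = threshold - gain
    message = (
        f"The modification changed the property from {old_value:.2f} to {new_value:.2f}; "
        f"it still needs to {direction} by {shortfall:.2f} to meet the objective."
    )
    if example is not None:
        # A retrieved example molecule, if one is available, follows the feedback.
        message += f" An example molecule that meets the objective: {example}."
    return message
```

In an iterative loop, the returned message would be appended to the prompt and the LLM asked again until the result is `"success"` or a round limit is reached. Passing `prop=rdkit_logp` uses RDKit for both the validity check and the property value; any other callable with the same signature can stand in for a different property.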

Abstract (original)

Molecule optimization is a critical task in drug discovery to optimize desired properties of a given molecule through chemical modification. Despite Large Language Models (LLMs) holding the potential to efficiently simulate this task by using natural language to direct the optimization, straightforwardly utilizing them shows limited performance. In this work, we facilitate utilizing LLMs in an iterative paradigm by proposing a simple yet highly effective domain feedback provider, namely $\text{Re}^3$DF. In detail, $\text{Re}^3$DF harnesses an external toolkit, RDKit, to handle the molecule hallucination, if the modified molecule is chemically invalid. Otherwise, its desired properties are computed and compared to the original one, establishing reliable domain feedback with correct direction and distance towards the objective, followed by a retrieved example, to guide the LLM to refine the modified molecule. We conduct experiments across both single- and multi-property objectives with 2 thresholds, where $\text{Re}^3$DF shows significant improvements. Particularly, for 20 single-property objectives, $\text{Re}^3$DF enhances Hit ratio by 16.95% and 20.76% under loose (\texttt{l}) and strict (\texttt{s}) thresholds, respectively. For 32 multi-property objectives, $\text{Re}^3$DF enhances Hit ratio by 6.04% and 5.25%.

arXiv information

Authors	Khiem Le, Nitesh V. Chawla
Published	2024-11-18 15:41:01+00:00
arXiv site	arxiv_id(pdf)

Source, services used

arxiv.jp, Google
