DARWIN 1.5: Large Language Models as Materials Science Adapted Learners

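Abstract

Materials discovery and design aim to find components and structures with desirable properties across highly complex and diverse search spaces.
Traditional approaches, such as high-throughput simulation and machine learning (ML), often rely on complex descriptors, which hinders their generalizability and transferability across tasks.
Moreover, these descriptors can deviate from experimental data because of the defects and purity issues that are unavoidable in the real world, which may reduce their effectiveness in practical applications.
To address these challenges, we propose Darwin 1.5, an open-source large language model (LLM) tailored for materials science.
By taking natural language as input, Darwin eliminates the need for task-specific descriptors and enables a flexible, unified approach to material property prediction and discovery.
We employ a two-stage training strategy that combines question-answering (QA) fine-tuning with multi-task learning (MTL) to inject domain-specific knowledge in various modalities and to facilitate knowledge transfer across tasks.
With this strategy, we achieve a substantial gain in LLM prediction accuracy, with improvements of up to 60% over the LLaMA-7B base model.
Darwin 1.5 also outperforms traditional machine learning models on a range of materials science tasks, demonstrating the potential of LLMs to serve as a more versatile and scalable foundation model for materials discovery and design.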

Abstract (original)

Materials discovery and design aim to find components and structures with desirable properties over highly complex and diverse search spaces. Traditional solutions, such as high-throughput simulations and machine learning (ML), often rely on complex descriptors, which hinder generalizability and transferability across tasks. Moreover, these descriptors may deviate from experimental data due to inevitable defects and purity issues in the real world, which may reduce their effectiveness in practical applications. To address these challenges, we propose Darwin 1.5, an open-source large language model (LLM) tailored for materials science. By leveraging natural language as input, Darwin eliminates the need for task-specific descriptors and enables a flexible, unified approach to material property prediction and discovery. We employ a two-stage training strategy combining question-answering (QA) fine-tuning with multi-task learning (MTL) to inject domain-specific knowledge in various modalities and facilitate cross-task knowledge transfer. Through our strategic approach, we achieved a significant enhancement in the prediction accuracy of LLMs, with a maximum improvement of 60% compared to LLaMA-7B base models. It further outperforms traditional machine learning models on various tasks in material science, showcasing the potential of LLMs to provide a more versatile and scalable foundation model for materials discovery and design.
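
The abstract does not give a prompt format, so the following is only a minimal sketch of the general idea that natural-language inputs can stand in for task-specific descriptors and that several tasks can share one training set. The task names (`band_gap`, `is_metal`), the templates, and the helper names `Record`, `to_example`, and `build_multitask_set` are all hypothetical and are not taken from Darwin 1.5.

```python
import random
from dataclasses import dataclass


@dataclass
class Record:
    task: str      # hypothetical task name, used to pick a template
    material: str  # material given as plain text, not as a descriptor vector
    target: str    # label or value, stored as text


# Hypothetical instruction templates, one per task (an assumption, not the paper's format).
TEMPLATES = {
    "band_gap": "What is the band gap (in eV) of {material}?",
    "is_metal": "Is {material} a metal? Answer yes or no.",
}


def to_example(rec: Record) -> dict:
    """Turn one record into a prompt/completion pair for fine-tuning."""
    prompt = TEMPLATES[rec.task].format(material=rec.material)
    return {"prompt": prompt, "completion": rec.target}


def build_multitask_set(records: list, seed: int = 0) -> list:
    """Mix examples from all tasks into one shuffled training set."""
    examples = [to_example(r) for r in records]
    random.Random(seed).shuffle(examples)  # seeded, so the order is reproducible
    return examples


if __name__ == "__main__":
    records = [
        Record("band_gap", "Si", "1.12"),
        Record("is_metal", "Cu", "yes"),
        Record("is_metal", "NaCl", "no"),
    ]
    for example in build_multitask_set(records):
        print(example)
```

Because every task is reduced to the same prompt/completion shape, adding a new property in this sketch only requires a new template rather than a new descriptor pipeline.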

arXiv information

Authors	Tong Xie, Yuwei Wan, Yixuan Liu, Yuchen Zeng, Wenjie Zhang, Chunyu Kit, Dongzhan Zhou, Bram Hoex
Published	2024-12-16 16:51:27+00:00
arXiv site	arxiv_id(pdf)

Sources and services used

arxiv.jp, Google
