CEAR: Automatic construction of a knowledge graph of chemical entities and roles from scientific literature


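ChEBI is a well-known ontology in the field of chemistry that provides a comprehensive resource for defining chemical entities and their properties.
To address its limited coverage of the rapidly growing chemical literature, we propose a methodology that augments existing annotated text corpora with knowledge from ChEBI and fine-tunes a large language model (LLM) to recognize chemical entities and their roles in scientific text.
By combining ontological knowledge with the language understanding capabilities of LLMs, we achieve high precision and recall in identifying both chemical entities and roles in scientific literature.
Furthermore, we extract them from a set of 8,000 ChemRxiv articles and apply a second LLM to create a knowledge graph (KG) of chemical entities and roles (CEAR), which provides complementary information to ChEBI and can help to extend it.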
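Precision and recall for entity recognition are commonly computed at the span level. The abstract does not state the exact evaluation protocol, so the following is only a minimal sketch of exact-match scoring, assuming annotations are hypothetical `(start, end, label)` spans with illustrative labels `CHEMICAL` and `ROLE`; the function name `precision_recall_f1` and the toy offsets are made up for illustration.

```python
# Hypothetical span format: (start_char, end_char, label); the label names
# "CHEMICAL" and "ROLE" are illustrative, not the paper's annotation scheme.
Span = tuple[int, int, str]


def precision_recall_f1(gold: set[Span], predicted: set[Span]) -> tuple[float, float, float]:
    """Exact-match scores: a prediction counts only if start, end and label all match."""
    true_positives = len(gold & predicted)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Toy annotations for one sentence (made-up offsets, for illustration only).
gold = {(0, 7, "CHEMICAL"), (23, 32, "ROLE"), (40, 48, "CHEMICAL")}
predicted = {(0, 7, "CHEMICAL"), (23, 32, "ROLE"), (50, 55, "ROLE")}

p, r, f = precision_recall_f1(gold, predicted)
print(f"precision={p:.3f} recall={r:.3f} f1={f:.3f}")
```

Exact matching is the strictest common choice; partial-overlap or per-label scoring would change the numbers, and the sketch does not claim to reproduce the paper's setup.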


Ontologies are formal representations of knowledge in specific domains that provide a structured framework for organizing and understanding complex information. Creating ontologies, however, is a complex and time-consuming endeavor. ChEBI is a well-known ontology in the field of chemistry, which provides a comprehensive resource for defining chemical entities and their properties. However, it covers only a small fraction of the rapidly growing knowledge in chemistry and does not provide references to the scientific literature. To address this, we propose a methodology that involves augmenting existing annotated text corpora with knowledge from ChEBI and fine-tuning a large language model (LLM) to recognize chemical entities and their roles in scientific text. Our experiments demonstrate the effectiveness of our approach. By combining ontological knowledge and the language understanding capabilities of LLMs, we achieve high precision and recall in identifying both chemical entities and roles in scientific literature. Furthermore, we extract them from a set of 8,000 ChemRxiv articles and apply a second LLM to create a knowledge graph (KG) of chemical entities and roles (CEAR), which provides complementary information to ChEBI and can help to extend it.
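The abstract says the KG is built with a second LLM and complements ChEBI, which lacks literature references. The sketch below does not model the LLM step; it only illustrates, under stated assumptions, the bookkeeping that could follow it: grouping hypothetical `(entity, role, article_id)` mentions into an entity-to-role graph that keeps the supporting articles as provenance, then listing edges missing from a reference role map standing in for ChEBI "has role" relations. The names `build_kg` and `complementary_edges`, the lower-casing normalization, and all toy data are illustrative assumptions, not the authors' pipeline or real ChEBI content.

```python
from collections import defaultdict

# Hypothetical extraction output: (entity, role, article_id) triples.
Mention = tuple[str, str, str]


def build_kg(mentions: list[Mention]) -> dict[str, dict[str, set[str]]]:
    """Group mentions into entity -> role -> IDs of the articles supporting the edge."""
    kg = defaultdict(lambda: defaultdict(set))
    for entity, role, article_id in mentions:
        # Lower-casing is a crude stand-in for real entity normalization.
        kg[entity.lower()][role.lower()].add(article_id)
    return kg


def complementary_edges(kg, reference):
    """List (entity, role, articles) edges that the reference role map lacks."""
    new_edges = []
    for entity in sorted(kg):
        known_roles = reference.get(entity, set())
        for role in sorted(kg[entity]):
            if role not in known_roles:
                new_edges.append((entity, role, sorted(kg[entity][role])))
    return new_edges


# Toy mentions, for illustration only.
mentions = [
    ("Aspirin", "analgesic", "article-1"),
    ("aspirin", "analgesic", "article-2"),
    ("aspirin", "antithrombotic agent", "article-2"),
    ("caffeine", "central nervous system stimulant", "article-3"),
]
# Stand-in for ChEBI "has role" relations; not real ChEBI content.
reference = {"aspirin": {"analgesic"}}

for edge in complementary_edges(build_kg(mentions), reference):
    print(edge)
```

Keeping article IDs on every edge is what lets such a graph point back to the literature, which is the gap in ChEBI that the abstract highlights; a real system would also need proper entity and role normalization against ChEBI identifiers.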


Authors: Stefan Langer, Fabian Neuhaus, Andreas Nürnberger
Published: 2024-07-31 15:56:06+00:00
