ChemVLM: Exploring the Power of Multimodal Large Language Models in Chemistry Area

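Summary

Large language models (LLMs) have achieved remarkable success and have been applied across many scientific fields, including chemistry.
However, many chemical tasks require processing visual information, which existing chemical LLMs cannot handle well.
This has increased the need for models that can integrate multimodal information in the chemical domain.
In this paper, we introduce ChemVLM, an open-source chemical multimodal large language model designed specifically for chemical applications.
ChemVLM is trained on a carefully curated bilingual multimodal dataset, which strengthens its ability to understand both textual and visual chemical information, such as molecular structures, reactions, and chemistry exam questions.
We developed three datasets for comprehensive evaluation, tailored to chemical optical character recognition (OCR), multimodal chemical reasoning (MMCR), and multimodal molecule understanding tasks.
We benchmark ChemVLM against a range of open-source and proprietary multimodal large language models on various tasks.
Experimental results show that ChemVLM achieves competitive performance across all evaluated tasks.
Our model is available at https://huggingface.co/AI4Chem/ChemVLM-26B.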

Abstract (original)

Large Language Models (LLMs) have achieved remarkable success and have been applied across various scientific fields, including chemistry. However, many chemical tasks require the processing of visual information, which cannot be successfully handled by existing chemical LLMs. This brings a growing need for models capable of integrating multimodal information in the chemical domain. In this paper, we introduce ChemVLM, an open-source chemical multimodal large language model specifically designed for chemical applications. ChemVLM is trained on a carefully curated bilingual multimodal dataset that enhances its ability to understand both textual and visual chemical information, including molecular structures, reactions, and chemistry examination questions. We develop three datasets for comprehensive evaluation, tailored to Chemical Optical Character Recognition (OCR), Multimodal Chemical Reasoning (MMCR), and Multimodal Molecule Understanding tasks. We benchmark ChemVLM against a range of open-source and proprietary multimodal large language models on various tasks. Experimental results demonstrate that ChemVLM achieves competitive performance across all evaluated tasks. Our model can be found at https://huggingface.co/AI4Chem/ChemVLM-26B.

arXiv information

Authors	Junxian Li, Di Zhang, Xunzhi Wang, Zeying Hao, Jingdi Lei, Qian Tan, Cai Zhou, Wei Liu, Yaotian Yang, Xinrui Xiong, Weiyun Wang, Zhe Chen, Wenhai Wang, Wei Li, Shufei Zhang, Mao Su, Wanli Ouyang, Yuqiang Li, Dongzhan Zhou
Published	2024-08-16 16:46:32+00:00
arXiv site	arxiv_id(pdf)

Source, services used

arxiv.jp, Google
