MathGLM-Vision: Solving Mathematical Problems with Multi-Modal Large Language Model

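Summary

Large language models (LLMs) have shown strong mathematical reasoning ability, particularly on text-based mathematical problems.
However, current multi-modal large language models (MLLMs), especially those specialized for mathematics, focus mainly on solving geometry problems and tend to overlook the variety of visual information found in other areas of mathematics.
In addition, the geometric data used by these specialized mathematical MLLMs comes from a handful of public datasets that are generally limited in diversity and complexity.
To address these limitations, we aim to construct a fine-tuning dataset named MathVL and to develop MathGLM-Vision, a series of specialized mathematical MLLMs, by running supervised fine-tuning (SFT) on MathVL with backbones of various parameter scales.
To evaluate MathGLM-Vision extensively, we ran experiments on several public benchmarks and on our curated MathVL-test set of 2,000 problems.
The results show that MathGLM-Vision achieves significant improvements over several existing models, including its backbone models and open-source mathematical MLLMs.
These findings indicate that dataset diversity is important for strengthening the mathematical reasoning abilities of MLLMs.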

Abstract (original)

Large language models (LLMs) have demonstrated significant capabilities in mathematical reasoning, particularly with text-based mathematical problems. However, current multi-modal large language models (MLLMs), especially those specialized in mathematics, tend to focus predominantly on solving geometric problems but ignore the diversity of visual information available in other areas of mathematics. Moreover, the geometric information for these specialized mathematical MLLMs is derived from several public datasets, which are typically limited in diversity and complexity. To address these limitations, we aim to construct a fine-tuning dataset named MathVL, and develop a series of specialized mathematical MLLMs termed MathGLM-Vision by conducting Supervised Fine-Tuning (SFT) on MathVL with various parameter-scale backbones. To extensively evaluate the effectiveness of MathGLM-Vision, we conduct experiments on several public benchmarks and our curated MathVL-test consisting of 2,000 problems. Experimental results demonstrate that MathGLM-Vision achieves significant improvements compared with some existing models, including backbone models and open-source mathematical MLLMs. These findings indicate the importance of diversity dataset in enhancing the mathematical reasoning abilities of MLLMs.

arXiv information

Authors	Zhen Yang, Jinhao Chen, Zhengxiao Du, Wenmeng Yu, Weihan Wang, Wenyi Hong, Zhihuan Jiang, Bin Xu, Jie Tang
Published	2024-12-02 14:59:08+00:00
arXiv page	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
