MiCEval: Unveiling Multimodal Chain of Thought’s Quality via Image Description and Reasoning Steps

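Summary

Multimodal chain of thought (MCoT) is a popular prompting strategy for improving the performance of multimodal large language models (MLLMs) across a range of complex reasoning tasks.
Despite this popularity, automated methods for evaluating the quality of MCoT reasoning steps are notably lacking.
To address this gap, the authors propose Multimodal Chain-of-Thought Evaluation (MiCEval), a framework that assesses the correctness of a reasoning chain by evaluating the quality of both the image description and each reasoning step.
The description evaluation focuses on the accuracy of the image description, while the reasoning-step evaluation rates each step as it is generated conditionally on the preceding steps.
MiCEval is built on a fine-grained dataset whose annotations rate each step for correctness, relevance, and informativeness.
Extensive experiments on four state-of-the-art MLLMs show that step-wise evaluation with MiCEval aligns more closely with human judgments than existing methods based on cosine similarity or fine-tuning.
The MiCEval dataset and code are available at https://github.com/alenai97/MiCEval.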

Summary (original)

Multimodal Chain of Thought (MCoT) is a popular prompting strategy for improving the performance of multimodal large language models (MLLMs) across a range of complex reasoning tasks. Despite its popularity, there is a notable absence of automated methods for evaluating the quality of reasoning steps in MCoT. To address this gap, we propose Multimodal Chain-of-Thought Evaluation (MiCEval), a framework designed to assess the correctness of reasoning chains by evaluating the quality of both the description and each reasoning step. The evaluation of the description component focuses on the accuracy of the image descriptions, while the reasoning step evaluates the quality of each step as it is conditionally generated based on the preceding steps. MiCEval is built upon a fine-grained dataset with annotations that rate each step according to correctness, relevance, and informativeness. Extensive experiments on four state-of-the-art MLLMs show that step-wise evaluations using MiCEval align more closely with human judgments compared to existing methods based on cosine similarity or fine-tuning approaches. MiCEval datasets and code can be found in https://github.com/alenai97/MiCEval.
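The abstract describes the evaluation only at a high level, so the following is a minimal sketch of one way a step-wise rating could be organized, not the authors' implementation. It assumes each step carries three ratings in [0, 1] for correctness, relevance, and informativeness, and that each step is judged against the image description plus all preceding steps. The names `StepRating`, `step_contexts`, and `chain_score`, the per-step mean, and the weakest-link aggregation are all hypothetical choices made for illustration.

```python
from dataclasses import dataclass
from statistics import mean
from typing import List, Tuple


@dataclass
class StepRating:
    """One reasoning step with three ratings in [0, 1] (hypothetical scale)."""
    text: str
    correctness: float
    relevance: float
    informativeness: float

    def score(self) -> float:
        # Assumption: the three dimensions are weighted equally.
        return mean([self.correctness, self.relevance, self.informativeness])


def step_contexts(description: str, steps: List[StepRating]) -> List[Tuple[List[str], str]]:
    """Pair each step with the context it is conditioned on:
    the image description followed by every preceding step."""
    contexts = []
    for i, step in enumerate(steps):
        preceding = [description] + [s.text for s in steps[:i]]
        contexts.append((preceding, step.text))
    return contexts


def chain_score(description_accuracy: float, steps: List[StepRating]) -> float:
    """Assumption: a chain is only as good as its weakest part,
    so take the minimum over the description and all step scores."""
    return min([description_accuracy] + [s.score() for s in steps])
```

A weakest-link aggregate is used here because a single wrong step can invalidate the chain; a mean or product over the steps would be an equally plausible choice under different assumptions.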

arXiv information

Authors	Xiongtao Zhou, Jie He, Lanyu Chen, Jingyu Li, Haojing Chen, Victor Gutierrez Basulto, Jeff Z. Pan, Hanjie Chen
Published	2024-10-18 17:57:40+00:00
arXiv site	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
