Benchmarking Multimodal RAG through a Chart-based Document Question-Answering Generation Framework

Abstract

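Multimodal Retrieval-Augmented Generation (MRAG) enhances reasoning capabilities by integrating external knowledge.
However, existing benchmarks focus primarily on simple image-text interactions and overlook complex visual formats such as charts, which are common in real-world applications.
In this work, we introduce a new task, Chart-based MRAG, to address this limitation.
To semi-automatically generate high-quality evaluation samples, we propose CHARt-based document question-answering GEneration (CHARGE), a framework that produces evaluation data through structured keypoint extraction, cross-modal verification, and keypoint-based generation.
By combining CHARGE with expert validation, we construct Chart-MRAG Bench, a comprehensive benchmark for chart-based MRAG evaluation.
Our evaluation reveals three key limitations of current approaches: (1) unified multimodal embedding retrieval methods struggle in chart-based scenarios; (2) even with ground-truth retrieval, state-of-the-art MLLMs achieve only 58.19% Correctness and 73.87% Coverage; and (3) MLLMs show a consistent text-over-visual modality bias during chart-based MRAG reasoning.
CHARGE and Chart-MRAG Bench are released at https://github.com/Nomothings/CHARGE.git.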

Abstract (original)

Multimodal Retrieval-Augmented Generation (MRAG) enhances reasoning capabilities by integrating external knowledge. However, existing benchmarks primarily focus on simple image-text interactions, overlooking complex visual formats like charts that are prevalent in real-world applications. In this work, we introduce a novel task, Chart-based MRAG, to address this limitation. To semi-automatically generate high-quality evaluation samples, we propose CHARt-based document question-answering GEneration (CHARGE), a framework that produces evaluation data through structured keypoint extraction, crossmodal verification, and keypoint-based generation. By combining CHARGE with expert validation, we construct Chart-MRAG Bench, a comprehensive benchmark for chart-based MRAG evaluation, featuring 4,738 question-answering pairs across 8 domains from real-world documents. Our evaluation reveals three critical limitations in current approaches: (1) unified multimodal embedding retrieval methods struggle in chart-based scenarios, (2) even with ground-truth retrieval, state-of-the-art MLLMs achieve only 58.19% Correctness and 73.87% Coverage scores, and (3) MLLMs demonstrate consistent text-over-visual modality bias during Chart-based MRAG reasoning. The CHARGE and Chart-MRAG Bench are released at https://github.com/Nomothings/CHARGE.git.

arXiv information

Authors	Yuming Yang, Jiang Zhong, Li Jin, Jingwang Huang, Jingpeng Gao, Qing Liu, Yang Bai, Jingyuan Zhang, Rui Jiang, Kaiwen Wei
Published	2025-02-20 18:59:42+00:00
arXiv page	arxiv_id(pdf)

Source, services used

arxiv.jp, Google
