MMMU: A Massive Multi-discipline Multimodal Understanding and Reasoning Benchmark for Expert AGI

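Abstract

We introduce MMMU, a new benchmark designed to evaluate multimodal models on large-scale, multi-discipline tasks that require college-level subject knowledge and deliberate reasoning.
MMMU contains 11.5K multimodal questions carefully collected from college exams, quizzes, and textbooks, covering six core disciplines: Art & Design, Business, Science, Health & Medicine, Humanities & Social Science, and Tech & Engineering.
These questions span 30 subjects and 183 subfields and include 30 highly heterogeneous image types, such as charts, diagrams, maps, tables, music sheets, and chemical structures.
Unlike existing benchmarks, MMMU focuses on advanced perception and reasoning with domain-specific knowledge, challenging models to perform tasks similar to those faced by experts.
An evaluation of 14 open-source LMMs, along with the proprietary GPT-4V(ision) and Gemini, highlights the substantial challenges that MMMU poses.
Even the advanced GPT-4V and Gemini Ultra reach accuracies of only 56% and 59%, respectively, leaving significant room for improvement.
We believe MMMU will stimulate the community to build next-generation multimodal foundation models towards expert artificial general intelligence.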

Abstract (original)

We introduce MMMU: a new benchmark designed to evaluate multimodal models on massive multi-discipline tasks demanding college-level subject knowledge and deliberate reasoning. MMMU includes 11.5K meticulously collected multimodal questions from college exams, quizzes, and textbooks, covering six core disciplines: Art & Design, Business, Science, Health & Medicine, Humanities & Social Science, and Tech & Engineering. These questions span 30 subjects and 183 subfields, comprising 30 highly heterogeneous image types, such as charts, diagrams, maps, tables, music sheets, and chemical structures. Unlike existing benchmarks, MMMU focuses on advanced perception and reasoning with domain-specific knowledge, challenging models to perform tasks akin to those faced by experts. The evaluation of 14 open-source LMMs as well as the proprietary GPT-4V(ision) and Gemini highlights the substantial challenges posed by MMMU. Even the advanced GPT-4V and Gemini Ultra only achieve accuracies of 56% and 59% respectively, indicating significant room for improvement. We believe MMMU will stimulate the community to build next-generation multimodal foundation models towards expert artificial general intelligence.

arXiv information

Authors	Xiang Yue, Yuansheng Ni, Kai Zhang, Tianyu Zheng, Ruoqi Liu, Ge Zhang, Samuel Stevens, Dongfu Jiang, Weiming Ren, Yuxuan Sun, Cong Wei, Botao Yu, Ruibin Yuan, Renliang Sun, Ming Yin, Boyuan Zheng, Zhenzhu Yang, Yibo Liu, Wenhao Huang, Huan Sun, Yu Su, Wenhu Chen
Published	2024-06-13 15:02:39+00:00
arXiv page	arxiv_id(pdf)

Sources and services used

arxiv.jp, Google
