Kaleidoscope: In-language Exams for Massively Multilingual Vision Evaluation

Abstract (original)

The evaluation of vision-language models (VLMs) has mainly relied on English-language benchmarks, leaving significant gaps in both multilingual and multicultural coverage. While multilingual benchmarks have expanded in both size and language coverage, many rely on translations of English datasets and therefore fail to capture cultural nuances. In this work, we propose Kaleidoscope as the most comprehensive exam benchmark to date for the multilingual evaluation of vision-language models. Kaleidoscope is a large-scale, in-language multimodal benchmark designed to evaluate VLMs across diverse languages and visual inputs. It covers 18 languages and 14 different subjects, amounting to a total of 20,911 multiple-choice questions. Built through an open science collaboration with a diverse group of researchers worldwide, Kaleidoscope ensures linguistic and cultural authenticity. We evaluate top-performing multilingual vision-language models and find that they perform poorly on low-resource languages and in complex multimodal scenarios. Our results highlight the need for progress on culturally inclusive multimodal evaluation frameworks.

arXiv information

Authors	Israfel Salazar,Manuel Fernández Burda,Shayekh Bin Islam,Arshia Soltani Moakhar,Shivalika Singh,Fabian Farestam,Angelika Romanou,Danylo Boiko,Dipika Khullar,Mike Zhang,Dominik Krzemiński,Jekaterina Novikova,Luísa Shimabucoro,Joseph Marvin Imperial,Rishabh Maheshwary,Sharad Duwal,Alfonso Amayuelas,Swati Rajwal,Jebish Purbey,Ahmed Ruby,Nicholas Popovič,Marek Suppa,Azmine Toushik Wasi,Ram Mohan Rao Kadiyala,Olga Tsymboi,Maksim Kostritsya,Bardia Soltani Moakhar,Gabriel da Costa Merlin,Otávio Ferracioli Coletti,Maral Jabbari Shiviari,MohammadAmin farahani fard,Silvia Fernandez,María Grandury,Dmitry Abulkhanov,Drishti Sharma,Andre Guarnier De Mitri,Leticia Bossatto Marchezi,Johan Obando-Ceron,Nazar Kohut,Beyza Ermis,Desmond Elliott,Enzo Ferrante,Sara Hooker,Marzieh Fadaee
Published	2025-04-09 17:43:16+00:00
arXiv page	arxiv_id(pdf)

Source, services used

arxiv.jp, Google
