GIMMICK — Globally Inclusive Multimodal Multitask Cultural Knowledge Benchmarking

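Summary

Large vision-language models (LVLMs) have recently attracted attention for their distinctive performance and broad applicability.
Earlier work has shown that they fall short in usage scenarios involving non-Western contexts, but existing studies are limited in scope: they cover only a narrow range of cultures, focus on a small number of cultural aspects, or evaluate a limited selection of models on a single task only.
Towards globally inclusive LVLM research, we introduce GIMMICK, an extensive multimodal benchmark designed to assess a broad spectrum of cultural knowledge across 144 countries representing six global macro-regions.
GIMMICK comprises six tasks built on three new datasets spanning 728 unique cultural events or facets, on which we evaluated 20 LVLMs and 11 LLMs, including five proprietary and 26 open-weight models of all sizes.
We systematically examine (1) regional cultural biases, (2) the influence of model size, (3) input modalities, and (4) external cues.
Our analyses reveal strong biases toward Western cultures across models and tasks, and highlight strong correlations between model size and performance as well as the effectiveness of multimodal input and external geographic cues.
We further find that models know more about tangible than intangible aspects (e.g., food vs. rituals), and that they excel at recognizing broad cultural origins but struggle with more nuanced understanding.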

Summary (original)

Large Vision-Language Models (LVLMs) have recently gained attention due to their distinctive performance and broad applicability. While it has been previously shown that their efficacy in usage scenarios involving non-Western contexts falls short, existing studies are limited in scope, covering just a narrow range of cultures, focusing exclusively on a small number of cultural aspects, or evaluating a limited selection of models on a single task only. Towards globally inclusive LVLM research, we introduce GIMMICK, an extensive multimodal benchmark designed to assess a broad spectrum of cultural knowledge across 144 countries representing six global macro-regions. GIMMICK comprises six tasks built upon three new datasets that span 728 unique cultural events or facets on which we evaluated 20 LVLMs and 11 LLMs, including five proprietary and 26 open-weight models of all sizes. We systematically examine (1) regional cultural biases, (2) the influence of model size, (3) input modalities, and (4) external cues. Our analyses reveal strong biases toward Western cultures across models and tasks and highlight strong correlations between model size and performance, as well as the effectiveness of multimodal input and external geographic cues. We further find that models have more knowledge of tangible than intangible aspects (e.g., food vs. rituals) and that they excel in recognizing broad cultural origins but struggle with a more nuanced understanding.

arXiv information

Authors	Florian Schneider, Carolin Holtermann, Chris Biemann, Anne Lauscher
Published	2025-02-19 14:27:40+00:00
arXiv site	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
