Cultural and Linguistic Diversity Improves Visual Representations

Abstract

Computer vision often treats perception as objective, and this assumption gets reflected in the way that datasets are collected and models are trained. For instance, image descriptions in different languages are typically assumed to be translations of the same semantic content. However, work in cross-cultural psychology and linguistics has shown that individuals differ in their visual perception depending on their cultural background and the language they speak. In this paper, we demonstrate significant differences in semantic content across languages in both dataset and model-produced captions. When data is multilingual as opposed to monolingual, captions have higher semantic coverage on average, as measured by scene graph, embedding, and linguistic complexity. For example, multilingual captions have on average 21.8% more objects, 24.5% more relations, and 27.1% more attributes than a set of monolingual captions. Moreover, models trained on content from different languages perform best against test data from those languages, while those trained on multilingual content perform consistently well across all evaluation data compositions. Our research provides implications for how diverse modes of perception can improve image understanding.
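The scene-graph comparison in the abstract (more objects, relations, and attributes when captions come from several languages) can be illustrated with a minimal sketch. It assumes each caption has already been parsed into a scene graph made of object labels, (subject, predicate, object) relation triples, and (object, attribute) pairs, and that non-English captions were translated into English before parsing. The `SceneGraph` class, the helpers `merge`, `coverage`, and `percent_increase`, and the toy captions are all hypothetical. They are not the authors' pipeline, parser, or data. Coverage here is simply the number of distinct elements in the union of a caption set's scene graphs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SceneGraph:
    """Hypothetical per-caption scene graph (illustrative, not the paper's format)."""
    objects: frozenset = frozenset()     # object labels, e.g. "dog"
    relations: frozenset = frozenset()   # (subject, predicate, object) triples
    attributes: frozenset = frozenset()  # (object, attribute) pairs


def merge(graphs):
    """Pool a set of captions for one image by taking the union of their scene graphs."""
    objects, relations, attributes = set(), set(), set()
    for g in graphs:
        objects |= g.objects
        relations |= g.relations
        attributes |= g.attributes
    return SceneGraph(frozenset(objects), frozenset(relations), frozenset(attributes))


def coverage(graph):
    """Semantic coverage as counts of distinct objects, relations, and attributes."""
    return {
        "objects": len(graph.objects),
        "relations": len(graph.relations),
        "attributes": len(graph.attributes),
    }


def percent_increase(multi, mono):
    """Relative gain of multilingual over monolingual coverage, in percent."""
    return {k: 100.0 * (multi[k] - mono[k]) / mono[k] for k in mono if mono[k] > 0}


# Toy captions for a single image (invented for illustration; the Japanese
# caption is assumed to have been translated into English before parsing).
en_1 = SceneGraph(frozenset({"man", "dog"}),
                  frozenset({("man", "walks", "dog")}),
                  frozenset({("dog", "brown")}))
en_2 = SceneGraph(frozenset({"man", "dog", "street"}),
                  frozenset({("man", "walks", "dog")}),
                  frozenset({("dog", "brown")}))
ja_1 = SceneGraph(frozenset({"man", "dog", "leash", "park"}),
                  frozenset({("man", "holds", "leash"), ("dog", "in", "park")}),
                  frozenset({("man", "old")}))

mono = coverage(merge([en_1, en_2]))   # two captions, one language
multi = coverage(merge([en_1, ja_1]))  # two captions, two languages

if __name__ == "__main__":
    print("monolingual:", mono)
    print("multilingual:", multi)
    print("gain (%):", percent_increase(multi, mono))
```

Both caption sets hold the same number of captions, so any gain comes from the captions describing different content rather than from pooling more text. Over a dataset, per-image gains would then be averaged; the 21.8%, 24.5%, and 27.1% figures above come from the paper's own data and parser, which this toy example does not reproduce.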

arXiv information

Authors	Andre Ye, Sebastin Santy, Jena D. Hwang, Amy X. Zhang, Ranjay Krishna
Published	2023-11-24 05:55:12+00:00
arXiv page	arxiv_id (pdf)

Source, services used

arxiv.jp, Google
