Uncovering Bias in Large Vision-Language Models at Scale with Counterfactuals

Summary

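As Large Language Models (LLMs) have grown increasingly capable, many Large Vision-Language Models (LVLMs) have been proposed to augment LLMs with visual inputs.
These models condition the text they generate on both an input image and a text prompt, enabling use cases such as visual question answering and multimodal chat.
Prior studies have examined the social biases in text generated by LLMs, but the topic remains relatively unexplored for LVLMs.
Examining social biases in LVLMs is particularly difficult because the biases induced by information in the text and visual modalities are confounded.
To address this problem, we conduct a large-scale study of text generated by different LVLMs under counterfactual changes to input images, producing over 57 million responses from popular models.
Our multi-dimensional bias evaluation framework reveals that social attributes depicted in images, such as perceived race, gender, and physical characteristics, can significantly influence the generation of toxic content, competency-associated words, harmful stereotypes, and numerical ratings of individuals.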

Summary (original)

With the advent of Large Language Models (LLMs) possessing increasingly impressive capabilities, a number of Large Vision-Language Models (LVLMs) have been proposed to augment LLMs with visual inputs. Such models condition generated text on both an input image and a text prompt, enabling a variety of use cases such as visual question answering and multimodal chat. While prior studies have examined the social biases contained in text generated by LLMs, this topic has been relatively unexplored in LVLMs. Examining social biases in LVLMs is particularly challenging due to the confounding contributions of bias induced by information contained across the text and visual modalities. To address this challenging problem, we conduct a large-scale study of text generated by different LVLMs under counterfactual changes to input images, producing over 57 million responses from popular models. Our multi-dimensional bias evaluation framework reveals that social attributes such as perceived race, gender, and physical characteristics depicted in images can significantly influence the generation of toxic content, competency-associated words, harmful stereotypes, and numerical ratings of individuals.

arXiv information

Authors	Phillip Howard, Kathleen C. Fraser, Anahita Bhiwandiwalla, Svetlana Kiritchenko
Published	2025-04-30 17:07:07+00:00
arXiv page	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
