Inspecting the Geographical Representativeness of Images from Text-to-Image Models

Abstract

Recent progress in generative models has resulted in models that produce both realistic and relevant images for most textual inputs. These models are being used to generate millions of images every day, and hold the potential to drastically impact areas such as generative art, digital marketing and data augmentation. Given their outsized impact, it is important to ensure that the generated content reflects the artifacts and surroundings across the globe, rather than over-representing certain parts of the world. In this paper, we measure the geographical representativeness of common nouns (e.g., a house) generated through DALL.E 2 and Stable Diffusion models using a crowdsourced study comprising 540 participants across 27 countries. For deliberately underspecified inputs without country names, the generated images most reflect the surroundings of the United States, followed by India, and the top generations rarely reflect surroundings from all other countries (average score less than 3 out of 5). Specifying the country names in the input increases the representativeness by 1.44 points on average for DALL.E 2 and 0.75 for Stable Diffusion; however, the overall scores for many countries remain low, highlighting the need for future models to be more geographically inclusive. Lastly, we examine the feasibility of quantifying the geographical representativeness of generated images without conducting user studies.

arXiv information

Authors	Abhipsa Basu, R. Venkatesh Babu, Danish Pruthi
Published	2023-05-18 16:08:11+00:00
arXiv page	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
