ANID: How Far Are We? Evaluating the Discrepancies Between AI-synthesized Images and Natural Images through Multimodal Guidance

Summary

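In the rapidly evolving field of AI-generated content (AIGC), a key challenge is distinguishing AI-synthesized images from natural images.
Although advanced generative models produce visually convincing images, significant discrepancies remain when those images are compared with natural ones.
To systematically investigate and quantify these discrepancies, we introduce the AI-Natural Image Discrepancy Evaluation benchmark, which aims to answer a central question: \textit{how far are AI-generated images (AIGIs) from truly realistic images?}
We have constructed a large-scale multimodal dataset, the Distinguishing Natural and AI-generated Images (DNAI) dataset, which contains over 440,000 AIGI samples generated by 8 representative models using both unimodal and multimodal prompts, such as Text-to-Image (T2I), Image-to-Image (I2I), and Text \textit{vs.} Image-to-Image (TI2I).
Our fine-grained assessment framework evaluates the DNAI dataset comprehensively across five key dimensions: naive visual feature quality, semantic alignment in multimodal generation, aesthetic appeal, downstream task applicability, and coordinated human validation.
Extensive evaluation results reveal significant discrepancies across these dimensions and underscore the need to align quantitative metrics with human judgment in order to reach a holistic understanding of AI-generated image quality.
Code is available at \href{https://github.com/ryliu68/ANID}{https://github.com/ryliu68/ANID}.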

Summary (original)

In the rapidly evolving field of Artificial Intelligence Generated Content (AIGC), one of the key challenges is distinguishing AI-synthesized images from natural images. Despite the remarkable capabilities of advanced AI generative models in producing visually compelling images, significant discrepancies remain when these images are compared to natural ones. To systematically investigate and quantify these discrepancies, we introduce an AI-Natural Image Discrepancy Evaluation benchmark aimed at addressing the critical question: \textit{how far are AI-generated images (AIGIs) from truly realistic images?} We have constructed a large-scale multimodal dataset, the Distinguishing Natural and AI-generated Images (DNAI) dataset, which includes over 440,000 AIGI samples generated by 8 representative models using both unimodal and multimodal prompts, such as Text-to-Image (T2I), Image-to-Image (I2I), and Text \textit{vs.} Image-to-Image (TI2I). Our fine-grained assessment framework provides a comprehensive evaluation of the DNAI dataset across five key dimensions: naive visual feature quality, semantic alignment in multimodal generation, aesthetic appeal, downstream task applicability, and coordinated human validation. Extensive evaluation results highlight significant discrepancies across these dimensions, underscoring the necessity of aligning quantitative metrics with human judgment to achieve a holistic understanding of AI-generated image quality. Code is available at \href{https://github.com/ryliu68/ANID}{https://github.com/ryliu68/ANID}.

arXiv information

Authors	Renyang Liu, Ziyu Lyu, Wei Zhou, See-Kiong Ng
Published	2024-12-23 15:08:08+00:00
arXiv site	arxiv_id(pdf)

Source, services used

arxiv.jp, Google
