Likelihood-Based Text-to-Image Evaluation with Patch-Level Perceptual and Semantic Credit Assignment

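Summary

Text-to-image synthesis has made remarkable progress and has recently attracted a great deal of attention.
However, the evaluation metrics commonly used in this field, such as the Inception Score and the Fréchet Inception Distance, have several problems.
First, they cannot explicitly assess the perceptual quality of generated images, and they reflect the semantic alignment of each text-image pair poorly.
They are also inefficient: thousands of images must be sampled before the evaluation results stabilise.
In this paper, we propose to evaluate text-to-image generation performance by directly estimating the likelihood of generated images with a pre-trained likelihood-based text-to-image generative model. That is, a higher likelihood indicates better perceptual quality and better text-image alignment.
To prevent the likelihood from being dominated by unimportant parts of the generated image, we propose several new designs that yield a credit assignment strategy based on the semantic and perceptual significance of image patches.
In the experiments, we evaluate the proposed metric on multiple popular text-to-image generation models and datasets, assessing both perceptual quality and text-image alignment.
Moreover, the metric can successfully assess the generation ability of these models with as few as 100 samples, which makes it very efficient in practice.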

Summary (original)

Text-to-image synthesis has made encouraging progress and attracted lots of public attention recently. However, popular evaluation metrics in this area, like the Inception Score and Fréchet Inception Distance, incur several issues. First of all, they cannot explicitly assess the perceptual quality of generated images and poorly reflect the semantic alignment of each text-image pair. Also, they are inefficient and need to sample thousands of images to stabilise their evaluation results. In this paper, we propose to evaluate text-to-image generation performance by directly estimating the likelihood of the generated images using a pre-trained likelihood-based text-to-image generative model, i.e., a higher likelihood indicates better perceptual quality and better text-image alignment. To prevent the likelihood from being dominated by the non-crucial part of the generated image, we propose several new designs to develop a credit assignment strategy based on the semantic and perceptual significance of the image patches. In the experiments, we evaluate the proposed metric on multiple popular text-to-image generation models and datasets in assessing both the perceptual quality and the text-image alignment. Moreover, it can successfully assess the generation ability of these models with as few as a hundred samples, making it very efficient in practice.
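
The abstract does not spell out how patch-level credit is combined with the likelihood, so the following is only a minimal sketch of one plausible reading: each image patch has a log-likelihood under the pre-trained model, and the image-level score is a weighted average of those values. The function name `weighted_log_likelihood`, the product-of-weights rule, and the normalisation are all assumptions made for illustration, not the paper's actual formulation.

```python
from typing import Sequence


def weighted_log_likelihood(
    patch_log_probs: Sequence[float],
    semantic_weights: Sequence[float],
    perceptual_weights: Sequence[float],
) -> float:
    """Combine per-patch log-likelihoods into one image-level score.

    Hypothetical scheme (not taken from the paper): the credit of a patch
    is the product of its semantic and perceptual weights, and credits are
    normalised so that they sum to one.
    """
    if not (len(patch_log_probs) == len(semantic_weights) == len(perceptual_weights)):
        raise ValueError("all inputs must have one entry per patch")

    # Unnormalised credit for each patch.
    credits = [s * p for s, p in zip(semantic_weights, perceptual_weights)]
    total = sum(credits)
    if total <= 0:
        raise ValueError("credits must sum to a positive value")

    # Weighted average of the per-patch log-likelihoods.
    return sum(c / total * lp for c, lp in zip(credits, patch_log_probs))
```

With two patches whose log-likelihoods are -1.0 and -4.0, uniform weights give -2.5, whereas tripling the perceptual weight of the first patch, `weighted_log_likelihood([-1.0, -4.0], [1.0, 1.0], [3.0, 1.0])`, gives -1.75: the score moves toward the patch that receives more credit, which is the effect the abstract describes when it says unimportant regions should not dominate the likelihood.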

arXiv information

Authors	Qi Chen,Chaorui Deng,Zixiong Huang,Bowen Zhang,Mingkui Tan,Qi Wu
Published	2023-08-16 17:26:47+00:00
arXiv site	arxiv_id(pdf)

Sources and services used

arxiv.jp, Google
