Exploring CLIP for Assessing the Look and Feel of Images

Summary

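Measuring how people perceive visual content is a long-standing problem in computer vision.
Many mathematical models have been developed to evaluate the look or quality of an image.
Such tools are effective at quantifying degradations such as noise and blur levels, but the resulting measurements are only loosely coupled with human language.
For more abstract perception of the feel of visual content, existing methods can rely only on supervised models trained explicitly on labeled data collected through laborious user studies.
This paper goes beyond these conventional paradigms by exploring the rich visual-language prior encapsulated in Contrastive Language-Image Pre-training (CLIP) models to assess both the quality perception (look) and the abstract perception (feel) of images in a zero-shot manner.
In particular, the authors discuss effective prompt designs and present an effective prompt pairing strategy for harnessing that prior.
They also report extensive experiments on controlled datasets and Image Quality Assessment (IQA) benchmarks.
The results show that CLIP captures meaningful priors that generalize well to different perceptual assessments.
Code is available at https://github.com/IceClear/CLIP-IQA.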
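
The abstract does not spell out how the prompt pairing strategy works, so the following is only a minimal sketch of one plausible reading: the image embedding is compared against a pair of opposing text prompts (for example, hypothetical prompts such as "Good photo." and "Bad photo."), and a softmax over the two cosine similarities gives a score between 0 and 1. The function names, the toy embedding vectors, and the `scale` value are assumptions made for illustration; real embeddings would come from a CLIP image and text encoder, which this sketch does not load.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def paired_prompt_score(image_emb, pos_text_emb, neg_text_emb, scale=100.0):
    """Softmax over similarities to a positive/negative prompt pair.

    Returns the probability mass assigned to the positive prompt.
    `scale` is an assumed temperature-like factor, not a value
    taken from the paper.
    """
    s_pos = scale * cosine(image_emb, pos_text_emb)
    s_neg = scale * cosine(image_emb, neg_text_emb)
    # Subtract the maximum before exponentiating for numerical stability.
    m = max(s_pos, s_neg)
    e_pos = math.exp(s_pos - m)
    e_neg = math.exp(s_neg - m)
    return e_pos / (e_pos + e_neg)


# Toy stand-ins for CLIP embeddings (hypothetical values).
image = [0.9, 0.1, 0.2]
good_prompt = [1.0, 0.0, 0.1]  # stands in for the "Good photo." embedding
bad_prompt = [0.0, 1.0, 0.1]   # stands in for the "Bad photo." embedding

score = paired_prompt_score(image, good_prompt, bad_prompt)
print(f"score for the positive prompt: {score:.3f}")
```

Scoring against a pair rather than a single prompt makes the result relative: an image that is equally close to both prompts lands at 0.5, whatever its absolute similarity to either one.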

Abstract (original)

Measuring the perception of visual content is a long-standing problem in computer vision. Many mathematical models have been developed to evaluate the look or quality of an image. Despite the effectiveness of such tools in quantifying degradations such as noise and blurriness levels, such quantification is loosely coupled with human language. When it comes to more abstract perception about the feel of visual content, existing methods can only rely on supervised models that are explicitly trained with labeled data collected via laborious user study. In this paper, we go beyond the conventional paradigms by exploring the rich visual language prior encapsulated in Contrastive Language-Image Pre-training (CLIP) models for assessing both the quality perception (look) and abstract perception (feel) of images in a zero-shot manner. In particular, we discuss effective prompt designs and show an effective prompt pairing strategy to harness the prior. We also provide extensive experiments on controlled datasets and Image Quality Assessment (IQA) benchmarks. Our results show that CLIP captures meaningful priors that generalize well to different perceptual assessments. Code will be available at https://github.com/IceClear/CLIP-IQA.

arXiv information

Authors	Jianyi Wang, Kelvin C. K. Chan, Chen Change Loy
Published	2022-07-25 17:58:16+00:00
arXiv page	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
