QPT V2: Masked Image Modeling Advances Visual Scoring

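Abstract

Quality assessment and aesthetics assessment aim to evaluate the perceived quality and aesthetics of visual content.
Current learning-based methods suffer greatly from the scarcity of labeled data and usually generalize sub-optimally.
Masked image modeling (MIM) has achieved noteworthy advances across various high-level tasks (e.g., classification and detection).
This work takes a novel perspective and investigates the capabilities of MIM in terms of quality- and aesthetics-awareness.
To this end, we propose quality- and aesthetics-aware pretraining (QPT V2), the first MIM-based pretraining framework that offers a unified solution to quality and aesthetics assessment.
To perceive both high-level semantics and fine-grained details, the pretraining data is curated.
To comprehensively cover quality- and aesthetics-related factors, degradation is introduced.
To capture multi-scale quality and aesthetic information, the model structure is modified.
Extensive experimental results on 11 downstream benchmarks clearly show the superior performance of QPT V2 compared with current state-of-the-art approaches and other pretraining paradigms.
Code and models will be released at \url{https://github.com/KeiChiTse/QPT-V2}.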
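
The abstract names masked image modeling without describing its mechanics. As a generic illustration only, and not the QPT V2 implementation, the sketch below hides a random subset of image patches and scores a reconstruction on the hidden patches alone. The function names, the toy patch sizes, the 0.75 mask ratio, and the all-zero "prediction" are all assumptions made for this example; none of them come from the paper.

```python
import random


def random_patch_mask(num_patches, mask_ratio, rng):
    """Return a list of booleans, True where a patch is hidden from the model."""
    num_masked = int(round(num_patches * mask_ratio))
    masked_ids = set(rng.sample(range(num_patches), num_masked))
    return [i in masked_ids for i in range(num_patches)]


def masked_mse(target_patches, predicted_patches, mask):
    """Mean squared error averaged over the pixels of masked patches only."""
    total, count = 0.0, 0
    for target, pred, is_masked in zip(target_patches, predicted_patches, mask):
        if not is_masked:
            continue  # visible patches do not contribute to the loss
        for t, p in zip(target, pred):
            total += (t - p) ** 2
            count += 1
    return total / count if count else 0.0


rng = random.Random(0)
# Toy "image": 16 patches of 4 pixel values each (hypothetical sizes).
patches = [[rng.random() for _ in range(4)] for _ in range(16)]
# Assumed mask ratio; the abstract does not state the value used by QPT V2.
mask = random_patch_mask(len(patches), mask_ratio=0.75, rng=rng)
# Stand-in for a decoder output: predict zeros for every patch.
prediction = [[0.0] * 4 for _ in patches]
loss = masked_mse(patches, prediction, mask)
```

In a real pretraining loop the prediction would come from an encoder-decoder network that only sees the unmasked patches, and the loss would be minimized by gradient descent; this sketch only shows the masking and the loss bookkeeping.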

Abstract (original)

Quality assessment and aesthetics assessment aim to evaluate the perceived quality and aesthetics of visual content. Current learning-based methods suffer greatly from the scarcity of labeled data and usually perform sub-optimally in terms of generalization. Although masked image modeling (MIM) has achieved noteworthy advancements across various high-level tasks (e.g., classification, detection etc.). In this work, we take on a novel perspective to investigate its capabilities in terms of quality- and aesthetics-awareness. To this end, we propose Quality- and aesthetics-aware pretraining (QPT V2), the first pretraining framework based on MIM that offers a unified solution to quality and aesthetics assessment. To perceive the high-level semantics and fine-grained details, pretraining data is curated. To comprehensively encompass quality- and aesthetics-related factors, degradation is introduced. To capture multi-scale quality and aesthetic information, model structure is modified. Extensive experimental results on 11 downstream benchmarks clearly show the superior performance of QPT V2 in comparison with current state-of-the-art approaches and other pretraining paradigms. Code and models will be released at \url{https://github.com/KeiChiTse/QPT-V2}.

arXiv information

Authors	Qizhi Xie, Kun Yuan, Yunpeng Qu, Mingda Wu, Ming Sun, Chao Zhou, Jihong Zhu
Published	2024-07-23 14:53:47+00:00
arXiv site	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
