CLIP-PCQA: Exploring Subjective-Aligned Vision-Language Modeling for Point Cloud Quality Assessment

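Summary

In recent years, research on no-reference point cloud quality assessment (NR-PCQA) has made significant progress.
However, existing methods mostly learn a direct mapping from visual data to the mean opinion score (MOS), which is inconsistent with how subjective evaluation actually works.
To address this, we propose CLIP-PCQA, a new language-driven PCQA method.
Because humans prefer to describe visual quality with discrete quality descriptions (such as "excellent" and "poor") rather than specific scores, we adopt a retrieval-based mapping strategy that simulates the subjective assessment process.
More specifically, following the philosophy of CLIP, we compute the cosine similarity between the visual features and multiple textual features, each corresponding to a different quality description. In this process, an effective contrastive loss and learnable prompts are introduced to enhance feature extraction.
Meanwhile, to account for individual limitations and bias in subjective experiments, we further convert the feature similarities into probabilities and take the opinion score distribution (OSD), rather than a single MOS, as the final target.
Experimental results show that CLIP-PCQA outperforms other state-of-the-art (SOTA) approaches.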

Summary (original)

In recent years, No-Reference Point Cloud Quality Assessment (NR-PCQA) research has achieved significant progress. However, existing methods mostly seek a direct mapping function from visual data to the Mean Opinion Score (MOS), which is contradictory to the mechanism of practical subjective evaluation. To address this, we propose a novel language-driven PCQA method named CLIP-PCQA. Considering that human beings prefer to describe visual quality using discrete quality descriptions (e.g., ‘excellent’ and ‘poor’) rather than specific scores, we adopt a retrieval-based mapping strategy to simulate the process of subjective assessment. More specifically, based on the philosophy of CLIP, we calculate the cosine similarity between the visual features and multiple textual features corresponding to different quality descriptions, in which process an effective contrastive loss and learnable prompts are introduced to enhance the feature extraction. Meanwhile, given the personal limitations and bias in subjective experiments, we further convert the feature similarities into probabilities and consider the Opinion Score Distribution (OSD) rather than a single MOS as the final target. Experimental results show that our CLIP-PCQA outperforms other State-Of-The-Art (SOTA) approaches.
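
The retrieval-based mapping described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes a five-level quality scale (the abstract names only "excellent" and "poor"), a softmax with a temperature of 0.07 to turn similarities into probabilities, and an expected-value step to read a single score off the resulting distribution. The feature vectors are toy placeholders rather than CLIP outputs, and all function names are hypothetical.

```python
import math

# Assumed five-level scale; the abstract only names "excellent" and "poor".
QUALITY_LEVELS = [("bad", 1.0), ("poor", 2.0), ("fair", 3.0),
                  ("good", 4.0), ("excellent", 5.0)]


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict_quality(visual_feature, text_features, temperature=0.07):
    """Map one visual feature to a distribution over quality levels.

    text_features holds one vector per entry in QUALITY_LEVELS.
    The temperature value and the expected-score step are assumptions
    made for illustration; the abstract does not specify them.
    """
    sims = [cosine_similarity(visual_feature, t) for t in text_features]
    probs = softmax([s / temperature for s in sims])
    score = sum(p * level for p, (_, level) in zip(probs, QUALITY_LEVELS))
    return probs, score


if __name__ == "__main__":
    # Toy one-hot "text features", one per quality level (placeholders).
    text_feats = [[1.0 if i == j else 0.0 for j in range(5)] for i in range(5)]
    # A toy visual feature that lies closest to the "good" description.
    visual = [0.0, 0.1, 0.3, 0.9, 0.2]
    probs, score = predict_quality(visual, text_feats)
    for (name, _), p in zip(QUALITY_LEVELS, probs):
        print(f"{name:>9}: {p:.3f}")
    print(f"expected score: {score:.2f}")
```

In a training setting, the predicted distribution would be compared against the opinion score distribution collected from subjects rather than against a single MOS; the specific loss is not given in the abstract, so it is left out here.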

arXiv information

Authors	Yating Liu, Yujie Zhang, Ziyu Shan, Yiling Xu
Published	2025-01-17 09:43:14+00:00
arXiv site	arxiv_id(pdf)

Sources and services used

arxiv.jp, Google
