Scaling-up Perceptual Video Quality Assessment

Summary

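Data scaling laws have been shown to substantially improve the performance of large multimodal models (LMMs) across a wide range of downstream tasks.
However, in perceptual video quality assessment (VQA), labeled resources are scarce and datasets are too small, so the potential of scaling laws remains largely unexplored.
To address this, we propose OmniVQA, an efficient framework for building high-quality, human-in-the-loop VQA multimodal instruction databases (MIDBs).
We then scale it up to create OmniVQA-Chat-400K, currently the largest MIDB in the VQA field.
We focus on the technical and aesthetic quality dimensions, providing fine-grained VQA knowledge through abundant in-context instruction data.
In addition, we built the OmniVQA-MOS-20K dataset to strengthen the model's quantitative quality rating capabilities.
We then introduce a complementary training strategy that effectively leverages the knowledge in these datasets for both quality understanding and quality rating tasks.
Furthermore, we propose the OmniVQA-FG (fine-grain)-Benchmark to evaluate the fine-grained performance of models.
Our results show that our models achieve state-of-the-art performance on both quality understanding and quality rating tasks.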

Summary (original)

The data scaling law has been shown to significantly enhance the performance of large multi-modal models (LMMs) across various downstream tasks. However, in the domain of perceptual video quality assessment (VQA), the potential of the scaling law remains unexplored due to the scarcity of labeled resources and the insufficient scale of datasets. To address this, we propose OmniVQA, an efficient framework for building high-quality, human-in-the-loop VQA multi-modal instruction databases (MIDBs). We then scale up to create OmniVQA-Chat-400K, the largest MIDB in the VQA field to date. Our focus is on the technical and aesthetic quality dimensions, with abundant in-context instruction data to provide fine-grained VQA knowledge. Additionally, we have built the OmniVQA-MOS-20K dataset to enhance the model’s quantitative quality rating capabilities. We then introduce a complementary training strategy that effectively leverages the knowledge from these datasets for quality understanding and quality rating tasks. Furthermore, we propose the OmniVQA-FG (fine-grain)-Benchmark to evaluate the fine-grained performance of the models. Our results demonstrate that our models achieve state-of-the-art performance in both quality understanding and rating tasks.

arXiv information

Authors	Ziheng Jia,Zicheng Zhang,Zeyu Zhang,Yingji Liang,Xiaorong Zhu,Chunyi Li,Jinliang Han,Haoning Wu,Bin Wang,Haoran Zhang,Guanyu Zhu,Qiyong Zhao,Xiaohong Liu,Guangtao Zhai,Xiongkuo Min
Published	2025-05-28 16:24:52+00:00
arXiv page	arxiv_id(pdf)

Source, services used

arxiv.jp, Google