VideoScore: Building Automatic Metrics to Simulate Fine-grained Human Feedback for Video Generation

Summary

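Recent years have seen major advances in video generation.
However, the development of automatic video metrics has lagged significantly behind.
None of the existing metrics can provide reliable scores for generated videos.
The main barrier is the lack of a large-scale human-annotated dataset.
In this paper, we release VideoFeedback, the first large-scale dataset containing human-provided multi-aspect scores for 37.6K synthesized videos from 11 existing video generative models.
Based on VideoFeedback, we train VideoScore (initialized from Mantis) to enable automatic video quality assessment.
Experiments show that the Spearman correlation between VideoScore and humans reaches 77.1 on VideoFeedback-test, beating the prior best metrics by about 50 points.
Further results on the held-out EvalCrafter, GenAI-Bench, and VBench benchmarks show that VideoScore consistently correlates much more strongly with human judges than other metrics do.
Given these results, we believe VideoScore can serve as a strong proxy for human raters to (1) rate different video models to track progress and (2) simulate fine-grained human feedback in Reinforcement Learning from Human Feedback (RLHF) to improve current video generation models.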

Summary (original)

Recent years have witnessed great advances in video generation. However, the development of automatic video metrics is lagging significantly behind. None of the existing metrics is able to provide reliable scores for generated videos. The main barrier is the lack of a large-scale human-annotated dataset. In this paper, we release VideoFeedback, the first large-scale dataset containing human-provided multi-aspect scores for 37.6K synthesized videos from 11 existing video generative models. We train VideoScore (initialized from Mantis) on VideoFeedback to enable automatic video quality assessment. Experiments show that the Spearman correlation between VideoScore and humans can reach 77.1 on VideoFeedback-test, beating the prior best metrics by about 50 points. Further results on the held-out EvalCrafter, GenAI-Bench, and VBench show that VideoScore has consistently much higher correlation with human judges than other metrics. Given these results, we believe VideoScore can serve as a strong proxy for human raters to (1) rate different video models to track progress and (2) simulate fine-grained human feedback in Reinforcement Learning from Human Feedback (RLHF) to improve current video generation models.
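The headline result is a Spearman rank correlation between VideoScore's outputs and human ratings. The abstract does not describe the evaluation protocol (for example, how the multiple aspects are scored or aggregated), so the following is only a generic, minimal sketch of Spearman correlation: rank both score lists (tied values get the average of their ranks) and take the Pearson correlation of the ranks. The function names and the example scores are hypothetical, and the reported 77.1 is presumably the coefficient scaled by 100.

```python
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """Return 1-based ranks; tied values receive the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j across the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions i..j are 0-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Plain Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5


def spearman(metric_scores: Sequence[float], human_scores: Sequence[float]) -> float:
    """Spearman's rho = Pearson correlation of the average ranks."""
    if len(metric_scores) != len(human_scores) or len(metric_scores) < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    return pearson(average_ranks(metric_scores), average_ranks(human_scores))


if __name__ == "__main__":
    # Hypothetical scores for five generated videos (not data from the paper).
    human = [4, 2, 3, 1, 4]
    metric = [3.8, 2.1, 2.9, 1.5, 3.5]
    rho = spearman(metric, human)
    print(f"Spearman rho = {rho:.3f} (x100 = {rho * 100:.1f})")
```

Rank correlation is a natural fit for this kind of comparison because it only asks whether the metric orders videos the way humans do, not whether the two use the same score scale. In practice, `scipy.stats.spearmanr` computes the same coefficient with the same tie handling.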

arXiv information

Authors	Xuan He, Dongfu Jiang, Ge Zhang, Max Ku, Achint Soni, Sherman Siu, Haonan Chen, Abhranil Chandra, Ziyan Jiang, Aaran Arulraj, Kai Wang, Quy Duc Do, Yuansheng Ni, Bohan Lyu, Yaswanth Narsupalli, Rongqi Fan, Zhiheng Lyu, Yuchen Lin, Wenhu Chen
Published	2024-06-24 16:22:55+00:00
arXiv page	arxiv_id(pdf)

Source, services used

arxiv.jp, Google
