EvalCrafter: Benchmarking and Evaluating Large Video Generation Models

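Summary

Vision and language generative models have grown rapidly in recent years.
For video generation, various open-source models and publicly available services have been released that produce videos of high visual quality.
However, these methods are often evaluated with only a few academic metrics, such as FVD or IS.
We argue that such simple metrics are a poor basis for judging large conditional generative models, because these models are often trained on very large datasets and have multi-aspect abilities.
We therefore propose a new framework and pipeline to exhaustively evaluate the performance of generated videos.
To this end, we first build a new prompt list for text-to-video generation by analyzing real-world prompt lists with the help of a large language model.
Then, we evaluate state-of-the-art video generative models on our carefully designed benchmark with around 18 objective metrics covering visual quality, content quality, motion quality, and text-caption alignment.
To obtain the final leaderboard of the models, we also fit a series of coefficients that align the objective metrics with users' opinions.
With the proposed opinion alignment method, the final score shows a higher correlation than simply averaging the metrics, which demonstrates the effectiveness of the proposed evaluation method.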
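
The idea of fitting coefficients so that objective metrics track user opinions can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not specify the fitting procedure, so ordinary least squares is assumed here, the data are synthetic, and every name (`fit_alignment_weights`, `true_w`, the four placeholder metrics) is hypothetical. It only shows why a fitted weighting can correlate better with opinions than a plain average.

```python
import numpy as np


def fit_alignment_weights(metrics, opinions):
    """Least-squares fit of opinions ~ metrics @ w + b (assumed procedure)."""
    # Append a column of ones so the fit also learns an intercept b.
    design = np.column_stack([metrics, np.ones(len(metrics))])
    coef, *_ = np.linalg.lstsq(design, opinions, rcond=None)
    return coef[:-1], coef[-1]


def pearson(a, b):
    """Pearson correlation coefficient between two 1-D arrays."""
    return float(np.corrcoef(a, b)[0, 1])


# Synthetic stand-in data: 200 videos scored by 4 placeholder metrics in [0, 1].
rng = np.random.default_rng(0)
n_videos, n_metrics = 200, 4
metrics = rng.uniform(0.0, 1.0, size=(n_videos, n_metrics))

# Made-up "true" importance of each metric to human raters, plus rating noise.
true_w = np.array([0.6, 0.1, 0.25, 0.05])
opinions = metrics @ true_w + rng.normal(0.0, 0.02, size=n_videos)

weights, bias = fit_alignment_weights(metrics, opinions)
aligned = metrics @ weights + bias   # opinion-aligned score
averaged = metrics.mean(axis=1)      # baseline: unweighted mean of the metrics

corr_aligned = pearson(aligned, opinions)
corr_averaged = pearson(averaged, opinions)
print(f"aligned score vs. opinions:  {corr_aligned:.3f}")
print(f"averaged score vs. opinions: {corr_averaged:.3f}")
```

Both correlations here are measured on the same data used for the fit, where a least-squares fit with an intercept cannot do worse than the plain average. A real evaluation would measure the correlation on held-out videos and ratings.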

Summary (original)

The vision and language generative models have been overgrown in recent years. For video generation, various open-sourced models and public-available services are released for generating high-visual quality videos. However, these methods often use a few academic metrics, for example, FVD or IS, to evaluate the performance. We argue that it is hard to judge the large conditional generative models from the simple metrics since these models are often trained on very large datasets with multi-aspect abilities. Thus, we propose a new framework and pipeline to exhaustively evaluate the performance of the generated videos. To achieve this, we first conduct a new prompt list for text-to-video generation by analyzing the real-world prompt list with the help of the large language model. Then, we evaluate the state-of-the-art video generative models on our carefully designed benchmarks, in terms of visual qualities, content qualities, motion qualities, and text-caption alignment with around 18 objective metrics. To obtain the final leaderboard of the models, we also fit a series of coefficients to align the objective metrics to the users’ opinions. Based on the proposed opinion alignment method, our final score shows a higher correlation than simply averaging the metrics, showing the effectiveness of the proposed evaluation method.

arXiv information

Authors	Yaofang Liu, Xiaodong Cun, Xuebo Liu, Xintao Wang, Yong Zhang, Haoxin Chen, Yang Liu, Tieyong Zeng, Raymond Chan, Ying Shan
Published	2023-10-17 17:50:46+00:00
arXiv site	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
