Improving Video Generation with Human Feedback

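Abstract

Video generation has advanced significantly through rectified flow techniques, but problems such as unsmooth motion and misalignment between videos and prompts persist.
In this work, we develop a systematic pipeline that uses human feedback to mitigate these problems and refine the video generation model.
Specifically, we begin by constructing a large-scale human preference dataset focused on modern video generation models, incorporating pairwise annotations across multiple dimensions.
We then introduce VideoReward, a multi-dimensional video reward model, and examine how annotations and various design choices affect its rewarding efficacy.
From a unified reinforcement learning perspective aimed at maximizing reward with KL regularization, we introduce three alignment algorithms for flow-based models by extending those developed for diffusion models.
These include two training-time strategies, direct preference optimization for flow (Flow-DPO) and reward weighted regression for flow (Flow-RWR), and an inference-time technique, Flow-NRG, which applies reward guidance directly to noisy videos.
Experimental results indicate that VideoReward significantly outperforms existing reward models, and that Flow-DPO performs better than both Flow-RWR and standard supervised fine-tuning methods.
Additionally, Flow-NRG lets users assign custom weights to multiple objectives during inference, meeting personalized video quality needs.
Project page: https://gongyeliu.github.io/videoalign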

Abstract (original)

Video generation has achieved significant advances through rectified flow techniques, but issues like unsmooth motion and misalignment between videos and prompts persist. In this work, we develop a systematic pipeline that harnesses human feedback to mitigate these problems and refine the video generation model. Specifically, we begin by constructing a large-scale human preference dataset focused on modern video generation models, incorporating pairwise annotations across multi-dimensions. We then introduce VideoReward, a multi-dimensional video reward model, and examine how annotations and various design choices impact its rewarding efficacy. From a unified reinforcement learning perspective aimed at maximizing reward with KL regularization, we introduce three alignment algorithms for flow-based models by extending those from diffusion models. These include two training-time strategies: direct preference optimization for flow (Flow-DPO) and reward weighted regression for flow (Flow-RWR), and an inference-time technique, Flow-NRG, which applies reward guidance directly to noisy videos. Experimental results indicate that VideoReward significantly outperforms existing reward models, and Flow-DPO demonstrates superior performance compared to both Flow-RWR and standard supervised fine-tuning methods. Additionally, Flow-NRG lets users assign custom weights to multiple objectives during inference, meeting personalized video quality needs. Project page: https://gongyeliu.github.io/videoalign.
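
The phrase "maximizing reward with KL regularization" can be made concrete with a small toy example. The sketch below is not the paper's method: it works over a finite set of candidate videos rather than a flow model, and the dimension names (`motion`, `alignment`), the scores, and the function names are all hypothetical. It only illustrates two ideas from the abstract: combining several reward dimensions with user-chosen weights, and the standard closed-form solution of a KL-regularized reward objective, in which the optimal distribution is proportional to the reference distribution times exp(reward / beta).

```python
import math


def combine_rewards(scores, weights):
    """Weighted sum of per-dimension reward scores for one sample.

    `scores` and `weights` are dicts keyed by (hypothetical) dimension names.
    """
    return sum(weights[name] * scores[name] for name in weights)


def kl_regularized_optimum(ref_probs, rewards, beta):
    """Maximizer of E_p[r] - beta * KL(p || p_ref) over a finite set.

    The closed form is p*(x) proportional to p_ref(x) * exp(r(x) / beta).
    """
    if beta <= 0:
        raise ValueError("beta must be positive")
    max_r = max(rewards)  # subtract the max reward for numerical stability
    unnorm = [q * math.exp((r - max_r) / beta) for q, r in zip(ref_probs, rewards)]
    total = sum(unnorm)
    return [u / total for u in unnorm]


def objective(probs, ref_probs, rewards, beta):
    """Value of E_p[r] - beta * KL(p || p_ref) for a candidate distribution."""
    expected_reward = sum(p * r for p, r in zip(probs, rewards))
    kl = sum(p * math.log(p / q) for p, q in zip(probs, ref_probs) if p > 0)
    return expected_reward - beta * kl


# Toy data (assumed, not from the paper): three candidate videos.
ref_probs = [0.5, 0.3, 0.2]
scores = [
    {"motion": 0.2, "alignment": 0.9},
    {"motion": 0.8, "alignment": 0.4},
    {"motion": 0.5, "alignment": 0.5},
]
weights = {"motion": 1.0, "alignment": 0.5}  # user-chosen trade-off

rewards = [combine_rewards(s, weights) for s in scores]
aligned_probs = kl_regularized_optimum(ref_probs, rewards, beta=0.5)
print(aligned_probs)
```

Changing `weights` shifts which candidate gains probability mass, and a larger `beta` keeps the result closer to `ref_probs`. How the paper realizes these ideas for flow-based video models is described in the paper itself, not here.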

arXiv information

Authors	Jie Liu, Gongye Liu, Jiajun Liang, Ziyang Yuan, Xiaokun Liu, Mingwu Zheng, Xiele Wu, Qiulin Wang, Wenyu Qin, Menghan Xia, Xintao Wang, Xiaohong Liu, Fei Yang, Pengfei Wan, Di Zhang, Kun Gai, Yujiu Yang, Wanli Ouyang
Published	2025-01-23 18:55:41+00:00
arXiv page	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
