Self Forcing: Bridging the Train-Test Gap in Autoregressive Video Diffusion

Summary

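We introduce Self Forcing, a new training paradigm for autoregressive video diffusion models.
It addresses the long-standing problem of exposure bias: a model trained on ground-truth context must, at inference time, generate sequences conditioned on its own imperfect outputs.
Unlike earlier methods, which denoise future frames conditioned on ground-truth context frames, Self Forcing performs an autoregressive rollout with key-value (KV) caching during training, so that each frame is generated conditioned on previously self-generated outputs.
This strategy allows supervision through a holistic video-level loss that directly evaluates the quality of the entire generated sequence, rather than relying only on traditional frame-wise objectives.
To keep training efficient, we use a few-step diffusion model together with a stochastic gradient truncation strategy that balances computational cost and performance.
We also introduce a rolling KV cache mechanism that enables efficient autoregressive video extrapolation.
Extensive experiments show that our approach achieves real-time streaming video generation with sub-second latency on a single GPU, while matching or exceeding the generation quality of significantly slower, non-causal diffusion models.
Project website: http://self-forcing.github.io/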
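
As a rough illustration of the rolling KV cache idea mentioned in the summary, the sketch below keeps key/value entries for only the most recent frames and evicts the oldest entry once a fixed window is full. The names `RollingKVCache`, `extrapolate`, `max_frames`, and `generate_frame` are hypothetical and chosen for this example; the entries are opaque placeholders rather than attention tensors, and this is not the authors' implementation.

```python
from collections import deque


class RollingKVCache:
    """Hypothetical fixed-window cache: keeps entries for the newest frames only."""

    def __init__(self, max_frames):
        if max_frames < 1:
            raise ValueError("max_frames must be at least 1")
        # A deque with maxlen drops its oldest item when a new one is appended
        # to a full window, which gives the "rolling" behaviour.
        self._keys = deque(maxlen=max_frames)
        self._values = deque(maxlen=max_frames)

    def append(self, key, value):
        """Store the key/value entry of a newly generated frame."""
        self._keys.append(key)
        self._values.append(value)

    def context(self):
        """Return the cached keys and values, oldest first."""
        return list(self._keys), list(self._values)

    def __len__(self):
        return len(self._keys)


def extrapolate(num_frames, max_frames, generate_frame):
    """Generate frames one at a time, each conditioned on the cached window.

    generate_frame(t, keys, values) is an assumed callback that returns
    (frame, key, value) for frame index t.
    """
    cache = RollingKVCache(max_frames)
    frames = []
    for t in range(num_frames):
        keys, values = cache.context()
        frame, key, value = generate_frame(t, keys, values)
        frames.append(frame)
        cache.append(key, value)
    return frames, cache
```

Because the window size is fixed, memory use and per-frame context length stay bounded however long the video is extended, which is the property that makes long extrapolation practical in this simplified picture.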

Abstract (original)

We introduce Self Forcing, a novel training paradigm for autoregressive video diffusion models. It addresses the longstanding issue of exposure bias, where models trained on ground-truth context must generate sequences conditioned on their own imperfect outputs during inference. Unlike prior methods that denoise future frames based on ground-truth context frames, Self Forcing conditions each frame’s generation on previously self-generated outputs by performing autoregressive rollout with key-value (KV) caching during training. This strategy enables supervision through a holistic loss at the video level that directly evaluates the quality of the entire generated sequence, rather than relying solely on traditional frame-wise objectives. To ensure training efficiency, we employ a few-step diffusion model along with a stochastic gradient truncation strategy, effectively balancing computational cost and performance. We further introduce a rolling KV cache mechanism that enables efficient autoregressive video extrapolation. Extensive experiments demonstrate that our approach achieves real-time streaming video generation with sub-second latency on a single GPU, while matching or even surpassing the generation quality of significantly slower and non-causal diffusion models. Project website: http://self-forcing.github.io/
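
The exposure bias described in the abstract can be seen in a toy setting. The sketch below compares conditioning each prediction on the ground-truth previous frame with conditioning it on the model's own previous output. Everything here is an assumption made for illustration: frames are plain integers, `toy_model` is a deliberately biased one-step predictor, and the absolute error stands in for a real loss. It is not the paper's model or objective.

```python
def teacher_forced(model, ground_truth):
    """Predict each next frame from the TRUE previous frame."""
    return [model(prev) for prev in ground_truth[:-1]]


def self_rollout(model, first_frame, num_steps):
    """Predict each next frame from the model's OWN previous output."""
    outputs = []
    prev = first_frame
    for _ in range(num_steps):
        prev = model(prev)
        outputs.append(prev)
    return outputs


def abs_errors(predictions, targets):
    """Per-frame absolute error between predictions and targets."""
    return [abs(p - t) for p, t in zip(predictions, targets)]


# Toy ground truth: each frame is the previous one plus 10.
truth = [0, 10, 20, 30, 40]


def toy_model(prev):
    # Hypothetical imperfect predictor: overshoots the true step by 1.
    return prev + 11


tf_errors = abs_errors(teacher_forced(toy_model, truth), truth[1:])
sr_errors = abs_errors(self_rollout(toy_model, truth[0], len(truth) - 1), truth[1:])

print(tf_errors)  # [1, 1, 1, 1]
print(sr_errors)  # [1, 2, 3, 4]
```

With ground-truth context the per-frame error stays constant, while under self-rollout it compounds. A loss computed over the whole self-generated sequence exposes the model to that compounding during training, which a purely frame-wise objective on ground-truth context does not.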

arXiv information

Authors	Xun Huang, Zhengqi Li, Guande He, Mingyuan Zhou, Eli Shechtman
Published	2025-06-09 17:59:55+00:00
arXiv page	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
