Motion-Guided Latent Diffusion for Temporally Consistent Real-world Video Super-resolution

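Summary

Real-world low-resolution (LR) videos exhibit diverse and complex degradations, which makes it hard for video super-resolution (VSR) algorithms to reconstruct high-quality high-resolution (HR) videos.
Diffusion models have recently shown compelling performance in generating realistic details for image restoration tasks.
However, the diffusion process is stochastic, so the content of the restored images is difficult to control.
The problem becomes more serious when diffusion models are applied to VSR, because temporal consistency is crucial to the perceptual quality of video.
This paper proposes an effective real-world VSR algorithm that leverages the strengths of a pre-trained latent diffusion model.
To keep content consistent across adjacent frames, the temporal dynamics of the LR video are used to guide the diffusion process: the latent sampling path is optimized with a motion-guided loss so that the generated HR video maintains a coherent, continuous visual flow.
To further reduce discontinuities in the generated details, a temporal module is inserted into the decoder and fine-tuned with an innovative sequence-oriented loss.
The proposed motion-guided latent diffusion (MGLD) VSR algorithm achieves significantly better perceptual quality than state-of-the-art methods on real-world VSR benchmark datasets, validating the effectiveness of the proposed model design and training strategy.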
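
The abstract does not spell out the form of the motion-guided loss, so the following is only a minimal sketch of one plausible reading: a warping error that penalizes differences between each latent frame and its flow-aligned neighbor. It is not the authors' implementation. The function names (`warp_nearest`, `motion_guided_loss`), the L1 penalty, the nearest-neighbor sampling, and the assumption that optical flow has already been estimated from the LR video and resized to the latent resolution are all illustrative choices.

```python
import numpy as np


def warp_nearest(frame, flow):
    """Backward-warp `frame` (H, W, C) with `flow` (H, W, 2), nearest-neighbor.

    flow[y, x] = (dx, dy) is the offset from pixel (x, y) to the location
    in `frame` that should be sampled. Out-of-range samples are clamped.
    """
    h, w = frame.shape[:2]
    ys, xs = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")
    src_x = np.clip(np.rint(xs + flow[..., 0]).astype(int), 0, w - 1)
    src_y = np.clip(np.rint(ys + flow[..., 1]).astype(int), 0, h - 1)
    return frame[src_y, src_x]


def motion_guided_loss(latents, flows):
    """Mean L1 warping error between consecutive latent frames (assumed form).

    latents: (T, H, W, C) latent frames.
    flows:   (T-1, H, W, 2), where flows[t] maps frame t onto frame t+1.
    """
    errors = [
        np.abs(latents[t] - warp_nearest(latents[t + 1], flows[t])).mean()
        for t in range(len(latents) - 1)
    ]
    return float(np.mean(errors))
```

In a guided sampler, the gradient of such a loss with respect to the latents would be used to nudge each denoising step; that step needs an autodiff framework and is left out of this sketch.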

Abstract (original)

Real-world low-resolution (LR) videos have diverse and complex degradations, imposing great challenges on video super-resolution (VSR) algorithms to reproduce their high-resolution (HR) counterparts with high quality. Recently, the diffusion models have shown compelling performance in generating realistic details for image restoration tasks. However, the diffusion process has randomness, making it hard to control the contents of restored images. This issue becomes more serious when applying diffusion models to VSR tasks because temporal consistency is crucial to the perceptual quality of videos. In this paper, we propose an effective real-world VSR algorithm by leveraging the strength of pre-trained latent diffusion models. To ensure the content consistency among adjacent frames, we exploit the temporal dynamics in LR videos to guide the diffusion process by optimizing the latent sampling path with a motion-guided loss, ensuring that the generated HR video maintains a coherent and continuous visual flow. To further mitigate the discontinuity of generated details, we insert temporal module to the decoder and fine-tune it with an innovative sequence-oriented loss. The proposed motion-guided latent diffusion (MGLD) based VSR algorithm achieves significantly better perceptual quality than state-of-the-arts on real-world VSR benchmark datasets, validating the effectiveness of the proposed model design and training strategies.

arXiv information

Authors	Xi Yang,Chenhang He,Jianqi Ma,Lei Zhang
Published	2024-07-12 13:55:40+00:00
arXiv page	arxiv_id(pdf)

Source, services used

arxiv.jp, Google
