IP-FaceDiff: Identity-Preserving Facial Video Editing with Diffusion

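Summary

Facial video editing is increasingly important to content creators because it lets them manipulate facial expressions and attributes.
However, existing models suffer from poor editing quality, high computational cost, and difficulty preserving facial identity across diverse edits.
These models are also often restricted to editing predefined facial attributes, which limits their flexibility across varied editing prompts.
To address these challenges, the authors propose a new facial video editing framework that leverages the rich latent space of pre-trained text-to-image (T2I) diffusion models and fine-tunes them specifically for facial video editing tasks.
The approach introduces a targeted fine-tuning scheme that enables high-quality, localized, text-driven edits while preserving identity across video frames.
In addition, by using pre-trained T2I models during inference, the approach reduces editing time by 80% while maintaining temporal consistency throughout the video sequence.
The authors evaluate the approach through extensive testing across a wide range of challenging scenarios, including varying head poses, complex action sequences, and diverse facial expressions.
The method consistently outperforms existing techniques across a broad set of metrics and benchmarks.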

Summary (original)

Facial video editing has become increasingly important for content creators, enabling the manipulation of facial expressions and attributes. However, existing models encounter challenges such as poor editing quality, high computational costs and difficulties in preserving facial identity across diverse edits. Additionally, these models are often constrained to editing predefined facial attributes, limiting their flexibility to diverse editing prompts. To address these challenges, we propose a novel facial video editing framework that leverages the rich latent space of pre-trained text-to-image (T2I) diffusion models and fine-tune them specifically for facial video editing tasks. Our approach introduces a targeted fine-tuning scheme that enables high quality, localized, text-driven edits while ensuring identity preservation across video frames. Additionally, by using pre-trained T2I models during inference, our approach significantly reduces editing time by 80%, while maintaining temporal consistency throughout the video sequence. We evaluate the effectiveness of our approach through extensive testing across a wide range of challenging scenarios, including varying head poses, complex action sequences, and diverse facial expressions. Our method consistently outperforms existing techniques, demonstrating superior performance across a broad set of metrics and benchmarks.

arXiv information

Authors	Tharun Anand, Aryan Garg, Kaushik Mitra
Published	2025-01-13 18:08:27+00:00
arXiv page	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
