RASA: Replace Anyone, Say Anything — A Training-Free Framework for Audio-Driven and Universal Portrait Video Editing

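Abstract

Portrait video editing modifies specific attributes of a portrait video under the guidance of an audio or video stream.
Previous methods typically either concentrate on reenacting the lip region or require training specialized models that extract keypoints for motion transfer to a new identity.
This paper introduces a training-free, universal portrait video editing framework that provides a versatile and adaptable editing strategy.
The framework supports portrait appearance editing conditioned on a modified first reference frame, lip editing conditioned on varied speech, or a combination of both.
It is built on a Unified Animation Control (UAC) mechanism that uses source inversion latents to edit the entire portrait, and it comprises visual-driven shape control, audio-driven speaking control, and inter-frame temporal control.
Furthermore, the method can be adapted to different scenarios by adjusting the initial reference frame, which enables detailed editing of portrait videos with specific head rotations and facial expressions.
This comprehensive approach provides a holistic and flexible solution for portrait video editing.
Experimental results show that the model achieves more accurate and better synchronized lip movements on the lip editing task, and more flexible motion transfer on the appearance editing task.
A demo is available at https://alice01010101.github.io/RASA/.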

Abstract (original)

Portrait video editing focuses on modifying specific attributes of portrait videos, guided by audio or video streams. Previous methods typically either concentrate on lip-region reenactment or require training specialized models to extract keypoints for motion transfer to a new identity. In this paper, we introduce a training-free universal portrait video editing framework that provides a versatile and adaptable editing strategy. This framework supports portrait appearance editing conditioned on the changed first reference frame, as well as lip editing conditioned on varied speech, or a combination of both. It is based on a Unified Animation Control (UAC) mechanism with source inversion latents to edit the entire portrait, including visual-driven shape control, audio-driven speaking control, and inter-frame temporal control. Furthermore, our method can be adapted to different scenarios by adjusting the initial reference frame, enabling detailed editing of portrait videos with specific head rotations and facial expressions. This comprehensive approach ensures a holistic and flexible solution for portrait video editing. The experimental results show that our model can achieve more accurate and synchronized lip movements for the lip editing task, as well as more flexible motion transfer for the appearance editing task. Demo is available at https://alice01010101.github.io/RASA/.

arXiv information

Authors	Tianrui Pan, Lin Liu, Jie Liu, Xiaopeng Zhang, Jie Tang, Gangshan Wu, Qi Tian
Published	2025-03-14 16:39:15+00:00
arXiv page	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
