Dynamic Concepts Personalization from Single Videos

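Summary

Personalization of generative text-to-image models has made remarkable progress, but extending it to text-to-video models poses unique challenges.
Unlike static concepts, personalizing a text-to-video model can capture dynamic concepts: entities defined not only by their appearance but also by their motion.
This paper introduces Set-and-Sequence, a new framework for personalizing Diffusion Transformer (DiT)-based generative video models with dynamic concepts.
The approach imposes a spatio-temporal weight space within an architecture that does not explicitly separate spatial and temporal features.
It does so in two key stages.
First, Low-Rank Adaptation (LoRA) layers are fine-tuned on an unordered set of frames from the video to learn an identity LoRA basis that represents appearance, free from temporal interference.
In the second stage, with the identity LoRAs frozen, their coefficients are augmented with Motion Residuals and fine-tuned on the full video sequence to capture motion dynamics.
The resulting spatio-temporal weight space effectively embeds dynamic concepts into the video model's output domain, enabling unprecedented editability and compositionality while setting a new benchmark for personalizing dynamic concepts.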
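
The two stages can be illustrated with a minimal sketch for a single weight matrix. Everything below is an assumption made for illustration and is not taken from the paper: the class name `SetAndSequenceLoRA`, the helper `training_samples`, and in particular the parameterization W = W0 + A(B + dB), where A is the identity basis, B its coefficients, and dB a full-size motion residual. The abstract only states that the identity LoRAs are frozen and their coefficients are augmented with Motion Residuals, so the authors' actual formulation may differ.

```python
import random
from dataclasses import dataclass
from typing import List, Optional

Matrix = List[List[float]]


def matmul(x: Matrix, y: Matrix) -> Matrix:
    """Plain-Python matrix product, enough for a toy example."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*y)] for row in x]


def add(x: Matrix, y: Matrix) -> Matrix:
    """Element-wise sum of two matrices of equal shape."""
    return [[a + b for a, b in zip(rx, ry)] for rx, ry in zip(x, y)]


@dataclass
class SetAndSequenceLoRA:
    """Hypothetical two-stage LoRA parameterization (illustrative only)."""
    base: Matrix                       # frozen pretrained weight W0, (d_out, d_in)
    basis: Matrix                      # identity LoRA basis A, (d_out, r)
    coeffs: Matrix                     # identity LoRA coefficients B, (r, d_in)
    residual: Optional[Matrix] = None  # assumed motion residual dB, (r, d_in)

    def __post_init__(self) -> None:
        if self.residual is None:
            # Start from a zero residual so stage 2 begins at the stage-1 solution.
            self.residual = [[0.0] * len(row) for row in self.coeffs]

    def effective_weight(self) -> Matrix:
        # Assumed form: W = W0 + A (B + dB)
        return add(self.base, matmul(self.basis, add(self.coeffs, self.residual)))

    def trainable(self, stage: int) -> List[str]:
        """Names of the parameters that would receive gradients in each stage."""
        if stage == 1:
            return ["basis", "coeffs"]  # learn appearance
        if stage == 2:
            return ["residual"]         # identity LoRA frozen, learn motion only
        raise ValueError("stage must be 1 or 2")


def training_samples(frames: list, stage: int, seed: int = 0) -> List[list]:
    """Stage 1: an unordered set of single frames. Stage 2: the ordered sequence."""
    if stage == 1:
        shuffled = list(frames)
        random.Random(seed).shuffle(shuffled)
        return [[frame] for frame in shuffled]  # no temporal order is exposed
    return [list(frames)]                       # one clip, original frame order


layer = SetAndSequenceLoRA(
    base=[[1.0, 0.0], [0.0, 1.0]],
    basis=[[1.0], [2.0]],
    coeffs=[[0.5, -1.0]],
)
stage1_weight = layer.effective_weight()  # zero residual: pure identity LoRA
layer.residual = [[0.5, 1.0]]             # pretend stage 2 learned this
stage2_weight = layer.effective_weight()
```

Initializing the residual to zero is a design choice of this sketch: it makes the second stage start exactly from the appearance-only weights, so any change to the effective weight during sequence fine-tuning is attributable to the residual alone.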

Abstract (original)

Personalizing generative text-to-image models has seen remarkable progress, but extending this personalization to text-to-video models presents unique challenges. Unlike static concepts, personalizing text-to-video models has the potential to capture dynamic concepts, i.e., entities defined not only by their appearance but also by their motion. In this paper, we introduce Set-and-Sequence, a novel framework for personalizing Diffusion Transformers (DiTs)-based generative video models with dynamic concepts. Our approach imposes a spatio-temporal weight space within an architecture that does not explicitly separate spatial and temporal features. This is achieved in two key stages. First, we fine-tune Low-Rank Adaptation (LoRA) layers using an unordered set of frames from the video to learn an identity LoRA basis that represents the appearance, free from temporal interference. In the second stage, with the identity LoRAs frozen, we augment their coefficients with Motion Residuals and fine-tune them on the full video sequence, capturing motion dynamics. Our Set-and-Sequence framework results in a spatio-temporal weight space that effectively embeds dynamic concepts into the video model’s output domain, enabling unprecedented editability and compositionality while setting a new benchmark for personalizing dynamic concepts.

arXiv information

Authors	Rameen Abdal, Or Patashnik, Ivan Skorokhodov, Willi Menapace, Aliaksandr Siarohin, Sergey Tulyakov, Daniel Cohen-Or, Kfir Aberman
Published	2025-02-20 18:53:39+00:00
arXiv page	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
