AnimateDiff: Animate Your Personalized Text-to-Image Diffusion Models without Specific Tuning

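Summary

Advances in text-to-image models (e.g., Stable Diffusion) and corresponding personalization techniques such as DreamBooth and LoRA have made it possible for anyone to turn their imagination into high-quality images at an affordable cost.
As a result, there is strong demand for image animation techniques that combine generated static images with motion dynamics.
This report proposes a practical framework that animates most existing personalized text-to-image models once and for all, saving the effort of model-specific tuning.
The core of the framework is to insert a newly initialized motion modeling module into a frozen text-to-image model and train it on video clips to distill reasonable motion priors.
Once trained, simply injecting this motion modeling module turns every personalized version derived from the same base T2I model into a text-driven model that produces diverse, personalized animated images.
The evaluation covers several representative public personalized text-to-image models spanning anime pictures and realistic photographs, and shows that the framework helps these models generate temporally smooth animation clips while preserving the domain and diversity of their outputs.
Code and pre-trained weights will be made publicly available at https://animatediff.github.io/.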
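
The plug-in structure described above can be illustrated with a toy sketch. It is not the AnimateDiff architecture: the names `MotionModule`, `animate`, `anime_layer`, and `photo_layer` are hypothetical, the neighbor averaging merely stands in for whatever temporal layer the actual module uses, and the value of `alpha` is assumed to have been learned from video clips beforehand. The only point shown is that one shared motion component can be attached to different frozen per-frame models without touching them.

```python
from dataclasses import dataclass
from typing import Callable, List

Frame = List[float]  # toy stand-in for one frame's latent features
Clip = List[Frame]   # a clip is an ordered list of frames


@dataclass
class MotionModule:
    """Hypothetical temporal layer: blends each frame with its neighbors."""
    alpha: float = 0.0  # the only trainable value here; 0.0 leaves frames unchanged

    def __call__(self, clip: Clip) -> Clip:
        n = len(clip)
        out: Clip = []
        for t in range(n):
            prev_f = clip[max(t - 1, 0)]      # clamp at the clip boundaries
            next_f = clip[min(t + 1, n - 1)]
            out.append([
                (1 - self.alpha) * c + self.alpha * 0.5 * (p + q)
                for c, p, q in zip(clip[t], prev_f, next_f)
            ])
        return out


def animate(image_layer: Callable[[Frame], Frame],
            motion: MotionModule, clip: Clip) -> Clip:
    """Run the frozen per-frame layer, then the shared motion module across frames."""
    per_frame = [image_layer(f) for f in clip]
    return motion(per_frame)


# Two toy "personalized" layers, assumed to derive from the same base model.
def anime_layer(frame: Frame) -> Frame:
    return [2.0 * x for x in frame]


def photo_layer(frame: Frame) -> Frame:
    return [x + 1.0 for x in frame]


motion = MotionModule(alpha=0.5)   # assumed to be already trained
noise: Clip = [[0.0], [4.0], [0.0]]

anime_clip = animate(anime_layer, motion, noise)   # [[2.0], [4.0], [2.0]]
photo_clip = animate(photo_layer, motion, noise)   # [[2.0], [3.0], [2.0]]
```

In this sketch the same `motion` object is reused for both layers, and each output keeps the character of its own per-frame layer while the frame-to-frame jump is reduced. With `alpha=0.0` the module is an identity, so attaching it leaves the per-frame model's output unchanged.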

Abstract (original)

With the advance of text-to-image models (e.g., Stable Diffusion) and corresponding personalization techniques such as DreamBooth and LoRA, everyone can manifest their imagination into high-quality images at an affordable cost. Subsequently, there is a great demand for image animation techniques to further combine generated static images with motion dynamics. In this report, we propose a practical framework to animate most of the existing personalized text-to-image models once and for all, saving efforts in model-specific tuning. At the core of the proposed framework is to insert a newly initialized motion modeling module into the frozen text-to-image model and train it on video clips to distill reasonable motion priors. Once trained, by simply injecting this motion modeling module, all personalized versions derived from the same base T2I readily become text-driven models that produce diverse and personalized animated images. We conduct our evaluation on several public representative personalized text-to-image models across anime pictures and realistic photographs, and demonstrate that our proposed framework helps these models generate temporally smooth animation clips while preserving the domain and diversity of their outputs. Code and pre-trained weights will be publicly available at https://animatediff.github.io/ .

arXiv information

Authors	Yuwei Guo, Ceyuan Yang, Anyi Rao, Yaohui Wang, Yu Qiao, Dahua Lin, Bo Dai
Published	2023-07-10 17:34:16+00:00
arXiv page	arxiv_id(pdf)

Source, services used

arxiv.jp, Google
