Motion Guidance: Diffusion-Based Image Editing with Differentiable Motion Estimators

Abstract

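Diffusion models can generate impressive images from text descriptions, and extensions of these models let users edit images at a relatively coarse scale.
However, precisely editing the layout, position, pose, and shape of objects in an image with diffusion models remains difficult.
To this end, we propose motion guidance, a zero-shot technique that lets a user specify dense, complex motion fields indicating where each pixel in the image should move.
Motion guidance works by steering the diffusion sampling process with gradients taken through an off-the-shelf optical flow network.
Specifically, we design a guidance loss that encourages the sample to exhibit the desired motion, as estimated by the flow network, while remaining visually similar to the source image.
By sampling from the diffusion model while simultaneously guiding the sample toward low guidance loss, we obtain a motion-edited image.
We demonstrate that the technique handles complex motions and produces high-quality edits of both real and generated images.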

Abstract (original)

Diffusion models are capable of generating impressive images conditioned on text descriptions, and extensions of these models allow users to edit images at a relatively coarse scale. However, the ability to precisely edit the layout, position, pose, and shape of objects in images with diffusion models is still difficult. To this end, we propose motion guidance, a zero-shot technique that allows a user to specify dense, complex motion fields that indicate where each pixel in an image should move. Motion guidance works by steering the diffusion sampling process with the gradients through an off-the-shelf optical flow network. Specifically, we design a guidance loss that encourages the sample to have the desired motion, as estimated by a flow network, while also being visually similar to the source image. By simultaneously sampling from a diffusion model and guiding the sample to have low guidance loss, we can obtain a motion-edited image. We demonstrate that our technique works on complex motions and produces high quality edits of real and generated images.
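
The guidance loss described in the abstract can be illustrated with a small NumPy sketch. It is not the authors' implementation: `estimate_flow` is a hypothetical stand-in for the off-the-shelf optical flow network, and the L1 distances, the nearest-neighbour warp, and the two weights are assumptions chosen for readability. In an actual guided sampler the gradient of such a loss with respect to the noisy sample would be computed with an autodiff framework and used to steer each denoising step; that part is not shown here.

```python
import numpy as np

def warp_backward(image, flow):
    """Toy warp: look up image at p + flow(p) using nearest-neighbour sampling.

    flow has shape (H, W, 2) holding (dx, dy) per pixel; lookups are clipped
    to the image bounds.
    """
    h, w = image.shape[:2]
    ys, xs = np.mgrid[0:h, 0:w]
    src_x = np.clip(np.rint(xs + flow[..., 0]).astype(int), 0, w - 1)
    src_y = np.clip(np.rint(ys + flow[..., 1]).astype(int), 0, h - 1)
    return image[src_y, src_x]

def guidance_loss(source, sample, target_flow, estimate_flow,
                  flow_weight=1.0, color_weight=1.0):
    """Sketch of a motion guidance loss (weights and norms are assumptions).

    estimate_flow(source, sample) is a hypothetical callable that returns the
    flow from source to sample, i.e. pixel p in source moves to p + flow(p).
    """
    estimated = estimate_flow(source, sample)
    # Flow term: the estimated motion should match the user-specified motion.
    flow_term = np.abs(estimated - target_flow).mean()
    # Color term: after undoing the motion, the sample should look like the source.
    color_term = np.abs(source - warp_backward(sample, estimated)).mean()
    return flow_weight * flow_term + color_weight * color_term
```

With a stub estimator that simply returns the target flow, a sample that is the source shifted by that flow gives zero loss, while the unedited source gives a positive color term, which matches the two goals the abstract names (desired motion, visual similarity to the source).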

arXiv information

Authors	Daniel Geng, Andrew Owens
Published	2024-01-31 18:59:59+00:00
arXiv page	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
