Lazy Diffusion Transformer for Interactive Image Editing

Summary

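We present LazyDiffusion, a new diffusion transformer that efficiently generates partial image updates.
The approach targets interactive image editing applications in which a user, starting from a blank canvas or an existing image, specifies a sequence of localized modifications using binary masks and text prompts.
The generator operates in two phases.
First, a context encoder processes the current canvas and the user mask to produce a compact global context tailored to the region to be generated.
Second, conditioned on this context, a diffusion-based transformer decoder synthesizes the masked pixels in a "lazy" fashion, that is, it generates only the masked region.
This contrasts with prior work, which either regenerates the entire canvas, wasting time and computation, or restricts processing to a tight rectangular crop around the mask, ignoring the global image context altogether.
The decoder's runtime scales with the mask size, which is typically small, while the encoder adds negligible overhead.
We demonstrate that the approach is competitive with state-of-the-art inpainting methods in quality and fidelity while providing a 10x speedup for typical user interactions, in which the editing mask covers 10% of the image.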

Abstract (original)

We introduce a novel diffusion transformer, LazyDiffusion, that generates partial image updates efficiently. Our approach targets interactive image editing applications in which, starting from a blank canvas or an image, a user specifies a sequence of localized image modifications using binary masks and text prompts. Our generator operates in two phases. First, a context encoder processes the current canvas and user mask to produce a compact global context tailored to the region to generate. Second, conditioned on this context, a diffusion-based transformer decoder synthesizes the masked pixels in a ‘lazy’ fashion, i.e., it only generates the masked region. This contrasts with previous works that either regenerate the full canvas, wasting time and computation, or confine processing to a tight rectangular crop around the mask, ignoring the global image context altogether. Our decoder’s runtime scales with the mask size, which is typically small, while our encoder introduces negligible overhead. We demonstrate that our approach is competitive with state-of-the-art inpainting methods in terms of quality and fidelity while providing a 10x speedup for typical user interactions, where the editing mask represents 10% of the image.
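The claim that decoder runtime scales with mask size can be illustrated with a small sketch. It assumes the decoder works on fixed-size patch tokens, which is common for diffusion transformers but is not stated in the abstract. The function names, the patch size, and the rule "a patch is generated if it contains any masked pixel" are hypothetical choices made for illustration, not the authors' implementation.

```python
from typing import List, Tuple


def masked_patch_indices(mask: List[List[int]], patch: int) -> List[Tuple[int, int]]:
    """Return (row, col) grid coordinates of patches containing at least one masked pixel.

    Assumption: the canvas is split into non-overlapping patch x patch tiles,
    and only tiles touched by the binary mask are handed to the decoder.
    """
    height, width = len(mask), len(mask[0])
    selected = []
    for top in range(0, height, patch):
        for left in range(0, width, patch):
            block = (
                mask[y][x]
                for y in range(top, min(top + patch, height))
                for x in range(left, min(left + patch, width))
            )
            if any(block):
                selected.append((top // patch, left // patch))
    return selected


def decoder_token_fraction(mask: List[List[int]], patch: int) -> float:
    """Fraction of all patch tokens that a mask-only ("lazy") decoder would process."""
    height, width = len(mask), len(mask[0])
    rows = (height + patch - 1) // patch  # ceiling division
    cols = (width + patch - 1) // patch
    return len(masked_patch_indices(mask, patch)) / (rows * cols)


# Hypothetical example: a 16x16 canvas with a 4x4 edit region in the top-left corner.
canvas_mask = [[1 if y < 4 and x < 4 else 0 for x in range(16)] for y in range(16)]
tokens = masked_patch_indices(canvas_mask, patch=4)
fraction = decoder_token_fraction(canvas_mask, patch=4)
print(tokens, fraction)
```

In this toy setup the mask touches one of the sixteen patches, so a decoder restricted to masked patches would process a sixteenth of the tokens that a full-canvas generator would. How the token count translates into wall-clock speedup depends on the model, and the sketch does not attempt to reproduce the reported 10x figure.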

arXiv information

Authors	Yotam Nitzan,Zongze Wu,Richard Zhang,Eli Shechtman,Daniel Cohen-Or,Taesung Park,Michaël Gharbi
Published	2024-04-18 17:59:27+00:00
arXiv page	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
