Taming Rectified Flow for Inversion and Editing

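Summary

Rectified-flow-based diffusion transformers such as FLUX and OpenSora have demonstrated excellent performance in image and video generation.
Despite their strong generative capabilities, these models often suffer from inaccurate inversion, which can limit their effectiveness in downstream tasks such as image and video editing.
To address this issue, the authors propose RF-Solver, a new training-free sampler that improves inversion accuracy by reducing the error incurred when solving the rectified flow ODE.
Specifically, they derive the exact formulation of the rectified flow ODE and apply a high-order Taylor expansion to estimate its nonlinear component, significantly reducing the approximation error at each timestep.
Building on RF-Solver, they further design RF-Edit, which comprises specialized sub-modules for image and video editing.
By sharing self-attention layer features during the editing process, RF-Edit preserves the structural information of the source image or video while achieving high-quality editing results.
The approach is compatible with any pre-trained rectified-flow-based model for image and video tasks and requires no additional training or optimization.
Extensive experiments on text-to-image generation, image and video inversion, and image and video editing demonstrate the robust performance and adaptability of the method.
Code is available at https://github.com/wangjiangshan0725/RF-Solver-Edit.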
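
To make the Taylor-expansion idea in the summary concrete, here is a minimal sketch that compares a first-order (Euler) step with a second-order Taylor step on a toy ODE. Everything in it is an assumption made for illustration: `toy_velocity` is a hypothetical stand-in for a learned velocity network, the derivative of the velocity is approximated with a finite difference at the half step, and none of the names come from the linked repository. It is not the authors' implementation.

```python
import math
from typing import Callable

# Signature of a velocity field v(z, t); a real model would be a neural network.
Velocity = Callable[[float, float], float]


def toy_velocity(z: float, t: float) -> float:
    """Hypothetical velocity field dz/dt = -z (exact solution z0 * exp(-t))."""
    return -z


def exact_solution(z0: float, t: float) -> float:
    """Closed-form solution of the toy ODE, used only to measure error."""
    return z0 * math.exp(-t)


def euler_step(v: Velocity, z: float, t: float, h: float) -> float:
    """First-order step: z + h * v(z, t)."""
    return z + h * v(z, t)


def taylor2_step(v: Velocity, z: float, t: float, h: float) -> float:
    """Second-order Taylor step: z + h * v + (h^2 / 2) * dv/dt.

    dv/dt is estimated by a finite difference between the current point
    and a half step taken along the current velocity (an assumption of
    this sketch).
    """
    v0 = v(z, t)
    half = 0.5 * h
    z_mid = z + half * v0
    dv = (v(z_mid, t + half) - v0) / half
    return z + h * v0 + 0.5 * h * h * dv


def integrate(step, v: Velocity, z: float, t_start: float, t_end: float,
              n_steps: int) -> float:
    """Apply `step` n_steps times from t_start to t_end (h may be negative)."""
    h = (t_end - t_start) / n_steps
    t = t_start
    for _ in range(n_steps):
        z = step(v, z, t, h)
        t += h
    return z


def round_trip_error(step, v: Velocity, z0: float, n_steps: int = 10) -> float:
    """Integrate 0 -> 1, then 1 -> 0, and report how far we land from z0."""
    z1 = integrate(step, v, z0, 0.0, 1.0, n_steps)
    z0_rec = integrate(step, v, z1, 1.0, 0.0, n_steps)
    return abs(z0_rec - z0)


if __name__ == "__main__":
    for name, step in [("euler", euler_step), ("taylor2", taylor2_step)]:
        end = integrate(step, toy_velocity, 1.0, 0.0, 1.0, 10)
        print(name,
              "solve error:", abs(end - exact_solution(1.0, 1.0)),
              "round-trip error:", round_trip_error(step, toy_velocity, 1.0))
```

On this toy problem the second-order step lands closer to the exact solution and, more relevant to inversion, reconstructs the starting point more accurately after a forward and backward pass with the same number of steps. The cost is one extra velocity evaluation per step.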

Summary (original)

Rectified-flow-based diffusion transformers, such as FLUX and OpenSora, have demonstrated exceptional performance in the field of image and video generation. Despite their robust generative capabilities, these models often suffer from inaccurate inversion, which could further limit their effectiveness in downstream tasks such as image and video editing. To address this issue, we propose RF-Solver, a novel training-free sampler that enhances inversion precision by reducing errors in the process of solving rectified flow ODEs. Specifically, we derive the exact formulation of the rectified flow ODE and perform a high-order Taylor expansion to estimate its nonlinear components, significantly decreasing the approximation error at each timestep. Building upon RF-Solver, we further design RF-Edit, which comprises specialized sub-modules for image and video editing. By sharing self-attention layer features during the editing process, RF-Edit effectively preserves the structural information of the source image or video while achieving high-quality editing results. Our approach is compatible with any pre-trained rectified-flow-based models for image and video tasks, requiring no additional training or optimization. Extensive experiments on text-to-image generation, image & video inversion, and image & video editing demonstrate the robust performance and adaptability of our methods. Code is available at https://github.com/wangjiangshan0725/RF-Solver-Edit.

arXiv information

Authors	Jiangshan Wang, Junfu Pu, Zhongang Qi, Jiayi Guo, Yue Ma, Nisha Huang, Yuxin Chen, Xiu Li, Ying Shan
Published	2024-11-07 14:29:02+00:00
arXiv page	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
