MoEController: Instruction-based Arbitrary Image Manipulation with Mixture-of-Expert Controllers

Abstract

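Text-guided image generation based on diffusion models has recently made remarkable progress, producing compelling results on open-domain image manipulation tasks.
However, because image manipulation tasks are complex and diverse, few models currently offer complete zero-shot capability for both global and local image editing.
In this work, we propose a method that uses mixture-of-expert (MOE) controllers to align the text-guided capacity of diffusion models with different kinds of human instructions, enabling the model to handle a variety of open-domain image manipulation tasks from natural language instructions.
First, we use a large language model (ChatGPT) and a conditional image synthesis model (ControlNet) to generate a large global image transfer dataset in addition to the instruction-based local image editing dataset.
Then, using an MOE technique and task-specific adaptation training on the large-scale dataset, our conditional diffusion model can edit images both globally and locally.
Extensive experiments demonstrate that our approach performs surprisingly well on a range of image manipulation tasks when handling open-domain images and arbitrary human instructions.
Project page: [https://oppo-mente-lab.github.io/moe_controller/]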

Abstract (original)

Diffusion-model-based text-guided image generation has recently made astounding progress, producing fascinating results in open-domain image manipulation tasks. Few models, however, currently have complete zero-shot capabilities for both global and local image editing due to the complexity and diversity of image manipulation tasks. In this work, we propose a method with a mixture-of-expert (MOE) controllers to align the text-guided capacity of diffusion models with different kinds of human instructions, enabling our model to handle various open-domain image manipulation tasks with natural language instructions. First, we use large language models (ChatGPT) and conditional image synthesis models (ControlNet) to generate a large number of global image transfer dataset in addition to the instruction-based local image editing dataset. Then, using an MOE technique and task-specific adaptation training on a large-scale dataset, our conditional diffusion model can edit images globally and locally. Extensive experiments demonstrate that our approach performs surprisingly well on various image manipulation tasks when dealing with open-domain images and arbitrary human instructions. Please refer to our project page: [https://oppo-mente-lab.github.io/moe_controller/]
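
The abstract names mixture-of-expert controllers but does not describe their internals. As a rough illustration of the general technique only, the sketch below blends the outputs of several experts using softmax gate weights. It is not the paper's architecture: `softmax`, `mix_experts`, `global_expert`, and `local_expert` are hypothetical names, and the experts are toy functions standing in for learned networks.

```python
import math
from typing import Callable, List, Sequence

Vector = List[float]
Expert = Callable[[Sequence[float]], Vector]


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a list of gate scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def mix_experts(
    features: Sequence[float],
    experts: Sequence[Expert],
    gate_scores: Sequence[float],
) -> Vector:
    """Return the gate-weighted sum of every expert's output."""
    if len(experts) != len(gate_scores):
        raise ValueError("need exactly one gate score per expert")
    weights = softmax(gate_scores)
    outputs = [expert(features) for expert in experts]
    mixed = [0.0] * len(outputs[0])
    for weight, output in zip(weights, outputs):
        for i, value in enumerate(output):
            mixed[i] += weight * value
    return mixed


# Hypothetical stand-ins for learned expert networks (assumption, not from the paper).
def global_expert(x: Sequence[float]) -> Vector:
    return [2.0 * v for v in x]


def local_expert(x: Sequence[float]) -> Vector:
    return [v + 1.0 for v in x]


if __name__ == "__main__":
    # Equal gate scores give each expert a weight of 0.5.
    print(mix_experts([1.0, 2.0], [global_expert, local_expert], [0.0, 0.0]))
```

In a real system the gate scores would typically be produced by a small network conditioned on the instruction, so that different kinds of edits lean on different experts; here they are passed in by hand to keep the example self-contained.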

arXiv information

Authors	Sijia Li, Chen Chen, Haonan Lu
Published	2023-09-08 15:06:05+00:00
arXiv site	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
