Multimodal Image Synthesis and Editing: The Generative AI Era

Summary

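Because information in the real world exists in various modalities, effective interaction and fusion among multimodal information play a key role in the creation and perception of multimodal data in computer vision and deep learning research.
Owing to its strong ability to model the interaction among multimodal information, multimodal image synthesis and editing has become a popular research topic in recent years.
Rather than providing explicit guidance for network training, multimodal guidance offers an intuitive and flexible means of image synthesis and editing.
On the other hand, the field also faces several challenges, including the alignment of multimodal features, the synthesis of high-resolution images, and faithful evaluation metrics. This survey comprehensively contextualizes recent advances in multimodal image synthesis and editing and formulates taxonomies according to data modalities and model types.
We begin by introducing the different guidance modalities used in image synthesis and editing, and then describe multimodal image synthesis and editing approaches extensively according to their model types.
We then describe benchmark datasets and evaluation metrics, along with the corresponding experimental results.
Finally, we offer insights into current research challenges and possible directions for future research.
A project associated with this survey is available at https://github.com/fnzhan/Generative-AI.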

Summary (original)

As information exists in various modalities in real world, effective interaction and fusion among multimodal information plays a key role for the creation and perception of multimodal data in computer vision and deep learning research. With superb power in modeling the interaction among multimodal information, multimodal image synthesis and editing has become a hot research topic in recent years. Instead of providing explicit guidance for network training, multimodal guidance offers intuitive and flexible means for image synthesis and editing. On the other hand, this field is also facing several challenges in alignment of multimodal features, synthesis of high-resolution images, faithful evaluation metrics, etc. In this survey, we comprehensively contextualize the advance of the recent multimodal image synthesis and editing and formulate taxonomies according to data modalities and model types. We start with an introduction to different guidance modalities in image synthesis and editing, and then describe multimodal image synthesis and editing approaches extensively according to their model types. After that, we describe benchmark datasets and evaluation metrics as well as corresponding experimental results. Finally, we provide insights about the current research challenges and possible directions for future research. A project associated with this survey is available at https://github.com/fnzhan/Generative-AI.

arXiv information

Authors	Fangneng Zhan, Yingchen Yu, Rongliang Wu, Jiahui Zhang, Shijian Lu, Lingjie Liu, Adam Kortylewski, Christian Theobalt, Eric Xing
Published	2023-08-24 16:17:21+00:00
arXiv site	arxiv_id(pdf)

Source, services used

arxiv.jp, Google
