SemFlow: Binding Semantic Segmentation and Image Synthesis via Rectified Flow

Summary

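Semantic segmentation and semantic image synthesis are two representative tasks in visual perception and generation.
Existing methods treat them as two separate tasks, whereas we propose a unified diffusion-based framework (SemFlow) that models them as a pair of reverse problems.
Specifically, motivated by rectified flow theory, we train an ordinary differential equation (ODE) model to transport between the distribution of real images and that of semantic masks.
Because the training objective is symmetric, samples from the two distributions, images and semantic masks, can be transferred in either direction with ease.
For semantic segmentation, our approach resolves the contradiction between the randomness of diffusion outputs and the uniqueness of segmentation results.
For image synthesis, we propose a finite perturbation approach that increases the diversity of generated results without changing their semantic categories.
Experiments show that SemFlow achieves competitive results on both semantic segmentation and semantic image synthesis.
We hope this simple framework will prompt people to rethink the unification of low-level and high-level vision.
Project page: https://github.com/wang-chaoyang/SemFlow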

Summary (original)

Semantic segmentation and semantic image synthesis are two representative tasks in visual perception and generation. While existing methods consider them as two distinct tasks, we propose a unified diffusion-based framework (SemFlow) and model them as a pair of reverse problems. Specifically, motivated by rectified flow theory, we train an ordinary differential equation (ODE) model to transport between the distributions of real images and semantic masks. As the training object is symmetric, samples belonging to the two distributions, images and semantic masks, can be effortlessly transferred reversibly. For semantic segmentation, our approach solves the contradiction between the randomness of diffusion outputs and the uniqueness of segmentation results. For image synthesis, we propose a finite perturbation approach to enhance the diversity of generated results without changing the semantic categories. Experiments show that our SemFlow achieves competitive results on semantic segmentation and semantic image synthesis tasks. We hope this simple framework will motivate people to rethink the unification of low-level and high-level vision. Project page: https://github.com/wang-chaoyang/SemFlow.
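The transport idea in the abstract can be illustrated with a minimal, hedged sketch. It is not the authors' implementation: it assumes the standard rectified flow recipe (a straight-line interpolation between paired samples, with a velocity model regressed onto the line direction) and uses plain Python lists instead of images. The names `interpolate`, `rectified_flow_loss`, `euler_transport`, `perturb_mask`, and the toy `SHIFT` pairing are all hypothetical. The bounded-noise reading of "finite perturbation" is likewise an assumption made for illustration.

```python
import random


def interpolate(x0, x1, t):
    """Point on the straight line from x0 (t=0) to x1 (t=1)."""
    return [(1.0 - t) * a + t * b for a, b in zip(x0, x1)]


def rectified_flow_loss(velocity_fn, x0, x1, t):
    """Mean squared error between the predicted velocity at x_t and x1 - x0."""
    xt = interpolate(x0, x1, t)
    pred = velocity_fn(xt, t)
    target = [b - a for a, b in zip(x0, x1)]
    return sum((p - q) ** 2 for p, q in zip(pred, target)) / len(target)


def euler_transport(velocity_fn, x, t_start, t_end, steps=10):
    """Integrate dx/dt = v(x, t) with Euler steps; t_end < t_start runs it in reverse."""
    dt = (t_end - t_start) / steps
    t = t_start
    for _ in range(steps):
        v = velocity_fn(x, t)
        x = [xi + dt * vi for xi, vi in zip(x, v)]
        t += dt
    return x


def perturb_mask(mask, amplitude, rng):
    """Assumed form of a bounded perturbation: uniform noise in [-amplitude, amplitude]."""
    return [m + rng.uniform(-amplitude, amplitude) for m in mask]


# Toy pairing (hypothetical): each "mask" is its "image" plus a fixed shift,
# so the ideal velocity field is the constant SHIFT.
SHIFT = [2.0, -1.0, 0.5]


def toy_velocity(x, t):
    return list(SHIFT)


image = [0.1, 0.4, 0.9]
mask = [a + s for a, s in zip(image, SHIFT)]

# The ideal velocity gives (numerically) zero training loss.
loss = rectified_flow_loss(toy_velocity, image, mask, t=0.3)

# Image -> mask (t: 0 -> 1), then mask -> image (t: 1 -> 0) with the same model.
pred_mask = euler_transport(toy_velocity, image, 0.0, 1.0)
recovered = euler_transport(toy_velocity, pred_mask, 1.0, 0.0)

# Noise with amplitude below 0.5 leaves integer class IDs recoverable by rounding.
class_ids = [0, 1, 2, 1]
noisy_ids = perturb_mask(class_ids, 0.4, random.Random(0))
```

In this toy setting the same velocity function serves both directions. Only the integration limits change, which is the sense in which a symmetric objective makes the transfer reversible. A real model would replace `toy_velocity` with a learned network and the lists with image and mask tensors.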

arXiv information

Authors	Chaoyang Wang, Xiangtai Li, Lu Qi, Henghui Ding, Yunhai Tong, Ming-Hsuan Yang
Published	2024-05-30 17:34:40+00:00
arXiv site	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
