Zero-shot-Learning Cross-Modality Data Translation Through Mutual Information Guided Stochastic Diffusion

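Abstract

Cross-modality data translation has attracted considerable interest in image computing.
Deep generative models (e.g., GANs) have improved performance on these problems.
Nevertheless, zero-shot-learning cross-modality data translation with fidelity, a fundamental challenge in image translation, remains unsolved.
This paper proposes a new unsupervised zero-shot-learning method, the Mutual Information guided Diffusion cross-modality data translation Model (MIDiffusion), which learns to translate unseen source data to the target domain.
MIDiffusion leverages a score-matching-based generative model that learns prior knowledge of the target domain.
To condition the iterative denoising sampling, we propose a differentiable local-wise MI layer ($LMI$).
The $LMI$ captures identical cross-modality features in the statistical domain for diffusion guidance.
Because the method does not rely on any direct mapping between the source and target domains, it requires no retraining when the source domain changes.
This advantage is critical when applying cross-modality data translation methods in practice, since a reasonable amount of source-domain data is not always available for supervised training.
We empirically show the advanced performance of MIDiffusion in comparison with an influential group of generative models, including adversarial-based and other score-matching-based models.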
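
The abstract does not spell out how the $LMI$ layer is computed. As a rough illustration of the underlying idea only, the sketch below estimates mutual information patch by patch from joint intensity histograms. It is not the paper's layer: the histogram estimator is not differentiable, and the function names, patch size, and bin count are all assumptions chosen for the example. It shows why MI suits cross-modality comparison: a pair of images that share structure but use different intensity mappings still scores high.

```python
import numpy as np

def mutual_information(a, b, bins=16):
    """Plug-in histogram estimate of MI (in nats) between two same-shaped arrays."""
    joint, _, _ = np.histogram2d(a.ravel(), b.ravel(), bins=bins)
    pxy = joint / joint.sum()              # joint distribution over intensity bins
    px = pxy.sum(axis=1, keepdims=True)    # marginal of a, shape (bins, 1)
    py = pxy.sum(axis=0, keepdims=True)    # marginal of b, shape (1, bins)
    nz = pxy > 0                           # skip empty cells to avoid log(0)
    return float(np.sum(pxy[nz] * np.log(pxy[nz] / (px @ py)[nz])))

def local_mutual_information(a, b, patch=16, bins=16):
    """Mean MI over non-overlapping patch x patch windows (hypothetical helper)."""
    h, w = a.shape
    scores = [
        mutual_information(a[i:i + patch, j:j + patch],
                           b[i:i + patch, j:j + patch], bins)
        for i in range(0, h - patch + 1, patch)
        for j in range(0, w - patch + 1, patch)
    ]
    return float(np.mean(scores))

rng = np.random.default_rng(0)
source = rng.random((64, 64))
remapped = 1.0 - source ** 2       # same structure, different intensity mapping
unrelated = rng.random((64, 64))   # no shared structure

mi_related = local_mutual_information(source, remapped)
mi_unrelated = local_mutual_information(source, unrelated)
print(mi_related > mi_unrelated)
```

In a guided sampler, a differentiable counterpart of such a score would be evaluated between the source image and the current denoised estimate, and its gradient would steer each sampling step. That step is left out here because the abstract gives no details of it.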

Abstract (original)

Cross-modality data translation has attracted great interest in image computing. Deep generative models (e.g., GANs) show performance improvement in tackling those problems. Nevertheless, as a fundamental challenge in image translation, the problem of Zero-shot-Learning Cross-Modality Data Translation with fidelity remains unanswered. This paper proposes a new unsupervised zero-shot-learning method named Mutual Information guided Diffusion cross-modality data translation Model (MIDiffusion), which learns to translate the unseen source data to the target domain. The MIDiffusion leverages a score-matching-based generative model, which learns the prior knowledge in the target domain. We propose a differentiable local-wise-MI-Layer ($LMI$) for conditioning the iterative denoising sampling. The $LMI$ captures the identical cross-modality features in the statistical domain for the diffusion guidance; thus, our method does not require retraining when the source domain is changed, as it does not rely on any direct mapping between the source and target domains. This advantage is critical for applying cross-modality data translation methods in practice, as a reasonable amount of source domain dataset is not always available for supervised training. We empirically show the advanced performance of MIDiffusion in comparison with an influential group of generative models, including adversarial-based and other score-matching-based models.

arXiv information

Authors	Zihao Wang,Yingyu Yang,Maxime Sermesant,Hervé Delingette,Ona Wu
Published	2023-01-31 16:24:34+00:00
arXiv site	arxiv_id(pdf)

Source, services used

arxiv.jp, Google
