Diffusion-based Visual Anagram as Multi-task Learning

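Summary

Visual anagrams are images whose appearance changes under a transformation such as flipping or rotation. With the advent of diffusion models, such illusions can be generated by averaging the predicted noise across multiple views during the reverse denoising process. This approach, however, has two critical failure modes: (i) concept segregation, in which the concepts for different views are generated independently, so the result is not a true anagram, and (ii) concept domination, in which one concept overpowers the others. This work casts visual anagram generation as a multi-task learning problem, treating the prompt for each view as a separate task, and derives denoising trajectories that align well with all tasks simultaneously. The framework rests on two techniques: (i) an anti-segregation optimization strategy that promotes overlap between the cross-attention maps of different concepts, and (ii) a noise vector balancing method that adaptively adjusts the influence of each task. The authors also observe that directly averaging noise predictions gives suboptimal results because the statistical properties of the noise may not be preserved, and they derive a noise variance rectification method to address this. Extensive qualitative and quantitative experiments demonstrate the method's superior ability to generate visual anagrams spanning diverse concepts.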

Abstract (original)

Visual anagrams are images that change appearance upon transformation, like flipping or rotation. With the advent of diffusion models, generating such optical illusions can be achieved by averaging noise across multiple views during the reverse denoising process. However, we observe two critical failure modes in this approach: (i) concept segregation, where concepts in different views are independently generated, which cannot be considered a true anagram, and (ii) concept domination, where certain concepts overpower others. In this work, we cast the visual anagram generation problem in a multi-task learning setting, where different viewpoint prompts are analogous to different tasks, and derive denoising trajectories that align well across tasks simultaneously. At the core of our designed framework are two newly introduced techniques: (i) an anti-segregation optimization strategy that promotes overlap in cross-attention maps between different concepts, and (ii) a noise vector balancing method that adaptively adjusts the influence of different tasks. Additionally, we observe that directly averaging noise predictions yields suboptimal performance because statistical properties may not be preserved, prompting us to derive a noise variance rectification method. Extensive qualitative and quantitative experiments demonstrate our method's superior ability to generate visual anagrams spanning diverse concepts.
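
The remark about averaged noise losing its statistical properties can be illustrated with a small NumPy sketch. This is not the paper's implementation: `fake_noise_predictor` is a hypothetical stand-in for a prompt-conditioned diffusion model, the two views (identity and 180-degree rotation) are chosen for illustration, and `rescale_to_unit_std` is a naive placeholder, not the noise variance rectification method the authors derive. Under the assumption that the per-view estimates are independent and unit-variance, their average has a standard deviation near 1/sqrt(2) rather than 1.

```python
import numpy as np

def identity(x):
    return x

def rotate180(x):
    # 180-degree rotation over the last two axes; it is its own inverse.
    return np.rot90(x, k=2, axes=(-2, -1))

def fake_noise_predictor(x, seed):
    # Hypothetical stand-in for a diffusion model's noise prediction:
    # returns unit-variance Gaussian noise with the same shape as x.
    rng = np.random.default_rng(seed)
    return rng.standard_normal(x.shape)

def averaged_noise(x, views, inverse_views, seeds):
    # Predict noise in each view, map it back to the canonical frame,
    # then take the plain average across views.
    estimates = [
        inv(fake_noise_predictor(view(x), seed))
        for view, inv, seed in zip(views, inverse_views, seeds)
    ]
    return np.mean(estimates, axis=0)

def rescale_to_unit_std(eps):
    # Naive placeholder (assumption): restore unit standard deviation.
    return eps / eps.std()

x = np.zeros((3, 64, 64))  # dummy noisy image, channels first
eps = averaged_noise(x, [identity, rotate180], [identity, rotate180], seeds=[0, 1])
eps_fixed = rescale_to_unit_std(eps)
print(float(eps.std()), float(eps_fixed.std()))
```

In a real sampler the per-view predictions are correlated, so the shrinkage differs from this independent case; the sketch only shows why a plain average can no longer be treated as unit-variance noise.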

arXiv information

Authors	Zhiyuan Xu, Yinhe Chen, Huan-ang Gao, Weiyan Zhao, Guiyu Zhang, Hao Zhao
Published	2024-12-03 18:59:28+00:00
arXiv page	arxiv_id(pdf)

Source and services used

arxiv.jp, DeepL
