U-DiTs: Downsample Tokens in U-Shaped Diffusion Transformers

Summary

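Diffusion Transformers (DiTs) bring the transformer architecture to diffusion tasks for latent-space image generation.
With an isotropic architecture that chains a series of transformer blocks, DiTs show competitive performance and good scalability.
At the same time, the decision by DiTs and their follow-up works to abandon U-Net is worth rethinking.
To this end, the authors run a simple toy experiment comparing a U-Net-architectured DiT with an isotropic one.
The U-Net architecture gains only a slight advantage from the U-Net inductive bias, which points to potential redundancies within the U-Net-style DiT.
Inspired by the finding that U-Net backbone features are dominated by low frequencies, the authors downsample the tokens of the query-key-value tuple used for self-attention, which brings further improvements despite a considerable reduction in computation.
Building on self-attention with downsampled tokens, the paper proposes a series of U-shaped DiTs (U-DiTs) and runs extensive experiments to demonstrate the extraordinary performance of U-DiT models.
The proposed U-DiT can outperform DiT-XL/2 at only 1/6 of its computation cost.
Code is available at https://github.com/YuchuanTian/U-DiT.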
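
To make the idea of self-attention over downsampled tokens concrete, here is a minimal pure-Python sketch. It is not taken from the U-DiT repository. It assumes one plausible reading of the summary: the token grid is split into strided sub-grids, attention runs inside each sub-grid, and the results are interleaved back. The function names, the stride of 2, and the identity Q/K/V projections are all illustrative assumptions.

```python
import math

def split_tokens(grid, stride=2):
    """Split an H x W grid of token vectors into stride*stride strided sub-grids."""
    return {
        (dy, dx): [row[dx::stride] for row in grid[dy::stride]]
        for dy in range(stride)
        for dx in range(stride)
    }

def merge_tokens(subgrids, height, width, stride=2):
    """Inverse of split_tokens: interleave the sub-grids back into an H x W grid."""
    grid = [[None] * width for _ in range(height)]
    for (dy, dx), sub in subgrids.items():
        for i, row in enumerate(sub):
            for j, tok in enumerate(row):
                grid[dy + i * stride][dx + j * stride] = tok
    return grid

def self_attention(tokens):
    """Single-head self-attention; Q, K and V are the tokens themselves (assumption)."""
    dim = len(tokens[0])
    out = []
    for q in tokens:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(dim) for k in tokens]
        peak = max(scores)  # subtract the max for a numerically stable softmax
        weights = [math.exp(s - peak) for s in scores]
        total = sum(weights)
        out.append([sum(w * v[c] for w, v in zip(weights, tokens)) / total
                    for c in range(dim)])
    return out

def downsampled_self_attention(grid, stride=2):
    """Attend within each strided sub-grid, then restore the original layout."""
    height, width = len(grid), len(grid[0])
    attended = {}
    for key, sub in split_tokens(grid, stride).items():
        sub_w = len(sub[0])
        flat = [tok for row in sub for tok in row]
        result = self_attention(flat)
        attended[key] = [result[r * sub_w:(r + 1) * sub_w] for r in range(len(sub))]
    return merge_tokens(attended, height, width, stride)

def attention_pairs(num_tokens, stride=1):
    """Query-key pairs scored: stride**2 groups of num_tokens / stride**2 tokens."""
    groups = stride * stride
    return groups * (num_tokens // groups) ** 2

# Toy 4 x 4 grid of 2-dimensional tokens (hypothetical data).
grid = [[[float(y), float(x)] for x in range(4)] for y in range(4)]
out = downsampled_self_attention(grid)
full_pairs = attention_pairs(16)            # one group of 16 tokens
down_pairs = attention_pairs(16, stride=2)  # four groups of 4 tokens
```

In this toy setting the number of scored query-key pairs drops from 256 to 64. That factor covers only the attention scores of this sketch and should not be read as the source of the 1/6 overall computation figure quoted above.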

Abstract (original)

Diffusion Transformers (DiTs) introduce the transformer architecture to diffusion tasks for latent-space image generation. With an isotropic architecture that chains a series of transformer blocks, DiTs demonstrate competitive performance and good scalability; but meanwhile, the abandonment of U-Net by DiTs and their following improvements is worth rethinking. To this end, we conduct a simple toy experiment by comparing a U-Net architectured DiT with an isotropic one. It turns out that the U-Net architecture only gains a slight advantage amid the U-Net inductive bias, indicating potential redundancies within the U-Net-style DiT. Inspired by the discovery that U-Net backbone features are low-frequency-dominated, we perform token downsampling on the query-key-value tuple for self-attention that brings further improvements despite a considerable amount of reduction in computation. Based on self-attention with downsampled tokens, we propose a series of U-shaped DiTs (U-DiTs) in the paper and conduct extensive experiments to demonstrate the extraordinary performance of U-DiT models. The proposed U-DiT could outperform DiT-XL/2 with only 1/6 of its computation cost. Codes are available at https://github.com/YuchuanTian/U-DiT.

arXiv information

Authors	Yuchuan Tian, Zhijun Tu, Hanting Chen, Jie Hu, Chao Xu, Yunhe Wang
Published	2024-10-30 16:13:42+00:00
arXiv site	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
