DiffMoE: Dynamic Token Selection for Scalable Diffusion Transformers

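Summary

Diffusion models have shown remarkable success across a range of image generation tasks, but their performance is often limited because inputs are processed uniformly regardless of condition and noise level.
To address this limitation, we propose a new approach that exploits the inherent heterogeneity of the diffusion process.
Our method, DiffMoE, introduces a batch-level global token pool that gives experts access to the global token distribution during training, which promotes specialized expert behavior.
To unlock the full potential of the diffusion process, DiffMoE also incorporates a capacity predictor that dynamically allocates computational resources according to noise level and sample complexity.
In comprehensive evaluations, DiffMoE achieves state-of-the-art performance among diffusion models on the ImageNet benchmark: while keeping 1x activated parameters, it substantially outperforms both dense architectures with 3x activated parameters and existing MoE approaches.
The approach remains effective beyond class-conditional generation, extending to more challenging tasks such as text-to-image generation, which demonstrates broad applicability across diffusion model applications.
Project page: https://shiml20.github.io/DiffMoE/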

Abstract (original)

Diffusion models have demonstrated remarkable success in various image generation tasks, but their performance is often limited by the uniform processing of inputs across varying conditions and noise levels. To address this limitation, we propose a novel approach that leverages the inherent heterogeneity of the diffusion process. Our method, DiffMoE, introduces a batch-level global token pool that enables experts to access global token distributions during training, promoting specialized expert behavior. To unleash the full potential of the diffusion process, DiffMoE incorporates a capacity predictor that dynamically allocates computational resources based on noise levels and sample complexity. Through comprehensive evaluation, DiffMoE achieves state-of-the-art performance among diffusion models on ImageNet benchmark, substantially outperforming both dense architectures with 3x activated parameters and existing MoE approaches while maintaining 1x activated parameters. The effectiveness of our approach extends beyond class-conditional generation to more challenging tasks such as text-to-image generation, demonstrating its broad applicability across different diffusion model applications. Project Page: https://shiml20.github.io/DiffMoE/
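
The abstract names two mechanisms: a batch-level global token pool and a capacity predictor that varies compute per token. The following is a minimal sketch of how such routing could look, not the authors' implementation. It assumes router scores are already available as batch_scores[b][t][e] (sample b, token t, expert e), that each expert picks its highest-scoring tokens from the pooled batch, and that dynamic capacity can be approximated by a fixed score threshold. All function names, the score layout, and the threshold rule are hypothetical.

```python
from typing import List, Tuple

# Hypothetical identifier for a token: (sample_index, token_index).
TokenId = Tuple[int, int]
Scores = List[List[List[float]]]  # assumed layout: batch_scores[b][t][e]


def build_global_pool(batch_scores: Scores) -> Tuple[List[TokenId], List[List[float]]]:
    """Flatten the tokens of every sample in the batch into one shared pool."""
    ids: List[TokenId] = []
    pooled: List[List[float]] = []
    for b, sample in enumerate(batch_scores):
        for t, expert_scores in enumerate(sample):
            ids.append((b, t))
            pooled.append(expert_scores)
    return ids, pooled


def experts_select_tokens(batch_scores: Scores, capacity: int) -> List[List[TokenId]]:
    """Each expert takes its `capacity` best tokens from the whole batch,
    so tokens from different samples compete for the same expert."""
    ids, pooled = build_global_pool(batch_scores)
    num_experts = len(pooled[0]) if pooled else 0
    selections: List[List[TokenId]] = []
    for e in range(num_experts):
        ranked = sorted(range(len(pooled)), key=lambda i: pooled[i][e], reverse=True)
        selections.append([ids[i] for i in ranked[:capacity]])
    return selections


def tokens_above_threshold(batch_scores: Scores, threshold: float) -> List[List[TokenId]]:
    """Assumed stand-in for dynamic capacity: an expert processes every pooled
    token whose score exceeds `threshold`, so compute per token can vary."""
    ids, pooled = build_global_pool(batch_scores)
    num_experts = len(pooled[0]) if pooled else 0
    return [
        [ids[i] for i in range(len(pooled)) if pooled[i][e] > threshold]
        for e in range(num_experts)
    ]


# Toy batch: 2 samples, 2 tokens each, 2 experts (scores are made up).
batch_scores = [
    [[0.9, 0.1], [0.2, 0.8]],
    [[0.6, 0.4], [0.1, 0.3]],
]
fixed = experts_select_tokens(batch_scores, capacity=2)
dynamic = tokens_above_threshold(batch_scores, threshold=0.5)
print(fixed)
print(dynamic)
```

In the toy batch, the threshold variant leaves token (1, 1) unprocessed by any expert while token (1, 0) is picked up by one, which illustrates the idea of spending different amounts of compute on different tokens. How the actual capacity predictor is trained and applied is not described in the abstract.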

arXiv information

Authors	Minglei Shi, Ziyang Yuan, Haotian Yang, Xintao Wang, Mingwu Zheng, Xin Tao, Wenliang Zhao, Wenzhao Zheng, Jie Zhou, Jiwen Lu, Pengfei Wan, Di Zhang, Kun Gai
Published	2025-03-18 17:57:07+00:00
arXiv site	arxiv_id(pdf)

Source, services used

arxiv.jp, Google
