DiffFit: Unlocking Transferability of Large Diffusion Models via Simple Parameter-Efficient Fine-Tuning

Summary

Title: DiffFit: Unlocking Transferability of Large Diffusion Models via Simple Parameter-Efficient Fine-Tuning

Summary:

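– Adapting large pre-trained diffusion models to new domains is critical for real-world applications and remains an open problem.
– This paper proposes DiffFit, a parameter-efficient fine-tuning strategy for large pre-trained diffusion models that enables fast adaptation to new domains.
– DiffFit is a very simple method that fine-tunes only the bias terms and newly added scaling factors in specific layers, which speeds up training and reduces model storage costs.
– Compared with full fine-tuning, DiffFit achieves a 2× training speed-up and needs to store only about 0.12% of the total model parameters.
– The paper provides an intuitive theoretical analysis to justify the effect of the scaling factors on fast adaptation.
– DiffFit can adapt a pre-trained low-resolution model to a high-resolution one at low cost: starting from a public ImageNet 256×256 checkpoint, it fine-tunes for only 25 epochs to obtain an ImageNet 512×512 model.
– On the ImageNet 512×512 benchmark, DiffFit sets a new state-of-the-art FID of 3.02 among diffusion-based methods while being 30× more training-efficient than the closest competitor. On 8 downstream datasets, it achieves superior or competitive performance compared with full fine-tuning.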
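
The selection rule described above (train only bias terms and newly added scaling factors, freeze everything else) can be illustrated with a small pure-Python sketch. Everything specific here is an assumption made for illustration: the parameter names, the hidden width of 1024, and the choice of one scalar scaling factor per sub-layer are hypothetical and are not taken from the DiffFit code. Because of that, the resulting fraction is not expected to match the 0.12% figure reported for the real model.

```python
def block_params(hidden):
    """Hypothetical parameter inventory for one transformer-style block.

    Maps parameter name -> number of scalar entries. The layout (attention
    plus a 4x MLP) and the names are illustrative assumptions.
    """
    return {
        "attn.qkv.weight": hidden * 3 * hidden,
        "attn.qkv.bias": 3 * hidden,
        "attn.proj.weight": hidden * hidden,
        "attn.proj.bias": hidden,
        "mlp.fc1.weight": hidden * 4 * hidden,
        "mlp.fc1.bias": 4 * hidden,
        "mlp.fc2.weight": 4 * hidden * hidden,
        "mlp.fc2.bias": hidden,
        # Newly added scaling factors (assumed: one scalar per sub-layer).
        "gamma_attn": 1,
        "gamma_mlp": 1,
    }


def is_trainable(name):
    """Selection rule: keep biases and the added scaling factors trainable."""
    return name.endswith(".bias") or name.startswith("gamma")


def trainable_summary(params):
    """Return (trainable_count, total_count, trainable_fraction)."""
    total = sum(params.values())
    trainable = sum(n for name, n in params.items() if is_trainable(name))
    return trainable, total, trainable / total


if __name__ == "__main__":
    trainable, total, fraction = trainable_summary(block_params(1024))
    print(f"trainable: {trainable} of {total} ({fraction:.4%})")
```

In a real framework the same rule would typically be applied by iterating over the model's named parameters and disabling gradients for everything the predicate rejects; only the small trainable subset then needs to be stored per downstream task.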

Abstract (original)

Diffusion models have proven to be highly effective in generating high-quality images. However, adapting large pre-trained diffusion models to new domains remains an open challenge, which is critical for real-world applications. This paper proposes DiffFit, a parameter-efficient strategy to fine-tune large pre-trained diffusion models that enable fast adaptation to new domains. DiffFit is embarrassingly simple that only fine-tunes the bias term and newly-added scaling factors in specific layers, yet resulting in significant training speed-up and reduced model storage costs. Compared with full fine-tuning, DiffFit achieves 2× training speed-up and only needs to store approximately 0.12% of the total model parameters. Intuitive theoretical analysis has been provided to justify the efficacy of scaling factors on fast adaptation. On 8 downstream datasets, DiffFit achieves superior or competitive performances compared to the full fine-tuning while being more efficient. Remarkably, we show that DiffFit can adapt a pre-trained low-resolution generative model to a high-resolution one by adding minimal cost. Among diffusion-based methods, DiffFit sets a new state-of-the-art FID of 3.02 on ImageNet 512×512 benchmark by fine-tuning only 25 epochs from a public pre-trained ImageNet 256×256 checkpoint while being 30× more training efficient than the closest competitor.

arXiv information

Authors	Enze Xie, Lewei Yao, Han Shi, Zhili Liu, Daquan Zhou, Zhaoqiang Liu, Jiawei Li, Zhenguo Li
Published	2023-05-04 02:55:11+00:00
arXiv site	arxiv_id(pdf)

Source, services used

arxiv.jp, OpenAI
