How far can we go with ImageNet for Text-to-Image generation?

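Summary

Recent text-to-image (T2I) generation models have achieved remarkable results by training on billion-scale datasets, following a "bigger is better" paradigm that prioritizes data quantity over quality.
We challenge this established paradigm by demonstrating that strategic data augmentation of small, well-curated datasets can match or outperform models trained on massive web-scraped collections.
Using only ImageNet enhanced with well-designed text and image augmentations, we achieve a +2 overall score over SD-XL on GenEval and +5 on DPGBench while using just 1/10th the parameters and 1/1000th the training images.
Our results suggest that strategic data augmentation, rather than massive datasets, could offer a more sustainable path forward for T2I generation.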

Summary (original)

Recent text-to-image (T2I) generation models have achieved remarkable results by training on billion-scale datasets, following a 'bigger is better' paradigm that prioritizes data quantity over quality. We challenge this established paradigm by demonstrating that strategic data augmentation of small, well-curated datasets can match or outperform models trained on massive web-scraped collections. Using only ImageNet enhanced with well-designed text and image augmentations, we achieve a +2 overall score over SD-XL on GenEval and +5 on DPGBench while using just 1/10th the parameters and 1/1000th the training images. Our results suggest that strategic data augmentation, rather than massive datasets, could offer a more sustainable path forward for T2I generation.

arXiv information

Authors	L. Degeorge, A. Ghosh, N. Dufour, D. Picard, V. Kalogeiton
Published	2025-02-28 18:59:42+00:00
arXiv page	arxiv_id(pdf)

Source, services used

arxiv.jp, Google
