The Unmet Promise of Synthetic Training Images: Using Retrieved Real Images Performs Better

Abstract

Generative text-to-image models enable us to synthesize unlimited amounts of images in a controllable manner, spurring many recent efforts to train vision models with synthetic data. However, every synthetic image ultimately originates from the upstream data used to train the generator. Does the intermediate generator provide additional information over directly training on relevant parts of the upstream data? Grounding this question in the setting of image classification, we compare finetuning on task-relevant, targeted synthetic data generated by Stable Diffusion — a generative model trained on the LAION-2B dataset — against finetuning on targeted real images retrieved directly from LAION-2B. We show that while synthetic data can benefit some downstream tasks, it is universally matched or outperformed by real data from the simple retrieval baseline. Our analysis suggests that this underperformance is partially due to generator artifacts and inaccurate task-relevant visual details in the synthetic images. Overall, we argue that targeted retrieval is a critical baseline to consider when training with synthetic data — a baseline that current methods do not yet surpass. We release code, data, and models at https://github.com/scottgeng00/unmet-promise.
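The abstract's retrieval baseline pulls task-relevant real images straight out of the generator's upstream data instead of sampling new ones. The sketch below shows one simple way to do this. It assumes a pool of (image URL, caption) pairs standing in for LAION-2B metadata, and it keeps an image for a class when its caption mentions that class name and no other target class. The `Sample` type, the `targeted_retrieval` function, and the caption-matching rule are all hypothetical illustrations. They are not the authors' released pipeline, which lives in the repository linked above.

```python
import re
from dataclasses import dataclass


@dataclass
class Sample:
    """Hypothetical stand-in for one LAION-style (image URL, caption) record."""
    url: str
    caption: str


def build_class_patterns(class_names):
    # One case-insensitive, whole-word regex per class name.
    # Note: "cat" will not match "cats"; a real pipeline would need synonyms/plurals.
    return {
        name: re.compile(r"\b" + re.escape(name) + r"\b", re.IGNORECASE)
        for name in class_names
    }


def targeted_retrieval(pool, class_names, max_per_class=None):
    """Return {class_name: [Sample, ...]} for captions naming exactly one target class."""
    patterns = build_class_patterns(class_names)
    retrieved = {name: [] for name in class_names}
    for sample in pool:
        hits = [name for name, pat in patterns.items() if pat.search(sample.caption)]
        # Skip captions that mention no target class or several (ambiguous label).
        if len(hits) != 1:
            continue
        label = hits[0]
        # Optionally cap the number of retrieved images per class.
        if max_per_class is None or len(retrieved[label]) < max_per_class:
            retrieved[label].append(sample)
    return retrieved


if __name__ == "__main__":
    pool = [
        Sample("https://example.com/1.jpg", "A tabby cat sleeping on a sofa"),
        Sample("https://example.com/2.jpg", "My dog and cat playing"),
        Sample("https://example.com/3.jpg", "Golden retriever dog at the beach"),
        Sample("https://example.com/4.jpg", "Sunset over the ocean"),
    ]
    for label, samples in targeted_retrieval(pool, ["cat", "dog"]).items():
        print(label, [s.url for s in samples])
```

In this toy pool, the caption mentioning both "dog" and "cat" is discarded rather than guessed at, and the caption mentioning neither is ignored. The retrieved, labeled images would then serve as the finetuning set, in the same role that generated synthetic images play in the comparison the abstract describes.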

arXiv information

Authors	Scott Geng, Cheng-Yu Hsieh, Vivek Ramanujan, Matthew Wallingford, Chun-Liang Li, Pang Wei Koh, Ranjay Krishna
Published	2025-01-02 00:01:03+00:00
arXiv page	arxiv_id(pdf)

Provider, services used

arxiv.jp, Google
