DREAM: Efficient Dataset Distillation by Representative Matching

Summary

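Dataset distillation aims to synthesize a small dataset from a large original one with little information loss, reducing storage and training costs.
Recent state-of-the-art methods mainly constrain sample synthesis by matching synthetic images to original images in terms of gradients, embedding distributions, or training trajectories.
Although the matching objectives vary, the strategy for selecting original images is currently limited to naive random sampling.
The authors argue that random sampling overlooks the evenness of the selected sample distribution, which may result in noisy or biased matching targets.
Random sampling also places no constraint on sample diversity.
Together, these factors make optimization in the distillation process unstable and reduce training efficiency.
The authors therefore propose a new matching strategy, Dataset distillation by REpresentAtive Matching (DREAM), in which only representative original images are selected for matching.
DREAM plugs easily into popular dataset distillation frameworks and cuts the number of distillation iterations by more than 8 times without a drop in performance.
Given sufficient training time, DREAM yields further significant improvements and achieves state-of-the-art performance.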
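
The abstract does not say how "representative" original images are picked, so the following is only a minimal sketch of one possible reading: cluster the feature vectors of one class with a small k-means loop and keep the real sample nearest to each cluster center, instead of drawing samples at random. The function name `select_representatives`, the use of k-means, the farthest-point initialization, and the toy 2-D features are all assumptions made for illustration, not the authors' implementation.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def select_representatives(features, k, iters=20):
    """Hypothetical helper: return indices of k samples nearest to k-means centers.

    Assumption: `features` holds one feature vector per original image of a
    single class, and cluster centers are a reasonable proxy for
    "representative" samples.
    """
    n, dim = len(features), len(features[0])

    # Deterministic farthest-point initialization: start from sample 0, then
    # repeatedly add the sample farthest from all centers chosen so far.
    centers = [list(features[0])]
    while len(centers) < k:
        far = max(range(n),
                  key=lambda i: min(sq_dist(features[i], c) for c in centers))
        centers.append(list(features[far]))

    for _ in range(iters):
        # Assignment step: attach every sample to its nearest center.
        clusters = [[] for _ in range(k)]
        for i, f in enumerate(features):
            nearest = min(range(k), key=lambda c: sq_dist(f, centers[c]))
            clusters[nearest].append(i)
        # Update step: move each center to the mean of its members
        # (an empty cluster keeps its previous center).
        for c, members in enumerate(clusters):
            if members:
                centers[c] = [sum(features[i][d] for i in members) / len(members)
                              for d in range(dim)]

    # For each center, keep the closest real sample that is not already chosen.
    chosen = []
    for c in range(k):
        candidates = [i for i in range(n) if i not in chosen]
        chosen.append(min(candidates,
                          key=lambda i: sq_dist(features[i], centers[c])))
    return chosen


# Toy data: two well-separated groups of four samples each.
features = [
    (0.0, 0.0), (0.2, 0.1), (0.1, 0.3), (0.3, 0.2),
    (5.0, 5.0), (5.2, 5.1), (5.1, 5.3), (5.3, 5.2),
]
reps = select_representatives(features, k=2)
print(reps)
```

On this toy input the selection covers both groups with one sample each, whereas two uniformly random draws could both land in the same group. In a real pipeline the selected indices would replace the randomly sampled mini-batch of original images fed to the matching objective.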

Abstract (original)

Dataset distillation aims to synthesize small datasets with little information loss from original large-scale ones for reducing storage and training costs. Recent state-of-the-art methods mainly constrain the sample synthesis process by matching synthetic images and the original ones regarding gradients, embedding distributions, or training trajectories. Although there are various matching objectives, currently the strategy for selecting original images is limited to naive random sampling. We argue that random sampling overlooks the evenness of the selected sample distribution, which may result in noisy or biased matching targets. Besides, the sample diversity is also not constrained by random sampling. These factors together lead to optimization instability in the distilling process and degrade the training efficiency. Accordingly, we propose a novel matching strategy named as Dataset distillation by REpresentAtive Matching (DREAM), where only representative original images are selected for matching. DREAM is able to be easily plugged into popular dataset distillation frameworks and reduce the distilling iterations by more than 8 times without performance drop. Given sufficient training time, DREAM further provides significant improvements and achieves state-of-the-art performances.

arXiv information

Authors	Yanqing Liu, Jianyang Gu, Kai Wang, Zheng Zhu, Wei Jiang, Yang You
Published	2023-03-09 15:53:56+00:00
arXiv site	arxiv_id(pdf)

Source, services used

arxiv.jp, Google
