Towards Mitigating Architecture Overfitting in Dataset Distillation

Summary

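Dataset distillation methods have demonstrated remarkable performance for neural networks trained on very limited training data.
However, a significant challenge arises in the form of architecture overfitting: distilled training data synthesized with a specific network architecture (the training network) yields poor performance when used to train other network architectures (the test networks).
This paper addresses the problem by proposing a series of approaches, in both architecture design and training schemes, that can be adopted together to improve generalization across different network architectures trained on the distilled data.
We conduct extensive experiments to demonstrate the effectiveness and generality of our methods.
In particular, across various scenarios involving different sizes of distilled data, our approaches achieve performance comparable or superior to existing methods when larger-capacity networks are trained on the distilled data.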

Summary (original)

Dataset distillation methods have demonstrated remarkable performance for neural networks trained with very limited training data. However, a significant challenge arises in the form of architecture overfitting: the distilled training data synthesized by a specific network architecture (i.e., training network) generates poor performance when trained by other network architectures (i.e., test networks). This paper addresses this issue and proposes a series of approaches in both architecture designs and training schemes which can be adopted together to boost the generalization performance across different network architectures on the distilled training data. We conduct extensive experiments to demonstrate the effectiveness and generality of our methods. Particularly, across various scenarios involving different sizes of distilled data, our approaches achieve comparable or superior performance to existing methods when training on the distilled data using networks with larger capacities.

arXiv information

Authors	Xuyang Zhong, Chen Liu
Published	2023-09-08 08:12:29+00:00
arXiv site	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
