Dataset Distillation via Knowledge Distillation: Towards Efficient Self-Supervised Pre-Training of Deep Networks

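Abstract

Dataset distillation (DD) generates small synthetic datasets that can train deep networks efficiently with a limited amount of memory and compute.
Despite the success of DD methods for supervised learning, DD for self-supervised pre-training of deep models has remained unaddressed.
Pre-training on unlabeled data is crucial for generalizing efficiently to downstream tasks with limited labeled data.
This work proposes the first effective DD method for SSL pre-training.
First, it shows, theoretically and empirically, that naively applying supervised DD methods to SSL fails because of the high variance of the SSL gradient.
It then addresses this issue by drawing on insights from the knowledge distillation (KD) literature.
Specifically, a small student model is trained to match the representations of a larger teacher model trained with SSL.
A small synthetic dataset is then generated by matching the training trajectories of the student models.
Because the KD objective has considerably lower variance than the SSL objective, this approach can generate synthetic datasets that successfully pre-train high-quality encoders.
Extensive experiments show that the distilled sets lead to up to 13% higher accuracy than prior work on a variety of downstream tasks when labeled data is limited.
Code is available at https://github.com/BigML-CS-UCLA/MKDT.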
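
The representation-matching step can be illustrated with a toy example. The sketch below is not the authors' implementation: it assumes a linear student encoder, a mean-squared-error KD objective, and fixed teacher representations, and every function name in it is hypothetical. It only shows that the KD target is a fixed regression target, so one gradient step lowers the loss deterministically.

```python
def linear_encode(weights, x):
    """Hypothetical linear student encoder: computes W @ x for one input vector."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]


def kd_loss(weights, inputs, teacher_reps):
    """Mean over examples of the squared error between student and teacher representations."""
    total = 0.0
    for x, t in zip(inputs, teacher_reps):
        s = linear_encode(weights, x)
        total += sum((si - ti) ** 2 for si, ti in zip(s, t))
    return total / len(inputs)


def kd_grad(weights, inputs, teacher_reps):
    """Gradient of kd_loss with respect to the student weights."""
    n = len(inputs)
    grad = [[0.0] * len(weights[0]) for _ in weights]
    for x, t in zip(inputs, teacher_reps):
        s = linear_encode(weights, x)
        for i, (si, ti) in enumerate(zip(s, t)):
            for j, xj in enumerate(x):
                grad[i][j] += 2.0 * (si - ti) * xj / n
    return grad


def sgd_step(weights, grad, lr):
    """One plain gradient-descent update of the student weights."""
    return [[w - lr * g for w, g in zip(w_row, g_row)]
            for w_row, g_row in zip(weights, grad)]


# Toy data: two inputs and the (assumed, precomputed) teacher representations.
inputs = [[1.0, 0.0], [0.0, 1.0]]
teacher_reps = [[2.0, 0.0], [0.0, 3.0]]

student = [[0.0, 0.0], [0.0, 0.0]]
loss_before = kd_loss(student, inputs, teacher_reps)
student = sgd_step(student, kd_grad(student, inputs, teacher_reps), lr=0.1)
loss_after = kd_loss(student, inputs, teacher_reps)
```

In a realistic setting the student would be a deep network and the teacher representations would come from an encoder pre-trained with SSL; the linear model here is chosen only so the gradient can be written by hand.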

Abstract (original)

Dataset distillation (DD) generates small synthetic datasets that can efficiently train deep networks with a limited amount of memory and compute. Despite the success of DD methods for supervised learning, DD for self-supervised pre-training of deep models has remained unaddressed. Pre-training on unlabeled data is crucial for efficiently generalizing to downstream tasks with limited labeled data. In this work, we propose the first effective DD method for SSL pre-training. First, we show, theoretically and empirically, that naive application of supervised DD methods to SSL fails, due to the high variance of the SSL gradient. Then, we address this issue by relying on insights from knowledge distillation (KD) literature. Specifically, we train a small student model to match the representations of a larger teacher model trained with SSL. Then, we generate a small synthetic dataset by matching the training trajectories of the student models. As the KD objective has considerably lower variance than SSL, our approach can generate synthetic datasets that can successfully pre-train high-quality encoders. Through extensive experiments, we show that our distilled sets lead to up to 13% higher accuracy than prior work, on a variety of downstream tasks, in the presence of limited labeled data. Code at https://github.com/BigML-CS-UCLA/MKDT.
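
The abstract does not state the loss used to match training trajectories. As a hedged illustration, the sketch below uses the normalized parameter distance that is common in trajectory-matching dataset distillation; whether this paper uses exactly this form is an assumption. Parameters are represented as flat lists of floats, and the function names are hypothetical.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two flat parameter vectors."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def trajectory_matching_loss(student_params, expert_start, expert_target):
    """Normalized distance between where training on synthetic data ended up
    (student_params) and a later checkpoint of the reference trajectory
    (expert_target), relative to how far the reference itself moved from
    expert_start to expert_target.
    """
    denom = squared_distance(expert_start, expert_target)
    if denom == 0.0:
        raise ValueError("reference checkpoints are identical; loss is undefined")
    return squared_distance(student_params, expert_target) / denom


# Toy checkpoints (assumed values, for illustration only).
expert_start = [0.0, 0.0]
expert_target = [2.0, 2.0]

halfway = trajectory_matching_loss([1.0, 1.0], expert_start, expert_target)
no_progress = trajectory_matching_loss(expert_start, expert_start, expert_target)
perfect = trajectory_matching_loss(expert_target, expert_start, expert_target)
```

A value of 1.0 means that training on the synthetic data made no progress toward the later checkpoint, and 0.0 means it landed exactly on it. In a full pipeline this loss would be differentiated with respect to the synthetic images, which this sketch does not attempt.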

arXiv information

Authors	Siddharth Joshi, Jiayi Ni, Baharan Mirzasoleiman
Published	2025-02-19 18:39:00+00:00
arXiv page	arxiv_id(pdf)

Source, services used

arxiv.jp, Google
