Towards Trustworthy Dataset Distillation

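Summary

Efficiency and trustworthiness are two enduring goals when applying deep learning to real-world applications.
For efficiency, dataset distillation (DD) aims to reduce training cost by distilling a large dataset into a small synthetic one.
However, existing methods focus only on in-distribution (InD) classification in a closed-world setting and ignore out-of-distribution (OOD) samples.
OOD detection, on the other hand, aims to make models more trustworthy, but it is always achieved inefficiently, in full-data settings.
The authors consider both problems together for the first time and propose a new paradigm called Trustworthy Dataset Distillation (TrustDD).
By distilling both InD samples and outliers, the condensed dataset can train models that are competent at both InD classification and OOD detection.
To relax the requirement for real outlier data and make OOD detection more practical, they further propose corrupting InD samples to generate pseudo-outliers, and introduce Pseudo-Outlier Exposure (POE).
Comprehensive experiments across various settings demonstrate the effectiveness of TrustDD, and the proposed POE outperforms the state-of-the-art method Outlier Exposure (OE).
Compared with earlier DD methods, TrustDD is more trustworthy and applicable to real open-world scenarios.
The authors state that their code will be made publicly available.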
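
The idea of corrupting InD samples into pseudo-outliers and training with an exposure-style loss can be sketched as follows. This is a minimal illustration, not the authors' implementation: the abstract does not say which corruption is used, so patch shuffling is an assumption here, as are the function names `shuffle_patches` and `oe_style_loss` and the weight `lam`. The outlier term follows the standard Outlier Exposure formulation for classification, which pushes predictions on outliers toward the uniform distribution.

```python
import math
import random


def shuffle_patches(image, patch, rng):
    """Corrupt a square image (list of rows) by permuting its non-overlapping patches.

    Assumption: patch shuffling stands in for the unspecified corruption in the abstract.
    """
    n = len(image)
    assert n % patch == 0 and all(len(row) == n for row in image)
    k = n // patch  # patches per side
    order = list(range(k * k))
    rng.shuffle(order)  # order[dst] = index of the source patch placed in slot dst
    out = [[0] * n for _ in range(n)]
    for dst, src in enumerate(order):
        dr, dc = divmod(dst, k)
        sr, sc = divmod(src, k)
        for i in range(patch):
            for j in range(patch):
                out[dr * patch + i][dc * patch + j] = image[sr * patch + i][sc * patch + j]
    return out


def log_softmax(logits):
    """Numerically stable log-softmax over a list of logits."""
    m = max(logits)
    lse = m + math.log(sum(math.exp(z - m) for z in logits))
    return [z - lse for z in logits]


def oe_style_loss(ind_logits, ind_labels, outlier_logits, lam=0.5):
    """Cross-entropy on InD samples plus lam times cross-entropy to uniform on outliers."""
    ce = -sum(log_softmax(z)[y] for z, y in zip(ind_logits, ind_labels)) / len(ind_logits)
    to_uniform = -sum(sum(log_softmax(z)) / len(z) for z in outlier_logits) / len(outlier_logits)
    return ce + lam * to_uniform


# Usage: corrupt a toy 4x4 "image" and compare the loss for uncertain vs. confident outlier logits.
image = [[r * 4 + c for c in range(4)] for r in range(4)]
pseudo_outlier = shuffle_patches(image, patch=2, rng=random.Random(0))
loss_uncertain = oe_style_loss([[2.0, 0.0]], [0], [[0.0, 0.0]])
loss_confident = oe_style_loss([[2.0, 0.0]], [0], [[10.0, 0.0]])
```

In this sketch the outlier term is smallest when the model's logits on a pseudo-outlier are flat, so a model that is confidently wrong on corrupted inputs is penalized more than one that is uncertain.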

Abstract (original)

Efficiency and trustworthiness are two eternal pursuits when applying deep learning in real-world applications. With regard to efficiency, dataset distillation (DD) endeavors to reduce training costs by distilling the large dataset into a tiny synthetic dataset. However, existing methods merely concentrate on in-distribution (InD) classification in a closed-world setting, disregarding out-of-distribution (OOD) samples. On the other hand, OOD detection aims to enhance models’ trustworthiness, which is always inefficiently achieved in full-data settings. For the first time, we simultaneously consider both issues and propose a novel paradigm called Trustworthy Dataset Distillation (TrustDD). By distilling both InD samples and outliers, the condensed datasets are capable to train models competent in both InD classification and OOD detection. To alleviate the requirement of real outlier data and make OOD detection more practical, we further propose to corrupt InD samples to generate pseudo-outliers and introduce Pseudo-Outlier Exposure (POE). Comprehensive experiments on various settings demonstrate the effectiveness of TrustDD, and the proposed POE surpasses state-of-the-art method Outlier Exposure (OE). Compared with the preceding DD, TrustDD is more trustworthy and applicable to real open-world scenarios. Our code will be publicly available.

arXiv information

Authors	Shijie Ma, Fei Zhu, Zhen Cheng, Xu-Yao Zhang
Published	2023-07-18 11:43:01+00:00
arXiv page	arxiv_id(pdf)

Sources and services used

arxiv.jp, Google
