Fast-DataShapley: Neural Modeling for Training Data Valuation

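Summary

The value and copyright of training data are crucial in the artificial intelligence industry.
Service platforms should protect data providers' legitimate rights and reward them fairly for their contributions.
The Shapley value, a powerful tool for evaluating contributions, is theoretically superior to other methods, but its computational overhead grows exponentially with the number of data providers.
Recent work based on Shapley values tries to reduce this computational complexity with approximation algorithms.
However, these methods must retrain for every test sample, which leads to intolerable costs.
We propose Fast-DataShapley, a one-pass training method that leverages the weighted least squares characterization of the Shapley value to train a reusable explainer model with real-time inference speed.
Given a new test sample, no retraining is needed to compute the Shapley values of the training data.
In addition, we propose three methods with theoretical guarantees that reduce training overhead from two angles: approximate calculation of the utility function and group calculation of the training data.
We analyze the time complexity to show the efficiency of the methods.
Experimental evaluations on various image datasets show superior performance and efficiency compared to baselines.
Specifically, performance improves by a factor of more than 2.5, and the explainer's training speed can increase by two orders of magnitude.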

Abstract (original)

The value and copyright of training data are crucial in the artificial intelligence industry. Service platforms should protect data providers’ legitimate rights and fairly reward them for their contributions. Shapley value, a potent tool for evaluating contributions, outperforms other methods in theory, but its computational overhead escalates exponentially with the number of data providers. Recent works based on Shapley values attempt to mitigate computation complexity by approximation algorithms. However, they need to retrain for each test sample, leading to intolerable costs. We propose Fast-DataShapley, a one-pass training method that leverages the weighted least squares characterization of the Shapley value to train a reusable explainer model with real-time reasoning speed. Given new test samples, no retraining is required to calculate the Shapley values of the training data. Additionally, we propose three methods with theoretical guarantees to reduce training overhead from two aspects: the approximate calculation of the utility function and the group calculation of the training data. We analyze time complexity to show the efficiency of our methods. The experimental evaluations on various image datasets demonstrate superior performance and efficiency compared to baselines. Specifically, the performance is improved to more than 2.5 times, and the explainer’s training speed can be increased by two orders of magnitude.
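The weighted least squares characterization mentioned in the abstract can be illustrated on a toy cooperative game. The sketch below is not the paper's method: Fast-DataShapley trains a neural explainer against this kind of objective, whereas here the objective is solved in closed form for four players. The `utility` function, its weights, and the interaction bonus are hypothetical stand-ins for a model-performance score. The code computes exact Shapley values by subset enumeration (the exponential-cost route) and then recovers the same values from the Shapley-kernel weighted least squares problem with the efficiency constraint.

```python
from itertools import combinations
from math import comb, factorial

import numpy as np


def utility(subset):
    """Hypothetical toy utility v(S); stands in for a model-performance score."""
    weights = [1.0, 2.0, 3.0, 4.0]
    total = sum(weights[i] for i in subset)
    bonus = 0.5 if {0, 1} <= set(subset) else 0.0  # interaction between players 0 and 1
    return total ** 0.5 + bonus


def exact_shapley(n, v):
    """Exact Shapley values by enumerating every subset of the other players."""
    phi = np.zeros(n)
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            coeff = factorial(size) * factorial(n - size - 1) / factorial(n)
            for s in combinations(others, size):
                phi[i] += coeff * (v(s + (i,)) - v(s))
    return phi


def wls_shapley(n, v):
    """Shapley values as the solution of a constrained weighted least squares problem.

    Minimizes sum_S mu(S) * (v(S) - v(empty) - sum_{i in S} phi_i)^2 over all
    proper non-empty subsets S, subject to sum_i phi_i = v(N) - v(empty).
    """
    base = v(())
    rows, targets, kernel = [], [], []
    for size in range(1, n):
        # Shapley kernel weight for subsets of this size.
        mu = (n - 1) / (comb(n, size) * size * (n - size))
        for s in combinations(range(n), size):
            z = np.zeros(n)
            z[list(s)] = 1.0
            rows.append(z)
            targets.append(v(s) - base)
            kernel.append(mu)
    Z, y, W = np.array(rows), np.array(targets), np.diag(kernel)
    total = v(tuple(range(n))) - base

    # Solve the KKT system of the equality-constrained problem.
    kkt = np.zeros((n + 1, n + 1))
    kkt[:n, :n] = 2.0 * Z.T @ W @ Z
    kkt[:n, n] = 1.0
    kkt[n, :n] = 1.0
    rhs = np.concatenate([2.0 * Z.T @ W @ y, [total]])
    return np.linalg.solve(kkt, rhs)[:n]
```

Calling `exact_shapley(4, utility)` and `wls_shapley(4, utility)` should give matching vectors up to floating-point error. Both routines still enumerate every subset, so neither scales; the point is only that the regression view gives an objective a learned model can be trained against.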

arXiv information

Authors	Haifeng Sun, Yu Xiong, Runze Wu, Xinyu Cai, Changjie Fan, Lan Zhang, Xiang-Yang Li
Published	2025-06-05 17:35:46+00:00
arXiv site	arxiv_id(pdf)

Source, services used

arxiv.jp, Google
