Machine Unlearning Fails to Remove Data Poisoning Attacks

Summary

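We revisit the effectiveness of several practical methods for approximate machine unlearning developed for large-scale deep learning.
Beyond complying with data deletion requests, one potential application of unlearning methods is removing the effects of poisoned data.
Although existing unlearning methods have been shown to be effective in many settings, we demonstrate experimentally that they fail to remove the effects of data poisoning across several types of poisoning attacks (indiscriminate, targeted, and a newly introduced Gaussian poisoning attack) and models (image classifiers and LLMs), even when given a relatively large compute budget.
To characterize unlearning efficacy precisely, we introduce new evaluation metrics for unlearning based on data poisoning.
Our results suggest that a broader perspective, including a wider range of evaluations, is needed to avoid false confidence in machine unlearning procedures for deep learning that lack provable guarantees.
Moreover, although unlearning methods show some signs of being useful for efficiently removing poisoned data without retraining, our work suggests that they are not yet "ready for prime time" and currently offer limited benefit over retraining.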

Summary (original)

We revisit the efficacy of several practical methods for approximate machine unlearning developed for large-scale deep learning. In addition to complying with data deletion requests, one often-cited potential application for unlearning methods is to remove the effects of poisoned data. We experimentally demonstrate that, while existing unlearning methods have been demonstrated to be effective in a number of settings, they fail to remove the effects of data poisoning across a variety of types of poisoning attacks (indiscriminate, targeted, and a newly-introduced Gaussian poisoning attack) and models (image classifiers and LLMs); even when granted a relatively large compute budget. In order to precisely characterize unlearning efficacy, we introduce new evaluation metrics for unlearning based on data poisoning. Our results suggest that a broader perspective, including a wider variety of evaluations, are required to avoid a false sense of confidence in machine unlearning procedures for deep learning without provable guarantees. Moreover, while unlearning methods show some signs of being useful to efficiently remove poisoned data without having to retrain, our work suggests that these methods are not yet “ready for prime time,” and currently provide limited benefit over retraining.

arXiv information

Authors	Martin Pawelczyk, Jimmy Z. Di, Yiwei Lu, Ayush Sekhari, Gautam Kamath, Seth Neel
Published	2025-04-01 10:49:56+00:00
arXiv site	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
