Certified Robustness to Data Poisoning in Gradient-Based Training

Summary

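Modern machine learning pipelines draw on large amounts of public data, which makes it infeasible to guarantee data quality and leaves models exposed to poisoning and backdoor attacks.
Provably bounding model behavior under such attacks remains an open problem.
This work addresses that challenge by developing the first framework that provides provable guarantees on the behavior of models trained on potentially manipulated data, without modifying the model or the learning algorithm.
In particular, the framework certifies robustness against untargeted and targeted poisoning, as well as backdoor attacks, for both bounded and unbounded manipulations of the training inputs and labels.
The method uses convex relaxations to over-approximate the set of all possible parameter updates under a given poisoning threat model, which makes it possible to bound the set of all reachable parameters for any gradient-based learning algorithm.
Given this set of parameters, the framework provides bounds on worst-case behavior, including model performance and backdoor success rate.
The approach is demonstrated on multiple real-world datasets from applications including energy consumption, medical imaging, and autonomous driving.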
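
To make the idea of bounding all reachable parameters concrete, here is a minimal sketch. It is not the authors' implementation. It assumes a linear model with squared loss, full-batch gradient descent, and a simplified threat model in which every training label may be shifted by at most `eps`. It uses plain interval arithmetic, the simplest kind of convex relaxation, and the names `dot_bounds`, `certified_gd`, and `plain_gd` are hypothetical.

```python
def dot_bounds(x, w_lo, w_hi):
    """Interval bounds on x.w for every w with w_lo <= w <= w_hi (elementwise)."""
    lo = sum(xj * (wl if xj >= 0 else wh) for xj, wl, wh in zip(x, w_lo, w_hi))
    hi = sum(xj * (wh if xj >= 0 else wl) for xj, wl, wh in zip(x, w_lo, w_hi))
    return lo, hi


def certified_gd(X, y, eps, lr=0.05, steps=20):
    """Over-approximate the weights reachable by gradient descent when each
    label y[i] may be replaced by any value in [y[i] - eps, y[i] + eps]."""
    n, d = len(X), len(X[0])
    w_lo, w_hi = [0.0] * d, [0.0] * d  # same zero initialisation as plain_gd
    for _ in range(steps):
        g_lo, g_hi = [0.0] * d, [0.0] * d
        for x, yi in zip(X, y):
            # Bounds on the residual (x.w - y') over all admissible w and y'.
            p_lo, p_hi = dot_bounds(x, w_lo, w_hi)
            r_lo, r_hi = p_lo - (yi + eps), p_hi - (yi - eps)
            # Bounds on the per-sample gradient residual * x, per coordinate.
            for j, xj in enumerate(x):
                a, b = r_lo * xj, r_hi * xj
                g_lo[j] += min(a, b) / n
                g_hi[j] += max(a, b) / n
        # Interval version of w <- w - lr * g: the lower bound uses the
        # largest gradient, the upper bound uses the smallest gradient.
        w_lo, w_hi = (
            [wl - lr * gh for wl, gh in zip(w_lo, g_hi)],
            [wh - lr * gl for wh, gl in zip(w_hi, g_lo)],
        )
    return w_lo, w_hi


def plain_gd(X, y, lr=0.05, steps=20):
    """Ordinary full-batch gradient descent, used to check the bounds."""
    n, d = len(X), len(X[0])
    w = [0.0] * d
    for _ in range(steps):
        g = [0.0] * d
        for x, yi in zip(X, y):
            r = sum(xj * wj for xj, wj in zip(x, w)) - yi
            for j, xj in enumerate(x):
                g[j] += r * xj / n
        w = [wj - lr * gj for wj, gj in zip(w, g)]
    return w
```

Under these assumptions, training `plain_gd` on any label vector within `eps` of `y` (with the same `lr` and `steps`) should yield weights inside the box returned by `certified_gd`. The box widens with every step because interval arithmetic ignores correlations between the weights and the gradient, which is one reason tighter relaxations matter in practice.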

Summary (original)

Modern machine learning pipelines leverage large amounts of public data, making it infeasible to guarantee data quality and leaving models open to poisoning and backdoor attacks. Provably bounding model behavior under such attacks remains an open problem. In this work, we address this challenge by developing the first framework providing provable guarantees on the behavior of models trained with potentially manipulated data without modifying the model or learning algorithm. In particular, our framework certifies robustness against untargeted and targeted poisoning, as well as backdoor attacks, for bounded and unbounded manipulations of the training inputs and labels. Our method leverages convex relaxations to over-approximate the set of all possible parameter updates for a given poisoning threat model, allowing us to bound the set of all reachable parameters for any gradient-based learning algorithm. Given this set of parameters, we provide bounds on worst-case behavior, including model performance and backdoor success rate. We demonstrate our approach on multiple real-world datasets from applications including energy consumption, medical imaging, and autonomous driving.

arXiv information

Authors	Philip Sosnin, Mark N. Müller, Maximilian Baader, Calvin Tsay, Matthew Wicker
Published	2024-10-30 17:47:56+00:00
arXiv page	arxiv_id(pdf)

Source, services used

arxiv.jp, Google
