On the Robustness of Adversarial Training Against Uncertainty Attacks

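Abstract

In learning problems, noise inherent to the task makes it impossible to draw inferences without some degree of uncertainty.
Quantifying this uncertainty is broadly useful, and it is especially relevant in security-sensitive applications.
In these settings it is essential to guarantee good (i.e., trustworthy) uncertainty measures that downstream modules can safely use to drive the final decision-making process.
However, an attacker may want to force the system to produce either (i) highly uncertain outputs, which jeopardize the system's availability, or (ii) low uncertainty estimates, which make the system accept uncertain samples that would otherwise require careful inspection (e.g., human intervention).
It is therefore essential to understand how to obtain uncertainty estimates that are robust against these kinds of attacks.
In this work, we show both empirically and theoretically that defending against adversarial examples, i.e., carefully perturbed samples that cause misclassification, also guarantees more secure and trustworthy uncertainty estimates under common attack scenarios, without the need for an ad-hoc defense strategy.
To support our claims, we evaluate multiple adversarially robust models from the publicly available RobustBench benchmark on the CIFAR-10 and ImageNet datasets.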
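
The two attack goals described in the abstract can be pictured with a minimal sketch. The abstract does not say which uncertainty measure the paper uses, so this example assumes the entropy of the softmax output as the uncertainty score and a hypothetical rejection threshold of 0.5; the function names and logit values are illustrative only and are not taken from the paper.

```python
import math


def softmax(logits):
    """Convert raw logits to probabilities (shifted by the max for stability)."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predictive_entropy(probs):
    """Shannon entropy (in nats) of a probability vector; higher means more uncertain."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def accept(logits, threshold):
    """Accept a prediction automatically only if its uncertainty is at or below the threshold."""
    return predictive_entropy(softmax(logits)) <= threshold


# Hypothetical logits for a 3-class problem (assumed values, not from the paper).
confident = [8.0, 0.5, 0.2]   # one class dominates: low entropy
ambiguous = [1.0, 0.9, 1.1]   # near-uniform: high entropy
threshold = 0.5               # assumed rejection threshold

print(accept(confident, threshold))  # low uncertainty: handled automatically
print(accept(ambiguous, threshold))  # high uncertainty: sent for inspection
```

Under these assumptions, an attack of type (i) would perturb an input so that a confident prediction looks like `ambiguous` and is needlessly rejected, while an attack of type (ii) would make an ambiguous input look like `confident` so that it bypasses inspection.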

Abstract (original)

In learning problems, the noise inherent to the task at hand hinders the possibility to infer without a certain degree of uncertainty. Quantifying this uncertainty, regardless of its wide use, assumes high relevance for security-sensitive applications. Within these scenarios, it becomes fundamental to guarantee good (i.e., trustworthy) uncertainty measures, which downstream modules can securely employ to drive the final decision-making process. However, an attacker may be interested in forcing the system to produce either (i) highly uncertain outputs jeopardizing the system’s availability or (ii) low uncertainty estimates, making the system accept uncertain samples that would instead require a careful inspection (e.g., human intervention). Therefore, it becomes fundamental to understand how to obtain robust uncertainty estimates against these kinds of attacks. In this work, we reveal both empirically and theoretically that defending against adversarial examples, i.e., carefully perturbed samples that cause misclassification, additionally guarantees a more secure, trustworthy uncertainty estimate under common attack scenarios without the need for an ad-hoc defense strategy. To support our claims, we evaluate multiple adversarial-robust models from the publicly available benchmark RobustBench on the CIFAR-10 and ImageNet datasets.

arXiv information

Authors	Emanuele Ledda, Giovanni Scodeller, Daniele Angioni, Giorgio Piras, Antonio Emanuele Cinà, Giorgio Fumera, Battista Biggio, Fabio Roli
Published	2025-05-27 17:41:42+00:00
arXiv page	arxiv_id(pdf)

Sources and services used

arxiv.jp, Google
