Do Counterfactual Examples Complicate Adversarial Training?

Summary

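We leverage diffusion models to study the tradeoff between robustness and performance in robust classifiers.
Our approach introduces a simple, pretrained diffusion method for generating low-norm counterfactual examples (CEs): semantically altered data that belong to a different true class.
We report that the confidence and accuracy of robust models on their clean training data are associated with how close that data lies to its CEs.
Moreover, robust models perform very poorly when evaluated directly on the CEs, because they become increasingly invariant to the low-norm semantic changes that CEs introduce.
These results indicate a significant overlap between non-robust and semantic features, contradicting the common assumption that non-robust features are not interpretable.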

Summary (original)

We leverage diffusion models to study the robustness-performance tradeoff of robust classifiers. Our approach introduces a simple, pretrained diffusion method to generate low-norm counterfactual examples (CEs): semantically altered data which results in different true class membership. We report that the confidence and accuracy of robust models on their clean training data are associated with the proximity of the data to their CEs. Moreover, robust models perform very poorly when evaluated on the CEs directly, as they become increasingly invariant to the low-norm, semantic changes brought by CEs. The results indicate a significant overlap between non-robust and semantic features, countering the common assumption that non-robust features are not interpretable.
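The abstract relates a robust model's confidence on clean data to how close each sample lies to its CE. As a minimal sketch of how such an association could be measured, the code below assumes that proximity is the L2 norm of the difference between a sample and its CE and that the association is summarized with a Pearson correlation. The function names `ce_distances` and `proximity_confidence_correlation` are hypothetical, and neither the distance nor the statistic is confirmed as the authors' choice. The sketch does not generate CEs; the diffusion step is out of scope here.

```python
import numpy as np


def ce_distances(x: np.ndarray, x_ce: np.ndarray) -> np.ndarray:
    """Return the L2 norm of (x_ce - x) for each sample.

    x and x_ce are arrays of shape (N, ...) holding clean samples and their
    counterfactual examples (CEs), paired row by row. Using the L2 norm is an
    assumption; the summary only states that the CEs are "low-norm".
    """
    x = np.asarray(x, dtype=float)
    x_ce = np.asarray(x_ce, dtype=float)
    if x.shape != x_ce.shape:
        raise ValueError("x and x_ce must have the same shape")
    # Flatten everything except the batch axis, then take a per-row norm.
    diff = (x_ce - x).reshape(len(x), -1)
    return np.linalg.norm(diff, axis=1)


def proximity_confidence_correlation(x, x_ce, confidence) -> float:
    """Pearson correlation between CE distance and model confidence.

    `confidence` holds the classifier's confidence on each clean sample
    (a hypothetical input; how it is computed is left to the caller).
    Returns NaN if either input is constant.
    """
    d = ce_distances(x, x_ce)
    c = np.asarray(confidence, dtype=float)
    return float(np.corrcoef(d, c)[0, 1])


if __name__ == "__main__":
    # Synthetic placeholder data, not results from the paper.
    rng = np.random.default_rng(0)
    x = rng.normal(size=(8, 3, 4, 4))
    x_ce = x + rng.normal(scale=0.5, size=x.shape)
    conf = rng.uniform(size=8)
    print(ce_distances(x, x_ce))
    print(proximity_confidence_correlation(x, x_ce, conf))
```

Pearson correlation is only one simple choice here; a rank correlation would be less sensitive to a few samples with unusually distant CEs.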

arXiv information

Authors	Eric Yeats, Cameron Darwin, Eduardo Ortega, Frank Liu, Hai Li
Published	2024-04-16 14:13:44+00:00
arXiv page	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
