FAIR-TAT: Improving Model Fairness Using Targeted Adversarial Training

Summary

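Deep neural networks are vulnerable to adversarial attacks and common corruptions, both of which undermine their robustness.
Adversarial training (AT) has emerged as a prominent way to make models more resilient to these threats.
However, AT often achieves adversarial robustness at the expense of model fairness, that is, it leaves a disparity in the model's class-wise robustness.
Distinctive classes become more robust to such adversaries, while hard-to-detect classes suffer.
Recent work has focused on improving model fairness specifically on perturbed images, overlooking the accuracy on the most likely, non-perturbed data.
Moreover, although state-of-the-art adversarially trained models are robust to the adversaries encountered during training, they struggle to maintain robustness and fairness when confronted with diverse adversarial threats or common corruptions.
This work addresses these concerns by introducing a new approach called Fair Targeted Adversarial Training (FAIR-TAT).
We show that using targeted adversarial attacks for adversarial training, instead of untargeted attacks, can allow more favorable trade-offs with respect to adversarial fairness.
Empirical results validate the effectiveness of our approach.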
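
To make the distinction between untargeted and targeted attacks concrete, the following is a minimal sketch of a projected gradient attack on a toy linear softmax classifier. It is not the FAIR-TAT training procedure, which the abstract does not spell out. The weights `W`, the input `x`, the budget `eps`, the step size `alpha`, and all function names are assumptions chosen purely for illustration. An untargeted attack ascends the loss of the true label, whereas a targeted attack descends the loss of a chosen target label.

```python
import math

def logits(W, b, x):
    # Linear classifier: one score per class.
    return [sum(w * v for w, v in zip(row, x)) + bk for row, bk in zip(W, b)]

def softmax(z):
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]

def loss_grad_x(W, b, x, label):
    # Gradient of cross-entropy(softmax(Wx + b), label) with respect to x.
    p = softmax(logits(W, b, x))
    p[label] -= 1.0
    return [sum(p[k] * W[k][j] for k in range(len(W))) for j in range(len(x))]

def sign(v):
    return (v > 0) - (v < 0)

def pgd(W, b, x, label, eps, alpha, steps, targeted):
    # Untargeted: `label` is the true class and the loss is increased.
    # Targeted: `label` is the desired class and the loss is decreased.
    direction = -1.0 if targeted else 1.0
    x_adv = list(x)
    for _ in range(steps):
        g = loss_grad_x(W, b, x_adv, label)
        x_adv = [xa + direction * alpha * sign(gj) for xa, gj in zip(x_adv, g)]
        # Project back into the L-infinity ball of radius eps around x.
        x_adv = [min(max(xa, x0 - eps), x0 + eps) for xa, x0 in zip(x_adv, x)]
    return x_adv

def predict(W, b, x):
    z = logits(W, b, x)
    return z.index(max(z))

# Hypothetical 3-class model on 2-D inputs (illustrative values only).
W = [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]]
b = [0.0, 0.0, 0.0]
x, y_true, eps = [1.0, 0.2], 0, 1.5

x_untargeted = pgd(W, b, x, y_true, eps, alpha=0.5, steps=5, targeted=False)
x_targeted = pgd(W, b, x, 2, eps, alpha=0.5, steps=5, targeted=True)

print(predict(W, b, x), predict(W, b, x_untargeted), predict(W, b, x_targeted))
```

In this toy setup the untargeted attack pushes the input toward whichever wrong class is easiest to reach, while the targeted attack steers it to the class the attacker selects. In adversarial training, the perturbed inputs would then be fed back into the training loss; how FAIR-TAT chooses target classes is not stated in the abstract and is not modeled here.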

Abstract (original)

Deep neural networks are susceptible to adversarial attacks and common corruptions, which undermine their robustness. In order to enhance model resilience against such challenges, Adversarial Training (AT) has emerged as a prominent solution. Nevertheless, adversarial robustness is often attained at the expense of model fairness during AT, i.e., disparity in class-wise robustness of the model. While distinctive classes become more robust towards such adversaries, hard to detect classes suffer. Recently, research has focused on improving model fairness specifically for perturbed images, overlooking the accuracy of the most likely non-perturbed data. Additionally, despite their robustness against the adversaries encountered during model training, state-of-the-art adversarial trained models have difficulty maintaining robustness and fairness when confronted with diverse adversarial threats or common corruptions. In this work, we address the above concerns by introducing a novel approach called Fair Targeted Adversarial Training (FAIR-TAT). We show that using targeted adversarial attacks for adversarial training (instead of untargeted attacks) can allow for more favorable trade-offs with respect to adversarial fairness. Empirical results validate the efficacy of our approach.
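
The abstract frames fairness as a disparity in class-wise robustness. One simple way to quantify such a disparity, offered here as an assumption rather than as the metric used in the paper, is to compute accuracy per class on clean or perturbed predictions and report the worst class and the gap between the best and worst class. The function names and the label lists below are hypothetical.

```python
from collections import defaultdict

def classwise_accuracy(y_true, y_pred):
    # Accuracy computed separately for every class present in y_true.
    correct = defaultdict(int)
    total = defaultdict(int)
    for t, p in zip(y_true, y_pred):
        total[t] += 1
        correct[t] += int(t == p)
    return {c: correct[c] / total[c] for c in sorted(total)}

def fairness_summary(y_true, y_pred):
    # Worst-class accuracy and best-to-worst gap as a rough disparity measure.
    acc = classwise_accuracy(y_true, y_pred)
    worst = min(acc, key=acc.get)
    best = max(acc, key=acc.get)
    return {
        "per_class": acc,
        "worst_class": worst,
        "worst_accuracy": acc[worst],
        "gap": acc[best] - acc[worst],
    }

# Hypothetical predictions on adversarially perturbed inputs (made-up labels).
y_true = [0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 2, 2]
y_pred = [0, 0, 0, 0, 1, 1, 1, 0, 2, 0, 1, 0]

summary = fairness_summary(y_true, y_pred)
print(summary)
```

Running the same summary on clean inputs and on inputs perturbed by different attacks would show whether a gain in average robustness hides a drop for the hardest class.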

arXiv information

Authors	Tejaswini Medi, Steffen Jung, Margret Keuper
Published	2024-10-30 15:58:03+00:00
arXiv site	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
