Understanding Unfairness via Training Concept Influence

Summary

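Knowing the causes of a model's unfairness helps practitioners understand their data and algorithms more deeply.
This is an important but relatively unexplored task.
We investigate this problem through the lens of the training data, one of the major sources of unfairness.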
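We ask the following question: how would a model's fairness performance change if, in its training data, some samples (1) had been collected from a different (e.g. demographic) group, (2) had been labeled differently, or (3) had some of their features changed?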
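The three interventions in this question correspond to the concepts A (group), Y (label) and X (features). As a minimal sketch, assuming binary labels, a binary sensitive attribute and numeric feature vectors, a counterfactual copy of a single training sample could be produced as follows. The helper `counterfactual` and its parameters are hypothetical, not the authors' code. In practice, changing A would usually also require adjusting features correlated with group membership; this sketch only flips the attribute itself.

```python
import numpy as np

def counterfactual(x, y, a, concept, feature_idx=None, new_value=None):
    """Return a counterfactual copy (x', y', a') of one training sample.

    Hypothetical helper. Assumes y in {0, 1}, a binary sensitive
    attribute a in {0, 1}, and a numeric feature vector x.
    """
    x_cf = np.array(x, dtype=float, copy=True)  # never mutate the original sample
    y_cf, a_cf = y, a
    if concept == "A":
        a_cf = 1 - a  # (1) as if collected from the other group (attribute flip only)
    elif concept == "Y":
        y_cf = 1 - y  # (2) as if labeled differently
    elif concept == "X":
        if feature_idx is None or new_value is None:
            raise ValueError("concept 'X' needs feature_idx and new_value")
        x_cf[feature_idx] = new_value  # (3) as if one feature were different
    else:
        raise ValueError(f"unknown concept: {concept!r}")
    return x_cf, y_cf, a_cf
```

For example, `counterfactual([1.0, 2.0], 1, 0, "Y")` keeps the features and group unchanged and flips the label to 0.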
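In other words, we quantify the fairness influence of training samples by counterfactually intervening on and changing samples based on predefined concepts, i.e. data attributes such as features (X), labels (Y), or sensitive attributes (A).
To compute a training sample's influence on the model's unfairness with respect to a concept, we first generate counterfactual samples based on that concept, i.e. the counterfactual versions of the sample had the concept been changed.
We then use influence functions to compute the resulting effect on unfairness if the counterfactual samples were used in training.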
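The influence-function step can be illustrated on a small model. The sketch below assumes an L2-regularized logistic regression and uses the signed demographic-parity gap on a validation set as the unfairness measure; both choices, and all function names, are illustrative assumptions rather than the paper's exact setup. It estimates, without retraining, how the gap would change if training sample i were replaced by its counterfactual (x', y') and the model refit.

```python
import numpy as np

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

def fit_logreg(X, y, lam=1e-2, iters=50):
    """Fit mean log-loss + (lam / 2) * ||theta||^2 with Newton's method."""
    n, d = X.shape
    theta = np.zeros(d)
    for _ in range(iters):
        p = sigmoid(X @ theta)
        grad = X.T @ (p - y) / n + lam * theta
        H = (X * (p * (1 - p))[:, None]).T @ X / n + lam * np.eye(d)
        theta -= np.linalg.solve(H, grad)
    return theta

def loss_grad(theta, x, y):
    """Gradient of the per-sample log-loss with respect to theta."""
    return (sigmoid(x @ theta) - y) * x

def dp_gap(theta, X_val, a_val):
    """Signed demographic-parity gap on predicted probabilities (assumed metric)."""
    p = sigmoid(X_val @ theta)
    return p[a_val == 1].mean() - p[a_val == 0].mean()

def dp_gap_grad(theta, X_val, a_val):
    """Gradient of dp_gap with respect to theta."""
    p = sigmoid(X_val @ theta)
    g = (p * (1 - p))[:, None] * X_val
    return g[a_val == 1].mean(axis=0) - g[a_val == 0].mean(axis=0)

def concept_influence(theta, X, y, lam, i, x_cf, y_cf, X_val, a_val):
    """First-order estimate of dp_gap(refit) - dp_gap(theta) when training
    sample i is replaced by the counterfactual (x_cf, y_cf)."""
    n, d = X.shape
    p = sigmoid(X @ theta)
    H = (X * (p * (1 - p))[:, None]).T @ X / n + lam * np.eye(d)  # Hessian of training objective
    delta_g = loss_grad(theta, x_cf, y_cf) - loss_grad(theta, X[i], y[i])
    d_theta = -np.linalg.solve(H, delta_g) / n  # approximate parameter shift
    return dp_gap_grad(theta, X_val, a_val) @ d_theta
```

Comparing the estimate with an actual refit (change one label, call `fit_logreg` again) is a cheap sanity check. Note that in this sketch the sensitive attribute is not a model input, so flipping A alone leaves the loss gradient, and hence the estimated influence, unchanged; the intervention on A must also alter x to have an effect.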
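Beyond helping practitioners understand observed unfairness and repair their training data, our framework enables many other applications, such as detecting mislabeled samples, correcting imbalanced representations, and detecting fairness-targeted poisoning attacks.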

Summary (original)

Knowing the causes of a model’s unfairness helps practitioners better understand their data and algorithms. This is an important yet relatively unexplored task. We look into this problem through the lens of the training data – one of the major sources of unfairness. We ask the following questions: how would a model’s fairness performance change if, in its training data, some samples (1) were collected from a different (e.g. demographic) group, (2) were labeled differently, or (3) some features were changed? In other words, we quantify the fairness influence of training samples by counterfactually intervening and changing samples based on predefined concepts, i.e. data attributes such as features (X), labels (Y), or sensitive attributes (A). To calculate a training sample’s influence on the model’s unfairness w.r.t a concept, we first generate counterfactual samples based on the concept, i.e. the counterfactual versions of the sample if the concept were changed. We then calculate the resulting impact on the unfairness, via influence function, if the counterfactual samples were used in training. Our framework not only helps practitioners understand the observed unfairness and repair their training data, but also leads to many other applications, e.g. detecting mislabeling, fixing imbalanced representations, and detecting fairness-targeted poisoning attacks.
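For reference, the standard first-order influence-function approximation that this kind of computation builds on can be written as below, where z_i' is the counterfactual of training sample z_i under concept C and f is an unfairness measure such as a demographic-parity gap. The regularized objective and the notation (λ, H, I_C) are assumptions introduced for illustration, not taken from the paper.

```latex
\begin{aligned}
\hat\theta &= \arg\min_{\theta}\ \frac{1}{n}\sum_{j=1}^{n}\ell(z_j;\theta) + \frac{\lambda}{2}\lVert\theta\rVert_2^2,
\qquad
H_{\hat\theta} = \frac{1}{n}\sum_{j=1}^{n}\nabla_\theta^2\,\ell(z_j;\hat\theta) + \lambda I \\
\hat\theta_{z_i \to z_i'} - \hat\theta &\approx -\frac{1}{n}\,H_{\hat\theta}^{-1}\big(\nabla_\theta\ell(z_i';\hat\theta) - \nabla_\theta\ell(z_i;\hat\theta)\big) \\
\mathcal{I}_C(z_i) = f(\hat\theta_{z_i \to z_i'}) - f(\hat\theta) &\approx -\frac{1}{n}\,\nabla_\theta f(\hat\theta)^{\top} H_{\hat\theta}^{-1}\big(\nabla_\theta\ell(z_i';\hat\theta) - \nabla_\theta\ell(z_i;\hat\theta)\big)
\end{aligned}
```

The last line follows from the second by a first-order Taylor expansion of f around the fitted parameters.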

arXiv information

Authors	Yuanshun Yao, Yang Liu
Published	2023-06-30 17:48:19+00:00
arXiv page	arxiv_id(pdf)

Source, services used

arxiv.jp, Google
