Are fairness metric scores enough to assess discrimination biases in machine learning?

Summary

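This paper presents new experiments that expose the shortcomings of current metrics for assessing gender discrimination bias in machine learning algorithms applied to textual data.
We focus on the Bios dataset, where the learning task is to predict a person's occupation from their biography.
Such prediction tasks are common in commercial natural language processing (NLP) applications such as automatic job recommendation.
We address an important limitation of theoretical discussions of group-wise fairness metrics: they focus on large datasets, although the norm in many industrial NLP applications is to use small to reasonably large linguistic datasets, for which the main practical constraint is achieving good prediction accuracy.
We then ask how reliable various popular measures of bias are when the training set is only just large enough to learn reasonably accurate predictions.
In our experiments, we sample the Bios dataset and train more than 200 models on different sample sizes.
This lets us study the results statistically and confirm that common gender bias indices give diverging and sometimes unreliable results when applied to relatively small training and test samples.
This highlights the crucial importance of variance calculations for providing sound results in this field.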
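
The point about variance can be illustrated with a small simulation. The sketch below is not the authors' pipeline: it uses no Bios data and trains no model. It assumes a hypothetical classifier whose true-positive rate is 0.80 for women and 0.85 for men (made-up values), draws repeated test samples of a given size, and measures how much the estimated TPR gap moves from one sample to the next. The function names, sample sizes, and rates are all illustrative assumptions.

```python
import random
import statistics


def tpr_gap(sample):
    """TPR gap (female minus male) over (gender, predicted_correctly) pairs.

    Every item is assumed to be a true member of the target occupation,
    so the share of correct predictions per gender is that gender's TPR.
    """
    rates = {}
    for g in ("F", "M"):
        hits = [correct for gender, correct in sample if gender == g]
        rates[g] = sum(hits) / len(hits) if hits else float("nan")
    return rates["F"] - rates["M"]


def simulate_gaps(n, n_repeats=200, tpr_f=0.80, tpr_m=0.85, seed=0):
    """Estimate the TPR gap on n_repeats independent test samples of size n.

    tpr_f and tpr_m are hypothetical ground-truth rates, not values
    taken from the paper.
    """
    rng = random.Random(seed)
    gaps = []
    for _ in range(n_repeats):
        sample = []
        for _ in range(n):
            gender = "F" if rng.random() < 0.5 else "M"
            p = tpr_f if gender == "F" else tpr_m
            sample.append((gender, rng.random() < p))
        gaps.append(tpr_gap(sample))
    return gaps


def spread_by_size(sizes=(50, 500, 5000)):
    """Standard deviation of the estimated gap for each test-sample size."""
    return {n: statistics.stdev(simulate_gaps(n)) for n in sizes}


if __name__ == "__main__":
    for n, sd in spread_by_size().items():
        print(f"n={n:>5}: std of estimated TPR gap = {sd:.4f}")
```

Under these assumptions the spread of the estimate shrinks as the sample grows, and at the smallest size it can be comparable to or larger than the 0.05 gap being measured, so a single score on a small sample says little without an accompanying variance estimate.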

Summary (original)

This paper presents novel experiments shedding light on the shortcomings of current metrics for assessing biases of gender discrimination made by machine learning algorithms on textual data. We focus on the Bios dataset, and our learning task is to predict the occupation of individuals, based on their biography. Such prediction tasks are common in commercial Natural Language Processing (NLP) applications such as automatic job recommendations. We address an important limitation of theoretical discussions dealing with group-wise fairness metrics: they focus on large datasets, although the norm in many industrial NLP applications is to use small to reasonably large linguistic datasets for which the main practical constraint is to get a good prediction accuracy. We then question how reliable are different popular measures of bias when the size of the training set is simply sufficient to learn reasonably accurate predictions. Our experiments sample the Bios dataset and learn more than 200 models on different sample sizes. This allows us to statistically study our results and to confirm that common gender bias indices provide diverging and sometimes unreliable results when applied to relatively small training and test samples. This highlights the crucial importance of variance calculations for providing sound results in this field.

arXiv information

Authors	Fanny Jourdan, Laurent Risser, Jean-Michel Loubes, Nicholas Asher
Published	2023-06-08 15:56:57+00:00
arXiv site	arxiv_id(pdf)

Sources and services used

arxiv.jp, Google
