Social Bias Probing: Fairness Benchmarking for Language Models

Summary

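While the impact of social biases in language models is well recognized, previous bias evaluation methods have been limited to binary association tests on small datasets, which has constrained our understanding of how complex these biases are.
This paper proposes a new framework for probing language models for social biases by assessing disparate treatment, that is, treating individuals differently depending on whether they belong to a sensitive demographic group.
We curate SoFa, a large-scale benchmark designed to address the limitations of existing fairness collections.
SoFa extends the analysis beyond the binary comparison of stereotypical and anti-stereotypical identities to cover a diverse range of identities and stereotypes.
Comparing our methodology with existing benchmarks shows that biases within language models are more nuanced than acknowledged and that the scope of encoded biases is broader than previously recognized.
Benchmarking LMs on SoFa reveals that identities expressing different religions lead to the most pronounced disparate treatment across all models.
Finally, our findings indicate that the real-life adversities faced by various groups, such as women and people with disabilities, are mirrored in the behavior of these models.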

Abstract (original)

While the impact of social biases in language models has been recognized, prior methods for bias evaluation have been limited to binary association tests on small datasets, limiting our understanding of bias complexities. This paper proposes a novel framework for probing language models for social biases by assessing disparate treatment, which involves treating individuals differently according to their affiliation with a sensitive demographic group. We curate SoFa, a large-scale benchmark designed to address the limitations of existing fairness collections. SoFa expands the analysis beyond the binary comparison of stereotypical versus anti-stereotypical identities to include a diverse range of identities and stereotypes. Comparing our methodology with existing benchmarks, we reveal that biases within language models are more nuanced than acknowledged, indicating a broader scope of encoded biases than previously recognized. Benchmarking LMs on SoFa, we expose how identities expressing different religions lead to the most pronounced disparate treatments across all models. Finally, our findings indicate that real-life adversities faced by various groups such as women and people with disabilities are mirrored in the behavior of these models.

arXiv information

Authors	Marta Marchiori Manerba, Karolina Stańczak, Riccardo Guidotti, Isabelle Augenstein
Published	2024-10-07 16:01:06+00:00
arXiv site	arxiv_id(pdf)

Source, services used

arxiv.jp, Google
