Bias A-head? Analyzing Bias in Transformer-Based Language Model Attention Heads

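Summary

Transformer-based pretrained large language models (PLMs) such as BERT and GPT have achieved remarkable success on NLP tasks.
However, PLMs tend to encode stereotypes.
Although the literature on mitigating stereotypes in PLMs is growing rapidly, including work on reducing gender and racial stereotypes, how such biases manifest and operate inside PLMs remains largely unknown.
Understanding the internal stereotyping mechanisms could enable better assessment of model fairness and help in developing effective mitigation strategies.
This work focuses on attention heads, a major component of the Transformer architecture, and proposes a bias analysis framework to explore and identify a small set of biased heads found to contribute to a PLM's stereotypical bias.
We conduct extensive experiments to verify that these biased heads exist and to better understand how they behave.
We investigate gender and racial bias in English in two types of Transformer-based PLMs: the encoder-based BERT model and the decoder-based autoregressive GPT model.
Overall, the results shed light on bias behavior in pretrained language models.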

Summary (original)

Transformer-based pretrained large language models (PLM) such as BERT and GPT have achieved remarkable success in NLP tasks. However, PLMs are prone to encoding stereotypical biases. Although a burgeoning literature has emerged on stereotypical bias mitigation in PLMs, such as work on debiasing gender and racial stereotyping, how such biases manifest and behave internally within PLMs remains largely unknown. Understanding the internal stereotyping mechanisms may allow better assessment of model fairness and guide the development of effective mitigation strategies. In this work, we focus on attention heads, a major component of the Transformer architecture, and propose a bias analysis framework to explore and identify a small set of biased heads that are found to contribute to a PLM’s stereotypical bias. We conduct extensive experiments to validate the existence of these biased heads and to better understand how they behave. We investigate gender and racial bias in the English language in two types of Transformer-based PLMs: the encoder-based BERT model and the decoder-based autoregressive GPT model. Overall, the results shed light on understanding the bias behavior in pretrained language models.

arXiv information

Authors	Yi Yang, Hanyu Duan, Ahmed Abbasi, John P. Lalor, Kar Yan Tam
Published	2023-11-17 08:56:13+00:00
arXiv site	arxiv_id(pdf)

Sources and services used

arxiv.jp, Google
