Probing Explicit and Implicit Gender Bias through LLM Conditional Text Generation

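Abstract

Large language models (LLMs) can generate biased and toxic responses.
However, most prior work on evaluating gender bias in LLMs requires predefined gender-related phrases or gender stereotypes, which are difficult to collect comprehensively, and is limited to evaluating explicit bias.
Moreover, we believe that inputs containing no gender-related language or explicit stereotypes can still induce gender bias in LLMs.
In this work, we therefore propose a conditional text generation mechanism that requires no predefined gender phrases or stereotypes.
The approach probes LLMs with three types of inputs, generated through three distinct strategies, with the aim of showing evidence of explicit and implicit gender bias in LLMs.
We also use explicit and implicit evaluation metrics to assess gender bias in LLMs under the different strategies.
Our experiments show that increasing model size does not consistently improve fairness, and that every LLM tested exhibits explicit and/or implicit gender bias even when the inputs contain no explicit gender stereotypes.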

Abstract (original)

Large Language Models (LLMs) can generate biased and toxic responses. Yet most prior work on LLM gender bias evaluation requires predefined gender-related phrases or gender stereotypes, which are challenging to be comprehensively collected and are limited to explicit bias evaluation. In addition, we believe that instances devoid of gender-related language or explicit stereotypes in inputs can still induce gender bias in LLMs. Thus, in this work, we propose a conditional text generation mechanism without the need for predefined gender phrases and stereotypes. This approach employs three types of inputs generated through three distinct strategies to probe LLMs, aiming to show evidence of explicit and implicit gender biases in LLMs. We also utilize explicit and implicit evaluation metrics to evaluate gender bias in LLMs under different strategies. Our experiments demonstrate that an increased model size does not consistently lead to enhanced fairness and all tested LLMs exhibit explicit and/or implicit gender bias, even when explicit gender stereotypes are absent in the inputs.

arXiv information

Authors	Xiangjue Dong, Yibo Wang, Philip S. Yu, James Caverlee
Published	2023-11-01 05:31:46+00:00
arXiv site	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
