First-Person Fairness in Chatbots

Summary

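Given the rapid spread of chatbots, evaluating their fairness is critically important, yet typical chatbot tasks (resume writing, entertainment, and so on) differ from the institutional decision-making tasks (such as resume screening) that have traditionally been central to discussions of algorithmic fairness. The open-ended nature and diverse use cases of chatbots call for new methods of bias evaluation. This paper addresses these challenges by introducing a scalable counterfactual approach for evaluating "first-person fairness," that is, fairness toward chatbot users based on their demographic characteristics. Our method uses a Language Model as a Research Assistant (LMRA) to obtain quantitative measurements of harmful stereotypes and qualitative analyses of demographic differences in chatbot responses. We apply this approach to evaluate bias in six of our language models over millions of interactions, covering 66 tasks in 9 domains and spanning two genders and four races. Independent human annotations corroborate the bias evaluations generated by the LMRA. This study is the first large-scale fairness evaluation based on real-world chat data. We highlight that post-training reinforcement learning techniques substantially mitigate these biases. The evaluation offers a practical methodology for ongoing bias monitoring and mitigation.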

Abstract (original)

Evaluating chatbot fairness is crucial given their rapid proliferation, yet typical chatbot tasks (e.g., resume writing, entertainment) diverge from the institutional decision-making tasks (e.g., resume screening) which have traditionally been central to discussion of algorithmic fairness. The open-ended nature and diverse use-cases of chatbots necessitate novel methods for bias assessment. This paper addresses these challenges by introducing a scalable counterfactual approach to evaluate ‘first-person fairness,’ meaning fairness toward chatbot users based on demographic characteristics. Our method employs a Language Model as a Research Assistant (LMRA) to yield quantitative measures of harmful stereotypes and qualitative analyses of demographic differences in chatbot responses. We apply this approach to assess biases in six of our language models across millions of interactions, covering sixty-six tasks in nine domains and spanning two genders and four races. Independent human annotations corroborate the LMRA-generated bias evaluations. This study represents the first large-scale fairness evaluation based on real-world chat data. We highlight that post-training reinforcement learning techniques significantly mitigate these biases. This evaluation provides a practical methodology for ongoing bias monitoring and mitigation.
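As a rough illustration of the counterfactual idea described above, the sketch below compares pairs of responses to the same prompt that differ only in the demographic group associated with the user, and reports the fraction of pairs a judge flags, per task. This is a minimal sketch under stated assumptions, not the authors' implementation: `CounterfactualPair`, `stereotype_rate_by_task`, and `toy_judge` are hypothetical names, the example data is made up, and the keyword-based judge is only a stand-in for where a language-model judge such as the LMRA would be called.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, NamedTuple


class CounterfactualPair(NamedTuple):
    """Two responses to one prompt, differing only in the user's group (hypothetical type)."""
    task: str        # illustrative task label, e.g. "career advice"
    response_a: str  # response when the user is associated with group A
    response_b: str  # response to the same prompt for group B


def stereotype_rate_by_task(
    pairs: Iterable[CounterfactualPair],
    judge: Callable[[str, str], bool],
) -> Dict[str, float]:
    """Return, per task, the fraction of pairs that the judge flags."""
    flagged = defaultdict(int)
    total = defaultdict(int)
    for pair in pairs:
        total[pair.task] += 1
        if judge(pair.response_a, pair.response_b):
            flagged[pair.task] += 1
    return {task: flagged[task] / total[task] for task in total}


def toy_judge(response_a: str, response_b: str) -> bool:
    # Placeholder only (assumption): a real evaluation would query a
    # language model here. This toy version flags a pair when exactly
    # one of the two responses mentions the word "nurse".
    return ("nurse" in response_a.lower()) != ("nurse" in response_b.lower())


# Made-up example data, for illustration only.
pairs = [
    CounterfactualPair("career advice", "Consider engineering.", "Consider becoming a nurse."),
    CounterfactualPair("career advice", "Consider engineering.", "Consider engineering."),
    CounterfactualPair("entertainment", "Here is a joke.", "Here is a joke."),
]
rates = stereotype_rate_by_task(pairs, toy_judge)
print(rates)
```

Passing the judge in as a callable keeps the aggregation independent of how pairs are rated, so the toy judge can be swapped for a model-backed one without changing the rest of the sketch.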

arXiv information

Authors	Tyna Eloundou, Alex Beutel, David G. Robinson, Keren Gu-Lemberg, Anna-Luisa Brakman, Pamela Mishkin, Meghan Shah, Johannes Heidecke, Lilian Weng, Adam Tauman Kalai
Published	2025-03-03 15:13:10+00:00
arXiv site	arxiv_id(pdf)

Source and services used

arxiv.jp, DeepL
