Evaluating Gender Bias Transfer between Pre-trained and Prompt-Adapted Language Models

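Summary

Large language models (LLMs) are increasingly adapted to achieve task specificity for deployment in real-world decision systems.
Several previous works have investigated the bias transfer hypothesis (BTH) by studying how the fine-tuning adaptation strategy affects model fairness, and found that fairness in pre-trained masked language models has limited effect on the fairness of those models once they are adapted using fine-tuning.
Because prompting is an accessible and compute-efficient way to deploy models in real-world systems, this work extends the study of BTH to causal models under prompt adaptation.
In contrast to previous works, we establish that intrinsic biases in pre-trained Mistral, Falcon, and Llama models are strongly correlated (rho >= 0.94) with biases when the same models are zero- and few-shot prompted, using a pronoun co-reference resolution task.
Further, we find that bias transfer remains strongly correlated even when LLMs are specifically prompted to exhibit fair or biased behavior (rho >= 0.92), and when few-shot length and stereotypical composition are varied (rho >= 0.97).
Our findings highlight the importance of ensuring fairness in pre-trained LLMs, especially when they are later used to perform downstream tasks via prompt adaptation.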

Abstract (original)

Large language models (LLMs) are increasingly being adapted to achieve task-specificity for deployment in real-world decision systems. Several previous works have investigated the bias transfer hypothesis (BTH) by studying the effect of the fine-tuning adaptation strategy on model fairness to find that fairness in pre-trained masked language models has limited effect on the fairness of models when adapted using fine-tuning. In this work, we expand the study of BTH to causal models under prompt adaptations, as prompting is an accessible and compute-efficient way to deploy models in real-world systems. In contrast to previous works, we establish that intrinsic biases in pre-trained Mistral, Falcon and Llama models are strongly correlated (rho >= 0.94) with biases when the same models are zero- and few-shot prompted, using a pronoun co-reference resolution task. Further, we find that bias transfer remains strongly correlated even when LLMs are specifically prompted to exhibit fair or biased behavior (rho >= 0.92), and few-shot length and stereotypical composition are varied (rho >= 0.97). Our findings highlight the importance of ensuring fairness in pre-trained LLMs, especially when they are later used to perform downstream tasks via prompt adaptation.
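
The correlation values quoted above (rho) compare a bias measure on the pre-trained model with the same measure after prompt adaptation. The abstract does not say which correlation coefficient is used or how bias is scored, so the following is only a minimal sketch that assumes a Pearson coefficient; the function name `pearson` and the score lists are hypothetical placeholders, not values or code from the paper.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of products of deviations (unnormalized covariance).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Sums of squared deviations (unnormalized variances).
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical placeholder scores, one entry per evaluation item.
# These numbers are illustrative only and do not come from the paper.
intrinsic_bias = [0.10, 0.35, 0.50, 0.80]
prompted_bias = [0.12, 0.30, 0.55, 0.78]

rho = pearson(intrinsic_bias, prompted_bias)
print(f"rho = {rho:.3f}")
```

A value of rho close to 1 would indicate that items on which the pre-trained model is more biased are also the items on which the prompted model is more biased, which is the sense of "bias transfer" in the abstract. If a rank-based coefficient were intended instead, the same comparison could be made on the ranks of the scores.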

arXiv information

Authors	Natalie Mackraz, Nivedha Sivakumar, Samira Khorshidi, Krishna Patel, Barry-John Theobald, Luca Zappella, Nicholas Apostoloff
Published	2024-12-04 18:32:42+00:00
arXiv page	arxiv_id(pdf)

Source, services used

arxiv.jp, Google
