Evaluating Large Language Model Biases in Persona-Steered Generation

Abstract

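Persona-steered text generation requires a large language model (LLM) to produce text that reflects the distribution of views an individual fitting the persona could hold.
People have multifaceted personas, but previous work on bias in LLM-generated opinions has examined only multiple-choice settings or one-dimensional personas.
We define an incongruous persona as one with multiple traits where one trait makes the others less likely in human survey data, for example a political liberal who supports increased military spending.
We find that LLMs are 9.7% less steerable toward incongruous personas than toward congruous ones, and that they sometimes generate the stereotypical stance associated with the persona's demographic rather than the target stance.
The models we evaluate that were fine-tuned with Reinforcement Learning from Human Feedback (RLHF) are more steerable, especially toward stances associated with political liberals and women, but the views they present for a persona are significantly less diverse.
We also find variance in LLM steerability that cannot be predicted from multiple-choice opinion evaluations.
Our results show the importance of evaluating models on open-ended text generation, since it can surface new LLM opinion biases.
Moreover, such a setting can shed light on our ability to steer models toward a richer and more diverse range of viewpoints.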

Abstract (original)

The task of persona-steered text generation requires large language models (LLMs) to generate text that reflects the distribution of views that an individual fitting a persona could have. People have multifaceted personas, but prior work on bias in LLM-generated opinions has only explored multiple-choice settings or one-dimensional personas. We define an incongruous persona as a persona with multiple traits where one trait makes its other traits less likely in human survey data, e.g. political liberals who support increased military spending. We find that LLMs are 9.7% less steerable towards incongruous personas than congruous ones, sometimes generating the stereotypical stance associated with its demographic rather than the target stance. Models that we evaluate that are fine-tuned with Reinforcement Learning from Human Feedback (RLHF) are more steerable, especially towards stances associated with political liberals and women, but present significantly less diverse views of personas. We also find variance in LLM steerability that cannot be predicted from multiple-choice opinion evaluation. Our results show the importance of evaluating models in open-ended text generation, as it can surface new LLM opinion biases. Moreover, such a setup can shed light on our ability to steer models toward a richer and more diverse range of viewpoints.

arXiv information

Authors	Andy Liu, Mona Diab, Daniel Fried
Published	2024-05-30 17:06:03+00:00
arXiv page	arxiv_id(pdf)

Source, services used

arxiv.jp, Google
