Measuring Spiritual Values and Bias of Large Language Models

Summary

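Large language models (LLMs) have become essential tools for users from a wide range of backgrounds.
Trained on vast corpora, LLMs reflect the linguistic and cultural nuances embedded in their pre-training data.
However, the values and perspectives inherent in that data can influence LLM behavior and lead to potential biases.
As a result, using LLMs in contexts that involve spiritual or moral values requires careful consideration of these underlying biases.
Our work begins by testing the spiritual values of popular LLMs to verify this hypothesis.
The experimental results show that the spiritual values of LLMs are quite diverse, contrary to the stereotype that LLMs are atheist or secularist.
We then investigate how different spiritual values affect LLMs in social-fairness scenarios (e.g., hate speech identification).
Our findings reveal that different spiritual values do lead to different levels of sensitivity toward different hate target groups.
Furthermore, we propose continued pre-training of LLMs on spiritual texts, and empirical results show that this approach is effective in mitigating spiritual bias.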

Summary (original)

Large language models (LLMs) have become integral tools for users from various backgrounds. LLMs, trained on vast corpora, reflect the linguistic and cultural nuances embedded in their pre-training data. However, the values and perspectives inherent in this data can influence the behavior of LLMs, leading to potential biases. As a result, the use of LLMs in contexts involving spiritual or moral values necessitates careful consideration of these underlying biases. Our work starts with verification of our hypothesis by testing the spiritual values of popular LLMs. Experimental results show that LLMs’ spiritual values are quite diverse, as opposed to the stereotype of atheists or secularists. We then investigate how different spiritual values affect LLMs in social-fairness scenarios (e.g., hate speech identification). Our findings reveal that different spiritual values indeed lead to different sensitivity to different hate target groups. Furthermore, we propose to continue pre-training LLMs on spiritual texts, and empirical results demonstrate the effectiveness of this approach in mitigating spiritual bias.

arXiv information

Authors	Songyuan Liu, Ziyang Zhang, Runze Yan, Wei Wu, Carl Yang, Jiaying Lu
Published	2024-10-15 14:33:23+00:00
arXiv site	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
