Profiling Bias in LLMs: Stereotype Dimensions in Contextual Word Embeddings

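Abstract

Large language models (LLMs) underpin the current successes of artificial intelligence (AI), but they are unavoidably biased.
To communicate the risks effectively and encourage mitigation efforts, these models need adequate and intuitive descriptions of their discriminatory properties, suitable for every audience of AI.
We propose bias profiles with respect to stereotype dimensions, based on dictionaries from social psychology research.
Along these dimensions, we investigate gender bias in contextual embeddings across contexts and layers, generate stereotype profiles for twelve different LLMs, and demonstrate the intuition behind the profiles and their use for exposing and visualizing bias.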
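
The abstract does not spell out how a profile is computed, so the following is only a minimal sketch of one plausible reading, not the authors' method. It assumes that each stereotype dimension is a direction between the mean embeddings of two dictionary poles, and that the bias on a dimension is the gap in mean cosine projection between two groups of terms. The dimension names (`warmth`, `competence`), the function names, and the toy 3-d vectors are all hypothetical stand-ins for real dictionaries and contextual embeddings.

```python
import numpy as np

def unit(v):
    """Scale a vector to unit length."""
    return v / np.linalg.norm(v)

def dimension_direction(high_vecs, low_vecs):
    """Direction pointing from the low pole to the high pole of one dimension."""
    return unit(np.mean(high_vecs, axis=0) - np.mean(low_vecs, axis=0))

def bias_profile(group_a, group_b, dimensions):
    """Per-dimension gap in mean cosine projection between two groups."""
    profile = {}
    for name, (high_vecs, low_vecs) in dimensions.items():
        direction = dimension_direction(high_vecs, low_vecs)
        proj_a = np.mean([unit(v) @ direction for v in group_a])
        proj_b = np.mean([unit(v) @ direction for v in group_b])
        profile[name] = float(proj_a - proj_b)
    return profile

# Toy 3-d vectors standing in for contextual embeddings (made-up values).
# Each dimension maps to (high-pole vectors, low-pole vectors).
dimensions = {
    "warmth": (np.array([[1.0, 0.0, 0.0]]), np.array([[-1.0, 0.0, 0.0]])),
    "competence": (np.array([[0.0, 1.0, 0.0]]), np.array([[0.0, -1.0, 0.0]])),
}
female_terms = np.array([[0.8, 0.1, 0.2], [0.9, 0.0, 0.1]])
male_terms = np.array([[0.1, 0.9, 0.2], [0.0, 0.8, 0.1]])

profile = bias_profile(female_terms, male_terms, dimensions)
for name, gap in profile.items():
    print(f"{name}: {gap:+.3f}")
```

A positive value means the first group projects more strongly onto the high pole of that dimension than the second group. In a real setting the vectors would come from a chosen layer of an LLM for words in context, and the same computation could be repeated per layer and per context to compare profiles.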

Abstract (original)

Large language models (LLMs) are the foundation of the current successes of artificial intelligence (AI), however, they are unavoidably biased. To effectively communicate the risks and encourage mitigation efforts these models need adequate and intuitive descriptions of their discriminatory properties, appropriate for all audiences of AI. We suggest bias profiles with respect to stereotype dimensions based on dictionaries from social psychology research. Along these dimensions we investigate gender bias in contextual embeddings, across contexts and layers, and generate stereotype profiles for twelve different LLMs, demonstrating their intuition and use case for exposing and visualizing bias.

arXiv information

Authors	Carolin M. Schuster, Maria-Alexandra Dinisor, Shashwat Ghatiwala, Georg Groh
Published	2024-11-25 16:14:45+00:00
arXiv page	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
