Towards detecting unanticipated bias in Large Language Models

Abstract

Over the last year, Large Language Models (LLMs) like ChatGPT have become widely available and have exhibited fairness issues similar to those in previous machine learning systems. Current research is primarily focused on analyzing and quantifying these biases in training data and their impact on the decisions of these models, alongside developing mitigation strategies. This research largely targets well-known biases related to gender, race, ethnicity, and language. However, it is clear that LLMs are also affected by other, less obvious implicit biases. The complex and often opaque nature of these models makes detecting such biases challenging, yet this is crucial due to their potential negative impact in various applications. In this paper, we explore new avenues for detecting these unanticipated biases in LLMs, focusing specifically on Uncertainty Quantification and Explainable AI methods. These approaches aim to assess the certainty of model decisions and to make the internal decision-making processes of LLMs more transparent, thereby identifying and understanding biases that are not immediately apparent. Through this research, we aim to contribute to the development of fairer and more transparent AI systems.

arXiv information

Author	Anna Kruspe
Published	2024-04-03 11:25:20+00:00
arXiv page	arxiv_id(pdf)

Source and services used

arxiv.jp, DeepL