OpinionGPT: Modelling Explicit Biases in Instruction-Tuned LLMs

Summary

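Instruction-tuned large language models (LLMs) have recently demonstrated a remarkable ability to generate responses that fit natural language instructions.
However, an open research question concerns the biases inherent in trained models and their responses.
For example, if the data used to tune an LLM is written predominantly by people with a particular political bias, the generated answers can be expected to share that bias.
Current research seeks to de-bias such models or to suppress potentially biased answers.
In this demonstration, we take a different view of bias in instruction tuning: rather than suppressing biases, we aim to make them explicit and transparent.
To this end, we present OpinionGPT, a web demo in which users can ask questions and select all the biases they wish to investigate.
The demo answers each question with a model fine-tuned on text representing each of the selected biases, so that the answers can be compared side by side.
To train the underlying model, we identified 11 different biases (political, geographic, gender, and age) and derived an instruction-tuning corpus in which each answer was written by a member of one of these demographics.
This paper presents OpinionGPT, describes how we trained the bias-aware model, and showcases the web application (available at https://opiniongpt.informatik.hu-berlin.de).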
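The abstract names two mechanisms without specifying how they are implemented: partitioning an instruction-tuning corpus by the demographic of each answer's author, and answering one question once per selected bias for side-by-side comparison. The minimal Python sketch below illustrates both under stated assumptions. It is not the authors' training or serving code. The names `Example`, `build_bias_corpora`, `answer_side_by_side`, the `generate` callback, and the labels "liberal" and "conservative" are hypothetical, and the model call is replaced by a stub.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List, Tuple


@dataclass(frozen=True)
class Example:
    instruction: str
    answer: str
    author_group: str  # hypothetical demographic label of the answer's author


def build_bias_corpora(
    examples: Iterable[Example], bias_labels: Iterable[str]
) -> Dict[str, List[Tuple[str, str]]]:
    """Split instruction/answer pairs into one corpus per bias label.

    Examples whose author group is not among bias_labels are dropped.
    """
    allowed = set(bias_labels)
    corpora: Dict[str, List[Tuple[str, str]]] = defaultdict(list)
    for ex in examples:
        if ex.author_group in allowed:
            corpora[ex.author_group].append((ex.instruction, ex.answer))
    return dict(corpora)


def answer_side_by_side(
    question: str,
    selected_biases: Iterable[str],
    generate: Callable[[str, str], str],
) -> Dict[str, str]:
    """Answer the same question once per selected bias, keyed by bias.

    generate(bias, question) is a placeholder (an assumption of this sketch)
    for whatever bias-conditioned model call a real system would make.
    """
    return {bias: generate(bias, question) for bias in selected_biases}


def stub_generate(bias: str, question: str) -> str:
    # Stand-in for a fine-tuned model; it just tags the question with the bias.
    return f"[{bias}] {question}"


if __name__ == "__main__":
    examples = [
        Example("Should taxes be raised?", "Yes, to fund services.", "liberal"),
        Example("Should taxes be raised?", "No, keep them low.", "conservative"),
        Example("Best breakfast?", "Porridge.", "unlabeled"),
    ]
    corpora = build_bias_corpora(examples, ["liberal", "conservative"])
    print(sorted(corpora))  # ['conservative', 'liberal']
    print(answer_side_by_side("Should taxes be raised?",
                              ["liberal", "conservative"], stub_generate))
```

Keeping one corpus per label mirrors the abstract's description of a corpus in which every answer is attributed to a single demographic. How the abstract's model is actually conditioned on the selected bias is not stated here, so the `generate` callback leaves that choice open.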

Summary (original)

Instruction-tuned Large Language Models (LLMs) have recently showcased remarkable ability to generate fitting responses to natural language instructions. However, an open research question concerns the inherent biases of trained models and their responses. For instance, if the data used to tune an LLM is dominantly written by persons with a specific political bias, we might expect generated answers to share this bias. Current research work seeks to de-bias such models, or suppress potentially biased answers. With this demonstration, we take a different view on biases in instruction-tuning: Rather than aiming to suppress them, we aim to make them explicit and transparent. To this end, we present OpinionGPT, a web demo in which users can ask questions and select all biases they wish to investigate. The demo will answer this question using a model fine-tuned on text representing each of the selected biases, allowing side-by-side comparison. To train the underlying model, we identified 11 different biases (political, geographic, gender, age) and derived an instruction-tuning corpus in which each answer was written by members of one of these demographics. This paper presents OpinionGPT, illustrates how we trained the bias-aware model and showcases the web application (available at https://opiniongpt.informatik.hu-berlin.de).

arXiv information

Authors	Patrick Haller, Ansar Aynetdinov, Alan Akbik
Published	2023-09-07 17:41:01+00:00
arXiv page	arxiv_id (pdf)

Sources and services used

arxiv.jp, Google
