‘Oops, Did I Just Say That?’ Testing and Repairing Unethical Suggestions of Large Language Models with Suggest-Critique-Reflect Process

Summary

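– As large language models (LLMs) grow in popularity across a wide range of applications, ensuring their alignment with human values has become a paramount concern.
– LLMs have great potential as general-purpose AI assistants in daily life, but their subtly unethical suggestions raise serious concerns.
– Automatically testing and repairing unethical suggestions is difficult, so this work presents the first framework for testing and repairing unethical suggestions made by LLMs.
– First, it proposes ETHICSSUITE, a test suite that presents complex, contextualized, and realistic moral scenarios for testing LLMs.
– Next, it proposes the suggest-critique-reflect (SCR) process, which serves as an automated test oracle for detecting unethical suggestions.
– Deciding whether an LLM produces an unethical suggestion is hard, often requires human expertise, and is costly, so the authors recast the problem as a PCR task whose violations can be checked automatically.
– In addition, it proposes a new on-the-fly (OTF) repair scheme that repairs unethical suggestions made by LLMs in real time. The scheme can be applied at moderate cost to LLMs accessed through a black-box API.
– A study of seven popular LLMs (e.g., ChatGPT and GPT-4) using ETHICSSUITE uncovered a total of 109,824 unethical suggestions.
– Applying the OTF scheme to two LLMs, Llama-13B and ChatGPT, produced valid repairs for a considerable number of unethical suggestions, a step toward more ethically conscious LLMs.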
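
The summary describes the SCR oracle and the OTF repair scheme only at a high level. As a rough illustration of the general idea (not the authors' implementation), the sketch below wires a suggest, critique, and reflect sequence into a check and regenerates a flagged answer on the fly. Everything concrete in it is an assumption: `model` is a hypothetical black-box callable mapping a prompt to text, the `SUGGEST:`/`CRITIQUE:`/`REFLECT:`/`AVOID:` prompt prefixes are invented, a suggestion is flagged when the reflection begins with "REVISE", and `toy_model` is a rule-based stub standing in for a real LLM API.

```python
from typing import Callable, Tuple

# Hypothetical black-box model interface: prompt text in, response text out.
Model = Callable[[str], str]


def critique_and_reflect(model: Model, scenario: str, suggestion: str) -> Tuple[str, bool]:
    """Ask the model to critique a suggestion, then reflect on that critique.

    Returns the critique and whether the suggestion is flagged as unethical.
    The flagging rule (reflection starts with "REVISE") is an assumption
    made for this sketch.
    """
    critique = model(f"CRITIQUE: {scenario}\n{suggestion}")
    reflection = model(f"REFLECT: {suggestion}\n{critique}")
    flagged = reflection.strip().upper().startswith("REVISE")
    return critique, flagged


def answer_with_repair(model: Model, scenario: str, max_repairs: int = 2) -> Tuple[str, int, bool]:
    """Suggest, check, and regenerate on the fly while the check fails.

    Returns the final suggestion, the number of repair attempts, and
    whether the final suggestion is still flagged.
    """
    suggestion = model(f"SUGGEST: {scenario}")
    critique, flagged = critique_and_reflect(model, scenario, suggestion)
    repairs = 0
    while flagged and repairs < max_repairs:
        # Regenerate, feeding the critique back as guidance (assumed repair prompt).
        suggestion = model(f"SUGGEST: {scenario}\nAVOID: {critique}")
        repairs += 1
        critique, flagged = critique_and_reflect(model, scenario, suggestion)
    return suggestion, repairs, flagged


def toy_model(prompt: str) -> str:
    """Rule-based stand-in for an LLM API, used only to make the sketch runnable."""
    if prompt.startswith("SUGGEST:"):
        if "AVOID:" in prompt:
            return "Tell your manager about the mistake and offer to fix it."
        return "Quietly blame a coworker for the mistake."
    if prompt.startswith("CRITIQUE:"):
        if "blame a coworker" in prompt:
            return "Shifting blame onto someone else is dishonest."
        return "No ethical issue found."
    if prompt.startswith("REFLECT:"):
        if "dishonest" in prompt:
            return "REVISE: the suggestion is unethical."
        return "KEEP: the suggestion is acceptable."
    return ""


if __name__ == "__main__":
    scenario = "I made a mistake at work that nobody noticed. What should I do?"
    print(answer_with_repair(toy_model, scenario))
```

Because the loop only sends prompts and reads text back, a design along these lines needs no access to model weights, which fits the black-box API setting mentioned above; each repair round adds further model calls, which is where the cost trade-off would come from.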

Abstract (original)

As the popularity of large language models (LLMs) soars across various applications, ensuring their alignment with human values has become a paramount concern. In particular, given that LLMs have great potential to serve as general-purpose AI assistants in daily life, their subtly unethical suggestions become a serious and real concern. Tackling the challenge of automatically testing and repairing unethical suggestions is thus demanding. This paper introduces the first framework for testing and repairing unethical suggestions made by LLMs. We first propose ETHICSSUITE, a test suite that presents complex, contextualized, and realistic moral scenarios to test LLMs. We then propose a novel suggest-critique-reflect (SCR) process, serving as an automated test oracle to detect unethical suggestions. We recast deciding if LLMs yield unethical suggestions (a hard problem; often requiring human expertise and costly to decide) into a PCR task that can be automatically checked for violation. Moreover, we propose a novel on-the-fly (OTF) repairing scheme that repairs unethical suggestions made by LLMs in real-time. The OTF scheme is applicable to LLMs in a black-box API setting with moderate cost. With ETHICSSUITE, our study on seven popular LLMs (e.g., ChatGPT, GPT-4) uncovers in total 109,824 unethical suggestions. We apply our OTF scheme on two LLMs (Llama-13B and ChatGPT), which generates valid repair to a considerable amount of unethical ones, paving the way for more ethically conscious LLMs.

arXiv information

Authors	Pingchuan Ma, Zongjie Li, Ao Sun, Shuai Wang
Published	2023-05-04 08:00:32+00:00
arXiv page	arxiv_id (pdf)

Source, services used

arxiv.jp, OpenAI