Principle-Driven Self-Alignment of Language Models from Scratch with Minimal Human Supervision

Summary

Title: Principle-driven self-alignment of language models from scratch with minimal human supervision

Summary:

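– AI-assistant agents such as ChatGPT rely heavily on supervised fine-tuning (SFT) with human annotations and on reinforcement learning from human feedback (RLHF) to produce responses that match human intentions.
– This dependence brings the high cost of obtaining human supervision, along with problems of quality, reliability, diversity, self-consistency, and undesirable biases.
– To address these challenges, the authors propose a new approach called SELF-ALIGN, which combines the generative power of LLMs with principle-driven reasoning to self-align AI-assistant agents with minimal human supervision.
– SELF-ALIGN has four stages. In the first stage, an LLM generates synthetic prompts, and a topic-guided method expands their diversity. In the second stage, a small set of human-written principles for the AI model to follow guides the LLM, through in-context learning from demonstrations, to produce helpful, ethical, and reliable responses to user queries. In the third stage, the original LLM is fine-tuned on the high-quality self-aligned responses, so that it can generate desirable responses to each query directly, without the principle set or the demonstrations. Finally, a refinement step addresses overly brief or indirect responses.
– Applying SELF-ALIGN to the LLaMA-65b base language model, the authors developed an AI assistant named Dromedary. With fewer than 300 lines of human annotations (fewer than 200 seed prompts, 16 generic principles, and 5 exemplars for in-context learning), Dromedary significantly outperforms several state-of-the-art AI systems, including Text-Davinci-003 and Alpaca, on benchmark datasets in various settings.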
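
The data flow of the first three stages can be outlined as a minimal Python sketch. Everything in it is hypothetical and only illustrates the idea: `fake_llm` is a deterministic stand-in for the LLaMA-65b base model, and the prompt templates and function names (`topic_guided_prompts`, `principle_driven_prompt`, `self_align`) are invented here, not the authors' actual templates or code. The fine-tuning itself (stage 3) is represented only by the shape of the training pairs, and the refinement step (stage 4) is not shown.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

# Hypothetical stand-in type for a base LLM: prompt string in, generated text out.
LLM = Callable[[str], str]


@dataclass
class Exemplar:
    """A worked demonstration used for in-context learning (illustrative)."""
    query: str
    response: str


def topic_guided_prompts(llm: LLM, seed_prompts: List[str], topics: List[str]) -> List[str]:
    """Stage 1 (sketch): ask the LLM for one new prompt per topic, seeded with human examples."""
    synthetic = []
    for topic in topics:
        instruction = (
            "Examples:\n" + "\n".join(seed_prompts)
            + f"\nWrite one new user question about: {topic}\n"
        )
        synthetic.append(llm(instruction).strip())
    return synthetic


def principle_driven_prompt(principles: List[str], exemplars: List[Exemplar], query: str) -> str:
    """Stage 2 (sketch): prepend numbered principles and demonstrations to the user query."""
    rules = "\n".join(f"{i}. {p}" for i, p in enumerate(principles, start=1))
    demos = "\n\n".join(f"User: {e.query}\nAssistant: {e.response}" for e in exemplars)
    return f"Principles:\n{rules}\n\n{demos}\n\nUser: {query}\nAssistant:"


def self_align(
    llm: LLM,
    seed_prompts: List[str],
    topics: List[str],
    principles: List[str],
    exemplars: List[Exemplar],
) -> List[Tuple[str, str]]:
    """Stages 1-3 (sketch): build (query, response) pairs for fine-tuning."""
    queries = seed_prompts + topic_guided_prompts(llm, seed_prompts, topics)
    dataset = []
    for q in queries:
        response = llm(principle_driven_prompt(principles, exemplars, q))
        # Stage 3 data: store only the bare query, so a model fine-tuned on these
        # pairs must answer without seeing the principles or exemplars.
        dataset.append((q, response.strip()))
    return dataset


def fake_llm(prompt: str) -> str:
    """Deterministic stub so the sketch runs without a real model."""
    if prompt.startswith("Examples:"):
        # Echo the requested topic back as a synthetic question.
        return "Q about " + prompt.rsplit(": ", 1)[-1].strip()
    return " A helpful, honest answer. "


data = self_align(
    fake_llm,
    seed_prompts=["How do I bake bread?"],
    topics=["history"],
    principles=["Be honest.", "Be helpful."],
    exemplars=[Exemplar("Hi", "Hello! How can I help?")],
)
print(len(data))  # 2
print(data[1])    # ('Q about history', 'A helpful, honest answer.')
```

The key design point the sketch tries to capture is that the principles and exemplars appear only in the generation prompt and are dropped from the stored pairs, which is what would let a fine-tuned model respond to bare queries directly.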

Summary (original)

Recent AI-assistant agents, such as ChatGPT, predominantly rely on supervised fine-tuning (SFT) with human annotations and reinforcement learning from human feedback (RLHF) to align the output of large language models (LLMs) with human intentions, ensuring they are helpful, ethical, and reliable. However, this dependence can significantly constrain the true potential of AI-assistant agents due to the high cost of obtaining human supervision and the related issues on quality, reliability, diversity, self-consistency, and undesirable biases. To address these challenges, we propose a novel approach called SELF-ALIGN, which combines principle-driven reasoning and the generative power of LLMs for the self-alignment of AI agents with minimal human supervision. Our approach encompasses four stages: first, we use an LLM to generate synthetic prompts, and a topic-guided method to augment the prompt diversity; second, we use a small set of human-written principles for AI models to follow, and guide the LLM through in-context learning from demonstrations (of principles application) to produce helpful, ethical, and reliable responses to user's queries; third, we fine-tune the original LLM with the high-quality self-aligned responses so that the resulting model can generate desirable responses for each query directly without the principle set and the demonstrations anymore; and finally, we offer a refinement step to address the issues of overly-brief or indirect responses. Applying SELF-ALIGN to the LLaMA-65b base language model, we develop an AI assistant named Dromedary. With fewer than 300 lines of human annotations (including < 200 seed prompts, 16 generic principles, and 5 exemplars for in-context learning), Dromedary significantly surpasses the performance of several state-of-the-art AI systems, including Text-Davinci-003 and Alpaca, on benchmark datasets with various settings.

arXiv information

Authors	Zhiqing Sun, Yikang Shen, Qinhong Zhou, Hongxin Zhang, Zhenfang Chen, David Cox, Yiming Yang, Chuang Gan
Published	2023-05-04 17:59:28+00:00
arXiv page	arxiv_id(pdf)

Provided by, services used

arxiv.jp, OpenAI