Beyond Imitation: Leveraging Fine-grained Quality Signals for Alignment

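Summary

Alignment with human preferences is a desirable property of large language models (LLMs).
Currently, the main alignment approach is based on reinforcement learning from human feedback (RLHF).
Although RLHF is effective, it is complex to implement and train, so recent studies explore alternative alignment approaches based on supervised fine-tuning (SFT).
A major limitation of SFT is that it essentially performs imitation learning, so the model cannot fully learn what the expected behavior is.
To address this issue, we propose an improved alignment approach named FIGA.
Unlike prior methods, it incorporates fine-grained (i.e., token- or phrase-level) quality signals that are derived by contrasting good and bad responses.
Our approach makes two major contributions.
First, we curate a refined alignment dataset that pairs initial responses with the corresponding revised responses.
Second, we devise a new loss function that leverages the fine-grained quality signals to guide the learning of LLMs for alignment.
Extensive experiments comparing against a number of competitive baselines demonstrate the effectiveness of our approach.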
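
To make the idea of token-level quality signals concrete, here is a minimal sketch. It is not the paper's loss function, which the summary does not specify. It assumes the signal comes from a token diff between an initial response and its revised version: tokens that appear only in the revision receive a reward weight, and tokens that were removed or replaced receive a penalty weight. The function names, the weights `alpha` and `beta`, and the use of `difflib` are all illustrative assumptions.

```python
from difflib import SequenceMatcher


def token_quality_weights(initial, revised, alpha=1.0, beta=0.5):
    """Contrast two token lists and return per-token weights.

    Hypothetical scheme: tokens only in `revised` get weight `alpha`,
    tokens only in `initial` get weight `beta`, shared tokens get 0.
    """
    matcher = SequenceMatcher(a=initial, b=revised, autojunk=False)
    w_initial = [0.0] * len(initial)
    w_revised = [0.0] * len(revised)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            continue  # unchanged tokens carry no quality signal here
        for i in range(i1, i2):  # removed or replaced tokens
            w_initial[i] = beta
        for j in range(j1, j2):  # inserted or replacement tokens
            w_revised[j] = alpha
    return w_revised, w_initial


def fine_grained_loss(logp_revised, logp_initial, w_revised, w_initial):
    """Weighted loss over per-token log-probabilities (assumed inputs).

    Raises the likelihood of rewarded tokens in the revised response and
    lowers the likelihood of penalized tokens in the initial response.
    """
    reward = sum(w * lp for w, lp in zip(w_revised, logp_revised))
    penalty = sum(w * lp for w, lp in zip(w_initial, logp_initial))
    return -reward + penalty


initial = "the answer is probably wrong".split()
revised = "the answer is clearly correct".split()
w_rev, w_init = token_quality_weights(initial, revised)
print(list(zip(revised, w_rev)))
print(list(zip(initial, w_init)))
```

In contrast to plain SFT, which weights every token of the target response equally, this kind of scheme concentrates the training signal on the tokens that distinguish the good response from the bad one.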

Abstract (original)

Alignment with human preference is a desired property of large language models (LLMs). Currently, the main alignment approach is based on reinforcement learning from human feedback (RLHF). Despite the effectiveness of RLHF, it is intricate to implement and train, thus recent studies explore how to develop alternative alignment approaches based on supervised fine-tuning (SFT). A major limitation of SFT is that it essentially does imitation learning, which cannot fully understand what are the expected behaviors. To address this issue, we propose an improved alignment approach named FIGA. Different from prior methods, we incorporate fine-grained (i.e., token or phrase level) quality signals that are derived by contrasting good and bad responses. Our approach has made two major contributions. Firstly, we curate a refined alignment dataset that pairs initial responses and the corresponding revised ones. Secondly, we devise a new loss function can leverage fine-grained quality signals to instruct the learning of LLMs for alignment. Extensive experiments have demonstrated the effectiveness of our approaches by comparing a number of competitive baselines.

arXiv information

Authors	Geyang Guo, Ranchi Zhao, Tianyi Tang, Wayne Xin Zhao, Ji-Rong Wen
Published	2023-11-07 15:36:40+00:00
arXiv page	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
