Learning to Refine with Fine-Grained Natural Language Feedback

Summary

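Recent work has investigated the ability of large language models (LLMs) to identify and correct errors in LLM-generated responses. These refinement approaches often evaluate which model sizes can perform refinement on which problems, but pay less attention to what effective feedback for refinement looks like. In this work, we propose treating refinement with feedback as a composition of three distinct LLM competencies: (1) identifying bad generations; (2) generating fine-grained natural language feedback; (3) refining with that fine-grained feedback. The first step can be implemented with a high-performing discriminative model, while steps 2 and 3 can be implemented with either prompted or fine-tuned LLMs. A key property of this approach is that the step 2 critique model can give fine-grained feedback about errors, which is made possible by offloading discrimination to a separate model in step 1. We show that models of varying capability benefit from refining with this approach on the task of improving the factual consistency of document-grounded summaries. Overall, the proposed method consistently outperforms existing end-to-end refinement approaches and current trained models that are not fine-tuned for factuality critiquing.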

Abstract (original)

Recent work has explored the capability of large language models (LLMs) to identify and correct errors in LLM-generated responses. These refinement approaches frequently evaluate what sizes of models are able to do refinement for what problems, but less attention is paid to what effective feedback for refinement looks like. In this work, we propose looking at refinement with feedback as a composition of three distinct LLM competencies: (1) identification of bad generations; (2) fine-grained natural language feedback generation; (3) refining with fine-grained feedback. The first step can be implemented with a high-performing discriminative model and steps 2 and 3 can be implemented either via prompted or fine-tuned LLMs. A key property of this approach is that the step 2 critique model can give fine-grained feedback about errors, made possible by offloading the discrimination to a separate model in step 1. We show that models of different capabilities benefit from refining with this approach on the task of improving factual consistency of document grounded summaries. Overall, our proposed method consistently outperforms existing end-to-end refinement approaches and current trained models not fine-tuned for factuality critiquing.
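The three-step composition described in the abstract can be sketched roughly as follows. This is only an illustrative outline, not the authors' implementation: the names `detect_critique_refine`, `RefinementResult`, and the three callable types are hypothetical, and the `toy_*` functions are trivial word-overlap stand-ins for the discriminative model and the prompted or fine-tuned LLMs the abstract mentions.

```python
from dataclasses import dataclass
from typing import Callable, List

# Hypothetical interfaces; the abstract does not specify an API.
Detector = Callable[[str, str], bool]            # (document, summary) -> True if summary looks inconsistent
Critic = Callable[[str, str], List[str]]         # (document, summary) -> fine-grained feedback items
Refiner = Callable[[str, str, List[str]], str]   # (document, summary, feedback) -> revised summary


@dataclass
class RefinementResult:
    summary: str          # final summary (original or revised)
    feedback: List[str]   # feedback items produced in step 2, if any
    refined: bool         # whether step 3 was run


def detect_critique_refine(
    document: str,
    summary: str,
    detect: Detector,
    critique: Critic,
    refine: Refiner,
) -> RefinementResult:
    # Step 1: a separate discriminator decides whether the summary needs work.
    if not detect(document, summary):
        return RefinementResult(summary, [], False)
    # Step 2: the critic only sees summaries already flagged as bad,
    # so it can focus on describing the errors in detail.
    feedback = critique(document, summary)
    # Step 3: the refiner edits the summary using that feedback.
    revised = refine(document, summary, feedback)
    return RefinementResult(revised, feedback, True)


# Toy stand-ins (assumptions for illustration only): a word counts as
# "unsupported" if it does not appear in the document.
def toy_detect(document: str, summary: str) -> bool:
    doc_words = set(document.split())
    return any(w not in doc_words for w in summary.split())


def toy_critique(document: str, summary: str) -> List[str]:
    doc_words = set(document.split())
    return [
        f"'{w}' is not supported by the document"
        for w in summary.split()
        if w not in doc_words
    ]


def toy_refine(document: str, summary: str, feedback: List[str]) -> str:
    doc_words = set(document.split())
    return " ".join(w for w in summary.split() if w in doc_words)


if __name__ == "__main__":
    result = detect_critique_refine(
        "the cat sat on the mat",
        "the dog sat on the mat",
        toy_detect,
        toy_critique,
        toy_refine,
    )
    print(result)
```

In a real system each callable would wrap a model call; keeping them as plain function arguments makes it easy to swap a prompted model for a fine-tuned one in steps 2 and 3 without touching the control flow.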

arXiv information

Authors	Manya Wadhwa, Xinyu Zhao, Junyi Jessy Li, Greg Durrett
Published	2024-07-02 16:15:01+00:00
arXiv site	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
