Prompt-Based Length Controlled Generation with Reinforcement Learning

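Summary

Large language models (LLMs) such as ChatGPT and GPT-4 have attracted great attention for their surprising performance across a wide range of NLP tasks.
Length-controlled generation has emerged as an important topic, since it lets users make full use of LLMs in more realistic scenarios, such as producing an answer or essay of a desired length.
In addition, autoregressive generation in LLMs is very time-consuming, and the ability to control the generated length can reduce inference cost by limiting the length.
We therefore propose a prompt-based length control method that achieves high-accuracy length-controlled generation.
Specifically, we adopt reinforcement learning with a reward signal given by either a trainable or a rule-based reward model, which further strengthens the length-control ability of LLMs by rewarding outputs that follow predefined control instructions.
To enable rule-based inference, we also introduce a standard prompt extractor that collects standard control information from the user's input.
Experiments show that our method significantly improves the accuracy of prompt-based length control on the summarization task with popular datasets such as CNNDM and NYT.
Both the standard prompt extractor and the RL-tuned model show strong generalization to unseen control prompt templates.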
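
As a rough illustration of the rule-based reward idea described above, the following minimal Python sketch scores an output by how far its word count is from a requested target. Everything here is an assumption made for illustration: the function names `extract_target_length` and `length_reward`, the regex used as a stand-in for the standard prompt extractor, and the negative-absolute-difference reward are hypothetical and are not taken from the paper.

```python
import re
from typing import Optional


def extract_target_length(prompt: str) -> Optional[int]:
    """Hypothetical stand-in for a standard prompt extractor.

    Pulls the first "<number> words" target out of a free-form
    instruction. A plain regex is an assumption used only for this sketch.
    """
    match = re.search(r"(\d+)\s*words?", prompt, flags=re.IGNORECASE)
    return int(match.group(1)) if match else None


def length_reward(output: str, target: int) -> float:
    """Assumed rule-based reward.

    Returns 0.0 when the whitespace-separated word count equals the
    target, and a more negative value the further the output drifts.
    """
    word_count = len(output.split())
    return -float(abs(word_count - target))


if __name__ == "__main__":
    prompt = "Summarize the article in about 50 words."
    target = extract_target_length(prompt)
    candidate = "A short candidate summary produced by the model."
    if target is not None:
        print(length_reward(candidate, target))
```

In an RL setup of the kind the summary describes, a score like this would be computed for each sampled output and used as the reward signal; a trainable reward model would replace `length_reward` with a learned scorer.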

Abstract (original)

Large language models (LLMs) like ChatGPT and GPT-4 have attracted great attention given their surprising performance on a wide range of NLP tasks. Length controlled generation of LLMs emerges as an important topic, which enables users to fully leverage the capability of LLMs in more real-world scenarios like generating a proper answer or essay of a desired length. In addition, the autoregressive generation in LLMs is extremely time-consuming, while the ability to control this generated length can reduce the inference cost by limiting the length. Therefore, we propose a prompt-based length control method to achieve high-accuracy length controlled generation. In particular, we adopt reinforcement learning with the reward signal given by either trainable or rule-based reward models, which further enhances the length-control ability of LLMs by rewarding outputs that follow pre-defined control instructions. To enable rule-based inference, we also introduce a standard prompt extractor to collect the standard control information from users’ input. Experiments show that our method significantly improves the accuracy of prompt-based length control for the summarization task on popular datasets like CNNDM and NYT. Both the standard prompt extractor and the RL-tuned model have shown strong generalization ability to unseen control prompt templates.

arXiv information

Authors	Renlong Jie, Xiaojun Meng, Lifeng Shang, Xin Jiang, Qun Liu
Published	2023-09-30 07:54:22+00:00
arXiv site	arxiv_id(pdf)

Sources and services used

arxiv.jp, Google
