Prompt-Based Length Controlled Generation with Reinforcement Learning

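Summary

Large language models (LLMs) such as ChatGPT and GPT-4 have recently attracted great attention for their striking improvements and performance.
Length-controlled generation has emerged as an important topic for LLMs, because it lets users make fuller use of these models in real-world scenarios such as generating an answer or essay of a desired length.
In addition, autoregressive generation in LLMs is very time-consuming; being able to control the generated length makes it possible to cut inference cost as needed by limiting the length, and so to meet different needs.
We therefore propose a prompt-based length control method to achieve this length-controlled generation, which can also be applied widely to GPT-style LLMs.
Specifically, we adopt reinforcement learning with a reward signal given by either a trainable or a rule-based reward model, which further steers the LLM's generation by rewarding outputs that match a pre-defined target length.
Experiments show that our method significantly improves the accuracy of prompt-based length control on summarization tasks over popular datasets such as CNNDM and NYT.
We believe this length-control ability opens up further possibilities for the era of LLMs.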
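
To make the idea of a rule-based reward concrete, here is a minimal sketch in Python. The summary does not give the paper's actual reward function, so the function name `length_reward`, the whitespace tokenization, and the normalized absolute-deviation formula are all assumptions chosen for illustration only.

```python
def length_reward(generated_tokens, target_length):
    """Hypothetical rule-based length reward (not the paper's formula).

    Returns 0.0 when the output has exactly `target_length` tokens and an
    increasingly negative value as the length deviates from the target.
    """
    if target_length <= 0:
        raise ValueError("target_length must be positive")
    deviation = abs(len(generated_tokens) - target_length)
    # Normalize by the target so short and long targets are comparable.
    return -deviation / target_length


# Assumed usage: whitespace tokens stand in for model tokens.
summary = "the model wrote a short summary of the article".split()
print(length_reward(summary, target_length=9))
print(length_reward(summary, target_length=12))
```

In a reinforcement learning setup, a scalar like this would be computed for each sampled output and used as the signal that pushes the model toward the length requested in the prompt.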

Summary (original)

Recently, large language models (LLMs) like ChatGPT and GPT-4 have attracted great attention given their surprising improvement and performance. Length controlled generation of LLMs emerges as an important topic, which also enables users to fully leverage the capability of LLMs in more real-world scenarios like generating a proper answer or essay of a desired length. In addition, the autoregressive generation in LLMs is extremely time-consuming, while the ability of controlling this generated length can arbitrarily reduce the inference cost by limiting the length, and thus satisfy different needs. Therefore, we aim to propose a prompt-based length control method to achieve this length controlled generation, which can also be widely applied in GPT-style LLMs. In particular, we adopt reinforcement learning with the reward signal given by either trainable or rule-based reward model, which further affects the generation of LLMs via rewarding a pre-defined target length. Experiments show that our method significantly improves the accuracy of prompt-based length control for summarization task on popular datasets like CNNDM and NYT. We believe this length-controllable ability can provide more potentials towards the era of LLMs.

arXiv information

Authors	Renlong Jie, Xiaojun Meng, Lifeng Shang, Xin Jiang, Qun Liu
Published	2023-08-23 09:43:10+00:00
arXiv site	arxiv_id(pdf)

Sources and services used

arxiv.jp, Google
