On the Effectiveness of Offline RL for Dialogue Response Generation

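Summary

A common training technique for language models is teacher forcing (TF).
TF tries to match human-written text exactly, even though the same meaning can be expressed in different ways.
This motivates the use of sequence-level objectives for dialogue response generation.
This paper studies the effectiveness of various offline reinforcement learning (RL) methods for maximizing such objectives.
It presents a comprehensive evaluation across multiple datasets, models, and metrics.
Offline RL shows a clear performance improvement over teacher forcing, without inducing training instability or sacrificing practical training budgets.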

Summary (original)

A common training technique for language models is teacher forcing (TF). TF attempts to match human language exactly, even though identical meanings can be expressed in different ways. This motivates use of sequence-level objectives for dialogue response generation. In this paper, we study the efficacy of various offline reinforcement learning (RL) methods to maximize such objectives. We present a comprehensive evaluation across multiple datasets, models, and metrics. Offline RL shows a clear performance improvement over teacher forcing while not inducing training instability or sacrificing practical training budgets.
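
The contrast the abstract draws between token-level teacher forcing and a sequence-level objective can be illustrated with a minimal Python sketch. The abstract does not name the specific offline RL methods or the reward used, so everything below is an assumption made for illustration: the function names `teacher_forcing_loss` and `reward_weighted_loss`, the scalar reward attached to each logged response, and the reward-weighted likelihood itself, which is a generic sequence-level objective over a fixed dataset and not necessarily one of the methods the paper evaluates.

```python
import math


def teacher_forcing_loss(token_probs):
    """Token-level objective: mean negative log-likelihood of the reference tokens.

    token_probs holds the probability the model assigns to each reference token.
    """
    return -sum(math.log(p) for p in token_probs) / len(token_probs)


def reward_weighted_loss(samples):
    """Sequence-level objective on a fixed (offline) dataset.

    samples is a list of (token_probs, reward) pairs, one per logged response.
    Each response's sequence log-likelihood is weighted by its scalar reward,
    so high-reward responses are pushed up more than low-reward ones.
    This is an illustrative stand-in, not the paper's method.
    """
    total = 0.0
    for token_probs, reward in samples:
        seq_log_prob = sum(math.log(p) for p in token_probs)
        total += -reward * seq_log_prob
    return total / len(samples)


# Hypothetical data: two logged responses with made-up probabilities and rewards.
logged = [([0.5, 0.5], 1.0), ([0.5, 0.5], 0.0)]
tf = teacher_forcing_loss([0.5, 0.5])
rw = reward_weighted_loss(logged)
```

In this sketch the teacher forcing loss treats every reference token as equally worth imitating, while the reward-weighted loss ignores the zero-reward response entirely. No environment interaction is needed, since both are computed from logged data.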

arXiv information

Authors	Paloma Sodhi, Felix Wu, Ethan R. Elenberg, Kilian Q. Weinberger, Ryan McDonald
Published	2023-07-23 20:43:21+00:00
arXiv site	arxiv_id (pdf)

Sources and services used

arxiv.jp, Google
