Rewarding What Matters: Step-by-Step Reinforcement Learning for Task-Oriented Dialogue

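Summary

Reinforcement learning (RL) is a powerful approach for improving task-oriented dialogue (TOD) systems.
However, existing RL methods focus mainly on generation tasks such as dialogue policy learning (DPL) and response generation (RG), and tend to neglect dialogue state tracking (DST), the understanding task.
This narrow focus overlooks the interdependence between understanding and generation, which prevents systems from reaching globally optimal performance.
In addition, RL methods suffer from sparse and delayed rewards, which complicates training and optimization.
To address these issues, the authors extend RL to both understanding and generation tasks by introducing step-by-step rewards throughout token generation.
The understanding reward increases as more slots are filled correctly in DST, and the generation reward increases as user requests are accurately included in the response.
The approach provides balanced optimization aligned with task completion.
Experiments show that the approach improves TOD system performance and achieves new state-of-the-art results on three widely used datasets: MultiWOZ2.0, MultiWOZ2.1, and In-Car.
It also shows stronger few-shot ability than current models in low-resource settings.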

Summary (original)

Reinforcement learning (RL) is a powerful approach to enhance task-oriented dialogue (TOD) systems. However, existing RL methods tend to mainly focus on generation tasks, such as dialogue policy learning (DPL) or response generation (RG), while neglecting dialogue state tracking (DST) for understanding. This narrow focus limits the systems to achieve globally optimal performance by overlooking the interdependence between understanding and generation. Additionally, RL methods face challenges with sparse and delayed rewards, which complicates training and optimization. To address these issues, we extend RL into both understanding and generation tasks by introducing step-by-step rewards throughout the token generation. The understanding reward increases as more slots are correctly filled in DST, while the generation reward grows with the accurate inclusion of user requests. Our approach provides a balanced optimization aligned with task completion. Experimental results demonstrate that our approach effectively enhances the performance of TOD systems and achieves new state-of-the-art results on three widely used datasets, including MultiWOZ2.0, MultiWOZ2.1, and In-Car. Our approach also shows superior few-shot ability in low-resource settings compared to current models.
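
The abstract describes the step-by-step rewards only in words: the understanding reward rises as more DST slots are filled correctly, and the generation reward rises as more user requests are included. The sketch below is one possible reading of that idea, not the authors' formulation. It assumes the reward at each generation step is the fraction of target items covered so far. The function name `coverage_rewards`, the representation of each step as a set of completed items, and the example slot names are all hypothetical.

```python
from typing import Hashable, List, Set


def coverage_rewards(steps: List[Set[Hashable]], targets: Set[Hashable]) -> List[float]:
    """Return one reward per generation step.

    Assumption (not from the paper): the reward at step t is the fraction of
    target items that have been produced correctly up to and including step t.
    Items outside `targets` earn nothing, and repeats are counted once.
    """
    covered: Set[Hashable] = set()
    rewards: List[float] = []
    for emitted in steps:
        covered |= emitted & targets  # keep only correct items
        rewards.append(len(covered) / len(targets) if targets else 0.0)
    return rewards


# Understanding (DST): items are hypothetical (slot, value) pairs.
gold_state = {("hotel-area", "north"), ("hotel-stars", "4")}
dst_steps = [
    {("hotel-area", "north")},                        # first slot filled correctly
    set(),                                            # tokens that complete no slot
    {("hotel-stars", "4"), ("hotel-price", "cheap")}, # one correct, one spurious
]
dst_rewards = coverage_rewards(dst_steps, gold_state)

# Generation (RG): items are hypothetical requested slot names.
requested = {"phone", "address"}
rg_steps = [set(), {"phone"}, {"phone"}, {"address"}]
rg_rewards = coverage_rewards(rg_steps, requested)
```

Under this assumption every step receives a signal, so the reward is dense rather than arriving only at the end of the dialogue. A per-step increment (the difference between consecutive values) would be an equally plausible choice.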

arXiv information

Authors	Huifang Du, Shuqin Li, Minghao Wu, Xuejing Feng, Yuan-Fang Li, Haofen Wang
Published	2024-06-20 16:15:40+00:00
arXiv page	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
