Offline Reinforcement Learning for Mixture-of-Expert Dialogue Management

Summary

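Reinforcement learning (RL) holds great promise for developing dialogue management (DM) agents that are non-myopic, conduct rich conversations, and maximize overall user satisfaction.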
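Despite recent developments in RL and language models (LMs), using RL to power conversational chatbots remains challenging. One reason is that RL requires online exploration to learn effectively, whereas collecting new human-bot interactions can be expensive and unsafe.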
This problem is made worse by the combinatorial action spaces these algorithms face, because most LM agents generate responses at the word level.
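We develop a variety of RL algorithms specialized for dialogue planning that leverage recent Mixture-of-Experts language models (MoE-LMs): models that capture diverse semantics, generate utterances reflecting different intents, and are well suited to multi-turn DM.
By exploiting the MoE-LM structure, our methods significantly reduce the size of the action space and improve the effectiveness of RL-based DM.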
We evaluate our methods on open-domain dialogue and demonstrate their effectiveness in terms of the intent diversity of generated utterances and overall DM performance.
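A little arithmetic makes the action-space argument concrete. The sketch below uses a hypothetical vocabulary size |V|, utterance length T, and number of experts K. None of these values come from the paper. A word-level policy chooses among up to |V|^T token sequences. A policy that only selects which expert's candidate utterance to send chooses among K options, or K times the number of candidates sampled per expert. The function names are illustrative only.

```python
import math


def word_level_action_space(vocab_size: int, utterance_len: int) -> int:
    """Distinct utterances of exactly `utterance_len` tokens: |V| ** T."""
    return vocab_size ** utterance_len


def expert_level_action_space(num_experts: int, candidates_per_expert: int = 1) -> int:
    """Choices when the dialogue manager only picks one pre-generated candidate
    (hypothetical setup: each expert proposes `candidates_per_expert` utterances)."""
    return num_experts * candidates_per_expert


if __name__ == "__main__":
    # Hypothetical sizes chosen for illustration, not taken from the paper.
    word = word_level_action_space(vocab_size=32_000, utterance_len=20)
    expert = expert_level_action_space(num_experts=10, candidates_per_expert=5)
    # Report the word-level count in log10 form because the integer is huge.
    print(f"word level:   ~10^{math.log10(word):.1f} actions")
    print(f"expert level: {expert} actions")
```

The point of the comparison is that the expert-level count grows linearly in K, while the word-level count grows exponentially in T. This is the kind of reduction the summary attributes to exploiting MoE-LM structure.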

Summary (original)

Reinforcement learning (RL) has shown great promise for developing dialogue management (DM) agents that are non-myopic, conduct rich conversations, and maximize overall user satisfaction. Despite recent developments in RL and language models (LMs), using RL to power conversational chatbots remains challenging, in part because RL requires online exploration to learn effectively, whereas collecting novel human-bot interactions can be expensive and unsafe. This issue is exacerbated by the combinatorial action spaces facing these algorithms, as most LM agents generate responses at the word level. We develop a variety of RL algorithms, specialized to dialogue planning, that leverage recent Mixture-of-Expert Language Models (MoE-LMs) — models that capture diverse semantics, generate utterances reflecting different intents, and are amenable for multi-turn DM. By exploiting MoE-LM structure, our methods significantly reduce the size of the action space and improve the efficacy of RL-based DM. We evaluate our methods in open-domain dialogue to demonstrate their effectiveness w.r.t. the diversity of intent in generated utterances and overall DM performance.
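The sketch below shows two of the abstract's ideas together. First, it learns only from logged dialogues, with no new human-bot interactions. Second, it treats the choice of expert as the action. It uses minimal tabular offline Q-learning and is not the authors' algorithm. The following are all hypothetical: the discrete state ids (a real system would use LM representations of the dialogue history), the toy dataset, and the names `offline_q_learning` and `greedy_expert`. Practical offline RL methods also usually add regularization, such as conservatism toward actions that are rare in the logged data. This sketch omits it.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# One logged transition: (state, expert_index, reward, next_state, done).
# States are hypothetical discrete dialogue-state ids.
Transition = Tuple[int, int, float, int, bool]


def offline_q_learning(
    dataset: List[Transition],
    num_experts: int,
    gamma: float = 0.9,
    lr: float = 0.5,
    epochs: int = 200,
) -> Dict[int, List[float]]:
    """Tabular Q-learning that only replays a fixed batch of logged transitions.

    The action is the index of the expert whose utterance was sent, so each
    state has `num_experts` Q-values instead of one per possible token sequence.
    """
    q: Dict[int, List[float]] = defaultdict(lambda: [0.0] * num_experts)
    for _ in range(epochs):
        for s, a, r, s_next, done in dataset:
            # Bootstrapped target; terminal transitions use the reward alone.
            target = r if done else r + gamma * max(q[s_next])
            q[s][a] += lr * (target - q[s][a])
    return q


def greedy_expert(q: Dict[int, List[float]], state: int) -> int:
    """Pick the expert with the highest estimated value in `state`."""
    values = q[state]
    return max(range(len(values)), key=values.__getitem__)


if __name__ == "__main__":
    # Toy log: in state 0, expert 0 ends the chat with no reward, while
    # expert 1 leads to state 1, where expert 2 earns a reward of 1.
    logged = [
        (0, 0, 0.0, 0, True),
        (0, 1, 0.0, 1, False),
        (1, 2, 1.0, 1, True),
    ]
    q = offline_q_learning(logged, num_experts=3)
    print(greedy_expert(q, 0), greedy_expert(q, 1))
```

In this toy log, the learned policy prefers the expert whose utterance keeps the conversation going toward a later reward. This is a small instance of the non-myopic behaviour the abstract describes.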

arXiv information

Authors	Dhawal Gupta, Yinlam Chow, Aza Tulepbergenov, Mohammad Ghavamzadeh, Craig Boutilier
Published	2023-10-29 13:05:52+00:00
arXiv page	arxiv_id (pdf)

Source, services used

arxiv.jp, Google
