Convert Language Model into a Value-based Strategic Planner

Summary

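Emotional support conversation (ESC) aims to alleviate an individual's emotional distress through effective conversation.
Large language models (LLMs) have made notable progress on ESC, but most existing studies may not formulate the task from a state-model perspective, and so provide suboptimal solutions for long-term satisfaction.
To address this issue, we apply Q-learning to LLMs and propose a framework called straQ*.
Our framework allows a plug-and-play LLM to bootstrap its planning during ESC, determine the optimal strategy based on long-term return, and finally guide the LLM's response.
Extensive experiments on ESC datasets suggest that straQ* outperforms many baselines, including direct inference, self-refine, chain of thought, fine-tuning, and finite state machines.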
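
The summary does not spell out how straQ* computes its values, so the following is only a minimal sketch of the general idea of choosing a strategy by long-term return, using standard tabular Q-learning. It is not the authors' implementation: the strategy labels, state names, rewards, and the helpers `make_q_table`, `q_update`, and `best_strategy` are all hypothetical, and a table stands in for whatever value model the paper actually uses.

```python
from collections import defaultdict

# Hypothetical strategy labels, for illustration only (not taken from the paper).
STRATEGIES = ["question", "reflect_feelings", "provide_suggestion"]


def make_q_table():
    # Q[state][strategy] -> estimated long-term return, initialised to 0.0.
    return defaultdict(lambda: {s: 0.0 for s in STRATEGIES})


def q_update(q, state, strategy, reward, next_state,
             alpha=0.5, gamma=0.9, terminal=False):
    """One standard Q-learning step: Q <- Q + alpha * (target - Q)."""
    # At the end of a conversation there is no future return to bootstrap from.
    best_next = 0.0 if terminal else max(q[next_state].values())
    target = reward + gamma * best_next
    q[state][strategy] += alpha * (target - q[state][strategy])
    return q[state][strategy]


def best_strategy(q, state):
    """Greedy choice: the strategy with the highest estimated long-term return."""
    return max(STRATEGIES, key=lambda s: q[state][s])


q = make_q_table()
# Toy episode: asking a question earns no immediate reward, but it leads to a
# state in which a suggestion is rewarded at the end of the conversation.
q_update(q, "distressed", "question", 0.0, "explained")
q_update(q, "explained", "provide_suggestion", 1.0, "end", terminal=True)
# Replaying the first turn propagates the later reward back to it.
q_update(q, "distressed", "question", 0.0, "explained")

print(best_strategy(q, "distressed"))
```

In this toy run the question strategy ends up with a positive value even though its immediate reward is zero, which is the sense in which a value-based planner can prefer a strategy for its long-term rather than its immediate payoff.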

Abstract (original)

Emotional support conversation (ESC) aims to alleviate the emotional distress of individuals through effective conversations. Although large language models (LLMs) have obtained remarkable progress on ESC, most of these studies might not define the diagram from the state model perspective, therefore providing a suboptimal solution for long-term satisfaction. To address such an issue, we leverage the Q-learning on LLMs, and propose a framework called straQ*. Our framework allows a plug-and-play LLM to bootstrap the planning during ESC, determine the optimal strategy based on long-term returns, and finally guide the LLM to response. Substantial experiments on ESC datasets suggest that straQ* outperforms many baselines, including direct inference, self-refine, chain of thought, finetuning, and finite state machines.

arXiv information

Authors	Xiaoyu Wang, Yue Zhao, Qingqing Gu, Zhonglin Jiang, Xiaokai Chen, Yong Chen, Luo Ji
Published	2025-06-17 15:43:40+00:00
arXiv page	arxiv_id(pdf)

Sources and services used

arxiv.jp, Google
