Chain of Strategy Optimization Makes Large Language Models Better Emotional Supporter

Summary

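The growing emotional stress in modern society has increased the demand for emotional support conversations (ESC).
Although large language models (LLMs) show promise for ESC, they face two key challenges: (1) low strategy selection accuracy and (2) preference bias. Both limit their ability to adapt to users' emotional needs.
Existing supervised fine-tuning (SFT) struggles to address these issues because it rigidly trains models on a single gold-standard response without modeling nuanced strategy trade-offs.
To overcome these limitations, we propose Chain-of-Strategy Optimization (CSO), a new approach that optimizes strategy selection preferences at each dialogue turn.
We first use Monte Carlo Tree Search to build ESC-Pro, a high-quality preference dataset with turn-level strategy-response pairs.
Training on ESC-Pro with CSO improves both strategy accuracy and bias mitigation, enabling LLMs to generate more empathetic and contextually appropriate responses.
Experiments on LLaMA-3.1-8B, Gemma-2-9B, and Qwen2.5-7B show that CSO outperforms standard SFT, highlighting the effectiveness of fine-grained, turn-level preference modeling in ESC.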
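The summary describes turning Monte Carlo Tree Search results into turn-level strategy-response preference pairs. The sketch below illustrates one plausible way to do that, under stated assumptions: each candidate at a turn already carries an MCTS value estimate, the best-valued candidate becomes "chosen", and every candidate whose value trails it by more than a margin becomes "rejected". The `Candidate` class, `build_turn_pairs`, the `margin` parameter, and the bracketed strategy prefix are all hypothetical illustrations, not the ESC-Pro construction procedure itself.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    strategy: str   # support-strategy label (illustrative, ESConv-style names)
    response: str   # supporter utterance generated with that strategy
    value: float    # hypothetical MCTS value estimate for this tree node


def build_turn_pairs(history, candidates, margin=0.0):
    """Pair the best-valued candidate at one turn with each clearly worse one.

    Hypothetical helper: a pair is kept only if the value gap exceeds `margin`.
    """
    if len(candidates) < 2:
        return []  # no contrast is possible with fewer than two candidates
    ranked = sorted(candidates, key=lambda c: c.value, reverse=True)
    best = ranked[0]
    pairs = []
    for worse in ranked[1:]:
        if best.value - worse.value > margin:
            pairs.append({
                "prompt": history,
                # prefix the strategy so the model learns strategy + response jointly
                "chosen": f"[{best.strategy}] {best.response}",
                "rejected": f"[{worse.strategy}] {worse.response}",
            })
    return pairs


turn = [
    Candidate("Question", "What happened at work today?", 0.72),
    Candidate("Reflection of Feelings", "It sounds like you feel overwhelmed.", 0.85),
    Candidate("Information", "Try making a to-do list.", 0.31),
]
pairs = build_turn_pairs("Seeker: I can't keep up with everything.", turn, margin=0.1)
```

Raising `margin` keeps only pairs with a clear value gap, which trades dataset size for cleaner preference signal.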

Summary (original)

The growing emotional stress in modern society has increased the demand for Emotional Support Conversations (ESC). While Large Language Models (LLMs) show promise for ESC, they face two key challenges: (1) low strategy selection accuracy, and (2) preference bias, limiting their adaptability to emotional needs of users. Existing supervised fine-tuning (SFT) struggles to address these issues, as it rigidly trains models on single gold-standard responses without modeling nuanced strategy trade-offs. To overcome these limitations, we propose Chain-of-Strategy Optimization (CSO), a novel approach that optimizes strategy selection preferences at each dialogue turn. We first leverage Monte Carlo Tree Search to construct ESC-Pro, a high-quality preference dataset with turn-level strategy-response pairs. Training on ESC-Pro with CSO improves both strategy accuracy and bias mitigation, enabling LLMs to generate more empathetic and contextually appropriate responses. Experiments on LLaMA-3.1-8B, Gemma-2-9B, and Qwen2.5-7B demonstrate that CSO outperforms standard SFT, highlighting the efficacy of fine-grained, turn-level preference modeling in ESC.
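The abstract does not name the training objective behind CSO's turn-level preference optimization. As one illustrative assumption, the sketch below applies a DPO-style loss to each turn's (chosen, rejected) pair and averages over the turns of a dialogue. The inputs are summed sequence log-probabilities under the policy and a frozen reference model; `dpo_turn_loss`, `dialogue_loss`, and `beta` are hypothetical names and values, and DPO here is a swapped-in technique rather than the paper's confirmed method.

```python
import math


def _neg_log_sigmoid(x):
    # -log(sigmoid(x)) == softplus(-x), computed without overflow for large |x|
    if x >= 0:
        return math.log1p(math.exp(-x))
    return -x + math.log1p(math.exp(x))


def dpo_turn_loss(pol_chosen, pol_rejected, ref_chosen, ref_rejected, beta=0.1):
    """DPO-style loss for one turn-level pair (assumed objective).

    Each argument is a sequence log-probability; the loss shrinks as the policy
    raises the chosen response relative to the reference more than the rejected one.
    """
    margin = (pol_chosen - ref_chosen) - (pol_rejected - ref_rejected)
    return _neg_log_sigmoid(beta * margin)


def dialogue_loss(turn_pairs, beta=0.1):
    """Average the per-turn losses over every preference pair in one dialogue."""
    if not turn_pairs:
        raise ValueError("need at least one turn-level pair")
    losses = [dpo_turn_loss(*pair, beta=beta) for pair in turn_pairs]
    return sum(losses) / len(losses)
```

When policy and reference agree exactly, every turn contributes log 2; the loss falls below that only when the policy favors the chosen strategy-response more than the reference does.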

arXiv information

Authors	Weixiang Zhao, Xingyu Sui, Xinyang Han, Yang Deng, Yulin Hu, Jiahe Guo, Libo Qin, Qianyun Du, Shijin Wang, Yanyan Zhao, Bing Qin, Ting Liu
Published	2025-03-07 12:07:59+00:00
arXiv site	arxiv_id(pdf)

Source, services used

arxiv.jp, Google
