Multi-dimensional Evaluation of Empathetic Dialog Responses

Summary

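Empathy is critical for effective and satisfying conversational communication.
Prior efforts to measure conversational empathy have focused mainly on expressed communicative intents, that is, the way empathy is expressed.
These studies, however, overlook the fact that conversation is also a collaboration involving both speakers and listeners.
In contrast, we propose a multi-dimensional empathy evaluation framework that measures both expressed intents from the speaker's perspective and perceived empathy from the listener's perspective.
We apply the proposed framework to analyze our internal customer-service dialogues.
We find that the two dimensions (expressed intent types and perceived empathy) are interconnected, and that perceived empathy correlates highly with dialogue satisfaction levels.
To reduce annotation cost, we explore different options for measuring conversational empathy automatically: prompting LLMs and training language-model-based classifiers.
Our experiments show that prompting methods, even with popular models such as GPT-4 and Flan family models, perform relatively poorly on both public datasets and our internal dataset.
In contrast, instruction-finetuned classifiers based on Flan-T5 family models outperform prior work and competitive baselines.
We conduct a detailed ablation study to give further insight into the strong performance of the instruction finetuning method.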
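
The summary reports that perceived empathy correlates highly with dialogue satisfaction but does not say which correlation measure was used. As a minimal sketch, assuming a Spearman rank correlation over ordinal ratings, the calculation could look like the following. The function names and the toy ratings are hypothetical and invented purely for illustration; they are not data or code from the paper.

```python
from statistics import mean


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next sorted value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = shared_rank
        start = end + 1
    return ranks


def pearson(xs, ys):
    """Pearson correlation; undefined (division by zero) if either input is constant."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def spearman(xs, ys):
    """Spearman correlation computed as the Pearson correlation of the ranks."""
    return pearson(average_ranks(xs), average_ranks(ys))


# Hypothetical per-dialogue ratings on a 1-5 scale (invented, not from the paper).
perceived_empathy = [1, 2, 2, 3, 4, 5]
satisfaction = [1, 1, 3, 3, 4, 5]

rho = spearman(perceived_empathy, satisfaction)
print(f"Spearman correlation on the toy ratings: {rho:.3f}")
```

A rank-based measure is a reasonable default for Likert-style ratings because it depends only on ordering, not on the spacing between scale points.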

Summary (original)

Empathy is critical for effective and satisfactory conversational communication. Prior efforts to measure conversational empathy mostly focus on expressed communicative intents — that is, the way empathy is expressed. Yet, these works ignore the fact that conversation is also a collaboration involving both speakers and listeners. In contrast, we propose a multi-dimensional empathy evaluation framework to measure both expressed intents from the speaker’s perspective and perceived empathy from the listener’s perspective. We apply our proposed framework to analyze our internal customer-service dialogue. We find the two dimensions (expressed intent types and perceived empathy) are inter-connected, and perceived empathy has a high correlation with dialogue satisfaction levels. To reduce the annotation cost, we explore different options to automatically measure conversational empathy: prompting LLMs and training language model-based classifiers. Our experiments show that prompting methods with even popular models like GPT-4 and Flan family models perform relatively poorly on both public and our internal datasets. In contrast, instruction-finetuned classifiers based on Flan-T5 family models outperform prior works and competitive baselines. We conduct a detailed ablation study to give more insights into instruction finetuning method’s strong performance.

arXiv information

Authors	Zhichao Xu, Jiepu Jiang
Published	2024-04-16 16:34:02+00:00
arXiv site	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
