Revealing User Familiarity Bias in Task-Oriented Dialogue via Interactive Evaluation

Summary

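Most task-oriented dialogue (TOD) benchmarks assume that users know exactly how to use the system: strict user goals constrain user behavior to stay within the system's capabilities, a bias called "user familiarity" bias.
This data bias becomes more severe when combined with data-driven TOD systems, because existing static evaluations cannot reveal its effect.
We therefore conduct an interactive user study to reveal how vulnerable TOD systems are to realistic scenarios.
In particular, we compare users given 1) detailed goal instructions that conform to the system's boundaries (closed-goal) and 2) vague goal instructions that are often unsupported but realistic (open-goal).
Our study finds that conversations in the open-goal setting lead to catastrophic system failures, with 92% of the dialogues having significant issues.
We further conduct a thorough analysis, through error annotation, to identify the features that distinguish the two settings.
From this analysis, we discover a novel "pretending" behavior, in which the system pretends to handle user requests even though they are beyond its capabilities.
We discuss the characteristics and toxicity of this behavior and show that recent large language models can also suffer from it.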

Summary (original)

Most task-oriented dialogue (TOD) benchmarks assume users that know exactly how to use the system by constraining the user behaviors within the system’s capabilities via strict user goals, namely ‘user familiarity’ bias. This data bias deepens when it combines with data-driven TOD systems, as it is impossible to fathom the effect of it with existing static evaluations. Hence, we conduct an interactive user study to unveil how vulnerable TOD systems are against realistic scenarios. In particular, we compare users with 1) detailed goal instructions that conform to the system boundaries (closed-goal) and 2) vague goal instructions that are often unsupported but realistic (open-goal). Our study reveals that conversations in open-goal settings lead to catastrophic failures of the system, in which 92% of the dialogues had significant issues. Moreover, we conduct a thorough analysis to identify distinctive features between the two settings through error annotation. From this, we discover a novel ‘pretending’ behavior, in which the system pretends to handle the user requests even though they are beyond the system’s capabilities. We discuss its characteristics and toxicity while showing recent large language models can also suffer from this behavior.

arXiv information

Authors	Takyoung Kim, Jamin Shin, Young-Ho Kim, Sanghwan Bae, Sungdong Kim
Published	2024-07-01 09:23:10+00:00
arXiv site	arxiv_id(pdf)

Sources and services used

arxiv.jp, Google
