Learning LLM Preference over Intra-Dialogue Pairs: A Framework for Utterance-level Understandings

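Summary

Large language models (LLMs) have demonstrated remarkable capabilities in handling complex dialogue tasks without requiring use case-specific fine-tuning.
However, analyzing live dialogues in real time requires low-latency processing systems, and latency constraints make it impractical to deploy models with billions of parameters.
As a result, practitioners often prefer smaller models with millions of parameters, trained on high-quality, human-annotated datasets.
Curating such datasets, however, is both time-consuming and costly.
Consequently, there is a growing need to combine the scalability of LLM-generated labels with the precision of human annotations, so that fine-tuned smaller models can achieve both higher speed and accuracy comparable to larger models.
This paper introduces a simple yet effective framework to address this challenge.
The approach is specifically designed for per-utterance classification problems, which cover tasks such as intent detection and dialogue state tracking.
To mitigate the impact of labeling errors from LLMs, the primary source of inaccuracies in student models, the authors propose a noise-reduced preference learning loss.
Experimental results show that the method significantly improves accuracy across utterance-level dialogue tasks, including sentiment detection (over $2\%$) and dialogue act classification (over $1.5\%$).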

Summary (original)

Large language models (LLMs) have demonstrated remarkable capabilities in handling complex dialogue tasks without requiring use case-specific fine-tuning. However, analyzing live dialogues in real-time necessitates low-latency processing systems, making it impractical to deploy models with billions of parameters due to latency constraints. As a result, practitioners often prefer smaller models with millions of parameters, trained on high-quality, human-annotated datasets. Yet, curating such datasets is both time-consuming and costly. Consequently, there is a growing need to combine the scalability of LLM-generated labels with the precision of human annotations, enabling fine-tuned smaller models to achieve both higher speed and accuracy comparable to larger models. In this paper, we introduce a simple yet effective framework to address this challenge. Our approach is specifically designed for per-utterance classification problems, which encompass tasks such as intent detection, dialogue state tracking, and more. To mitigate the impact of labeling errors from LLMs — the primary source of inaccuracies in student models — we propose a noise-reduced preference learning loss. Experimental results demonstrate that our method significantly improves accuracy across utterance-level dialogue tasks, including sentiment detection (over $2\%$), dialogue act classification (over $1.5\%$), etc.
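
The abstract does not give the form of the noise-reduced preference learning loss. As a rough illustration only, the sketch below shows one generic way a student model could be trained on noisy LLM preferences between two utterances from the same dialogue: a Bradley-Terry style pairwise loss with label smoothing. The function names, the `eps` parameter, and the smoothing scheme are assumptions made for this example and are not taken from the paper.

```python
import math


def log_sigmoid(x: float) -> float:
    """Numerically stable log(sigmoid(x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def smoothed_pairwise_loss(
    score_a: float,
    score_b: float,
    llm_prefers_a: bool,
    eps: float = 0.1,
) -> float:
    """Pairwise preference loss with label smoothing (hypothetical sketch).

    score_a, score_b: student-model scores for two utterances drawn from
        the same dialogue (for example, how strongly each expresses a class).
    llm_prefers_a: the LLM's possibly noisy judgment that utterance A
        ranks above utterance B.
    eps: assumed probability that the LLM preference is wrong; eps = 0
        recovers the plain Bradley-Terry logistic loss.
    """
    margin = score_a - score_b
    # Soft target: trust the LLM label with probability 1 - eps.
    target = 1.0 - eps if llm_prefers_a else eps
    return -(target * log_sigmoid(margin)
             + (1.0 - target) * log_sigmoid(-margin))


if __name__ == "__main__":
    # Student agrees with the LLM preference: small loss.
    print(smoothed_pairwise_loss(2.0, 0.0, llm_prefers_a=True))
    # Student disagrees with the LLM preference: larger loss.
    print(smoothed_pairwise_loss(0.0, 2.0, llm_prefers_a=True))
```

With `eps > 0`, a pair on which the student confidently disagrees with the LLM is penalized less than under the unsmoothed loss, which is one simple way to limit the influence of mislabeled pairs. Whether the paper's loss works this way is not stated in the abstract.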

arXiv information

Authors	Xuanqing Liu, Luyang Kong, Wei Niu, Afshin Khashei, Belinda Zeng, Steve Johnson, Jon Jay, Davor Golac, Matt Pope
Published	2025-03-07 17:46:13+00:00
arXiv page	arxiv_id(pdf)

Source, services used

arxiv.jp, Google
