Improving Generalization in Intent Detection: GRPO with Reward-Based Curriculum Sampling

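Abstract

Intent detection, a key component of task-oriented dialogue (TOD) systems, faces significant challenges in adapting to the rapid influx of integrable tools with complex interrelationships.
Existing approaches, such as zero-shot reformulation and LLM-based dynamic recognition, suffer performance degradation when they encounter unseen intents, which leads to incorrect task routing.
To improve the model's generalization to unseen tasks, the authors apply reinforcement learning (RL) combined with Reward-based Curriculum Sampling (RCS) during Group Relative Policy Optimization (GRPO) training on intent detection tasks.
Experiments show that RL-trained models substantially outperform supervised fine-tuning (SFT) baselines in generalization.
Introducing RCS further strengthens the effectiveness of RL for intent detection by focusing the model on challenging cases during training.
In addition, incorporating a Chain-of-Thought (COT) process into RL improves generalization, particularly on complex intent detection tasks, which underscores the importance of explicit reasoning in challenging scenarios.
This work advances the generalization of intent detection and offers practical insights for deploying adaptable dialogue systems.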
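The abstract does not spell out how GRPO and RCS are implemented, so the following is only a minimal sketch under stated assumptions. It shows the standard group-relative advantage used in GRPO (each sampled response's reward normalized by the mean and standard deviation of its group) and one plausible reading of reward-based curriculum sampling (keeping the training examples whose average reward is lowest). The function names, the `fraction` parameter, and the selection rule are hypothetical and are not taken from the paper.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-6):
    """Normalize rewards within one group of responses sampled for the same prompt.

    This is the usual GRPO-style advantage: (r - group mean) / (group std + eps).
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


def select_hard_examples(mean_rewards, fraction=0.5):
    """Hypothetical curriculum step: keep the lowest-reward fraction of examples.

    mean_rewards maps an example id to the average reward the current policy
    obtained on it; a low average reward is treated here as "challenging".
    """
    k = max(1, int(len(mean_rewards) * fraction))
    ranked = sorted(mean_rewards, key=mean_rewards.get)  # ascending by reward
    return ranked[:k]


if __name__ == "__main__":
    # Four sampled responses for one utterance: reward 1 if the intent is correct.
    print(group_relative_advantages([1.0, 0.0, 1.0, 0.0]))
    # Average reward per training example under the current policy (made-up values).
    print(select_hard_examples({"a": 0.9, "b": 0.1, "c": 0.5, "d": 0.3}))
```

In this reading, a later training stage would be run mostly on the examples returned by `select_hard_examples`; whether the paper ranks, thresholds, or reweights examples is not stated in the abstract.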

Abstract (original)

Intent detection, a critical component in task-oriented dialogue (TOD) systems, faces significant challenges in adapting to the rapid influx of integrable tools with complex interrelationships. Existing approaches, such as zero-shot reformulations and LLM-based dynamic recognition, struggle with performance degradation when encountering unseen intents, leading to erroneous task routing. To enhance the model’s generalization performance on unseen tasks, we employ Reinforcement Learning (RL) combined with a Reward-based Curriculum Sampling (RCS) during Group Relative Policy Optimization (GRPO) training in intent detection tasks. Experiments demonstrate that RL-trained models substantially outperform supervised fine-tuning (SFT) baselines in generalization. Besides, the introduction of the RCS, significantly bolsters the effectiveness of RL in intent detection by focusing the model on challenging cases during training. Moreover, incorporating Chain-of-Thought (COT) processes in RL notably improves generalization in complex intent detection tasks, underscoring the importance of thought in challenging scenarios. This work advances the generalization of intent detection tasks, offering practical insights for deploying adaptable dialogue systems.

arXiv information

Authors	Zihao Feng,Xiaoxue Wang,Ziwei Bai,Donghang Su,Bowen Wu,Qun Yu,Baoxun Wang
Published	2025-04-21 03:29:14+00:00
arXiv site	arxiv_id(pdf)

Sources and services used

arxiv.jp, Google
