Practical token pruning for foundation models in few-shot conversational virtual assistant systems

Abstract

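In an enterprise virtual assistant (VA) system, intent classification is a key component that determines how a user input is handled based on what the user wants.
A VA system is expected to be a cost-efficient SaaS service with short training and inference times that achieves high accuracy even with few training samples.
We pretrain a transformer-based sentence embedding model with a contrastive learning objective and use the model's embeddings as features when training intent classification models.
Our approach achieves state-of-the-art results in few-shot scenarios and outperforms other commercial solutions on popular intent classification benchmarks.
However, generating features with a transformer-based model increases inference time, especially for long user inputs, because of the quadratic runtime of the transformer's attention mechanism.
In addition to model distillation, we introduce a practical multi-task adaptation approach that configures dynamic token pruning without requiring task-specific training for intent classification.
We demonstrate that this approach improves the inference speed of popular sentence transformer models without affecting model performance.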

Abstract (original)

In an enterprise Virtual Assistant (VA) system, intent classification is the crucial component that determines how a user input is handled based on what the user wants. The VA system is expected to be a cost-efficient SaaS service with low training and inference time while achieving high accuracy even with a small number of training samples. We pretrain a transformer-based sentence embedding model with a contrastive learning objective and leverage the embedding of the model as features when training intent classification models. Our approach achieves the state-of-the-art results for few-shot scenarios and performs better than other commercial solutions on popular intent classification benchmarks. However, generating features via a transformer-based model increases the inference time, especially for longer user inputs, due to the quadratic runtime of the transformer’s attention mechanism. On top of model distillation, we introduce a practical multi-task adaptation approach that configures dynamic token pruning without the need for task-specific training for intent classification. We demonstrate that this approach improves the inference speed of popular sentence transformer models without affecting model performance.
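
The abstract attributes the slowdown on long inputs to the quadratic cost of attention and proposes dynamic token pruning as a remedy. The following is a minimal sketch of that general idea only, not the authors' method: the abstract does not say how tokens are scored or where pruning happens, so the function names `prune_tokens` and `attention_pairs`, the `keep_ratio` parameter, and the importance scores are all hypothetical choices made for illustration.

```python
from typing import List, Sequence


def prune_tokens(tokens: Sequence[str], scores: Sequence[float], keep_ratio: float) -> List[str]:
    """Keep the highest-scoring tokens, preserving their original order.

    Assumption: each token already has an importance score. How such scores
    are obtained is not described in the abstract.
    """
    if len(tokens) != len(scores):
        raise ValueError("tokens and scores must have the same length")
    if not 0.0 < keep_ratio <= 1.0:
        raise ValueError("keep_ratio must be in (0, 1]")
    keep = max(1, int(len(tokens) * keep_ratio))
    # Indices of the `keep` largest scores; ties go to the earlier position.
    top = sorted(range(len(tokens)), key=lambda i: (-scores[i], i))[:keep]
    return [tokens[i] for i in sorted(top)]


def attention_pairs(seq_len: int) -> int:
    """Query-key pairs scored by self-attention for one head in one layer."""
    return seq_len * seq_len


tokens = ["i", "would", "like", "to", "reset", "my", "password", "please"]
# Hypothetical importance scores, made up for this example.
scores = [0.05, 0.02, 0.08, 0.01, 0.40, 0.10, 0.30, 0.04]

kept = prune_tokens(tokens, scores, keep_ratio=0.5)
print(kept)
print(attention_pairs(len(tokens)), attention_pairs(len(kept)))
```

In this toy setting, halving the sequence length cuts the number of attention pairs to a quarter, which is why pruning helps most on long inputs.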

arXiv information

Authors	Haode Qi, Cheng Qian, Jian Ni, Pratyush Singh, Reza Fazeli, Gengyu Wang, Zhongzheng Shu, Eric Wayne, Juergen Bross
Published	2024-08-21 17:42:17+00:00
arXiv page	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
