Speculate, then Collaborate: Fusing Knowledge of Language Models during Decoding

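Summary

Large language models (LLMs) often excel in specific domains but fall short in others because of the limitations of their training.
Enabling LLMs to solve problems collaboratively, by integrating their complementary knowledge, therefore promises to improve their performance across domains.
To realize this potential, we introduce Collaborative Speculative Decoding (CoSD), a new algorithm that enables efficient LLM knowledge fusion at test time without requiring any additional model training.
CoSD uses a draft model to generate initial sequences, and an easy-to-learn rule or decision tree to decide when to invoke an assistant model to improve those drafts.
Beyond strengthening knowledge fusion, CoSD improves inference efficiency, transfers across domains and models, and offers greater explainability.
Experimental results show that CoSD improves accuracy by up to 10% across benchmarks compared with existing methods, providing a scalable and effective solution for LLM-based applications.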
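
The draft-then-decide loop described above can be sketched roughly as follows. This is only an illustrative sketch under stated assumptions, not the authors' implementation: the abstract does not say what the rule looks like, so the probability-threshold rule `should_use_assistant`, its parameters `alpha` and `beta`, the block size, and the toy lookup-table "models" are all hypothetical. A real system would also score the drafted block with the assistant in one parallel pass rather than position by position.

```python
from typing import Callable, List, Tuple

# Hypothetical model interface: given a token prefix, return (next_token, probability).
Model = Callable[[List[str]], Tuple[str, float]]


def should_use_assistant(draft_tok: str, draft_p: float,
                         asst_tok: str, asst_p: float,
                         alpha: float = 0.5, beta: float = 1.0) -> bool:
    """Assumed rule (not from the abstract): switch to the assistant's token when
    the models disagree, the draft is unsure, and the assistant is more confident."""
    return draft_tok != asst_tok and draft_p < alpha and asst_p > beta * draft_p


def cosd_sketch(draft: Model, assistant: Model, prompt: List[str],
                max_new_tokens: int, block: int = 4,
                alpha: float = 0.5, beta: float = 1.0) -> List[str]:
    out = list(prompt)
    target_len = len(prompt) + max_new_tokens
    while len(out) < target_len:
        # 1) The draft model proposes a short block of tokens.
        ctx = list(out)
        proposal = []
        for _ in range(min(block, target_len - len(out))):
            tok, p = draft(ctx)
            proposal.append((tok, p))
            ctx.append(tok)
        # 2) The rule decides, position by position, whether to keep the draft token.
        for tok, p in proposal:
            a_tok, a_p = assistant(out)
            if should_use_assistant(tok, p, a_tok, a_p, alpha, beta):
                out.append(a_tok)
                break  # drop the rest of the block and redraft from the new prefix
            out.append(tok)
    return out[len(prompt):]


# Toy stand-ins for real LLMs: next token looked up by prefix length.
def toy_draft(prefix: List[str]) -> Tuple[str, float]:
    table = [("the", 0.9), ("cat", 0.9), ("flies", 0.2), ("home", 0.9)]
    return table[len(prefix) % len(table)]


def toy_assistant(prefix: List[str]) -> Tuple[str, float]:
    table = [("the", 0.8), ("cat", 0.8), ("sleeps", 0.7), ("home", 0.8)]
    return table[len(prefix) % len(table)]


result = cosd_sketch(toy_draft, toy_assistant, [], max_new_tokens=4)
print(result)  # ['the', 'cat', 'sleeps', 'home']
```

In this toy run the draft model is unsure about its third token, so the assumed rule hands that position to the assistant and drafting resumes from the corrected prefix. A learned decision tree over the same features could stand in for the hand-written rule.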

Abstract (original)

Large Language Models (LLMs) often excel in specific domains but fall short in others due to the limitations of their training. Thus, enabling LLMs to solve problems collaboratively by integrating their complementary knowledge promises to improve their performance across domains. To realize this potential, we introduce a novel Collaborative Speculative Decoding (CoSD) algorithm that enables efficient LLM knowledge fusion at test time without requiring additional model training. CoSD employs a draft model to generate initial sequences and an easy-to-learn rule or decision tree to decide when to invoke an assistant model to improve these drafts. CoSD not only enhances knowledge fusion but also improves inference efficiency, is transferable across domains and models, and offers greater explainability. Experimental results demonstrate that CoSD improves accuracy by up to 10% across benchmarks compared to existing methods, providing a scalable and effective solution for LLM-based applications.

arXiv information

Authors	Ziyao Wang, Muneeza Azmat, Ang Li, Raya Horesh, Mikhail Yurochkin
Published	2025-03-19 16:26:10+00:00
arXiv page	arxiv_id(pdf)

Source, services used

arxiv.jp, Google
