AutoJudge: Judge Decoding Without Manual Annotation

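Summary

We introduce AutoJudge, a framework that speeds up large language model (LLM) inference using task-specific lossy speculative decoding.
Rather than matching the original model's output distribution token by token, it identifies which generated tokens affect the downstream quality of the response and relaxes the guarantee so that "unimportant" tokens can be generated faster.
The approach relies on a semi-greedy search algorithm to test which mismatches between the target and draft models must be corrected to preserve quality and which can be skipped.
A lightweight classifier, built on existing LLM embeddings, is then trained to predict at inference time which mismatching tokens can be safely accepted without compromising final answer quality.
With Llama 3.2 1B (draft) and Llama 3.1 8B (target) on zero-shot GSM8K reasoning, the method accepts up to 1.5x more tokens per verification cycle with under 1% degradation in answer accuracy relative to standard speculative decoding, and over 2x with a small loss in accuracy.
On the LiveCodeBench benchmark, the approach automatically detects other, programming-specific important tokens and shows similar speedups, demonstrating that it generalizes across tasks.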
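To make the idea of relaxed verification concrete, here is a minimal sketch of one verification cycle in the greedy setting. It is an illustration under stated assumptions, not the authors' implementation: the function name `accepted_prefix_length` and the `is_unimportant` callback are hypothetical, and the callback stands in for the trained classifier. `target[i]` is assumed to be the target model's greedy choice given the draft prefix up to position `i`.

```python
from typing import Callable, Sequence


def accepted_prefix_length(
    draft: Sequence[int],
    target: Sequence[int],
    is_unimportant: Callable[[int, int, int], bool] = lambda pos, d, t: False,
) -> int:
    """Count the draft tokens accepted in one verification cycle.

    With the default callback this is the usual greedy rule: accept draft
    tokens until the first mismatch with the target. A callback that returns
    True for some mismatches (hypothetical stand-in for a learned judge)
    lets those mismatches through, so the accepted prefix can be longer.
    """
    accepted = 0
    for pos, (d, t) in enumerate(zip(draft, target)):
        if d == t or is_unimportant(pos, d, t):
            accepted += 1
        else:
            break  # first rejected mismatch ends the accepted prefix
    return accepted


draft_tokens = [1, 2, 3, 4]
target_tokens = [1, 9, 3, 8]

# Standard rule: stops at the first mismatch (position 1).
strict = accepted_prefix_length(draft_tokens, target_tokens)

# Relaxed rule: a toy judge that treats the mismatch at position 1 as harmless.
relaxed = accepted_prefix_length(
    draft_tokens, target_tokens, lambda pos, d, t: pos == 1
)
print(strict, relaxed)
```

In this toy case the relaxed rule accepts three tokens instead of one, which is the kind of gain in accepted tokens per verification cycle that the summary reports.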

Summary (original)

We introduce AutoJudge, a framework that accelerates large language model (LLM) inference with task-specific lossy speculative decoding. Instead of matching the original model output distribution token-by-token, we identify which of the generated tokens affect the downstream quality of the generated response, relaxing the guarantee so that the ‘unimportant’ tokens can be generated faster. Our approach relies on a semi-greedy search algorithm to test which of the mismatches between target and draft model should be corrected to preserve quality, and which ones may be skipped. We then train a lightweight classifier based on existing LLM embeddings to predict, at inference time, which mismatching tokens can be safely accepted without compromising the final answer quality. We test our approach with Llama 3.2 1B (draft) and Llama 3.1 8B (target) models on zero-shot GSM8K reasoning, where it achieves up to 1.5x more accepted tokens per verification cycle with under 1% degradation in answer accuracy compared to standard speculative decoding and over 2x with small loss in accuracy. When applied to the LiveCodeBench benchmark, our approach automatically detects other, programming-specific important tokens and shows similar speedups, demonstrating its ability to generalize across tasks.
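The search step described in the abstract can be sketched as follows. This is one plausible reading of "test which of the mismatches should be corrected", not the paper's exact algorithm: at each mismatch the draft token is tried, the rest of the response is completed with the target model, and the token is labeled important only if the final answer breaks. All names (`label_mismatches`, `draft_next`, `target_next`, `is_correct`) and the position-based toy models are hypothetical.

```python
from typing import Callable, List, Tuple

Token = str
Label = Tuple[int, Token, Token, bool]  # (position, draft, target, important)


def label_mismatches(
    draft_next: Callable[[List[Token]], Token],
    target_next: Callable[[List[Token]], Token],
    is_correct: Callable[[List[Token]], bool],
    max_len: int,
    eos: Token = "<eos>",
) -> Tuple[List[Token], List[Label]]:
    """Walk left to right and label each draft/target mismatch."""

    def unfinished(seq: List[Token]) -> bool:
        return len(seq) < max_len and (not seq or seq[-1] != eos)

    def complete(prefix: List[Token]) -> List[Token]:
        # Finish the response with the target model only.
        seq = list(prefix)
        while unfinished(seq):
            seq.append(target_next(seq))
        return seq

    prefix: List[Token] = []
    labels: List[Label] = []
    while unfinished(prefix):
        d, t = draft_next(prefix), target_next(prefix)
        if d == t:
            prefix.append(t)
            continue
        # Try the draft token; it is important if the answer becomes wrong.
        important = not is_correct(complete(prefix + [d]))
        labels.append((len(prefix), d, t, important))
        prefix.append(t if important else d)
    return prefix, labels


# Toy models that depend only on the position (an assumption for the demo).
TARGET = ["So", "2", "+", "2", "=", "4", "<eos>"]
DRAFT = ["Thus", "2", "+", "2", "=", "5", "<eos>"]

final, labels = label_mismatches(
    draft_next=lambda p: DRAFT[len(p)],
    target_next=lambda p: TARGET[len(p)],
    is_correct=lambda seq: "4" in seq,  # toy answer check
    max_len=len(TARGET),
)
print(final)
print(labels)
```

Here the stylistic mismatch ("Thus" vs "So") is labeled unimportant and kept, while the wrong digit is labeled important and replaced by the target token. Labels of this kind are what a lightweight classifier could then be trained on.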

arXiv information

Authors	Roman Garipov, Fedor Velikonivtsev, Ruslan Svirschevski, Vage Egiazarian, Max Ryabinin
Published	2025-04-28 17:59:28+00:00

Source and services used

arxiv.jp, Google
