Coupling without Communication and Drafter-Invariant Speculative Decoding

Summary

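Suppose Alice has a distribution $P$ and Bob has a distribution $Q$.
Alice wants to draw a sample $a \sim P$ and Bob a sample $b \sim Q$ such that $a = b$ with as high a probability as possible.
It is well known that, by sampling from an optimal coupling between the distributions, Alice and Bob can achieve $\Pr[a = b] = 1 - D_{TV}(P,Q)$, where $D_{TV}(P,Q)$ is the total variation distance between $P$ and $Q$.
What if Alice and Bob must solve this same problem \emph{without communicating at all}? Perhaps surprisingly, with access to public randomness, they can still achieve $\Pr[a = b] \geq \frac{1 - D_{TV}(P,Q)}{1 + D_{TV}(P,Q)} \geq 1 - 2D_{TV}(P,Q)$ using a simple protocol based on the Weighted MinHash algorithm.
This bound was shown to be optimal in the worst case by [Bavarian et al., 2020].
The paper revisits this communication-free coupling problem.
It provides a simpler proof of the optimality result from [Bavarian et al., 2020].
Although the worst-case success probability of Weighted MinHash cannot be improved, the authors show that an equally simple protocol based on Gumbel sampling offers a Pareto improvement: for every pair of distributions $P, Q$, Gumbel sampling achieves an equal or higher value of $\Pr[a = b]$ than Weighted MinHash.
Importantly, this improvement translates to practice.
The authors demonstrate an application of communication-free coupling to \emph{speculative decoding}, a recent method for accelerating autoregressive large language models [Leviathan, Kalman, Matias, ICML 2023].
They show that communication-free protocols can be used to construct \emph{Drafter-Invariant Speculative Decoding} schemes, whose output is fixed given a fixed random seed, regardless of which drafter is used for speculation.
In experiments on a language generation task, Gumbel sampling outperforms Weighted MinHash.
Code is available at https://github.com/majid-daliri/DISD.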
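
As a quick sanity check on the three quantities above, here is a small worked example. The two-outcome distributions are illustrative choices, not taken from the paper.

```latex
\begin{aligned}
P &= (1/2,\ 1/2), \qquad Q = (3/4,\ 1/4) \\
D_{TV}(P,Q) &= \tfrac{1}{2}\left( |1/2 - 3/4| + |1/2 - 1/4| \right) = 1/4 \\
1 - D_{TV}(P,Q) &= 3/4 && \text{(optimal coupling, requires coordination)} \\
\frac{1 - D_{TV}(P,Q)}{1 + D_{TV}(P,Q)} &= \frac{3/4}{5/4} = 3/5 && \text{(communication-free guarantee)} \\
1 - 2D_{TV}(P,Q) &= 1/2 && \text{(weaker linear bound)}
\end{aligned}
```

For this pair, giving up communication costs at most $3/4 - 3/5 = 0.15$ in agreement probability.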

Abstract (original)

Suppose Alice has a distribution $P$ and Bob has a distribution $Q$. Alice wants to draw a sample $a\sim P$ and Bob a sample $b \sim Q$ such that $a = b$ with as high a probability as possible. It is well-known that, by sampling from an optimal coupling between the distributions, Alice and Bob can achieve $\Pr[a = b] = 1 - D_{TV}(P,Q)$, where $D_{TV}(P,Q)$ is the total variation distance between $P$ and $Q$. What if Alice and Bob must solve this same problem \emph{without communicating at all?} Perhaps surprisingly, with access to public randomness, they can still achieve $\Pr[a = b] \geq \frac{1 - D_{TV}(P,Q)}{1 + D_{TV}(P,Q)} \geq 1-2D_{TV}(P,Q)$ using a simple protocol based on the Weighted MinHash algorithm. This bound was shown to be optimal in the worst-case by [Bavarian et al., 2020]. In this work, we revisit the communication-free coupling problem. We provide a simpler proof of the optimality result from [Bavarian et al., 2020]. We show that, while the worst-case success probability of Weighted MinHash cannot be improved, an equally simple protocol based on Gumbel sampling offers a Pareto improvement: for every pair of distributions $P, Q$, Gumbel sampling achieves an equal or higher value of $\Pr[a = b]$ than Weighted MinHash. Importantly, this improvement translates to practice. We demonstrate an application of communication-free coupling to \emph{speculative decoding}, a recent method for accelerating autoregressive large language models [Leviathan, Kalman, Matias, ICML 2023]. We show that communication-free protocols can be used to construct \emph{Drafter-Invariant Speculative Decoding} schemes, which have the desirable property that their output is fixed given a fixed random seed, regardless of what drafter is used for speculation. In experiments on a language generation task, Gumbel sampling outperforms Weighted MinHash. Code is available at https://github.com/majid-daliri/DISD.
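
The Gumbel-sampling idea in the abstract can be sketched in a few lines. This is a minimal illustration, not the authors' implementation (their repository is the reference for that). It assumes the protocol is the standard Gumbel-max trick applied with noise that both parties derive from a common seed, which stands in for the public randomness. The function names (`shared_gumbel_noise`, `gumbel_sample`, `agreement_rate`) and the example distributions are hypothetical.

```python
import math
import random


def shared_gumbel_noise(vocab_size, seed):
    """Return one standard Gumbel variate per item, derived only from `seed`."""
    rng = random.Random(seed)  # public randomness: both parties know the seed
    noise = []
    for _ in range(vocab_size):
        e = rng.expovariate(1.0)  # Exp(1); -log(Exp(1)) is a standard Gumbel
        noise.append(-math.log(e) if e > 0.0 else math.inf)
    return noise


def gumbel_sample(probs, noise):
    """Gumbel-max: argmax_i (log probs[i] + noise[i]) over items with probs[i] > 0."""
    best_index, best_score = -1, -math.inf
    for i, (p, g) in enumerate(zip(probs, noise)):
        if p <= 0.0:
            continue  # zero-probability items can never be selected
        score = math.log(p) + g
        if score > best_score:
            best_index, best_score = i, score
    return best_index


def total_variation(p, q):
    """Total variation distance between two distributions on the same support."""
    return 0.5 * sum(abs(pi - qi) for pi, qi in zip(p, q))


def agreement_rate(p, q, trials=20000, base_seed=0):
    """Monte Carlo estimate of Pr[a = b] when only the per-trial seed is shared."""
    matches = 0
    for t in range(trials):
        # Each party rebuilds the noise on its own; no message is exchanged.
        noise_alice = shared_gumbel_noise(len(p), seed=base_seed + t)
        noise_bob = shared_gumbel_noise(len(q), seed=base_seed + t)
        a = gumbel_sample(p, noise_alice)  # Alice only looks at P
        b = gumbel_sample(q, noise_bob)    # Bob only looks at Q
        matches += (a == b)
    return matches / trials


if __name__ == "__main__":
    P = [0.5, 0.3, 0.2]  # illustrative distributions, not from the paper
    Q = [0.4, 0.4, 0.2]
    d = total_variation(P, Q)
    print("D_TV(P, Q):", d)
    print("optimal coupling, 1 - D_TV:", 1 - d)
    print("communication-free guarantee, (1 - D_TV) / (1 + D_TV):", (1 - d) / (1 + d))
    print("estimated Gumbel agreement:", agreement_rate(P, Q))
```

Each party's sample is still distributed exactly according to its own distribution, because the Gumbel-max trick is an exact sampler. The coupling comes only from reusing the same noise vector. In a speculative decoding setting, the drafter and the target model would play the roles of Alice and Bob, which is what makes the target's output depend on the seed and not on the drafter.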

arXiv information

Authors	Majid Daliri, Christopher Musco, Ananda Theertha Suresh
Published	2025-01-28 18:23:59+00:00
arXiv page	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
