Token-level Ensembling of Models with Different Vocabularies

Abstract (original)

Model ensembling is a technique to combine the predicted distributions of two or more models, often leading to improved robustness and performance. For ensembling in text generation, the next token’s probability distribution is derived from a weighted sum of the distributions of each individual model. This requires the underlying models to share the same subword vocabulary, limiting the applicability of ensembling, since many open-sourced models have distinct vocabularies. In research settings, experimentation or upgrades to vocabularies may introduce multiple vocabulary sizes. This paper proposes an inference-time only algorithm that allows for ensembling models with different vocabularies, without the need to learn additional parameters or alter the underlying models. Instead, the algorithm ensures that tokens generated by the ensembled models agree in their surface form. We apply this technique to combinations of traditional encoder-decoder models and decoder-only LLMs and evaluate on machine translation. In addition to expanding to model pairs that were previously incapable of token-level ensembling, our algorithm frequently improves translation performance over either model individually.
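The abstract describes two ideas. Conventional token-level ensembling takes a weighted sum of next-token distributions, so it needs a shared vocabulary. The proposed method instead checks agreement in surface form, which works across vocabularies. The Python sketch below illustrates both as a toy under stated assumptions. It is not the authors' algorithm.

The assumptions are:
- Distributions are plain dicts keyed by token strings.
- Subwords use the SentencePiece-style '▁' word-boundary marker.
- `ensemble_shared_vocab`, `surface`, and `agree` are hypothetical helper names.

The prefix test only shows what it could mean for two partial outputs to agree. The paper's actual search procedure is not reproduced here.

```python
from typing import Dict, List, Sequence

# Toy next-token distribution: token string -> probability (an assumption;
# real models expose logits over integer token IDs).
Dist = Dict[str, float]


def ensemble_shared_vocab(dists: Sequence[Dist], weights: Sequence[float]) -> Dist:
    """Conventional token-level ensembling: the next-token distribution is a
    weighted sum of each model's distribution. Only valid when every model
    scores exactly the same vocabulary."""
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights should sum to 1")
    vocab = set(dists[0])
    if any(set(d) != vocab for d in dists[1:]):
        raise ValueError("models must share the same subword vocabulary")
    # Iterate over the first model's keys so the output order is deterministic.
    return {tok: sum(w * d[tok] for w, d in zip(weights, dists)) for tok in dists[0]}


def surface(tokens: List[str]) -> str:
    """Hypothetical detokenizer: concatenate subwords and map the assumed
    SentencePiece-style word-boundary marker U+2581 ('▁') to a space."""
    return "".join(tokens).replace("\u2581", " ")


def agree(tokens_a: List[str], tokens_b: List[str]) -> bool:
    """Toy agreement test for two partial outputs produced with different
    vocabularies: they are compatible when one surface string is a prefix
    of the other, so the model that is behind can still reach the same text."""
    a, b = surface(tokens_a), surface(tokens_b)
    return a.startswith(b) or b.startswith(a)


if __name__ == "__main__":
    p1 = {"cat": 0.6, "dog": 0.4}
    p2 = {"cat": 0.2, "dog": 0.8}
    print(ensemble_shared_vocab([p1, p2], [0.5, 0.5]))
    print(agree(["\u2581New", "\u2581York"], ["\u2581New", "\u2581Yo"]))  # True
    print(agree(["\u2581New", "\u2581York"], ["\u2581Newark"]))  # False
```

Requiring weights that sum to 1 keeps the mixture a valid probability distribution without renormalization. The agreement check uses a prefix test rather than equality because subword boundaries of different tokenizers need not line up. For example, " New Yo" from one tokenizer can still be extended to match " New York" from another.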

arXiv information

Authors	Rachel Wicks, Kartik Ravisankar, Xinchen Yang, Philipp Koehn, Matt Post
Published	2025-02-28 17:41:27+00:00
arXiv page	arxiv_id(pdf)

Source, services used

arxiv.jp, Google
