Multi-Token Prediction Needs Registers

Summary

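Multi-token prediction has emerged as a promising objective for improving language model pretraining, but its benefits have not consistently generalized to other settings such as fine-tuning.
In this paper, we propose MuToR, a simple and effective approach to multi-token prediction that interleaves learnable register tokens into the input sequence, each tasked with predicting future targets.
Compared to existing methods, MuToR offers several key advantages: it introduces only a negligible number of additional parameters; it requires no architectural changes, which preserves compatibility with off-the-shelf pretrained language models; and it remains aligned with the next-token pretraining objective, making it especially well suited to supervised fine-tuning.
Moreover, it naturally supports scalable prediction horizons.
We demonstrate the effectiveness and versatility of MuToR across a range of use cases, including supervised fine-tuning, parameter-efficient fine-tuning (PEFT), and pretraining, on challenging generative tasks in both the language and vision domains.
Our code will be available at: https://github.com/nasosger/MuToR.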

Summary (original)

Multi-token prediction has emerged as a promising objective for improving language model pretraining, but its benefits have not consistently generalized to other settings such as fine-tuning. In this paper, we propose MuToR, a simple and effective approach to multi-token prediction that interleaves learnable register tokens into the input sequence, each tasked with predicting future targets. Compared to existing methods, MuToR offers several key advantages: it introduces only a negligible number of additional parameters, requires no architectural changes–ensuring compatibility with off-the-shelf pretrained language models–and remains aligned with the next-token pretraining objective, making it especially well-suited for supervised fine-tuning. Moreover, it naturally supports scalable prediction horizons. We demonstrate the effectiveness and versatility of MuToR across a range of use cases, including supervised fine-tuning, parameter-efficient fine-tuning (PEFT), and pretraining, on challenging generative tasks in both language and vision domains. Our code will be available at: https://github.com/nasosger/MuToR.
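
The abstract describes interleaving register tokens into the input sequence, with each register predicting a future target. The following is a minimal sketch of that data layout only, not the authors' implementation. The abstract does not specify where registers are placed, how far ahead they predict, or how attention is masked, so everything here is an assumption: the name `interleave_registers`, the placeholder `REGISTER_ID`, placing one register after every regular token, and the fixed `offset`.

```python
from typing import List, Optional, Tuple

# Hypothetical placeholder id for the learnable register token (assumption).
REGISTER_ID = -1


def interleave_registers(
    tokens: List[int],
    offset: int = 2,
    register_id: int = REGISTER_ID,
) -> Tuple[List[int], List[Optional[int]]]:
    """Build an interleaved input sequence and its per-position targets.

    Assumed layout: one register is inserted after every regular token.
    A regular token at index i keeps the usual next-token target
    tokens[i + 1]; the register inserted after it is assigned the token
    `offset` steps ahead, tokens[i + offset]. Positions with no available
    target are marked None so a loss function can skip them.
    """
    if offset < 1:
        raise ValueError("offset must be at least 1")

    n = len(tokens)
    inputs: List[int] = []
    targets: List[Optional[int]] = []
    for i, tok in enumerate(tokens):
        # Regular token: standard next-token prediction target.
        inputs.append(tok)
        targets.append(tokens[i + 1] if i + 1 < n else None)
        # Register token: target lies further in the future.
        inputs.append(register_id)
        targets.append(tokens[i + offset] if i + offset < n else None)
    return inputs, targets


if __name__ == "__main__":
    seq_inputs, seq_targets = interleave_registers([10, 11, 12, 13], offset=2)
    print(seq_inputs)
    print(seq_targets)
```

In this sketch the regular tokens keep their ordinary next-token targets, which mirrors the abstract's claim that the approach stays aligned with the next-token objective. Changing `offset` is one simple way to picture a longer prediction horizon.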

arXiv information

Authors	Anastasios Gerontopoulos, Spyros Gidaris, Nikos Komodakis
Published	2025-05-15 17:25:03+00:00
arXiv site	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
