From Interpolation to Extrapolation: Complete Length Generalization for Arithmetic Transformers

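Summary

Since its introduction, the transformer model has shown strong performance across a wide range of tasks.
However, open problems remain in length generalization, particularly on algorithmic tasks.
This paper investigates the inherent ability of transformer models to learn arithmetic algorithms such as addition and multiplication.
Through experiments and attention analysis, it identifies a number of key factors for achieving optimal length generalization.
It shows that, with the help of targeted attention biasing, transformer models can generalize to long lengths.
It then introduces Attention Bias Calibration (ABC), a calibration stage that lets the model learn the proper attention biases automatically, and links it to mechanisms in relative position encoding.
With ABC, the transformer model is shown to achieve unprecedented perfect length generalization on certain arithmetic tasks.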

Abstract (original)

Since its introduction, the transformer model has demonstrated outstanding performance across various tasks. However, there are still unresolved issues regarding length generalization, particularly in algorithmic tasks. In this paper, we investigate the inherent capabilities of transformer models in learning arithmetic algorithms, such as addition and multiplication. Through experiments and attention analysis, we identify a number of crucial factors for achieving optimal length generalization. We show that transformer models are able to generalize to long lengths with the help of targeted attention biasing. We then introduce Attention Bias Calibration (ABC), a calibration stage that enables the model to automatically learn the proper attention biases, which we link to mechanisms in relative position encoding. We demonstrate that using ABC, the transformer model can achieve unprecedented perfect length generalization on certain arithmetic tasks.
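
To make the idea of attention biasing concrete, here is a minimal sketch in plain Python of adding a bias to attention scores before the softmax. It is not the paper's ABC procedure: the function names, the `offset` and `strength` parameters, and the specific bias pattern are all assumptions chosen for illustration. The bias depends only on the relative position `j - i`, which is the general property that ties this kind of biasing to relative position encoding.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def relative_bias(n_queries, n_keys, offset, strength):
    """Hypothetical bias: 0 where key j == query i + offset, -strength elsewhere.

    The value depends only on j - i, so the same rule can be applied
    to sequences of any length.
    """
    return [
        [0.0 if j == i + offset else -strength for j in range(n_keys)]
        for i in range(n_queries)
    ]


def biased_attention_weights(scores, bias):
    """Add the bias to the raw attention scores, then apply softmax per query."""
    return [
        softmax([s + b for s, b in zip(score_row, bias_row)])
        for score_row, bias_row in zip(scores, bias)
    ]


# Uniform raw scores: without a bias, every key would get equal weight.
scores = [[0.0] * 4 for _ in range(3)]
bias = relative_bias(3, 4, offset=1, strength=10.0)
weights = biased_attention_weights(scores, bias)
```

With uniform raw scores, the bias alone steers each query `i` toward key `i + 1`. Because the rule is defined on the offset rather than on absolute positions, the same `relative_bias` call works unchanged for longer inputs.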

arXiv information

Authors	Shaoxiong Duan, Yining Shi
Published	2023-10-18 14:10:47+00:00
arXiv page	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
