Regress, Don’t Guess — A Regression-like Loss on Number Tokens for Language Models

Summary

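Language models have exceptional text-generation capabilities, but they lack a natural inductive bias for emitting numbers, so they struggle with tasks that involve reasoning over quantities, especially arithmetic.
This is particularly relevant for scientific datasets, where text and numerical data frequently occur together.
One fundamental limitation is the nature of the cross-entropy (CE) loss. It assumes a nominal (categorical) scale and therefore cannot convey numerical proximity between number tokens: predicting 4 when the answer is 3 is penalized exactly like predicting 9.
As a remedy, we present two versions of a number token loss.
The first is based on an $L_p$ loss between the ground-truth token value and the weighted sum of the predicted class probabilities, where each number token's probability is weighted by its numeric value.
The second minimizes the Wasserstein-1 distance between the distribution of the predicted output probabilities and the ground-truth distribution.
These regression-like losses can easily be added to any language model, extending the CE objective during training.
We compare the proposed schemes on a mathematics dataset against existing tokenization, encoding, and decoding schemes for improving number representation in language models.
Our results show that equipping a standard T5 model with the proposed loss schemes significantly improves numerical accuracy.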
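
To make the first loss concrete, here is a minimal Python sketch. It assumes, for illustration only, that the number tokens are the single digits "0" to "9" with values equal to their digits, and that the predicted distribution has already been renormalized over those ten tokens; `DIGIT_VALUES`, `number_token_lp_loss`, and `cross_entropy` are hypothetical names, not the authors' code. The example shows two predictions that receive identical CE but very different $L_p$ losses, depending on how far the misplaced probability mass lies from the true value. With `p = 2` the loss is a squared error.

```python
import math
from typing import Sequence

# Hypothetical setup: the number tokens are the single digits "0".."9" and
# each token's numeric value is its digit.
DIGIT_VALUES = [float(d) for d in range(10)]


def number_token_lp_loss(probs: Sequence[float], target_value: float,
                         values: Sequence[float] = DIGIT_VALUES,
                         p: float = 2.0) -> float:
    """L_p loss between the target value and the probability-weighted token value.

    `probs` is assumed to be a distribution over the number tokens only
    (renormalized to sum to 1), listed in the same order as `values`.
    """
    if len(probs) != len(values):
        raise ValueError("probs and values must have the same length")
    # Weighted sum of the predicted class probabilities, each weighted by the
    # numeric value of its token (i.e. the expected value of the prediction).
    predicted_value = sum(pr * v for pr, v in zip(probs, values))
    return abs(target_value - predicted_value) ** p


def cross_entropy(probs: Sequence[float], target_index: int) -> float:
    """Standard CE at one position: -log of the true token's probability."""
    return -math.log(probs[target_index])


# Two predictions that give the true token "3" the same probability (0.5).
near = [0.0] * 10
near[3], near[4] = 0.5, 0.5  # remaining mass on "4"
far = [0.0] * 10
far[3], far[9] = 0.5, 0.5    # remaining mass on "9"

print(cross_entropy(near, 3) == cross_entropy(far, 3))  # True: CE cannot tell them apart
print(number_token_lp_loss(near, 3.0))  # 0.25
print(number_token_lp_loss(far, 3.0))   # 9.0
```

In a real model the probabilities would come from a softmax over the logits at each position where the target is a number token; whether to renormalize over the number tokens or use the full-vocabulary probabilities is a design choice this sketch does not settle.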

Summary (original)

While language models have exceptional capabilities at text generation, they lack a natural inductive bias for emitting numbers and thus struggle in tasks involving reasoning over quantities, especially arithmetics. This has particular relevance in scientific datasets where combinations of text and numerical data are abundant. One fundamental limitation is the nature of the CE loss, which assumes a nominal (categorical) scale and thus cannot convey proximity between generated number tokens. As a remedy, we here present two versions of a number token loss. The first is based on an $L_p$ loss between the ground truth token value and the weighted sum of the predicted class probabilities. The second loss minimizes the Wasserstein-1 distance between the distribution of the predicted output probabilities and the ground truth distribution. These regression-like losses can easily be added to any language model and extend the CE objective during training. We compare the proposed schemes on a mathematics dataset against existing tokenization, encoding, and decoding schemes for improving number representation in language models. Our results reveal a significant improvement in numerical accuracy when equipping a standard T5 model with the proposed loss schemes.
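
The Wasserstein-1 variant and its combination with CE can likewise be sketched in plain Python. This is an illustration under stated assumptions, not the authors' implementation: the number tokens are assumed to be the digits "0" to "9" sorted by value, the ground truth is taken as a one-hot distribution on the correct token, and `wasserstein1`, `one_hot`, `combined_loss`, and the weight `lam` are hypothetical names. In one dimension, W1 equals the area between the two cumulative distribution functions, which the code sums directly.

```python
import math
from itertools import accumulate
from typing import Sequence

# Hypothetical setup: number tokens "0".."9", sorted by numeric value.
DIGIT_VALUES = [float(d) for d in range(10)]


def one_hot(index: int, size: int = len(DIGIT_VALUES)) -> list[float]:
    """Ground-truth distribution: all mass on the true number token."""
    return [1.0 if i == index else 0.0 for i in range(size)]


def wasserstein1(p: Sequence[float], q: Sequence[float],
                 values: Sequence[float] = DIGIT_VALUES) -> float:
    """Wasserstein-1 distance between two distributions on a sorted 1-D support.

    In one dimension W1 is the area between the two CDFs:
        sum_k |F_p(v_k) - F_q(v_k)| * (v_{k+1} - v_k)
    """
    cdf_p = list(accumulate(p))
    cdf_q = list(accumulate(q))
    return sum(abs(cdf_p[k] - cdf_q[k]) * (values[k + 1] - values[k])
               for k in range(len(values) - 1))


def combined_loss(probs: Sequence[float], target_index: int,
                  lam: float = 1.0) -> float:
    """CE extended with a weighted W1 term; `lam` is a hypothetical weight."""
    ce = -math.log(probs[target_index])
    return ce + lam * wasserstein1(probs, one_hot(target_index))


target = one_hot(3)
near = [0.0] * 10
near[3], near[4] = 0.5, 0.5          # half the mass one step away
far = [0.0] * 10
far[3], far[9] = 0.5, 0.5            # half the mass six steps away
bimodal = [0.0] * 10
bimodal[0], bimodal[6] = 0.5, 0.5    # expected value is exactly 3

print(wasserstein1(near, target))     # 0.5
print(wasserstein1(far, target))      # 3.0
print(wasserstein1(bimodal, target))  # 3.0
```

The bimodal prediction has an expected value of exactly 3, so a loss on the probability-weighted value alone would be zero for it, whereas W1 still penalizes the mass placed away from 3. Adding the term to CE, as in `combined_loss`, keeps the usual token-level objective while making the total loss depend on numeric proximity.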

arXiv information

Authors	Jonas Zausinger, Lars Pennig, Kacper Chlodny, Vincent Limbach, Anna Ketteler, Thorben Prein, Vishwa Mohan Singh, Michael Morris Danziger, Jannis Born
Published	2024-11-04 13:43:24+00:00
arXiv page	arxiv_id(pdf)

Provider, services used

arxiv.jp, Google
