Code Pretraining Improves Entity Tracking Abilities of Language Models

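Summary

Recent work has offered indirect evidence that pretraining language models on code improves their ability to track state changes of discourse entities expressed in natural language.
This work tests that claim systematically by comparing pairs of language models on entity tracking performance.
Critically, each pair consists of a base model and a model trained on top of that base model with additional code data.
The analysis is extended to examine the effect of math training, another highly structured data type, and of alignment tuning, an important step for improving model usability.
There is clear evidence that models additionally trained on large amounts of code outperform their base models.
By contrast, neither additional math training nor alignment tuning yields a consistent benefit across model families.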

Summary (original)

Recent work has provided indirect evidence that pretraining language models on code improves the ability of models to track state changes of discourse entities expressed in natural language. In this work, we systematically test this claim by comparing pairs of language models on their entity tracking performance. Critically, the pairs consist of base models and models trained on top of these base models with additional code data. We extend this analysis to additionally examine the effect of math training, another highly structured data type, and alignment tuning, an important step for enhancing the usability of models. We find clear evidence that models additionally trained on large amounts of code outperform the base models. On the other hand, we find no consistent benefit of additional math training or alignment tuning across various model families.

arXiv information

Authors	Najoung Kim, Sebastian Schuster, Shubham Toshniwal
Published	2024-05-31 17:56:33+00:00
arXiv site	arxiv_id(pdf)

Sources and services used

arxiv.jp, Google
