Large GPT-like Models are Bad Babies: A Closer Look at the Relationship between Linguistic Competence and Psycholinguistic Measures

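Summary

Research on the cognitive plausibility of language models (LMs) has so far focused mainly on modelling psycholinguistic response variables such as reading times, gaze durations, and N400/P600 EEG signals, while largely neglecting the dimensions that Mahowald et al. (2023) describe as formal and functional linguistic competence, as well as developmental plausibility.
We address this gap by training a series of GPT-like language models of different sizes on the strict version of the BabyLM pretraining corpus and evaluating them on the challenge tasks (BLiMP, GLUE, MSGS) and on an additional reading time prediction task.
On all three challenge tasks, we find a positive correlation between LM size and performance, with each task showing a different preference for model width and depth.
In contrast, we find a negative correlation between LM size and the reading time fit of linear mixed-effects models that use LM surprisal as a predictor, with the second-smallest LM achieving the largest log-likelihood reduction over a baseline model without surprisal.
This suggests that modelling processing effort and linguistic competence may require an approach different from training GPT-like LMs on a developmentally plausible corpus.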

Summary (original)

Research on the cognitive plausibility of language models (LMs) has so far mostly concentrated on modelling psycholinguistic response variables such as reading times, gaze durations and N400/P600 EEG signals, while mostly leaving out the dimension of what Mahowald et al. (2023) described as formal and functional linguistic competence, and developmental plausibility. We address this gap by training a series of GPT-like language models of different sizes on the strict version of the BabyLM pretraining corpus, evaluating on the challenge tasks (BLiMP, GLUE, MSGS) and an additional reading time prediction task. We find a positive correlation between LM size and performance on all three challenge tasks, with different preferences for model width and depth in each of the tasks. In contrast, a negative correlation was found between LM size and reading time fit of linear mixed-effects models using LM surprisal as a predictor, with the second-smallest LM achieving the largest log-likelihood reduction over a baseline model without surprisal. This suggests that modelling processing effort and linguistic competence may require an approach different from training GPT-like LMs on a developmentally plausible corpus.
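
The surprisal predictor mentioned in the abstract can be illustrated with a minimal sketch. It assumes the standard definition of surprisal as the negative log-probability a language model assigns to a token given its context, and it assumes that a reading time fit is compared as a difference in log-likelihood between a regression model with surprisal and a baseline without it. The function names, token probabilities, and log-likelihood values below are hypothetical and are not taken from the paper.

```python
import math


def surprisal_bits(prob: float) -> float:
    """Surprisal in bits: -log2 p(token | context)."""
    if not 0.0 < prob <= 1.0:
        raise ValueError("probability must be in (0, 1]")
    return -math.log2(prob)


def delta_log_likelihood(ll_with_surprisal: float, ll_baseline: float) -> float:
    """Difference in log-likelihood between a regression model that includes
    surprisal and a baseline that does not (assumed comparison, for illustration)."""
    return ll_with_surprisal - ll_baseline


# Hypothetical per-token probabilities from some language model.
token_probs = [0.5, 0.25, 0.125]
surprisals = [surprisal_bits(p) for p in token_probs]
print(surprisals)  # [1.0, 2.0, 3.0]

# Hypothetical log-likelihoods of two fitted regression models.
print(delta_log_likelihood(-1000.0, -1012.5))  # 12.5
```

In an actual analysis, the per-token surprisal values would enter a linear mixed-effects model as a predictor of reading times, and the two log-likelihoods would come from fitting that model with and without the surprisal term.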

arXiv information

Authors	Julius Steuer, Marius Mosbach, Dietrich Klakow
Published	2023-11-08 09:26:27+00:00
arXiv site	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
