How Abstract Is Linguistic Generalization in Large Language Models? Experiments with Argument Structure

Summary

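Language models are typically evaluated on how well they predict the distribution of specific words in specific contexts.
However, linguistic knowledge also encodes relationships between contexts, which allow inferences between word distributions.
Focusing on the domain of argument structure, we investigate the degree to which pre-trained Transformer-based large language models (LLMs) represent such relationships.
We find that LLMs perform well at generalizing the distribution of a novel noun argument between related contexts seen during pre-training (e.g., the active object and passive subject of the verb spray), succeeding by making use of the semantically organized structure of the embedding space for word embeddings.
However, LLMs fail to generalize between related contexts that were not observed during pre-training but that instantiate more abstract, well-attested structural generalizations (e.g., between the active object and passive subject of an arbitrary verb).
Instead, in this case, LLMs show a bias to generalize based on linear order.
This finding points to a limitation of current models and to a reason why their training is data-intensive. What is reported here is available at https://github.com/clay-lab/structural-alternations.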

Summary (original)

Language models are typically evaluated on their success at predicting the distribution of specific words in specific contexts. Yet linguistic knowledge also encodes relationships between contexts, allowing inferences between word distributions. We investigate the degree to which pre-trained Transformer-based large language models (LLMs) represent such relationships, focusing on the domain of argument structure. We find that LLMs perform well in generalizing the distribution of a novel noun argument between related contexts that were seen during pre-training (e.g., the active object and passive subject of the verb spray), succeeding by making use of the semantically-organized structure of the embedding space for word embeddings. However, LLMs fail at generalizations between related contexts that have not been observed during pre-training, but which instantiate more abstract, but well-attested structural generalizations (e.g., between the active object and passive subject of an arbitrary verb). Instead, in this case, LLMs show a bias to generalize based on linear order. This finding points to a limitation with current models and points to a reason for which their training is data-intensive.s reported here are available at https://github.com/clay-lab/structural-alternations.
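The contrast between a structural generalization and a linear-order bias can be made concrete with a toy sketch. It does not reproduce the authors' experiments, which probe pre-trained LLMs; it only spells out the two competing predictions for a novel noun that has been seen solely as the object of an active verb. The `Clause` type, the helper functions, and the nonce word "blick" are hypothetical illustrations, not taken from the paper or its repository.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Clause:
    """Hypothetical toy clause: a transitive event reduced to three slots."""
    agent: str  # the doer of the action
    verb: str   # past-tense / participle form, e.g. "sprayed"
    theme: str  # the thing acted upon


def active(c: Clause) -> list[str]:
    # Active voice: agent is the subject, theme is the object after the verb.
    return [c.agent, c.verb, c.theme]


def passive(c: Clause) -> list[str]:
    # Passive voice: theme is the subject, agent moves into a "by"-phrase.
    return [c.theme, "was", c.verb, "by", c.agent]


def side_of_verb(tokens: list[str], word: str, verb: str) -> str:
    """Return "before" or "after" depending on where `word` sits relative to `verb`."""
    return "before" if tokens.index(word) < tokens.index(verb) else "after"


def predict_passive_slot(novel: str, verb: str, strategy: str) -> str:
    """Predict the passive slot for a novel noun seen only as an active object.

    "structural": keep the grammatical role (active object -> passive subject).
    "linear": keep the position relative to the verb (post-verbal in both voices).
    """
    seen = Clause(agent="man", verb=verb, theme=novel)
    active_side = side_of_verb(active(seen), novel, verb)  # "after" for an object
    if strategy == "structural":
        # Map roles, then read off where the novel noun lands in the passive.
        landed = side_of_verb(passive(seen), novel, verb)
        return "subject" if landed == "before" else "by-phrase"
    if strategy == "linear":
        # Ignore roles: pick the passive slot on the same side of the verb.
        return "by-phrase" if active_side == "after" else "subject"
    raise ValueError(f"unknown strategy: {strategy}")


for s in ("structural", "linear"):
    print(s, predict_passive_slot("blick", "sprayed", s))
```

Running the loop prints `structural subject` and then `linear by-phrase`: the abstract's finding is that, for verbs whose passive contexts were not observed in pre-training, LLMs lean toward the second prediction rather than the first.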

arXiv information

Authors	Michael Wilson, Jackson Petty, Robert Frank
Published	2023-11-08 18:58:43+00:00
arXiv page	arxiv_id (pdf)

Source, services used

arxiv.jp, Google
