Disjoint Processing Mechanisms of Hierarchical and Linear Grammars in Large Language Models

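Summary

All natural languages are hierarchically structured.
In humans, this structural restriction is neurologically coded: when two grammars are presented with identical vocabularies, the brain areas responsible for language processing are sensitive only to the hierarchical grammar.
Using large language models (LLMs), we investigate whether such functionally distinct hierarchical processing regions can arise solely from exposure to large-scale language distributions.
We generate inputs using English, Italian, Japanese, or nonce words, and vary the underlying grammars so that they conform to either hierarchical or linear/positional rules.
Using these grammars, we first observe that language models behave differently on hierarchically structured inputs than on linearly structured inputs.
Next, we find that the components responsible for processing hierarchical grammars are distinct from the components that process linear grammars.
We verify this causally in ablation experiments.
Finally, we observe that hierarchy-selective components are also active on nonce grammars.
This suggests that hierarchy sensitivity is tied neither to meaning nor to in-distribution inputs.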

Abstract (original)

All natural languages are structured hierarchically. In humans, this structural restriction is neurologically coded: when two grammars are presented with identical vocabularies, brain areas responsible for language processing are only sensitive to hierarchical grammars. Using large language models (LLMs), we investigate whether such functionally distinct hierarchical processing regions can arise solely from exposure to large-scale language distributions. We generate inputs using English, Italian, Japanese, or nonce words, varying the underlying grammars to conform to either hierarchical or linear/positional rules. Using these grammars, we first observe that language models show distinct behaviors on hierarchical versus linearly structured inputs. Then, we find that the components responsible for processing hierarchical grammars are distinct from those that process linear grammars; we causally verify this in ablation experiments. Finally, we observe that hierarchy-selective components are also active on nonce grammars; this suggests that hierarchy sensitivity is not tied to meaning, nor in-distribution inputs.
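
The contrast between hierarchical and linear/positional rules can be made concrete with a toy example. The sketch below is illustrative only: the rule definitions, the `NEG` marker, and the function names are assumptions made here and are not the grammars used in the paper. A positional rule places a marker at a fixed token index, while a structure-dependent rule places it at a constituent boundary, so its string position shifts with the length of the subject phrase.

```python
# Toy illustration only: the rules, marker, and names below are hypothetical
# and are not taken from the paper.

def linear_rule(tokens, marker="NEG", position=3):
    """Positional rule: insert the marker at a fixed index, ignoring structure."""
    idx = min(position, len(tokens))
    return tokens[:idx] + [marker] + tokens[idx:]


def hierarchical_rule(subject_phrase, verb_phrase, marker="NEG"):
    """Structure-dependent rule: insert the marker at the start of the verb
    phrase, wherever that boundary falls in the token string."""
    return subject_phrase + [marker] + verb_phrase


short_subject = ["the", "dog"]
long_subject = ["the", "dog", "near", "the", "cats"]
verb_phrase = ["barks", "loudly"]

# The positional rule always yields the marker at index 3.
linear_short = linear_rule(short_subject + verb_phrase)
linear_long = linear_rule(long_subject + verb_phrase)

# The structural rule yields the marker at the subject/verb boundary.
hier_short = hierarchical_rule(short_subject, verb_phrase)
hier_long = hierarchical_rule(long_subject, verb_phrase)

print(linear_short.index("NEG"), linear_long.index("NEG"))
print(hier_short.index("NEG"), hier_long.index("NEG"))
```

Under the positional rule the marker index is the same for both sentences, whereas under the structural rule it tracks the boundary between the subject and the verb phrase.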

arXiv information

Authors	Aruna Sankaranarayanan, Dylan Hadfield-Menell, Aaron Mueller
Published	2025-01-15 06:34:34+00:00

Source and services used

arxiv.jp, Google
