Interpreting token compositionality in LLMs: A robustness analysis

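Summary

Understanding the internal mechanisms of large language models (LLMs) is essential for improving their reliability, interpretability, and inference processes.
We present Constituent-Aware Pooling (CAP), a methodology designed to analyse how LLMs process compositional linguistic structures.
Grounded in principles of compositionality, mechanistic interpretability, and information theory, CAP systematically intervenes in model activations through constituent-based pooling at various model levels.
Experiments on inverse definition modelling, hypernym prediction, and synonym prediction reveal important insights into the limitations of transformers in handling compositional abstractions.
No specific layer integrates tokens into unified semantic representations based on their constituent parts.
Fragmented information processing is observed and intensifies with model size, suggesting that larger models struggle more with these interventions and show greater information dispersion.
This fragmentation likely stems from the training objectives and architectural design of transformers, which prevent systematic and cohesive representations.
Our findings highlight fundamental limitations of current transformer architectures in compositional semantics processing and model interpretability, and underscore the need for new approaches to LLM design that address these challenges.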

Abstract (original)

Understanding the internal mechanisms of large language models (LLMs) is integral to enhancing their reliability, interpretability, and inference processes. We present Constituent-Aware Pooling (CAP), a methodology designed to analyse how LLMs process compositional linguistic structures. Grounded in principles of compositionality, mechanistic interpretability, and information theory, CAP systematically intervenes in model activations through constituent-based pooling at various model levels. Our experiments on inverse definition modelling, hypernym and synonym prediction reveal critical insights into transformers’ limitations in handling compositional abstractions. No specific layer integrates tokens into unified semantic representations based on their constituent parts. We observe fragmented information processing, which intensifies with model size, suggesting that larger models struggle more with these interventions and exhibit greater information dispersion. This fragmentation likely stems from transformers’ training objectives and architectural design, preventing systematic and cohesive representations. Our findings highlight fundamental limitations in current transformer architectures regarding compositional semantics processing and model interpretability, underscoring the critical need for novel approaches in LLM design to address these challenges.
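The abstract describes CAP as pooling model activations over constituents, but it does not give the pooling function or the level at which it is applied. The following is a minimal sketch of the general idea only, not the authors' implementation: it assumes per-token activation vectors are available as plain lists, that constituents are given as half-open token spans, and that mean, max, or sum pooling is used. The name `pool_constituents` and its parameters are hypothetical.

```python
from typing import List, Sequence, Tuple

Vector = List[float]


def pool_constituents(
    activations: Sequence[Sequence[float]],
    spans: Sequence[Tuple[int, int]],
    mode: str = "mean",
) -> List[Vector]:
    """Collapse each token span [start, end) into a single pooled vector.

    Assumption: `activations` holds one hidden-state vector per token and
    `spans` lists the constituents in order. Both are illustrative inputs,
    not an interface taken from the paper.
    """
    pooled: List[Vector] = []
    for start, end in spans:
        if not 0 <= start < end <= len(activations):
            raise ValueError(f"invalid span ({start}, {end})")
        group = activations[start:end]
        # zip(*group) walks the span one hidden dimension at a time.
        columns = zip(*group)
        if mode == "mean":
            pooled.append([sum(col) / len(group) for col in columns])
        elif mode == "max":
            pooled.append([max(col) for col in columns])
        elif mode == "sum":
            pooled.append([sum(col) for col in columns])
        else:
            raise ValueError(f"unknown pooling mode: {mode}")
    return pooled


# Four tokens with two hidden dimensions; the first two tokens are
# treated as one constituent (for example, a word split into two subwords).
example_activations = [[1.0, 2.0], [3.0, 4.0], [5.0, 0.0], [7.0, 8.0]]
example_spans = [(0, 2), (2, 3), (3, 4)]
example_pooled = pool_constituents(example_activations, example_spans)
```

In an intervention study of this kind, the pooled vectors would replace the original token activations at a chosen layer, and the change in downstream task accuracy would indicate how much the model depends on token-level detail at that point.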

arXiv information

Authors	Nura Aljaafari, Danilo S. Carvalho, André Freitas
Published	2025-02-07 17:44:38+00:00
arXiv page	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
