How Transformers Solve Propositional Logic Problems: A Mechanistic Analysis

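Summary

Large language models (LLMs) have shown remarkable performance on tasks that require planning and reasoning.
Motivated by this, we investigate the internal mechanisms that underpin a network's ability to perform complex logical reasoning.
We first construct a synthetic propositional logic problem that serves as a concrete test bed for training and evaluating networks.
Crucially, solving this problem requires nontrivial planning, yet a small transformer can be trained to achieve perfect accuracy.
Building on this setup, we seek to understand precisely how a three-layer transformer trained from scratch solves the problem.
We identify specific "planning" and "reasoning" circuits in the network that require cooperation between attention blocks to implement the desired logic.
To extend these findings, we then study a larger model, Mistral 7B.
Using activation patching, we characterize the internal components that are critical for solving the logic problem.
Overall, our work systematically uncovers new aspects of small and large transformers and continues the study of how they plan and reason.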
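
Activation patching, the technique the summary names, can be illustrated with a toy sketch. The network below is a hypothetical hand-built model with made-up weights, not the paper's three-layer transformer or Mistral 7B, and the names `forward` and `patching_effects` are assumptions chosen for this example. The idea shown is only the general recipe: cache activations from a "clean" run, substitute them one component at a time into a "corrupted" run, and measure how much of the clean output each substitution restores.

```python
# Toy illustration of activation patching.
# Assumption: a tiny hand-built ReLU network stands in for a real model.

def forward(x, patch=None):
    """Run the toy network, optionally overwriting hidden activations.

    x: list of two input floats.
    patch: dict mapping hidden-unit index -> activation value to force.
    Returns (output, hidden_activations).
    """
    w_in = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]  # 3 hidden units, 2 inputs
    w_out = [2.0, 0.0, 1.0]                      # unit 1 is ignored by the output
    hidden = [max(0.0, sum(w * xi for w, xi in zip(row, x))) for row in w_in]
    if patch:
        for index, value in patch.items():
            hidden[index] = value
    output = sum(w * h for w, h in zip(w_out, hidden))
    return output, hidden


def patching_effects(clean_x, corrupt_x):
    """Fraction of the clean-vs-corrupted output gap restored per hidden unit."""
    clean_out, clean_hidden = forward(clean_x)
    corrupt_out, _ = forward(corrupt_x)
    gap = clean_out - corrupt_out
    effects = []
    for index, clean_value in enumerate(clean_hidden):
        # Re-run the corrupted input with one clean activation patched in.
        patched_out, _ = forward(corrupt_x, patch={index: clean_value})
        effects.append((patched_out - corrupt_out) / gap)
    return effects


if __name__ == "__main__":
    # The corrupted input differs from the clean one only in its first entry.
    print(patching_effects([1.0, 1.0], [0.0, 1.0]))
```

In this toy setting, the unit with the largest effect is the one the output depends on most for the changed input. In a real transformer the patched components would typically be attention heads or residual-stream positions rather than individual hidden units.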

Summary (original)

Large language models (LLMs) have shown amazing performance on tasks that require planning and reasoning. Motivated by this, we investigate the internal mechanisms that underpin a network’s ability to perform complex logical reasoning. We first construct a synthetic propositional logic problem that serves as a concrete test-bed for network training and evaluation. Crucially, this problem demands nontrivial planning to solve, but we can train a small transformer to achieve perfect accuracy. Building on our set-up, we then pursue an understanding of precisely how a three-layer transformer, trained from scratch, solves this problem. We are able to identify certain ‘planning’ and ‘reasoning’ circuits in the network that necessitate cooperation between the attention blocks to implement the desired logic. To expand our findings, we then study a larger model, Mistral 7B. Using activation patching, we characterize internal components that are critical in solving our logic problem. Overall, our work systemically uncovers novel aspects of small and large transformers, and continues the study of how they plan and reason.

arXiv information

Authors	Guan Zhe Hong, Nishanth Dikkala, Enming Luo, Cyrus Rashtchian, Xin Wang, Rina Panigrahy
Published	2024-11-07 03:50:19+00:00
arXiv site	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
