Lost in Transmission: When and Why LLMs Fail to Reason Globally

Summary

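Despite their many successes, transformer-based large language models (LLMs) continue to struggle with tasks that require complex reasoning over large parts of their input.
We argue that these failures arise from capacity limits on the accurate flow of information within LLMs.
To formalize this issue, we introduce the bounded attention prefix oracle (BAPO) model, a new computational framework that models bandwidth constraints on attention heads, the mechanism for internal communication in LLMs.
We show that several important reasoning problems, such as graph reachability, require high communication bandwidth for BAPOs to solve.
We call these problems BAPO-hard.
Our experiments corroborate our theoretical predictions: GPT-4o, Claude, and Gemini succeed on BAPO-easy tasks and fail even on relatively small BAPO-hard tasks.
BAPOs also reveal another benefit of chain of thought (CoT): we prove that breaking down a task using CoT can turn any BAPO-hard problem into a BAPO-easy one.
Our results offer principled explanations for key LLM failures and suggest directions for architectures and inference methods that mitigate bandwidth limits.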

Summary (original)

Despite their many successes, transformer-based large language models (LLMs) continue to struggle with tasks that require complex reasoning over large parts of their input. We argue that these failures arise due to capacity limits on the accurate flow of information within LLMs. To formalize this issue, we introduce the bounded attention prefix oracle (BAPO) model, a new computational framework that models bandwidth constraints on attention heads, the mechanism for internal communication in LLMs. We show that several important reasoning problems like graph reachability require high communication bandwidth for BAPOs to solve; we call these problems BAPO-hard. Our experiments corroborate our theoretical predictions: GPT-4o, Claude, and Gemini succeed on BAPO-easy tasks and fail even on relatively small BAPO-hard tasks. BAPOs also reveal another benefit of chain of thought (CoT): we prove that breaking down a task using CoT can turn any BAPO-hard problem into a BAPO-easy one. Our results offer principled explanations for key LLM failures and suggest directions for architectures and inference methods that mitigate bandwidth limits.
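
To make the graph reachability problem named in the abstract concrete, here is a minimal Python sketch of the task itself. It is only an illustration of the problem, not the paper's BAPO construction or its experimental setup; the function name `reachable` and the example edge list are assumptions introduced here. The point it shows is that the answer can depend on edges located anywhere in the input, which is the kind of global dependency the abstract refers to.

```python
from collections import deque


def reachable(edges, source, target):
    """Return True if `target` can be reached from `source` in a directed graph.

    `edges` is a list of (from_node, to_node) pairs. This is a plain
    breadth-first search, used here only to define the task (hypothetical
    helper, not taken from the paper).
    """
    # Build an adjacency list from the flat edge list.
    adjacency = {}
    for u, v in edges:
        adjacency.setdefault(u, []).append(v)

    seen = {source}
    frontier = deque([source])
    while frontier:
        node = frontier.popleft()
        if node == target:
            return True
        for neighbor in adjacency.get(node, []):
            if neighbor not in seen:
                seen.add(neighbor)
                frontier.append(neighbor)
    return False


# Illustrative input: two disconnected chains, a -> b -> c and d -> e.
example_edges = [("a", "b"), ("b", "c"), ("d", "e")]
print(reachable(example_edges, "a", "c"))  # True
print(reachable(example_edges, "a", "e"))  # False
```

Each loop iteration in this search looks at a single node and its outgoing edges, which loosely mirrors the idea of breaking a global question into small steps; how that relates formally to CoT and BAPO-easiness is established in the paper, not by this sketch.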

arXiv information

Authors	Tobias Schnabel, Kiran Tomlinson, Adith Swaminathan, Jennifer Neville
Published	2025-05-19 16:46:27+00:00
arXiv site	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
