The Computational Complexity of Circuit Discovery for Inner Interpretability

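Abstract

Many proposed applications of neural networks in machine learning, cognitive/brain science, and society hinge on the feasibility of inner interpretability via circuit discovery.
This calls for empirical and theoretical explorations of viable algorithmic options.
Despite advances in the design and testing of heuristics, there are concerns about their scalability and faithfulness at a time when we lack understanding of the complexity properties of the problems they are deployed to solve.
To address this, we study circuit discovery with classical and parameterized computational complexity theory: (1) we describe a conceptual scaffolding to reason about circuit-finding queries in terms of affordances for description, explanation, prediction, and control;
(2) we formalize a comprehensive set of queries for mechanistic explanation and propose a formal framework for their analysis;
(3) we use it to settle the complexity of many query variants and relaxations of practical interest on multi-layer perceptrons.
Our findings reveal a challenging complexity landscape.
Many queries are intractable, remain fixed-parameter intractable relative to model/circuit features, and are inapproximable under additive, multiplicative, and probabilistic approximation schemes.
To navigate this landscape, we prove that there exist transformations to tackle some of these hard problems with better-understood heuristics, and we prove the tractability or fixed-parameter tractability of more modest queries that retain useful affordances.
This framework allows us to understand the scope and limits of interpretability queries, explore viable options, and compare their resource demands on existing and future architectures.

Abstract (original)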

Many proposed applications of neural networks in machine learning, cognitive/brain science, and society hinge on the feasibility of inner interpretability via circuit discovery. This calls for empirical and theoretical explorations of viable algorithmic options. Despite advances in the design and testing of heuristics, there are concerns about their scalability and faithfulness at a time when we lack understanding of the complexity properties of the problems they are deployed to solve. To address this, we study circuit discovery with classical and parameterized computational complexity theory: (1) we describe a conceptual scaffolding to reason about circuit finding queries in terms of affordances for description, explanation, prediction and control; (2) we formalize a comprehensive set of queries for mechanistic explanation, and propose a formal framework for their analysis; (3) we use it to settle the complexity of many query variants and relaxations of practical interest on multi-layer perceptrons. Our findings reveal a challenging complexity landscape. Many queries are intractable, remain fixed-parameter intractable relative to model/circuit features, and inapproximable under additive, multiplicative, and probabilistic approximation schemes. To navigate this landscape, we prove there exist transformations to tackle some of these hard problems with better-understood heuristics, and prove the tractability or fixed-parameter tractability of more modest queries which retain useful affordances. This framework allows us to understand the scope and limits of interpretability queries, explore viable options, and compare their resource demands on existing and future architectures.
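To make the idea of a circuit-finding query concrete, here is a minimal sketch under a hypothetical toy formulation (an assumption for illustration, not the paper's formal definitions): a "circuit" is a subset of the hidden units of a one-hidden-layer ReLU MLP, the remaining units are zero-ablated, and the query asks for a smallest subset that reproduces the full model's output on a finite set of inputs. The names `forward` and `min_sufficient_circuit` and the zero-ablation choice are illustrative assumptions. Exhaustive search may inspect up to 2^n subsets for n hidden units; this exponential cost is the kind of behavior that motivates a complexity analysis, although a slow algorithm by itself does not prove that a problem is hard.

```python
from itertools import combinations
from typing import List, Sequence, Set, Tuple


def relu(x: float) -> float:
    return max(0.0, x)


def forward(W1: List[List[float]], b1: List[float], w2: List[float], b2: float,
            x: Sequence[float], keep: Set[int]) -> float:
    """Scalar output of a one-hidden-layer ReLU MLP.

    Hidden units whose index is not in `keep` are zero-ablated
    (a hypothetical ablation choice made for this sketch).
    """
    hidden = [
        relu(sum(w * xi for w, xi in zip(row, x)) + b) if j in keep else 0.0
        for j, (row, b) in enumerate(zip(W1, b1))
    ]
    return sum(v * h for v, h in zip(w2, hidden)) + b2


def min_sufficient_circuit(W1, b1, w2, b2, inputs, tol: float = 1e-9) -> Tuple[Set[int], int]:
    """Brute-force search for a smallest subset of hidden units that matches
    the full model's output on every input (within `tol`).

    Returns the subset and the number of candidate subsets checked.
    Worst case: 2**n candidates for n hidden units.
    """
    n = len(W1)
    full = set(range(n))
    targets = [forward(W1, b1, w2, b2, x, full) for x in inputs]
    checked = 0
    # Enumerate subsets in order of increasing size, so the first hit is minimal.
    for size in range(n + 1):
        for subset in combinations(range(n), size):
            checked += 1
            keep = set(subset)
            if all(abs(forward(W1, b1, w2, b2, x, keep) - t) <= tol
                   for x, t in zip(inputs, targets)):
                return keep, checked
    # Not reached in practice: the full set always reproduces the targets.
    return full, checked
```

For example, with `W1 = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]`, `b1 = [0.0, 0.0, 0.0]`, `w2 = [0.0, 0.0, 1.0]`, `b2 = 0.0`, and inputs `[(1.0, 2.0), (3.0, -1.0)]`, only hidden unit 2 contributes to the output, so the search returns `{2}` after checking 4 candidate subsets.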

arXiv information

Authors	Federico Adolfi, Martina G. Vilas, Todd Wareham
Published	2025-04-01 14:16:47+00:00
arXiv site	arxiv_id(pdf)

Source, services used

arxiv.jp, Google
