Identifying Interpretable Visual Features in Artificial and Biological Neural Systems

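Summary

Single neurons in neural networks are often "interpretable" in that each represents an individual, intuitively meaningful feature.
However, many neurons exhibit $\textit{mixed selectivity}$: they represent multiple unrelated features.
A recent hypothesis proposes that, because natural data generally contain more interpretable features than a given network has neurons, deep networks may represent features in $\textit{superposition}$, that is, on non-orthogonal axes spanned by multiple neurons.
It should therefore be possible to find meaningful directions in activation space that are not aligned with individual neurons.
Here, we propose (1) an automated method for quantifying visual interpretability, validated against a large database of human psychophysics judgments of neuron interpretability, and (2) an approach for finding meaningful directions in network activation space.
We use these methods to discover directions in convolutional neural networks that are more intuitively meaningful than individual neurons, which we confirm and investigate in a series of analyses.
Moreover, applying the same method to two recent datasets of visual neural responses in the brain, we find that our conclusions largely transfer to real neural data, suggesting that the brain might also use superposition.
This provides a link with disentanglement and raises fundamental questions about robust, efficient, and factorized representations in both artificial and biological neural systems.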
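
As a rough illustration of the superposition hypothesis described above, the following toy sketch places three features in a two-neuron activation space. It is not the authors' method: the 120-degree geometry, the one-hot inputs, and the `selectivity` score (the share of a rectified response that comes from the top feature) are all assumptions chosen to keep the example small.

```python
import numpy as np

# Hypothetical toy setup: 3 features share 2 neurons, spaced 120 degrees apart,
# so no feature direction lines up with a single neuron axis.
angles = np.deg2rad([45.0, 165.0, 285.0])
feature_dirs = np.stack([np.cos(angles), np.sin(angles)], axis=1)  # (3, 2), unit norm

# Sparse inputs (assumption): exactly one feature is active per sample.
features = np.eye(3)
activations = features @ feature_dirs  # (3 samples, 2 neurons)


def selectivity(directions, activations):
    """Share of each direction's rectified response that comes from its top feature."""
    responses = np.maximum(activations @ directions.T, 0.0)  # (samples, directions)
    return responses.max(axis=0) / responses.sum(axis=0)


neuron_axes = np.eye(2)  # reading out single neurons
neuron_selectivity = selectivity(neuron_axes, activations)
direction_selectivity = selectivity(feature_dirs, activations)

print("neuron axes:       ", neuron_selectivity.round(3))
print("feature directions:", direction_selectivity.round(3))
```

In this toy case each neuron axis responds to two of the three features (a score of about 0.73), while each non-axis-aligned feature direction responds to exactly one (a score of 1.0). Real networks are far less tidy, and the paper's interpretability measure is validated against human judgments rather than defined by a score like this one.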

Summary (original)

Single neurons in neural networks are often “interpretable” in that they represent individual, intuitively meaningful features. However, many neurons exhibit $\textit{mixed selectivity}$, i.e., they represent multiple unrelated features. A recent hypothesis proposes that features in deep networks may be represented in $\textit{superposition}$, i.e., on non-orthogonal axes by multiple neurons, since the number of possible interpretable features in natural data is generally larger than the number of neurons in a given network. Accordingly, we should be able to find meaningful directions in activation space that are not aligned with individual neurons. Here, we propose (1) an automated method for quantifying visual interpretability that is validated against a large database of human psychophysics judgments of neuron interpretability, and (2) an approach for finding meaningful directions in network activation space. We leverage these methods to discover directions in convolutional neural networks that are more intuitively meaningful than individual neurons, as we confirm and investigate in a series of analyses. Moreover, we apply the same method to two recent datasets of visual neural responses in the brain and find that our conclusions largely transfer to real neural data, suggesting that superposition might be deployed by the brain. This also provides a link with disentanglement and raises fundamental questions about robust, efficient and factorized representations in both artificial and biological neural systems.

arXiv information

Authors	David Klindt, Sophia Sanborn, Francisco Acosta, Frédéric Poitevin, Nina Miolane
Published	2023-10-17 17:41:28+00:00
arXiv page	arxiv_id(pdf)

Source, services used

arxiv.jp, Google
