Mechanistic understanding and validation of large AI models with SemanticLens

Summary

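Unlike human-engineered systems such as aeroplanes, in which the role and dependencies of each component are well understood, the inner workings of AI models remain largely opaque, which hinders verifiability and undermines trust.
This paper introduces SemanticLens, a universal explanation method for neural networks that maps the hidden knowledge encoded by components (such as individual neurons) into the semantically structured, multimodal space of a foundation model such as CLIP.
This space enables unique operations, including (i) text search to identify neurons that encode specific concepts, (ii) systematic analysis and comparison of model representations, (iii) automated labelling of neurons and explanation of their functional roles, and (iv) audits that validate decision-making against requirements.
Fully scalable and operating without human input, SemanticLens is shown to be effective for debugging and validation, summarizing model knowledge, aligning reasoning with expectations (for example, adherence to the ABCDE rule in melanoma classification), and detecting components tied to spurious correlations together with their associated training data.
By enabling component-level understanding and validation, the proposed approach helps bridge the "trust gap" between AI models and traditional engineered systems.
Code for SemanticLens is available at https://github.com/jim-berend/semanticlens, and a demo is available at https://semanticlens.hhi-research-insights.eu.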

Abstract (original)

Unlike human-engineered systems such as aeroplanes, where each component’s role and dependencies are well understood, the inner workings of AI models remain largely opaque, hindering verifiability and undermining trust. This paper introduces SemanticLens, a universal explanation method for neural networks that maps hidden knowledge encoded by components (e.g., individual neurons) into the semantically structured, multimodal space of a foundation model such as CLIP. In this space, unique operations become possible, including (i) textual search to identify neurons encoding specific concepts, (ii) systematic analysis and comparison of model representations, (iii) automated labelling of neurons and explanation of their functional roles, and (iv) audits to validate decision-making against requirements. Fully scalable and operating without human input, SemanticLens is shown to be effective for debugging and validation, summarizing model knowledge, aligning reasoning with expectations (e.g., adherence to the ABCDE-rule in melanoma classification), and detecting components tied to spurious correlations and their associated training data. By enabling component-level understanding and validation, the proposed approach helps bridge the ‘trust gap’ between AI models and traditional engineered systems. We provide code for SemanticLens on https://github.com/jim-berend/semanticlens and a demo on https://semanticlens.hhi-research-insights.eu.
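
The text-search operation described in the abstract can be pictured as a nearest-neighbour lookup in a shared embedding space. The following is only a minimal sketch under stated assumptions, not the SemanticLens API: the function names `cosine` and `search_neurons`, the neuron identifiers, and the 3-dimensional toy vectors are all hypothetical. In practice each neuron's vector and the query vector would come from a foundation model such as CLIP; here they are hard-coded stand-ins.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def search_neurons(query_embedding, neuron_embeddings, top_k=3):
    """Return the top_k (neuron name, similarity) pairs, most similar first."""
    scored = [
        (name, cosine(query_embedding, embedding))
        for name, embedding in neuron_embeddings.items()
    ]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:top_k]


# Hypothetical toy data: one made-up vector per neuron, standing in for the
# embedding of that neuron's encoded concept in the foundation model's space.
neuron_embeddings = {
    "layer4/neuron_12": [0.9, 0.1, 0.0],
    "layer4/neuron_47": [0.1, 0.8, 0.2],
    "layer4/neuron_85": [0.0, 0.2, 0.9],
}

# Made-up stand-in for the text embedding of a concept query.
query_embedding = [0.2, 0.9, 0.1]

results = search_neurons(query_embedding, neuron_embeddings, top_k=2)
for name, score in results:
    print(f"{name}: {score:.3f}")
```

With these toy vectors, the neuron whose embedding points in nearly the same direction as the query ranks first. Cosine similarity is used here because it is the usual comparison in CLIP-style spaces; whether SemanticLens uses exactly this measure is an assumption of the sketch.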

arXiv information

Authors	Maximilian Dreyer, Jim Berend, Tobias Labarta, Johanna Vielhaben, Thomas Wiegand, Sebastian Lapuschkin, Wojciech Samek
Published	2025-01-09 17:47:34+00:00
arXiv site	arxiv_id(pdf)

Sources and services used

arxiv.jp, Google
