Prisma: An Open Source Toolkit for Mechanistic Interpretability in Vision and Video

Abstract

Robust tooling and publicly available pre-trained models have helped drive recent advances in mechanistic interpretability for language models. However, similar progress in vision mechanistic interpretability has been hindered by the lack of accessible frameworks and pre-trained weights. We present Prisma (Access the codebase here: https://github.com/Prisma-Multimodal/ViT-Prisma), an open-source framework designed to accelerate vision mechanistic interpretability research, providing a unified toolkit for accessing 75+ vision and video transformers; support for sparse autoencoder (SAE), transcoder, and crosscoder training; a suite of 80+ pre-trained SAE weights; activation caching, circuit analysis tools, and visualization tools; and educational resources. Our analysis reveals surprising findings, including that effective vision SAEs can exhibit substantially lower sparsity patterns than language SAEs, and that in some instances, SAE reconstructions can decrease model loss. Prisma enables new research directions for understanding vision model internals while lowering barriers to entry in this emerging field.
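The abstract compares SAE sparsity between vision and language. SAE sparsity is commonly summarized as L0, the average number of active (non-zero) features per input. The sketch below shows the basic encode/decode step of a generic ReLU sparse autoencoder and how L0 and reconstruction error are measured. It is a minimal, standard-library-only illustration under stated assumptions, not Prisma's API. The class `ToySAE`, the helpers `l0`, `mean_l0` and `mse`, and all weight values are hypothetical. The negative encoder bias plays the role of a threshold that trades reconstruction quality for sparsity.

```python
"""Toy ReLU sparse autoencoder: encode, decode, and measure L0 sparsity.

Hypothetical illustration only; this is not Prisma's API, and all
weights below are hand-picked toy values.
"""
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def matvec(m: Matrix, v: Vector) -> Vector:
    """Multiply a matrix (given as a list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]


class ToySAE:
    """Hypothetical SAE: f = ReLU(W_enc (x - b_dec) + b_enc), x_hat = W_dec f + b_dec."""

    def __init__(self, w_enc: Matrix, b_enc: Vector, w_dec: Matrix, b_dec: Vector):
        self.w_enc = w_enc  # shape (d_sae, d_model)
        self.b_enc = b_enc  # shape (d_sae,)
        self.w_dec = w_dec  # shape (d_model, d_sae)
        self.b_dec = b_dec  # shape (d_model,)

    def encode(self, x: Vector) -> Vector:
        # Subtract the decoder bias, project into feature space, apply ReLU.
        centered = [xi - bi for xi, bi in zip(x, self.b_dec)]
        pre = matvec(self.w_enc, centered)
        return [max(0.0, p + b) for p, b in zip(pre, self.b_enc)]

    def decode(self, f: Vector) -> Vector:
        # Linear map from sparse features back to the model's activation space.
        out = matvec(self.w_dec, f)
        return [o + b for o, b in zip(out, self.b_dec)]


def l0(f: Vector) -> int:
    """Number of active (non-zero) features for one input."""
    return sum(1 for v in f if v != 0.0)


def mean_l0(sae: ToySAE, batch: List[Vector]) -> float:
    """Average L0 over a batch of activation vectors."""
    return sum(l0(sae.encode(x)) for x in batch) / len(batch)


def mse(x: Vector, x_hat: Vector) -> float:
    """Mean squared reconstruction error for one input."""
    return sum((a - b) ** 2 for a, b in zip(x, x_hat)) / len(x)


# Usage with d_model = 2 and d_sae = 4 (toy weights).
sae = ToySAE(
    w_enc=[[1, 0], [0, 1], [-1, 0], [0, -1]],
    b_enc=[-0.25, -0.25, -0.25, -0.25],  # negative bias acts as a sparsity threshold
    w_dec=[[1, 0, -1, 0], [0, 1, 0, -1]],
    b_dec=[0, 0],
)
x = [0.5, -2.0]
f = sae.encode(x)
x_hat = sae.decode(f)
print("features:", f, "L0:", l0(f), "MSE:", mse(x, x_hat))
print("mean L0:", mean_l0(sae, [[0.5, -2.0], [1.0, 0.0], [0.0, 0.0]]))
```

With these toy weights only 2 of the 4 features fire for `x`, and the threshold costs a small reconstruction error. In real SAE evaluations the reconstruction is typically also spliced back into the model to compare downstream loss, which is where the abstract reports that loss can sometimes decrease. That step is not modeled here.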

arXiv information

Authors: Sonia Joseph, Praneet Suresh, Lorenz Hufe, Edward Stevinson, Robert Graham, Yash Vadi, Danilo Bzdok, Sebastian Lapuschkin, Lee Sharkey, Blake Aaron Richards
Published: 2025-06-03 06:43:53+00:00

Source and services used

arxiv.jp, DeepL
