Prisma: An Open Source Toolkit for Mechanistic Interpretability in Vision and Video

Abstract

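Robust tooling and publicly available pre-trained models have helped drive recent progress in mechanistic interpretability for language models.
Comparable progress in vision mechanistic interpretability, however, has been held back by the lack of accessible frameworks and pre-trained weights.
We present Prisma (codebase: https://github.com/Prisma-Multimodal/ViT-Prisma), an open-source framework designed to accelerate vision mechanistic interpretability research. It provides a unified toolkit for accessing 75+ vision and video transformers; support for training sparse autoencoders (SAEs), transcoders, and crosscoders; a suite of 80+ pre-trained SAE weights; activation caching, circuit analysis tools, and visualization tools; and educational resources.
Our analysis yields surprising findings: effective vision SAEs can exhibit substantially lower sparsity than language SAEs, and in some cases SAE reconstructions can decrease model loss.
Prisma opens new research directions for understanding the internals of vision models while lowering the barrier to entry into this emerging field.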

Abstract (original)

Robust tooling and publicly available pre-trained models have helped drive recent advances in mechanistic interpretability for language models. However, similar progress in vision mechanistic interpretability has been hindered by the lack of accessible frameworks and pre-trained weights. We present Prisma (Access the codebase here: https://github.com/Prisma-Multimodal/ViT-Prisma), an open-source framework designed to accelerate vision mechanistic interpretability research, providing a unified toolkit for accessing 75+ vision and video transformers; support for sparse autoencoder (SAE), transcoder, and crosscoder training; a suite of 80+ pre-trained SAE weights; activation caching, circuit analysis tools, and visualization tools; and educational resources. Our analysis reveals surprising findings, including that effective vision SAEs can exhibit substantially lower sparsity patterns than language SAEs, and that in some instances, SAE reconstructions can decrease model loss. Prisma enables new research directions for understanding vision model internals while lowering barriers to entry in this emerging field.
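
For readers unfamiliar with the terms, the sparsity and reconstruction claims in the abstract can be made concrete with a toy sparse autoencoder. The sketch below is not Prisma's API and does not reproduce the paper's architecture or numbers: the class `TinySAE`, the helpers `l0` and `mse`, the dimensions, and the random weights are all hypothetical and chosen only for illustration. It assumes a plain ReLU encoder and a linear decoder, and measures sparsity as L0, the number of active latents per input, which is one common convention (a lower L0 means a sparser code).

```python
import random


def matvec(W, x):
    # W is a list of rows; returns the matrix-vector product W @ x.
    return [sum(w * a for w, a in zip(row, x)) for row in W]


class TinySAE:
    """Toy SAE (hypothetical): f = ReLU(W_enc x + b_enc), x_hat = W_dec f + b_dec."""

    def __init__(self, d_model, d_latent, seed=0):
        rng = random.Random(seed)
        self.W_enc = [[rng.gauss(0, 0.5) for _ in range(d_model)]
                      for _ in range(d_latent)]
        # A negative encoder bias pushes more latents to exactly zero.
        self.b_enc = [-0.5] * d_latent
        self.W_dec = [[rng.gauss(0, 0.5) for _ in range(d_latent)]
                      for _ in range(d_model)]
        self.b_dec = [0.0] * d_model

    def encode(self, x):
        pre = matvec(self.W_enc, x)
        return [max(0.0, p + b) for p, b in zip(pre, self.b_enc)]

    def decode(self, f):
        return [r + b for r, b in zip(matvec(self.W_dec, f), self.b_dec)]


def l0(f):
    # Number of active (strictly positive) latents for one input.
    return sum(1 for a in f if a > 0.0)


def mse(x, x_hat):
    # Mean squared reconstruction error for one input.
    return sum((a - b) ** 2 for a, b in zip(x, x_hat)) / len(x)


# Stand-in "activations": random vectors, not real model activations.
rng = random.Random(1)
acts = [[rng.gauss(0, 1) for _ in range(8)] for _ in range(16)]

sae = TinySAE(d_model=8, d_latent=32)
latents = [sae.encode(x) for x in acts]
mean_l0 = sum(l0(f) for f in latents) / len(latents)
mean_mse = sum(mse(x, sae.decode(f)) for x, f in zip(acts, latents)) / len(acts)
print(f"mean L0: {mean_l0:.2f} of 32 latents, mean MSE: {mean_mse:.3f}")
```

With untrained random weights the printed values carry no meaning; in practice the weights are trained to trade reconstruction error against sparsity. The claim that reconstructions can change model loss refers to a separate measurement, in which the reconstruction is substituted back into the model and the downstream loss is compared, which this sketch does not attempt.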

arXiv information

Authors	Sonia Joseph, Praneet Suresh, Lorenz Hufe, Edward Stevinson, Robert Graham, Yash Vadi, Danilo Bzdok, Sebastian Lapuschkin, Lee Sharkey, Blake Aaron Richards
Published	2025-06-02 05:24:10+00:00
arXiv page	arxiv_id(pdf)

Source, services used

arxiv.jp, Google
