A Multimodal Automated Interpretability Agent

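Summary

This document describes MAIA, a Multimodal Automated Interpretability Agent.
MAIA is a system that uses neural models to automate neural model understanding tasks such as feature interpretation and failure mode discovery.
It equips a pre-trained vision-language model with a set of tools that support iterative experiments on subcomponents of other models in order to explain their behavior.
These include tools commonly used by human interpretability researchers: tools for synthesizing and editing inputs, computing maximally activating exemplars from real-world datasets, and summarizing and describing experimental results.
The interpretability experiments that MAIA proposes compose these tools to describe and explain system behavior.
The paper evaluates applications of MAIA to computer vision models.
It first characterizes MAIA's ability to describe (neuron-level) features in learned image representations.
Across several trained models and a novel dataset of synthetic vision neurons with paired ground-truth descriptions, MAIA produces descriptions comparable to those generated by expert human experimenters.
It then shows that MAIA can aid in two additional interpretability tasks: reducing sensitivity to spurious features and automatically identifying inputs likely to be misclassified.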

Abstract (original)

This paper describes MAIA, a Multimodal Automated Interpretability Agent. MAIA is a system that uses neural models to automate neural model understanding tasks like feature interpretation and failure mode discovery. It equips a pre-trained vision-language model with a set of tools that support iterative experimentation on subcomponents of other models to explain their behavior. These include tools commonly used by human interpretability researchers: for synthesizing and editing inputs, computing maximally activating exemplars from real-world datasets, and summarizing and describing experimental results. Interpretability experiments proposed by MAIA compose these tools to describe and explain system behavior. We evaluate applications of MAIA to computer vision models. We first characterize MAIA’s ability to describe (neuron-level) features in learned representations of images. Across several trained models and a novel dataset of synthetic vision neurons with paired ground-truth descriptions, MAIA produces descriptions comparable to those generated by expert human experimenters. We then show that MAIA can aid in two additional interpretability tasks: reducing sensitivity to spurious features, and automatically identifying inputs likely to be mis-classified.
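
One of the tools the abstract mentions, computing maximally activating exemplars from a dataset, can be illustrated with a minimal sketch. This is not MAIA's actual API: the function `top_activating_exemplars`, the stand-in `toy_unit`, and the tiny `toy_dataset` are all hypothetical names invented here, and a real setup would read activations from a unit inside a trained vision model instead.

```python
import heapq
from typing import Callable, List, Sequence, Tuple

Image = List[List[float]]  # assumption: a tiny grayscale image as nested lists


def top_activating_exemplars(
    unit: Callable[[Image], float],
    dataset: Sequence[Image],
    k: int = 3,
) -> List[Tuple[float, Image]]:
    """Return the k (activation, image) pairs with the highest activation."""
    scored = ((unit(image), image) for image in dataset)
    # The key compares activations only, so images are never compared directly.
    return heapq.nlargest(k, scored, key=lambda pair: pair[0])


def toy_unit(image: Image) -> float:
    """Hypothetical stand-in for a neuron: mean brightness of the left half."""
    values = [v for row in image for v in row[: len(row) // 2]]
    return sum(values) / len(values)


# Hypothetical 2x2 "images"; a real dataset would hold natural images.
toy_dataset: List[Image] = [
    [[0.0, 0.0], [0.0, 0.0]],
    [[1.0, 0.0], [1.0, 0.0]],
    [[0.5, 1.0], [0.5, 1.0]],
    [[0.0, 1.0], [0.0, 1.0]],
]

exemplars = top_activating_exemplars(toy_unit, toy_dataset, k=2)
for activation, image in exemplars:
    print(activation, image)
```

An agent or a human researcher would typically inspect the returned exemplars, form a hypothesis about what the unit responds to, and then test it with further experiments such as editing the inputs.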

arXiv information

Authors	Tamar Rott Shaham, Sarah Schwettmann, Franklin Wang, Achyuta Rajaram, Evan Hernandez, Jacob Andreas, Antonio Torralba
Published	2024-04-22 17:55:11+00:00
arXiv site	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
