Token Transformation Matters: Towards Faithful Post-hoc Explanation for Vision Transformer

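Summary

Transformers are rapidly gaining popularity across computer vision applications, yet post-hoc explanation of their internal mechanisms remains largely unexplored.
Vision Transformers extract visual information by representing image regions as transformed tokens and integrating them via attention weights.
However, existing post-hoc explanation methods consider only these attention weights and ignore crucial information from the transformed tokens, so they cannot accurately explain the rationale behind a model's predictions.
To incorporate the influence of token transformation into interpretation, we propose TokenTM, a new post-hoc explanation method built on a measurement of token transformation effects that we introduce.
Specifically, we quantify the effect of token transformation by measuring the change in token length and the correlation in token direction before and after transformation.
In addition, we develop initialization and aggregation rules that integrate both attention weights and token transformation effects across all layers, capturing each token's overall contribution throughout the model.
Experimental results on segmentation and perturbation tests demonstrate the superiority of the proposed TokenTM over state-of-the-art Vision Transformer explanation methods.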

Abstract (original)

While Transformers have rapidly gained popularity in various computer vision applications, post-hoc explanations of their internal mechanisms remain largely unexplored. Vision Transformers extract visual information by representing image regions as transformed tokens and integrating them via attention weights. However, existing post-hoc explanation methods merely consider these attention weights, neglecting crucial information from the transformed tokens, which fails to accurately illustrate the rationales behind the models’ predictions. To incorporate the influence of token transformation into interpretation, we propose TokenTM, a novel post-hoc explanation method that utilizes our introduced measurement of token transformation effects. Specifically, we quantify token transformation effects by measuring changes in token lengths and correlations in their directions pre- and post-transformation. Moreover, we develop initialization and aggregation rules to integrate both attention weights and token transformation effects across all layers, capturing holistic token contributions throughout the model. Experimental results on segmentation and perturbation tests demonstrate the superiority of our proposed TokenTM compared to state-of-the-art Vision Transformer explanation methods.
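
The abstract describes the measurement only in words (a change in token length and a correlation in token direction), so the following is a rough sketch under stated assumptions rather than the authors' formulation. It assumes the length change is a ratio of L2 norms, the direction correlation is a cosine similarity, and the two are multiplied into one per-token score that rescales a single layer's attention weights. The names `transformation_effect` and `scale_attention` are hypothetical, and the paper's initialization and aggregation rules across layers are not reproduced here.

```python
import math


def transformation_effect(before, after, eps=1e-12):
    """Hypothetical per-token score: length ratio times direction cosine.

    Assumption: the abstract does not give the formula; this combines the
    two quantities it names in the simplest multiplicative way.
    """
    norm_before = math.sqrt(sum(x * x for x in before))
    norm_after = math.sqrt(sum(x * x for x in after))
    dot = sum(b * a for b, a in zip(before, after))
    length_change = norm_after / (norm_before + eps)         # change in token length
    direction_corr = dot / (norm_before * norm_after + eps)  # cosine similarity
    return length_change * direction_corr


def scale_attention(attention, effects):
    """Scale attention column j by the effect score of source token j.

    Assumption: a single-layer illustration only, not the paper's
    aggregation rule.
    """
    return [[w * e for w, e in zip(row, effects)] for row in attention]


# Two toy tokens: the first doubles in length without changing direction,
# the second is rotated to be orthogonal to its original direction.
tokens_before = [[3.0, 4.0], [1.0, 0.0]]
tokens_after = [[6.0, 8.0], [0.0, 2.0]]
effects = [transformation_effect(b, a) for b, a in zip(tokens_before, tokens_after)]

attention = [[0.5, 0.5], [0.2, 0.8]]
weighted = scale_attention(attention, effects)
```

In this toy case the first token's score is close to 2 and the second's is close to 0, so the second column of the attention matrix is suppressed even though its raw attention weights are large. That is the kind of information an attention-only explanation would miss.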

arXiv information

Authors	Junyi Wu, Bin Duan, Weitai Kang, Hao Tang, Yan Yan
Published	2024-03-21 16:52:27+00:00
arXiv page	arxiv_id(pdf)

Source, services used

arxiv.jp, Google
