A Simple Interpretable Transformer for Fine-Grained Image Classification and Analysis

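Summary

We introduce a new way of using Transformers to make image classification interpretable.
Unlike mainstream classifiers, which wait until the final fully connected layer to incorporate class information and make predictions, we investigate a proactive approach that asks each class to search for itself in the image.
This idea is realized with a Transformer encoder-decoder inspired by DEtection TRansformer (DETR).
We learn "class-specific" queries (one per class) as input to the decoder, enabling each class to localize its patterns in the image through cross-attention.
We name this approach INterpretable TRansformer (INTR); it is fairly easy to implement and exhibits several compelling properties.
We show that INTR intrinsically encourages each class to attend distinctively.
The cross-attention weights therefore provide a faithful interpretation of the prediction.
Interestingly, through "multi-head" cross-attention, INTR can identify different "attributes" of a class, which makes it particularly suitable for fine-grained classification and analysis; we demonstrate this on eight datasets.
Our code and pre-trained models are publicly available at the Imageomics Institute GitHub site (https://github.com/Imageomics/INTR).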
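
The idea of one learned query per class attending over image features can be illustrated with a minimal sketch. This is not the authors' implementation (see the linked repository for that): it uses a single attention head, skips the learned key/value projections and the encoder, and fills every tensor with random numbers. The names `cross_attention`, `classify`, and `shared_w` are hypothetical, and scoring each class with one shared weight vector is an assumption made for illustration, since the abstract does not specify how the decoder outputs become predictions.

```python
import math
import random


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def cross_attention(queries, patches):
    """Single-head cross-attention: each class query attends over all patches.

    Returns (outputs, weights). weights[c][p] is how strongly class c attends
    to patch p; outputs[c] is the attention-weighted average of patch features.
    Simplification: patch features serve directly as keys and values.
    """
    scale = math.sqrt(len(patches[0]))
    outputs, weights = [], []
    for q in queries:
        w = softmax([dot(q, p) / scale for p in patches])
        out = [sum(w_p * p[d] for w_p, p in zip(w, patches)) for d in range(len(q))]
        weights.append(w)
        outputs.append(out)
    return outputs, weights


def classify(queries, patches, shared_w):
    """Score every class from its own attended output (assumed scoring rule)."""
    outputs, weights = cross_attention(queries, patches)
    logits = [dot(out, shared_w) for out in outputs]
    pred = max(range(len(logits)), key=lambda c: logits[c])
    return pred, logits, weights


# Toy data: random stand-ins for learned queries and encoded image patches.
random.seed(0)
num_classes, num_patches, dim = 3, 6, 4
queries = [[random.gauss(0, 1) for _ in range(dim)] for _ in range(num_classes)]
patches = [[random.gauss(0, 1) for _ in range(dim)] for _ in range(num_patches)]
shared_w = [random.gauss(0, 1) for _ in range(dim)]

pred, logits, weights = classify(queries, patches, shared_w)
print("predicted class:", pred)
print("attention of predicted class over patches:", weights[pred])
```

In this sketch, `weights[pred]` is the quantity one would visualize as a heatmap over the image: it shows where the predicted class "found itself". A multi-head variant would produce one such weight vector per head for each class.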

Abstract (original)

We present a novel usage of Transformers to make image classification interpretable. Unlike mainstream classifiers that wait until the last fully connected layer to incorporate class information to make predictions, we investigate a proactive approach, asking each class to search for itself in an image. We realize this idea via a Transformer encoder-decoder inspired by DEtection TRansformer (DETR). We learn ‘class-specific’ queries (one for each class) as input to the decoder, enabling each class to localize its patterns in an image via cross-attention. We name our approach INterpretable TRansformer (INTR), which is fairly easy to implement and exhibits several compelling properties. We show that INTR intrinsically encourages each class to attend distinctively; the cross-attention weights thus provide a faithful interpretation of the prediction. Interestingly, via ‘multi-head’ cross-attention, INTR could identify different ‘attributes’ of a class, making it particularly suitable for fine-grained classification and analysis, which we demonstrate on eight datasets. Our code and pre-trained models are publicly accessible at the Imageomics Institute GitHub site: https://github.com/Imageomics/INTR.

arXiv information

Authors	Dipanjyoti Paul, Arpita Chowdhury, Xinqi Xiong, Feng-Ju Chang, David Carlyn, Samuel Stevens, Kaiya L. Provost, Anuj Karpatne, Bryan Carstens, Daniel Rubenstein, Charles Stewart, Tanya Berger-Wolf, Yu Su, Wei-Lun Chao
Published	2024-06-14 17:28:14+00:00
arXiv site	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
