On the Interpretability of Attention Networks

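Summary

Title: On the Interpretability of Attention Networks

Summary:
– Attention mechanisms are a core component of several successful deep learning architectures and rest on one key idea: "the output depends only on a small (but unknown) segment of the input."
– In several practical applications, such as image captioning and language translation, this is mostly true.
– In trained models with an attention mechanism, the outputs of the intermediate module that encodes which segment of the input is responsible for the output are often used as a way to peek into the network's "reasoning".
– The paper makes this notion more precise for a variant of the classification problem, termed selective dependence classification (SDC), when used with attention model architectures.
– In this setting, it demonstrates various error modes in which an attention model can be accurate yet fail to be interpretable, and shows that such models do occur as a result of training.
– It illustrates various situations that can accentuate or mitigate this behaviour.
– Finally, it uses an objective definition of interpretability for SDC tasks to evaluate a few attention model learning algorithms designed to encourage sparsity, and demonstrates that these algorithms help improve interpretability.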
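The gap between accuracy and interpretability described in the points above can be illustrated with a small sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the linear focus module, the names `attend` and `focus_agreement`, the toy data, and the choice to score interpretability as the fraction of examples whose largest attention weight falls on the segment that actually determines the label.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(segments, focus_w):
    """Score each segment with a linear focus module (an assumption),
    then return the attention weights and the attention-weighted average."""
    scores = [sum(w * x for w, x in zip(focus_w, seg)) for seg in segments]
    alphas = softmax(scores)
    dim = len(segments[0])
    pooled = [sum(a * seg[d] for a, seg in zip(alphas, segments))
              for d in range(dim)]
    return alphas, pooled


def focus_agreement(examples, focus_w):
    """Hypothetical interpretability score: fraction of examples whose
    highest attention weight lands on the label-determining segment."""
    hits = 0
    for segments, true_index in examples:
        alphas, _ = attend(segments, focus_w)
        top = max(range(len(alphas)), key=alphas.__getitem__)
        hits += int(top == true_index)
    return hits / len(examples)


# Toy SDC-style data: each example is (segments, index of the segment
# that the label depends on). The values are made up for illustration.
examples = [
    ([[0.1, 0.0], [2.0, 1.0], [0.0, 0.2]], 1),
    ([[3.0, -1.0], [0.2, 0.1], [0.1, 0.0]], 0),
    ([[0.0, 0.1], [0.1, 0.0], [2.5, 0.5]], 2),
]

good_focus = [1.0, 0.0]   # attends to the informative segment
bad_focus = [-1.0, 0.0]   # attends away from it

print(focus_agreement(examples, good_focus))
print(focus_agreement(examples, bad_focus))
```

A downstream classifier could in principle still reach high accuracy on top of the second focus module, for example by learning to decode the label from whatever the pooled vector contains, which is the kind of "accurate but not interpretable" outcome the summary refers to. A score of this kind only makes sense when the label-determining segment is known, as it is by construction in a synthetic task.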

Summary (original)

Attention mechanisms form a core component of several successful deep learning architectures, and are based on one key idea: "The output depends only on a small (but unknown) segment of the input." In several practical applications like image captioning and language translation, this is mostly true. In trained models with an attention mechanism, the outputs of an intermediate module that encodes the segment of input responsible for the output are often used as a way to peek into the "reasoning" of the network. We make such a notion more precise for a variant of the classification problem that we term selective dependence classification (SDC) when used with attention model architectures. Under such a setting, we demonstrate various error modes where an attention model can be accurate but fail to be interpretable, and show that such models do occur as a result of training. We illustrate various situations that can accentuate and mitigate this behaviour. Finally, we use our objective definition of interpretability for SDC tasks to evaluate a few attention model learning algorithms designed to encourage sparsity and demonstrate that these algorithms help improve interpretability.

arXiv information

Authors	Lakshmi Narayan Pandey, Rahul Vashisht, Harish G. Ramaswamy
Published	2023-04-09 17:20:43+00:00
arXiv site	arxiv_id(pdf)

Source, services used

arxiv.jp, OpenAI
