MAMI: Multi-Attentional Mutual-Information for Long Sequence Neuron Captioning

Abstract

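Neuron labeling is an approach to visualizing how a particular neuron behaves and responds to the patterns that activate it.
It extracts information about the features captured by individual neurons in a deep neural network; one way to do this is the encoder-decoder image captioning approach.
The encoder can be a pretrained CNN-based model, and the decoder is an RNN-based model for text generation.
Previous work, MILAN (Mutual Information-guided Linguistic Annotation of Neurons), attempted to visualize neuron behavior using a modified Show, Attend, and Tell (SAT) model in the encoder and an LSTM with Bahdanau attention in the decoder.
MILAN performs well on short-sequence neuron captioning but not on long-sequence neuron captioning, so this work aims to improve on MILAN by using different kinds of attention mechanisms.
The results of several attention mechanisms are added together into one in order to combine their advantages.
Using our compound dataset, our proposed model obtained higher BLEU and F1 scores, reaching 17.742 and 0.4811 respectively.
At the point where the model converges at its peak, it obtained a BLEU of 21.2262 and a BERTScore F1 of 0.4870.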
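
The idea of adding several attention results into one can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the abstract does not say which attention variants are used or exactly how they are merged, so the pairing of additive (Bahdanau-style) attention with scaled dot-product attention, the element-wise sum, and all function and parameter names here are assumptions chosen for illustration.

```python
import math

def softmax(scores):
    # Numerically stable softmax over a list of floats.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def matvec(matrix, vec):
    return [dot(row, vec) for row in matrix]

def weighted_sum(weights, values):
    # Context vector: attention-weighted average of the value vectors.
    dim = len(values[0])
    return [sum(w * v[i] for w, v in zip(weights, values)) for i in range(dim)]

def additive_attention(query, keys, values, w_q, w_k, v):
    # Bahdanau-style score: v . tanh(W_q q + W_k k)
    q_proj = matvec(w_q, query)
    scores = []
    for key in keys:
        k_proj = matvec(w_k, key)
        hidden = [math.tanh(a + b) for a, b in zip(q_proj, k_proj)]
        scores.append(dot(v, hidden))
    return weighted_sum(softmax(scores), values)

def scaled_dot_attention(query, keys, values):
    # Score: (q . k) / sqrt(d)
    scale = math.sqrt(len(query))
    scores = [dot(query, key) / scale for key in keys]
    return weighted_sum(softmax(scores), values)

def combined_context(contexts):
    # Assumed merge rule: element-wise sum of the context vectors.
    return [sum(parts) for parts in zip(*contexts)]

# Toy inputs (hypothetical): one decoder query attending over three encoder features.
query = [1.0, 0.0]
keys = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
values = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
identity = [[1.0, 0.0], [0.0, 1.0]]

ctx_add = additive_attention(query, keys, values, identity, identity, [1.0, 1.0])
ctx_dot = scaled_dot_attention(query, keys, values)
ctx = combined_context([ctx_add, ctx_dot])
print(ctx)
```

In a real captioning decoder the combined context would be fed to the LSTM at each step; here the toy weights are fixed rather than learned.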

Abstract (original)

Neuron labeling is an approach to visualize the behaviour and respond of a certain neuron to a certain pattern that activates the neuron. Neuron labeling extract information about the features captured by certain neurons in a deep neural network, one of which uses the encoder-decoder image captioning approach. The encoder used can be a pretrained CNN-based model and the decoder is an RNN-based model for text generation. Previous work, namely MILAN (Mutual Information-guided Linguistic Annotation of Neuron), has tried to visualize the neuron behaviour using modified Show, Attend, and Tell (SAT) model in the encoder, and LSTM added with Bahdanau attention in the decoder. MILAN can show great result on short sequence neuron captioning, but it does not show great result on long sequence neuron captioning, so in this work, we would like to improve the performance of MILAN even more by utilizing different kind of attention mechanism and additionally adding several attention result into one, in order to combine all the advantages from several attention mechanism. Using our compound dataset, we obtained higher BLEU and F1-Score on our proposed model, achieving 17.742 and 0.4811 respectively. At some point where the model converges at the peak, our model obtained BLEU of 21.2262 and BERTScore F1-Score of 0.4870.

arXiv information

Authors	Alfirsa Damasyifa Fauzulhaq, Wahyu Parwitayasa, Joseph Ananda Sugihdharma, M. Fadli Ridhani, Novanto Yudistira
Published	2024-01-05 10:41:55+00:00
arXiv page	arxiv_id(pdf)

Source, services used

arxiv.jp, Google
