Identification of Knowledge Neurons in Protein Language Models

Summary

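Neural language models have become powerful tools for learning complex representations of entities in natural language processing tasks.
Their interpretability, however, remains a major challenge, especially in fields such as computational biology where trust in model predictions is critical.
This study aims to improve the interpretability of protein language models, specifically the state-of-the-art ESM model, by identifying and characterizing knowledge neurons, the components that express an understanding of key information.
After fine-tuning the ESM model for enzyme sequence classification, the authors compare two knowledge neuron selection methods, each of which preserves a subset of neurons from the original model.
Both methods, activation-based selection and integrated gradient-based selection, consistently outperform a random baseline.
In particular, both methods indicate a high density of knowledge neurons in the key vector prediction networks of the self-attention modules.
Because key vectors specialize in understanding different features of the input sequence, these knowledge neurons may capture knowledge of different enzyme sequence motifs.
Future work could characterize the type of knowledge each neuron captures.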

Summary (original)

Neural language models have become powerful tools for learning complex representations of entities in natural language processing tasks. However, their interpretability remains a significant challenge, particularly in domains like computational biology where trust in model predictions is crucial. In this work, we aim to enhance the interpretability of protein language models, specifically the state-of-the-art ESM model, by identifying and characterizing knowledge neurons – components that express understanding of key information. After fine-tuning the ESM model for the task of enzyme sequence classification, we compare two knowledge neuron selection methods that preserve a subset of neurons from the original model. The two methods, activation-based and integrated gradient-based selection, consistently outperform a random baseline. In particular, these methods show that there is a high density of knowledge neurons in the key vector prediction networks of self-attention modules. Given that key vectors specialize in understanding different features of input sequences, these knowledge neurons could capture knowledge of different enzyme sequence motifs. In the future, the types of knowledge captured by each neuron could be characterized.
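
To make the two selection strategies named in the abstract more concrete, here is a minimal sketch in plain Python. It is not the authors' implementation and does not use ESM: it assumes a toy classifier head whose output is a sigmoid over a weighted sum of neuron activations, a zero-activation baseline for integrated gradients, and made-up activations and weights. All function names (`output`, `grad_wrt_acts`, `integrated_gradients`, `top_k`) are hypothetical.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def output(acts, weights):
    """Toy classifier head: probability from a weighted sum of neuron activations."""
    return sigmoid(sum(w * a for w, a in zip(weights, acts)))

def grad_wrt_acts(acts, weights):
    """Analytic gradient of output() with respect to each activation."""
    p = output(acts, weights)
    return [p * (1.0 - p) * w for w in weights]

def integrated_gradients(acts, weights, steps=200):
    """Midpoint Riemann-sum approximation of integrated gradients.

    Assumption: the baseline is the all-zero activation vector, so the
    integration path is alpha * acts for alpha in (0, 1).
    """
    totals = [0.0] * len(acts)
    for k in range(1, steps + 1):
        alpha = (k - 0.5) / steps
        scaled = [alpha * a for a in acts]
        for j, g in enumerate(grad_wrt_acts(scaled, weights)):
            totals[j] += g
    # Attribution = (activation - baseline) * average gradient along the path.
    return [a * t / steps for a, t in zip(acts, totals)]

def top_k(scores, k):
    """Indices of the k scores with the largest magnitude."""
    return sorted(range(len(scores)), key=lambda j: abs(scores[j]), reverse=True)[:k]

# Hypothetical activations and output weights for four neurons.
acts = [2.0, 0.1, 1.5, 0.0]
weights = [0.05, 3.0, 1.0, 4.0]

# Activation-based selection: keep the neurons that fire most strongly.
activation_pick = top_k(acts, 2)

# Integrated-gradient-based selection: keep the neurons with the largest attribution.
ig_scores = integrated_gradients(acts, weights)
ig_pick = top_k(ig_scores, 2)
```

In this toy setting the two criteria disagree: neuron 0 fires strongly but has a small output weight, so activation-based selection keeps it while the attribution-based ranking does not. The attributions also sum (up to numerical error) to the difference between the output at the actual activations and at the zero baseline, which is a quick sanity check for any integrated-gradients implementation.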

arXiv information

Authors	Divya Nori, Shivali Singireddy, Marina Ten Have
Published	2023-12-17 17:23:43+00:00
arXiv site	arxiv_id(pdf)

Source, services used

arxiv.jp, Google
