Explaining black box text modules in natural language with language models

Summary

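Large language models (LLMs) have demonstrated remarkable predictive performance on a growing number of tasks.
However, their rapid proliferation and increasing opacity have heightened the need for interpretability.
Here, we ask whether natural-language explanations of black-box text modules can be obtained automatically.
A "text module" is a function that maps text to a continuous scalar value, such as a submodule within an LLM or a fitted model of a brain region.
"Black box" indicates that only the module's inputs and outputs are accessible.
We introduce Summarize and Score (SASC), a method that takes in a text module and returns a natural-language explanation of the module's selectivity, along with a score for how reliable that explanation is.
We study SASC in three contexts.
First, we evaluate SASC on synthetic modules and find that it often recovers the ground-truth explanations.
Second, we use SASC to explain modules found within a pre-trained BERT model, enabling inspection of the model's internals.
Finally, we show that SASC can generate explanations for the responses of individual fMRI voxels to language stimuli, suggesting potential applications to fine-grained brain mapping.
All code for using SASC and reproducing the results is available on GitHub.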
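The definitions above treat a text module as a black box that maps text to a scalar. The following is a minimal Python sketch of that interface, assuming a module is simply a callable from `str` to `float`. The keyword-counting module and the helper names `make_keyword_module` and `query` are hypothetical illustrations of a synthetic module with a known ground-truth explanation, not code from the paper or its repository.

```python
from typing import Callable, List

# A "text module": any function mapping a string to a continuous scalar.
# Black-box access means we may only call it, never inspect its internals.
TextModule = Callable[[str], float]


def make_keyword_module(keywords: List[str]) -> TextModule:
    """Hypothetical synthetic module whose ground truth is known: it returns
    the fraction of words in the input that belong to a keyword set."""
    vocab = {k.lower() for k in keywords}

    def module(text: str) -> float:
        words = text.lower().split()
        if not words:
            return 0.0
        # Strip simple punctuation so "cat." still counts as "cat".
        return sum(w.strip(".,!?") in vocab for w in words) / len(words)

    return module


def query(module: TextModule, texts: List[str]) -> List[float]:
    """The only operation black-box access allows: inputs in, outputs out."""
    return [module(t) for t in texts]


animal_module = make_keyword_module(["dog", "cat", "horse", "bird"])
print(query(animal_module, ["the dog chased a cat", "stock prices fell", ""]))
# → [0.4, 0.0, 0.0]
```

Because the module's selectivity ("animal words") is fixed by construction, a candidate explanation can be checked against it directly, which is what makes synthetic modules useful for evaluation.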

Summary (original)

Large language models (LLMs) have demonstrated remarkable prediction performance for a growing array of tasks. However, their rapid proliferation and increasing opaqueness have created a growing need for interpretability. Here, we ask whether we can automatically obtain natural language explanations for black box text modules. A ‘text module’ is any function that maps text to a scalar continuous value, such as a submodule within an LLM or a fitted model of a brain region. ‘Black box’ indicates that we only have access to the module’s inputs/outputs. We introduce Summarize and Score (SASC), a method that takes in a text module and returns a natural language explanation of the module’s selectivity along with a score for how reliable the explanation is. We study SASC in 3 contexts. First, we evaluate SASC on synthetic modules and find that it often recovers ground truth explanations. Second, we use SASC to explain modules found within a pre-trained BERT model, enabling inspection of the model’s internals. Finally, we show that SASC can generate explanations for the response of individual fMRI voxels to language stimuli, with potential applications to fine-grained brain mapping. All code for using SASC and reproducing results is made available on Github.
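The abstract names SASC's two steps but does not spell out their internals. The sketch below is one plausible reading under explicit assumptions, not the authors' implementation: the Summarize step ranks n-grams from a reference corpus by module response and condenses the top ones into an explanation, and the Score step compares the module's response on text related to the explanation against unrelated baseline text, in units of the baseline's standard deviation. The title indicates that language models do this work; here they are replaced by the hypothetical stand-ins `most_common_word` and `template_texts` so the code runs offline.

```python
from collections import Counter
from statistics import mean, pstdev
from typing import Callable, List

TextModule = Callable[[str], float]


def ngrams(corpus: List[str], n_max: int = 3) -> List[str]:
    """All word n-grams (n = 1..n_max) in a reference corpus, deduplicated."""
    grams = set()
    for doc in corpus:
        words = doc.lower().split()
        for n in range(1, n_max + 1):
            for i in range(len(words) - n + 1):
                grams.add(" ".join(words[i:i + n]))
    return sorted(grams)


def summarize(module: TextModule, corpus: List[str],
              summarize_fn: Callable[[List[str]], str], top_k: int = 5) -> str:
    """Summarize step (sketch): keep the n-grams that drive the module's
    output highest, then condense them into one explanation.
    `summarize_fn` is a hypothetical stand-in for an LLM prompt."""
    top = sorted(ngrams(corpus), key=module, reverse=True)[:top_k]
    return summarize_fn(top)


def score(module: TextModule, explanation: str,
          generate_fn: Callable[[str], List[str]],
          baseline_texts: List[str]) -> float:
    """Score step (sketch): mean response on text related to the explanation
    minus mean response on baseline text, divided by the baseline's std.
    `generate_fn` is a hypothetical stand-in for an LLM writing related text."""
    related = [module(t) for t in generate_fn(explanation)]
    base = [module(t) for t in baseline_texts]
    spread = pstdev(base) or 1.0  # avoid dividing by zero on a flat baseline
    return (mean(related) - mean(base)) / spread


# Hypothetical offline stand-ins for the LLM calls.
def most_common_word(top_ngrams: List[str]) -> str:
    return Counter(w for g in top_ngrams for w in g.split()).most_common(1)[0][0]


def template_texts(explanation: str) -> List[str]:
    return [f"{explanation} {explanation}", f"a {explanation} here", explanation]


# Synthetic module with a known ground truth: it responds to the word "dog".
def dog_module(text: str) -> float:
    words = text.lower().split()
    return words.count("dog") / len(words) if words else 0.0


corpus = ["the dog ran home", "a dog and a cat", "markets rose today"]
baseline = ["markets rose today", "the cat slept", "rain fell all day", "a dog barked"]

explanation = summarize(dog_module, corpus, most_common_word)
good = score(dog_module, explanation, template_texts, baseline)
bad = score(dog_module, "cat", template_texts, baseline)
print(explanation)  # → dog
print(good > bad)  # → True
```

In this toy run the recovered explanation matches the synthetic module's ground truth, and a wrong explanation ("cat") receives a lower score; with a real LLM in place of the stand-ins, both steps would operate on free-form text rather than templates.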

arXiv information

Authors	Chandan Singh, Aliyah R. Hsu, Richard Antonello, Shailee Jain, Alexander G. Huth, Bin Yu, Jianfeng Gao
Published	2023-11-15 17:19:10+00:00
arXiv site	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
