Explaining Pre-Trained Language Models with Attribution Scores: An Analysis in Low-Resource Settings

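Summary

Attribution scores indicate how important each part of the input is and can therefore explain model behaviour.
Prompt-based models are currently gaining popularity, in part because they adapt more easily in low-resource settings.
However, the quality of attribution scores extracted from prompt-based models has not yet been investigated.
This work addresses that gap by analyzing attribution scores extracted from prompt-based models.
It evaluates their plausibility and faithfulness and compares them with attribution scores extracted from fine-tuned models and large language models.
In contrast to previous work, the analysis introduces training size as an additional dimension.
The authors find that the prompting paradigm (with either encoder-based or decoder-based models) yields more plausible explanations than fine-tuning in low-resource settings, and that Shapley Value Sampling consistently outperforms attention and Integrated Gradients in producing more plausible and faithful explanations.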
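
For readers unfamiliar with Shapley Value Sampling, the following is a minimal, generic sketch of the technique, not the authors' implementation. It estimates each token's attribution by averaging its marginal contribution over random token orderings. The names `shapley_value_sampling`, `score_fn`, `toy_score`, and the `[MASK]` baseline token are assumptions introduced for illustration; in practice `score_fn` would wrap a model's output for the predicted class.

```python
import random
from typing import Callable, List, Sequence


def shapley_value_sampling(
    tokens: Sequence[str],
    score_fn: Callable[[List[str]], float],
    baseline: str = "[MASK]",      # hypothetical placeholder for a hidden token
    n_permutations: int = 200,
    seed: int = 0,
) -> List[float]:
    """Monte Carlo estimate of per-token Shapley values."""
    rng = random.Random(seed)
    n = len(tokens)
    attributions = [0.0] * n
    for _ in range(n_permutations):
        # Draw a random order in which tokens are revealed.
        order = list(range(n))
        rng.shuffle(order)
        # Start from an input in which every token is replaced by the baseline.
        current = [baseline] * n
        prev_score = score_fn(current)
        for i in order:
            # Reveal token i and record how much the score changes.
            current[i] = tokens[i]
            new_score = score_fn(current)
            attributions[i] += new_score - prev_score
            prev_score = new_score
    # Average the marginal contributions over all sampled orderings.
    return [a / n_permutations for a in attributions]


def toy_score(tokens: List[str]) -> float:
    """Hypothetical stand-in for a model score: a simple additive word weight."""
    weights = {"great": 2.0, "not": -1.0}
    return sum(weights.get(t, 0.0) for t in tokens)


sentence = ["the", "movie", "is", "great"]
scores = shapley_value_sampling(sentence, toy_score, n_permutations=50)
for token, value in zip(sentence, scores):
    print(f"{token}: {value:.2f}")
```

Because every sampled ordering ends with the full input revealed, the estimated attributions always sum to the difference between the score of the full input and the score of the all-baseline input. The cost is one model call per token per permutation, which is why the number of permutations is usually kept small.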

Summary (original)

Attribution scores indicate the importance of different input parts and can, thus, explain model behaviour. Currently, prompt-based models are gaining popularity, i.a., due to their easier adaptability in low-resource settings. However, the quality of attribution scores extracted from prompt-based models has not been investigated yet. In this work, we address this topic by analyzing attribution scores extracted from prompt-based models w.r.t. plausibility and faithfulness and comparing them with attribution scores extracted from fine-tuned models and large language models. In contrast to previous work, we introduce training size as another dimension into the analysis. We find that using the prompting paradigm (with either encoder-based or decoder-based models) yields more plausible explanations than fine-tuning the models in low-resource settings and Shapley Value Sampling consistently outperforms attention and Integrated Gradients in terms of leading to more plausible and faithful explanations.

arXiv information

Authors	Wei Zhou, Heike Adel, Hendrik Schuff, Ngoc Thang Vu
Published	2024-03-08 14:14:37+00:00
arXiv page	arxiv_id(pdf)

Sources and services used

arxiv.jp, Google
