On Measuring Faithfulness of Natural Language Explanations

Summary

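Large language models (LLMs) can explain their own predictions through post-hoc or Chain-of-Thought (CoT) explanations.
However, an LLM may make up plausible-sounding explanations that are unfaithful to its underlying reasoning.
Recent work has designed tests that aim to judge the faithfulness of post-hoc or CoT explanations.
In this paper, we argue that existing faithfulness tests do not actually measure faithfulness in terms of the model's inner workings, but only evaluate its self-consistency at the output level.
The aims of our work are two-fold.
i) We aim to clarify the status of existing faithfulness tests in terms of model explainability, characterising them as self-consistency tests instead.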
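We underline this assessment by constructing a Comparative Consistency Bank for self-consistency tests, which for the first time compares existing tests on a common suite of 11 open-source LLMs and 5 datasets, including ii) our own proposed self-consistency measure, CC-SHAP.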
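CC-SHAP is a new fine-grained measure (not a test) of LLM self-consistency that compares how a model's input contributes to the answer prediction and to the generated explanation.
With CC-SHAP, we aim to take a step further towards measuring faithfulness with a more interpretable and fine-grained method.
Code is available at \url{https://github.com/Heidelberg-NLP/CC-SHAP}.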
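
The idea of comparing input contributions for the answer with those for the explanation can be illustrated with a toy sketch. This is not the authors' implementation: the function names (`normalize`, `cosine_similarity`, `consistency_score`), the use of cosine similarity as the comparison, and all contribution values are assumptions made for illustration only. In practice the per-token contributions would come from an attribution method rather than being written by hand.

```python
import math


def normalize(contribs):
    """Scale contributions so their absolute values sum to 1 (signs are kept)."""
    total = sum(abs(c) for c in contribs)
    if total == 0:
        raise ValueError("all contributions are zero")
    return [c / total for c in contribs]


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("zero vector has no direction")
    return dot / (norm_a * norm_b)


def consistency_score(answer_contribs, explanation_contribs):
    """Hypothetical score in [-1, 1]: higher means the same input tokens
    drive both the answer and the explanation."""
    return cosine_similarity(
        normalize(answer_contribs), normalize(explanation_contribs)
    )


# Made-up per-token contributions for the input "The movie was great".
tokens = ["The", "movie", "was", "great"]
answer_contribs = [0.05, 0.10, 0.05, 0.80]  # the answer relies mostly on "great"
aligned = [0.10, 0.10, 0.05, 0.75]          # explanation relies on the same token
divergent = [0.70, 0.20, 0.05, 0.05]        # explanation relies on other tokens

print("aligned explanation:  ", consistency_score(answer_contribs, aligned))
print("divergent explanation:", consistency_score(answer_contribs, divergent))
```

Under these assumptions, an explanation driven by the same tokens as the answer scores close to 1, while one driven by different tokens scores lower. The result is a graded value rather than a pass/fail verdict, which is the sense in which the abstract calls CC-SHAP a measure and not a test.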

Summary (original)

Large language models (LLMs) can explain their own predictions, through post-hoc or Chain-of-Thought (CoT) explanations. However the LLM could make up reasonably sounding explanations that are unfaithful to its underlying reasoning. Recent work has designed tests that aim to judge the faithfulness of either post-hoc or CoT explanations. In this paper we argue that existing faithfulness tests are not actually measuring faithfulness in terms of the models’ inner workings, but only evaluate their self-consistency on the output level. The aims of our work are two-fold. i) We aim to clarify the status of existing faithfulness tests in terms of model explainability, characterising them as self-consistency tests instead. This assessment we underline by constructing a Comparative Consistency Bank for self-consistency tests that for the first time compares existing tests on a common suite of 11 open-source LLMs and 5 datasets — including ii) our own proposed self-consistency measure CC-SHAP. CC-SHAP is a new fine-grained measure (not test) of LLM self-consistency that compares a model’s input contributions to answer prediction and generated explanation. With CC-SHAP, we aim to take a step further towards measuring faithfulness with a more interpretable and fine-grained method. Code available at \url{https://github.com/Heidelberg-NLP/CC-SHAP}

arXiv information

Authors	Letitia Parcalabescu, Anette Frank
Published	2023-11-13 16:53:51+00:00
arXiv site	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
