CELL your Model: Contrastive Explanations for Large Language Models

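Summary

The advent of black-box deep neural network classification models created a need to explain their decisions.
For generative AI such as large language models (LLMs), however, there is no class prediction to explain.
Instead, one can ask why an LLM produced a particular response to a given prompt.
This paper answers that question with a contrastive explanation method that requires only black-box (query) access to the model.
The explanations take the following form: the LLM gave this response to the prompt because, had the prompt been slightly modified, it would have given a different response that is either less preferable or contradicts the original one.
The key insight is that a contrastive explanation only needs a scoring function that is meaningful to the user; it does not need a specific real-valued quantity such as a class label.
To this end, the authors offer a novel budgeted algorithm, their main algorithmic contribution, which intelligently creates contrasts based on such a scoring function while adhering to a query budget, something that becomes necessary for longer contexts.
They show the efficacy of the method on important natural language tasks such as open-text generation and chatbot conversations.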

Abstract (original)

The advent of black-box deep neural network classification models has sparked the need to explain their decisions. However, in the case of generative AI, such as large language models (LLMs), there is no class prediction to explain. Rather, one can ask why an LLM output a particular response to a given prompt. In this paper, we answer this question by proposing a contrastive explanation method requiring simply black-box/query access. Our explanations suggest that an LLM outputs a reply to a given prompt because if the prompt was slightly modified, the LLM would have given a different response that is either less preferable or contradicts the original response. The key insight is that contrastive explanations simply require a scoring function that has meaning to the user and not necessarily a specific real valued quantity (viz. class label). To this end, we offer a novel budgeted algorithm, our main algorithmic contribution, which intelligently creates contrasts based on such a scoring function while adhering to a query budget, necessary for longer contexts. We show the efficacy of our method on important natural language tasks such as open-text generation and chatbot conversations.
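The idea of searching for a contrast under a query budget can be illustrated with a small sketch. This is not the paper's algorithm, whose details are not given in the abstract; it is a plain greedy word-deletion search written only to make the ingredients concrete. The names `budgeted_contrast_search`, `query_fn`, `score_fn`, `toy_model`, and `differs`, as well as the choice of word deletion as the perturbation and of a fixed `threshold`, are all assumptions made for illustration.

```python
from typing import Callable, List, Optional, Tuple


def budgeted_contrast_search(
    prompt: str,
    query_fn: Callable[[str], str],
    score_fn: Callable[[str, str], float],
    budget: int,
    threshold: float,
) -> Tuple[Optional[str], Optional[str], int]:
    """Greedy word-deletion search for a contrastive prompt (illustrative only).

    query_fn is the black-box model; score_fn(original, new) says how strongly
    the new response contrasts with the original one. Returns
    (contrast_prompt, contrast_response, queries_used); the first two are None
    when no contrast reaching the threshold is found within the budget.
    """
    if budget < 1:
        raise ValueError("budget must allow at least the initial query")

    original_response = query_fn(prompt)
    queries = 1  # the initial query counts against the budget
    current: List[str] = prompt.split()

    while queries < budget and len(current) > 1:
        best_score = float("-inf")
        best_words: Optional[List[str]] = None
        for i in range(len(current)):
            if queries >= budget:
                break
            candidate = current[:i] + current[i + 1:]  # drop one word
            response = query_fn(" ".join(candidate))
            queries += 1
            score = score_fn(original_response, response)
            if score >= threshold:
                return " ".join(candidate), response, queries
            if score > best_score:
                best_score, best_words = score, candidate
        if best_words is None:
            break
        current = best_words  # continue from the most promising edit

    return None, None, queries


# Hypothetical stand-ins for an LLM and a user-chosen scoring function.
def toy_model(prompt: str) -> str:
    return "positive" if "great" in prompt.split() else "neutral"


def differs(original: str, new: str) -> float:
    return 1.0 if original != new else 0.0


result = budgeted_contrast_search(
    "the movie was great", toy_model, differs, budget=10, threshold=1.0
)
print(result)
```

In this toy setting the scoring function simply checks whether the response changed; in practice it could be any user-meaningful score, such as a preference or contradiction measure, which is the flexibility the abstract emphasizes.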

arXiv information

Authors	Ronny Luss, Erik Miehling, Amit Dhurandhar
Published	2025-02-17 18:37:13+00:00
arXiv page	arxiv_id(pdf)

Sources and services used

arxiv.jp, Google
