SPEX: Scaling Feature Interaction Explanations for LLMs

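Summary

Large language models (LLMs) have revolutionized machine learning because they can capture complex interactions between input features.
Popular post-hoc explanation methods such as SHAP provide marginal feature attributions, but their extensions to interaction importances scale only to small input lengths ($\approx 20$).
We propose Spectral Explainer (SPEX), a model-agnostic interaction attribution algorithm that scales efficiently to large input lengths ($\approx 1000$).
SPEX exploits the natural sparsity among interactions, which is common in real-world data, and applies a sparse Fourier transform with a channel decoding algorithm to identify important interactions efficiently.
We run experiments on three difficult long-context datasets in which LLMs must use interactions between inputs to complete the task.
For large inputs, SPEX outperforms marginal attribution methods by up to 20% in faithfully reconstructing LLM outputs.
SPEX also identifies the key features and interactions that strongly influence model output.
For one of the datasets, HotpotQA, SPEX provides interactions that align with human annotations.
Finally, we use this model-agnostic approach to generate explanations that demonstrate abstract reasoning in a closed-source LLM (GPT-4o mini) and compositional reasoning in vision-language models.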

Summary (Original)

Large language models (LLMs) have revolutionized machine learning due to their ability to capture complex interactions between input features. Popular post-hoc explanation methods like SHAP provide marginal feature attributions, while their extensions to interaction importances only scale to small input lengths ($\approx 20$). We propose Spectral Explainer (SPEX), a model-agnostic interaction attribution algorithm that efficiently scales to large input lengths ($\approx 1000$). SPEX exploits underlying natural sparsity among interactions — common in real-world data — and applies a sparse Fourier transform using a channel decoding algorithm to efficiently identify important interactions. We perform experiments across three difficult long-context datasets that require LLMs to utilize interactions between inputs to complete the task. For large inputs, SPEX outperforms marginal attribution methods by up to 20% in terms of faithfully reconstructing LLM outputs. Further, SPEX successfully identifies key features and interactions that strongly influence model output. For one of our datasets, HotpotQA, SPEX provides interactions that align with human annotations. Finally, we use our model-agnostic approach to generate explanations to demonstrate abstract reasoning in closed-source LLMs (GPT-4o mini) and compositional reasoning in vision-language models.
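
To make the idea of sparsity among interactions concrete, the following is a minimal sketch and not the SPEX algorithm itself. It assumes a hypothetical toy function `value` that stands in for a model's output under input masking, and it computes the Boolean Fourier (Walsh-Hadamard) coefficients by brute force over all $2^n$ masks. That enumeration is only feasible for tiny $n$; the abstract states that SPEX instead uses a sparse Fourier transform with a channel decoding algorithm to reach large input lengths.

```python
from itertools import product


def value(mask):
    """Hypothetical stand-in for a model output when only features with mask[i] == 1 are kept."""
    # Feature 0 has a marginal effect; features 1 and 2 matter only jointly;
    # feature 3 is ignored entirely.
    return 2.0 * mask[0] + 3.0 * mask[1] * mask[2]


def fourier_coefficients(f, n):
    """Brute-force Boolean Fourier (Walsh-Hadamard) transform over all 2**n masks."""
    masks = list(product((0, 1), repeat=n))
    coeffs = {}
    for k in masks:
        total = 0.0
        for m in masks:
            # Character (-1)^<k, m>: -1 when k and m overlap in an odd number of positions.
            overlap = sum(ki * mi for ki, mi in zip(k, m))
            sign = -1.0 if overlap % 2 else 1.0
            total += sign * f(m)
        coeffs[k] = total / len(masks)
    return coeffs


n = 4
coeffs = fourier_coefficients(value, n)

# Keep only the coefficients that are not (numerically) zero.
significant = {k: c for k, c in coeffs.items() if abs(c) > 1e-9}

for k, c in sorted(significant.items()):
    features = [i for i, bit in enumerate(k) if bit]
    print(features, c)
```

In this toy case only 5 of the 16 coefficients are nonzero, and the only one involving more than a single feature is the pair (1, 2). This is the kind of sparse structure the abstract says SPEX exploits, here on a scale small enough to enumerate.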

arXiv information

Authors	Justin Singh Kang, Landon Butler, Abhineet Agarwal, Yigit Efe Erginbas, Ramtin Pedarsani, Kannan Ramchandran, Bin Yu
Published	2025-02-19 16:49:55+00:00
arXiv site	arxiv_id(pdf)

Source, services used

arxiv.jp, Google
