Wrapper Boxes: Faithful Attribution of Model Predictions to Training Data

Abstract

Can we preserve the accuracy of neural models while also providing faithful explanations of model decisions to training data? We propose a "wrapper box" pipeline: training a neural model as usual and then using its learned feature representation in classic, interpretable models to perform prediction. Across seven language models of varying sizes, including four large language models (LLMs), two datasets at different scales, three classic models, and four evaluation metrics, we first show that the predictive performance of wrapper classic models is largely comparable to the original neural models. Because classic models are transparent, each model decision is determined by a known set of training examples that can be directly shown to users. Our pipeline thus preserves the predictive performance of neural language models while faithfully attributing classic model decisions to training data. Among other use cases, such attribution enables model decisions to be contested based on responsible training instances. Compared to prior work, our approach achieves higher coverage and correctness in identifying which training data to remove to change a model decision. To reproduce findings, our source code is online at: https://github.com/SamSoup/WrapperBox.
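The pipeline described in the abstract can be illustrated with a minimal sketch. It assumes k-nearest neighbors as the transparent classic model (the abstract does not name the three classic models used) and uses small hand-written vectors as stand-ins for feature representations extracted from a trained neural model. The function name `knn_wrapper_predict` and the toy data are hypothetical and are not taken from the authors' repository.

```python
import math
from collections import Counter


def knn_wrapper_predict(train_feats, train_labels, query_feat, k=3):
    """Predict with k-NN over pre-extracted features.

    Returns the predicted label and the indices of the training
    examples that determined the decision (the attribution).
    """
    # Euclidean distance from the query to every training representation.
    dists = [math.dist(f, query_feat) for f in train_feats]
    # Indices of the k closest training examples (ties broken by index).
    neighbors = sorted(range(len(train_feats)), key=lambda i: (dists[i], i))[:k]
    # Majority vote among the neighbors decides the label.
    votes = Counter(train_labels[i] for i in neighbors)
    label = votes.most_common(1)[0][0]
    return label, neighbors


# Assumption: toy stand-ins for embeddings produced by a trained neural model.
train_feats = [[0.0, 0.1], [0.2, 0.0], [0.1, 0.2], [1.0, 0.9], [0.9, 1.1]]
train_labels = ["neg", "neg", "neg", "pos", "pos"]

label, support = knn_wrapper_predict(train_feats, train_labels, [1.0, 1.0], k=3)
print(label, support)
```

In this sketch the returned indices are exactly the training examples that cast votes, so they can be shown to a user as the basis for the decision. Removing enough of them from the training set and predicting again is one way such a decision could be contested.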

arXiv information

Authors	Yiheng Su, Junyi Jessy Li, Matthew Lease
Published	2024-10-04 17:23:15+00:00
arXiv site	arxiv_id(pdf)

Source and services used

arxiv.jp, DeepL
