Guiding LLMs to Generate High-Fidelity and High-Quality Counterfactual Explanations for Text Classification

Summary

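The need for interpretability in deep learning has driven interest in counterfactual explanations, which identify the minimal changes to an instance that alter a model's prediction.
Current counterfactual (CF) generation methods require task-specific fine-tuning and produce low-quality text.
Large language models (LLMs) are effective at generating high-quality text, but without fine-tuning they struggle to produce label-flipping counterfactuals (i.e., counterfactuals that change the prediction).
We introduce two simple classifier-guided approaches to support counterfactual generation by LLMs, removing the need for fine-tuning while preserving the strengths of LLMs.
Despite their simplicity, our methods outperform state-of-the-art counterfactual generation methods and are effective across different LLMs, which highlights the benefit of using classifier information to guide counterfactual generation by LLMs.
We further show that data augmentation with the generated CFs can improve a classifier's robustness.
Our analysis reveals a critical issue in counterfactual generation by LLMs: they rely on parametric knowledge rather than faithfully following the classifier.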

Abstract (original)

The need for interpretability in deep learning has driven interest in counterfactual explanations, which identify minimal changes to an instance that change a model’s prediction. Current counterfactual (CF) generation methods require task-specific fine-tuning and produce low-quality text. Large Language Models (LLMs), though effective for high-quality text generation, struggle with label-flipping counterfactuals (i.e., counterfactuals that change the prediction) without fine-tuning. We introduce two simple classifier-guided approaches to support counterfactual generation by LLMs, eliminating the need for fine-tuning while preserving the strengths of LLMs. Despite their simplicity, our methods outperform state-of-the-art counterfactual generation methods and are effective across different LLMs, highlighting the benefits of guiding counterfactual generation by LLMs with classifier information. We further show that data augmentation by our generated CFs can improve a classifier’s robustness. Our analysis reveals a critical issue in counterfactual generation by LLMs: LLMs rely on parametric knowledge rather than faithfully following the classifier.
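The abstract does not describe how the two classifier-guided approaches work, so the following is only a minimal sketch of the general idea behind a label-flipping counterfactual, not the authors' method. It assumes a hypothetical keyword-based `toy_classifier` in place of the classifier being explained, a hand-written candidate list in place of LLM output, and word-level edit distance as a stand-in for "minimal change". Among the candidates that flip the classifier's prediction, it keeps the one closest to the original text.

```python
from typing import Callable, Iterable, Optional

# Hypothetical stand-in for the classifier being explained: a toy keyword
# sentiment model. A real setup would call a trained text classifier instead.
POSITIVE = {"great", "good", "wonderful"}
NEGATIVE = {"terrible", "bad", "boring"}


def toy_classifier(text: str) -> str:
    """Return 'positive' if positive keywords outnumber negative ones."""
    tokens = text.lower().split()
    score = sum(t in POSITIVE for t in tokens) - sum(t in NEGATIVE for t in tokens)
    return "positive" if score > 0 else "negative"


def token_edit_distance(a: str, b: str) -> int:
    """Word-level Levenshtein distance, used as a proxy for 'minimal change'."""
    x, y = a.split(), b.split()
    prev = list(range(len(y) + 1))
    for i, xi in enumerate(x, start=1):
        curr = [i]
        for j, yj in enumerate(y, start=1):
            cost = 0 if xi == yj else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def select_counterfactual(
    original: str,
    candidates: Iterable[str],
    classify: Callable[[str], str],
) -> Optional[str]:
    """Keep only candidates whose predicted label differs from the original's,
    then return the one with the fewest word-level edits (None if none flip)."""
    source_label = classify(original)
    flipped = [c for c in candidates if classify(c) != source_label]
    if not flipped:
        return None
    return min(flipped, key=lambda c: token_edit_distance(original, c))


if __name__ == "__main__":
    original = "the movie was great"
    # In practice these rewrites would come from an LLM; here they are fixed.
    candidates = [
        "the movie was good",           # label unchanged, rejected
        "the movie was terrible",       # label flips with one edit
        "this film was a boring mess",  # label flips with more edits
    ]
    print(select_counterfactual(original, candidates, toy_classifier))
```

Checking candidates against the classifier, rather than trusting the generator's own sense of the target label, is one way to guard against the issue the abstract raises: an LLM may rewrite text according to its parametric knowledge of sentiment instead of the classifier's actual decision boundary.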

arXiv information

Authors	Van Bach Nguyen, Christin Seifert, Jörg Schlötterer
Published	2025-03-06 14:15:07+00:00
arXiv page	arxiv_id(pdf)

Sources and services used

arxiv.jp, Google
