B-cos LM: Efficiently Transforming Pre-trained Language Models for Improved Explainability

Summary

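Post-hoc explanation methods for black-box models often struggle with faithfulness and human interpretability, because current neural models are not inherently explainable.
B-cos networks were introduced to improve model explainability through architectural and computational adaptations, but so far they have been applied only to computer vision models and their associated training pipelines.
This work introduces B-cos LMs, i.e., B-cos networks adapted for NLP tasks.
The approach transforms pre-trained language models directly into B-cos LMs by combining B-cos conversion with task fine-tuning, which improves efficiency compared to previous B-cos methods.
Automatic and human evaluation results show that B-cos LMs produce more faithful and more human-interpretable explanations than post-hoc methods, while maintaining task performance comparable to conventional fine-tuning.
An in-depth analysis explores how B-cos LMs differ from conventionally fine-tuned models in their learning processes and explanation patterns.
Finally, the authors provide practical guidelines for building B-cos LMs effectively, based on their findings.
The code is available at https://anonymous.4open.science/r/bcos_lm.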

Summary (original)

Post-hoc explanation methods for black-box models often struggle with faithfulness and human interpretability due to the lack of explainability in current neural models. Meanwhile, B-cos networks have been introduced to improve model explainability through architectural and computational adaptations, but their application has so far been limited to computer vision models and their associated training pipelines. In this work, we introduce B-cos LMs, i.e., B-cos networks empowered for NLP tasks. Our approach directly transforms pre-trained language models into B-cos LMs by combining B-cos conversion and task fine-tuning, improving efficiency compared to previous B-cos methods. Our automatic and human evaluation results demonstrate that B-cos LMs produce more faithful and human interpretable explanations than post hoc methods, while maintaining task performance comparable to conventional fine-tuning. Our in-depth analysis explores how B-cos LMs differ from conventionally fine-tuned models in their learning processes and explanation patterns. Finally, we provide practical guidelines for effectively building B-cos LMs based on our findings. Our code is available at https://anonymous.4open.science/r/bcos_lm.
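The abstract does not spell out what a B-cos transformation computes. As a rough illustration only, the sketch below follows the B-cos transform as commonly described in the earlier B-cos networks work on vision models: a unit's linear response to a unit-norm weight vector is scaled by |cos(x, w)| raised to the power B - 1, so that outputs are suppressed when the input and weight are poorly aligned. The function name `bcos_unit`, the parameter names, and the `eps` stabilizer are assumptions made for this example; this is not the authors' implementation, and it does not cover the conversion or fine-tuning procedure of B-cos LMs.

```python
import math


def bcos_unit(x, w, b=2.0, eps=1e-12):
    """Hypothetical sketch of one B-cos unit.

    Computes (w_hat . x) * |cos(x, w)| ** (b - 1), where w_hat is w
    scaled to unit norm. With b = 1 this reduces to a plain linear
    unit with a normalized weight vector.
    """
    w_norm = math.sqrt(sum(wi * wi for wi in w))
    x_norm = math.sqrt(sum(xi * xi for xi in x))
    # Normalize the weight so the response depends only on its direction.
    w_hat = [wi / (w_norm + eps) for wi in w]
    # Linear response of the unit-norm weight: ||x|| * cos(x, w).
    linear = sum(wh * xi for wh, xi in zip(w_hat, x))
    # Cosine similarity between input and weight.
    cos = linear / (x_norm + eps)
    # Down-weight the response when input and weight are misaligned.
    return linear * abs(cos) ** (b - 1.0)


if __name__ == "__main__":
    x = [3.0, 4.0]
    print(bcos_unit(x, [1.0, 0.0], b=1.0))  # plain normalized linear response
    print(bcos_unit(x, [1.0, 0.0], b=2.0))  # same response, scaled by |cos|
```

Larger values of `b` penalize misalignment more strongly, which is the alignment pressure that B-cos networks rely on for their explanations.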

arXiv information

Authors	Yifan Wang, Sukrut Rao, Ji-Ung Lee, Mayank Jobanputra, Vera Demberg
Published	2025-02-18 16:13:08+00:00
arXiv page	arxiv_id(pdf)

Sources and services used

arxiv.jp, Google
