UNIREX: A Unified Learning Framework for Language Model Rationale Extraction

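Summary

An extractive rationale explains a language model's (LM's) prediction on a given task instance by highlighting the input text that most influenced that prediction.
Ideally, rationale extraction should be faithful (reflecting the LM's actual behavior) and plausible (convincing to humans), without degrading the task performance of the LM (i.e., the task model).
Attribution algorithms and select-predict pipelines are commonly used for rationale extraction, but both rely on specific heuristics that prevent them from satisfying all three desiderata.
In light of this, we propose UNIREX, a flexible learning framework that generalizes rationale extractor optimization as follows: (1) specify the architecture of a learned rationale extractor;
(2) select explainability objectives (i.e., faithfulness and plausibility criteria);
(3) jointly train the task model and the rationale extractor on the task using the selected objectives.
UNIREX makes it possible to replace the heuristic design choices of prior work with a generic learned rationale extractor in (1), and to optimize it for all three desiderata in (2)-(3).
To make methods easier to compare across multiple desiderata, we introduce the Normalized Relative Gain (NRG) metric.
Across five text classification datasets, the best UNIREX configuration outperforms the baselines by an average of 32.9% NRG.
In addition, we find that UNIREX-trained rationale extractors can generalize even to unseen datasets and tasks.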
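
The abstract names the NRG metric but does not define it. The sketch below shows one plausible reading, offered as an assumption rather than the paper's exact formula: each metric's raw scores are min-max normalized across the compared methods (flipped when lower is better), and the normalized scores are then averaged into one composite value per method. The function names, the method names, and all numbers are hypothetical and purely illustrative.

```python
def normalized_relative_gain(scores, higher_is_better=True):
    """Min-max normalize one metric's raw scores across methods into [0, 1].

    Assumption: this is an illustrative reading of "Normalized Relative Gain",
    not a formula taken from the abstract.
    """
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        # No spread between methods; returning 0.0 here is an arbitrary choice.
        return {method: 0.0 for method in scores}
    if higher_is_better:
        return {m: (s - lo) / (hi - lo) for m, s in scores.items()}
    # For lower-is-better metrics, the smallest raw score maps to 1.0.
    return {m: (hi - s) / (hi - lo) for m, s in scores.items()}


def composite_nrg(per_metric_nrg):
    """Average the normalized scores of every metric for each method."""
    methods = next(iter(per_metric_nrg.values())).keys()
    n_metrics = len(per_metric_nrg)
    return {
        m: sum(nrg[m] for nrg in per_metric_nrg.values()) / n_metrics
        for m in methods
    }


# Hypothetical raw scores for three made-up methods.
task_score = {"method_a": 80.0, "method_b": 90.0, "method_c": 85.0}   # higher is better
faith_error = {"method_a": 0.25, "method_b": 0.5, "method_c": 0.75}   # lower is better

nrg = composite_nrg({
    "task": normalized_relative_gain(task_score, higher_is_better=True),
    "faithfulness": normalized_relative_gain(faith_error, higher_is_better=False),
})
print(nrg)
```

Normalizing before averaging puts metrics with different scales and directions on a common footing, which is what allows a single number to summarize several desiderata at once.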

Abstract (original)

An extractive rationale explains a language model’s (LM’s) prediction on a given task instance by highlighting the text inputs that most influenced the prediction. Ideally, rationale extraction should be faithful (reflective of LM’s actual behavior) and plausible (convincing to humans), without compromising the LM’s (i.e., task model’s) task performance. Although attribution algorithms and select-predict pipelines are commonly used in rationale extraction, they both rely on certain heuristics that hinder them from satisfying all three desiderata. In light of this, we propose UNIREX, a flexible learning framework that generalizes rationale extractor optimization as follows: (1) specify architecture for a learned rationale extractor; (2) select explainability objectives (i.e., faithfulness and plausibility criteria); and (3) jointly train the task model and rationale extractor on the task using the selected objectives. UNIREX enables replacing prior works’ heuristic design choices with a generic learned rationale extractor in (1) and optimizing it for all three desiderata in (2)-(3). To facilitate comparison between methods with respect to multiple desiderata, we introduce the Normalized Relative Gain (NRG) metric. Across five text classification datasets, our best UNIREX configuration outperforms baselines by an average of 32.9% NRG. Plus, we find that UNIREX-trained rationale extractors can even generalize to unseen datasets and tasks.
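
Step (3) of the framework says the task model and rationale extractor are trained jointly on the selected objectives. One common way to combine several objectives is a weighted sum of per-objective losses, sketched below. The weighted-sum form, the function name `joint_objective`, and the weights `w_faith` and `w_plaus` are assumptions made for illustration; the abstract does not say how the objectives are combined.

```python
def joint_objective(task_loss, faith_loss, plaus_loss, w_faith=1.0, w_plaus=1.0):
    """Combine task, faithfulness, and plausibility losses into one scalar.

    Assumption: a plain weighted sum; the weights are hypothetical
    hyperparameters, not values taken from the source.
    """
    return task_loss + w_faith * faith_loss + w_plaus * plaus_loss


# Setting a weight to 0.0 drops that objective, which mirrors step (2):
# choosing which explainability objectives to train on.
total = joint_objective(0.5, 0.25, 0.25, w_faith=2.0, w_plaus=0.0)
print(total)
```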

arXiv information

Authors	Aaron Chan, Maziar Sanjabi, Lambert Mathias, Liang Tan, Shaoliang Nie, Xiaochang Peng, Xiang Ren, Hamed Firooz
Published	2023-02-27 03:46:44+00:00
arXiv page	arxiv_id(pdf)

Sources and services used

arxiv.jp, Google
