KNOW How to Make Up Your Mind! Adversarially Detecting and Alleviating Inconsistencies in Natural Language Explanations

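Summary

Although the quality of natural language explanations (NLEs) generated to justify a model's predictions has improved considerably in recent years, research on detecting and alleviating inconsistencies among generated NLEs remains very limited. In this work, we leverage external knowledge bases to significantly improve an existing adversarial attack for detecting inconsistent NLEs. We apply this attack to high-performing NLE models and show that models with higher NLE quality do not necessarily generate fewer inconsistencies. We further propose an off-the-shelf mitigation method that alleviates inconsistencies by grounding the model in external background knowledge. Our method reduces the inconsistencies that our attack detects in previous high-performing NLE models.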

Summary (original)

While recent works have been considerably improving the quality of the natural language explanations (NLEs) generated by a model to justify its predictions, there is very limited research in detecting and alleviating inconsistencies among generated NLEs. In this work, we leverage external knowledge bases to significantly improve on an existing adversarial attack for detecting inconsistent NLEs. We apply our attack to high-performing NLE models and show that models with higher NLE quality do not necessarily generate fewer inconsistencies. Moreover, we propose an off-the-shelf mitigation method to alleviate inconsistencies by grounding the model into external background knowledge. Our method decreases the inconsistencies of previous high-performing NLE models as detected by our attack.

arXiv information

Authors	Myeongjun Jang, Bodhisattwa Prasad Majumder, Julian McAuley, Thomas Lukasiewicz, Oana-Maria Camburu
Published	2023-06-05 15:51:58+00:00
arXiv site	arxiv_id(pdf)

Source, services used

arxiv.jp, DeepL
