ChatGPT as a Factual Inconsistency Evaluator for Abstractive Text Summarization

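Abstract

Pre-trained language models have recently brought large gains in abstractive text summarization performance.
A central concern with existing abstractive summarization methods is factual inconsistency in the summaries they generate.
To mitigate this problem, much work has gone into developing effective factuality evaluation metrics based on natural language inference, question answering, and similar techniques.
However, these metrics suffer from high computational complexity and depend on annotated data.
More recently, large language models such as ChatGPT have shown strong ability in both natural language understanding and natural language inference.
This paper studies ChatGPT's ability to evaluate factual inconsistency in the zero-shot setting by testing it on coarse-grained and fine-grained factuality evaluation tasks: binary natural language inference (NLI), summary ranking, and consistency rating.
Experimental results show that ChatGPT outperforms previous SOTA evaluation metrics on 6 of 9 datasets across the three tasks, indicating strong potential for assessing factual inconsistency in the zero-shot setting.
The results also highlight the importance of prompt design and the need for future work to address ChatGPT's limitations in evaluation bias, wrong reasoning, and hallucination.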

Abstract (original)

The performance of abstractive text summarization has been greatly boosted by pre-trained language models recently. The main concern of existing abstractive summarization methods is the factual inconsistency problem of their generated summary. To alleviate the problem, many efforts have focused on developing effective factuality evaluation metrics based on natural language inference and question answering et al. However, they have limitations of high computational complexity and relying on annotated data. Most recently, large language models such as ChatGPT have shown strong ability in not only natural language understanding but also natural language inference. In this paper, we study the factual inconsistency evaluation ability of ChatGPT under the zero-shot setting by evaluating it on the coarse-grained and fine-grained factuality evaluation tasks including binary natural language inference (NLI), summary ranking, and consistency rating. Experimental results show that ChatGPT outperforms previous SOTA evaluation metrics on 6/9 datasets across three tasks, demonstrating its great potential for assessing factual inconsistency in the zero-shot setting. The results also highlight the importance of prompt design and the need for future efforts to address ChatGPT’s limitations on evaluation bias, wrong reasoning, and hallucination.

arXiv information

Authors	Zheheng Luo, Qianqian Xie, Sophia Ananiadou
Published	2023-03-27 22:30:39+00:00

Source and services used

arxiv.jp, Google
