ChatGPT as a Factual Inconsistency Evaluator for Text Summarization

Summary

Title: Evaluating Factual Inconsistency in Text Summarization with ChatGPT

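Summary:
– Pre-trained language models have greatly improved the performance of text summarization.
– However, a major problem with many existing methods is that their generated summaries are often factually inconsistent with the source documents.
– To address this, much effort has gone into developing effective factuality evaluation metrics based on natural language inference, question answering, syntactic dependencies, and similar techniques.
– However, these approaches are limited either by high computational complexity or by the uncertainty introduced by multi-component pipelines, and they agree only partially with human judgment.
– Because large language models (LLMs) have recently shown excellent performance not only in text generation but also in language comprehension, this work explores ChatGPT's ability to evaluate factual inconsistency in a zero-shot setting.
– To this end, ChatGPT is evaluated on three tasks covering coarse-grained and fine-grained evaluation (binary entailment inference, summary ranking, and consistency rating); the results show that ChatGPT generally outperforms previous evaluation metrics across all three.
– However, a closer look at ChatGPT's output reveals limitations, including a preference for more lexically similar candidates, false reasoning, and inadequate understanding of instructions.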

Abstract (original)

The performance of text summarization has been greatly boosted by pre-trained language models. A main concern of existing methods is that most generated summaries are not factually consistent with their source documents. To alleviate the problem, many efforts have focused on developing effective factuality evaluation metrics based on natural language inference, question answering, and syntactic dependency, etc. However, these approaches are limited by either their high computational complexity or the uncertainty introduced by multi-component pipelines, resulting in only partial agreement with human judgement. Most recently, large language models (LLMs) have shown excellent performance in not only text generation but also language comprehension. In this paper, we particularly explore ChatGPT's ability to evaluate factual inconsistency under a zero-shot setting by examining it on both coarse-grained and fine-grained evaluation tasks including binary entailment inference, summary ranking, and consistency rating. Experimental results indicate that ChatGPT generally outperforms previous evaluation metrics across the three tasks, indicating its great potential for factual inconsistency evaluation. However, a closer inspection of ChatGPT's output reveals certain limitations including its preference for more lexically similar candidates, false reasoning, and inadequate understanding of instructions.
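To make the zero-shot setting more concrete, here is a minimal sketch of how prompts for two of the three tasks (binary entailment inference and consistency rating) might be built, and how ChatGPT's free-text replies could be mapped back to labels. The prompt wording, the 1-to-10 rating scale, and every function name here (`build_entailment_prompt`, `build_rating_prompt`, `parse_binary`, `parse_rating`) are hypothetical assumptions, not the prompts or code used in the paper. The step that actually sends the prompt to ChatGPT is deliberately left out.

```python
import re
from typing import Optional


def build_entailment_prompt(article: str, summary: str) -> str:
    """Hypothetical zero-shot prompt for binary entailment (not the paper's wording)."""
    return (
        "Is every fact in the summary supported by the article? "
        "Answer only 'yes' or 'no'.\n"
        f"Article: {article}\n"
        f"Summary: {summary}\n"
        "Answer:"
    )


def build_rating_prompt(article: str, summary: str, max_score: int = 10) -> str:
    """Hypothetical zero-shot prompt for consistency rating (scale is an assumption)."""
    return (
        "Rate how factually consistent the summary is with the article "
        f"on a scale from 1 to {max_score}. Reply with a single integer.\n"
        f"Article: {article}\n"
        f"Summary: {summary}\n"
        "Score:"
    )


def parse_binary(reply: str) -> Optional[bool]:
    """Map a reply to True (consistent), False (inconsistent), or None if unclear."""
    words = re.findall(r"[a-z]+", reply.lower())
    if not words:
        return None
    if words[0] == "yes":
        return True
    if words[0] == "no":
        return False
    # Replies that do not start with yes/no are treated as unparseable.
    return None


def parse_rating(reply: str, max_score: int = 10) -> Optional[int]:
    """Take the first integer in the reply; reject it if it lies outside [1, max_score]."""
    match = re.search(r"\d+", reply)
    if match is None:
        return None
    score = int(match.group())
    return score if 1 <= score <= max_score else None
```

Returning `None` for replies that do not follow the requested format is one way to surface the "inadequate understanding of instructions" failure mode mentioned above, instead of silently forcing such replies into a label.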

arXiv information

Authors	Zheheng Luo, Qianqian Xie, Sophia Ananiadou
Published	2023-04-13 10:59:39+00:00
arXiv page	arxiv_id (pdf)

Source, services used

arxiv.jp, OpenAI
