A Claim Decomposition Benchmark for Long-form Answer Verification

Abstract

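Advances in LLMs have significantly improved performance on complex long-form question answering tasks.
However, a prominent problem with LLMs is that they generate "hallucinated" responses that are not factual.
Consequently, attribution for each claim in a response has become a common way to improve factuality and verifiability.
Existing research focuses mainly on how to provide accurate citations for a response and largely overlooks the importance of identifying the claims or statements in each response.
To bridge this gap, we introduce a new claim decomposition benchmark, which requires building systems that can identify atomic and checkworthy claims in LLM responses.
Specifically, we present the Chinese Atomic Claim Decomposition Dataset (CACDD), which builds on the WebCPM dataset with additional expert annotations to ensure high data quality.
CACDD contains 500 human-annotated question-answer pairs with a total of 4956 atomic claims.
We further propose a new pipeline for human annotation and describe the challenges of this task.
In addition, we provide experimental results for zero-shot, few-shot, and fine-tuned LLMs as baselines.
The results show that claim decomposition is highly challenging and requires further exploration.
All code and data are publicly available at \url{https://github.com/FBzzh/CACDD}.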
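
To make the task concrete, the following is a minimal sketch of what a claim decomposition record and a simple scoring function might look like. Everything in it is an assumption made for illustration: the `DecompositionExample` fields, the English sample text (CACDD itself is Chinese), and the exact-match precision/recall/F1 in `claim_f1` are hypothetical and are not taken from the CACDD repository or the paper's evaluation protocol.

```python
from dataclasses import dataclass, field


@dataclass
class DecompositionExample:
    """Hypothetical record: a question, a long-form answer, and its atomic claims."""
    question: str
    answer: str
    claims: list[str] = field(default_factory=list)


def normalize(claim: str) -> str:
    """Lowercase and collapse whitespace so trivially different strings compare equal."""
    return " ".join(claim.lower().split())


def claim_f1(predicted: list[str], gold: list[str]) -> tuple[float, float, float]:
    """Exact-match precision, recall, and F1 over sets of normalized claims.

    This is an illustrative stand-in metric, not the benchmark's official one.
    """
    pred_set = {normalize(c) for c in predicted}
    gold_set = {normalize(c) for c in gold}
    if not pred_set or not gold_set:
        return 0.0, 0.0, 0.0
    hits = len(pred_set & gold_set)
    precision = hits / len(pred_set)
    recall = hits / len(gold_set)
    f1 = 0.0 if hits == 0 else 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Illustrative sample; each gold claim is self-contained and checkable on its own.
example = DecompositionExample(
    question="Why is the sky blue?",
    answer=(
        "The sky is blue because air scatters short wavelengths more strongly, "
        "an effect called Rayleigh scattering."
    ),
    claims=[
        "Air scatters short wavelengths of light more strongly than long wavelengths.",
        "The sky is blue because air scatters short wavelengths more strongly.",
        "The stronger scattering of short wavelengths by air is called Rayleigh scattering.",
    ],
)

# A hypothetical system output: two claims match the gold set, one does not.
predicted = [
    "The sky is blue because air scatters short wavelengths more strongly.",
    "The stronger scattering of short wavelengths by air is called Rayleigh scattering.",
    "The sky is blue.",
]

precision, recall, f1 = claim_f1(predicted, example.claims)
```

Exact string matching is deliberately strict: a paraphrased but correct claim counts as a miss. A real evaluation would likely need a softer matching criterion, which is one reason the task is hard to score as well as to solve.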

Abstract (original)

The advancement of LLMs has significantly boosted the performance of complex long-form question answering tasks. However, one prominent issue of LLMs is the generated ‘hallucination’ responses that are not factual. Consequently, attribution for each claim in responses becomes a common solution to improve the factuality and verifiability. Existing researches mainly focus on how to provide accurate citations for the response, which largely overlook the importance of identifying the claims or statements for each response. To bridge this gap, we introduce a new claim decomposition benchmark, which requires building system that can identify atomic and checkworthy claims for LLM responses. Specifically, we present the Chinese Atomic Claim Decomposition Dataset (CACDD), which builds on the WebCPM dataset with additional expert annotations to ensure high data quality. The CACDD encompasses a collection of 500 human-annotated question-answer pairs, including a total of 4956 atomic claims. We further propose a new pipeline for human annotation and describe the challenges of this task. In addition, we provide experiment results on zero-shot, few-shot and fine-tuned LLMs as baselines. The results show that the claim decomposition is highly challenging and requires further explorations. All code and data are publicly available at \url{https://github.com/FBzzh/CACDD}.

arXiv information

Authors	Zhihao Zhang, Yixing Fan, Ruqing Zhang, Jiafeng Guo
Published	2024-10-16 13:34:51+00:00
arXiv page	arxiv_id(pdf)

Source, services used

arxiv.jp, Google
