AVeriTeC: A Dataset for Real-world Claim Verification with Evidence from the Web




Existing datasets for automated fact-checking have substantial limitations, such as relying on artificial claims, lacking annotations for evidence and intermediate reasoning, or including evidence published after the claim. In this paper we introduce AVeriTeC, a new dataset of 4,568 real-world claims covering fact-checks by 50 different organizations. Each claim is annotated with question-answer pairs supported by evidence available online, as well as textual justifications explaining how the evidence combines to produce a verdict. Through a multi-round annotation process, we avoid common pitfalls including context dependence, evidence insufficiency, and temporal leakage, and reach a substantial inter-annotator agreement of $\kappa=0.619$ on verdicts. We develop a baseline as well as an evaluation scheme for verifying claims through several question-answering steps against the open web.


Authors: Michael Schlichtkrull, Zhijiang Guo, Andreas Vlachos
Published: 2023-11-08 11:53:55+00:00

Category: cs.CL