CRAG — Comprehensive RAG Benchmark

Summary

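Retrieval-augmented generation (RAG) has recently emerged as a promising way to mitigate the knowledge gaps of large language models (LLMs). Existing RAG datasets, however, do not adequately capture the diverse and dynamic nature of real-world question answering (QA) tasks. To close this gap, we introduce the Comprehensive RAG Benchmark (CRAG), a factual QA benchmark consisting of 4,409 question-answer pairs plus mock APIs that simulate web search and knowledge graph (KG) search. CRAG is designed to cover a broad range of questions across five domains and eight question categories, reflecting entity popularity from popular to long-tail and temporal dynamism ranging from years to seconds. Evaluation on the benchmark highlights how far current systems are from fully trustworthy QA: most advanced LLMs reach at most 34% accuracy on CRAG, and adding RAG in a straightforward way raises accuracy only to 44%. State-of-the-art industry RAG solutions answer only 63% of questions without any hallucination. CRAG also shows that accuracy is much lower for questions about facts with higher dynamism, lower popularity, or higher complexity, which points to directions for future research. The benchmark laid the groundwork for a KDD Cup 2024 challenge that attracted thousands of participants and submissions. We commit to maintaining CRAG to support research communities working on RAG solutions and general QA solutions. CRAG is available at https://github.com/facebookresearch/CRAG/.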

Abstract (original)

Retrieval-Augmented Generation (RAG) has recently emerged as a promising solution to alleviate Large Language Model (LLM)’s deficiency in lack of knowledge. Existing RAG datasets, however, do not adequately represent the diverse and dynamic nature of real-world Question Answering (QA) tasks. To bridge this gap, we introduce the Comprehensive RAG Benchmark (CRAG), a factual question answering benchmark of 4,409 question-answer pairs and mock APIs to simulate web and Knowledge Graph (KG) search. CRAG is designed to encapsulate a diverse array of questions across five domains and eight question categories, reflecting varied entity popularity from popular to long-tail, and temporal dynamisms ranging from years to seconds. Our evaluation of this benchmark highlights the gap to fully trustworthy QA. Whereas most advanced LLMs achieve <=34% accuracy on CRAG, adding RAG in a straightforward manner improves the accuracy only to 44%. State-of-the-art industry RAG solutions only answer 63% of questions without any hallucination. CRAG also reveals much lower accuracy in answering questions regarding facts with higher dynamism, lower popularity, or higher complexity, suggesting future research directions. The CRAG benchmark laid the groundwork for a KDD Cup 2024 challenge and attracted thousands of participants and submissions. We commit to maintaining CRAG to serve research communities in advancing RAG solutions and general QA solutions. CRAG is available at https://github.com/facebookresearch/CRAG/.

arXiv information

Authors	Xiao Yang, Kai Sun, Hao Xin, Yushi Sun, Nikita Bhalla, Xiangsen Chen, Sajal Choudhary, Rongze Daniel Gui, Ziran Will Jiang, Ziyu Jiang, Lingkun Kong, Brian Moran, Jiaqi Wang, Yifan Ethan Xu, An Yan, Chenyu Yang, Eting Yuan, Hanwen Zha, Nan Tang, Lei Chen, Nicolas Scheffer, Yue Liu, Nirav Shah, Rakesh Wanga, Anuj Kumar, Wen-tau Yih, Xin Luna Dong
Published	2024-11-01 05:30:17+00:00
arXiv site	arxiv_id(pdf)

Source and services used

arxiv.jp, DeepL
