LongRAG: A Dual-Perspective Retrieval-Augmented Generation Paradigm for Long-Context Question Answering

Abstract

Long-Context Question Answering (LCQA), a challenging task, aims to reason over long-context documents to yield accurate answers to questions. Existing long-context Large Language Models (LLMs) for LCQA often struggle with the ‘lost in the middle’ issue. Retrieval-Augmented Generation (RAG) mitigates this issue by providing external factual evidence. However, its chunking strategy disrupts the global long-context information, and its low-quality retrieval in long contexts hinders LLMs from identifying effective factual details due to substantial noise. To this end, we propose LongRAG, a general, dual-perspective, and robust LLM-based RAG system paradigm for LCQA to enhance RAG’s understanding of complex long-context knowledge (i.e., global information and factual details). We design LongRAG as a plug-and-play paradigm, facilitating adaptation to various domains and LLMs. Extensive experiments on three multi-hop datasets demonstrate that LongRAG significantly outperforms long-context LLMs (up by 6.94%), advanced RAG (up by 6.16%), and Vanilla RAG (up by 17.25%). Furthermore, we conduct quantitative ablation studies and multi-dimensional analyses, highlighting the effectiveness of the system’s components and fine-tuning strategies. Data and code are available at https://github.com/QingFei1/LongRAG.

arXiv information

Authors	Qingfei Zhao, Ruobing Wang, Yukuo Cen, Daren Zha, Shicheng Tan, Yuxiao Dong, Jie Tang
Published	2024-11-01 15:36:59+00:00

Source and services used

arxiv.jp, DeepL
