Generating Diverse Q&A Benchmarks for RAG Evaluation with DataMorgana

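Summary

Evaluating retrieval-augmented generation (RAG) systems, especially in domain-specific contexts, requires benchmarks that address the particular requirements of the application scenario.
Because real data is hard to obtain, a common strategy is to generate synthetic data with LLM-based methods.
Existing solutions are general purpose: given a document, they generate a question from which a Q&A pair is built.
However, although the generated questions may be good individually, they are usually not diverse enough to reasonably cover the various ways real end users can interact with a RAG system.
Here we introduce DataMorgana, a tool for generating customizable and diverse synthetic Q&A benchmarks tailored to RAG applications.
DataMorgana allows detailed configuration of user and question categories and control over their distribution within the benchmark.
It uses a lightweight two-stage process that generates benchmarks reflecting the expected traffic while ensuring efficiency and fast iteration.
We conduct a thorough series of experiments and show, quantitatively and qualitatively, that DataMorgana outperforms existing tools and approaches at generating lexically, syntactically, and semantically diverse question sets across domain-specific and general-knowledge corpora.
DataMorgana will be made available to selected teams in the research community as first beta testers, in connection with the SIGIR’2025 LiveRAG challenge, which is due to be announced in early February 2025.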

Summary (original)

Evaluating Retrieval-Augmented Generation (RAG) systems, especially in domain-specific contexts, requires benchmarks that address the distinctive requirements of the applicative scenario. Since real data can be hard to obtain, a common strategy is to use LLM-based methods to generate synthetic data. Existing solutions are general purpose: given a document, they generate a question to build a Q&A pair. However, although the generated questions can be individually good, they are typically not diverse enough to reasonably cover the different ways real end-users can interact with the RAG system. We introduce here DataMorgana, a tool for generating highly customizable and diverse synthetic Q&A benchmarks tailored to RAG applications. DataMorgana enables detailed configurations of user and question categories and provides control over their distribution within the benchmark. It uses a lightweight two-stage process, ensuring efficiency and fast iterations, while generating benchmarks that reflect the expected traffic. We conduct a thorough line of experiments, showing quantitatively and qualitatively that DataMorgana surpasses existing tools and approaches in producing lexically, syntactically, and semantically diverse question sets across domain-specific and general-knowledge corpora. DataMorgana will be made available to selected teams in the research community, as first beta testers, in the context of the upcoming SIGIR’2025 LiveRAG challenge to be announced in early February 2025.
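
The abstract does not spell out the two stages or the configuration format. As a rough sketch only, assuming a configuration stage followed by a sampling-based generation stage, control over the category distribution might look like the Python below. Every name here (`Category`, `CONFIG`, `sample_categories`, `build_prompt`), the example categories, their probabilities, and the prompt wording are hypothetical illustrations, not DataMorgana's actual API or prompts.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Category:
    """One configurable category (hypothetical schema, not DataMorgana's)."""
    name: str
    description: str
    probability: float


# Illustrative configuration: each categorization lists categories whose
# probabilities set the share of the benchmark they should occupy.
CONFIG = {
    "user": [
        Category("expert", "a specialist who uses precise domain terminology", 0.3),
        Category("novice", "a newcomer with no background in the domain", 0.7),
    ],
    "question": [
        Category("factoid", "a question seeking one short, specific fact", 0.5),
        Category("open-ended", "a question inviting a detailed explanation", 0.5),
    ],
}


def sample_categories(config, rng):
    """Draw one category per categorization, weighted by its probability."""
    chosen = {}
    for label, categories in config.items():
        weights = [c.probability for c in categories]
        if abs(sum(weights) - 1.0) > 1e-9:
            raise ValueError(f"probabilities for {label!r} must sum to 1")
        chosen[label] = rng.choices(categories, weights=weights, k=1)[0]
    return chosen


def build_prompt(document, chosen):
    """Assemble the text an LLM would receive; the LLM call itself is omitted."""
    lines = [
        f"- {label}: {cat.name} ({cat.description})"
        for label, cat in chosen.items()
    ]
    return (
        "Write one question and its answer based only on the document below.\n"
        "The question must match these categories:\n"
        + "\n".join(lines)
        + f"\n\nDocument:\n{document}"
    )


rng = random.Random(0)  # seeded so repeated runs draw the same categories
prompt = build_prompt(
    "DataMorgana generates synthetic Q&A benchmarks.",
    sample_categories(CONFIG, rng),
)
print(prompt)
```

Repeating the sampling step once per generated pair would, under these assumptions, make the benchmark's category mix approach the configured probabilities, which is one plausible reading of "benchmarks that reflect the expected traffic".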

arXiv information

Authors	Simone Filice, Guy Horowitz, David Carmel, Zohar Karnin, Liane Lewin-Eytan, Yoelle Maarek
Published	2025-01-22 10:47:08+00:00
arXiv site	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
