Initial Nugget Evaluation Results for the TREC 2024 RAG Track with the AutoNuggetizer Framework

Abstract

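This report gives a first overview of partial results from the TREC 2024 Retrieval-Augmented Generation (RAG) Track.
We see RAG evaluation as a barrier to continued progress in information access (and, more broadly, in natural language processing and artificial intelligence), and we hope to contribute to addressing the many challenges in this area.
The central hypothesis examined in this work is that the nugget evaluation methodology, originally developed in 2003 for the TREC Question Answering Track, provides a solid foundation for evaluating RAG systems.
Our efforts therefore focus on "refactoring" this methodology, specifically on applying large language models both to create nuggets automatically and to assign nuggets to system answers automatically.
We call this the AutoNuggetizer framework.
Within the TREC setup, we can calibrate this fully automatic process against a manual process in which human assessors create nuggets semi-manually and then assign them manually to system answers.
Based on initial results across 21 topics from 45 runs, we observe a strong correlation between scores derived from the fully automatic nugget evaluation and scores derived from a (mostly) manual nugget evaluation by human assessors.
This suggests that the fully automatic evaluation process can be used to guide future iterations of RAG systems.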

Abstract (original)

This report provides an initial look at partial results from the TREC 2024 Retrieval-Augmented Generation (RAG) Track. We have identified RAG evaluation as a barrier to continued progress in information access (and more broadly, natural language processing and artificial intelligence), and it is our hope that we can contribute to tackling the many challenges in this space. The central hypothesis we explore in this work is that the nugget evaluation methodology, originally developed for the TREC Question Answering Track in 2003, provides a solid foundation for evaluating RAG systems. As such, our efforts have focused on ‘refactoring’ this methodology, specifically applying large language models to both automatically create nuggets and to automatically assign nuggets to system answers. We call this the AutoNuggetizer framework. Within the TREC setup, we are able to calibrate our fully automatic process against a manual process whereby nuggets are created by human assessors semi-manually and then assigned manually to system answers. Based on initial results across 21 topics from 45 runs, we observe a strong correlation between scores derived from a fully automatic nugget evaluation and a (mostly) manual nugget evaluation by human assessors. This suggests that our fully automatic evaluation process can be used to guide future iterations of RAG systems.
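
The comparison described in the abstract, where run-level scores from automatic nugget assignment are checked against scores from manual assignment, can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the abstract gives neither the scoring formula nor the correlation statistic, so the sketch uses the fraction of nuggets found in an answer as the score and Kendall's tau (tau-a) as the correlation measure. The run names and nugget assignments are toy values, not track data.

```python
from itertools import combinations


def nugget_score(assignments):
    """Fraction of a topic's nuggets judged present in an answer.

    Assumption: a simple recall-style score; the abstract does not
    specify the actual scoring formula.
    """
    if not assignments:
        return 0.0
    return sum(assignments) / len(assignments)


def kendall_tau(xs, ys):
    """Kendall's tau-a: (concordant - discordant) / number of pairs."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    concordant = discordant = 0
    for i, j in combinations(range(len(xs)), 2):
        product = (xs[i] - xs[j]) * (ys[i] - ys[j])
        if product > 0:
            concordant += 1
        elif product < 0:
            discordant += 1
    n_pairs = len(xs) * (len(xs) - 1) // 2
    return (concordant - discordant) / n_pairs


# Toy data (hypothetical): for each run, whether each of four nuggets
# was assigned to the run's answer, by an automatic and a manual process.
auto_assignments = {
    "run_a": [True, True, False, True],
    "run_b": [True, False, False, False],
    "run_c": [True, True, True, True],
}
manual_assignments = {
    "run_a": [True, False, False, True],
    "run_b": [False, False, False, True],
    "run_c": [True, True, True, False],
}

runs = sorted(auto_assignments)
auto_scores = [nugget_score(auto_assignments[r]) for r in runs]
manual_scores = [nugget_score(manual_assignments[r]) for r in runs]
tau = kendall_tau(auto_scores, manual_scores)
print(f"Kendall's tau between automatic and manual scores: {tau:.2f}")
```

A rank correlation is used here because the practical question is whether the automatic process orders systems the same way the manual one does. A real evaluation would aggregate scores over many topics before comparing runs.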

arXiv information

Authors	Ronak Pradeep,Nandan Thakur,Shivani Upadhyay,Daniel Campos,Nick Craswell,Jimmy Lin
Published	2024-11-14 17:25:43+00:00
arXiv site	arxiv_id(pdf)

Source, services used

arxiv.jp, Google
