AIR-Bench: Automated Heterogeneous Information Retrieval Benchmark

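Summary

Evaluation plays a crucial role in the advancement of information retrieval (IR) models.
However, current benchmarks rely on predefined domains and human-labeled data, which limits their ability to meet the evaluation needs of emerging domains in a cost-effective and efficient way.
To address this challenge, we propose the Automated Heterogeneous Information Retrieval Benchmark (AIR-Bench).
AIR-Bench is distinguished by three key features:
1) Automated. The testing data in AIR-Bench is generated automatically by large language models (LLMs) without human intervention.
2) Heterogeneous. The testing data in AIR-Bench is generated across diverse tasks, domains, and languages.
3) Dynamic. The domains and languages covered by AIR-Bench are continually expanded to give community developers an increasingly comprehensive evaluation benchmark.
We develop a reliable and robust data generation pipeline that automatically creates diverse, high-quality evaluation datasets from real-world corpora.
Our findings show that the testing data generated in AIR-Bench aligns well with human-labeled testing data, making AIR-Bench a dependable benchmark for evaluating IR models.
The AIR-Bench resources are publicly available at https://github.com/AIR-Bench/AIR-Bench.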

Abstract (original)

Evaluation plays a crucial role in the advancement of information retrieval (IR) models. However, current benchmarks, which are based on predefined domains and human-labeled data, face limitations in addressing evaluation needs for emerging domains both cost-effectively and efficiently. To address this challenge, we propose the Automated Heterogeneous Information Retrieval Benchmark (AIR-Bench). AIR-Bench is distinguished by three key features: 1) Automated. The testing data in AIR-Bench is automatically generated by large language models (LLMs) without human intervention. 2) Heterogeneous. The testing data in AIR-Bench is generated with respect to diverse tasks, domains and languages. 3) Dynamic. The domains and languages covered by AIR-Bench are constantly augmented to provide an increasingly comprehensive evaluation benchmark for community developers. We develop a reliable and robust data generation pipeline to automatically create diverse and high-quality evaluation datasets based on real-world corpora. Our findings demonstrate that the generated testing data in AIR-Bench aligns well with human-labeled testing data, making AIR-Bench a dependable benchmark for evaluating IR models. The resources in AIR-Bench are publicly available at https://github.com/AIR-Bench/AIR-Bench.
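One way to make the claim that generated data "aligns well" with human-labeled data concrete is to check whether both kinds of test data rank the same retrieval models in the same order. The sketch below is only an illustration under stated assumptions: the abstract does not say which agreement measure the authors use, so Spearman rank correlation is swapped in here as a common choice, and the model names and scores are made-up placeholders rather than results from AIR-Bench.

```python
from __future__ import annotations

from typing import Sequence


def ranks(values: Sequence[float]) -> list[float]:
    """Return 1-based ranks (largest value gets rank 1), averaging tied values."""
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=True)
    result = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        # Extend the group while the next value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        avg_rank = (pos + end) / 2 + 1
        for k in range(pos, end + 1):
            result[order[k]] = avg_rank
        pos = end + 1
    return result


def spearman(a: Sequence[float], b: Sequence[float]) -> float:
    """Spearman rank correlation: Pearson correlation computed on the ranks."""
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("need two score lists of equal length >= 2")
    ra, rb = ranks(a), ranks(b)
    mean_a = sum(ra) / len(ra)
    mean_b = sum(rb) / len(rb)
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(ra, rb))
    var_a = sum((x - mean_a) ** 2 for x in ra)
    var_b = sum((y - mean_b) ** 2 for y in rb)
    if var_a == 0 or var_b == 0:
        raise ValueError("correlation is undefined when all scores are equal")
    return cov / (var_a * var_b) ** 0.5


# Hypothetical placeholder scores (not from the paper): one metric value per
# model on LLM-generated test data and on human-labeled test data.
models = ["model_a", "model_b", "model_c", "model_d"]
generated_scores = [0.52, 0.47, 0.61, 0.39]
human_scores = [0.48, 0.50, 0.58, 0.41]

rho = spearman(generated_scores, human_scores)
print(f"Spearman correlation over {len(models)} models: {rho:.3f}")
```

A correlation near 1 would mean the two test sets order the models almost identically; a value near 0 would mean the generated data tells a different story from the human labels.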

arXiv information

Authors	Jianlyu Chen, Nan Wang, Chaofan Li, Bo Wang, Shitao Xiao, Han Xiao, Hao Liao, Defu Lian, Zheng Liu
Published	2024-12-17 17:15:21+00:00
arXiv page	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
