Is ChatGPT Good at Search? Investigating Large Language Models as Re-Ranking Agents

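Summary

Large language models (LLMs) have shown remarkable zero-shot generalization across a range of language tasks, including search.
Existing work, however, applies the generative ability of LLMs to information retrieval (IR) rather than using them to rank passages directly.
The mismatch between the pre-training objectives of LLMs and the ranking objective poses a further challenge.
This paper first investigates generative LLMs such as ChatGPT and GPT-4 for relevance ranking in IR.
Surprisingly, the experiments show that properly instructed LLMs can deliver results competitive with, and in some cases superior to, state-of-the-art supervised methods on popular IR benchmarks.
To address concerns about data contamination in LLMs, the authors also collect a new test set, NovelEval, which is built from recent knowledge and is meant to verify a model's ability to rank content it has not seen.
Finally, to improve efficiency in real-world applications, the authors explore distilling the ranking capabilities of ChatGPT into small specialized models using a permutation distillation scheme.
In their evaluation, a distilled 440M model outperforms a 3B supervised model on the BEIR benchmark.
The code to reproduce the results is available at www.github.com/sunnweiwei/RankGPT.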
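
The summary does not say how the permutation distillation objective is defined, so the following is only an illustrative sketch. It assumes the teacher (for example ChatGPT) returns an ordering of passage ids, and it trains nothing; it just computes a RankNet-style pairwise logistic loss that a student's scores would incur against that ordering. The function name, the passage ids, and the choice of loss are all assumptions made for illustration, not the authors' implementation.

```python
import math

def pairwise_distillation_loss(teacher_order, student_scores):
    """RankNet-style pairwise loss against a teacher permutation (illustrative assumption).

    teacher_order: list of passage ids, best first, as produced by the teacher.
    student_scores: dict mapping passage id -> relevance score from the student.
    For every pair where the teacher ranks `hi` above `lo`, the student is
    penalized by log(1 + exp(score[lo] - score[hi])).
    """
    loss = 0.0
    for a in range(len(teacher_order)):
        for b in range(a + 1, len(teacher_order)):
            hi, lo = teacher_order[a], teacher_order[b]
            loss += math.log1p(math.exp(student_scores[lo] - student_scores[hi]))
    return loss

# Hypothetical teacher permutation over three passages with ids 0, 1, 2.
teacher_order = [2, 0, 1]

# A student whose scores agree with the teacher's ordering...
agreeing = pairwise_distillation_loss(teacher_order, {0: 1.0, 1: 0.0, 2: 2.0})
# ...and one whose scores reverse it.
disagreeing = pairwise_distillation_loss(teacher_order, {0: 1.0, 1: 2.0, 2: 0.0})

# A student that gives every passage the same score, as a reference point.
uniform = pairwise_distillation_loss(teacher_order, {0: 0.0, 1: 0.0, 2: 0.0})
```

Under this sketch the loss is lower when the student's scores reproduce the teacher's ordering than when they reverse it, which is the property a distillation objective of this kind relies on.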

Abstract (original)

Large Language Models (LLMs) have demonstrated remarkable zero-shot generalization across various language-related tasks, including search engines. However, existing work utilizes the generative ability of LLMs for Information Retrieval (IR) rather than direct passage ranking. The discrepancy between the pre-training objectives of LLMs and the ranking objective poses another challenge. In this paper, we first investigate generative LLMs such as ChatGPT and GPT-4 for relevance ranking in IR. Surprisingly, our experiments reveal that properly instructed LLMs can deliver competitive, even superior results to state-of-the-art supervised methods on popular IR benchmarks. Furthermore, to address concerns about data contamination of LLMs, we collect a new test set called NovelEval, based on the latest knowledge and aiming to verify the model’s ability to rank unknown knowledge. Finally, to improve efficiency in real-world applications, we delve into the potential for distilling the ranking capabilities of ChatGPT into small specialized models using a permutation distillation scheme. Our evaluation results turn out that a distilled 440M model outperforms a 3B supervised model on the BEIR benchmark. The code to reproduce our results is available at www.github.com/sunnweiwei/RankGPT.

arXiv information

Authors	Weiwei Sun, Lingyong Yan, Xinyu Ma, Shuaiqiang Wang, Pengjie Ren, Zhumin Chen, Dawei Yin, Zhaochun Ren
Published	2023-10-27 12:11:16+00:00
arXiv site	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
