LogProber: Disentangling confidence from contamination in LLM responses

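Summary

In machine learning, contamination refers to situations where test data leak into the training set.
The issue is particularly relevant when evaluating Large Language Models (LLMs), which are generally trained on gargantuan and largely opaque text corpora scraped from the web.
Developing tools to detect contamination is therefore crucial for tracking the evolution of LLM performance fairly and properly.
To date, only a few recent studies have attempted to quantify and detect contamination in short text sequences, such as those commonly found in benchmarks.
However, these methods have limitations that can sometimes make them impractical. This paper introduces LogProber, a novel and efficient algorithm that the authors show can detect contamination in a black-box setting, and that addresses some of these drawbacks by focusing on the model's familiarity with the question rather than the answer.
The paper explores the properties of the proposed method in comparison with concurrent approaches, identifies its advantages and limitations, and illustrates how different forms of contamination can go undetected depending on the design of the detection algorithm.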

Abstract (original)

In machine learning, contamination refers to situations where testing data leak into the training set. The issue is particularly relevant for the evaluation of the performance of Large Language Models (LLMs), which are generally trained on gargantuan, and generally opaque, corpora of text scraped from the world wide web. Developing tools to detect contamination is therefore crucial to be able to fairly and properly track the evolution of the performance of LLMs. To date, only a few recent studies have attempted to address the issue of quantifying and detecting contamination in short text sequences, such as those commonly found in benchmarks. However, these methods have limitations that can sometimes render them impractical. In the present paper, we introduce LogProber, a novel, efficient algorithm that we show to be able to detect contamination in a black box setting that tries to tackle some of these drawbacks by focusing on the familiarity with the question rather than the answer. Here, we explore the properties of the proposed method in comparison with concurrent approaches, identify its advantages and limitations, and illustrate how different forms of contamination can go undetected depending on the design of the detection algorithm.

arXiv information

Authors	Nicolas Yax, Pierre-Yves Oudeyer, Stefano Palminteri
Published	2025-06-11 14:42:54+00:00
arXiv page	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
