Quantifying Uncertainty in Answers from any Language Model and Enhancing their Trustworthiness

Summary

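This paper introduces BSDetector, a method that detects bad and speculative answers from a pretrained large language model (LLM) by estimating a numeric confidence score for each generated output.
The uncertainty quantification technique works with any LLM that is accessible only through a black-box API and whose training data is unknown.
For a small amount of extra computation, users of an LLM API get the usual response plus a confidence estimate that warns when the response should not be trusted.
Experiments on closed-form and open-form question-answering benchmarks show that BSDetector identifies incorrect LLM responses more accurately than alternative uncertainty estimation procedures, for both GPT-3 and ChatGPT.
Sampling multiple responses from the LLM and selecting the one with the highest confidence score also yields more accurate responses from the same LLM, with no additional training.
In applications that use LLMs for automated evaluation, taking the confidence scores into account makes evaluation more reliable in both human-in-the-loop and fully automated settings, for both GPT-3.5 and GPT-4.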
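
The selection step described above (sample several responses, keep the one with the highest confidence score) can be sketched roughly as follows. This is a minimal illustration, not BSDetector itself: the summary does not say how the confidence score is computed, so `agreement_confidence` is a hypothetical stand-in that scores an answer by how often it recurs among the samples, and the `sampled` list is made-up data standing in for repeated calls to a black-box LLM API.

```python
from collections import Counter
from typing import Callable, Sequence, Tuple


def agreement_confidence(answer: str, samples: Sequence[str]) -> float:
    """Hypothetical stand-in score: fraction of samples that match `answer`.

    Illustrative proxy only; this is NOT BSDetector's scoring rule.
    """
    if not samples:
        return 0.0
    normalized = [s.strip().lower() for s in samples]
    return Counter(normalized)[answer.strip().lower()] / len(normalized)


def select_most_confident(
    candidates: Sequence[str],
    score: Callable[[str], float],
) -> Tuple[str, float]:
    """Return the candidate with the highest confidence score, and that score."""
    if not candidates:
        raise ValueError("need at least one candidate response")
    scored = [(score(c), c) for c in candidates]
    # max() keeps the first candidate when several share the top score.
    best_score, best = max(scored, key=lambda pair: pair[0])
    return best, best_score


# Made-up samples standing in for repeated calls to a black-box LLM API.
sampled = ["Paris", "paris", "Lyon", "Paris ", "Marseille"]
best, conf = select_most_confident(
    sampled, lambda ans: agreement_confidence(ans, sampled)
)
```

Any function that maps a response to a number can be passed as `score`, so a real confidence estimator could replace the stand-in without changing the selection logic. The returned score can also be compared against a threshold chosen by the caller to flag responses that should not be trusted.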

Summary (original)

We introduce BSDetector, a method for detecting bad and speculative answers from a pretrained Large Language Model by estimating a numeric confidence score for any output it generated. Our uncertainty quantification technique works for any LLM accessible only via a black-box API, whose training data remains unknown. By expending a bit of extra computation, users of any LLM API can now get the same response as they would ordinarily, as well as a confidence estimate that cautions when not to trust this response. Experiments on both closed and open-form Question-Answer benchmarks reveal that BSDetector more accurately identifies incorrect LLM responses than alternative uncertainty estimation procedures (for both GPT-3 and ChatGPT). By sampling multiple responses from the LLM and considering the one with the highest confidence score, we can additionally obtain more accurate responses from the same LLM, without any extra training steps. In applications involving automated evaluation with LLMs, accounting for our confidence scores leads to more reliable evaluation in both human-in-the-loop and fully-automated settings (across both GPT 3.5 and 4).

arXiv information

Authors	Jiuhai Chen, Jonas Mueller
Published	2023-10-04 15:05:24+00:00
arXiv page	arxiv_id(pdf)

Source and services used

arxiv.jp, Google

