CoCoA: A Generalized Approach to Uncertainty Quantification by Integrating Confidence and Consistency of LLM Outputs

Abstract (original)

Uncertainty quantification (UQ) methods for Large Language Models (LLMs) encompass a variety of approaches, with two major types being particularly prominent: information-based, which focus on model confidence expressed as token probabilities, and consistency-based, which assess the semantic relationship between multiple outputs generated using repeated sampling. Several recent methods have combined these two approaches and shown impressive performance in various applications. However, they sometimes fail to outperform much simpler baseline methods. Our investigation reveals distinctive characteristics of LLMs as probabilistic models, which help to explain why these UQ methods underperform in certain tasks. Based on these findings, we propose a new way of synthesizing model confidence and output consistency that leads to a family of efficient and robust UQ methods. We evaluate our approach across a variety of tasks such as question answering, abstractive summarization, and machine translation, demonstrating sizable improvements over state-of-the-art UQ approaches.
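The abstract contrasts information-based uncertainty (derived from token probabilities) with consistency-based uncertainty (derived from agreement among repeatedly sampled outputs) and proposes combining the two. The sketch below illustrates both ingredients and one possible way to combine them. It is not the paper's CoCoA implementation: all function names are hypothetical, word-level Jaccard similarity stands in for a semantic similarity model, and the product in `combined_uncertainty` is an assumed combination rule, since the abstract does not state the exact formula.

```python
from typing import Callable, Sequence

# Hypothetical similarity function type: maps two output strings to [0, 1].
Similarity = Callable[[str, str], float]


def sequence_nll(token_logprobs: Sequence[float]) -> float:
    """Information-based uncertainty: negative log-likelihood of the generated
    output, i.e. minus the sum of its token log-probabilities.
    Higher values mean the model was less confident in this output."""
    return -sum(token_logprobs)


def jaccard_similarity(a: str, b: str) -> float:
    """Word-set Jaccard similarity. ASSUMPTION: a cheap stand-in for a
    semantic similarity model; it ignores word order and meaning."""
    words_a, words_b = set(a.lower().split()), set(b.lower().split())
    if not words_a and not words_b:
        return 1.0
    return len(words_a & words_b) / len(words_a | words_b)


def consistency_uncertainty(answer: str, samples: Sequence[str],
                            similarity: Similarity = jaccard_similarity) -> float:
    """Consistency-based uncertainty: mean dissimilarity (1 - similarity)
    between the chosen answer and outputs obtained by repeated sampling."""
    if not samples:
        raise ValueError("at least one sampled output is required")
    return sum(1.0 - similarity(answer, s) for s in samples) / len(samples)


def combined_uncertainty(token_logprobs: Sequence[float], answer: str,
                         samples: Sequence[str],
                         similarity: Similarity = jaccard_similarity) -> float:
    """ASSUMPTION: combine the two signals with a simple product, so the
    score is zero whenever either signal is zero. The paper's exact
    combination rule may differ."""
    return sequence_nll(token_logprobs) * consistency_uncertainty(answer, samples, similarity)


if __name__ == "__main__":
    answer = "Paris is the capital"
    samples = ["Paris is the capital", "the capital is Paris", "Lyon is the capital"]
    logprobs = [-0.1, -0.2, -0.05, -0.15]  # made-up token log-probabilities

    print(f"{sequence_nll(logprobs):.4f}")  # 0.5000
    print(f"{consistency_uncertainty(answer, samples):.4f}")  # 0.1333
    print(f"{combined_uncertainty(logprobs, answer, samples):.4f}")  # 0.0667
```

Under this assumed product rule, both a higher token probability and closer agreement among samples lower the combined score; if every sample matches the answer exactly, the score is zero regardless of the token probabilities.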

arXiv information

Authors	Roman Vashurin, Maiya Goloburda, Preslav Nakov, Artem Shelmanov, Maxim Panov
Published	2025-02-11 14:32:15+00:00
arXiv page	arxiv_id(pdf)

Source, services used

arxiv.jp, Google
