Confidence Estimation for Error Detection in Text-to-SQL Systems

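Summary

Text-to-SQL lets users interact with databases in natural language, simplifying the retrieval and synthesis of information.
Large language models (LLMs) have been successful at converting natural language questions into SQL queries, but their broader adoption is limited by two main challenges: achieving robust generalization across diverse queries and ensuring interpretative confidence in their predictions.
To address these issues, our research investigates the integration of selective classifiers into Text-to-SQL systems.
We analyze the trade-off between coverage and risk using entropy-based confidence estimation with selective classifiers, and assess its impact on the overall performance of Text-to-SQL models.
We also examine the models' initial calibration and improve it with calibration techniques so that confidence aligns better with accuracy.
Our experimental results show that the encoder-decoder T5 is better calibrated than the in-context-learning GPT 4 and the decoder-only Llama 3, so the designated external entropy-based selective classifier performs better.
In terms of error detection, the study also shows that the selective classifier is more likely to detect errors associated with irrelevant questions than errors from incorrect query generation.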
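
As a rough illustration of the coverage-risk trade-off described in the summary, the Python sketch below scores each generated query by the mean entropy of its per-token probability distributions and abstains when that score exceeds a threshold. The function names, the mean-over-tokens aggregation, and the toy distributions are assumptions made for illustration; the abstract does not specify how the authors compute or threshold entropy.

```python
import math
from typing import Sequence, Tuple


def mean_token_entropy(token_distributions: Sequence[Sequence[float]]) -> float:
    """Average Shannon entropy (in nats) over the per-token distributions of one query."""
    total = 0.0
    for dist in token_distributions:
        # Skip zero-probability entries, since p * log(p) tends to 0 as p tends to 0.
        total += -sum(p * math.log(p) for p in dist if p > 0.0)
    return total / len(token_distributions)


def coverage_and_risk(
    entropies: Sequence[float],
    is_correct: Sequence[bool],
    threshold: float,
) -> Tuple[float, float]:
    """Answer only when entropy <= threshold and return (coverage, risk).

    coverage: fraction of questions that are answered
    risk: error rate among the answered questions (0.0 if none are answered)
    """
    accepted = [ok for h, ok in zip(entropies, is_correct) if h <= threshold]
    coverage = len(accepted) / len(entropies)
    risk = sum(1 for ok in accepted if not ok) / len(accepted) if accepted else 0.0
    return coverage, risk


# Toy data (hypothetical): three generated queries, three tokens each,
# over a four-symbol vocabulary.
peaked = [[0.97, 0.01, 0.01, 0.01]] * 3   # confident generation
mixed = [[0.70, 0.10, 0.10, 0.10]] * 3    # moderately confident generation
flat = [[0.25, 0.25, 0.25, 0.25]] * 3     # maximally uncertain generation

entropies = [mean_token_entropy(q) for q in (peaked, mixed, flat)]
is_correct = [True, True, False]  # assumed execution-match labels

for threshold in (0.5, 1.0, 1.5):
    coverage, risk = coverage_and_risk(entropies, is_correct, threshold)
    print(f"threshold={threshold}: coverage={coverage:.2f}, risk={risk:.2f}")
```

Raising the threshold answers more questions (higher coverage) but admits more uncertain generations, which is where errors tend to concentrate if the model is well calibrated.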

Summary (original)

Text-to-SQL enables users to interact with databases through natural language, simplifying the retrieval and synthesis of information. Despite the success of large language models (LLMs) in converting natural language questions into SQL queries, their broader adoption is limited by two main challenges: achieving robust generalization across diverse queries and ensuring interpretative confidence in their predictions. To tackle these issues, our research investigates the integration of selective classifiers into Text-to-SQL systems. We analyse the trade-off between coverage and risk using entropy-based confidence estimation with selective classifiers and assess its impact on the overall performance of Text-to-SQL models. Additionally, we explore the models’ initial calibration and improve it with calibration techniques for better alignment between confidence and accuracy. Our experimental results show that encoder-decoder T5 is better calibrated than in-context-learning GPT 4 and decoder-only Llama 3, and thus the designated external entropy-based selective classifier has better performance. The study also reveals that, in terms of error detection, the selective classifier detects errors associated with irrelevant questions with a higher probability than errors from incorrect query generations.
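
The abstract does not name the calibration metric or technique used. One common way to quantify how well confidence matches accuracy is the expected calibration error (ECE), sketched here in Python under that assumption; `expected_calibration_error`, the equal-width binning, and the toy inputs are illustrative and not taken from the paper.

```python
from typing import Sequence


def expected_calibration_error(
    confidences: Sequence[float],
    is_correct: Sequence[bool],
    n_bins: int = 10,
) -> float:
    """Weighted average gap between mean confidence and accuracy per confidence bin."""
    n = len(confidences)
    bins = [[] for _ in range(n_bins)]
    for conf, ok in zip(confidences, is_correct):
        # Equal-width bins over [0, 1]; a confidence of exactly 1.0 goes in the last bin.
        index = min(int(conf * n_bins), n_bins - 1)
        bins[index].append((conf, ok))

    ece = 0.0
    for members in bins:
        if not members:
            continue
        avg_conf = sum(conf for conf, _ in members) / len(members)
        accuracy = sum(1 for _, ok in members if ok) / len(members)
        ece += (len(members) / n) * abs(avg_conf - accuracy)
    return ece


# Hypothetical example: confidence 0.8 with 4 of 5 predictions correct is well calibrated.
well_calibrated = expected_calibration_error([0.8] * 5, [True, True, True, True, False])

# Hypothetical example: confidence 0.9 with only 2 of 4 correct is overconfident.
overconfident = expected_calibration_error([0.9] * 4, [True, True, False, False])

print(f"well calibrated: {well_calibrated:.3f}, overconfident: {overconfident:.3f}")
```

A lower value means confidence tracks accuracy more closely, which is the property a threshold-based selective classifier relies on.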

arXiv information

Authors	Oleg Somov, Elena Tutubalina
Published	2025-01-16 13:23:07+00:00
arXiv site	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
