Bridging the Gap: Transforming Natural Language Questions into SQL Queries via Abstract Query Pattern and Contextual Schema Markup

Summary

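Large language models (LLMs) perform well on many tasks, including Text-to-SQL, thanks to their strong in-context learning ability.
They are becoming the mainstream approach to Text-to-SQL.
However, these methods still fall well short of human performance, especially on complex questions.
As a question grows more complex, the gap between the question and its SQL widens.
We identify two important gaps: the structural mapping gap and the lexical mapping gap.
To address both, we propose PAS-SQL, an efficient LLM-based SQL generation pipeline that narrows these gaps with Abstract Query Pattern (AQP) and Contextual Schema Markup (CSM).
AQP aims to obtain the structural pattern of a question by removing database-related information, which makes it possible to find structurally similar demonstrations.
CSM aims to associate database-related text spans in the question with specific tables or columns in the database, which alleviates the lexical mapping gap.
Experimental results on the Spider and BIRD datasets demonstrate the effectiveness of the proposed method.
Specifically, PAS-SQL + GPT-4o sets a new state of the art on the Spider benchmark with an execution accuracy of 87.9%, and achieves leading results on the BIRD dataset with an execution accuracy of 64.67%.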

Abstract (original)

Large language models have demonstrated excellent performance in many tasks, including Text-to-SQL, due to their powerful in-context learning capabilities. They are becoming the mainstream approach for Text-to-SQL. However, these methods still have a significant gap compared to human performance, especially on complex questions. As the complexity of questions increases, the gap between questions and SQLs increases. We identify two important gaps: the structural mapping gap and the lexical mapping gap. To tackle these two gaps, we propose PAS-SQL, an efficient SQL generation pipeline based on LLMs, which alleviates gaps through Abstract Query Pattern (AQP) and Contextual Schema Markup (CSM). AQP aims to obtain the structural pattern of the question by removing database-related information, which enables us to find structurally similar demonstrations. CSM aims to associate database-related text spans in the question with specific tables or columns in the database, which alleviates the lexical mapping gap. Experimental results on the Spider and BIRD datasets demonstrate the effectiveness of our proposed method. Specifically, PAS-SQL + GPT-4o sets a new state-of-the-art on the Spider benchmark with an execution accuracy of 87.9%, and achieves leading results on the BIRD dataset with an execution accuracy of 64.67%.
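
The abstract does not say how AQP and CSM are computed, so the following is only a rough illustration of the two ideas, not the PAS-SQL implementation. It assumes a toy schema and uses plain string matching where the paper's pipeline relies on an LLM; `SCHEMA`, the `[SCHEMA]` placeholder, the `<table.column>` annotation format, and all function names are hypothetical.

```python
import re

# Hypothetical toy schema (table -> columns); not taken from the paper.
SCHEMA = {
    "singer": ["name", "age", "country"],
    "concert": ["concert_name", "year"],
}


def schema_terms(schema):
    """Map lowercase surface forms to schema elements, e.g. 'age' -> 'singer.age'.

    Simplification: if two tables share a column name, the later one wins.
    """
    terms = {}
    for table, columns in schema.items():
        terms[table] = table
        for column in columns:
            terms[column.replace("_", " ")] = f"{table}.{column}"
    return terms


def _mention_pattern(terms):
    # Try longer terms first so 'concert name' is preferred over 'concert'.
    ordered = sorted(terms, key=len, reverse=True)
    alternatives = "|".join(re.escape(term) for term in ordered)
    # Optional trailing 's' is a crude stand-in for plural handling.
    return re.compile(rf"\b({alternatives})s?\b", re.IGNORECASE)


def abstract_query_pattern(question, schema):
    """AQP idea: strip database-specific mentions, keep the question's structure."""
    terms = schema_terms(schema)
    return _mention_pattern(terms).sub("[SCHEMA]", question)


def contextual_schema_markup(question, schema):
    """CSM idea: tag each database-related span with the element it maps to."""
    terms = schema_terms(schema)

    def annotate(match):
        element = terms[match.group(1).lower()]
        return f"{match.group(0)} <{element}>"

    return _mention_pattern(terms).sub(annotate, question)


if __name__ == "__main__":
    q = "What is the average age of singers from each country?"
    print(abstract_query_pattern(q, SCHEMA))
    print(contextual_schema_markup(q, SCHEMA))
```

Under these assumptions, two questions over different databases that reduce to the same abstracted string would count as structurally similar, which is the property the abstract says AQP uses to pick demonstrations.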

arXiv information

Authors	Yonghui Kong, Hongbing Hu, Dan Zhang, Siyuan Chai, Fan Zhang, Wei Wang
Published	2025-02-20 16:11:27+00:00
arXiv page	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
