CSC-SQL: Corrective Self-Consistency in Text-to-SQL via Reinforcement Learning

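Summary

Large language models (LLMs) have demonstrated strong capabilities in translating natural language questions about relational databases into SQL queries.
In particular, test-time scaling techniques such as self-consistency and self-correction can improve SQL generation accuracy by spending more computation during inference.
However, these methods have notable limitations: self-consistency may select a suboptimal output despite a majority vote, while self-correction typically addresses only syntax errors.
To leverage the strengths of both approaches, we propose CSC-SQL, a new method that integrates self-consistency and self-correction.
CSC-SQL selects the two most frequently occurring outputs from parallel sampling and passes them to a merge revision model for correction.
In addition, we employ the Group Relative Policy Optimization (GRPO) algorithm to fine-tune both the SQL generation model and the revision model via reinforcement learning, significantly improving output quality.
Experimental results confirm the effectiveness and generalizability of CSC-SQL.
On the BIRD development set, the 3B model achieves 65.28% execution accuracy and the 7B model achieves 69.19%.
The code will be open sourced at https://github.com/CycloneBoy/csc_sql.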
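
The candidate-selection step described in the summary (take the two most frequent outputs from parallel sampling) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the abstract does not say how outputs are compared, so grouping by a normalized SQL string is an assumption, and `normalize_sql` and `top_two_candidates` are hypothetical helper names. The merge revision model itself is not shown.

```python
from collections import Counter


def normalize_sql(sql: str) -> str:
    """Hypothetical normalization (an assumption, not from the paper):
    drop a trailing semicolon, collapse whitespace, and lowercase.
    Note that lowercasing would also alter string literals."""
    return " ".join(sql.strip().rstrip(";").split()).lower()


def top_two_candidates(samples: list[str]) -> list[tuple[str, int]]:
    """Return up to two most frequent candidates with their vote counts."""
    counts = Counter(normalize_sql(s) for s in samples)
    return counts.most_common(2)


# Made-up sampled outputs for a single question.
samples = [
    "SELECT name FROM users WHERE age > 30;",
    "select name from users where age > 30",
    "SELECT name FROM users WHERE age >= 30",
    "SELECT name FROM users WHERE age > 30",
    "SELECT id FROM users WHERE age > 30",
    "SELECT name FROM users WHERE age >= 30;",
]
candidates = top_two_candidates(samples)
# Both candidates, not just the majority winner, would then be
# handed to the revision model.
```

Plain self-consistency would keep only the first entry of `candidates`; keeping the runner-up as well gives the revision step a chance to recover when the majority answer is wrong.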

Summary (original)

Large language models (LLMs) have demonstrated strong capabilities in translating natural language questions about relational databases into SQL queries. In particular, test-time scaling techniques such as Self-Consistency and Self-Correction can enhance SQL generation accuracy by increasing computational effort during inference. However, these methods have notable limitations: Self-Consistency may select suboptimal outputs despite majority votes, while Self-Correction typically addresses only syntactic errors. To leverage the strengths of both approaches, we propose CSC-SQL, a novel method that integrates Self-Consistency and Self-Correction. CSC-SQL selects the two most frequently occurring outputs from parallel sampling and feeds them into a merge revision model for correction. Additionally, we employ the Group Relative Policy Optimization (GRPO) algorithm to fine-tune both the SQL generation and revision models via reinforcement learning, significantly enhancing output quality. Experimental results confirm the effectiveness and generalizability of CSC-SQL. On the BIRD development set, our 3B model achieves 65.28% execution accuracy, while the 7B model achieves 69.19%. The code will be open sourced at https://github.com/CycloneBoy/csc_sql.
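
For readers unfamiliar with GRPO, its core idea is to score each sampled output relative to the other outputs sampled for the same prompt, instead of using a learned value baseline. A minimal sketch of that group-relative advantage is given below. The binary reward values are an assumption for illustration (the abstract does not specify the reward design), the function name is hypothetical, and the choice of population standard deviation is one of several conventions used in practice.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards: list[float], eps: float = 1e-8) -> list[float]:
    """Standardize rewards within one group of samples for the same prompt.

    eps avoids division by zero when all rewards in the group are equal.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Assumed rewards: 1.0 if a sampled SQL query is judged correct, else 0.0.
rewards = [1.0, 0.0, 0.0, 1.0]
advantages = group_relative_advantages(rewards)
# Correct samples get a positive advantage, incorrect ones a negative one.
```

When every sample in a group receives the same reward, all advantages are zero, so that group contributes no learning signal.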

arXiv information

Authors	Lei Sheng, Shuai-Shuai Xu
Published	2025-05-19 15:52:19+00:00
arXiv page	arxiv_id(pdf)

Source, services used

arxiv.jp, Google
