G²SQL: guided & guarded Text-to-SQL generation with two-stage verification.

Bibliographic Details
Title: G²SQL: guided & guarded Text-to-SQL generation with two-stage verification.
Authors: Li, Xiang¹ (AUTHOR) li_xiang@stu.kust.edu.cn; You, Jinguo¹ (AUTHOR) jgyou@kust.edu.cn; Li, Heng² (AUTHOR) liheng@csu.edu.cn; Peng, Jun¹ (AUTHOR) junpeng@stu.kust.edu.cn; Chen, Xi¹ (AUTHOR) chenxi328@stu.kust.edu.cn; Guo, Ziheng¹ (AUTHOR) wzskqjh@stu.kust.edu.cn; Li, Kun³ (AUTHOR) likun75@huawei.com; Xu, Tianyi³ (AUTHOR) xutianyi5@huawei.com
Source: Expert Systems with Applications. May 2026, Vol. 311, pN.PAG-N.PAG. 1p.
Subjects: SQL, Inspection & review, Database security, Supervised learning, Language models
Abstract: Despite the remarkable progress of large language models (LLMs) in the Text-to-SQL domain, issues such as model hallucination remain a challenge. During SQL generation, an error at any stage may inevitably influence subsequent outputs, resulting in suboptimal or incorrect SQL queries. Moreover, when using LLMs to verify and revise generated SQL without human supervision, the models may sometimes introduce operations that tamper with or damage the database content. To address these challenges, we propose G²SQL, a supervised SQL generation framework enhanced by two-stage verification. In G²SQL, we adopt a learning-based SQL-Plan feedback loop to inspect and optimize the SQL generation process, aiming to minimize error propagation. After SQL is generated, a Reviewer-Observer mechanism is employed to further validate and revise the queries while ensuring database safety. Extensive experiments on both proprietary and open-source LLMs demonstrate the effectiveness and robustness of G²SQL, achieving a maximum execution accuracy of 73.16% on the BIRD development set and 89.97% on the Spider test set, verifying the efficiency and advancement of the proposed framework. [ABSTRACT FROM AUTHOR]
Copyright of Expert Systems with Applications is the property of Pergamon Press - An Imprint of Elsevier Science and its content may not be copied or emailed to multiple sites without the copyright holder's express written permission. Additionally, content may not be used with any artificial intelligence tools or machine learning technologies. However, users may print, download, or email articles for individual use. This abstract may be abridged. No warranty is given about the accuracy of the copy. Users should refer to the original published version of the material for the full abstract. (Copyright applies to all Abstracts.)
Database: Engineering Source
Description
ISSN: 0957-4174
DOI: 10.1016/j.eswa.2026.131276