CM-SQL: A cross-model consistency framework for text-to-SQL.

Bibliographic Details
Title: CM-SQL: A cross-model consistency framework for text-to-SQL.
Authors: Li, Xiang¹ (li_xiang@stu.kust.edu.cn); You, Jinguo¹ (jgyou@kust.edu.cn); Li, Heng² (liheng@csu.edu.cn); Peng, Jun¹ (junpeng@stu.kust.edu.cn); Chen, Xi¹ (chenxi328@stu.kust.edu.cn); Guo, Ziheng¹ (wzskqjh@stu.kust.edu.cn)
Source: Neurocomputing. Dec 2025, Vol. 658, pN.PAG-N.PAG. 1p.
Subjects: SQL, Language models, Database design, Data integration
Abstract: In recent years, large language models (LLMs) have been widely applied to the Text-to-SQL task. Most current LLM-based Text-to-SQL methods improve the accuracy of generated SQL in two main ways: (1) schema linking, and (2) leveraging the model's self-consistency to check, modify, and select the generated SQL. However, because of issues such as LLM hallucinations, the database schema produced during schema linking may contain errors or omissions. Moreover, LLMs are often overconfident when judging the correctness of their own outputs. To address these issues, we propose CM-SQL, a cross-model consistency SQL generation framework that generates SQL from different perspectives by feeding two database schemas into two LLMs. The framework combines the stability of fine-tuned models with the strong reasoning capabilities of LLMs to evaluate the generated SQL. We also propose a local modification strategy to correct erroneous SQL. Finally, the outputs of the evaluation module and the LLM are used to select among candidate SQL queries, yielding the final SQL. Evaluated on the BIRD dev set with GPT-4o-mini and DeepSeek-V2.5, the framework achieves an execution accuracy of 65.65%. On the Spider test set, execution accuracy reaches 87.6%, significantly outperforming most methods based on the same LLMs. Its performance is also comparable to that of many approaches that rely on more expensive models, such as GPT-4.
• This paper proposes a method for assessing the correctness of SQL queries using cross-model consistency. It leverages the reasoning capabilities of multiple LLMs and the stability of instruction-fine-tuned models to evaluate SQL queries. This mitigates the "overconfidence" LLMs exhibit when faced with SQL queries that are syntactically correct but semantically incorrect, a problem often caused by hallucinations and related issues.
• This study proposes a localized SQL modification strategy to correct problematic SQL queries. Unlike existing methods, it uses the output of the SQL Check Module to modify only the erroneous parts of a query rather than regenerating it entirely. This reduces the risk that excessive modification inadvertently introduces errors into originally correct parts of the query.
• We introduce V-Schema, a database schema organization structure. Compared with traditional Data Definition Language (DDL) schemas and their simplified versions, it retains the DDL schema's advantage of fully representing the relationships and relevant information among different parts of the database, while effectively reducing the impact of unnecessary input on model performance.
• This study uses the linked schema (Simplify-Schema) and the unlinked schema (Full-Schema) as inputs to obtain SQL queries from different perspectives. Traditional methods provide only a single database schema and rely on changing parameters such as temperature or schema order to generate diverse SQL queries; our approach avoids the hallucination issues that high temperature settings can exacerbate. It also fully leverages the advantages of both schemas to generate more robust SQL outputs. [ABSTRACT FROM AUTHOR]
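The abstract describes gathering SQL candidates from two schemas and two models, then selecting one and scoring it by execution accuracy. A minimal sketch of that general shape follows, assuming a toy in-memory SQLite database and hard-coded candidate queries standing in for model outputs. It substitutes a plain execution-result majority vote for the paper's SQL Check Module (which combines a fine-tuned model with an LLM), so it is not CM-SQL's actual method. The `singer` table, the `model_A`/`model_B` candidate labels, and the functions `run`, `select_by_consistency`, and `execution_match` are all hypothetical. `execution_match` illustrates how execution accuracy is commonly scored: a prediction counts as correct when its result set equals the gold query's result set.

```python
import sqlite3
from collections import Counter


def run(conn, sql):
    """Execute sql; return its rows as a frozenset (order-insensitive), or None on error."""
    try:
        return frozenset(conn.execute(sql).fetchall())
    except sqlite3.Error:
        return None


def select_by_consistency(conn, candidates):
    """Hypothetical stand-in selector: majority vote over execution results.

    candidates: list of (source_label, sql). Queries that fail to execute are dropped.
    Ties go to the earliest candidate, because max() returns the first maximal item.
    """
    results = [(label, sql, run(conn, sql)) for label, sql in candidates]
    valid = [r for r in results if r[2] is not None]
    if not valid:
        return None
    votes = Counter(res for _, _, res in valid)
    best_label, best_sql, _ = max(valid, key=lambda r: votes[r[2]])
    return best_label, best_sql


def execution_match(conn, pred_sql, gold_sql):
    """Execution-accuracy check: predicted and gold result sets must be equal."""
    pred = run(conn, pred_sql)
    return pred is not None and pred == run(conn, gold_sql)


# Toy database (assumption, for illustration only).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE singer (singer_id INTEGER PRIMARY KEY, name TEXT, country TEXT, age INTEGER);
INSERT INTO singer VALUES (1, 'Ann', 'France', 30), (2, 'Bo', 'France', 45), (3, 'Cy', 'Spain', 28);
""")

# Hypothetical candidates, labelled by (model, schema) pair.
candidates = [
    ("model_A/simplified_schema", "SELECT name FROM singer WHERE country = 'France'"),
    ("model_A/full_schema", "SELECT name FROM singer WHERE country = 'France' ORDER BY age"),
    ("model_B/simplified_schema", "SELECT name FROM singer WHERE age > 40"),
    ("model_B/full_schema", "SELECT nme FROM singer"),  # fails: misspelled column
]
gold = "SELECT name FROM singer WHERE country = 'France'"

chosen = select_by_consistency(conn, candidates)
print(chosen)
print(execution_match(conn, chosen[1], gold))
```

Here the first two candidates return the same set of names, so they outvote the third, and the fourth is discarded because it raises an error. Comparing results as sets ignores row order (and collapses duplicate rows), which is why the `ORDER BY` variant agrees with the unordered query.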
Copyright of Neurocomputing is the property of Elsevier B.V. and its content may not be copied or emailed to multiple sites without the copyright holder's express written permission. Additionally, content may not be used with any artificial intelligence tools or machine learning technologies. However, users may print, download, or email articles for individual use. This abstract may be abridged. No warranty is given about the accuracy of the copy. Users should refer to the original published version of the material for the full abstract. (Copyright applies to all Abstracts.)
Database: Engineering Source