text2sql: multi-agent实现思路MAC-SQL

MAC-SQL出自2023年12月的论文《MAC-SQL: A Multi-Agent Collaborative Framework for Text-to-SQL》(github),它是用基于LLM的multi-agent来实现text2sql。

MAC-SQL的整体思路如论文图2所示,由Decomposer、Selector、Refiner三个agent组成,个人觉得除了有multi-agent的思想外,MAC-SQL与DIN-SQL的思路很类似。

在这里插入图片描述

MAC-SQL的三个agent的协作过程如论文算法1所示。
在这里插入图片描述

接下来分别介绍MAC-SQL三个agent的实现:

  • Selector:其作用是schema linking,选择只与问题相关的数据库schema元素,其prompt如下图所示。值得注意的是,论文强调了只有在数据库schema对应的prompt长度超过了长度阈值时才会被激活,否则直接使用原始的数据库schema。

在这里插入图片描述

  • Decomposer: 在生成最终SQL之前生成一系列的中间步骤来提高LLM的推理能力。如论文图2所示意,Decomposer指导LLM将原始复杂问题分解成推理步骤后生成最后的SQL查询。作者使用CoT来生成子问题和其对应的SQL,实现时会先判断用户问题的难易程度,如果用户的问题比较简单,则直接生成SQL;如果用户的问题很复杂,则先生成子问题对应的SQL,逐步得到最终的SQL。在in-context learning时使用了few-shot例子。

    decomposer_prompt = '''
    Given a [Database schema] description, a knowledge [Evidence] and the [Question], you need to use valid SQLite and understand the database and knowledge, and then decompose the question into subquestions for text-to-SQL generation.
    
    When generating SQL, we should always consider constraints:
    [Constraints]
    - In 'SELECT <column>', just select needed columns in the [Question] without any unnecessary column or value - In 'FROM <table>' or 'JOIN <table>', do not include unnecessary table 
    - If use max or min func, 'JOIN <table>' FIRST, THEN use 'SELECT MAX(<column>)' or 'SELECT MIN(<column>)' - If [Value examples] of <column> has 'None' or None, use 'JOIN <table>' or 'WHERE <column> is NOT NULL' is better 
    - If use 'ORDER BY <column> ASC|DESC', add 'GROUP BY <column>' before to select distinct values
    
    ==========
    
    [Database schema]
    # Table: frpm 
    [ 
    	(CDSCode, CDSCode. Value examples: ['01100170109835', '01100170112607'].), 
    	(Charter School (Y/N), Charter School (Y/N). Value examples: [ 1, 0, None] . And 0: N;. 1: Y),
      (Enrollment (Ages 5-17), Enrollment (Ages 5-17). Value examples: [5271.0, 4734.0].), 
      (Free Meal Count (Ages 5-17), Free Meal Count (Ages 5-17). Value examples: [ 3864.0, 2637.0 ].
    And eligible free rate = Free Meal Count / Enrollment) 
    ] 
    # Table: satscores 
    [ 
    	(cds, California Department Schools. Value examples: ['10101080000000', '10101080109991'].), 
    	(sname, school name. Value examples: [ 'None', 'Middle College High', 'John F. Kennedy High', 'Independence High', 'Foothill High'].),
      (NumTstTakr, Number of Test Takers in this school. Value examples: [ 24305, 4942, 1, 0, 280] . And number of test takers in each school),
      (AvgScrMath, average scores in Math. Value examples: [699, 698, 289, None, 492 ] . And average scores in Math), 
      (NumGE1500, Number of Test Takers Whose Total SAT Scores Are Greater or Equal to 1500. Value examples: [ 5837, 2125, 0, None, 191] . And Number of Test Takers Whose Total SAT Scores Are Greater or Equal to 1500. . commonsense evidence:. . Excellence Rate = NumGE1500 / NumTstTakr) 
    ]
    [Foreign keys]
    frpm.'CDSCode' = satscores.'cds'
    [Question]
    List school names of charter schools with an SAT excellence rate over the average.
    [Evidence]
    Charter schools refers to 'Charter School (Y/N)' = 1 in the table frpm; Excellence rate = NumGE1500 / NumTstTakr
    
    Decompose the question into sub questions, considering [Constraints], and generate the SQL after thinking step by step: 
    Sub question 1: Get the average value of SAT excellence rate of charter schools. 
    SQL 
    "'sql 
    SELECT AVG(CAST(T2.'NumGE1500' AS REAL) / T2.'NumTstTakr') 
    	FROM frpm AS T1 
    	INNER JOIN satscores AS T2 
    	ON T1.'CDSCode' = T2.'cds' 
    	WHERE T1.'Charter School (Y/N)' = 1 
    "'
    
    Sub question 2: List out school names of charter schools with an SAT excellence rate over the average. 
    SQL 
    "' sql 
    SELECT T2.'sname' 
    	FROM frpm AS T1 
    	INNER JOIN satscores AS T2 
    	ON T1.'CDSCode' = T2.'cds' 
    	WHERE T2.'sname' IS NOT NULL 
    	AND T1.'Charter School (Y/N)' = 1 
    	AND CAST(T2.'NumGE1500' AS REAL) / T2.'NumTstTakr' > ( 
    		SELECT AVG(CAST(T4.'NumGE1500' AS REAL) / T4.'NumTstTakr')
    		FROM frpm AS T3 
    		INNER JOIN satscores AS T4 
    		ON T3.'CDSCode' = T4.'cds' 
    		WHERE T3.'Charter School (Y/N)' = 1 
    		) 
    "' 
    
    Question Solved. 
    
    ==========
    
    [Database schema]
    # Table: account 
    [ 
    	(account_id, the id of the account. Value examples: [11382, 11362, 2, 1, 2367].), 
    	(district_id, location of branch. Value examples: [77, 76, 2, 1, 39].),
      (frequency, frequency of the acount. Value examples: ['POPLATEK MESICNE', 'POPLATEK TYDNE', 'POPLATEK PO OBRATU'].), 
      (date, the creation date of the account. Value examples: ['1997-12-29', '1997-12-28'].) 
    ] 
    # Table: client 
    [ 
    	(client_id, the unique number. Value examples: [13998, 13971, 2, 1, 2839].), 
    	(gender, gender. Value examples: ['M', 'F']. And F:female . M:male ), 
    	(birth_date, birth date. Value examples: ['1987-09-27', '1986-08-13'].), 
    	(district_id, location of branch. Value examples: [77, 76, 2, 1, 39].) 
    ] 
    # Table: district 
    [ 
    	(district_id, location of branch. Value examples: [77, 76, 2, 1, 39].), 
    	(A4, number of inhabitants . Value examples: ['95907', '95616', '94812'].), 
    	(A11, average salary. Value examples: [12541, 11277, 8114].) ]
    [Foreign keys]
    account.'district_id' = district.'district_id' 
    client.'district_id' = district.'district_id'
    [Question]
    What is the gender of the youngest client who opened account in the lowest average salary branch?[Evidence]
    Later birthdate refers to younger age; A11 refers to average salary 
    
    Decompose the question into sub questions, considering [Constraints], and generate the SQL after thinking step by step: 
    Sub question 1: What is the district_id of the branch with the lowest average salary?
    SQL 
    "' sql 
    SELECT 'district_id' 
    	FROM district 
    	ORDER BY 'A11' ASC 
    	LIMIT 1 
    "'
    Sub question 2: What is the youngest client who opened account in the lowest average salary branch? 
    SQL 
    "' sql 
    SELECT T1.'client_id' 
    	FROM client AS T1 
    	INNER JOIN district AS T2 
    	ON T1.'district_id' = T2.'district_id' 
    	ORDER BY T2.'A11' ASC, T1.'birth_date' DESC 
    	LIMIT 1 
    "'
    
    Sub question 3: What is the gender of the youngest client who opened account in the lowest average salary branch? 
    SQL 
    "' sql 
    SELECT T1.'gender' 
    	FROM client AS T1 
    	INNER JOIN district AS T2 
    	ON T1.'district_id' = T2.'district_id' 
    	ORDER BY T2.'A11' ASC, T1.'birth_date' DESC 
    	LIMIT 1 
    "' 
    Question Solved. 
    
    ==========
    
    [Database schema]
    {desc_str}
    [Foreign keys]
    {fk_str}
    [Question]
    {query}
    [Evidence]
    {evidence} 
    
    
    Decompose the question into sub questions, considering [Constraints], and generate the SQL after thinking step by step:
    '''
    

在这里插入图片描述

  • Refiner:其作用是为了检测到自动校正SQL错误,如论文图4所示意。如论文图2所示,在收到一个SQL查询之后,Refiner诊断SQL来评估器语法准确性、执行可行性,并从数据库中检索到非空结果。

    refiner_prompt = '''
    [Instruction]
    When executing SQL below, some errors occurred, please fix up SQL based on query and database info. Solve the task step by step if you need to. Using SQL format in the code block, and indicate script type in the code block. When you find an answer, verify the answer carefully. Include verifiable evidence in your response if possible.
    [Constraints]
    - In 'SELECT <column>', just select needed columns in the [Question] without any unnecessary column or value 
    - In 'FROM <table>' or 'JOIN <table>', do not include unnecessary table 
    - If use max or min func, 'JOIN <table>' FIRST, THEN use 'SELECT MAX(<column>)' or 'SELECT MIN(<column>)' - If [Value examples] of <column> has 'None' or None, use 'JOIN <table>' or 'WHERE <column> is NOT NULL' is better 
    - If use 'ORDER BY <column> ASC|DESC', add 'GROUP BY <column>' before to select distinct values
    [Query]
    {query}
    [Evidence]
    {evidence}
    [Database info]
    {desc_str}
    [Foreign keys]
    {fk_str}
    [old SQL]
    "' sql 
    {sql} 
    "'
    [SQLite error]
    {sqlite_error}
    [Exception class]
    {exception_class}
    
    Now please fixup old SQL and generate new SQL again.[correct SQL]
    '''
    

在这里插入图片描述

除了MAC-SQL思路外,论文作者还尝试构建了一个用于微调LLM的mulit-agent任务指令集,并用这个指令集微调了Code Llama 7B模型得到了开源模型SQL-Llama。

### Microsoft SQL Server 错误 3154 数据库备份不匹配解决方案 错误 3154 是由于尝试恢复的数据库备份与目标数据库的状态不一致引起的。这种状态不一致可能是由多种原因造成的,例如不同的 LSN(日志序列号)、文件路径冲突或其他元数据差异。以下是针对此问题的具体解决方法: #### 单用户模式下修复数据库 为了确保没有其他连接干扰操作过程,需先以单用户模式启动 SQL Server 实例[^1]。具体步骤如下: - 停止所有可能连接到 SQL Server 的服务,包括但不限于 SQL Server Agent- 关闭任何通过 SSMS 或其他工具建立的现有连接。 - 使用命令行参数 `-m` 启动实例,进入单用户模式。 ```bash net stop MSSQLSERVER net start MSSQLSERVER /m ``` 完成以上配置后,可以通过 T-SQL 脚本进一步处理数据库恢复逻辑。 --- #### 自定义存储过程实现灵活恢复 创建一个通用的存储过程来简化复杂的恢复流程是一个不错的选择[^2]。下面提供了一个基于 `p_RestoreDb` 存储过程的核心框架,用于应对不同类型的恢复需求: ```sql CREATE PROCEDURE p_RestoreDb @bkfile NVARCHAR(1000), -- 定义要恢复的备份文件名 @dbname SYSNAME = '', -- 恢复后的数据库名称,默认为备份文件名 @dbpath NVARCHAR(260) = '', -- 恢复后的数据库存放目录,默认为 SQL 默认数据目录 @retype NVARCHAR(10) = 'DB', -- 恢复类型:'DB'=完全恢复;'DBNOR'=差异恢复;'DF'=差异备份恢复;'LOG'=事务日志恢复 @filenumber INT = 1, -- 恢复的文件编号 @overexist BIT = 1, -- 是否覆盖已有数据库 @killuser BIT = 1 -- 是否强制结束用户的活动会话 AS BEGIN DECLARE @sql VARCHAR(MAX); IF (@dbname = '') SET @dbname = PARSENAME(@bkfile, 1); IF (@dbpath = '') SELECT @dbpath = CAST(SERVERPROPERTY('InstanceDefaultDataPath') AS NVARCHAR(260)); IF (@overexist = 1 AND EXISTS (SELECT * FROM sys.databases WHERE name = @dbname)) BEGIN IF (@killuser = 1) EXEC ('ALTER DATABASE [' + @dbname + '] SET SINGLE_USER WITH ROLLBACK IMMEDIATE;'); SET @sql = 'DROP DATABASE [' + @dbname + ']'; PRINT @sql; EXEC (@sql); END; SET @sql = 'RESTORE DATABASE [' + @dbname + '] FROM DISK = ''' + @bkfile + ''' WITH FILE = ' + CONVERT(VARCHAR, @filenumber) + ', MOVE ''LogicalFileName'' TO ''' + @dbpath + '\data.mdf'', REPLACE;'; PRINT @sql; EXEC (@sql); ALTER DATABASE [@dbname] SET MULTI_USER; END; GO ``` 该脚本支持四种主要恢复场景,并允许管理员自定义多个选项,从而减少手动干预的可能性。 --- #### 权限调整与验证 如果在执行过程中遇到权限相关的问题,则需要确认当前登录账户具备足够的控制权访问目标资源[^3]。可通过以下方式检查并授予权限: ```sql -- 查询特定表上的权限情况 USE YourDatabase; GO SELECT permission_name, state_desc FROM fn_my_permissions(NULL, 'OBJECT') WHERE subentity_name = 'YourTable'; -- 授予 INSERT 权限给某用户 GRANT INSERT ON OBJECT::dbo.YourTable TO UserName; ``` 必要时可扩展至更高层次的操作许可范围,比如对整个数据库设置 FULL CONTROL 属性。 --- #### 总结注意事项 尽管上述措施能够显著降低发生错误的概率,但在实际部署前仍建议做好充分测试工作。特别是涉及生产环境的数据迁移项目更应谨慎行事!
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值