修改postgres_proto源代码将原始sql传入自定义处理器的query_tables函数实现查询duckdb真实数据

根据postgres_proto文档描述的select语句处理过程，

子类化 PostgresRequestHandler
PostgresRequestHandler 覆盖了 PostgresServerFlowMixin 中的 execute_query() 方法以提供 SQL 语句处理。
它与 QueryInformationSchemaMixin 和 QueryPostgresBuiltinsMixin 集成，以确保正确处理这些表。

当调用 execute_query() 时，SQL 查询使用 postgres_proto.sql.parser.parse_sql() 进行解析，该解析器提供 stmt_type 和 stmt_info。
有关更多详细信息，请参见 parse_sql()。

解析后，将调用与语句类型关联的处理程序。

SELECT 语句已被处理。当收到 SELECT 语句时，将调用 query_tables()。您必须覆盖此函数。
query_tables() 必须返回一个元组，其中第一项是字典列表（行），第二项是名称列表（列名）。

截获传入解析器的原始sql文本，然后根据报错信息逐步在调用链源代码中增加这个文本作为返回值或参数，一直传递到自定义处理器的query_tables函数，成功绕过解析器解析后的内容，把原始sql传给了duckdb，从而实现任意语句查询，对于duckdb报出的sql错误，也能由消息返回

对调用链源代码修改
parse.py

def parse_sql(sql, stmt_type_delimiters=None):
    stmt_type, parts = split_sql(sql, stmt_type_delimiters)
    stmt_types = {
        'SELECT': transform_select_stmt
    }
    return stmt_type, stmt_types[stmt_type](parts) if stmt_type in stmt_types else parts,sql #将sql附在解析结果后面

报错

  File "/par/pypo/postgres_proto/socket_handler/__init__.py", line 30, in execute_query
    stmt_type, stmt_info = self.parse_sql(query)
    ^^^^^^^^^^^^^^^^^^^^
ValueError: too many values to unpack (expected 2)

对上述__init__.py修改


    def execute_query(self, query):
        stmt_type, stmt_info,sql = self.parse_sql(query)  #变成读取3个返回值，其实后来看不需要，这里的query就是原始sql

        if self.is_postgres_builtins_query(stmt_type, stmt_info):
            return self.handle_postgres_builtins_query(stmt_type, stmt_info)

        if self.is_information_schema_query(stmt_type, stmt_info):
            return self.handle_information_schema_query(stmt_type, stmt_info)

        handler = self.get_stmt_handler(stmt_type)
        if not handler:
            if stmt_type in self.ignore_missing_statement_types:
                return stmt_type, None, None
            raise PostgresError('statement type not supported')
        rows, cols = handler(stmt_info,sql) #给处理器传入2个参数，实际处理器还有self参数，所以下面报错说3个
        return stmt_type, rows, cols

报错

  File "/par/pypo/postgres_proto/socket_handler/__init__.py", line 43, in execute_query
    rows, cols = handler(stmt_info,sql)
                 ^^^^^^^^^^^^^^^^^^^^^^
TypeError: PostgresRequestHandler.handle_select() takes 2 positional arguments but 3 were given

继续修改__init__.py

    @stmt_handler('SELECT')
    def handle_select(self, stmt_info, query):
        if '*' in [c.name for c in stmt_info.columns] and (len(stmt_info.columns) > 1 or stmt_info.columns[0].alias):
            raise PostgresError('select * cannot be aliased or used with other columns')

        data, cols = self.query_tables(stmt_info, query)
        return format_select_results(data, cols, stmt_info)

自定义处理器程序

from postgres_proto.socket_handler import PostgresRequestHandler
from postgres_proto.flow import PostgresError, catch_all_as_postgres_error_context
import duckdb

class DuckDBRequestHandler(PostgresRequestHandler):
    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        #self.connection = duckdb.connect('demo.ddb')
    
    def query_tables(self, stmt_info,sql): #改成3个参数
        with catch_all_as_postgres_error_context():
            query = str(stmt_info)
            connection = duckdb.connect('/par/demo.ddb')
            print("sql========",sql)
            result = connection.execute(sql) #原始sql传给了duckdb
            columns = [desc[0] for desc in result.description]
            rows_data = result.fetchall()
            
            # 转换为JSON格式的字典列表
            rows = []
            for row in rows_data:
                row_dict = {}
                for i, value in enumerate(row):
                    row_dict[columns[i]] = value
                rows.append(row_dict)
            print(rows, columns)
            return rows, columns
    
    def list_tables(self):
        with catch_all_as_postgres_error_context():
            result = self.connection.execute("SHOW TABLES")
            return [row[0] for row in result.fetchall()]
    
    def describe_table(self, table_name):
        with catch_all_as_postgres_error_context():
            result = self.connection.execute(f"DESCRIBE {table_name}")
            return [row[0] for row in result.fetchall()]

if __name__ == '__main__':
    from postgres_proto.server import start_server, cli_arg_parser
    start_server(DuckDBRequestHandler, **vars(cli_arg_parser.parse_args()))

测试消息文本文件

'Q'	"SELECT * FROM foods where category='meat'"
'Y'
'Q'	"SELECT category，fats_g FROM foods where sugars_g<1"
'Y'

服务端

/par/pypo/examples# PYTHONPATH=.. python3 duck2.py
Serving on 127.0.0.1:55432
sql======== SELECT * FROM foods where category='meat'
[{'category': 'meat', 'calories': 100, 'fats_g': 5.0, 'sugars_g': 0}, {'category': 'meat', 'calories': 120, 'fats_g': 10.0, 'sugars_g': 1}, {'category': 'meat', 'calories': 110, 'fats_g': 7.0, 'sugars_g': 0}, {'category': 'meat', 'calories': 110, 'fats_g': 7.0, 'sugars_g': 0}, {'category': 'meat', 'calories': 100, 'fats_g': 7.0, 'sugars_g': 0}] ['category', 'calories', 'fats_g', 'sugars_g']

sql======== SELECT category,fats_g FROM foods where sugars_g<1
[{'category': 'seafood', 'fats_g': 5.0}, {'category': 'meat', 'fats_g': 5.0}, {'category': 'seafood', 'fats_g': 5.0}, {'category': 'fruit', 'fats_g': 4.5}, {'category': 'meat', 'fats_g': 7.0}, {'category': 'seafood', 'fats_g': 5.0}, {'category': 'seafood', 'fats_g': 10.0}, {'category': 'meat', 'fats_g': 7.0}, {'category': 'seafood', 'fats_g': 1.5}, {'category': 'meat', 'fats_g': 7.0}] ['category', 'fats_g']

客户端

src/pgproto -h 127.0.0.1 -p 55432 -f /par/pgpo2.txt
FE=> Query (query="SELECT * FROM foods where category='meat'")
<= BE RowDescription
字段数量: 4
字段 1: category
字段 2: calories
字段 3: fats_g
字段 4: sugars_g
<= BE DataRow
meat|100|5.0|0
<= BE DataRow
meat|120|10.0|1
<= BE DataRow
meat|110|7.0|0
<= BE DataRow
meat|110|7.0|0
<= BE DataRow
meat|100|7.0|0
<= BE CommandComplete(SELECT)
<= BE ReadyForQuery(I)
FE=> Query (query="SELECT category,fats_g FROM foods where sugars_g<1")
<= BE RowDescription
字段数量: 2
字段 1: category
字段 2: fats_g
<= BE DataRow
seafood|5.0
<= BE DataRow
meat|5.0
<= BE DataRow
seafood|5.0
<= BE DataRow
fruit|4.5
<= BE DataRow
meat|7.0
<= BE DataRow
seafood|5.0
<= BE DataRow
seafood|10.0
<= BE DataRow
meat|7.0
<= BE DataRow
seafood|1.5
<= BE DataRow
meat|7.0
<= BE CommandComplete(SELECT)
<= BE ReadyForQuery(I)

如果表不存在，则报错，这个错误来自duckdb

src/pgproto -h 127.0.0.1 -p 55432 -f /par/pgpo2.txt
FE=> Query (query="SELECT * FROM foods where category='meat'")
<= BE ErrorResponse(S ERROR C 0 M Catalog Error: Table with name foods does not exist!
Did you mean "pg_constraint"?

LINE 1: SELECT * FROM foods where category='meat'
                      ^ )
<= BE ReadyForQuery(I)

如果是在python交互环境执行，效果如下

>>> sql="SELECT * FROM food where category='meat'"
>>> result = connection.execute(sql)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
duckdb.duckdb.CatalogException: Catalog Error: Table with name food does not exist!
Did you mean "foods"?

LINE 1: SELECT * FROM food where category='meat'
                      ^