深入PostgreSQL与Python类型适配:Psycopg全攻略

深入PostgreSQL与Python类型适配:Psycopg全攻略

【免费下载链接】psycopg New generation PostgreSQL database adapter for the Python programming language 【免费下载链接】psycopg 项目地址: https://gitcode.com/gh_mirrors/ps/psycopg

引言:当Python遇见PostgreSQL

你是否曾在使用Python操作PostgreSQL时遇到过类型不匹配的问题?比如Python的datetime对象如何正确映射到PostgreSQL的timestamp类型?或者如何将复杂的Python数据结构高效地存储到数据库中?Psycopg作为Python与PostgreSQL之间的桥梁,提供了强大而灵活的类型适配机制,让这些问题迎刃而解。

读完本文,你将能够:

  • 深入理解Psycopg类型适配的核心原理
  • 掌握内置类型的适配规则与使用技巧
  • 自定义类型适配器以满足特定需求
  • 优化类型转换性能,避免常见陷阱
  • 解决复杂数据类型(如数组、JSON、枚举)的适配问题

一、Psycopg类型适配机制总览

1.1 核心概念与架构

Psycopg的类型适配系统基于两大核心组件:DumperLoader。Dumper负责将Python对象转换为PostgreSQL可理解的格式,而Loader则将PostgreSQL返回的数据转换回Python对象。这一过程由Transformer协调,根据上下文选择合适的适配器。

mermaid

1.2 适配流程详解

  1. 适配上下文管理:每个连接或游标都有一个AdaptersMap对象,管理类型与适配器的映射关系。
  2. 类型解析:根据Python类型或PostgreSQL OID查找合适的适配器。
  3. 格式选择:支持文本和二进制两种格式,二进制格式通常更高效。
  4. 递归适配:复杂类型(如数组、复合类型)会递归应用适配规则。
# 适配上下文示例
from psycopg import connect

conn = connect("dbname=test user=postgres")
adapters = conn.adapters  # 获取当前连接的AdaptersMap

# 查看已注册的适配器
print(adapters._dumpers)  # Python类型到Dumper的映射
print(adapters._loaders)  # PostgreSQL OID到Loader的映射

二、核心组件深度解析

2.1 Dumper:Python到PostgreSQL的转换

Dumper是实现类型转换的核心接口,主要方法包括:

  • dump(obj): 将Python对象转换为字节流
  • quote(obj): 生成适合SQL语句的带引号字符串
  • get_key(obj, format): 为特定对象和格式生成缓存键
  • upgrade(obj, format): 根据对象内容升级为更具体的Dumper
# Dumper基类简化实现
class Dumper:
    oid: int = 0  # PostgreSQL类型OID
    format: pq.Format = pq.Format.TEXT  # 适配格式
    
    def __init__(self, cls, context):
        self.cls = cls
        self.connection = context.connection if context else None
    
    @abstractmethod
    def dump(self, obj) -> Buffer | None:
        ...
    
    def quote(self, obj) -> Buffer:
        # 默认实现,处理NULL和字符串转义
        ...

2.2 Loader:PostgreSQL到Python的转换

Loader负责将数据库返回的字节流转换为Python对象,核心方法:

  • load(data): 将字节流转换为Python对象
# Loader基类简化实现
class Loader:
    format: pq.Format = pq.Format.TEXT  # 适配格式
    
    def __init__(self, oid, context):
        self.oid = oid
        self.connection = context.connection if context else None
    
    @abstractmethod
    def load(self, data) -> Any:
        ...

2.3 AdaptersMap:适配器管理中心

AdaptersMap负责管理类型与适配器的映射关系,提供注册和查找适配器的方法:

# AdaptersMap核心方法
class AdaptersMap:
    def register_dumper(self, cls, dumper):
        # 注册Python类型到Dumper的映射
        ...
    
    def register_loader(self, oid, loader):
        # 注册PostgreSQL OID到Loader的映射
        ...
    
    def get_dumper(self, cls, format):
        # 获取指定类型和格式的Dumper
        ...
    
    def get_loader(self, oid, format):
        # 获取指定OID和格式的Loader
        ...

三、内置类型适配详解

3.1 数值类型适配

Psycopg为Python数值类型提供了灵活的适配策略:

Python类型默认PostgreSQL类型适配规则
int根据值大小自动选择smallint/integer/bigint选择最小可用类型
floatfloat8直接映射
Decimalnumeric高精度映射
numpy.int64bigint二进制高效转换
# 数值类型适配示例
import psycopg
from decimal import Decimal

with psycopg.connect("dbname=test") as conn:
    with conn.cursor() as cur:
        # int适配:自动选择合适的PostgreSQL类型
        cur.execute("SELECT %s, %s, %s", (1, 1000000, 1000000000000))
        print(cur.description[0].type_name)  # smallint
        print(cur.description[1].type_name)  # integer
        print(cur.description[2].type_name)  # bigint
        
        # Decimal适配
        cur.execute("SELECT %s", (Decimal("3.1415926535"),))
        print(cur.description[0].type_name)  # numeric

3.2 字符串与二进制类型

字符串和二进制数据的适配需要特别注意编码和性能问题:

# 字符串适配示例
with psycopg.connect("dbname=test") as conn:
    with conn.cursor() as cur:
        # 普通字符串
        cur.execute("SELECT %s", ("Hello, 世界",))
        print(cur.fetchone()[0])  # Hello, 世界
        
        # 二进制数据
        data = b"\x00\x01\x02\x03"
        cur.execute("SELECT %b", (data,))  # 使用%b指定二进制格式
        print(cur.fetchone()[0])  # b'\x00\x01\x02\x03'

3.3 日期时间类型

日期时间类型适配需要注意时区和特殊值处理:

Python类型PostgreSQL类型注意事项
datedate直接映射
datetimetimestamp/timestamptz根据是否带时区自动选择
timetime/timetz仅支持固定偏移时区
timedeltainterval完整支持
# 日期时间类型适配示例
from datetime import date, datetime, timezone, timedelta

with psycopg.connect("dbname=test") as conn:
    with conn.cursor() as cur:
        # 带时区的datetime
        dt = datetime(2023, 1, 1, 12, 0, tzinfo=timezone.utc)
        cur.execute("SELECT %s", (dt,))
        print(cur.description[0].type_name)  # timestamptz
        
        # 时间间隔
        delta = timedelta(days=1, hours=2)
        cur.execute("SELECT %s", (delta,))
        print(cur.fetchone()[0])  # 1 day 02:00:00

3.4 复杂类型适配

3.4.1 数组类型

Psycopg支持Python列表与PostgreSQL数组的双向映射:

# 数组类型适配示例
with psycopg.connect("dbname=test") as conn:
    with conn.cursor() as cur:
        # 一维数组
        cur.execute("SELECT %s", ([1, 2, 3, 4],))
        print(cur.fetchone()[0])  # [1, 2, 3, 4]
        
        # 多维数组
        arr = [[1, 2], [3, 4]]
        cur.execute("SELECT %s", (arr,))
        print(cur.fetchone()[0])  # [[1, 2], [3, 4]]
        
        # 混合类型数组(不推荐)
        mixed = [1, "two", 3.0]
        with pytest.raises(psycopg.DataError):
            cur.execute("SELECT %s", (mixed,))

数组适配的实现细节:

# 数组Dumper核心逻辑(简化版)
class ListDumper(RecursiveDumper):
    def upgrade(self, obj, format):
        # 根据数组元素类型选择合适的数组OID
        if item := self._find_list_element(obj, format):
            sd = self._tx.get_dumper(item, format.from_pq(self.format))
            info = self._get_base_type_info(sd.oid)
            self.oid = info.array_oid or TEXT_ARRAY_OID
        return self
    
    def dump(self, obj):
        # 递归序列化数组元素
        tokens = []
        for item in obj:
            if isinstance(item, list):
                tokens.append(self.dump(item))
            else:
                tokens.append(self._dump_item(item))
        return b"{" + b",".join(tokens) + b"}"
3.4.2 JSON类型

JSON类型适配支持灵活的序列化选项:

# JSON类型适配示例
from psycopg.types.json import Jsonb

data = {
    "name": "Psycopg",
    "features": ["adapter", "async", "json"],
    "version": 2.9
}

with psycopg.connect("dbname=test") as conn:
    with conn.cursor() as cur:
        cur.execute("SELECT %s", (Jsonb(data),))
        result = cur.fetchone()[0]
        print(result["features"][0])  # adapter
        
        # 自定义JSON序列化
        import ujson
        from psycopg.types.json import set_json_dumps
        
        set_json_dumps(ujson.dumps)  # 使用ujson提高性能
        cur.execute("SELECT %s", (Jsonb(data),))
3.4.3 复合类型与枚举

复合类型和枚举类型需要先注册类型信息:

# 复合类型适配示例
from psycopg.types.composite import CompositeInfo, register_composite

with psycopg.connect("dbname=test") as conn:
    # 创建复合类型
    conn.execute("""
        CREATE TYPE person AS (
            name text,
            age integer,
            hobbies text[]
        )
    """)
    
    # 获取类型信息并注册
    info = CompositeInfo.fetch(conn, "person")
    register_composite(info, conn)
    
    # 使用复合类型
    with conn.cursor() as cur:
        person = ("Alice", 30, ["reading", "hiking"])
        cur.execute("SELECT %s", (person,))
        result = cur.fetchone()[0]
        print(result.name)  # Alice
        print(result.age)   # 30
# 枚举类型适配示例
from psycopg.types.enum import EnumInfo, register_enum
from enum import Enum

with psycopg.connect("dbname=test") as conn:
    # 创建枚举类型
    conn.execute("""
        CREATE TYPE mood AS ENUM ('happy', 'sad', 'neutral')
    """)
    
    # 定义Python枚举
    class Mood(Enum):
        HAPPY = 'happy'
        SAD = 'sad'
        NEUTRAL = 'neutral'
    
    # 获取类型信息并注册
    info = EnumInfo.fetch(conn, "mood")
    register_enum(info, conn, Mood)
    
    # 使用枚举类型
    with conn.cursor() as cur:
        cur.execute("SELECT %s", (Mood.HAPPY,))
        result = cur.fetchone()[0]
        print(result)  # Mood.HAPPY
        print(result.value)  # happy

四、自定义类型适配器开发

4.1 自定义Dumper开发步骤

  1. 继承Dumper基类
  2. 实现dump()方法
  3. 注册适配器
# 自定义Dumper示例:将Python集合转换为PostgreSQL数组
from psycopg.adapt import Dumper, PyFormat
from psycopg.types.array import ListDumper

class SetDumper(ListDumper):
    def __init__(self, cls, context):
        super().__init__(list, context)  # 复用ListDumper的逻辑
    
    def dump(self, obj):
        return super().dump(list(obj))  # 将集合转换为列表后序列化

# 注册适配器
with psycopg.connect("dbname=test") as conn:
    conn.adapters.register_dumper(set, SetDumper)
    
    with conn.cursor() as cur:
        cur.execute("SELECT %s", ({1, 2, 3},))
        print(cur.fetchone()[0])  # {1, 2, 3}

4.2 自定义Loader开发

类似地,可以开发自定义Loader处理特定PostgreSQL类型:

# 自定义Loader示例:将PostgreSQL hstore转换为Python字典
from psycopg.adapt import Loader
import re

class HStoreLoader(Loader):
    format = pq.Format.TEXT
    
    def load(self, data):
        if not data:
            return {}
            
        data = bytes(data)
        pairs = re.findall(rb'(".*?"|.*?)\s*=>\s*(".*?"|.*?)(?:,|$)', data)
        result = {}
        
        for key, value in pairs:
            # 去除引号
            if key.startswith(b'"') and key.endswith(b'"'):
                key = key[1:-1].replace(b'\\"', b'"')
            if value.startswith(b'"') and value.endswith(b'"'):
                value = value[1:-1].replace(b'\\"', b'"')
                
            result[key.decode()] = value.decode() if value != b'NULL' else None
            
        return result

# 注册Loader
with psycopg.connect("dbname=test") as conn:
    conn.adapters.register_loader("hstore", HStoreLoader)
    with conn.cursor() as cur:
        cur.execute("SELECT 'a=>1, b=>2'::hstore")
        print(cur.fetchone()[0])  # {'a': '1', 'b': '2'}

4.3 高级适配场景

4.3.1 空字符串转NULL

将空字符串自动转换为NULL:

# 空字符串转NULL示例
from psycopg.types.string import StrDumper

class NullStrDumper(StrDumper):
    def dump(self, obj):
        return super().dump(obj) if obj.strip() else None

with psycopg.connect("dbname=test") as conn:
    conn.adapters.register_dumper(str, NullStrDumper)
    with conn.cursor() as cur:
        cur.execute("SELECT %s, %s", ("", "  "))
        print(cur.fetchone())  # (None, None)
4.3.2 处理PostgreSQL特殊值

处理PostgreSQL的infinity等特殊值:

# 处理无限日期示例
from datetime import date
from psycopg.types.datetime import DateDumper, DateLoader

class InfDateDumper(DateDumper):
    def dump(self, obj):
        if obj == date.max:
            return b"infinity"
        elif obj == date.min:
            return b"-infinity"
        return super().dump(obj)

class InfDateLoader(DateLoader):
    def load(self, data):
        if data == b"infinity":
            return date.max
        elif data == b"-infinity":
            return date.min
        return super().load(data)

with psycopg.connect("dbname=test") as conn:
    conn.adapters.register_dumper(date, InfDateDumper)
    conn.adapters.register_loader("date", InfDateLoader)
    
    with conn.cursor() as cur:
        cur.execute("SELECT %s, %s", (date.max, date.min))
        print(cur.fetchone())  # (datetime.date(9999, 12, 31), datetime.date(1, 1, 1))

五、性能优化与最佳实践

5.1 选择合适的适配格式

二进制格式通常比文本格式更高效:

# 格式选择性能对比
import timeit

setup = """
import psycopg
conn = psycopg.connect("dbname=test")
cur = conn.cursor()
data = "x" * 1024 * 1024  # 1MB数据
"""

text_time = timeit.timeit(
    "cur.execute('SELECT %s', (data,))", setup, number=100)
binary_time = timeit.timeit(
    "cur.execute('SELECT %b', (data,))", setup, number=100)

print(f"Text format: {text_time:.2f}s")
print(f"Binary format: {binary_time:.2f}s")
# 通常二进制格式快2-5倍

5.2 批量操作优化

使用execute_batch和execute_values提高批量操作性能:

# 批量操作优化示例
from psycopg import sql
from psycopg.extras import execute_batch, execute_values

with psycopg.connect("dbname=test") as conn:
    with conn.cursor() as cur:
        cur.execute("CREATE TABLE test_batch (id int, data text)")
        
        # 准备数据
        data = [(i, f"record {i}") for i in range(1000)]
        
        # 普通executemany
        start = time.time()
        cur.executemany("INSERT INTO test_batch VALUES (%s, %

【免费下载链接】psycopg New generation PostgreSQL database adapter for the Python programming language 【免费下载链接】psycopg 项目地址: https://gitcode.com/gh_mirrors/ps/psycopg

创作声明:本文部分内容由AI辅助生成(AIGC),仅供参考

实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值