Writing data to a PostgreSQL database at high speed

License: CC 4.0 BY-SA

Original links: https://www.cnblogs.com/xo1990/p/15544826.html, https://blog.youkuaiyun.com/weixin_43235307/article/details/122165682, https://blog.youkuaiyun.com/weixin_44731100/article/details/102677927, https://blog.youkuaiyun.com/qq_40659982/article/details/108826788

There are several ways to write data into a PostgreSQL table. This post records two that are relatively efficient.

1. insert into

Organize the data to be written as a List of dicts, then write it all in one go.

import psycopg2 as pg
import psycopg2.extras as extras
import pandas as pd

from typing import List
from datetime import datetime
from sklearn.datasets import load_iris


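def df_to_dict_lst(res_df: pd.DataFrame) -> List:
    """Convert a DataFrame into a list of {column: str(value)} dicts, one per row."""
    # Missing values become a single space so that every field is a string
    res_df = res_df.fillna(' ')
    cols_lst = res_df.columns.tolist()
    dic_lst = []
    for row_ in res_df.values:
        dic = {k: str(v) for k, v in zip(cols_lst, row_)}
        dic_lst.append(dic)
    return dic_lst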


def create_table() -> tuple:
    # Replace the "xxxx" placeholders with real connection settings
    conn = pg.connect(database="xxxx",
                      user="xxxx",
                      password="xxxx",
                      host="xxxx",
                      port="xxxx")

    cur = conn.cursor(cursor_factory=extras.DictCursor)
    # IF NOT EXISTS keeps the script rerunnable when the table is already there
    sql = """
        CREATE TABLE IF NOT EXISTS trial(
            col1 VARCHAR(50),
            col2 VARCHAR(50),
            col3 VARCHAR(50),
            col4 VARCHAR(50)
        );
        """
    cur.execute(sql)
    conn.commit()
    return conn, cur


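def data_to_pg(db_cx: pg.extensions.connection, cursor: pg.extensions.cursor, data_lst: List) -> None:
    # ON CONFLICT without a target skips rows that violate any unique constraint;
    # naming a column such as (col1) would require a unique index on it, which trial lacks
    sql = 'INSERT INTO trial(col1,col2,col3,col4) VALUES %s ON CONFLICT DO NOTHING'
    try:
        # Each dict is rendered through the template; page_size=len(data_lst)
        # sends all rows in a single statement
        extras.execute_values(cursor,
                              sql,
                              data_lst,
                              template='(%(col1)s,%(col2)s,%(col3)s,%(col4)s)',
                              page_size=len(data_lst))
        db_cx.commit()

        print('Data inserted successfully')
    except Exception as e:
        db_cx.rollback()
        print('Data insertion failed')
        print(f"{type(e)}: {e}")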


if __name__ == "__main__":
    data = load_iris()
    data_df = pd.DataFrame(data.data, columns=['col1', 'col2', 'col3', 'col4']).reset_index()
    data_list = df_to_dict_lst(data_df)
    conn, cur = create_table()
    t1 = datetime.now()
    data_to_pg(conn, cur, data_list)
    t2 = datetime.now()
    print(f"Elapsed: {t2 - t1}")
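
The `sql` and `template` strings passed to `execute_values` have to agree with the column list, and writing them by hand invites typos. As a minimal sketch, they can be derived from a list of column names. The helper names `rows_to_dicts` and `build_insert` are hypothetical (not from the original post), and the sketch assumes the table and column names are trusted identifiers, because they are interpolated directly into the SQL text.

```python
from typing import Dict, List, Sequence, Tuple


def rows_to_dicts(columns: Sequence[str],
                  rows: Sequence[Sequence[object]]) -> List[Dict[str, str]]:
    """Pair each row with the column names, converting every value to str."""
    return [{col: str(val) for col, val in zip(columns, row)} for row in rows]


def build_insert(table: str, columns: Sequence[str]) -> Tuple[str, str]:
    """Return (sql, template) in the form expected by execute_values.

    Assumption: table and column names are trusted, since they are
    placed into the SQL text without quoting.
    """
    col_list = ",".join(columns)
    sql = f"INSERT INTO {table}({col_list}) VALUES %s"
    # One named placeholder per column, wrapped in parentheses
    template = "(" + ",".join(f"%({col})s" for col in columns) + ")"
    return sql, template


columns = ["col1", "col2", "col3", "col4"]
data_lst = rows_to_dicts(columns, [(5.1, 3.5, 1.4, 0.2), (4.9, 3.0, 1.4, 0.2)])
sql, template = build_insert("trial", columns)
print(sql)
print(template)
```

The resulting `sql`, `data_lst`, and `template` could then be handed to `extras.execute_values(cursor, sql, data_lst, template=template, page_size=len(data_lst))` in the same way as in the listing above.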

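2. copy_from

Function description:

copy_from(file, table, sep='\t', null='\\N', size=8192, columns=None) reads data from the file-like object file and appends it to the table named table.
- file: the file-like object to read data from. It must have both read() and readline() methods.
- table: the name of the table to copy data into.
- sep: the column separator expected in the file. Defaults to a tab.
- null: the textual representation of NULL in the file. Defaults to the two-character string \N.
- size: the size of the buffer used to read from the file.
- columns: the names of the columns to import. Their number and types should match the content of the file being read. If not specified, the whole table is assumed to match the file structure. It is best to list them explicitly, otherwise the call may appear to succeed without any data being written.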
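
To make the `sep` and `null` parameters concrete, here is a minimal sketch that builds a buffer in the shape `copy_from` reads with its defaults: one line per row, tab-separated fields, and `\N` for NULL. The helper names `escape_copy_value` and `rows_to_copy_buffer` are hypothetical, and the escaping assumes the default tab separator. Backslash, tab, newline, and carriage return are escaped because they are special in PostgreSQL's COPY text format.

```python
from io import StringIO
from typing import Iterable, Sequence


def escape_copy_value(value: object) -> str:
    """Escape characters that are special in the COPY text format."""
    text = str(value)
    return (text.replace("\\", "\\\\")   # backslash first, so later escapes survive
                .replace("\t", "\\t")
                .replace("\n", "\\n")
                .replace("\r", "\\r"))


def rows_to_copy_buffer(rows: Iterable[Sequence[object]],
                        sep: str = "\t",
                        null: str = "\\N") -> StringIO:
    """Serialize rows into an in-memory buffer; None becomes the null marker."""
    buf = StringIO()
    for row in rows:
        fields = [null if v is None else escape_copy_value(v) for v in row]
        buf.write(sep.join(fields) + "\n")
    buf.seek(0)  # rewind so the consumer reads from the beginning
    return buf


buffer = rows_to_copy_buffer([(1, None, "a\tb"), (2, "x", "y")])
print(repr(buffer.getvalue()))

# With an open cursor, the buffer could then be loaded along these lines:
# cur.copy_from(buffer, 'trial', columns=('col1', 'col2', 'col3'))
```

Passing `columns` explicitly, as in the commented call, follows the advice in the parameter list above.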

show me the code:

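from io import StringIO

import psycopg2 as pg

# df is assumed to be an existing pandas DataFrame whose columns match the target table
conn = pg.connect(database='xxx',
                  user='xxx',
                  password='xxx',
                  host='xxxx',
                  port='xxxx')

# Serialize the DataFrame to tab-separated text in an in-memory buffer
output = StringIO()
df.to_csv(output, sep='\t', index=False, header=False)
output.seek(0)  # rewind so copy_from reads from the start

# Write the buffer into the table
cur = conn.cursor()
cur.copy_from(output, 'tablename', null='')     # with null='', empty values are loaded as NULL instead of raising an error

conn.commit()
conn.close()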
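
The snippet above commits unconditionally. As a hedged sketch, the same call can be wrapped so that a failed load is rolled back, with the column names passed explicitly. The function name `copy_buffer_to_table` is hypothetical; `conn` is assumed to be an open psycopg2 connection and `buffer` a tab-separated file-like object.

```python
from typing import Sequence, TextIO


def copy_buffer_to_table(conn, buffer: TextIO, table: str,
                         columns: Sequence[str]) -> bool:
    """Load a tab-separated buffer with copy_from.

    Commits on success and rolls back on any error. Returns True if the
    data was committed, False otherwise.
    """
    try:
        # The cursor is closed automatically when the with-block exits
        with conn.cursor() as cur:
            cur.copy_from(buffer, table, sep="\t", null="",
                          columns=tuple(columns))
        conn.commit()
        return True
    except Exception as exc:
        conn.rollback()
        print(f"copy_from failed: {type(exc).__name__}: {exc}")
        return False


# Possible usage, assuming `conn` and `output` from the snippet above:
# copy_buffer_to_table(conn, output, 'tablename', ['col1', 'col2'])
```

Returning a boolean instead of raising keeps the calling script simple. Code that needs the original exception could re-raise after the rollback instead.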

References:

[1]: PostgreSQL bulk insertion of large data sets in practice
https://www.cnblogs.com/xo1990/p/15544826.html
[2]: Using copy_from() for bulk inserts in PostgreSQL
https://blog.youkuaiyun.com/qq_40659982/article/details/108826788
[3]: Efficiency comparison of three ways to write to a PG database: insert, to_sql, copy_from
https://blog.youkuaiyun.com/weixin_44731100/article/details/102677927
[4]: psycopg2 copy_from fails to write empty values into int columns
https://blog.youkuaiyun.com/weixin_43235307/article/details/122165682