Importing Large CSV/Excel Datasets into Oracle and MySQL with Pandas: A Detailed Walkthrough

WindOfMayGIS

Original link: https://blog.youkuaiyun.com/zhouxinxin111/article/details/107080784


Overview

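Importing large CSV files into Oracle (or into a particular tablespace) is a routine requirement in all kinds of applications. Common problems include columns that fail to map onto the table's fields and data that is only partially read in, so this post summarizes the issues I ran into along the way.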
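The first problem listed above, fields that do not line up, can be caught before any insert by comparing the CSV header with the target table's columns. The sketch below uses a hypothetical helper, `compare_columns` (not part of pandas or SQLAlchemy). It upper-cases every name because Oracle stores unquoted identifiers in upper case, while SQLAlchemy reports them in lower case, and it strips stray spaces that often sneak into CSV headers.

```python
def compare_columns(csv_columns, table_columns):
    """Hypothetical helper: return (missing_in_table, missing_in_csv), both sorted.

    Names are stripped and upper-cased so that 'lat ', 'LAT' and 'lat' compare equal.
    """
    csv_set = {str(c).strip().upper() for c in csv_columns}
    table_set = {str(c).strip().upper() for c in table_columns}
    missing_in_table = sorted(csv_set - table_set)  # CSV fields with no matching table column
    missing_in_csv = sorted(table_set - csv_set)    # table columns the CSV does not supply
    return missing_in_table, missing_in_csv


# Made-up column lists for illustration.
extra, absent = compare_columns(["name", "lon", "lat ", "type"], ["NAME", "LON", "LAT", "ADDRESS"])
print(extra)   # ['TYPE']
print(absent)  # ['ADDRESS']
```

In a real run, `table_columns` could be built as `[c["name"] for c in sqlalchemy.inspect(engine).get_columns(table_name)]`, and `csv_columns` is simply `df1.columns`.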

Code

Step 1: Open the CSV file and build a DataFrame

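import pandas as pd

# Read the CSV into a DataFrame; the encoding must match the file's actual encoding
df1 = pd.read_csv('G:\\js_2018_poi.csv', encoding='utf8')
# Preview the first 10 rows to check that the columns were parsed correctly
# (displays in a notebook; wrap it in print() when running as a script)
df1.head(10)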
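For files too large to load comfortably in one go, `read_csv` can return an iterator of DataFrames through its `chunksize` parameter, and `dtype=str` stops pandas from guessing types, so codes such as `00123` keep their leading zeros. This is a sketch on an in-memory sample (`io.StringIO` stands in for the real file); with a real dataset each chunk could be written to the database as it arrives instead of being concatenated. `pd.read_excel` also accepts `dtype`, but it has no `chunksize` option.

```python
import io

import pandas as pd

# In-memory sample standing in for the real file (e.g. 'G:\\js_2018_poi.csv').
csv_text = "name,code,lon\nA,00123,118.1\nB,00456,118.2\nC,,118.3\n"

parts = []
reader = pd.read_csv(
    io.StringIO(csv_text),
    dtype=str,     # read every field as text, so '00123' keeps its leading zeros
    chunksize=2,   # rows per chunk; something like 100_000 is more typical for real files
)
for chunk in reader:
    # Each chunk is an ordinary DataFrame; it could be cleaned or written out here.
    parts.append(chunk)

df = pd.concat(parts, ignore_index=True)
print(len(parts), len(df))      # 2 3
print(df["code"].tolist()[:2])  # ['00123', '00456']
```

The empty `code` field in the last row still comes through as a missing value (NaN), not as an empty string.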

Step 2: Connect to the database

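from sqlalchemy import create_engine, types

# Format: dialect+driver://username:password@host:port/database
conn_string = 'oracle+cx_oracle://username:password@localhost:1521/orcl'
# Create the engine used by to_sql in Step 4
engine = create_engine(conn_string)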
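A connection string like the one in Step 2 breaks as soon as the password contains a URL-reserved character such as `@`, `:` or `/`. The sketch below escapes the credentials with `urllib.parse.quote_plus`, the approach SQLAlchemy's documentation suggests, and also shows a MySQL URL using the `mysql+pymysql` dialect, since the title covers MySQL as well. `build_url` is a hypothetical helper and the credentials are placeholders; newer SQLAlchemy releases (2.0+) also accept `oracle+oracledb` for the python-oracledb driver.

```python
from urllib.parse import quote_plus


def build_url(dialect, user, password, host, port, database):
    """Hypothetical helper: SQLAlchemy-style URL with the user name and password URL-escaped."""
    return f"{dialect}://{quote_plus(user)}:{quote_plus(password)}@{host}:{port}/{database}"


# Placeholder credentials; 'oracle+cx_oracle' and 'mysql+pymysql' are SQLAlchemy dialect+driver names.
oracle_url = build_url("oracle+cx_oracle", "scott", "p@ss:word", "localhost", 1521, "orcl")
# Ask PyMySQL for a utf8mb4 connection so full Unicode text (including Chinese) round-trips.
mysql_url = build_url("mysql+pymysql", "root", "secret", "localhost", 3306, "poi") + "?charset=utf8mb4"

print(oracle_url)  # oracle+cx_oracle://scott:p%40ss%3Aword@localhost:1521/orcl
print(mysql_url)   # mysql+pymysql://root:secret@localhost:3306/poi?charset=utf8mb4

# With SQLAlchemy and the driver installed, the engine is then created as usual:
# from sqlalchemy import create_engine
# engine = create_engine(oracle_url)
```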

Step 3: Set the table's column types

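# Reference: https://blog.csdn.net/baidu_39148260/article/details/103341108
# Set the column types explicitly; otherwise to_sql writes string (object) columns as CLOB,
# and the built-in type conversion is very slow (this does not matter for small datasets).
# Each object column becomes VARCHAR(n), where n is the length of its longest value;
# int() is needed because str.len() returns floats when a column contains NaN.
dtyp = {c: types.VARCHAR(int(df1[c].str.len().max()))
        for c in df1.columns[df1.dtypes == 'object']}
print(dtyp)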
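The comprehension in Step 3 does not handle a column that is entirely empty (its maximum length is NaN), and it measures characters, whereas Oracle's VARCHAR2 ceiling (4000 bytes unless `MAX_STRING_SIZE=EXTENDED`) is counted in bytes even when a column is declared with character length semantics. A hedged sketch of a sturdier version, assuming an AL32UTF8 (UTF-8) database character set: the hypothetical helper `varchar_lengths` measures UTF-8 byte lengths, gives empty columns a length of 1, and sets aside columns whose values exceed the limit so they can be mapped to CLOB instead.

```python
import pandas as pd
from pandas.api.types import is_object_dtype, is_string_dtype

ORACLE_VARCHAR2_MAX_BYTES = 4000  # ceiling when MAX_STRING_SIZE is not EXTENDED


def varchar_lengths(df, limit=ORACLE_VARCHAR2_MAX_BYTES):
    """Hypothetical helper: longest UTF-8 byte length per text column.

    Returns (lengths, too_long): lengths maps column -> int for columns that fit
    under `limit`; too_long lists the columns that would need CLOB instead.
    """
    lengths, too_long = {}, []
    for col in df.columns:
        if not (is_object_dtype(df[col]) or is_string_dtype(df[col])):
            continue  # numeric/datetime columns keep pandas' default type mapping
        values = df[col].dropna().astype(str)
        # Count bytes, not characters: a Chinese character is 3 bytes in UTF-8.
        n = int(values.map(lambda s: len(s.encode("utf-8"))).max()) if len(values) else 1
        if n > limit:
            too_long.append(col)
        else:
            lengths[col] = n  # an all-empty column gets length 1 instead of NaN
    return lengths, too_long


df = pd.DataFrame({"name": ["南京", "Suzhou", None], "note": [None, None, None], "n": [1, 2, 3]})
lengths, too_long = varchar_lengths(df)
print(lengths)   # {'name': 6, 'note': 1}
print(too_long)  # []
```

The results could then be turned into the `dtype` mapping with `{c: types.VARCHAR(n) for c, n in lengths.items()}`, adding `types.CLOB()` for every column in `too_long`.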

Step 4: Write to the database

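# Leave to_sql's method parameter at its default and pass the column types through dtype
df1.to_sql('JIANGSU_POI_2018', con=engine, if_exists='append', index=False, index_label=None, dtype=dtyp)
# Dispose of the engine to release its pooled connections
engine.dispose()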
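Step 4 sends the whole DataFrame in a single `to_sql` call. For large tables, the `chunksize` parameter makes pandas insert in batches, and comparing the table's row count with `len(df)` afterwards catches the "data only partially read in" problem mentioned in the overview. The sketch uses an in-memory SQLite database as a stand-in so it runs without Oracle or MySQL; with a SQLAlchemy engine only `con` changes, and the `dtype` values become SQLAlchemy types instead of strings.

```python
import sqlite3

import pandas as pd

# In-memory SQLite stands in for Oracle/MySQL so the sketch runs anywhere.
conn = sqlite3.connect(":memory:")
df = pd.DataFrame({"NAME": [f"poi_{i}" for i in range(2500)], "LON": [118.0] * 2500})

df.to_sql(
    "JIANGSU_POI_2018",
    con=conn,
    if_exists="append",
    index=False,
    dtype={"NAME": "TEXT"},  # plain sqlite3 connections take SQL type names as strings
    chunksize=1000,          # insert 1000 rows per batch instead of all at once
)

# Guard against partial loads: compare the table's row count with the DataFrame.
written = conn.execute('SELECT COUNT(*) FROM "JIANGSU_POI_2018"').fetchone()[0]
print(written == len(df))  # True
conn.close()
```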

Other: locating overly long values

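# Print the index and length of every value in the first column longer than 1800 characters
# print(df1.iloc[:, 0].str.len().max())   # maximum length in the first column
series_row = df1.iloc[:, 0].str.len()
count = 0
for index, length in series_row.items():
    if length > 1800:
        print(index)
        print(length)
        count = count + 1
print(count)  # number of over-long values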
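The loop above counts characters. If the 1800 threshold is meant to keep values inside a byte-limited VARCHAR2 column, byte length is what matters. A vectorized sketch with a hypothetical helper, `long_values`, that measures UTF-8 bytes: in the sample, 700 Chinese characters pass a character-count check yet occupy 2100 bytes.

```python
import pandas as pd


def long_values(series, threshold):
    """Hypothetical helper: UTF-8 byte lengths of the values in `series` longer than `threshold`."""
    byte_len = series.dropna().astype(str).map(lambda s: len(s.encode("utf-8")))
    return byte_len[byte_len > threshold]  # index is kept, so rows can be located in the DataFrame


s = pd.Series(["a" * 10, "南" * 700, None, "b" * 1900])
hits = long_values(s, 1800)
print(hits.to_dict())  # {1: 2100, 3: 1900}
print(len(hits))       # 2
```

Applied to the article's data, the call would be `long_values(df1.iloc[:, 0], 1800)`.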