在工作中需要将dbf文件导入到oracle数据库中,包含有两个文件,一个文件是用于读取dbf并生成pandas,文件名为simpledbf.py,该文件修改自simpledbf中的simpledbf文件,该文件中没有对number数据类型进行区分是int行还是float型,修改后的simpledbf进行了区分。另一个文件为dbf_import.py,通过调用该文件可以实现数据导入到数据库中。该文件基本思路如下:
- 配置必须的项目项;
- 读取dbf文件生成csv文件;
- 根据dbf文件的字段信息创建表;
- 导入csv文件到创建的表中;
- 校验dbf文件行数以及导入到数据库以后的行数
该程序的不足之处:
- 该文件只处理了字符串、数字类型;没有对日期等其他的数据类型进行进一步处理,而是统一转换成了varchar2(200)类型。
相关配置内容如下【位于dbf_import.py文件中】:
#########################################################################
#只需要修改如下的参数
#参数定义 需要柜台首先创建指定用户下的表信息
#output_path最后不需要加反斜杠
dict_value={'schema_name':'db_user','table_name':'table_name','dbf_name':r'D:\dbf_name.DBF','output_path':r'd:\path_of_dir'}
#如果是全部的字段则可以写成import_cols='*'
#如果是选择部分字段类型,可以定义为list类型,大小写无关,会统一转换为小写
import_cols= ['ymth','khrq','zjlb','zjdm']
#import_cols='*'
#指定连接数据库的用户名密码以及数据库的编码,用户名默认为dict_value中的schema_name
oracle_source=(dict_value['schema_name'],'password','db_name','GB18030')
#如果dbf文件中的列含有数据库中的关键字,则需要增加前缀来避免问题的出现
column_prefix='st_'
#########################################################################
simpledbf.py文件内容如下:
import struct
import datetime
import os
import codecs
# Check for optional dependencies.
try:
import pandas as pd
except:
print("Pandas is not installed. No support for DataFrames, HDF, or SQL.")
else:
try:
import sqlalchemy as sql
except:
print("SQLalchemy is not installed. No support for SQL output.")
class DbfBase(object):
'''
Base class for DBF file processing objects.
Do not instantiate this class. This provides some of the common functions
for other subclasses.
'''
def _chunker(self, chunksize):
'''Return a list of chunk ints from given chunksize.
Parameters
----------
chunksize : int
The maximum chunk size
Returns
-------
list of ints
A list of chunks necessary to break up a given file. These will
all be equal to `chunksize`, except for the last value, which is
the remainder (<= `chunksize).
'''
num = self.numrec//chunksize
# Chunksize bigger than numrec
if num == 0:
return [self.numrec,]
else:
chunks = [chunksize,]*num
remain = self.numrec%chunksize
if remain != 0:
chunks.append(remain)
return chunks
def _na_set(self, na):
'''Set the value used for missing/bad data.
Parameters
----------
na : various types accepted
The value that will be used to replace missing or malformed
entries. Right now this accepts pretty much anything, and that
value will be used as a replacement. (May not do what you expect.)
However, the strings 'na' or 'nan' (case insensitive) will insert
float('nan'), the string 'none' (case insensitive) or will insert
the Python object `None`. Float/int columns are always
float('nan') regardless of this setting.
'''
if na.lower() == 'none':
self._na = None
elif na.lower() in ('na', 'nan'):
self._na = float('nan')
else:
self._na = na
def to_dataframe(self, chunksize=None, na='nan')