dbf文件导入到oracle数据库中

最新推荐文章于 2021-04-14 22:29:41 发布

chuixu6719

最新推荐文章于 2021-04-14 22:29:41 发布

阅读量7.7k

点赞数

文章标签：数据库操作系统 python

原文链接：https://my.oschina.net/u/1403503/blog/2966818

版权

本文介绍如何将dbf文件导入Oracle数据库，通过修改的simpledbf.py读取dbf并生成pandas数据，然后将数据转换为csv，创建数据库表并导入数据。程序仅处理字符串和数字类型，其他如日期转为varchar2(200)。执行流程包括配置、生成csv、创建表、导入数据及行数校验。

摘要生成于 C知道，由 DeepSeek-R1 满血版支持，前往体验 >

在工作中需要将dbf文件导入到oracle数据库中，包含有两个文件，一个文件是用于读取dbf并生成pandas，文件名为simpledbf.py，该文件修改自simpledbf中的simpledbf文件，该文件中没有对number数据类型进行区分是int行还是float型，修改后的simpledbf进行了区分。另一个文件为dbf_import.py，通过调用该文件可以实现数据导入到数据库中。该文件基本思路如下：

配置必须的项目项；
读取dbf文件生成csv文件；
根据dbf文件的字段信息创建表；
导入csv文件到创建的表中；
校验dbf文件行数以及导入到数据库以后的行数

该程序的不足之处：

该文件只处理了字符串、数字类型；没有对日期等其他的数据类型进行进一步处理，而是统一转换成了varchar2(200)类型。

相关配置内容如下【位于dbf_import.py文件中】：

#########################################################################
#只需要修改如下的参数
#参数定义 需要柜台首先创建指定用户下的表信息
#output_path最后不需要加反斜杠
dict_value={'schema_name':'db_user','table_name':'table_name','dbf_name':r'D:\dbf_name.DBF','output_path':r'd:\path_of_dir'} 
#如果是全部的字段则可以写成import_cols='*'
#如果是选择部分字段类型，可以定义为list类型，大小写无关，会统一转换为小写
import_cols= ['ymth','khrq','zjlb','zjdm']
#import_cols='*'
#指定连接数据库的用户名密码以及数据库的编码,用户名默认为dict_value中的schema_name
oracle_source=(dict_value['schema_name'],'password','db_name','GB18030')
#如果dbf文件中的列含有数据库中的关键字，则需要增加前缀来避免问题的出现
column_prefix='st_'
#########################################################################

simpledbf.py文件内容如下：

import struct
import datetime
import os
import codecs

# Check for optional dependencies.
try:
    import pandas as pd
except:
    print("Pandas is not installed. No support for DataFrames, HDF, or SQL.")
else:
    try:
        import sqlalchemy as sql
    except:
        print("SQLalchemy is not installed. No support for SQL output.")

class DbfBase(object):
    '''
    Base class for DBF file processing objects.

    Do not instantiate this class. This provides some of the common functions
    for other subclasses.
    '''
    def _chunker(self, chunksize):
        '''Return a list of chunk ints from given chunksize.

        Parameters
        ----------
        chunksize : int
            The maximum chunk size 

        Returns
        -------
        list of ints
            A list of chunks necessary to break up a given file. These will
            all be equal to `chunksize`, except for the last value, which is
            the remainder (<= `chunksize).
        '''
        num = self.numrec//chunksize
        # Chunksize bigger than numrec
        if num == 0:
            return [self.numrec,]
        else:
            chunks = [chunksize,]*num
            remain = self.numrec%chunksize
            if remain != 0:
                chunks.append(remain) 
            return chunks

    def _na_set(self, na):
        '''Set the value used for missing/bad data.

        Parameters
        ----------
        na : various types accepted
            The value that will be used to replace missing or malformed
            entries. Right now this accepts pretty much anything, and that
            value will be used as a replacement. (May not do what you expect.)
            However, the strings 'na' or 'nan' (case insensitive) will insert
            float('nan'), the string 'none' (case insensitive) or will insert
            the Python object `None`.  Float/int columns are always
            float('nan') regardless of this setting.
        '''
        if na.lower() == 'none':
            self._na = None
        elif na.lower() in ('na', 'nan'):
            self._na = float('nan')
        else:
            self._na = na
          


    def to_dataframe(self, chunksize=None, na='nan')

最低0.47元/天解锁文章