Batch File Processing

This post shows how to batch-read multiple CSV/TSV files with the Python pandas library and walks through the basic preprocessing steps: concatenating the individual frames, resetting the index, and writing the combined data back out as text. The example loads data from date-named files (one file per day, split across the two class directories 0/ and 1/); a more compact, loop-based way to do the same loading is sketched after the first script.


# Script 1: batch-read the per-day, tab-separated data files and export test_1 as plain text.
import pandas as pd

train1 = pd.read_csv('0/2018-10-18.txt',sep='\t',header=None)
train2 = pd.read_csv('0/2018-10-19.txt',sep='\t',header=None)
train3 = pd.read_csv('0/2018-10-20.txt',sep='\t',header=None)
train4 = pd.read_csv('0/2018-10-21.txt',sep='\t',header=None)
train5 = pd.read_csv('0/2018-10-22.txt',sep='\t',header=None)
train6 = pd.read_csv('0/2018-10-23.txt',sep='\t',header=None)
train7 = pd.read_csv('0/2018-10-24.txt',sep='\t',header=None)
train8 = pd.read_csv('1/2018-10-18.txt',sep='\t',header=None)
train9 = pd.read_csv('1/2018-10-19.txt',sep='\t',header=None)
train10 = pd.read_csv('1/2018-10-20.txt',sep='\t',header=None)
train11 = pd.read_csv('1/2018-10-21.txt',sep='\t',header=None)
train12 = pd.read_csv('1/2018-10-22.txt',sep='\t',header=None)
train13 = pd.read_csv('1/2018-10-23.txt',sep='\t',header=None)
train14 = pd.read_csv('1/2018-10-24.txt',sep='\t',header=None)

test1 = pd.read_csv('0/2018-11-06.txt',sep='\t',header=None)
test2 = pd.read_csv('0/2018-11-07.txt',sep='\t',header=None)
test3 = pd.read_csv('1/2018-11-06.txt',sep='\t',header=None)
test4 = pd.read_csv('1/2018-11-07.txt',sep='\t',header=None)

train_0 = pd.concat([train1,train2,train3,train4,train5,train6,train7],axis=0)
train_1 = pd.concat([train8,train9,train10,train11,train12,train13,train14],axis=0)

train_0 = train_0.reset_index(drop=True)
train_1 = train_1.reset_index(drop=True)
#train_x, train_y = train.iloc[:, 0:10], train.iloc[:, 11]   # .ix was removed from pandas; use .iloc

test_0 = pd.concat([test1,test2],axis=0)
test_1 = pd.concat([test3,test4],axis=0)

test_0 = test_0.reset_index(drop=True)
test_1 = test_1.reset_index(drop=True)
#test_x, test_y = test.iloc[:, 0:10], test.iloc[:, 11]   # .ix was removed from pandas; use .iloc

# Write each row of test_1 to a text file as one comma-separated line.
f = open("test_1.txt", 'a')
def func(_x):
    for i in range(_x.shape[0]):
        row = _x.iloc[i]                      # .ix was removed from pandas; use .iloc
        line = ','.join(str(v) for v in row)  # join the row's values with commas
        f.write(line)
        f.write('\n')
func(test_1)
f.close()
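Reading each day's file into its own variable quickly gets unwieldy. Because the filenames differ only by date and class directory, the same loading can be expressed as a loop over a glob pattern. A minimal sketch, assuming the same 0/ and 1/ directories and tab-separated, header-less files as above:

import glob
import pandas as pd

def load_files(pattern):
    # Read every file matching the pattern and stack them into one frame.
    frames = [pd.read_csv(path, sep='\t', header=None) for path in sorted(glob.glob(pattern))]
    return pd.concat(frames, axis=0).reset_index(drop=True)

# Assumes the October/November files listed above are the only ones matching these patterns.
train_0 = load_files('0/2018-10-*.txt')
train_1 = load_files('1/2018-10-*.txt')
test_0  = load_files('0/2018-11-*.txt')
test_1  = load_files('1/2018-11-*.txt')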
The second script below splits each line of the exported text file into a feature part and a label part, writing them to two separate files.

test = open('a.txt', 'r')

f1 = open("b.txt", 'a')   # feature columns
f2 = open("c.txt", 'a')   # label column

def func(_x, number):
    for i in range(number):
        line = _x.readline()
        # Strip the trailing newline, plus a leading comma if the first field is empty.
        if line[0] == ',':
            line = line[1:-1]
        else:
            line = line[0:-1]
        line1 = line[0:-2]   # everything before the final ",<label>"
        line2 = line[-1]     # the single-character label
        f1.write(line1)
        f1.write('\n')
        f2.write(line2)
        f2.write('\n')

#func(train, 1106039)
func(test, 111971)

test.close()
f1.close()
f2.close()
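If the intermediate file really is a clean comma-separated dump with the label as the last field, which is what the first script writes, the same feature/label split can also be done with pandas instead of slicing strings by hand. A minimal sketch under that assumption, reusing the a.txt, b.txt, and c.txt filenames from above:

import pandas as pd

# Load the comma-separated dump; every line is the features followed by the label.
df = pd.read_csv('a.txt', header=None)

features = df.iloc[:, :-1]   # all columns except the last
labels = df.iloc[:, -1]      # last column is the label

features.to_csv('b.txt', header=False, index=False)
labels.to_csv('c.txt', header=False, index=False)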