Data Processing in Python

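Read the four data files, normalize their contents to a single format, remove duplicates, sort the values, and take the three shortest times from each file.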

Data files:

    james.txt:

    2-34,3:21,2.34,2.45,3.01,2:01,2:01,3:10,2-22

    julie.txt:


    mikey.txt:


    sarah.txt:

Define a conversion function that gives the data a uniform format

def sanitize(time_string):
    # Normalize a time string: replace "-" or ":" with "." so that values can be compared
    if "-" in time_string:
        splitter = "-"
    elif ":" in time_string:
        splitter = ":"
    else:
        return time_string
    (mins, second) = time_string.split(splitter)
    return (mins + "." + second)
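
As a quick check of the behavior described above, the sketch below applies sanitize to a few values taken from james.txt. The function is repeated here only so the block runs on its own; the names samples and cleaned are illustrative and not part of the original code.

```python
def sanitize(time_string):
    # Replace "-" or ":" with "." so every time uses the same "m.ss" form
    if "-" in time_string:
        splitter = "-"
    elif ":" in time_string:
        splitter = ":"
    else:
        # Already in "m.ss" form, return unchanged
        return time_string
    (mins, second) = time_string.split(splitter)
    return mins + "." + second


samples = ["2-34", "3:21", "2.34"]
cleaned = [sanitize(s) for s in samples]
print(cleaned)  # ['2.34', '3.21', '2.34']
```

Note that "2-34" and "2.34" become the same string, which is why normalization has to happen before duplicates are removed.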

Define a function that opens a file and reads its contents

def openFile(fn):
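    # Open the file and read its first line (each data file holds a single line)
    try:
        with open(fn, "r") as data_file:
            file_data = data_file.readline()
            return file_data
    # If the file cannot be opened or read, report the error and return None
    except IOError as error:
        print("File error: " + str(error))
        return None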
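
The behavior on success and on failure can be seen in the following minimal sketch. It is an illustration rather than the original module: read_first_line is a hypothetical stand-in for openFile, and a temporary file is used so the example does not depend on james.txt being present.

```python
import os
import tempfile


def read_first_line(path):
    """Return the first line of a text file, or None if it cannot be opened."""
    try:
        with open(path, "r") as handle:
            return handle.readline()
    except OSError as error:  # IOError is an alias of OSError in Python 3
        print("File error: " + str(error))
        return None


# Write a throwaway data file, then read it back and try a missing file too.
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "times.txt")
    with open(path, "w") as handle:
        handle.write("2-34,3:21,2.34\n")
    line = read_first_line(path)
    missing = read_first_line(os.path.join(tmp_dir, "no_such_file.txt"))

if line is not None:
    print(line.strip().split(","))  # ['2-34', '3:21', '2.34']
print(missing)  # None
```

Because the function can return None, callers should check the result before calling strip() or split() on it.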

Call the functions to implement the required behavior:

import chapter5.sanitize
import chapter5.OPenFile
# james = []
julie = []
sarah = []
mikey = []
# Lists used to hold the values after duplicates are removed
unique_james = []
unique_julie = []
unique_sarah = []
unique_mikey = []
with open("james.txt", "r") as james_data:
    james_time = james_data.readline()
    """for each_data in james_time.strip().split(","):
        james.append(chapter5.sanitize.sanitize(each_data))
    print(sorted(james))"""
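    # Convert every item with a generator expression, then sort the results
    james = sorted(chapter5.sanitize.sanitize(each_data) for each_data in james_time.strip().split(","))
    # Remove duplicates from the sorted list, keeping the order
    for data in james:
        if data not in unique_james:
            unique_james.append(data)
    # Slice the list to get items 0 through 2, the three shortest times
    print(unique_james[0:3])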
with open("julie.txt", "r") as julie_data:
    julie_time = julie_data.readline()
    julie = sorted(chapter5.sanitize.sanitize(each_data) for each_data in julie_time.strip().split(","))
    for data in julie:
        if data not in unique_julie:
            unique_julie.append(data)
    print(unique_julie[0:3])
with open("sarah.txt", "r") as sarah_data:
    sarah_time = sarah_data.readline()
    # Convert the cleaned data to a set to remove duplicates, then sort it; sorted() returns a list, so slicing works
    sarah = sorted(set(chapter5.sanitize.sanitize(each_data) for each_data in sarah_time.strip().split(",")))
    print(sarah[0:3])
    """for data in sarah:
        if data not in unique_sarah:
            unique_sarah.append(data)
    print(unique_sarah[0:3])"""
with open("mikey.txt", "r") as mikey_data:
    mikey_time = mikey_data.readline()
    mikey = sorted(chapter5.sanitize.sanitize(each_data) for each_data in mikey_time.strip().split(","))
    """for data in mikey:
        if data not in unique_mikey:
            unique_mikey.append(data)
    print(unique_mikey[0:3])"""
    # set() removes duplicates, but a set is unordered and cannot be sliced, so sort it back into a list before slicing
    print(sorted(set(mikey))[0:3])
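# Use the openFile function to open and read the file
james_time = chapter5.OPenFile.openFile("james.txt")
# openFile returns None when the file cannot be read, so check before processing
if james_time is not None:
    james = sorted(set(chapter5.sanitize.sanitize(each_data) for each_data in james_time.strip().split(",")))
    print(james[0:3])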
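
The four blocks above repeat the same steps, so they could be folded into one helper. The following is a minimal sketch under stated assumptions, not the original author's code: top_three and to_seconds are hypothetical names, sanitize is redefined locally so the block runs without the chapter5 package, and the numeric sort key is an addition that avoids relying on string order (as strings, "10.05" sorts before "9.59").

```python
def sanitize(time_string):
    """Normalize 'm-ss' or 'm:ss' to 'm.ss'; leave other strings unchanged."""
    for splitter in ("-", ":"):
        if splitter in time_string:
            mins, secs = time_string.split(splitter)
            return mins + "." + secs
    return time_string


def to_seconds(time_string):
    """Convert a normalized 'm.ss' string to a whole number of seconds."""
    mins, secs = time_string.split(".")
    return int(mins) * 60 + int(secs)


def top_three(line):
    """Return the three shortest distinct times from one comma-separated line."""
    # A set comprehension normalizes and removes duplicates in one step
    unique_times = {sanitize(item.strip()) for item in line.strip().split(",")}
    # sorted() turns the unordered set back into a list that can be sliced
    return sorted(unique_times, key=to_seconds)[:3]


james_line = "2-34,3:21,2.34,2.45,3.01,2:01,2:01,3:10,2-22"
print(top_three(james_line))  # ['2.01', '2.22', '2.34']
```

With a helper like this, each file needs only one call once its line has been read, for example by passing the result of openFile after checking that it is not None.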

Result:

