Reading files
Opening the file without specifying an encoding fails with a decoding error:
>>> f = open('E:\\testDatabase\\firstread.txt')
>>> f
<_io.TextIOWrapper name='E:\\testDatabase\\firstread.txt' mode='r' encoding='cp936'>
>>> f.read()
Traceback (most recent call last):
  File "<pyshell#88>", line 1, in <module>
    f.read()
UnicodeDecodeError: 'gbk' codec can't decode byte 0xaf in position 6: illegal multibyte sequence
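The failure can be reproduced without the file. The snippet below is a minimal sketch, assuming the file holds UTF-8 text that begins with 'CSDN是全球'; the variable names are illustrative only. It encodes that text as UTF-8 and then decodes the bytes as GBK, which is roughly what open() does when the platform default encoding is cp936.

```python
# Assumption: the file is UTF-8 text beginning with 'CSDN是全球'.
text = 'CSDN是全球'
raw = text.encode('utf-8')      # the bytes as they would be stored on disk

error = None
try:
    raw.decode('gbk')           # decode UTF-8 bytes with the wrong codec
except UnicodeDecodeError as exc:
    error = exc
    print(exc)                  # reports the offending byte and its position

# Decoding with the encoding the bytes were written in restores the text.
decoded = raw.decode('utf-8')
print(decoded)
```

The exception's start attribute holds the byte offset at which decoding stopped, which helps locate the offending character in a larger file.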
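The file is saved as UTF-8, but open() used the platform default cp936 (GBK), as the repr of f shows. Passing the encoding explicitly makes the read succeed: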
>>> f = open('E:\\testDatabase\\firstread.txt','r',encoding='utf-8')
>>> f.read()
'CSDN是全球知名中文IT技术交流平台,创建于1999年,包含原创博客、精品问答、职业培训、技术论坛、资源下载等产品服务,提供原创、优质、完整内容的专业IT技术开发社区.'
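When the encoding of a file is not known in advance, one option is to try a short list of candidates in order. The helper below, read_text_with_fallback, is hypothetical and not part of the original notes; the candidate list ('utf-8', 'gbk') is also an assumption. UTF-8 is tried first because it rejects most non-UTF-8 input, whereas GBK can silently accept wrong bytes and produce garbled text.

```python
import os
import tempfile


def read_text_with_fallback(path, encodings=('utf-8', 'gbk')):
    """Return (text, encoding) for the first candidate that decodes cleanly."""
    for encoding in encodings:
        try:
            with open(path, 'r', encoding=encoding) as f:
                return f.read(), encoding
        except UnicodeDecodeError:
            continue  # try the next candidate
    raise ValueError('none of the candidate encodings could decode ' + path)


# Demonstration on two temporary files holding the same text in different encodings.
with tempfile.TemporaryDirectory() as tmp:
    utf8_path = os.path.join(tmp, 'utf8.txt')
    gbk_path = os.path.join(tmp, 'gbk.txt')
    with open(utf8_path, 'w', encoding='utf-8') as f:
        f.write('端午')
    with open(gbk_path, 'w', encoding='gbk') as f:
        f.write('端午')
    utf8_result = read_text_with_fallback(utf8_path)
    gbk_result = read_text_with_fallback(gbk_path)
    print(utf8_result, gbk_result)
```

This is a heuristic, not a detector: a clean decode does not prove the guess is right, so knowing the real encoding and passing it to open() remains the safer choice.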
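Writing files:
Contents of talk.txt (a dialogue between 郭德纲 and 于谦, with sections separated by lines of '=' characters):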
郭德纲:"今天我特别开心,今天是五月五端我节"
于谦:"端你节啊?"
郭德纲:"端谁?"
于谦:"端午"
郭德纲:"楚国大夫屈原,五月初五死的.我们应该永远怀念屈原.要是没有屈原,我们怎么能有这天假期呢?我觉得应该再多放几天假"
于谦:"那得死多少人啊?"
======================================
郭德纲:"买狗,得注重血统,我买了一狗,那血统太纯正了"
于谦:"怎么?"
郭德纲:"京巴国美藏獒西施一串"
于谦:"嚯~这狗他们家不定多乱呢!"
=======================================
郭德纲:"我拿着卡就奔银行,我把卡里的两千块钱都取出来,我要挥霍了,我要周游世界,买游艇......"
于谦:"行了,两千块钱周游世界!到大兴你就回来了知道吗?还买游艇"
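Initial version:
f = open('talk.txt', 'r', encoding='utf-8')  # a relative or an absolute path both work
boy = []    # lines spoken by 郭德纲 in the current section
girl = []   # lines spoken by 于谦 in the current section
count = 1   # section number, used in the output file names
for each_line in f:
    if each_line[:6] != '======':
        # Split at the first colon only, so colons inside the quote are kept.
        (role, line_spoken) = each_line.split(':', 1)
        if role == '郭德纲':
            boy.append(line_spoken)
        elif role == '于谦':
            girl.append(line_spoken)
    else:
        # A separator line ends the section: write both files, then reset.
        file_name_boy = 'D:\\python\\project\\boy_' + str(count) + '.txt'
        file_name_girl = 'D:\\python\\project\\girl_' + str(count) + '.txt'
        boy_file = open(file_name_boy, 'w', encoding='utf-8')
        girl_file = open(file_name_girl, 'w', encoding='utf-8')
        boy_file.writelines(boy)
        girl_file.writelines(girl)
        boy_file.close()
        girl_file.close()
        boy = []
        girl = []
        count += 1
# The last section has no trailing separator, so it is written after the loop.
file_name_boy = 'D:\\python\\project\\boy_' + str(count) + '.txt'
file_name_girl = 'D:\\python\\project\\girl_' + str(count) + '.txt'
boy_file = open(file_name_boy, 'w', encoding='utf-8')
girl_file = open(file_name_girl, 'w', encoding='utf-8')
boy_file.writelines(boy)
girl_file.writelines(girl)
boy_file.close()
girl_file.close()
f.close()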
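The second argument of split(':', 1) matters here. The sketch below shows its effect on one line taken from talk.txt and on one made-up line (marked hypothetical) whose quote contains a colon. It also shows that a line with no colon at all, such as a blank line, cannot be unpacked into two names.

```python
line = '郭德纲:"端谁?"\n'
role, line_spoken = line.split(':', 1)
print(repr(role), repr(line_spoken))

# Hypothetical line (not in talk.txt) with a colon inside the quote.
tricky = '于谦:"时间是 12:30"\n'
role2, spoken2 = tricky.split(':', 1)   # maxsplit=1 keeps the quote in one piece
all_parts = tricky.split(':')           # without maxsplit the quote is cut in two

# A blank line yields a single piece, so the two-name unpacking fails.
blank_failed = False
try:
    role3, spoken3 = '\n'.split(':', 1)
except ValueError:
    blank_failed = True
print(blank_failed)
```

Because line_spoken keeps its trailing newline, writelines() can write the collected lines without adding separators.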
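Improved version (the duplicated code is extracted into functions):
def save_file(boy, girl, count):
    # Write one section: each speaker's lines go to a separate numbered file.
    file_name_boy = 'D:\\python\\project\\boy_' + str(count) + '.txt'
    file_name_girl = 'D:\\python\\project\\girl_' + str(count) + '.txt'
    boy_file = open(file_name_boy, 'w', encoding='utf-8')
    girl_file = open(file_name_girl, 'w', encoding='utf-8')
    boy_file.writelines(boy)
    girl_file.writelines(girl)
    boy_file.close()
    girl_file.close()

def split_file(file_name):
    f = open(file_name, 'r', encoding='utf-8')
    boy = []    # lines spoken by 郭德纲 in the current section
    girl = []   # lines spoken by 于谦 in the current section
    count = 1   # section number, used in the output file names
    for each_line in f:
        if each_line[:6] != '======':
            # Split at the first colon only, so colons inside the quote are kept.
            (role, line_spoken) = each_line.split(':', 1)
            if role == '郭德纲':
                boy.append(line_spoken)
            elif role == '于谦':
                girl.append(line_spoken)
        else:
            # A separator line ends the section: save it, then reset.
            save_file(boy, girl, count)
            boy = []
            girl = []
            count += 1
    # The last section has no trailing separator, so it is saved after the loop.
    save_file(boy, girl, count)
    f.close()

split_file('talk.txt')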
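A further variation, offered as a sketch and not as the author's method, uses with statements so that every file is closed even if an error occurs, and takes the output directory as a parameter. The names save_section, split_dialogue and out_dir are hypothetical. The demonstration runs on a shortened sample written to a temporary directory.

```python
import os
import tempfile


def save_section(boy, girl, count, out_dir):
    """Write one section's lines to boy_<count>.txt and girl_<count>.txt."""
    for prefix, lines in (('boy_', boy), ('girl_', girl)):
        path = os.path.join(out_dir, prefix + str(count) + '.txt')
        with open(path, 'w', encoding='utf-8') as out:
            out.writelines(lines)


def split_dialogue(file_name, out_dir):
    """Split the dialogue by speaker and section; return the number of sections."""
    boy, girl, count = [], [], 1
    with open(file_name, 'r', encoding='utf-8') as f:
        for each_line in f:
            if each_line.startswith('======'):
                # Separator line: save the finished section and start a new one.
                save_section(boy, girl, count, out_dir)
                boy, girl = [], []
                count += 1
                continue
            role, line_spoken = each_line.split(':', 1)
            if role == '郭德纲':
                boy.append(line_spoken)
            elif role == '于谦':
                girl.append(line_spoken)
    save_section(boy, girl, count, out_dir)  # last section has no separator
    return count


# Demonstration on a shortened sample in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, 'talk.txt')
    with open(src, 'w', encoding='utf-8') as f:
        f.write('郭德纲:"端谁?"\n于谦:"端午"\n======\n郭德纲:"买狗"\n')
    sections = split_dialogue(src, tmp)
    created = sorted(name for name in os.listdir(tmp) if name != 'talk.txt')
    with open(os.path.join(tmp, 'girl_1.txt'), encoding='utf-8') as f:
        girl_1 = f.read()
    print(sections, created)
```

Passing the directory as an argument avoids hard-coding a path that may not exist on another machine.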