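Original [Repost] Replacing punctuation in a text with spaces (saved for later use)
import re
import os
list=[',','?','.','?','!','*','(',')','“','”',':','"','`','\''] ## put the punctuation marks to replace into a list
with open(r"out1无空行.txt",'r',encoding="utf-8") as f: ## text.txt is the text used for training
    result = f.read()
    for i in range(len(lis.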
2021-07-15 14:10:16
1355
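The excerpt above is cut off before the replacement loop, so the following is only a minimal sketch of the same idea, not the post's own code. It assumes a translation table built with `str.maketrans` instead of a loop of `replace` calls; the names `PUNCTUATION` and `replace_punctuation` are made up for illustration.

```python
# Punctuation marks copied from the excerpt's list (the duplicate '?' is dropped).
PUNCTUATION = [',', '?', '.', '!', '*', '(', ')', '“', '”', ':', '"', '`', "'"]


def replace_punctuation(text, marks=PUNCTUATION):
    """Return text with every listed punctuation mark replaced by one space."""
    # Hypothetical helper: map each mark to a single space in one pass.
    table = str.maketrans({mark: " " for mark in marks})
    return text.translate(table)


sample = "Hello, world! (test)"
cleaned = replace_punctuation(sample)
print(repr(cleaned))
```

A translation table walks the text once, whereas one `replace` call per mark rescans the whole string each time. Both approaches leave consecutive spaces where punctuation sat next to a space.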
Original Word segmentation + stop-word removal for Chinese text (Python)
import jieba
# Build the stop-word list
def stopwordslist():
    stopwords = [line.strip() for line in open(r'stopwords.txt',encoding='UTF-8').readlines()]
    return stopwords
# Extend the jieba segmentation dictionary
dict='fencibuchong.txt'
jieba.load_userdict(dict)
# Segment a sentence into Chinese words
def seg_depart(s..
2021-07-12 10:31:33
6917
7
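The body of `seg_depart` is truncated in the excerpt above, so what follows is a hedged sketch of how segmentation and stop-word filtering might fit together, not a reconstruction of it. The helper names `load_stopwords`, `filter_stopwords`, and `segment_and_filter` are assumptions. The only jieba call used is `jieba.lcut`, which returns the segmentation as a list.

```python
from typing import Iterable, List, Set


def load_stopwords(lines: Iterable[str]) -> Set[str]:
    """Build a stop-word set from an iterable of lines, e.g. an open file."""
    return {line.strip() for line in lines if line.strip()}


def filter_stopwords(tokens: Iterable[str], stopwords: Set[str]) -> List[str]:
    """Drop stop words and whitespace-only tokens while keeping token order."""
    return [tok for tok in tokens if tok.strip() and tok not in stopwords]


def segment_and_filter(sentence: str, stopwords: Set[str]) -> str:
    """Segment a sentence with jieba, remove stop words, join with spaces."""
    # Third-party dependency, imported here so the helpers above work without it.
    import jieba

    return " ".join(filter_stopwords(jieba.lcut(sentence.strip()), stopwords))
```

With jieba installed, a call would look like `segment_and_filter("今天天气很好", load_stopwords(open("stopwords.txt", encoding="utf-8")))`. Keeping the stop words in a set makes each membership check constant time, which matters on large corpora.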
Original What is the difference between read/readline/readlines in Python?
https://blog.youkuaiyun.com/bycare/article/details/80030469?ops_request_misc=%257B%2522request%255Fid%2522%253A%2522162521454216780265497829%2522%252C%2522scm%2522%253A%252220140713.130102334..%2522%257D&request_id=162521454216780265497829&biz_id=0&.
2021-07-02 16:34:21
151
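The entry above only links to another article, so here is a small sketch of the difference between the three methods. It uses an in-memory `io.StringIO` stream in place of a real file (an assumption made to keep the example self-contained); file objects returned by `open` in text mode behave the same way.

```python
import io

TEXT = "first line\nsecond line\nthird line\n"

# read(): the whole content as a single string.
whole = io.StringIO(TEXT).read()

# readline(): one line per call, including its trailing newline.
stream = io.StringIO(TEXT)
first = stream.readline()
second = stream.readline()

# readlines(): all remaining lines as a list of strings.
lines = io.StringIO(TEXT).readlines()

print(repr(whole))
print(repr(first), repr(second))
print(lines)
```

For large files, iterating over the file object line by line avoids loading everything into memory the way `read()` and `readlines()` do.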
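Original About the error: 'DataFrame' object has no attribute 'str'
data_len = data_null_comments[data_null_comments.str.len()>4]
print(data_len)
Running this code raises the error 'DataFrame' object has no attribute 'str'. The cause is that the data I read in is a DataFrame, and the .str accessor exists only on a Series (a single column). The fix is to give the data a column name with names=['txt'] and then change the code above to:
data_len = data_null_comments[data_null_comments['txt'].st..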
2021-06-30 20:23:19
15814
1
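The fix described in the excerpt above can be sketched as follows. The sample comments are made-up data standing in for the author's file, and the column name `txt` follows the excerpt. This is an illustration under those assumptions, not the post's full code.

```python
import pandas as pd

# Hypothetical one-column data standing in for the comments file.
data_null_comments = pd.DataFrame(
    {"txt": ["ok", "great product", "bad", "would buy again"]}
)

# A DataFrame has no .str accessor, which is what triggers the AttributeError.
has_str = hasattr(data_null_comments, "str")

# Selecting the column yields a Series, whose .str accessor provides len().
data_len = data_null_comments[data_null_comments["txt"].str.len() > 4]
print(has_str)
print(data_len)
```

When reading the file, passing `names=['txt']` to `pd.read_csv` assigns that column name up front, so the column can be selected as shown.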
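Original DataFrame basics
1. Data type: a two-dimensional array (index + attributes)
2. Creating a DataFrame
(1) From a two-dimensional array
import pandas as pd
df=pd.DataFrame([[1,2,3],[4,5,6]],index=['a','b'],columns=['f','h','g'])
print(df)
[out]
   f  h  g
a  1  2  3
b  4  5  6
index: sets the names of the row index
columns: sets the names of the column index
(2) From a dictionary
dm=pd.DataFrame..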
2021-06-30 15:23:41
273
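The dictionary example in the excerpt above is truncated, so the sketch below is not the author's code. It shows one common way to build the same table from a dictionary, reusing the values and labels of the two-dimensional-array example. Reusing that data is an assumption made for comparison.

```python
import pandas as pd

# Construction from a two-dimensional list, as in the excerpt.
df = pd.DataFrame([[1, 2, 3], [4, 5, 6]], index=["a", "b"], columns=["f", "h", "g"])

# Assumed illustration: dictionary keys become column labels,
# and each value lists that column's entries from top to bottom.
dm = pd.DataFrame({"f": [1, 4], "h": [2, 5], "g": [3, 6]}, index=["a", "b"])

same = dm.equals(df)
print(dm)
print(same)
```

The list form is written row by row, while the dictionary form is written column by column. Both produce a frame with the same index and column labels.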