[Original] (Python sklearn+KMeans) Iris classification via clustering

    # import the required modules
    import matplotlib.pyplot as plt
    import numpy as np
    from sklearn.cluster import KMeans
    from sklearn.datasets import load_iris
    # load the Iris dataset
    iris = load_iris()
    X = iris.data[:]
    # print(X)
    print(X.shape)
    # elbow method to pick k
    d = []
    for i in range(1, 11):  # k runs from 1 to 10; fit KM…
2021-10-22 23:05:50 · 4606 views
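The elbow method the preview above begins can be sketched end to end: fit KMeans for k = 1..10 and record the inertia (within-cluster sum of squares), then look for the bend in the curve. A minimal runnable version, plotting omitted:

```python
# Elbow-method sketch: inertia_ for k = 1..10 on the Iris data.
from sklearn.cluster import KMeans
from sklearn.datasets import load_iris

X = load_iris().data
inertias = []
for k in range(1, 11):
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    inertias.append(km.inertia_)  # within-cluster sum of squares

# inertia shrinks as k grows; the "elbow" around k=3 matches Iris's 3 species
print(inertias)
```

Plotting `range(1, 11)` against `inertias` with matplotlib, as the original does, makes the elbow visible.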
[Original] (Python gensim+Word2Vec) Text similarity computation

    # -*- encoding=utf-8 -*-
    import jieba
    from gensim.models.word2vec import Word2Vec
    # jieba segmentation: return a token list
    def jieba_cut(sent):
        sent1 = jieba.lcut(sent)
        return sent1
    # train a gensim Word2Vec model
    def word2vec1(sent1, sent2):
        sent1 = jieba_cut(sent1)
        sent2 = jie…
2021-10-22 22:54:05 · 2814 views
[Original] (Python jieba+bow) Text similarity comparison

    # -*- encoding=utf-8 -*-
    import jieba.posseg
    import jieba.analyse
    import math
    import re
    # Chinese word segmentation with jieba
    def jieba_function(input1):
        input1 = re.sub(r'\W*', '', input1)
        # jieba.load_userdict("dic.txt")
        jieba.analyse.set_stop_words("3.txt")
        # …
2021-10-12 23:46:28 · 750 views
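The bag-of-words comparison reduces to: count the tokens of each sentence, then take the cosine of the two count vectors. A pure-Python sketch, with whitespace tokenization standing in for jieba segmentation:

```python
# Cosine similarity between two bag-of-words count vectors.
import math
from collections import Counter

def bow_cosine(tokens1, tokens2):
    c1, c2 = Counter(tokens1), Counter(tokens2)
    dot = sum(c1[w] * c2[w] for w in c1)       # shared-term products
    norm1 = math.sqrt(sum(v * v for v in c1.values()))
    norm2 = math.sqrt(sum(v * v for v in c2.values()))
    return dot / (norm1 * norm2) if norm1 and norm2 else 0.0

print(bow_cosine("i love beijing".split(), "i love shanghai".split()))
# → 0.666…  (two of three tokens shared)
```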
[Original] (Python re+collections) A Bayesian spelling corrector

    # -*- encoding:utf-8 -*-
    import re, collections
    # extract every word from the corpus, lowercase it, and strip special characters
    def words(text):
        return re.findall('[a-z]+', text.lower())
    def train(features):
        model = collections.defaultdict(lambda: 1)
        for f in features:
            model[f] += 1
        …
2021-10-11 23:37:39 · 202 views
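A completed sketch of the Norvig-style corrector the preview begins: train word frequencies, generate all edit-distance-1 candidates, and pick the most frequent known one. The tiny inline corpus is made up; the original trains on a real text file.

```python
# Norvig-style spelling corrector: frequencies + edit-distance-1 candidates.
import re, collections

def words(text):
    return re.findall('[a-z]+', text.lower())

def train(features):
    model = collections.defaultdict(lambda: 1)  # unseen words get count 1
    for f in features:
        model[f] += 1
    return model

NWORDS = train(words("spelling errors in spelling are common spelling"))
alphabet = 'abcdefghijklmnopqrstuvwxyz'

def edits1(word):
    s = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [a + b[1:] for a, b in s if b]
    transposes = [a + b[1] + b[0] + b[2:] for a, b in s if len(b) > 1]
    replaces = [a + c + b[1:] for a, b in s for c in alphabet if b]
    inserts = [a + c + b for a, b in s for c in alphabet]
    return set(deletes + transposes + replaces + inserts)

def correct(word):
    if word in NWORDS:
        return word
    candidates = {w for w in edits1(word) if w in NWORDS} or {word}
    return max(candidates, key=lambda w: NWORDS[w])

print(correct('speling'))  # → 'spelling'
```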
[Original] (Python sklearn+KNeighborsClassifier) Iris classification

    # -*- encoding:utf-8 -*-
    import pandas as pd
    from sklearn.model_selection import train_test_split
    from sklearn.neighbors import KNeighborsClassifier
    from sklearn.metrics import confusion_matrix
    from sklearn.metrics import classification_report
    # read the data
    iris_…
2021-10-10 18:28:38 · 431 views
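An end-to-end sketch of the KNN pipeline the preview starts. The original reads the data from a CSV with pandas; sklearn's bundled `load_iris` stands in here so the example is self-contained, and `k=5` and `random_state=33` are illustrative choices:

```python
# Split Iris into train/test, fit KNN, and report accuracy.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import classification_report

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=33)

knn = KNeighborsClassifier(n_neighbors=5)
knn.fit(X_train, y_train)
score = knn.score(X_test, y_test)          # test-set accuracy
print(score)
print(classification_report(y_test, knn.predict(X_test)))
```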
[Original] (Python sklearn+LogisticRegression) Breast cancer prediction

    # -*- encoding='utf-8' -*-
    # import the pandas and numpy packages
    import numpy as np
    import pandas as pd
    # build the feature-name list
    column_names = ['Sample code number', 'Clump Thickness',
                    'Uniformity of Cell Size', 'Uniformity of Cell Shape',
                    'Marginal Adhesion', 'Single Epithelial Cell Size',
                    'Bare …
2021-10-10 18:25:13 · 1440 views
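A sketch of the prediction pipeline. The original loads the UCI Wisconsin CSV by the column names above; sklearn's bundled `load_breast_cancer` (the same dataset family) is substituted here for a self-contained run, and the scaling/split parameters are illustrative:

```python
# Standardize features, fit logistic regression, report test accuracy.
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=33)

# logistic regression is sensitive to feature scale, so standardize first
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)      # reuse the training-set statistics

lr = LogisticRegression(max_iter=1000)
lr.fit(X_train, y_train)
acc = lr.score(X_test, y_test)
print(acc)
```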
[Original] (Python tf-idf / textrank) Keyword extraction from articles

tf-idf = (occurrences of the term in the article / total tokens in the article) × log(total number of articles / (number of articles containing the term + 1)); this weighting is biased toward high-frequency terms.

    # -*- coding:utf-8 -*-
    import jieba.analyse
    str_1 = "中央财政187.6亿保护草原生态,7月8日记者从财政部" \
            "农业司获悉:2018年,中央财政安排新一轮草原生态保护" \
            "补助奖励187.6亿元,支持实施禁牧面积12.06亿亩,草畜" \
            "平衡面积26.05亿亩,并对工作突出、成效显著地区给予奖励"
2021-10-04 00:07:45 · 313 views
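The tf-idf formula quoted above can be implemented directly: tf is the term's share of the document's tokens, idf is log(N / (df + 1)). Toy English documents stand in for the jieba-segmented Chinese text:

```python
# Direct implementation of tf * log(N / (df + 1)) over a toy corpus.
import math

docs = [
    ["grassland", "subsidy", "grassland", "ecology"],
    ["finance", "ministry", "subsidy"],
    ["grassland", "protection"],
]

def tf_idf(term, doc, docs):
    tf = doc.count(term) / len(doc)              # term share of this doc
    df = sum(1 for d in docs if term in d)       # docs containing the term
    idf = math.log(len(docs) / (df + 1))
    return tf * idf

print(tf_idf("ecology", docs[0], docs))
```

Note that with the `df + 1` smoothing in the denominator, a term appearing in every document gets an idf ≤ 0, which is exactly how this variant suppresses ubiquitous words.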
[Original] (Creating an Anaconda virtual environment with tensorflow and keras)

OS: Windows 10 64-bit; Python 3.8.8
1. Download Anaconda.
2. Install it, checking "add to environment variables" (the installer leaves it unchecked by default).
Mirror configuration (default path on my system: C:\Users\86156\.condarc):

    channels:
      - https://mirrors.ustc.edu.cn/anaconda/pkgs/free/
      - https://mirrors.ustc.edu.cn/anaconda/pkgs/main/
      - https://mirrors...
2021-09-24 11:59:55 · 249 views
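Once the mirrors are configured, the environment itself is typically created with the commands below (a sketch of the usual workflow; the environment name `tf_env` and the Python version are example choices, and the post may pin different package versions):

```shell
# create and enter a fresh environment, then install the two packages
conda create -n tf_env python=3.8
conda activate tf_env
pip install tensorflow keras
# quick smoke test
python -c "import tensorflow as tf; print(tf.__version__)"
```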
[Original] (Python jieba.posseg.cut) Chinese part-of-speech tagging: 我爱北京天安门

1.txt contains 我爱北京天安门; the POS-tagging result is written to 2.txt.

    # -*- encoding:utf-8 -*-
    import jieba.posseg
    # read the document
    with open("1.txt", 'r', encoding='utf-8') as f:
        words_2 = jieba.posseg.cut(f.read())  # run POS tagging
    # write the tagged result to a file
    with open("2.txt", 'w', encoding='utf-8') as f:
        for i in words_2:
            …
2021-09-16 23:11:09 · 1196 views
[Original] (Maximum-match Chinese word segmentation in Python) 研究生命的起源

Forward maximum matching:

    # -*- coding: utf-8 -*-
    # sentence to segment
    str_1 = '研究生命的起源'
    # maximum window length
    M = 3
    # dictionary
    list_1 = ['研究', '研究生', '生命', '命', '的', '起源']
    # the sentence as individual characters
    list_2 = ['研', '究', '生', '命', '的', '起', '源']
    # anchor positions
    list_3 = []
    for i in range(len(str_1)//M + 1):
        list_3.append(0 + i*M)
    # match each candidate slice
    for …
2021-09-13 21:15:19 · 569 views
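A complete forward maximum-match segmenter for the example above (a standard formulation; the article's slicing-based variant differs in mechanics but not in result): at each position, try the longest dictionary word of up to M characters first, falling back to a single character.

```python
# Forward maximum matching: greedily take the longest dictionary match.
def fmm_segment(sentence, dictionary, M=3):
    result, i = [], 0
    while i < len(sentence):
        for size in range(min(M, len(sentence) - i), 0, -1):
            piece = sentence[i:i + size]
            if size == 1 or piece in dictionary:  # single chars always pass
                result.append(piece)
                i += size
                break
    return result

dic = ['研究', '研究生', '生命', '命', '的', '起源']
print(fmm_segment('研究生命的起源', dic))
# → ['研究生', '命', '的', '起源']
```

This sentence is the classic counterexample: forward matching greedily takes 研究生 ("graduate student") instead of 研究/生命 ("research the origin of life"), which is why backward maximum matching is usually tried as well.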
[Original] (Python requests + regular expressions) Scrape the Maoyan Top 100 into a CSV file

    # -*- coding: utf-8 -*-
    import requests
    import re
    import time
    import csv
    import random
    # fetch one page
    def get_one_page(url):
        try:
            agent_1 = 'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:78.0) Gecko/20100101 Firefox/78.0'
            agent_2 = 'Mozilla/5.0 (Win…
2021-09-11 16:36:16 · 199 views
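The regex-extraction half of such a scraper can be shown without hitting the network. The HTML snippet below is a made-up stand-in shaped like Maoyan's board markup, and the pattern is illustrative; the article's actual pattern may differ:

```python
# Pull (rank, title, cast, release date) fields out of board-style HTML.
import re

html = '''
<dd><i class="board-index">1</i>
<p class="name"><a title="霸王别姬">霸王别姬</a></p>
<p class="star">主演:张国荣,张丰毅,巩俐</p>
<p class="releasetime">上映时间:1993-01-01</p></dd>
'''

# re.S lets '.' cross newlines so one pattern spans the whole <dd> block
pattern = re.compile(
    r'board-index.*?>(\d+)</i>.*?title="(.*?)".*?star">(.*?)</p>'
    r'.*?releasetime">(.*?)</p>', re.S)

for rank, title, star, release in pattern.findall(html):
    print(rank, title, star.strip(), release.strip())
```

Each tuple from `findall` maps directly onto one `csv.writer.writerow` call in the full scraper.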