
python
ukakasu
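Read file names from a text file and copy the matching files from one folder to another
Apart from its first two lines, every line of the text file holds the name of another text file, and the lines are already sorted. Using these names, copy the files from one folder to another, prefixing each file name with a sequence number that reflects its position in the text file.
    import os
    f = open("molscore.txt")
    lines = f.readlines()
    mol_list = []
    i = 1
    for line in lines:
        if i<=100 and not...
Original · 2015-06-18 16:42:44 · 4584 views · 0 comments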
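The excerpt above is cut off, so the following is only a minimal sketch of the described task, not the author's code. The function name copy_listed_files, its parameters, and the "<number>_<name>" naming pattern are assumptions made for illustration; the cap of 100 entries hinted at by `i<=100` is exposed as an optional limit.

```python
import shutil
from pathlib import Path


def copy_listed_files(list_file, src_dir, dst_dir, skip_lines=2, limit=None):
    """Copy the files named in list_file from src_dir to dst_dir.

    Each copy is prefixed with its 1-based position in the list.
    All names and defaults here are illustrative assumptions.
    """
    src_dir, dst_dir = Path(src_dir), Path(dst_dir)
    dst_dir.mkdir(parents=True, exist_ok=True)

    # Skip the header lines, then keep the non-empty file names in order.
    with open(list_file, encoding="utf-8") as fh:
        names = [line.strip() for line in fh.readlines()[skip_lines:]]
    names = [name for name in names if name]
    if limit is not None:
        names = names[:limit]

    copied = []
    for index, name in enumerate(names, start=1):
        source = src_dir / name
        if not source.is_file():
            continue  # name listed but not present in the source folder
        target = dst_dir / f"{index}_{name}"
        shutil.copy2(source, target)
        copied.append(target.name)
    return copied
```

A call such as copy_listed_files("molscore.txt", "input_dir", "output_dir", limit=100) would mirror the setup in the excerpt; the two folder names are placeholders. The sequence number follows the position in the list, so a missing file leaves a gap in the numbering instead of shifting later files.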
Basic LightGBM usage
    # pip install lightgbm==2.1.2
    import lightgbm as lgb
    import pandas as pd
    from pandas import DataFrame
    import gc
    from sklearn.model_selection import train_test_split
    from matplotlib import pyplot
    #...
Original · 2018-09-27 14:40:02 · 2904 views · 0 comments
Plotting the distribution of missing data with missingno
    import seaborn as sns  # advanced vizs
    import missingno as msno  # missing values
    # missing values?
    sns.set(style = "ticks")
    msno.matrix(data)
    #https://github.com/ResidentMario/m...
Original · 2018-09-17 11:11:54 · 2941 views · 0 comments
Plotting a fitted curve in Python and marking specified points
    import os
    import numpy as np
    from scipy import log
    from scipy.optimize import curve_fit
    import matplotlib.pyplot as plt
    import math
    from sklearn.metrics import r2_score
    # font
    pl...
Original · 2018-08-08 09:19:58 · 14101 views · 3 comments
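Precision problems when using decimals as a pandas index
Joining in pandas on an index of decimal (floating-point) values returned fewer rows than expected; floating-point precision is the suspected cause. Fix: round the values and convert them to str before setting them as the index.
    df = df.round({0: 3})
    df[0] = df[0].astype(str)
    df = df.set_index(0)
    ...
Original · 2018-05-08 16:35:28 · 1097 views · 0 comments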
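To make the suspected cause concrete, here is a small sketch with made-up data: two keys that look identical (0.1 + 0.2 and 0.3) differ in their last binary digits, so an index join drops the row. The frames, the column name "key", and the helper as_str_key are assumptions for illustration; the rounding to three decimals follows the excerpt.

```python
import pandas as pd

# 0.1 + 0.2 evaluates to 0.30000000000000004, which is not equal to 0.3.
left = pd.DataFrame({"key": [0.1 + 0.2, 0.5], "a": [1, 2]})
right = pd.DataFrame({"key": [0.3, 0.5], "b": [10, 20]})

# Joining on the raw float index keeps only the exactly equal key (0.5).
raw = left.set_index("key").join(right.set_index("key"), how="inner")


def as_str_key(df, column, decimals=3):
    """Round a float column, convert it to str, and use it as the index."""
    out = df.copy()
    out[column] = out[column].round(decimals).astype(str)
    return out.set_index(column)


# After rounding and converting to str, both keys line up.
fixed = as_str_key(left, "key").join(as_str_key(right, "key"), how="inner")

print(len(raw), len(fixed))
```

Rounding before the string conversion matters: converting 0.30000000000000004 to str directly would still produce a key that differs from "0.3".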
Classification models with keras
A keras-based neural network classification model (binary and multi-class).
    from matplotlib import pyplot
    import pandas as pd
    from sklearn.preprocessing import MinMaxScaler
    from keras.models import Sequential
    from keras.layers.core import Dense, Dropo...
Original · 2018-06-05 13:17:11 · 3762 views · 1 comment
Splitting datasets in sklearn
1. Regression
    from sklearn.model_selection import train_test_split
    train_X, test_X, train_y, test_y = train_test_split(X, y, test_size=0.25)
2. Classification
    X_train, X_valid, y_train, y_valid = train_test_split(X, y, test_s...
Original · 2018-04-25 10:13:21 · 1753 views · 0 comments
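Choosing evaluation metrics in machine learning
I. Classification
   Binary: eval_metric='auc'/'logloss'
   Multi-class: eval_metric='mlogloss'
   1. Balanced classes: accuracy; for binary classification, auc is also an option.
    from sklearn.metrics import accuracy_score
    from sklearn.metrics import classification...
Original · 2018-04-24 15:49:40 · 2154 views · 0 comments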
Saving and loading normalization and standardization models in Python
Saving a normalization model
    from sklearn import preprocessing
    min_max_scaler = preprocessing.MinMaxScaler()
    X = min_max_scaler.fit_transform(X)
    from sklearn.externals import joblib
    joblib.dump(min_max_scaler, 'scalar01'...
Original · 2018-04-24 15:34:59 · 10508 views · 8 comments
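The excerpt stops before the loading step, so the round trip below is a sketch rather than the author's code. It imports joblib directly, because recent scikit-learn releases no longer ship sklearn.externals.joblib. The sample data and the file name min_max_scaler.joblib are assumptions for illustration.

```python
import os
import tempfile

import joblib
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Toy training data: two features with different ranges.
X = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]])

# Fit the scaler on the training data and scale it to [0, 1].
scaler = MinMaxScaler()
X_scaled = scaler.fit_transform(X)

# Persist the fitted scaler (hypothetical file name in a temp folder).
path = os.path.join(tempfile.mkdtemp(), "min_max_scaler.joblib")
joblib.dump(scaler, path)

# Later, or in another process: load it and apply the same scaling.
restored = joblib.load(path)
X_new = np.array([[2.5, 15.0]])
X_new_scaled = restored.transform(X_new)
```

Saving the fitted scaler, instead of refitting it at prediction time, keeps new data on the same scale that the model saw during training. The same pattern should apply to StandardScaler for standardization.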
Regression models with keras
A keras-based neural network regression model.
    import matplotlib.pyplot as plt
    from math import sqrt
    from matplotlib import pyplot
    import pandas as pd
    from numpy import concatenate
    from sklearn.preprocessing import MinMaxScaler
    fro...
Original · 2018-06-05 13:18:14 · 10769 views · 4 comments
Installing xgboost and cx_Oracle
CentOS 7 is the preferred environment; on CentOS 6, gcc has to be upgraded.
1. Install gcc
   Download: https://download.youkuaiyun.com/download/ukakasu/10368679
   rpm -ivh *
2. Upgrade gcc
   Download: https://download.youkuaiyun.com/download/ukakasu/10368690
   2.1. Install gmp-4.3.2...
Original · 2018-04-23 17:18:27 · 190 views · 0 comments
Word-count MapReduce implemented in Python
map function
    import sys
    for line in sys.stdin:
        line = line.strip()
        words = line.split()
        for word in words:
            print("%s\t%s" % (word, 1))
reduce function
    import sys
    current_word=None
    current_...
Original · 2015-08-08 07:53:53 · 2421 views · 0 comments
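The reduce function is truncated in the excerpt, so here is a sketch of how the two stages can fit together in Python 3. It assumes a streaming-style setup in which the reducer receives tab-separated "word<TAB>count" lines already sorted by word. It uses itertools.groupby instead of the current_word bookkeeping that the excerpt starts to set up, and the function names mapper and reducer are chosen for illustration.

```python
from itertools import groupby


def mapper(lines):
    """Emit one 'word<TAB>1' record per word."""
    for line in lines:
        for word in line.strip().split():
            yield f"{word}\t1"


def reducer(records):
    """Sum the counts per word; records must arrive sorted by word."""
    parsed = (record.rstrip("\n").split("\t", 1) for record in records)
    for word, group in groupby(parsed, key=lambda pair: pair[0]):
        total = sum(int(count) for _, count in group)
        yield f"{word}\t{total}"


def word_count(lines):
    """Local stand-in for the framework: map, sort by key, reduce."""
    return list(reducer(sorted(mapper(lines))))
```

In a real streaming job the mapper and reducer would read sys.stdin and print their records, with the framework doing the sort in between; word_count only simulates that shuffle step for local testing.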
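Running pharmdock in parallel with Python's map function
When I recently ran Pharmdock on a server, it could use only one CPU. That leaves most of the CPUs idle, so I looked into running the jobs in parallel.
    import os
    from multiprocessing import Pool
    def get_mol_paths(folder):
        return (os.path.join(folder,f) for f in os.listdir(folder) if f.end...
Original · 2015-06-16 14:31:14 · 621 views · 0 comments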
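The excerpt ends before the pool is used, so the following is a hedged sketch of the general pattern and not the author's script. Two things are swapped in and should be read as assumptions: the ".mol2" suffix filter, and ThreadPool from multiprocessing.pool in place of Pool. ThreadPool exposes the same map method, and threads are enough here because each task only waits on an external process. The real Pharmdock command line is not shown in the source, so run_job launches a placeholder child process that simply echoes its input path.

```python
import os
import subprocess
import sys
from multiprocessing.pool import ThreadPool


def get_mol_paths(folder, suffix=".mol2"):
    """Return sorted paths of files in folder ending with suffix (assumed)."""
    return sorted(
        os.path.join(folder, f) for f in os.listdir(folder) if f.endswith(suffix)
    )


def run_job(mol_path):
    """Run one external job for one input file and return its stdout."""
    # Placeholder command: a child Python process that echoes the path.
    # Substitute the real docking command line here.
    cmd = [sys.executable, "-c", "import sys; print(sys.argv[1])", mol_path]
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return result.stdout.strip()


def run_all(folder, workers=4):
    """Run run_job for every input file, at most `workers` at a time."""
    paths = get_mol_paths(folder)
    with ThreadPool(workers) as pool:
        # map blocks until all jobs finish and keeps the input order.
        return pool.map(run_job, paths)
```

Setting workers to the number of CPU cores is a reasonable starting point when each external job is single-threaded.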
AsyncProxyPool proxy-pool middleware
    import requests
    from scrapy.downloadermiddlewares.retry import RetryMiddleware
    from scrapy.utils.response import response_status_message
    import base64
    import logging
    logger = logging.getLogger(__name...
Original · 2019-06-26 14:01:43 · 707 views · 0 comments