
Statistical Learning
Machine learning, deep learning
咖啡豆丁
Using SHAP values
import shap
shap.initjs()
explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X_train)
shap.summary_plot(shap_values, X_train, max_display=80)
Original · 2022-03-11 15:19:37 · 646 views · 0 comments
Misusing resampling, leading to data leakage
When resampling, a common pitfall is to resample the entire dataset before splitting it into train and test partitions. Note that this is equivalent to resampling both the train and the test partitions.
Original · 2021-10-20 14:37:48 · 118 views · 0 comments
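The pitfall described above can be avoided by splitting first and resampling only the training partition. The following is a minimal standard-library sketch, not the imbalanced-learn API; the function name `split_then_oversample`, the toy data, and the plain random oversampling strategy are assumptions made for illustration.

```python
import random
from collections import Counter


def split_then_oversample(rows, labels, test_fraction=0.25, seed=0):
    """Split first, then randomly oversample minority classes in the train part only."""
    rng = random.Random(seed)
    indices = list(range(len(rows)))
    rng.shuffle(indices)

    # 1. Hold out the test partition before any resampling happens.
    n_test = int(len(indices) * test_fraction)
    test_idx, train_idx = indices[:n_test], indices[n_test:]

    # 2. Group the training indices by class label.
    by_class = {}
    for i in train_idx:
        by_class.setdefault(labels[i], []).append(i)

    # 3. Duplicate random members of each smaller class until all classes
    #    match the size of the largest one (training data only).
    target = max(len(members) for members in by_class.values())
    resampled_idx = []
    for members in by_class.values():
        resampled_idx.extend(members)
        resampled_idx.extend(rng.choices(members, k=target - len(members)))

    train = [rows[i] for i in resampled_idx], [labels[i] for i in resampled_idx]
    test = [rows[i] for i in test_idx], [labels[i] for i in test_idx]
    return train, test


# Hypothetical imbalanced toy data: 90 samples of class 0, 10 of class 1.
rows = list(range(100))
labels = [0] * 90 + [1] * 10

(train_X, train_y), (test_X, test_y) = split_then_oversample(rows, labels)
train_counts = Counter(train_y)
```

Because the test rows are set aside before any duplication, no copy of a test sample can end up in the training data, and the test partition keeps the original class imbalance.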
Installing Bayesian optimization packages
pip install scikit-optimize
pip install hyperopt
pip install -i https://pypi.douban.com/simple bayesian-optimization
Original · 2021-10-15 15:15:58 · 435 views · 0 comments
pandas axes
Original · 2021-03-11 20:19:33 · 155 views · 0 comments
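TensorFlow image recognition summary
I. How run.py executes:
1. tf.app.run() is executed.
2. tf.app.run() calls the main function defined in this script, which then runs the steps below in order.
3. First, load the preprocessed images.
4. Then define the TensorFlow placeholders, the convolution and pooling layers, the fully connected layer, and the logits output layer.
5. Finally, define the loss function, the backpropagation step, and some model evaluation metrics.
6. Get a session, initialize all variables, and start model training with sess.run().
7. After training finishes, load the saved model parameters and graph and predict on new data.
8. sess.close() ..
Original · 2021-03-11 20:16:06 · 555 views · 0 comments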
Getting reproducible results across multiple executions
from sklearn.ensemble import RandomForestClassifier
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
import numpy as np
rng = np.random.RandomState(0)
X, y = make_classification(random_state=rng)
rf = ...
Original · 2021-01-20 10:41:39 · 79 views · 0 comments
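The difference between passing an integer seed and sharing one generator instance can be illustrated without scikit-learn. The sketch below uses only Python's `random` module as a stand-in for `np.random.RandomState`; the helper `shuffled_split` is hypothetical and is not part of any library.

```python
import random


def shuffled_split(items, test_fraction, rng):
    """Shuffle a copy of items with the given generator and split it in two."""
    shuffled = list(items)
    rng.shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


data = list(range(20))

# A fresh generator built from the same integer seed gives the same split every time.
split_a = shuffled_split(data, 0.25, random.Random(0))
split_b = shuffled_split(data, 0.25, random.Random(0))

# Reusing one generator instance advances its state, so consecutive calls differ,
# although the whole sequence of calls is still reproducible across executions.
shared = random.Random(0)
split_c = shuffled_split(data, 0.25, shared)
split_d = shuffled_split(data, 0.25, shared)
```

Here `split_a` and `split_b` are identical, while `split_c` and `split_d` differ from each other; rerunning the script reproduces all four.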
Hive basic statistics
select
  max(col),
  percentile(col, 0.75),
  percentile(col, 0.5),
  percentile(col, 0.25),
  min(col),
  avg(col),
  stddev_samp(col)
from database.hello;
Original · 2020-12-29 16:56:52 · 220 views · 0 comments
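To sanity-check the query on a small extract, the same seven figures can be computed locally. This is a sketch using Python's `statistics` module (Python 3.8 or later); the sample values are made up, and it is an assumption that `method="inclusive"` quantiles match the interpolation Hive's `percentile` uses, so compare against your Hive version before relying on it.

```python
import statistics


def basic_stats(values):
    """Return the same seven summary figures as the Hive query, for a list of numbers."""
    q25, q50, q75 = statistics.quantiles(values, n=4, method="inclusive")
    return {
        "max": max(values),
        "p75": q75,
        "p50": q50,
        "p25": q25,
        "min": min(values),
        "avg": statistics.mean(values),
        "stddev_samp": statistics.stdev(values),  # sample standard deviation (n - 1)
    }


# Hypothetical column values.
summary = basic_stats([1, 2, 3, 4, 5])
```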
Creating and deleting a temporary directory in Python
from tempfile import mkdtemp
from shutil import rmtree
cachedir = mkdtemp()
rmtree(cachedir)
Original · 2020-12-28 13:17:54 · 550 views · 0 comments
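As an alternative to pairing `mkdtemp()` with `rmtree()` by hand, the standard library's `tempfile.TemporaryDirectory` removes the directory automatically when the `with` block exits. A small sketch; the file name `cache.txt` is only an example.

```python
import os
from tempfile import TemporaryDirectory

with TemporaryDirectory() as cachedir:
    # The directory exists inside the block and can be used like any other path.
    cache_file = os.path.join(cachedir, "cache.txt")
    with open(cache_file, "w") as f:
        f.write("temporary data")
    existed_inside = os.path.isdir(cachedir)

# After the block, the directory and everything in it has been deleted.
exists_after = os.path.exists(cachedir)
```

The context manager also cleans up if an exception is raised inside the block, which the manual `mkdtemp()` and `rmtree()` pair does not do unless it is wrapped in `try`/`finally`.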
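Setting up a data mining environment
1. Install Anaconda.
2. Create a virtual environment:
conda create --name datamining python=3.6
3. Install JupyterLab:
conda install -c conda-forge jupyterlab
4. Create a new kernel (JupyterLab ships with ipykernel, so it does not need to be installed separately):
python -m ipykernel install --user --name=datamining --display-name datamining
5. Install plugins:
con
Original · 2020-12-11 19:49:55 · 677 views · 1 comment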
Anaconda Tsinghua mirror
channels:
  - defaults
show_channel_urls: true
channel_alias: https://mirrors.tuna.tsinghua.edu.cn/anaconda
default_channels:
  - https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
  - https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/free
  - https:...
Original · 2020-12-11 19:12:29 · 732 views · 0 comments
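Compatible versions of sklearn + numpy + pandas + matplotlib + seaborn
python 3.6
sklearn 0.23.2
numpy 1.19.1
pandas 1.1.3
matplotlib 2.2.3 (note: matplotlib has caused compatibility problems; version 3.3.3 cannot be used)
seaborn 0.9.0
Original · 2020-12-10 16:57:40 · 10456 views · 2 comments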
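A quick way to check an environment against the versions listed above is to compare them with what is installed. The sketch below uses `importlib.metadata`, which needs Python 3.8 or later (newer than the Python 3.6 listed), so treat that as an assumption about your interpreter. The helper name `check_versions` is hypothetical, and `scikit-learn` is the distribution name behind `sklearn`.

```python
from importlib.metadata import PackageNotFoundError, version

# Versions taken from the list above, keyed by distribution name.
PINNED = {
    "scikit-learn": "0.23.2",
    "numpy": "1.19.1",
    "pandas": "1.1.3",
    "matplotlib": "2.2.3",
    "seaborn": "0.9.0",
}


def check_versions(pinned=PINNED):
    """Map each package to (pinned, installed); installed is None if missing."""
    report = {}
    for name, wanted in pinned.items():
        try:
            installed = version(name)
        except PackageNotFoundError:
            installed = None
        report[name] = (wanted, installed)
    return report


report = check_versions()
for name, (wanted, installed) in report.items():
    status = "ok" if installed == wanted else "differs"
    print(f"{name}: pinned {wanted}, installed {installed} ({status})")
```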