咖啡豆丁 - CSDN Blog

Original: Python singleton pattern

def __new__(cls, *args, **kwargs):
    '''Singleton pattern'''
    if not hasattr(cls, 'instance'):
        cls.instance = super(CreateFuZhuJianChaRes, cls).__new__(cls)
    return cls.instance
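The snippet above only shows the `__new__` method. A minimal self-contained sketch of the same idea follows; the class name `Singleton` is hypothetical and stands in for the author's own class.

```python
class Singleton:
    """Hypothetical class: every instantiation returns the same object."""

    def __new__(cls, *args, **kwargs):
        # Create the instance only on the first call, then reuse it.
        if not hasattr(cls, 'instance'):
            cls.instance = super().__new__(cls)
        return cls.instance


a = Singleton()
b = Singleton()
print(a is b)  # True
```

Note that `__init__`, if the class defines one, still runs on every call, so any state set there is reinitialized each time.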

2022-03-28 10:17:51 438

1. Add Chinese font support
import seaborn as sns
sns.set_style({'font.sans-serif':['simhei', 'Arial']})
2. Set the font size
sns.set(font_scale=1)
3. Display negative values on the axes correctly
plt.rcParams['axes.unicode_minus']=False
4. Scientific notation
def formatnum(x, pos):
    return '$%.1f$x$10^{4}$' % (x/10000)
from ma...

2022-03-11 15:23:08 479

Original: Common pandas operations

1. Convert data types
df.apply(pd.to_numeric,errors='ignore')
2. Convert wide data to long data
pd.melt(df,id_vars=['col3'])
3. Rename columns
df.rename(columns={'col1':'Column 1'},inplace=True)
4. crosstab confusion matrix
pd.crosstab(df['truth'],df['predict'])
5. Join
merge_df=pd.merge(df1,df2,on='col1',how='left')

2022-03-11 15:21:28 351

Original: Using SHAP values

import shap
shap.initjs()
explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X_train)
shap.summary_plot(shap_values, X_train, max_display=80)

2022-03-11 15:19:37 634

Original: Python time operations

print('Timestamp to standard date format:')
from datetime import datetime
t=1640329180
format_time = str(datetime.fromtimestamp(t))
print(format_time)
print('Standard date format to timestamp:')
cday = datetime.strptime('2015-6-1 18:19:59', '%Y-%m-%d %H:%M:%S')
timestamp = cday.timestamp()
print(tim.
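As a rough illustration of the two conversions above, the following sketch round-trips a timestamp through a formatted string. It assumes naive local-time datetimes, as in the snippet, so the printed date depends on the machine's time zone.

```python
from datetime import datetime

t = 1640329180

# Timestamp -> formatted local-time string
text = datetime.fromtimestamp(t).strftime('%Y-%m-%d %H:%M:%S')

# Formatted string -> timestamp (interpreted in the same local time zone)
parsed = datetime.strptime(text, '%Y-%m-%d %H:%M:%S')
restored = int(parsed.timestamp())

print(text)
print(restored == t)
```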

2021-12-24 15:16:09 572 1

Original: Misusing resampling, leading to data leakage

In the resampling setting, a common pitfall is to resample the entire dataset before splitting it into train and test partitions. Note that this is equivalent to resampling both the train and the test partitions.
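One way to avoid the pitfall is to split first and resample only the training partition. The sketch below uses plain random oversampling with the standard library; the function name `split_then_oversample` and its parameters are hypothetical, and a real project would more likely use a resampling library inside a pipeline.

```python
import random
from collections import Counter


def split_then_oversample(labels, test_ratio=0.25, seed=0):
    """Split indices first, then oversample minority classes in train only."""
    rng = random.Random(seed)
    indices = list(range(len(labels)))
    rng.shuffle(indices)
    n_test = int(len(indices) * test_ratio)
    test_idx, train_idx = indices[:n_test], indices[n_test:]

    # Group the training indices by class label.
    by_class = {}
    for i in train_idx:
        by_class.setdefault(labels[i], []).append(i)

    # Duplicate randomly chosen samples until every class matches the largest.
    target = max(len(idx) for idx in by_class.values())
    resampled = []
    for idx in by_class.values():
        resampled.extend(idx)
        resampled.extend(rng.choices(idx, k=target - len(idx)))
    return resampled, test_idx


labels = [0] * 90 + [1] * 10
train_idx, test_idx = split_then_oversample(labels)
print(Counter(labels[i] for i in train_idx))
print(set(train_idx).isdisjoint(test_idx))  # True
```

Because the test indices are fixed before any duplication happens, no copy of a test sample can end up in the training set.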

2021-10-20 14:37:48 113

Original: Installing Bayesian optimization packages

pip install scikit-optimize
pip install hyperopt
pip install -i https://pypi.douban.com/simple bayesian-optimization

2021-10-15 15:15:58 429

Original: PySpark third-party dependency packages

spark-submit \
--master yarn \
--deploy-mode cluster \
--driver-memory 1g \
--num-executors 1 \
--queue default \
--conf spark.yarn.dist.archives=hdfs:///user/xxx/conda_env.zip#python36 \
--conf spark.pyspark.driver.python=./python36/conda_env/bin/python \

2021-09-13 10:47:52 785

Original: Switching the pip install source

pip install xgboost==0.71 -i https://pypi.tuna.tsinghua.edu.cn/simple

2021-09-06 14:34:54 119

Original: Running MySQL statements from the shell

mysql -h IP --port=PORT --database=DATABASE -uUSER -pPASSWORD -e "select * from hello; " > hello.txt

2021-08-26 10:19:32 414

Original: Running a Scala/Java jar with java

java command:
java -Djava.ext.dirs=<directory containing the jars> com.hello arg1 arg2
The jars must include scala-library-2.11.8.jar and so on.

2021-08-24 13:40:28 337

Original: Abbreviations of administrative divisions (including alternative names)

"京"->"北京市"
"津"->"天津市"
"沪"->"上海市"
"渝"->"重庆市"
"蒙"->"内蒙古自治区"
"新"->"新疆维吾尔自治区"
"藏"->"西藏自治区"
"宁"->"宁夏回族自治区"
"桂"->"广西壮族自治区"
"港"->"香港特别行政区"
"澳"->"澳门特别行政区"
"黑"->"黑龙江省"
"吉"->"吉林省"
"辽"->"辽宁省"
"晋"->"山西省"
"冀"->"河北省"
"青"->...

2021-08-19 17:02:21 1390

Original: Viewing YARN logs

yarn logs -applicationId application_id > log 2>&1

2021-08-18 13:59:55 243

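Original: Sorting a 2D array in Java

Arrays.sort(envelopes, new Comparator<int[]>() {
    public int compare(int[] o1, int[] o2) {
        if (o1[0] == o2[0]) {
            // If the first elements of the two arrays are equal, compare their second elements
            return o1[1] - o2[1];
        } else {
            // If the first elements differ, sort in ascending order...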

2021-08-17 20:20:06 156

Original: Common Linux date operations

# Get the date a few days ago
date -d "20210812 -3 days" +"%Y%m%d"
# Get the corresponding date a few months ago
date -d "20210812 -3 month" +"%Y%m%d"
# Get the month a few months ago
first=`date -d "20210803" +"%Y%m"`
month=`date -d "${first}01 -3 month" +"%Y%m"`
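For comparison, the "days ago" and "month N months ago" calculations can be sketched in Python with the standard library. The helper names `days_ago` and `months_ago` are hypothetical, and this is only an illustration of the arithmetic, not a replacement for the `date` commands.

```python
from datetime import datetime, timedelta


def days_ago(yyyymmdd, days):
    """Return the date `days` days before `yyyymmdd`, in YYYYMMDD form."""
    d = datetime.strptime(yyyymmdd, "%Y%m%d") - timedelta(days=days)
    return d.strftime("%Y%m%d")


def months_ago(yyyymm, months):
    """Return the month `months` months before `yyyymm`, in YYYYMM form."""
    year, month = int(yyyymm[:4]), int(yyyymm[4:6])
    # Count months from year 0 so that year boundaries are handled by division.
    total = year * 12 + (month - 1) - months
    return f"{total // 12:04d}{total % 12 + 1:02d}"


print(days_ago("20210812", 3))  # 20210809
print(months_ago("202108", 3))  # 202105
```

Working on the year-month pair mirrors the shell trick of anchoring to the first day of the month, which avoids surprises when the starting day does not exist in the target month.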

2021-08-12 10:56:21 160

Original: Current path in Linux

script_dir=$(cd "$(dirname "${BASH_SOURCE[0]}")"; pwd)

2021-08-10 15:13:13 130

Original: Python Spark submit template

spark-submit \
--master yarn \
--deploy-mode mode \
--driver-memory 2g \
--num-executors 30 \
--executor-memory 6G \
--executor-cores 4 \
--conf spark.shuffle.service.enabled=true \
--conf spark.dynamicAllocation.enabled=true \
--conf spark.dynamicAllocatio

2021-08-10 10:32:46 144

Original: Starting pyspark

pyspark --master yarn \
--deploy-mode client \
--conf spark.default.parallelism=240 \
--queue queue \
--driver-memory 2G \
--executor-memory 6G \
--executor-cores 4 \
--num-executors 30

2021-05-26 15:29:03 882

Original: PyTorch Docker GPU environment setup

I. Make sure nvidia-docker is installed
Check the system version
root@3a7dee2ecfb3:/# cat /etc/issue
Ubuntu 16.04.6 LTS \n \l
Switch the Docker image
systemctl daemon-reload
systemctl restart docker
Check the CUDA and cuDNN versions
[root@bogon pytorch]# cat /usr/local/cuda/version.txt
CUDA Version 9.1.85
[root@bogon pytor.

2021-03-11 20:41:30 364

Original: Java variable sizes

2021-03-11 20:29:34 432

Original: pandas axes

2021-03-11 20:19:33 151

Original: Reading data streams in TensorFlow

2021-03-11 20:18:45 221

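Original: TensorFlow image recognition summary

I. Execution flow of run.py:
1. Call tf.app.run().
2. tf.app.run() calls the main function in this module, which then runs the following steps in order.
3. First, load the preprocessed images.
4. Then define the TensorFlow placeholders, the convolution and pooling layers, the fully connected layer, and the logistic output layer.
5. Finally, define the loss function, the backpropagation step, and some model evaluation metrics.
6. Get a session and initialize all variables; sess.run() starts the training process.
7. After training finishes, load the saved parameters and graph, and predict on new data.
8. sess.close() ..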

2021-03-11 20:16:06 549

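Original: Setting up a virtual environment for the AI platform

Environment requirements:
jupyterhub requires a Python 3.5 or later environment
tensorflow requires a Python 3.5 or later environment
I. Start a container from the tensorflow/tensorflow:1.7.0-gpu-py3 Docker image (purpose: installing the GPU driver)
Container name: ai_platform
Port mapping 8008:8000 for jupyterhub
Port mapping 6008:6006 for tensorboard
Use nvidia-docker
Shared directory mapping: /ai_share/pr...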

2021-03-11 20:07:41 847

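Original: Workflow and reflections on applying deep learning to images

1. Raising the requirement
When a business requirement is raised, its value, background, and application scenario must be made clear, along with the requester's rough expectations for the final result.
2. Building the dataset
Sources of the dataset
The data falls into two cases. The first is data collected under a predefined plan: downstream use of the data was considered early in product design, so the collection format was standardized to some degree, for example the image size, the position of the target to be recognized, and the proportion of the image that the target occupies. The second is data collected without such a plan: data issues were not considered early in product design, so the quality of the collected data cannot be guaranteed. Most of the source data used to build a dataset is generally...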

2021-03-10 19:16:25 362

Original: Getting reproducible results across multiple executions

from sklearn.ensemble import RandomForestClassifier
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
import numpy as np
rng = np.random.RandomState(0)
X, y = make_classification(random_state=rng)
rf = .

2021-01-20 10:41:39 79

Original: Basic statistics in Hive

select max(col), percentile(col,0.75), percentile(col,0.5), percentile(col,0.25), min(col), avg(col), stddev_samp(col) from database.hello;

2020-12-29 16:56:52 217

Original: Creating and deleting a temporary directory in Python

from tempfile import mkdtemp
from shutil import rmtree
cachedir = mkdtemp()
rmtree(cachedir)
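An alternative to calling `mkdtemp` and `rmtree` by hand is the standard library's `tempfile.TemporaryDirectory`, which removes the directory when the `with` block exits. In this sketch the file name `example.txt` is hypothetical.

```python
import os
from tempfile import TemporaryDirectory

with TemporaryDirectory() as cachedir:
    # The directory exists for the duration of the with block.
    path = os.path.join(cachedir, "example.txt")  # hypothetical file name
    with open(path, "w") as f:
        f.write("cached")
    existed_inside = os.path.isdir(cachedir)

# The directory and its contents are removed on exit.
removed_after = not os.path.exists(cachedir)
print(existed_inside, removed_after)  # True True
```

The explicit `mkdtemp` / `rmtree` pair remains useful when the directory has to outlive a single block, for example a cache shared across several function calls.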

2020-12-28 13:17:54 544

Original: Implementing continue in Scala

import util.control.Breaks._
object BreakableTest extends App {
  val sample = List(1,2,3,4,5)
  for (j <- sample) {
    breakable {
      if (j == 2 || j == 4) {
        break()
      }
      println(s"Printed value: $j")
    }
  }
}
Output:
Printed value: 1
Printed value: 3
Printed value: 5

2020-12-24 13:39:09 297

Original: Random sampling from a Hive table

select * from test.hello distribute by rand() sort by rand() limit 200;

2020-12-18 14:38:01 220

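Original: Disabling the map join optimization in Hive

When a large table is joined with a small one, Hive uses a map join by default. The local task requests a huge amount of memory and exits with an error:
Starting to launch local task to process map join; maximum memory = 477626368
Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.mr.MapredLocalTask
Disabling the map join resolves it:
set hive.auto.convert.join=false;...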

2020-12-18 13:57:30 2330

Original: Scientific notation in matplotlib

def formatnum(x, pos):
    return '$%.1f$x$10^{4}$' % (x/10000)
from matplotlib.ticker import FuncFormatter
formatter = FuncFormatter(formatnum)
ax1.yaxis.set_major_formatter(formatter)

2020-12-17 19:43:33 1171

Original: Converting a timestamp to a date in Hive

-- Milliseconds to date
select from_unixtime(cast(1607966692000/1000 as bigint),'yyyy/MM/dd HH:mm:ss');
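The same millisecond-to-date conversion can be sketched in Python. The helper name `millis_to_date` is hypothetical, and the sketch formats in UTC, whereas Hive's from_unixtime may format in the session or server time zone, so the two outputs can differ by the zone offset.

```python
from datetime import datetime, timezone


def millis_to_date(ms, fmt="%Y/%m/%d %H:%M:%S"):
    """Convert a millisecond epoch timestamp to a formatted UTC string."""
    seconds = ms // 1000  # drop the millisecond part, as the cast to bigint does
    return datetime.fromtimestamp(seconds, tz=timezone.utc).strftime(fmt)


print(millis_to_date(1607966692000))  # 2020/12/14 17:24:52
```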

2020-12-17 14:50:48 1654

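Original: Setting up a data mining environment

1. Install Anaconda
2. Create a virtual environment
conda create --name datamining python=3.6
3. Install JupyterLab
conda install -c conda-forge jupyterlab
4. Create a new kernel (JupyterLab ships with ipykernel, so it does not need to be installed)
python -m ipykernel install --user --name=datamining --display-name datamining
5. Install extensions
con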

2020-12-11 19:49:55 673 1

Original: Naming big data jobs

sqoop
--mapreduce-job-name 'hello'
hive
set mapred.job.name=hello;
MR
-D mapreduce.job.name=hello;

2020-12-11 19:24:47 374

Original: Common Linux commands for comparing text

diff b.txt a.txt -y -W 50
comm a.txt b.txt -1 -3

2020-12-11 19:16:25 212

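Original: Common IDEA shortcuts

I. Ctrl shortcuts
Ctrl + F  Find text in the current file
Ctrl + R  Replace text in the current file
Ctrl + Z  Undo
Ctrl + Y  Delete the line at the cursor or the selected lines
Ctrl + X  Cut the line at the cursor or the selection
Ctrl + C  Copy the line at the cursor or the selection
Ctrl + D  Duplicate the line at the cursor or the selection and insert the copy below the cursor position
Ctrl + E  Show the list of recently opened files
Ctrl + N  Find a class file by the entered name/class name
Ctrl + /  Comment out the line at the cursor, depending on the current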

2020-12-11 19:14:58 83

Original: Anaconda Tsinghua mirror

channels:
  - defaults
show_channel_urls: true
channel_alias: https://mirrors.tuna.tsinghua.edu.cn/anaconda
default_channels:
  - https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
  - https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/free
  - https:...

2020-12-11 19:12:29 720

Original: Compatible versions of sklearn + numpy + pandas + matplotlib + seaborn

python 3.6
sklearn 0.23.2
numpy 1.19.1
pandas 1.1.3
matplotlib 2.2.3 (note: matplotlib has had compatibility problems; version 3.3.3 cannot be used)
seaborn 0.9.0

2020-12-10 16:57:40 10354 2

Original: Type conversion in Spark aggregate functions

val zeroValue = collection.mutable.Set[Row]()
val aggStopRDD = stopPairRDD.aggregateByKey(zeroValue)((set,v) => set += v, (setOne,setTwo)=> setOne ++= setTwo)
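To make the roles of the zero value and the two functions concrete, here is a plain-Python simulation of the aggregateByKey idea. It does not use the Spark API; the function `aggregate_by_key`, the helpers, and the sample data are all hypothetical. The point it illustrates is that the accumulator type (a set) can differ from the value type.

```python
def aggregate_by_key(partitions, zero_factory, seq_op, comb_op):
    """Simulate per-partition folding followed by a cross-partition merge."""
    partials = []
    for part in partitions:
        acc = {}
        for key, value in part:
            # Start each key from a fresh zero value within the partition.
            if key not in acc:
                acc[key] = zero_factory()
            acc[key] = seq_op(acc[key], value)
        partials.append(acc)

    merged = {}
    for acc in partials:
        for key, partial in acc.items():
            merged[key] = comb_op(merged[key], partial) if key in merged else partial
    return merged


def add_value(s, v):
    # Plays the role of (set, v) => set += v
    s.add(v)
    return s


def union_sets(a, b):
    # Plays the role of (setOne, setTwo) => setOne ++= setTwo
    a |= b
    return a


partitions = [
    [("stop1", "a"), ("stop1", "b")],
    [("stop1", "a"), ("stop2", "c")],
]
result = aggregate_by_key(partitions, set, add_value, union_sets)
print(result)
```

The first function folds one value into a partition-local set, and the second merges sets from different partitions, which is why the result type is a set of values instead of a single value.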

2020-09-25 15:51:19 227

Introduction to the Flink framework with sample programs

FasterRCNN算法.pptx
