
Data Analysis
qingyu24
Creating a partitioned table in Hive
SET mapreduce.job.queuename=yjy;
SET hive.cli.print.header=TRUE;
set hive.exec.dynamic.partition=true;
set hive.exec.dynamic.partition.mode=nonstrict;
use dm_userimage;
create table dm_userimage.f_u...

Posted 2018-10-12
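The DDL above is cut off; a minimal sketch of what a complete partitioned-table definition looks like, run through spark.sql (the table name f_userimage_demo and its columns are illustrative, not the original's):

from pyspark.sql import SparkSession

spark = SparkSession.builder.enableHiveSupport().getOrCreate()

# Same dynamic-partition settings as above, so later inserts can write
# straight into partitions.
spark.sql("set hive.exec.dynamic.partition=true")
spark.sql("set hive.exec.dynamic.partition.mode=nonstrict")

# Partitioning by etl_date keeps each day's load in its own directory.
spark.sql("""
    CREATE TABLE IF NOT EXISTS dm_userimage.f_userimage_demo (
        id    STRING,
        score DOUBLE
    )
    PARTITIONED BY (etl_date STRING)
    STORED AS ORC
""")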
PySpark: deduplicate on a field, keeping the latest record by time
See the official docs: http://spark.apache.org/docs/latest/api/python/pyspark.sql.html

from pyspark.sql import Window
cj_spouse_false = cj_spouse_false.withColumn("row_number", \
    ...

Posted 2018-11-16
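The snippet is truncated; a self-contained sketch of the whole pattern: rank the rows within each key by time with row_number() over a window, then keep rank 1 (the column names here are illustrative):

from pyspark.sql import SparkSession, Window
from pyspark.sql import functions as F

spark = SparkSession.builder.getOrCreate()

# Toy data: several rows per id, one querydate each.
df = spark.createDataFrame(
    [("a", "2018-01-01", 1), ("a", "2018-06-01", 2), ("b", "2018-03-01", 3)],
    ["id", "querydate", "v"],
)

# Newest querydate first within each id.
w = Window.partitionBy("id").orderBy(F.col("querydate").desc())

latest = (
    df.withColumn("row_number", F.row_number().over(w))
      .filter(F.col("row_number") == 1)   # keep only the newest row per id
      .drop("row_number")
)
latest.show()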
Syncing a Hive table to Elasticsearch
For the preliminary setup, the path consulted was hdfs://cluster/user/hive/warehouse/dm_userimage.db/f_baseinfo_online_t. A few caveats:
1. Queries after the insert can throw type-conversion exceptions (ES vs. Hive type mismatches); when creating the ES-side table, change int to bigint, date to timestamp, and float to double.
2. ...

Posted 2018-11-14
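The usual bridge here is an ES-backed external table via the elasticsearch-hadoop storage handler; a sketch of that DDL issued over PyHive (host, jar path, index name, and columns are all illustrative, and the type widenings follow the caveat above):

from pyhive import hive

conn = hive.Connection(host="hive-host", port=10000, username="user",
                       database="dm_userimage")
cur = conn.cursor()

# The elasticsearch-hadoop connector jar must be visible to Hive.
cur.execute("ADD JAR hdfs:///libs/elasticsearch-hadoop.jar")

# int -> bigint, date -> timestamp, float -> double, per the note above.
cur.execute("""
CREATE EXTERNAL TABLE IF NOT EXISTS f_baseinfo_es (
    id              BIGINT,
    accountopendate TIMESTAMP,
    score           DOUBLE
)
STORED BY 'org.elasticsearch.hadoop.hive.EsStorageHandler'
TBLPROPERTIES ('es.resource' = 'f_baseinfo/online', 'es.nodes' = 'es-host:9200')
""")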
Connecting to Hive from Python with PyHive
from pyhive import hive
conn = hive.Connection(host='1*.30', auth='LDAP', port=10000, username='*', password='*', database='*')
cursor = conn.cursor()
cursor.execute("select * from dp_ods.* where etl_date=...

Posted 2018-11-15
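A complete round trip along the same lines, with the masked values replaced by illustrative ones (host, credentials, and table name are assumptions):

from pyhive import hive

conn = hive.Connection(host="10.0.0.30", port=10000, auth="LDAP",
                       username="user", password="secret", database="dp_ods")
cursor = conn.cursor()
cursor.execute(
    "SELECT * FROM dp_ods.some_table WHERE etl_date = '2018-11-15' LIMIT 10"
)
for row in cursor.fetchall():   # each row comes back as a tuple
    print(row)
cursor.close()
conn.close()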
Launching a PySpark script
Inside the script:

spark = SparkSession.builder.master('yarn-client') \
    .config('spark.executor.instances', 10) \
    .config('spark.executor.cores', 2) \
    .config('spark.executor.memory', '4g') \
    .config('spark.sql.sh...

Posted 2018-11-13
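A complete builder in the same spirit (resource numbers are illustrative; the truncated key is assumed to be spark.sql.shuffle.partitions, which may not match the original):

from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .master("yarn-client")                        # Spark 1.x style; "yarn" plus a deploy mode in 2.x
    .config("spark.executor.instances", 10)
    .config("spark.executor.cores", 2)
    .config("spark.executor.memory", "4g")
    .config("spark.sql.shuffle.partitions", 200)  # assumption: the post's key is cut off
    .enableHiveSupport()                          # needed for the Hive tables used throughout
    .getOrCreate()
)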
Adding per-group row numbers in a Hive query
base_info = spark.sql("select index, type, id, score, idcardinput, banknoteorremit, idcard, \
    accounttype, accountopendate, nameinput, subaccount, usablebalance, querydate as querydate_base, balance as ...

Posted 2018-10-22
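The row-numbering part of the query is cut off; per the title, the usual pattern is row_number() over a partition, sketched here against an illustrative table (assuming the spark session from the previous snippets):

ranked = spark.sql("""
    SELECT id,
           querydate,
           row_number() OVER (PARTITION BY id ORDER BY querydate DESC) AS rn
    FROM dm_userimage.f_baseinfo_online_t
""")
ranked.filter("rn = 1").show()   # e.g. newest row per id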
Jupyter: fix garbled Chinese by setting the default encoding, without redirecting output to the console
# Python 2 only: reload(sys) re-exposes setdefaultencoding, but it also resets
# the std streams, which would send print output to the console instead of the
# notebook -- so save and restore them around the call.
import sys

stdi, stdo, stde = sys.stdin, sys.stdout, sys.stderr
reload(sys)
sys.setdefaultencoding('utf-8')
sys.stdin, sys.stdout, sys.stderr = stdi, stdo, stde

Posted 2018-10-17
Adding HDFS directories to Hive table partitions directly, via MSCK
/home/user_image/hadoop-2.7.2/bin/hadoop fs -mkdir hdfs://cluster/user/hive/warehouse/dm_userimage.db/f_userimage_messageinfo/etl_date=$yesterday
/home/user_image/hadoop-2.7.2/bin/hadoop fs -cp /NS2/...

Posted 2018-10-19
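Once the files sit under the partition directory, the metastore still has to learn about them; MSCK scans the table location and registers any unknown partition directories (a sketch, assuming a Hive-enabled SparkSession):

# Register every etl_date=... directory the metastore has not seen yet.
spark.sql("MSCK REPAIR TABLE dm_userimage.f_userimage_messageinfo")

# Equivalent for a single, known partition:
spark.sql("ALTER TABLE dm_userimage.f_userimage_messageinfo "
          "ADD IF NOT EXISTS PARTITION (etl_date='2018-10-18')")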
PySpark: saving a temp table to Hive
yf_u3.registerTempTable('yf_u3')
spark.sql('DROP TABLE IF EXISTS dm_userimage.yf_u3')
spark.sql('create table dm_userimage.yf_u3 as select * from yf_u3')

To write into a specific partition:

INSERT OVERWRITE TABLE employees PARTI...

Posted 2018-10-16
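The partitioned INSERT above is cut off; a dynamic-partition version of the same idea (target table and columns are illustrative), where the partition column must come last in the SELECT:

spark.sql("set hive.exec.dynamic.partition=true")
spark.sql("set hive.exec.dynamic.partition.mode=nonstrict")
spark.sql("""
    INSERT OVERWRITE TABLE dm_userimage.yf_u3_part PARTITION (etl_date)
    SELECT id, score, etl_date   -- partition column last
    FROM yf_u3
""")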
Loading HDFS data directly into a partition
load data inpath '/user/yjy_research/zhangyd/phonelist_result/2017-01-01' overwrite into table f_userimage_phonelist partition(etl_date='2017-01-01');

Posted 2018-10-12
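The same LOAD DATA, scripted over several days from PySpark (a sketch; note that LOAD DATA INPATH moves the source files into the table's directory rather than copying them):

for day in ["2017-01-01", "2017-01-02", "2017-01-03"]:
    spark.sql(
        "LOAD DATA INPATH '/user/yjy_research/zhangyd/phonelist_result/{0}' "
        "OVERWRITE INTO TABLE f_userimage_phonelist "
        "PARTITION (etl_date='{0}')".format(day)
    )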
spark-submit
export PYSPARK_PYTHON=*/bin/python

$SPARK_HOME/bin/spark-submit \
    --conf "spark.yarn.dist.archives=/home/anaconda2.tar.gz" \
    --conf "spark.executorEnv.PYSPARK_PYTHON=/home/anaconda2.tar.gz/anaconda...

Posted 2018-12-28
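The same kind of submission, driven from Python with subprocess; everything here is illustrative (the '#py2env' suffix is Spark's unpack alias for the shipped archive, and my_job.py stands in for the real script):

import os
import subprocess

cmd = [
    "spark-submit",
    "--master", "yarn",
    # Ship the conda environment to the cluster; '#py2env' is the unpack alias.
    "--conf", "spark.yarn.dist.archives=/home/anaconda2.tar.gz#py2env",
    # Executors use the Python inside the unpacked archive.
    "--conf", "spark.executorEnv.PYSPARK_PYTHON=py2env/anaconda2/bin/python",
    "my_job.py",   # hypothetical driver script
]

env = os.environ.copy()
env["PYSPARK_PYTHON"] = "/home/anaconda2/bin/python"   # driver-side interpreter
subprocess.run(cmd, env=env, check=True)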