Here is how to use sqoop to transfer data to MySQL on a schedule.
Process: aggregate the log data stored in a Hive table into daily, hourly, and minute-level summaries, then sync all three result sets to MySQL.
Prerequisites: the Hive tables and the MySQL tables have already been created.
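For reference, the MySQL target tables could look roughly like the following. The table names, key columns, and database come from the scripts below, but the column types (and the name update_time for the unaliased timestamp column) are assumptions to adjust for your data. The primary keys deliberately match the --update-key columns used in step 2.

mysql -h192.168.168.1 -uhive -p edw <<'EOF'
-- Daily table: primary key matches --update-key log_date
CREATE TABLE IF NOT EXISTS t_app_log_count_d (
  log_date DATE NOT NULL, pv BIGINT, uv BIGINT, p_u_rate DECIMAL(10,2),
  ip BIGINT, update_time DATETIME, PRIMARY KEY (log_date)
);
-- Hourly table: composite key matches --update-key log_date,log_hour
CREATE TABLE IF NOT EXISTS t_app_log_count_h (
  log_date DATE NOT NULL, log_hour INT NOT NULL, pv BIGINT, uv BIGINT,
  p_u_rate DECIMAL(10,2), ip BIGINT, update_time DATETIME,
  PRIMARY KEY (log_date, log_hour)
);
-- Minute table: composite key matches --update-key log_date,log_hour,log_minute
CREATE TABLE IF NOT EXISTS t_app_log_count_m (
  log_date DATE NOT NULL, log_hour INT NOT NULL, log_minute INT NOT NULL,
  pv BIGINT, uv BIGINT, p_u_rate DECIMAL(10,2), ip BIGINT, update_time DATETIME,
  PRIMARY KEY (log_date, log_hour, log_minute)
);
EOF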
The executable sqoop and Hive statements are wrapped in shell scripts.
1. Aggregation script etl.sh
#!/bin/bash
. ~/.bash_profile
# Partition values (yyyyMMdd) and window bounds (yyyy-MM-dd); today_d is
# shifted back one hour so a run just after midnight still covers the previous day.
today=$(date '+%Y%m%d')
yesterday=$(date -d '-1 day' '+%Y%m%d')
today_d=$(date -d '-1 hour' '+%Y-%m-%d')
tomorrow_d=$(date -d '1 day' '+%Y-%m-%d')
hive <<EOF
-- Daily PV/UV summary
INSERT OVERWRITE TABLE result.t_hive_log_count_d
SELECT to_date(time) AS log_date, COUNT(a.page_url) AS pv, COUNT(DISTINCT a.uuid) AS uv,
       round(COUNT(a.page_url)/COUNT(DISTINCT a.uuid),2) AS p_u_rate, COUNT(DISTINCT a.ip) AS ip, from_unixtime(unix_timestamp(),'yyyy-MM-dd HH:mm:ss')
FROM default.t_ft_piwik_log a WHERE day IN ('$yesterday','$today') AND time >= '$today_d' AND time < '$tomorrow_d'
GROUP BY to_date(time);
-- Hourly summary: same aggregates, grouped by hour as well
INSERT OVERWRITE TABLE result.t_hive_log_count_h
SELECT to_date(time) AS log_date, hour(time) AS log_hour, COUNT(a.page_url) AS pv, COUNT(DISTINCT a.uuid) AS uv,
       round(COUNT(a.page_url)/COUNT(DISTINCT a.uuid),2) AS p_u_rate, COUNT(DISTINCT a.ip) AS ip, from_unixtime(unix_timestamp(),'yyyy-MM-dd HH:mm:ss')
FROM default.t_ft_piwik_log a WHERE day IN ('$yesterday','$today') AND time >= '$today_d' AND time < '$tomorrow_d'
GROUP BY to_date(time), hour(time);
-- Minute summary: grouped down to the minute
INSERT OVERWRITE TABLE result.t_hive_log_count_m
SELECT to_date(time) AS log_date, hour(time) AS log_hour, minute(time) AS log_minute, COUNT(a.page_url) AS pv, COUNT(DISTINCT a.uuid) AS uv,
       round(COUNT(a.page_url)/COUNT(DISTINCT a.uuid),2) AS p_u_rate, COUNT(DISTINCT a.ip) AS ip, from_unixtime(unix_timestamp(),'yyyy-MM-dd HH:mm:ss')
FROM default.t_ft_piwik_log a WHERE day IN ('$yesterday','$today') AND time >= '$today_d' AND time < '$tomorrow_d'
GROUP BY to_date(time), hour(time), minute(time);
EOF
exit
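After etl.sh runs, it can be worth spot-checking the aggregates in Hive before exporting them. A minimal check (assuming hive is on the PATH, as in the script above):

# Peek at the freshly written daily summary
hive -e "SELECT * FROM result.t_hive_log_count_d LIMIT 10;"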
2. Hive-to-MySQL export script hive_to_mysql.sh
Note: the columns given to --update-key must be the primary key defined on the MySQL table (see the table sketch in the prerequisites above); otherwise incremental upserts by key will not work.
#!/bin/bash
. ~/.bash_profile
# Upsert each summary into MySQL; the --update-key columns must match each table's primary key.
sqoop-export --connect jdbc:mysql://192.168.168.1:3306/edw --username hive --password hive --table t_app_log_count_d \
  --update-key log_date --update-mode allowinsert --export-dir /user/hive/warehouse/result.db/t_hive_log_count_d --input-fields-terminated-by '\001'
sqoop-export --connect jdbc:mysql://192.168.168.1:3306/edw --username hive --password hive --table t_app_log_count_h \
  --update-key log_date,log_hour --update-mode allowinsert --export-dir /user/hive/warehouse/result.db/t_hive_log_count_h --input-fields-terminated-by '\001'
sqoop-export --connect jdbc:mysql://192.168.168.1:3306/edw --username hive --password hive --table t_app_log_count_m \
  --update-key log_date,log_hour,log_minute --update-mode allowinsert --export-dir /user/hive/warehouse/result.db/t_hive_log_count_m --input-fields-terminated-by '\001'
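Once the export finishes, a quick query confirms the rows actually landed in MySQL (same connection details as the sqoop commands; the query itself is just an illustrative check):

mysql -h192.168.168.1 -uhive -p -e "SELECT * FROM edw.t_app_log_count_d ORDER BY log_date DESC LIMIT 5;"

One caveat: passing --password on the command line leaves the credential visible in the process list; Sqoop also supports -P (interactive prompt) and --password-file, which are safer choices in production.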
3. Wrapper script log.sh that runs scripts 1 and 2 in order
#!/bin/bash
. ~/.bash_profile
# Aggregate in Hive first, then export to MySQL; each step appends to its own log.
sh /hdfs/script/piwiklog/etl.sh >> /hdfs/script/piwiklog/logs/etl.log
sh /hdfs/script/piwiklog/hive_to_mysql.sh >> /hdfs/script/piwiklog/logs/hive_to_mysql.log
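A slightly more defensive variant of log.sh (a sketch using the same paths as above): it also captures stderr into the logs and skips the MySQL export when the ETL step fails, assuming etl.sh propagates hive's exit status.

#!/bin/bash
. ~/.bash_profile
# Run the Hive aggregation first; export to MySQL only if it succeeded.
if sh /hdfs/script/piwiklog/etl.sh >> /hdfs/script/piwiklog/logs/etl.log 2>&1; then
  sh /hdfs/script/piwiklog/hive_to_mysql.sh >> /hdfs/script/piwiklog/logs/hive_to_mysql.log 2>&1
else
  echo "$(date '+%F %T') etl.sh failed, skipping export" >> /hdfs/script/piwiklog/logs/etl.log
fi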
4. Add script 3 to crontab and it will run on a schedule, at minute 06 of every hour:
# Aggregate daily/hourly/minute PV and UV and sync to MySQL
06 * * * * sh /hdfs/script/piwiklog/log.sh
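Install the entry with crontab -e. If a run could ever take longer than an hour, wrapping the command in flock prevents overlapping executions; a hedged variant (the lock file path /tmp/piwiklog.lock is an arbitrary choice):

06 * * * * flock -n /tmp/piwiklog.lock sh /hdfs/script/piwiklog/log.sh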