Using Sqoop

This post describes a method for aggregating log data stored in Hive into daily, hourly, and per-minute summaries with a shell script, then syncing the results to a MySQL database via Sqoop. The data is first aggregated with Hive SQL, then migrated on a schedule using Sqoop's export function.


Below is the procedure for transferring the data to MySQL on a schedule with Sqoop.

Process: aggregate the log data stored in the Hive table into daily, hourly, and per-minute summaries, then sync the three result sets to MySQL.

Prerequisites: the Hive result tables and the MySQL target tables have already been created.
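For reference, the daily MySQL target table could be defined roughly as follows. The column list is inferred from the Hive SELECT in step 1; the column types and the update_time column name are assumptions, so adjust them to the actual schema:

-- Hypothetical DDL for the daily target table (types are assumptions)
CREATE TABLE t_app_log_count_d (
    log_date    DATE          NOT NULL,
    pv          BIGINT,
    uv          BIGINT,
    p_u_rate    DECIMAL(10,2),
    ip          BIGINT,
    update_time DATETIME,
    PRIMARY KEY (log_date)   -- required for --update-key log_date (see step 2)
);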

Wrap the executable Sqoop statements in shell scripts.

1. Aggregation script: etl.sh
#!/bin/bash
. ~/.bash_profile
# The job runs a few minutes past each hour, so the "current" day is taken
# as of one hour ago; a run just after midnight still re-aggregates the
# previous day.
today=$(date '+%Y%m%d')
yesterday=$(date -d '-1 day' '+%Y%m%d')
today_d=$(date -d '-1 hour' '+%Y-%m-%d')
tomorrow_d=$(date -d '1 day' '+%Y-%m-%d')

hive <<EOF
-- Daily summary: pv, uv, pv/uv ratio, distinct IPs, plus a load timestamp
insert overwrite table result.t_hive_log_count_d
SELECT to_date(time) AS log_date, COUNT(a.page_url) AS pv, COUNT(DISTINCT a.uuid) AS uv,
       round(COUNT(a.page_url)/COUNT(DISTINCT a.uuid),2) AS p_u_rate, COUNT(DISTINCT a.ip) AS ip,
       from_unixtime(unix_timestamp(),'yyyy-MM-dd HH:mm:ss')
FROM default.t_ft_piwik_log a
WHERE day IN ('$yesterday','$today') AND time >= '$today_d' AND time < '$tomorrow_d'
GROUP BY to_date(time);

-- Hourly summary
insert overwrite table result.t_hive_log_count_h
SELECT to_date(time) AS log_date, hour(time) AS log_hour, COUNT(a.page_url) AS pv,
       COUNT(DISTINCT a.uuid) AS uv, round(COUNT(a.page_url)/COUNT(DISTINCT a.uuid),2) AS p_u_rate,
       COUNT(DISTINCT a.ip) AS ip, from_unixtime(unix_timestamp(),'yyyy-MM-dd HH:mm:ss')
FROM default.t_ft_piwik_log a
WHERE day IN ('$yesterday','$today') AND time >= '$today_d' AND time < '$tomorrow_d'
GROUP BY to_date(time), hour(time);

-- Per-minute summary
insert overwrite table result.t_hive_log_count_m
SELECT to_date(time) AS log_date, hour(time) AS log_hour, minute(time) AS log_minute,
       COUNT(a.page_url) AS pv, COUNT(DISTINCT a.uuid) AS uv,
       round(COUNT(a.page_url)/COUNT(DISTINCT a.uuid),2) AS p_u_rate, COUNT(DISTINCT a.ip) AS ip,
       from_unixtime(unix_timestamp(),'yyyy-MM-dd HH:mm:ss')
FROM default.t_ft_piwik_log a
WHERE day IN ('$yesterday','$today') AND time >= '$today_d' AND time < '$tomorrow_d'
GROUP BY to_date(time), hour(time), minute(time);
EOF
exit
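To sanity-check the aggregation before wiring up the export, the result tables can be queried directly from the command line (a quick manual check, not part of the scheduled flow):

hive -e "SELECT * FROM result.t_hive_log_count_d ORDER BY log_date DESC LIMIT 7;"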

2. Hive-to-MySQL export script: hive_to_mysql.sh
   Note: the columns passed to --update-key must be defined as the primary
   key in MySQL, otherwise incremental updates by key will not work
   (a key-definition sketch follows the script).
#!/bin/bash
. ~/.bash_profile
# Export each summary table; '\001' is Hive's default field delimiter.
sqoop-export --connect jdbc:mysql://192.168.168.1:3306/edw --username hive --password hive \
  --table t_app_log_count_d --update-key log_date --update-mode allowinsert \
  --export-dir /user/hive/warehouse/result.db/t_hive_log_count_d --input-fields-terminated-by '\001'

sqoop-export --connect jdbc:mysql://192.168.168.1:3306/edw --username hive --password hive \
  --table t_app_log_count_h --update-key log_date,log_hour --update-mode allowinsert \
  --export-dir /user/hive/warehouse/result.db/t_hive_log_count_h --input-fields-terminated-by '\001'

sqoop-export --connect jdbc:mysql://192.168.168.1:3306/edw --username hive --password hive \
  --table t_app_log_count_m --update-key log_date,log_hour,log_minute --update-mode allowinsert \
  --export-dir /user/hive/warehouse/result.db/t_hive_log_count_m --input-fields-terminated-by '\001'
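Matching the --update-key columns above, the MySQL keys for the hourly and minute tables could be set up as follows if those tables were created without keys (a sketch; the daily table's key was already shown in the DDL earlier):

ALTER TABLE t_app_log_count_h ADD PRIMARY KEY (log_date, log_hour);
ALTER TABLE t_app_log_count_m ADD PRIMARY KEY (log_date, log_hour, log_minute);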
3. Combine scripts 1 and 2 into log.sh
#!/bin/bash
. ~/.bash_profile
# Run the aggregation first, then the export; capture stderr in the logs too.
sh /hdfs/script/piwiklog/etl.sh >> /hdfs/script/piwiklog/logs/etl.log 2>&1
sh /hdfs/script/piwiklog/hive_to_mysql.sh >> /hdfs/script/piwiklog/logs/hive_to_mysql.log 2>&1
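As a possible refinement (a sketch, not part of the original setup), the export can be skipped when the aggregation fails, so stale data is never pushed to MySQL:

#!/bin/bash
. ~/.bash_profile
# Sketch: only export when the aggregation step exits cleanly.
if sh /hdfs/script/piwiklog/etl.sh >> /hdfs/script/piwiklog/logs/etl.log 2>&1; then
    sh /hdfs/script/piwiklog/hive_to_mysql.sh >> /hdfs/script/piwiklog/logs/hive_to_mysql.log 2>&1
else
    echo "$(date '+%F %T') etl.sh failed; export skipped" >> /hdfs/script/piwiklog/logs/etl.log
fi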

4. Add script 3 to a cron job so it runs automatically at minute 6 of every hour:
# Aggregate daily/hourly/per-minute pv and uv, then sync to MySQL
06 * * * * sh /hdfs/script/piwiklog/log.sh
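To install the entry, edit the crontab of the user that owns the scripts (standard crontab usage):

crontab -e    # paste the line above, save, and exit
crontab -l    # verify that the entry is active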