Automated script for daily scheduled imports into the Hive data warehouse

Create a shell script that creates a temporary table, loads the data, and then moves it into the permanent partitioned table:
```sh
#!/bin/sh
# upload logs to hdfs
yesterday=`date --date='1 days ago' +%Y%m%d`

hive -e "
use stage;
create table tracklog_tmp (
    dateday string,
    datetime string,
    ip string,
    cookieid string,
    userid string,
    logserverip string,
    referer string,
    requesturl string,
    remark1 string,
    remark2 string,
    alexaflag string,
    ua string,
    wirelessflag string
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ' ';"

hive -e "
use stage;
set hive.enforce.bucketing=true;
set hive.exec.compress.output=true;
set mapred.output.compress=true;
set mapred.output.compression.codec=org.apache.hadoop.io.compress.GzipCodec;
set io.compression.codecs=org.apache.hadoop.io.compress.GzipCodec;
load data local inpath '/diskg/hexunlogs/tracklog_10.0.251.146/${yesterday}/${yesterday}??.dat' overwrite into table tracklog_tmp;
insert into table tracklog PARTITION (day='${yesterday}') select * from tracklog_tmp;
load data local inpath '/diskg/hexunlogs/tracklog_10.0.121.74/${yesterday}/${yesterday}??.dat' overwrite into table tracklog_tmp;
insert into table tracklog PARTITION (day='${yesterday}') select * from tracklog_tmp;
load data local inpath '/diskg/hexunlogs/tracklog_10.0.190.13/${yesterday}/${yesterday}??.dat' overwrite into table tracklog_tmp;
insert into table tracklog PARTITION (day='${yesterday}') select * from tracklog_tmp;
load data local inpath '/diskg/hexunlogs/trackloguc_10.0.251.146/${yesterday}/${yesterday}??.dat' overwrite into table tracklog_tmp;
insert into table tracklog PARTITION (day='${yesterday}') select * from tracklog_tmp;
load data local inpath '/diskg/hexunlogs/trackloguc_10.0.121.74/${yesterday}/${yesterday}??.dat' overwrite into table tracklog_tmp;
insert into table tracklog PARTITION (day='${yesterday}') select * from tracklog_tmp;
load data local inpath '/diskg/hexunlogs/trackloguc_10.0.190.13/${yesterday}/${yesterday}??.dat' overwrite into table tracklog_tmp;
insert into table tracklog PARTITION (day='${yesterday}') select * from tracklog_tmp;
"

hive -e "
use stage;
drop table tracklog_tmp;"

hive -e "
set hive.enforce.bucketing=true;
set hive.exec.compress.output=true;
set mapred.output.compress=true;
set mapred.output.compression.codec=org.apache.hadoop.io.compress.GzipCodec;
set io.compression.codecs=org.apache.hadoop.io.compress.GzipCodec;
insert into table ods.tracklog PARTITION (day='${yesterday}')
select dateday, datetime, ip, cookieid, userid, logserverip, referer,
       requesturl, remark1, remark2, alexaflag, ua, wirelessflag
from stage.tracklog where day='${yesterday}' and length(datetime)=12;"
```
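The `yesterday` variable drives every load path in the script. A minimal sketch of how the date stamp is built and how the `??` glob in the `load data local inpath` lines resolves (the log directory below is the one assumed in the script):

```shell
#!/bin/sh
# Build yesterday's date stamp exactly as the import script does (GNU date).
yesterday=$(date --date='1 days ago' +%Y%m%d)

# The stamp is always 8 digits, e.g. 20151013.
echo "$yesterday" | grep -Eq '^[0-9]{8}$' && echo "stamp ok: $yesterday"

# Each load line then globs the hourly .dat files for that day,
# e.g. 2015101300.dat .. 2015101323.dat:
echo "/diskg/hexunlogs/tracklog_10.0.251.146/${yesterday}/${yesterday}??.dat"
```

Because the string passed to `hive -e` is double-quoted, `${yesterday}` is expanded by the shell before Hive ever sees the statement.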
Add a scheduled job in crontab:
crontab -e   # opens the crontab in an editor
Add the following line:
#import tracklog
25 07 * * * /opt/bin/hive_opt/import_tracklog.sh
Reload the cron configuration:
/sbin/service crond reload
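As an alternative to editing interactively with `crontab -e`, the entry can be appended non-interactively. A sketch using the script path from above (the `crontab` invocations are shown as comments since they modify the live crontab):

```shell
#!/bin/sh
# The cron entry for the daily import (script path as in the section above).
entry='25 07 * * * /opt/bin/hive_opt/import_tracklog.sh'

# Append it without clobbering existing entries, then verify:
#   ( crontab -l 2>/dev/null; echo "$entry" ) | crontab -
#   crontab -l | grep import_tracklog

# Quick sanity check: five time fields plus the command makes six columns.
echo "$entry" | awk '{print NF}'
```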
https://blog.youkuaiyun.com/kwu_ganymede/article/details/49095573
--------------crontab-------------
{
Basic format:
* * * * * command
minute hour day-of-month month day-of-week command
}
45 4 1,10,22 * * /usr/local/etc/rc.d/lighttpd restart
The example above restarts lighttpd at 4:45 on the 1st, 10th, and 22nd of every month.
0 */1 * * * /usr/local/etc/rc.d/lighttpd restart
Restarts lighttpd every hour.
0 23,0-7 * * * /usr/local/etc/rc.d/lighttpd restart
Restarts lighttpd every hour between 11 PM and 7 AM (standard cron does not accept a wrapped range such as 23-7, so the hours are written as 23,0-7).
Reference: https://blog.youkuaiyun.com/bsf5521/article/details/76522222
----------java quartz--------------
Java Quartz cron expressions add a seconds field:
{
Basic format:
* * * * * * command
second minute hour day-of-month month day-of-week command
}
(A Quartz expression itself contains only the time fields — the command column is shown here just for symmetry with crontab — and an optional seventh field for the year is also supported.)
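For comparison, the daily 07:25 crontab entry above written as a Quartz cron expression (a leading seconds field is added, and day-of-week takes `?` when day-of-month is specified):

```
crontab:  25 07 * * *     fires daily at 07:25
Quartz:   0 25 7 * * ?    same schedule, with seconds = 0
```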