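Azkaban usage example
Write a flow of multiple jobs with dependencies:
job1: upload the input files for the inverted index to HDFS
job2 (depends on job1): run the first processing pass of the inverted index
job3 (depends on job2): compute the final inverted index
For background on inverted indexes, see https://blog.youkuaiyun.com/qq_40249304/article/details/93322984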
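The dependency chain above determines the order in which the jobs run. The following is a minimal sketch of that ordering, not Azkaban's actual scheduler: it reads the `dependencies=` line from each job definition and sorts the jobs topologically. The `JOB_FILES` dictionary and the helper names `parse_dependencies` and `execution_order` are hypothetical and exist only for illustration; `graphlib` requires Python 3.9 or later.

```python
from graphlib import TopologicalSorter

# Hypothetical in-memory copies of the three job definitions.
# Only the dependencies= line matters for ordering, so the commands are omitted.
JOB_FILES = {
    "job1": "type=command\n",
    "job2": "type=command\ndependencies=job1\n",
    "job3": "type=command\ndependencies=job2\n",
}


def parse_dependencies(text):
    """Return the job names listed on the dependencies= line, or an empty list."""
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        if key.strip() == "dependencies":
            # Multiple dependencies are comma-separated.
            return [name.strip() for name in value.split(",") if name.strip()]
    return []


def execution_order(job_files):
    """Order the jobs so that every job comes after the jobs it depends on."""
    graph = {name: parse_dependencies(text) for name, text in job_files.items()}
    return list(TopologicalSorter(graph).static_order())


if __name__ == "__main__":
    print(execution_order(JOB_FILES))  # prints ['job1', 'job2', 'job3']
```

Because each job has exactly one predecessor, the order is fully determined: job1 uploads the data, job2 consumes it, and job3 consumes the output of job2.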
# job1: upload the files
type=command
command=/opt/module/hadoop-3.1.2/bin/hadoop fs -mkdir -p /index/data
command.1=/opt/module/hadoop-3.1.2/bin/hadoop fs -put /opt/test/index/data/a.txt /index/data/
command.2=/opt/module/hadoop-3.1.2/bin/hadoop fs -put /opt/test/index/data/b.txt /index/data/
command.3=/opt/module/hadoop-3.1.2/bin/hadoop fs -put /opt/test/index/data/c.txt /index/data/
# job2: inverted index, step 1
type=command
command=/opt/module/hadoop-3.1.2/bin/hadoop jar /opt/test/index/sparkStudy-1.0-SNAPSHOT.jar day4_jobs_input.jobs.index.IndexDrive1 /index/data/* /index/indexOut1
dependencies=job1
# job3: inverted index, step 2
type=command
command=/opt/module/hadoop-3.1.2/bin/hadoop jar /opt/test/index/sparkStudy-1.0-SNAPSHOT.jar day4_jobs_input.jobs.index.IndexDrive2 /index/indexOut1 /index/indexOut2
dependencies=job2
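Before the flow can run, the job definitions have to be uploaded to an Azkaban project as a zip archive. The sketch below shows one way to build that archive, assuming the three definitions above are saved as `job1.job`, `job2.job`, and `job3.job` in a single directory (file names inferred from the `dependencies=` lines). The function name `package_flow` and the paths in the usage example are hypothetical.

```python
import zipfile
from pathlib import Path


def package_flow(job_dir, zip_path):
    """Zip every .job file in job_dir and return the names that were added."""
    job_files = sorted(Path(job_dir).glob("*.job"))
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for job_file in job_files:
            # Store each file at the archive root, without its directory prefix.
            archive.write(job_file, arcname=job_file.name)
    return [job_file.name for job_file in job_files]


if __name__ == "__main__":
    # Assumed locations; adjust to wherever the .job files actually live.
    added = package_flow("index_flow", "index_flow.zip")
    print(added)
```

The resulting zip can then be uploaded to a project through the Azkaban web UI and the flow executed from there.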
Result:
[Screenshot of the Azkaban execution result (azkaban3结果.png); image not available]