File location: /home/training/training_materials/dev1/exercises/spark-application/countjpgs/src/main/scala/stubs/CountJPGs.scala
Edit this file so that it contains the following code:
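package stubs

import org.apache.spark.SparkContext
import org.apache.spark.SparkContext._

object CountJPGs {
  def main(args: Array[String]): Unit = {
    // Require the input path as the first command-line argument
    if (args.length < 1) {
      System.err.println("Usage: CountJPGs <file>")
      System.exit(1)
    }

    // Master URL and application name are supplied by spark-submit
    val sc = new SparkContext()
    val logfile = args(0)
    val weblogs = sc.textFile(logfile)

    // Take the seventh space-separated field (index 6) of each line
    // and keep only the values that contain ".jpg"
    val weblogsJpg = weblogs.map(_.split(' ')(6)).filter(_.contains(".jpg"))
    val weblogsJpgCount = weblogsJpg.count()
    println("JPG Count : " + weblogsJpgCount)

    // Stop the context and return normally so the job exits with status 0
    sc.stop()
  }
}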
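The map/filter step in the job above can be tried without a cluster. The following is a minimal sketch that applies the same split-and-filter logic to a plain Scala collection. The object name `JpgFilterSketch`, the helper `countJpgs`, and the three sample log lines are hypothetical and only illustrate an Apache-style access log layout in which index 6 is the requested path.

```scala
object JpgFilterSketch {
  // Hypothetical sample lines in an Apache-style access log layout (not real data).
  val sampleLines: Seq[String] = Seq(
    """10.0.0.1 - 128 [15/Sep/2013:23:59:53 +0100] "GET /KBDOC-00031.html HTTP/1.0" 200 1388""",
    """10.0.0.2 - 129 [15/Sep/2013:23:59:54 +0100] "GET /images/logo.jpg HTTP/1.0" 200 5120""",
    """10.0.0.3 - 130 [15/Sep/2013:23:59:55 +0100] "GET /banner.jpg HTTP/1.0" 200 2048"""
  )

  // Same logic as the Spark job: take field 6 of each line, count those containing ".jpg".
  def countJpgs(lines: Seq[String]): Long =
    lines.map(_.split(' ')(6)).count(_.contains(".jpg")).toLong

  def main(args: Array[String]): Unit = {
    // Print the extracted field for each line, then the count.
    sampleLines.foreach(line => println(line.split(' ')(6)))
    println("JPG Count : " + countJpgs(sampleLines)) // prints "JPG Count : 2"
  }
}
```

Like the original job, this sketch throws an ArrayIndexOutOfBoundsException on a line with fewer than seven fields, so it assumes well-formed input.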
Change into the /home/training/training_materials/dev1/exercises/spark-application/countjpgs directory and build the project. When the build finishes, the JAR file appears in the target directory:
$ cd /home/training/training_materials/dev1/exercises/spark-application/countjpgs
$ mvn package
Run the program:
$ spark-submit --class stubs.CountJPGs target/countjpgs-1.0.jar /loudacre/weblogs/*66
The job prints the JPG count to the console.
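Addendum: to submit the program to a YARN cluster instead, pass a master option. The yarn-client master value shown here is deprecated as of Spark 2.0, where the equivalent is --master yarn --deploy-mode client.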
$ spark-submit --class stubs.CountJPGs --master yarn-client target/countjpgs-1.0.jar /loudacre/weblogs/*66