sparkstream-kafka



I recently wanted to set up real-time monitoring with Spark Streaming, with Spark (1.2.0-cdh5.3.0) reading from Kafka. The steps below use the word count example:




1. Get Kafka running by following the quick start tutorial on the official Kafka site:




http://kafka.apache.org/




2. Create a Maven project in Eclipse (any project will do) and add the following to pom.xml:


<dependencies>
  <dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-streaming-kafka_2.10</artifactId>
    <version>1.2.0</version>
  </dependency>
</dependencies>




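3. Update the dependencies from the pom, then copy all the Maven dependencies that were just downloaded into a new lib folder under the working directory. The jars are listed below; note that username is your own user name: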

/home/username/.m2/repository/org/apache/spark/spark-streaming-kafka_2.10/1.2.0/spark-streaming-kafka_2.10-1.2.0.jar
/home/username/.m2/repository/org/apache/kafka/kafka_2.10/0.8.0/kafka_2.10-0.8.0.jar
/home/username/.m2/repository/org/scala-lang/scala-library/2.10.1/scala-library-2.10.1.jar
/home/username/.m2/repository/log4j/log4j/1.2.15/log4j-1.2.15.jar
/home/username/.m2/repository/javax/mail/mail/1.4/mail-1.4.jar
/home/username/.m2/repository/javax/activation/activation/1.1/activation-1.1.jar
/home/username/.m2/repository/org/scala-lang/scala-compiler/2.10.1/scala-compiler-2.10.1.jar
/home/username/.m2/repository/org/scala-lang/scala-reflect/2.10.1/scala-reflect-2.10.1.jar
/home/username/.m2/repository/com/101tec/zkclient/0.3/zkclient-0.3.jar
/home/username/.m2/repository/org/xerial/snappy/snappy-java/1.0.4.1/snappy-java-1.0.4.1.jar
/home/username/.m2/repository/com/yammer/metrics/metrics-core/2.2.0/metrics-core-2.2.0.jar
/home/username/.m2/repository/org/slf4j/slf4j-api/1.7.2/slf4j-api-1.7.2.jar
/home/username/.m2/repository/com/yammer/metrics/metrics-annotation/2.2.0/metrics-annotation-2.2.0.jar
/home/username/.m2/repository/org/spark-project/spark/unused/1.0.0/unused-1.0.0.jar




3-1. Also put spark-assembly-1.2.0-cdh5.3.0-hadoop2.5.0....jar in the lib folder




4. Create a new Scala project in Eclipse and reference every jar in lib




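5. Create the Scala file:

Check the parameters in it (topic, ZooKeeper address, consumer group) and adjust them for your own setup.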

             

import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka.KafkaUtils
// Brings reduceByKey into scope for pair DStreams on Spark 1.2
import org.apache.spark.streaming.StreamingContext._

object CollectBykafka {

  def main(args: Array[String]) {

    println("Start to run SparkStreamingKafkaWordCount")
    val conf = new SparkConf().setAppName("SparkStreamingKafkaWordCount")
    // One batch every 10 seconds
    val ssc = new StreamingContext(conf, Seconds(10))
    // Topic names separated by ":"; each topic is read with 1 consumer thread
    val topicMap = "test".split(":").map((_, 1)).toMap

    // ZooKeeper address used by the Kafka consumer
    val zkQuorum = "localhost:2181"
    // consumer group
    val group = "test-consumer-group"
    // The stream yields (key, message) pairs; keep only the message
    val lines = KafkaUtils.createStream(ssc, zkQuorum, group, topicMap).map(_._2)

    lines.print()

    val words = lines.flatMap(_.split(" "))
    val pairs = words.map { x => (x, 1) }
    val wordCounts = pairs.reduceByKey(_ + _)

    wordCounts.print()

    ssc.start()
    ssc.awaitTermination()
  }
}
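
To make the per-batch computation easier to follow, here is a minimal sketch that applies the same steps to an ordinary Scala collection instead of a DStream, so it runs without Spark or Kafka. The names WordCountSketch, topicMap and countWords and the sample lines are made up for illustration, and groupBy stands in for reduceByKey, which only exists on Spark's pair collections.

```scala
object WordCountSketch {
  // Same expression as in the job: "a:b" would give Map("a" -> 1, "b" -> 1).
  // In KafkaUtils.createStream the Int is the number of consumer threads per topic.
  def topicMap(topics: String): Map[String, Int] =
    topics.split(":").map((_, 1)).toMap

  // Local stand-in for flatMap + map + reduceByKey on one batch of lines.
  def countWords(lines: Seq[String]): Map[String, Int] =
    lines
      .flatMap(_.split(" "))
      .map(word => (word, 1))
      .groupBy(_._1)
      .map { case (word, pairs) => (word, pairs.map(_._2).sum) }

  def main(args: Array[String]): Unit = {
    println(topicMap("test"))
    println(countWords(Seq("hello spark", "hello kafka")))
  }
}
```

In the streaming job the same counting is done independently for each 10-second batch, so the counts printed by wordCounts.print() are not accumulated across batches.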

6. Package the project as a jar and put it in the working directory; it must be in the same directory as lib




7. Submit the job from the working directory (dataCollect.jar is the jar from step 6):

spark-submit --class CollectBykafka --driver-class-path $(echo ./lib/*.jar | sed 's/ /:/g') dataCollect.jar




Note: the files in lib must be passed to Spark through the --driver-class-path parameter
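
The shell substitution in step 7 expands ./lib/*.jar into a space-separated list and then replaces every space with a colon, which also means it would break on paths that contain spaces. As a rough sketch of the same idea in Scala, the following lists the jars in a directory and joins them into a classpath string. ClasspathSketch, joinJars and jarsIn are hypothetical helpers for illustration, not part of Spark.

```scala
import java.io.File

object ClasspathSketch {
  // Join jar paths with ":", the classpath separator on Linux and macOS.
  def joinJars(jars: Seq[String]): String = jars.mkString(":")

  // List the *.jar files directly under dir, sorted for a stable order.
  // listFiles() returns null when dir does not exist, hence the Option.
  def jarsIn(dir: File): Seq[String] =
    Option(dir.listFiles()).map(_.toSeq).getOrElse(Seq.empty)
      .filter(f => f.isFile && f.getName.endsWith(".jar"))
      .map(_.getPath)
      .sorted

  def main(args: Array[String]): Unit =
    println(joinJars(jarsIn(new File("./lib"))))
}
```

The printed string has the same shape as the value passed to --driver-class-path in step 7.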