1. Introduction
The previous post showed Flink consuming data from Kafka, with the data produced by the kafka-console-producer. This post tests a Kafka producer program written in Scala that simulates log collection on an application server, and checks whether Flink consumes the data correctly.
The producer scans a given directory every 3 seconds, produces the contents of each file to Kafka in a batch (async), then renames the file and moves it to another directory.
2. Code
2.1 Add the Maven dependencies
<!-- Kafka dependency -->
<dependency>
    <groupId>org.apache.kafka</groupId>
    <artifactId>kafka_2.10</artifactId>
    <version>0.8.2.1</version>
</dependency>
<!-- Scala library dependency -->
<dependency>
    <groupId>org.scala-lang</groupId>
    <artifactId>scala-library</artifactId>
    <version>2.10.4</version>
</dependency>
<!-- scala.reflect.io dependency -->
<dependency>
    <groupId>org.scala-lang</groupId>
    <artifactId>scala-reflect</artifactId>
    <version>2.10.4</version>
</dependency>
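For readers who build with sbt instead of Maven, a rough equivalent of the dependencies above might look like this (a sketch; the versions are the ones used in this post):

// build.sbt -- assumed sbt equivalent of the Maven dependencies above
scalaVersion := "2.10.4"  // pulls in the matching scala-library automatically

libraryDependencies ++= Seq(
  "org.apache.kafka" % "kafka_2.10"    % "0.8.2.1", // old Scala producer API
  "org.scala-lang"   % "scala-reflect" % "2.10.4"   // provides scala.reflect.io.Path
)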
2.2 The KafkaProduceMsg class
The class implements the Runnable interface; run() produces the data to Kafka and rescans the directory every 3 seconds:
def run(): Unit = {
  while (true) {
    val files = Path(this.DIR).walkFilter(p => p.isFile && p.name.contains("transaction"))
    try {
      for (file <- files) {
        val reader = Source.fromFile(file.toString(), "UTF-8")
        for (line <- reader.getLines()) {
          val message = new KeyedMessage[String, String](this.TARGET_TOPIC, line)
          producer.send(message)
        }
        reader.close()
        // after producing, copy the file to another directory, then delete the original
        val fileName = file.toFile.name
        file.toFile.copyTo(Path("/root/Documents/completed/" + fileName + ".completed"))
        file.delete()
      }
    } catch {
      case e: Exception => println(e)
    }
    try {
      // sleep for 3 seconds after sending a micro-batch of messages
      Thread.sleep(3000)
    } catch {
      case e: Exception => println(e)
    }
  }
}
The complete class:
package kafka

import java.util.Properties
import kafka.producer.{KeyedMessage, Producer, ProducerConfig}
import scala.io.Source
import scala.reflect.io.Path

/**
 * Kafka Producer
 */
class KafkaProduceMsg(brokerList: String, topic: String) extends Runnable {

  private val BROKER_LIST = brokerList // "master:9092,worker1:9092,worker2:9092"
  private val TARGET_TOPIC = topic     // "new"
  private val DIR = "/root/Documents/"

  /**
   * 1. Configure the producer properties
   * metadata.broker.list  : brokers of the Kafka cluster; listing a subset (e.g. two) is enough for bootstrapping
   * serializer.class      : how messages are serialized before being sent
   * request.required.acks : 1 means the broker must acknowledge the message after receiving it; the default is 0
   * producer.type         : defaults to sync (synchronous); set to async here
   */
  private val props = new Properties()
  props.put("metadata.broker.list", this.BROKER_LIST)
  props.put("serializer.class", "kafka.serializer.StringEncoder")
  props.put("request.required.acks", "1")
  props.put("producer.type", "async")

  /**
   * 2. Create the producer
   */
  private val config = new ProducerConfig(this.props)
  private val producer = new Producer[String, String](this.config)

  /**
   * 3. Generate and send the messages
   * Scan DIR for files whose names contain "transaction" and send every line as a message to Kafka.
   */
  def run(): Unit = {
    while (true) {
      val files = Path(this.DIR).walkFilter(p => p.isFile && p.name.contains("transaction"))
      try {
        for (file <- files) {
          val reader = Source.fromFile(file.toString(), "UTF-8")
          for (line <- reader.getLines()) {
            val message = new KeyedMessage[String, String](this.TARGET_TOPIC, line)
            producer.send(message)
          }
          reader.close()
          // after producing, copy the file to another directory, then delete the original
          val fileName = file.toFile.name
          file.toFile.copyTo(Path("/root/Documents/completed/" + fileName + ".completed"))
          file.delete()
        }
      } catch {
        case e: Exception => println(e)
      }
      try {
        // sleep for 3 seconds after sending a micro-batch of messages
        Thread.sleep(3000)
      } catch {
        case e: Exception => println(e)
      }
    }
  }
}
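Note that kafka.producer.Producer is the old Scala producer API; Kafka 0.8.2 also ships the new Java producer (org.apache.kafka.clients.producer.KafkaProducer), which later replaced it. A minimal sketch of the same send path with the new API, assuming the same broker list and topic as above (not the code used in this post):

import java.util.Properties
import org.apache.kafka.clients.producer.{KafkaProducer, ProducerRecord}

object NewApiProducerSketch {
  def main(args: Array[String]): Unit = {
    val props = new Properties()
    props.put("bootstrap.servers", "master:9092,worker1:9092")
    props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer")
    props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer")

    val producer = new KafkaProducer[String, String](props)
    // send() is asynchronous and returns a java.util.concurrent.Future[RecordMetadata]
    producer.send(new ProducerRecord[String, String]("new", "a test line"))
    producer.close() // blocks until previously sent records are flushed
  }
}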
2.3 The main method
The entry point that starts the producer thread:
package kafka

object ProduceMsg {
  def main(args: Array[String]): Unit = {
    if (args.length < 2) {
      println("Usage : ProduceMsg master:9092,worker1:9092 new")
      System.exit(1)
    }
    new Thread(new KafkaProduceMsg(args(0), args(1))).start()
  }
}
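Since run() loops forever, the producer can only be stopped by killing the JVM. If a clean shutdown is needed, one option is a volatile flag plus a shutdown hook; a minimal sketch (the running flag, stop() method, and produceOnce() placeholder are additions, not part of the original class):

class StoppableProduceMsg(brokerList: String, topic: String) extends Runnable {
  @volatile private var running = true // flipped to false by stop()

  def stop(): Unit = { running = false }

  private def produceOnce(): Unit = {
    // scan the directory and produce, as in KafkaProduceMsg.run()
  }

  def run(): Unit = {
    while (running) {
      produceOnce()
      Thread.sleep(3000)
    }
  }
}

object StoppableProduceApp {
  def main(args: Array[String]): Unit = {
    val msg = new StoppableProduceMsg(args(0), args(1))
    val t = new Thread(msg)
    t.start()
    sys.addShutdownHook { msg.stop(); t.join() } // stop the loop on Ctrl-C / SIGTERM
  }
}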
3. Test
Package the jar (e.g. with mvn package) and run it:
root@master:~# java -cp /root/Documents/kafka-1.0-SNAPSHOT.jar kafka.ProduceMsg master:9092,worker1:9092 new
Add a transaction.csv file (6,400 records) under /root/Documents/. Flink consumed the data almost immediately, and after being produced the file was moved to /root/Documents/completed/.
root@worker2:/usr/local/flink/flink-1.0.3/log# ls -l | grep out
-rw-r--r-- 1 root root 2081645 Jun 30 16:13 flink-root-taskmanager-0-worker2.out
root@master:~/Documents/completed# ls -l
total 2028
-rw-r--r-- 1 root root 2072869 Jun 30 16:13 transaction.csv.completed
root@master:~/Documents/completed#
The two file sizes differ mainly because Flink's print() prefixes every record it writes with the subtask (slot) tag, but both files contain 6,400 records.
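For illustration (the record content below is hypothetical), a line in transaction.csv versus the same line in the taskmanager .out file looks roughly like:

1001,2016-06-30,99.99      <- transaction.csv
2> 1001,2016-06-30,99.99   <- flink-root-taskmanager-0-worker2.out, printed by subtask 2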