About the log4j log collection pipeline
The overall flow breaks down into:
- writing the log4j log-producing code
- testing log4j ---> flume
- testing log4j ---> flume --> kafka
- testing log4j ---> flume --> kafka ----> streaming
1. Writing the log4j log-producing code
package com.cn.spark04.kafka

import org.apache.log4j.{BasicConfigurator, Level, Logger, PropertyConfigurator}

object Log4jTest {
  def main(args: Array[String]): Unit = {
    if (args.length != 1) {
      System.err.println("Usage: Log4jTest <log4j.properties path>")
      System.exit(1)
    }
    val logger = Logger.getLogger(Log4jTest.getClass.getName)
    BasicConfigurator.configure()
    // path of the log4j.properties file
    PropertyConfigurator.configure(args(0))
    logger.setLevel(Level.INFO)
    var index = 0
    while (true) {
      Thread.sleep(1000)
      // emit one log event per second with an increasing counter
      logger.info("value is :" + index)
      index += 1
    }
  }
}
The dependencies to import are as follows (note the Flume version):
<!-- https://mvnrepository.com/artifact/org.apache.logging.log4j/log4j-core -->
<dependency>
<groupId>org.apache.logging.log4j</groupId>
<artifactId>log4j-core</artifactId>
<version>2.11.2</version>
</dependency>
<!-- https://mvnrepository.com/artifact/org.apache.flume.flume-ng-clients/flume-ng-log4jappender -->
<dependency>
<groupId>org.apache.flume.flume-ng-clients</groupId>
<artifactId>flume-ng-log4jappender</artifactId>
<version>1.9.0</version>
</dependency>
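One thing worth noting: the classes used in Log4jTest (org.apache.log4j.Logger, PropertyConfigurator, ...) belong to the log4j 1.x API, which log4j-core 2.x does not provide. flume-ng-log4jappender normally pulls log4j 1.x in transitively; if it does not end up on the classpath, an explicit dependency along these lines (the version shown is an assumption) can be added:
<dependency>
    <groupId>log4j</groupId>
    <artifactId>log4j</artifactId>
    <version>1.2.17</version>
</dependency>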
Package the code together with its dependencies into a single jar.
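A minimal sketch of one way to build such a fat jar, assuming the Maven Assembly Plugin is used (the -jar-with-dependencies suffix in the commands further below matches its predefined descriptor):
<plugin>
    <groupId>org.apache.maven.plugins</groupId>
    <artifactId>maven-assembly-plugin</artifactId>
    <configuration>
        <descriptorRefs>
            <descriptorRef>jar-with-dependencies</descriptorRef>
        </descriptorRefs>
    </configuration>
    <executions>
        <execution>
            <phase>package</phase>
            <goals>
                <goal>single</goal>
            </goals>
        </execution>
    </executions>
</plugin>
Running mvn package should then produce target/sparkstreaming-1.0-SNAPSHOT-jar-with-dependencies.jar.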
The log4j.properties configuration file:
# root logger: log level and appenders; events at or above this level are output (DEBUG < INFO < WARN < ERROR < FATAL); set to OFF to disable logging
log4j.rootLogger=INFO,stdout,flume
# stdout appender: write log events to the console
log4j.appender.stdout=org.apache.log4j.ConsoleAppender
log4j.appender.stdout.target=System.out
log4j.appender.stdout.layout=org.apache.log4j.PatternLayout
log4j.appender.stdout.layout.ConversionPattern=%d{yyyy-MM-dd HH:mm:ss,SSS} [%t] [%c] [%p] - %m%n
# flume appender: this class comes from flume-ng-log4jappender, so that jar must be on the classpath
log4j.appender.flume = org.apache.flume.clients.log4jappender.Log4jAppender
# the hostname and port here must match the Flume source
log4j.appender.flume.Hostname = 172.17.78.220
log4j.appender.flume.Port = 41414
log4j.appender.flume.UnsafeMode = true
2. First test with a Flume config that simply writes the received log events to the console (logger sink):
# Name the components on this agent
a1.sources = r1
a1.sinks = k1
a1.channels = c1
# Describe/configure the source
a1.sources.r1.type = avro
a1.sources.r1.bind = 172.17.78.220
a1.sources.r1.port = 41414
# Describe the sink
a1.sinks.k1.type = logger
# Use a channel which buffers events in memory
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100
# Bind the source and sink to the channel
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1
First start Flume: bin/flume-ng agent --conf conf --conf-file ../streamingscript/flum.conf --name a1 -Dflume.root.logger=INFO,console
Then start the log producer: java -cp sparkstreaming-1.0-SNAPSHOT-jar-with-dependencies.jar com.cn.spark04.kafka.Log4jTest /software/streamingscript/log4j.properties
3. Write the Flume config that forwards events from Flume to Kafka:
# Name the components on this agent
a1.sources = r1
a1.sinks = k1
a1.channels = c1
# Describe/configure the source
a1.sources.r1.type = avro
a1.sources.r1.bind = 172.17.78.220
a1.sources.r1.port = 41414
# Describe the sink
a1.sinks.k1.type = org.apache.flume.sink.kafka.KafkaSink
a1.sinks.k1.kafka.bootstrap.servers = 172.17.78.220:9092
a1.sinks.k1.kafka.topic = kafka_streaming
a1.sinks.k1.flumeBatchSize = 20
# Use a channel which buffers events in memory
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100
# Bind the source and sink to the channel
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1
Start the components in the following order:
1) start Kafka (a minimal startup sketch follows this list)
2) start Flume: bin/flume-ng agent --conf conf --conf-file ../streamingscript/flum_kafka.conf --name a1 -Dflume.root.logger=INFO,console
3) start a Kafka console consumer: bin/kafka-console-consumer.sh --bootstrap-server 172.17.78.220:9092 --topic kafka_streaming
4) start the log producer: java -cp sparkstreaming-1.0-SNAPSHOT-jar-with-dependencies.jar com.cn.spark04.kafka.Log4jTest /software/streamingscript/log4j.properties
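A minimal sketch of step 1, assuming a single-node Kafka installation with its stock scripts (the paths and the ZooKeeper address are assumptions; on Kafka 2.2+ the --zookeeper flag can be replaced by --bootstrap-server 172.17.78.220:9092):
# start ZooKeeper and the Kafka broker
bin/zookeeper-server-start.sh -daemon config/zookeeper.properties
bin/kafka-server-start.sh -daemon config/server.properties
# create the topic used by the Flume Kafka sink, if it does not exist yet
bin/kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 1 --partitions 1 --topic kafka_streaming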
With these steps, the log4j --> flume --> kafka part of the pipeline is complete.
4. Integrating with Spark Streaming:
package com.cn.spark04.kafka

import org.apache.kafka.common.serialization.StringDeserializer
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka010.ConsumerStrategies.Subscribe
import org.apache.spark.streaming.kafka010.KafkaUtils
import org.apache.spark.streaming.kafka010.LocationStrategies.PreferConsistent

/**
 * Uses the direct stream API for Kafka broker version 0.10.0 or higher.
 */
object Kafka_streaming {
  def main(args: Array[String]): Unit = {
    if (args.length != 3) {
      System.err.println("Usage: Kafka_streaming <bootstrap.servers (',' separated)> <group.id> <topics (',' separated)>")
      System.exit(1)
    }
    val sparkConf = new SparkConf()
    val ssc = new StreamingContext(sparkConf, Seconds(5))
    val Array(servers, groupId, topic) = args
    val kafkaParams = Map[String, Object](
      "bootstrap.servers" -> servers,
      "key.deserializer" -> classOf[StringDeserializer],
      "value.deserializer" -> classOf[StringDeserializer],
      "group.id" -> groupId,
      "auto.offset.reset" -> "latest",
      "enable.auto.commit" -> (false: java.lang.Boolean)
    )
    val topics = topic.split(",")
    // create a direct stream over the given topics
    val ds = KafkaUtils.createDirectStream[String, String](
      ssc,
      PreferConsistent,
      Subscribe[String, String](topics, kafkaParams)
    )
    // print the number of records received in each 5-second batch
    ds.map(_.value()).count().print()
    ssc.start()
    ssc.awaitTermination()
  }
}
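Because enable.auto.commit is set to false, nothing in the code above ever commits offsets. If committing them back to Kafka is wanted, a hedged sketch using the kafka-0-10 integration's CanCommitOffsets (placed before ssc.start()) could look like this:
import org.apache.spark.streaming.kafka010.{CanCommitOffsets, HasOffsetRanges}

ds.foreachRDD { rdd =>
  // each RDD produced by the direct stream carries the Kafka offset ranges it read
  val offsetRanges = rdd.asInstanceOf[HasOffsetRanges].offsetRanges
  // ... process the batch here ...
  // asynchronously commit the offsets back to Kafka once the batch's output is done
  ds.asInstanceOf[CanCommitOffsets].commitAsync(offsetRanges)
}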
Then start Spark Streaming. The components are started in the following order:
1) start Kafka
2) start Flume: bin/flume-ng agent --conf conf --conf-file ../streamingscript/flum_kafka.conf --name a1 -Dflume.root.logger=INFO,console
3) submit the Spark Streaming job: /software/spark-2.4.0/bin/spark-submit --class com.cn.spark04.kafka.Kafka_streaming --name Kafka_streaming --packages org.apache.spark:spark-streaming-kafka-0-10_2.11:2.4.0 --master local[2] /software/flume-1.9/executescript/sparkstreaming-1.0-SNAPSHOT.jar 172.17.78.220:9092 group01 kafka_streaming
4) start the log producer: java -cp sparkstreaming-1.0-SNAPSHOT-jar-with-dependencies.jar com.cn.spark04.kafka.Log4jTest /software/streamingscript/log4j.properties
At this point the whole pipeline is wired up end to end:
log4j ----> flume ----> kafka ---> spark-streaming