Setting Up and Testing a Big Data Analytics Architecture in a Local Environment

EnderWang

License: CC 4.0 BY-SA

Article link: https://blog.youkuaiyun.com/windon12345/article/details/90444207

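This post describes a log collection and analysis pipeline: the application writes local log files, Flume collects them into Kafka, Spark analyzes the stream and stores the results in a database, and a web page queries the database to display them. It also covers environment setup (installing and configuring Flume, ZooKeeper, Kafka, Hadoop, and Spark), the test procedure, and related big data commands.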

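Purpose: log collection and analysis

Workflow: 1. The application writes log files to the local disk.

2. Flume monitors the files and forwards the log entries to Kafka.

3. Spark Structured Streaming subscribes to Kafka, analyzes the resulting structured stream, and writes the results to a database.

4. A web page queries the database and displays the results.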

Environment setup: 1. Flume (apache-flume-1.9.0-bin)

(1) Download the archive and extract it.

(2) Edit the configuration file (this setup uses a spooldir source, a memory channel, and a Kafka sink).

# define agent
testAgent.sources = testSource
testAgent.channels = testChannel
testAgent.sinks = testSink

# define source
testAgent.sources.testSource.type = spooldir
testAgent.sources.testSource.spoolDir = /bigData/flumeTest
testAgent.sources.testSource.fileHeader = true

#testAgent.sources.testSource.type = TAILDIR
#testAgent.sources.testSource.positionFile = /bigData/flumeTest/taildir_position.json
#testAgent.sources.testSource.filegroups = f1
#testAgent.sources.testSource.filegroups.f1 = /bigData/flumeTest/hello.txt
#testAgent.sources.testSource.headers.f1.headerKey1 = value1
#testAgent.sources.testSource.fileHeader = true
#testAgent.sources.testSource.maxBatchCount = 1000

# define sink
#testAgent.sinks.testSink.type = logger
#testAgent.sinks.testSink.type = file_roll
#testAgent.sinks.testSink.sink.directory = /bigData/sinkTest

testAgent.sinks.testSink.type = org.apache.flume.sink.kafka.KafkaSink
testAgent.sinks.testSink.kafka.topic = test
testAgent.sinks.testSink.kafka.bootstrap.servers = 127.0.0.1:9092
testAgent.sinks.testSink.kafka.flumeBatchSize = 20
testAgent.sinks.testSink.kafka.producer.acks = 1
testAgent.sinks.testSink.kafka.producer.linger.ms = 1
testAgent.sinks.testSink.kafka.producer.compression.type = snappy


# define channel
testAgent.channels.testChannel.type = memory
testAgent.channels.testChannel.capacity = 1000
testAgent.channels.testChannel.transactionCapacity = 100

# bind source and sink to the channel
testAgent.sources.testSource.channels = testChannel
testAgent.sinks.testSink.channel = testChannel
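
A mismatch between the declared channel name and the names used in the source and sink bindings is an easy mistake to make in a file like this. The following is a minimal sketch of a sanity check, not part of Flume itself: the helper names `parse_properties` and `check_agent` are hypothetical, and the parser only handles simple `key = value` lines, which is all the configuration above uses.

```python
def parse_properties(text):
    """Parse simple `key = value` lines, skipping blanks and comments."""
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or line.startswith("!"):
            continue
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props


def check_agent(props, agent):
    """Return a list of channel-binding problems for one agent."""
    declared = set(props.get(agent + ".channels", "").split())
    problems = []
    # A source may list several channels under the plural key `channels`.
    for src in props.get(agent + ".sources", "").split():
        used = props.get("%s.sources.%s.channels" % (agent, src), "").split()
        if not used:
            problems.append("source %s has no channel" % src)
        for ch in used:
            if ch not in declared:
                problems.append("source %s uses undeclared channel %s" % (src, ch))
    # A sink reads from exactly one channel under the singular key `channel`.
    for sink in props.get(agent + ".sinks", "").split():
        ch = props.get("%s.sinks.%s.channel" % (agent, sink), "")
        if not ch:
            problems.append("sink %s has no channel" % sink)
        elif ch not in declared:
            problems.append("sink %s uses undeclared channel %s" % (sink, ch))
    return problems
```

To use it, read the configuration file into a string and call `check_agent(parse_properties(text), "testAgent")`; an empty list means every binding refers to a declared channel.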

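2. Install ZooKeeper (zookeeper-3.4.5) and Kafka (kafka_2.12-2.2.0)

(1) Download the archives and set the environment variables.

3. Install Hadoop (hadoop-2.7.7) and Spark (spark-2.4.3-bin-hadoop2.7)

(1) Download the archives and set the environment variables.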

Test procedure: 1. Start ZooKeeper

zkServer

2. Start Kafka

.\bin\windows\kafka-server-start.bat .\config\server.properties

3. Start the Spark listener program
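
The listener program itself is not shown in this post. As a rough idea of what it could look like, here is a minimal PySpark sketch that subscribes to the `test` topic on `127.0.0.1:9092` (the values from the Flume sink configuration) and counts identical log lines. The application name, the per-line count, and the console sink are assumptions made for illustration; the real job writes its results to a database, which is left out here.

```python
import sys

# Values taken from the Flume Kafka sink configuration in this post.
KAFKA_BOOTSTRAP = "127.0.0.1:9092"
KAFKA_TOPIC = "test"


def kafka_source_options(bootstrap=KAFKA_BOOTSTRAP, topic=KAFKA_TOPIC):
    """Options for Spark's Kafka source."""
    return {
        "kafka.bootstrap.servers": bootstrap,
        "subscribe": topic,
        "startingOffsets": "latest",
    }


def main():
    # Imported here so the options above can be inspected without Spark installed.
    from pyspark.sql import SparkSession

    spark = (SparkSession.builder
             .appName("LogListener")  # hypothetical name
             .master("local[*]")
             .getOrCreate())

    reader = spark.readStream.format("kafka")
    for key, value in kafka_source_options().items():
        reader = reader.option(key, value)

    # Kafka delivers the payload as bytes; cast it to a string column.
    lines = reader.load().selectExpr("CAST(value AS STRING) AS line")

    # Illustrative analysis only: count how often each line occurs.
    counts = lines.groupBy("line").count()

    # Console sink instead of a database, to keep the sketch self-contained.
    query = counts.writeStream.outputMode("complete").format("console").start()
    query.awaitTermination()


# The explicit flag keeps an import or a quick check from starting a stream.
if __name__ == "__main__" and "--run" in sys.argv:
    main()
```

The Kafka source is not bundled with Spark, so the job has to be submitted with the matching `spark-sql-kafka-0-10` package (for example via `spark-submit --packages`); the exact artifact coordinates depend on the Scala version of the Spark build and should be checked against the Spark documentation.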

4. Start Flume

bin\flume-ng.cmd agent -n testAgent -c conf -f conf\flume-conf.properties.template -property "flume.root.logger=INFO,console"
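
With all four processes running, the pipeline can be exercised by dropping a file into the spooling directory. The spooldir source expects files to be complete and unchanged once they appear there, so the sketch below writes the file in a staging directory first and then moves it in. The function name `drop_log_file`, the file naming scheme, and the choice of the parent directory as the staging area are assumptions for illustration.

```python
import os
import tempfile
import time


def drop_log_file(spool_dir, lines, staging_dir=None):
    """Write lines to a staging file, then move it into the spool directory.

    The staging directory must be on the same volume as spool_dir so the
    final move is a rename rather than a copy.
    """
    os.makedirs(spool_dir, exist_ok=True)
    if staging_dir is None:
        # Assumption: the parent of the spool directory is on the same volume.
        staging_dir = os.path.dirname(os.path.abspath(spool_dir))

    fd, tmp_path = tempfile.mkstemp(suffix=".log", dir=staging_dir)
    with os.fdopen(fd, "w", encoding="utf-8") as f:
        for line in lines:
            f.write(line + "\n")

    # A timestamp keeps file names unique across repeated test runs.
    final_path = os.path.join(spool_dir, "app-%d.log" % time.time_ns())
    os.replace(tmp_path, final_path)
    return final_path
```

For the configuration in this post, a call such as `drop_log_file("/bigData/flumeTest", ["hello flume", "hello kafka"])` should result in two messages on the `test` topic, which the Spark listener then picks up.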