Deploying Flume + Hadoop for Big Data Log Collection

Introduction

In big data processing, collecting log data is the first step of any analysis. Apache Flume is a distributed, reliable, and available system for efficiently collecting, aggregating, and moving large volumes of log data into a centralized data store. This post walks through using Flume to collect log data and upload it to the Hadoop Distributed File System (HDFS).

Flume Overview

Apache Flume is a highly available, highly reliable, distributed system for collecting, aggregating, and transporting massive amounts of log data. It is built on a streaming architecture that is both flexible and simple, and it can read data from a server's local disk in real time and write it to HDFS.

Requirements

  • Hadoop 2.8.0
    Baidu Netdisk link: https://pan.baidu.com/s/16VZGWk4kdiJ6GYxDP5BUew
    Extraction code: j9fa
  • Flume 1.9.0
    Baidu Netdisk link: https://pan.baidu.com/s/1eLLKeQWaMvPjSJziEewfVA
    Extraction code: 3q2s
  • CentOS 7
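
This post assumes Hadoop is unpacked to /opt/server/hadoop-2.8.0 and Flume to /opt/server/flume (these are the paths that appear in the startup log later on). To call flume-ng without a full path and to point Flume at your JDK, a minimal environment sketch looks like this; adjust the paths to your own install:

# assumed install location — change to match your machine
export FLUME_HOME=/opt/server/flume
export PATH=$PATH:$FLUME_HOME/bin

# in $FLUME_HOME/conf/flume-env.sh (sourced when the agent starts), set the JDK:
export JAVA_HOME=/usr/lib/jvm/jdk1.8.0_65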

Flume Configuration Structure

A Flume configuration file defines where data comes from and where it goes. The classic basic example is a simple Flume agent that collects data from a local port and prints it to the console, as sketched below.
As the Flume architecture shows, an agent consists of three main parts: a source, a channel, and a sink.
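
A minimal sketch of such an agent, using the stock netcat source and logger sink (port 44444 is an arbitrary choice), looks like this:

# a1 is the agent name; r1/k1/c1 are its source, sink, and channel
a1.sources = r1
a1.sinks = k1
a1.channels = c1

# netcat source: listens on a local TCP port, one event per line received
a1.sources.r1.type = netcat
a1.sources.r1.bind = 0.0.0.0
a1.sources.r1.port = 44444

# logger sink: prints events to the console
a1.sinks.k1.type = logger

# memory channel: buffers events between source and sink
a1.channels.c1.type = memory

# wire the source and sink to the channel
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1

With that agent running, you can send it test lines with nc localhost 44444; each line you type is echoed to the agent's console by the logger sink.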

Configuring Flume

Create a file named flume-hdfs.conf in the /opt/server/flume/conf directory:

a1.sources = r1
a1.sinks = k1
a1.channels = c1

# configure the source
# exec runs a shell command and turns each output line into an event;
# tail -F keeps reading the newest lines of the log file in real time
# (bind/port settings apply to netcat sources, not exec, so none are needed here)
a1.sources.r1.type = exec
a1.sources.r1.command = tail -F /var/log/flume-test.log

# configure the sink
# "zhang" is the NameNode hostname (run hostname to see yours);
# the flume/logs/ directory does not need to be created by hand, the sink creates it
a1.sinks.k1.type = hdfs
a1.sinks.k1.hdfs.path = hdfs://zhang:9000/flume/logs/

# configure the channel
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100

# bind the source and sink to the channel
# note: the source property is "channels" (plural), the sink property is "channel"
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1
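
By default the HDFS sink writes SequenceFiles, so the files that land in /flume/logs/ are not plain text. If you want readable text files, the sink supports a few optional settings; the values below are illustrative, not required:

# optional: write plain text instead of the default SequenceFile format
a1.sinks.k1.hdfs.fileType = DataStream
a1.sinks.k1.hdfs.writeFormat = Text
# optional: roll to a new file every 60 seconds (the default is 30)
a1.sinks.k1.hdfs.rollInterval = 60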

Starting Hadoop HDFS

start-dfs.sh
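
You can confirm the HDFS daemons came up with jps; on a single-node install the output should look roughly like this (the PIDs are illustrative):

[root@localhost ~]# jps
2481 NameNode
2617 DataNode
2811 SecondaryNameNode
3020 Jps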

If HDFS started correctly, you can open the NameNode web UI in a browser (port 50070 by default on Hadoop 2.x).
You can also check from the command line with hdfs dfs -ls /:

[root@localhost 192 conf]# hdfs dfs -ls /
Found 2 items
drwxr-xr-x   - root supergroup          0 2024-07-13 08:09 /flume
-rw-r--r--   1 root supergroup          0 2024-07-12 19:53 /test.txt

If files are listed, HDFS is working.

Starting Flume

Run the startup command from the conf directory that contains flume-hdfs.conf:

[root@localhost 192 conf]# flume-ng agent --conf ./ --conf-file flume-hdfs.conf --name a1 -Dflume.root.logger=INFO,console
Info: Sourcing environment configuration script /opt/server/flume/conf/flume-env.sh
Info: Including Hadoop libraries found via (/opt/server/hadoop-2.8.0/bin/hadoop) for HDFS access
Info: Including Hive libraries found via () for Hive access
+ exec /usr/lib/jvm/jdk1.8.0_65/bin/java -Xmx20m -Dflume.root.logger=INFO,console -cp '/opt/server/flume/conf:/opt/server/flume/lib/*:/opt/server/hadoop-2.8.0/etc/hadoop:/opt/server/hadoop-2.8.0/share/hadoop/common/lib/*:/opt/server/hadoop-2.8.0/share/hadoop/common/*:/opt/server/hadoop-2.8.0/share/hadoop/hdfs:/opt/server/hadoop-2.8.0/share/hadoop/hdfs/lib/*:/opt/server/hadoop-2.8.0/share/hadoop/hdfs/*:/opt/server/hadoop-2.8.0/share/hadoop/yarn/lib/*:/opt/server/hadoop-2.8.0/share/hadoop/yarn/*:/opt/server/hadoop-2.8.0/share/hadoop/mapreduce/lib/*:/opt/server/hadoop-2.8.0/share/hadoop/mapreduce/*:/opt/server/hadoop-2.8.0/contrib/capacity-scheduler/*.jar:/lib/*' -Djava.library.path=:/opt/server/hadoop-2.8.0/lib/native org.apache.flume.node.Application --conf-file flume-hdfs.conf --name a1
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/opt/server/flume/lib/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/opt/server/hadoop-2.8.0/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
2024-07-14 05:36:25,840 (lifecycleSupervisor-1-0) [INFO - org.apache.flume.node.PollingPropertiesFileConfigurationProvider.start(PollingPropertiesFileConfigurationProvider.java:62)] Configuration provider starting
2024-07-14 05:36:25,850 (conf-file-poller-0) [INFO - org.apache.flume.node.PollingPropertiesFileConfigurationProvider$FileWatcherRunnable.run(PollingPropertiesFileConfigurationProvider.java:138)] Reloading configuration file:flume-hdfs.conf
2024-07-14 05:36:25,861 (conf-file-poller-0) [INFO - org.apache.flume.conf.FlumeConfiguration$AgentConfiguration.addComponentConfig(FlumeConfiguration.java:1203)] Processing:c1
2024-07-14 05:36:25,862 (conf-file-poller-0) [INFO - org.apache.flume.conf.FlumeConfiguration$AgentConfiguration.addComponentConfig(FlumeConfiguration.java:1203)] Processing:r1
2024-07-14 05:36:25,862 (conf-file-poller-0) [INFO - org.apache.flume.conf.FlumeConfiguration$AgentConfiguration.addComponentConfig(FlumeConfiguration.java:1203)] Processing:r1
2024-07-14 05:36:25,862 (conf-file-poller-0) [INFO - org.apache.flume.conf.FlumeConfiguration$AgentConfiguration.addComponentConfig(FlumeConfiguration.java:1203)] Processing:r1
2024-07-14 05:36:25,862 (conf-file-poller-0) [INFO - org.apache.flume.conf.FlumeConfiguration$AgentConfiguration.addProperty(FlumeConfiguration.java:1117)] Added sinks: k1 Agent: a1
2024-07-14 05:36:25,862 (conf-file-poller-0) [INFO - org.apache.flume.conf.FlumeConfiguration$AgentConfiguration.addComponentConfig(FlumeConfiguration.java:1203)] Processing:k1
...
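
Once the agent is running, append a line to the file it is tailing and confirm the event arrives in HDFS. A quick check, assuming the configuration above (FlumeData is the sink's default file prefix; exact file names will differ, and the contents are readable as text only if you enabled the DataStream settings shown earlier):

# generate a test event — the exec source is tailing this file
echo "hello flume" >> /var/log/flume-test.log

# after a few seconds, list the sink's output directory and inspect a file
hdfs dfs -ls /flume/logs/
hdfs dfs -cat /flume/logs/FlumeData.*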