flume

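This article shows how to use Flume for real-time data collection: the Source, Channel, and Sink components of an Agent are configured to monitor a directory and back up the contents of newly added files to HDFS. It walks through the configuration and the fixes for several common errors.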

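Introduction to Flume

Flume is a distributed, reliable, and available service for efficiently collecting, aggregating, and moving large volumes of data from many different sources.

An Agent is a Flume JVM process that contains three components: Source, Channel, and Sink. Incoming data passes through a Source, which can be customized, and is buffered in the Channel. The Channel is a queue: the Source puts data into it and the Sink takes data out of it. Only after the Sink has confirmed that the data was received by the next hop (another Agent, a database, and so on) is the data removed from the Channel.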

Source types

[Figure: Source types]

Channel types

[Figure: Channel types]

Sink types

[Figure: Sink types]

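Collect changes to the files in a directory in real time and back up newly added files to HDFS

(The Spooling Directory source watches a given directory for new files. As soon as a new file appears, it parses the file's contents and writes them to the channel. Once the write has finished, it either marks the file as completed or deletes it.)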
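The lifecycle described above can be sketched in a few lines of shell. This is only an illustration of the idea, not Flume's implementation: the function name `spool_once` is hypothetical, `cat` stands in for "write the events to the channel", and the `.COMPLETED` suffix mirrors the default value of the source's `fileSuffix` setting.

```shell
# spool_once DIR: process every not-yet-completed regular file in DIR once.
# Hypothetical helper that mimics the spooling-directory lifecycle.
spool_once() {
  local dir="$1" f
  for f in "$dir"/*; do
    [ -f "$f" ] || continue                     # skip subdirectories and unmatched globs
    case "$f" in *.COMPLETED) continue ;; esac  # already ingested earlier
    # "Ingest" the file, then mark it as done so it is never read twice.
    cat "$f" && mv "$f" "$f.COMPLETED"
  done
}
```

Calling `spool_once /testflume` twice prints each file's contents only the first time. The real source additionally expects files to be uniquely named and not modified after they are placed in the directory.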

(1) Edit the flume.conf configuration file

# vim flume.conf
# agent1 is the name of the agent
agent1.sources=source1
agent1.sinks=sink1
agent1.channels=channel1

# Configure source1: watch /testflume/ for new files
agent1.sources.source1.type=spooldir
agent1.sources.source1.spoolDir=/testflume/
agent1.sources.source1.channels=channel1
agent1.sources.source1.fileHeader=false
# The timestamp interceptor adds the "timestamp" header that the
# %Y-%m-%d escape in the sink's filePrefix relies on
agent1.sources.source1.interceptors=i1
agent1.sources.source1.interceptors.i1.type=timestamp

# Configure channel1: a durable, file-backed channel
agent1.channels.channel1.type=file
agent1.channels.channel1.checkpointDir=/testflume/hmbbs_tmp123
agent1.channels.channel1.dataDirs=/testflume/hmbbs_tmp

# Configure sink1: write events to HDFS as plain text
agent1.sinks.sink1.type=hdfs
agent1.sinks.sink1.hdfs.path=hdfs://192.168.1.164:8020/testflume
agent1.sinks.sink1.hdfs.fileType=DataStream
# Documented values are Text and Writable
agent1.sinks.sink1.hdfs.writeFormat=Text
# Roll over to a new HDFS file every 1 second
agent1.sinks.sink1.hdfs.rollInterval=1
agent1.sinks.sink1.channel=channel1
agent1.sinks.sink1.hdfs.filePrefix=%Y-%m-%d
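A frequent mistake in a configuration like this is a source or sink that refers to a channel that was never declared. The sketch below is one way to catch that before restarting the service. The function name `check_channels` is hypothetical, and it assumes one property per line with space-separated channel names, as in the file above.

```shell
# check_channels CONF AGENT
# Prints every channel that a source or sink refers to but that is missing
# from "<agent>.channels"; returns non-zero if any are missing.
check_channels() {
  local conf="$1" agent="$2" declared used ch status=0
  # Channels listed on the "<agent>.channels=" line.
  declared=$(sed -n "s/^[[:space:]]*${agent}\.channels[[:space:]]*=[[:space:]]*//p" "$conf")
  # Channels referenced by "<agent>.sources.X.channels=" and "<agent>.sinks.Y.channel=".
  used=$(sed -n -E "s/^[[:space:]]*${agent}\.(sources|sinks)\.[^.]+\.channels?[[:space:]]*=[[:space:]]*//p" "$conf")
  for ch in $used; do
    case " $declared " in
      *" $ch "*) ;;                                   # declared, nothing to report
      *) echo "undeclared channel: $ch"; status=1 ;;
    esac
  done
  return $status
}
```

For example, `check_channels flume.conf agent1` prints nothing and returns 0 for the configuration shown above.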

(2) Restart the Flume service in Cloudera Manager (CM)
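Outside Cloudera Manager, an agent can also be started by hand with the `flume-ng` launcher, which is handy for watching the log while testing. The wrapper below is a minimal sketch: the function name `start_agent` and the `FLUME_NG` override are hypothetical, and the paths in the usage line are assumptions that depend on the installation.

```shell
# start_agent CONF_DIR CONF_FILE AGENT_NAME
# Runs the agent in the foreground and logs to the console.
# FLUME_NG may be set to the launcher's full path if it is not on PATH.
start_agent() {
  local conf_dir="$1" conf_file="$2" name="$3"
  "${FLUME_NG:-flume-ng}" agent \
    --conf "$conf_dir" \
    --conf-file "$conf_file" \
    --name "$name" \
    -Dflume.root.logger=INFO,console
}
```

Usage, assuming the configuration lives under /etc/flume-ng/conf: `start_agent /etc/flume-ng/conf /etc/flume-ng/conf/flume.conf agent1`. The agent name must match the prefix used in flume.conf, here agent1.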

(3) Verify:

1. Add a file to the local /testflume directory; Flume picks it up automatically and writes it to /testflume on HDFS.
# cp /etc/passwd /testflume
2. Check the result on HDFS:
# hadoop fs -ls /testflume
-rw-r--r-- 3 flume supergroup 388 2017-03-18 16:03 /testflume/2017-03-18.1489824261816
-rw-r--r-- 3 flume supergroup 490 2017-03-18 16:03 /testflume/2017-03-18.1489824261817

Errors and fixes:

1)

HDFS IO error
java.io.IOException: Failed on local exception: com.google.protobuf.InvalidProtocolBufferException: Protocol message end-group tag did not match expected tag.; Host Details : local host is: "test2/192.168.1.165"; destination host is: "test2":9000;
...........
Caused by: com.google.protobuf.InvalidProtocolBufferException: Protocol message end-group tag did not match expected tag.

Fix:

The sink was connecting to port 9000, but in CDH the NameNode listens on port 8020 (see fs.default.name, called fs.defaultFS in current Hadoop releases). Use port 8020 in the sink path:
agent1.sinks.sink1.hdfs.path=hdfs://192.168.1.165:8020/testflume
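Rather than guessing the port, the address can be read from the cluster's client configuration with `hdfs getconf`. The following is a sketch under a few assumptions: it runs on a host that has the HDFS client configuration installed, the function name `sink_path` is hypothetical, and the agent and sink names are the ones used in this article.

```shell
# sink_path DIR: print the hdfs.path line for sink1, built from fs.defaultFS.
sink_path() {
  local dir="$1" fs
  fs=$(hdfs getconf -confKey fs.defaultFS) || return 1
  # Drop a trailing slash from the file system URI before appending DIR.
  printf 'agent1.sinks.sink1.hdfs.path=%s%s\n' "${fs%/}" "$dir"
}
```

Running `sink_path /testflume` prints a line that can be pasted into flume.conf. On an HA cluster fs.defaultFS is typically a logical nameservice URI without a host and port.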

2)

HDFS IO error
org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.ipc.StandbyException): Operation category WRITE is not supported in state standby. Visit https://s.apache.org/sbnn-error

Fix:

The CDH cluster is configured for HA, and 192.168.1.165 was the standby NameNode at that moment. Point the sink at the active NameNode instead:
agent1.sinks.sink1.hdfs.path=hdfs://192.168.1.164:8020/testflume
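Which NameNode is active can be checked with `hdfs haadmin -getServiceState`. The sketch below assumes a single HA nameservice and a host with the HDFS client configuration; the function name `active_namenodes` is hypothetical.

```shell
# active_namenodes: print the RPC address of each NameNode that reports "active".
active_namenodes() {
  local ns ids id
  ns=$(hdfs getconf -confKey dfs.nameservices) || return 1
  ids=$(hdfs getconf -confKey "dfs.ha.namenodes.$ns") || return 1
  # dfs.ha.namenodes.<nameservice> is a comma-separated list of NameNode IDs.
  for id in $(echo "$ids" | tr ',' ' '); do
    if [ "$(hdfs haadmin -getServiceState "$id" 2>/dev/null)" = "active" ]; then
      hdfs getconf -confKey "dfs.namenode.rpc-address.$ns.$id"
    fi
  done
}
```

Note that a hard-coded host stops working after the next failover. If the Flume host can resolve the cluster's nameservice from its Hadoop client configuration, using the nameservice URI in hdfs.path is usually the more robust choice.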

3)

HDFS IO error
org.apache.hadoop.security.AccessControlException: Permission denied: user=flume, access=WRITE, inode="/testflume":root:supergroup:drwxr-xr-x

Fix: