3. Importing Flume Data into HDFS

This post walks through collecting log data with Apache Flume and writing it into HDFS. An Avro source, a memory channel, and an HDFS sink carry the log data from the point where it is produced to its final storage location, and interceptors are configured along the way to stamp each event with the originating host and a timestamp.




[root@baozi apache-flume-1.5.2-bin]# vim conf/agent2.conf


agent2.sources=source1
agent2.channels=channel1
agent2.sinks=sink1


agent2.sources.source1.type=avro
agent2.sources.source1.bind=0.0.0.0
agent2.sources.source1.port=44444
agent2.sources.source1.channels=channel1


agent2.sources.source1.interceptors = i1 i2
agent2.sources.source1.interceptors.i1.type = org.apache.flume.interceptor.HostInterceptor$Builder
agent2.sources.source1.interceptors.i1.preserveExisting = true
agent2.sources.source1.interceptors.i1.useIP = true
agent2.sources.source1.interceptors.i2.type = org.apache.flume.interceptor.TimestampInterceptor$Builder
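# i1 adds a "host" header to each event (useIP=true stores the agent's IP address, and
# preserveExisting=true keeps any header already set upstream); i2 adds a "timestamp" header.
# Both headers are referenced by the HDFS sink path below.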


agent2.channels.channel1.type=memory
agent2.channels.channel1.capacity=10000
agent2.channels.channel1.transactionCapacity=1000
agent2.channels.channel1.keep-alive=30
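# In-memory buffer: up to 10000 events, 1000 events per transaction; keep-alive is how many
# seconds a put/take waits for space before failing. Buffered events are lost if the agent stops.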


agent2.sinks.sink1.type=hdfs
agent2.sinks.sink1.channel=channel1
agent2.sinks.sink1.hdfs.path=hdfs://192.168.1.200:9000/flume/events/%{host}/%y-%m-%d
agent2.sinks.sink1.hdfs.fileType=DataStream
agent2.sinks.sink1.hdfs.writeFormat=Text
agent2.sinks.sink1.hdfs.rollInterval=0
agent2.sinks.sink1.hdfs.rollSize=10000
agent2.sinks.sink1.hdfs.rollCount=0
agent2.sinks.sink1.hdfs.idleTimeout=5
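# %{host} and %y-%m-%d in hdfs.path are filled in from the "host" and "timestamp" headers set by
# the interceptors. rollInterval=0 and rollCount=0 disable time- and count-based rolling, so a
# file rolls only once it reaches rollSize (10000 bytes); idleTimeout=5 closes (and renames) a
# .tmp file after 5 seconds without new data.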


Start the agent listening on port 44444. It must keep running for as long as data is being imported into HDFS:

[root@baozi apache-flume-1.5.2-bin]# bin/flume-ng agent --conf ./conf/ -Dflume.monitoring.type=http -Dflume.monitoring.port=34343 -n agent2 -f conf/agent2.conf &
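Because HTTP monitoring is enabled on port 34343, the agent's source/channel/sink counters can be inspected while it runs. A quick sanity check (assuming the agent host is 192.168.1.200, as elsewhere in this setup):

curl http://192.168.1.200:34343/metrics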


log4j.properties:
#log4j.appender.flume.Port = 41414
log4j.appender.flume.Port = 44444
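
Only the Port line is shown above; the rest of the appender definition is not part of the original. A minimal sketch of the surrounding log4j.properties, assuming the standard Flume Log4j appender and an agent host of 192.168.1.200, might look like this:

# Assumed sketch - only the Port line comes from the original configuration.
log4j.rootLogger=INFO, flume
log4j.appender.flume=org.apache.flume.clients.log4jappender.Log4jAppender
log4j.appender.flume.Hostname=192.168.1.200
log4j.appender.flume.Port=44444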



The /flume/events directory on HDFS:
[root@baozi hadoop]# hdfs dfs -ls -R /flume
drwxr-xr-x   - root supergroup          0 2015-06-21 12:50 /flume/events
[root@baozi hadoop]#



Run the following log producer:
package flume;

import java.text.SimpleDateFormat;

import org.apache.log4j.Logger;

public class LogProducer {

    public static void main(String[] args) {
        // Log4j is configured (log4j.properties above) to ship every event to the
        // Flume Avro source on port 44444.
        Logger log = Logger.getLogger(LogProducer.class);
        SimpleDateFormat sdf = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss");
        // Emit one timestamped log line per second.
        while (true) {
            log.info("日志格式:" + sdf.format(System.currentTimeMillis()));
            try {
                Thread.sleep(1000);
            } catch (InterruptedException e) {
                e.printStackTrace();
            }
        }
    }
}
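Each log.info() call is shipped over Avro to port 44444 by the Log4j appender. For this to work, the flume-ng-log4jappender jar (and its Avro/Netty dependencies) must be on the producer's classpath together with log4j and the log4j.properties shown above.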




Check the data in HDFS:

[root@baozi hadoop]# hdfs dfs -ls -R /flume/events
drwxr-xr-x   - root supergroup          0 2015-06-21 12:55 /flume/events/192.168.1.200
drwxr-xr-x   - root supergroup          0 2015-06-21 12:55 /flume/events/192.168.1.200/15-06-21
-rw-r--r--   1 root supergroup          0 2015-06-21 12:55 /flume/events/192.168.1.200/15-06-21/FlumeData.1434862524993.tmp

[root@baozi hadoop]#



[root@baozi hadoop]# hdfs dfs -ls -R /flume/events/192.168.1.200/15-06-21
-rw-r--r--   1 root supergroup       2940 2015-06-21 12:56 /flume/events/192.168.1.200/15-06-21/FlumeData.1434862524993
[root@baozi hadoop]#


[root@baozi hadoop]# hdfs dfs -ls -R /flume/events
drwxr-xr-x   - root supergroup          0 2015-06-21 12:55 /flume/events/192.168.1.200
drwxr-xr-x   - root supergroup          0 2015-06-21 12:56 /flume/events/192.168.1.200/15-06-21
-rw-r--r--   1 root supergroup       2940 2015-06-21 12:56 /flume/events/192.168.1.200/15-06-21/FlumeData.1434862524993

[root@baozi hadoop]# hdfs dfs -text /flume/events/192.168.1.200/15-06-21/FlumeData.1434862524993
日志格式:2015-06-21 12:52:12
日志格式:2015-06-21 12:52:13
日志格式:2015-06-21 12:52:14
日志格式:2015-06-21 12:52:15
日志格式:2015-06-21 12:52:16
日志格式:2015-06-21 12:52:17
日志格式:2015-06-21 12:52:18
日志格式:2015-06-21 12:52:19






[root@baozi hadoop]#


