Flume installation directory
Installation package and test files
Extraction code: ktc2
I. Install Flume
1. Copy the installation package into the /opt directory.
2. Extract Flume into /opt:
tar -zxvf flume-ng-1.6.0-cdh5.14.0.tar.gz
3. Rename the extracted directory to flume:
mv apache-flume-1.6.0-cdh5.14.0-bin/ flume
4. Change into Flume's conf directory:
cd flume/conf
5. Copy flume-env.sh.template and rename the copy flume-env.sh:
cp flume-env.sh.template flume-env.sh
6. Edit flume-env.sh:
// set JAVA_HOME
export JAVA_HOME=/opt/jdk1.8.0_221
// uncomment the JAVA_OPTS line (remove the leading #) and set the heap to 2 GB
export JAVA_OPTS="-Xms2000m -Xmx2000m -Dcom.sun.management.jmxremote"
7. Verify the installation
Change into the bin directory and run:
./flume-ng version
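If the installation succeeded, the command prints the version banner; roughly the following, though build details depend on the exact package:
Flume 1.6.0-cdh5.14.0
(followed by the source repository, revision, and compile information)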
II. Install nc
Its full name is NetCat, commonly nicknamed the "Swiss Army knife" of networking tools.
yum install -y nc
III. Install telnet
yum list telnet*
yum install -y telnet-server.*
yum install -y telnet.*
Test
// open a session and start the server (the port number below is arbitrary)
nc -lk 7777
// open another session and start the client
telnet localhost 7777
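If both tools work, text typed on one end appears on the other. A quick, hypothetical exchange:
// type a line in the telnet session
hello nc
// the same line shows up in the nc session; press Ctrl+] and type quit to leave telnet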
IV. Examples
First, create a job directory under conf:
mkdir job
Case 1: netcat port to console
Create netcat-flume-logger.conf in job:
cd job
vi netcat-flume-logger.conf
// add the following content; a1 is the agent name
a1.sources=r1
a1.channels=c1
a1.sinks=s1
a1.sources.r1.type=netcat
a1.sources.r1.bind=localhost
a1.sources.r1.port=7777
a1.channels.c1.type=memory
a1.channels.c1.capacity=1000
a1.channels.c1.transactionCapacity=1000
a1.sinks.s1.type=logger
a1.sources.r1.channels=c1
a1.sinks.s1.channel=c1
// a source can deliver to multiple channels, but each sink reads from exactly one channel
Save and exit when done.
Return to the Flume root directory and start agent a1:
bin/flume-ng agent --name a1 --conf conf/ --conf-file conf/job/netcat-flume-logger.conf -Dflume.root.logger=INFO,console
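To try it out, open another session, connect to port 7777, and send a line; the agent console should log each line as an Event (the exact log layout can vary slightly between versions):
telnet localhost 7777
hello
// a1 console, roughly:
// Event: { headers:{} body: 68 65 6C 6C 6F    hello }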
Case 2: file to console
Create file-flume-logger.conf in job:
vi conf/job/file-flume-logger.conf
// add the following content; a2 is the agent name
a2.sources=r1
a2.channels=c1
a2.sinks=s1
a2.sources.r1.type=exec
a2.sources.r1.command=tail -f /opt/flume/conf/job/tmp/tmp.txt
a2.channels.c1.type=memory
a2.channels.c1.capacity=1000
a2.channels.c1.transactionCapacity=1000
a2.sinks.s1.type=logger
a2.sources.r1.channels=c1
a2.sinks.s1.channel=c1
Create the tmp.txt file (create the tmp directory first if it does not exist):
mkdir -p /opt/flume/conf/job/tmp
vi /opt/flume/conf/job/tmp/tmp.txt
From the Flume root directory, start agent a2:
bin/flume-ng agent --name a2 --conf conf/ --conf-file conf/job/file-flume-logger.conf -Dflume.root.logger=INFO,console
Append arbitrary content to tmp.txt (from within its directory, or use the full path):
echo xxx >> tmp.txt
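Each appended line should reach the logger sink; for the xxx line above, the a2 console prints something roughly like the following (hex body plus a text preview; the format may vary by version):
Event: { headers:{} body: 78 78 78    xxx }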
Case 3: events to console
Create events-flume-logger.conf in job; the config is listed below (events is the agent name).
Also create dataSourceFile (the source data path), checkpointFile (the checkpoint path), and dataChannelFile (the channel data directory), as sketched below.
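A minimal sketch for creating these three directories, assuming they live under /opt/flume/conf/job as in the paths used by the config below (mkdir -p is safe to re-run):
mkdir -p /opt/flume/conf/job/dataSourceFile
mkdir -p /opt/flume/conf/job/checkpointFile
mkdir -p /opt/flume/conf/job/dataChannelFile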
events.sources=eventsSource
events.channels=eventsChannel
events.sinks=eventsSink
events.sources.eventsSource.type=spooldir
events.sources.eventsSource.spoolDir=/opt/flume/conf/job/dataSourceFile/events
events.sources.eventsSource.deserializer=LINE
events.sources.eventsSource.deserializer.maxLineLength=10000
events.sources.eventsSource.includePattern=events_[0-9]{4}-[0-9]{2}-[0-9]{2}.csv
events.channels.eventsChannel.type=file
events.channels.eventsChannel.checkpointDir=/opt/flume/conf/job/checkpointFile/events
events.channels.eventsChannel.dataDirs=/opt/flume/conf/job/dataChannelFile/events
events.sinks.eventsSink.type=logger
events.sources.eventsSource.channels=eventsChannel
events.sinks.eventsSink.channel=eventsChannel
Then copy the prepared data into dataSourceFile (the file name must match includePattern above):
mkdir /opt/flume/conf/job/dataSourceFile/events
cp events.csv /opt/flume/conf/job/dataSourceFile/events/events_2020-11-30.csv
From the Flume root directory, start agent events:
bin/flume-ng agent --name events --conf conf/ --conf-file conf/job/events-flume-logger.conf -Dflume.root.logger=INFO,console
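If the spooling directory source picks the file up, its lines stream to the console as Events, and by default the source renames a fully processed file by appending .COMPLETED. A quick check, assuming the paths above:
ls /opt/flume/conf/job/dataSourceFile/events
// expected after processing: events_2020-11-30.csv.COMPLETED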
Case 4: file to HDFS
The data file is user_friends.csv.
Create userfriend-flume-hdfs.conf in job and add the following (user_friend is the agent name):
user_friend.sources=userFriendSource
user_friend.channels=userFriendChannel
user_friend.sinks=userFriendSink
user_friend.sources.userFriendSource.type=spooldir
user_friend.sources.userFriendSource.spoolDir=/opt/flume/conf/job/dataSourceFile/userFriend
user_friend.sources.userFriendSource.deserializer=LINE
user_friend.sources.userFriendSource.deserializer.maxLineLength=200000
user_friend.sources.userFriendSource.includePattern=userfriend_[0-9]{4}-[0-9]{2}-[0-9]{2}.csv
user_friend.channels.userFriendChannel.type=file
user_friend.channels.userFriendChannel.checkpointDir=/opt/flume/conf/job/checkpointFile/userFriend
user_friend.channels.userFriendChannel.dataDirs=/opt/flume/conf/job/dataChannelFile/userFriend
user_friend.sinks.userFriendSink.type=hdfs
user_friend.sinks.userFriendSink.hdfs.fileType=DataStream
// file name prefix
user_friend.sinks.userFriendSink.hdfs.filePrefix=userFriend
// file name suffix
user_friend.sinks.userFriendSink.hdfs.fileSuffix=.csv
user_friend.sinks.userFriendSink.hdfs.path=hdfs://192.168.184.40:9000/file/user/userFriend/%Y-%m-%d
user_friend.sinks.userFriendSink.hdfs.useLocalTimeStamp=true
user_friend.sinks.userFriendSink.hdfs.batchSize=640
user_friend.sinks.userFriendSink.hdfs.rollInterval=20
user_friend.sinks.userFriendSink.hdfs.rollCount=0
user_friend.sinks.userFriendSink.hdfs.rollSize=120000000
user_friend.sources.userFriendSource.channels=userFriendChannel
user_friend.sinks.userFriendSink.channel=userFriendChannel
Create a userFriend directory under dataSourceFile, checkpointFile, and dataChannelFile respectively, as sketched below.
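A sketch of the corresponding commands, with paths matching the spoolDir, checkpointDir, and dataDirs settings above:
mkdir -p /opt/flume/conf/job/dataSourceFile/userFriend
mkdir -p /opt/flume/conf/job/checkpointFile/userFriend
mkdir -p /opt/flume/conf/job/dataChannelFile/userFriend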
// copy user_friends.csv into the userFriend directory under dataSourceFile;
// the target file name must match the includePattern above
cp user_friends.csv /opt/flume/conf/job/dataSourceFile/userFriend/userfriend_2020-11-30.csv
From the Flume root directory, start agent user_friend:
bin/flume-ng agent --name user_friend --conf conf/ --conf-file conf/job/userfriend-flume-hdfs.conf -Dflume.root.logger=INFO,console
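Once the sink has rolled at least one file, the output can be checked on HDFS; a sketch assuming the hdfs.path above (replace 2020-11-30 with the actual run date; file names follow the configured prefix and suffix, roughly userFriend.<timestamp>.csv):
hdfs dfs -ls /file/user/userFriend/2020-11-30
hdfs dfs -cat /file/user/userFriend/2020-11-30/* | head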