Flume Data Collection

Overview

  1. Introduction
  2. Installation
  3. Usage

 

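  1. Introduction
    1. A data collection tool; not highly available
  2. Installation
    1. Download the package from the official website
    2. Extract it
    3. Configuration files
      1. Static files (complete files that will not be written to again), collected with the spooldir source: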

# Define the names of the three components

ag1.sources = source1

ag1.sinks = sink1

ag1.channels = channel1

 

# Configure the source component

ag1.sources.source1.type = spooldir

ag1.sources.source1.spoolDir = /root/log/

ag1.sources.source1.fileSuffix=.FINISHED

ag1.sources.source1.deserializer.maxLineLength=5120

 

# Configure the sink component

ag1.sinks.sink1.type = hdfs

ag1.sinks.sink1.hdfs.path =hdfs://hdp-01:9000/access_log/%y-%m-%d/%H-%M

ag1.sinks.sink1.hdfs.filePrefix = app_log

ag1.sinks.sink1.hdfs.fileSuffix = .log

ag1.sinks.sink1.hdfs.batchSize= 100

ag1.sinks.sink1.hdfs.fileType = DataStream

ag1.sinks.sink1.hdfs.writeFormat =Text

 

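## roll: rules that control when the sink closes the current file and starts a new one
## (Java properties files have no inline comments, so each comment goes on its own line)

## Roll by file size (bytes)

ag1.sinks.sink1.hdfs.rollSize = 512000

## Roll by number of events

ag1.sinks.sink1.hdfs.rollCount = 1000000

## Roll by time interval (seconds)

ag1.sinks.sink1.hdfs.rollInterval = 60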

 

## Directory generation rules: round the event timestamp down to a multiple of 10 minutes

ag1.sinks.sink1.hdfs.round = true

ag1.sinks.sink1.hdfs.roundValue = 10

ag1.sinks.sink1.hdfs.roundUnit = minute

 

ag1.sinks.sink1.hdfs.useLocalTimeStamp = true

 

# Configure the channel component

ag1.channels.channel1.type = memory

## Maximum number of events held in the channel

ag1.channels.channel1.capacity = 500000

## Maximum number of events per Flume transaction (600 events)

ag1.channels.channel1.transactionCapacity = 600

 

# Bind the source and sink to the channel

ag1.sources.source1.channels = channel1

ag1.sinks.sink1.channel = channel1
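The spooldir source requires every file in spoolDir to be complete and uniquely named before it appears there. Flume stops processing if a file is modified after being placed in the directory or if a file name is reused. It renames each fully ingested file with fileSuffix (.FINISHED above). A common way to meet this requirement is to write the file elsewhere on the same filesystem and then move it in with a single rename. Below is a minimal Python sketch under that assumption. The helper `drop_into_spool` and the staging directory are hypothetical and not part of Flume.

```python
import os
import tempfile


def drop_into_spool(data: bytes, name: str, spool_dir: str, staging_dir: str) -> str:
    """Hypothetical helper: write data fully in staging_dir, then rename it into spool_dir.

    staging_dir must be on the same filesystem as spool_dir; otherwise os.replace
    cannot do a single atomic rename and raises OSError.
    """
    target = os.path.join(spool_dir, name)
    # Flume's spooldir source stops on reused names, so refuse names already seen
    # (either still pending or already renamed with the .FINISHED suffix).
    # Note: this check is not atomic with the rename below; it is a best-effort guard.
    if os.path.exists(target) or os.path.exists(target + ".FINISHED"):
        raise FileExistsError(target)

    fd, tmp_path = tempfile.mkstemp(dir=staging_dir)
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
            f.flush()
            os.fsync(f.fileno())  # make sure the bytes are on disk before publishing
        os.replace(tmp_path, target)  # one rename: Flume never sees a partial file
    except BaseException:
        # Clean up the staging file if anything failed before the rename.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
    return target
```

For example, `drop_into_spool(data, "access_20240101.log", "/root/log/", "/root/log_staging/")`, where /root/log_staging/ is a hypothetical directory on the same filesystem as /root/log/.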

 

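      2. Non-static files (files that are still being appended to)

Configuration file: tail-hdfs.conf

 

Collect data with the tail command and sink it to HDFS.

Start command: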

bin/flume-ng agent -c conf -f conf/tail-hdfs.conf -n a1
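The `-n a1` flag tells flume-ng which agent in the file to start. Only keys whose first segment is `a1` configure that agent. Keys written under another prefix belong to a different agent and are not applied, for example `agent1.sinks.sink1.*` in a file started with `-n a1`. A quick pre-flight check can catch such mismatches. The Python function below is a hypothetical helper, not a Flume tool. It only understands simple `key = value` lines and whole-line `#` comments.

```python
def check_agent(conf_text: str, agent: str) -> list[str]:
    """Hypothetical pre-flight check for a Flume agent properties file.

    Reports (1) key prefixes other than `agent`, which that agent ignores, and
    (2) sources/sinks/channels listed for `agent` that have no `.type` key.
    """
    props = {}
    for raw in conf_text.splitlines():
        line = raw.strip()
        # Java properties treat only whole lines starting with # or ! as comments.
        if not line or line.startswith(("#", "!")):
            continue
        key, sep, value = line.partition("=")
        if sep:  # this sketch ignores ':' and whitespace separators
            props[key.strip()] = value.strip()

    problems = []
    for prefix in sorted({k.split(".", 1)[0] for k in props} - {agent}):
        problems.append(f"keys with prefix '{prefix}' are not applied to agent '{agent}'")
    for kind in ("sources", "sinks", "channels"):
        for name in props.get(f"{agent}.{kind}", "").split():
            if f"{agent}.{kind}.{name}.type" not in props:
                problems.append(f"{agent}.{kind}.{name} has no type")
    return problems
```

Example: `for p in check_agent(open("conf/tail-hdfs.conf").read(), "a1"): print(p)`.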

########

 

# Name the components on this agent

a1.sources = r1

a1.sinks = k1

a1.channels = c1

 

# Describe/configure the source

a1.sources.r1.type = exec

a1.sources.r1.command = tail -F /root/app_weichat_login.log

 

# Describe the sink

a1.sinks.k1.type = hdfs

a1.sinks.k1.hdfs.path = hdfs://hdp20-01:9000/app_weichat_login_log/%y-%m-%d/%H-%M

a1.sinks.k1.hdfs.filePrefix = weichat_log

a1.sinks.k1.hdfs.fileSuffix = .dat

a1.sinks.k1.hdfs.batchSize = 100

a1.sinks.k1.hdfs.fileType = DataStream

a1.sinks.k1.hdfs.writeFormat = Text

 

a1.sinks.k1.hdfs.rollSize = 100

a1.sinks.k1.hdfs.rollCount = 1000000

a1.sinks.k1.hdfs.rollInterval = 60

 

a1.sinks.k1.hdfs.round = true

a1.sinks.k1.hdfs.roundValue = 1

a1.sinks.k1.hdfs.roundUnit = minute

 

a1.sinks.k1.hdfs.useLocalTimeStamp = true

# Use a channel which buffers events in memory

a1.channels.c1.type = memory

a1.channels.c1.capacity = 1000

a1.channels.c1.transactionCapacity = 100

 

# Bind the source and sink to the channel

a1.sources.r1.channels = c1

a1.sinks.k1.channel = c1
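With `round = true`, the sink rounds the event timestamp down to a multiple of roundValue in roundUnit before expanding the escape sequences in hdfs.path. With `%H-%M` in the path, roundValue = 1 (this file) yields one directory per minute, and roundValue = 10 (the spooldir example) yields one per 10 minutes. The Python model below is a simplified sketch. It supports only the %y, %m, %d, %H and %M escapes used in this article, which mean the same as in strftime. It treats the timestamp as a naive local datetime, matching `useLocalTimeStamp = true`. `resolve_path` is a hypothetical name.

```python
from datetime import datetime


def round_down(ts: datetime, value: int, unit: str) -> datetime:
    """Round ts down to a multiple of `value` within its unit (like hdfs.roundUnit)."""
    if unit == "second":
        return ts.replace(second=ts.second // value * value, microsecond=0)
    if unit == "minute":
        return ts.replace(minute=ts.minute // value * value, second=0, microsecond=0)
    if unit == "hour":
        return ts.replace(hour=ts.hour // value * value, minute=0, second=0, microsecond=0)
    raise ValueError(f"unsupported roundUnit: {unit}")


def resolve_path(template: str, ts: datetime, round_: bool = True,
                 value: int = 1, unit: str = "minute") -> str:
    """Hypothetical model of how hdfs.path is expanded for one event."""
    if round_:
        ts = round_down(ts, value, unit)
    # %y %m %d %H %M behave the same here as Python's strftime.
    return ts.strftime(template)
```

With an event at 14:37:52, the spooldir config (roundValue = 10) writes under `.../14-30`, while this tail config (roundValue = 1) writes under `.../14-37`.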

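      3. Two-tier chaining (for intermediate processing such as data encryption)

Server side (downstream agent)

Receive data on an Avro port and sink it to HDFS.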

#####

bin/flume-ng agent -c conf -f conf/avro-hdfs.conf -n a1 -Dflume.root.logger=INFO,console

 

 

Collection configuration file: avro-hdfs.conf

 

# Name the components on this agent

a1.sources = r1

a1.sinks = k1

a1.channels = c1

 

# Describe/configure the source

## The Avro source is a receiving server: it listens on the bind address and port below

a1.sources.r1.type = avro

a1.sources.r1.bind = hdp-05

a1.sources.r1.port = 4141

 

 

# Describe the sink

a1.sinks.k1.type = hdfs

a1.sinks.k1.hdfs.path = /flume/taildata/%y-%m-%d/

a1.sinks.k1.hdfs.filePrefix = tail-

a1.sinks.k1.hdfs.round = true

a1.sinks.k1.hdfs.roundValue = 24

a1.sinks.k1.hdfs.roundUnit = hour

a1.sinks.k1.hdfs.rollInterval = 0

a1.sinks.k1.hdfs.rollSize = 0

a1.sinks.k1.hdfs.rollCount = 50

a1.sinks.k1.hdfs.batchSize = 10

a1.sinks.k1.hdfs.useLocalTimeStamp = true

# Type of file to write: the default is SequenceFile; DataStream writes plain text

a1.sinks.k1.hdfs.fileType = DataStream

 

# Use a channel which buffers events in memory

a1.channels.c1.type = memory

a1.channels.c1.capacity = 1000

a1.channels.c1.transactionCapacity = 100

 

# Bind the source and sink to the channel

a1.sources.r1.channels = c1

a1.sinks.k1.channel = c1
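The three roll settings are independent triggers. The sink closes the current file and opens a new one as soon as any enabled threshold is reached, and a value of 0 disables that trigger. In this server config only rollCount = 50 is active, so each HDFS file holds 50 events. The sketch below is a simplified model of that decision; the `RollPolicy` class and `split_into_files` helper are hypothetical. In Flume itself, rollInterval is driven by a timer and rollSize counts event body bytes; this model passes elapsed time in explicitly.

```python
from dataclasses import dataclass


@dataclass
class RollPolicy:
    # Defaults mirror the HDFS sink defaults; 0 disables a trigger.
    roll_interval: int = 30   # seconds
    roll_size: int = 1024     # bytes of event bodies
    roll_count: int = 10      # events

    def should_roll(self, events: int, bytes_written: int, open_seconds: float) -> bool:
        """True if any enabled threshold has been reached for the current file."""
        return (
            (self.roll_count > 0 and events >= self.roll_count)
            or (self.roll_size > 0 and bytes_written >= self.roll_size)
            or (self.roll_interval > 0 and open_seconds >= self.roll_interval)
        )


def split_into_files(bodies, policy: RollPolicy):
    """Group event bodies into files, ignoring time (open_seconds stays 0)."""
    files, current, size = [], [], 0
    for body in bodies:
        current.append(body)
        size += len(body)
        if policy.should_roll(len(current), size, 0):
            files.append(current)
            current, size = [], 0
    if current:
        files.append(current)  # last, still-open file
    return files
```

For example, `split_into_files([b"x"] * 120, RollPolicy(0, 0, 50))` groups the events into files of 50, 50 and 20.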

 

 

Send test data (the -H host must match the source's bind address, hdp-05):

$ bin/flume-ng avro-client -H hdp-05 -p 4141 -F /usr/logs/log.10
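avro-client connects to the Avro source over TCP. If it fails, first confirm that something is listening on the host and port the source is bound to (hdp-05:4141 in avro-hdfs.conf). Below is a minimal Python check; `port_open` is a hypothetical helper. A successful result only shows that the TCP port accepts connections, not that the listener is a Flume Avro source.

```python
import socket


def port_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, unreachable, DNS failure, or timeout
        return False
```

For example, run `print(port_open("hdp-05", 4141))` on the client node before starting avro-client.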

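Client side (upstream agent)

Collect data with the tail command and send it to the Avro port.

Another node can then be configured with an Avro source to relay the data on to external storage.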

 

 

##################

# Name the components on this agent

a1.sources = r1

a1.sinks = k1

a1.channels = c1

 

# Describe/configure the source

a1.sources.r1.type = exec

a1.sources.r1.command = tail -F /root/log/access.log

 

 

# Describe the sink

a1.sinks.k1.type = avro

a1.sinks.k1.hostname = hdp-05

a1.sinks.k1.port = 4141

a1.sinks.k1.batch-size = 2

 

 

 

# Use a channel which buffers events in memory

a1.channels.c1.type = memory

a1.channels.c1.capacity = 1000

a1.channels.c1.transactionCapacity = 100

 

# Bind the source and sink to the channel

a1.sources.r1.channels = c1

a1.sinks.k1.channel = c1
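Both tiers use a memory channel sized by two settings. capacity is the most events it buffers, and transactionCapacity is the most events one put or take transaction may carry. A sink takes up to its batch size in a single transaction, so that batch size must not exceed transactionCapacity. The values here satisfy this: batch-size = 2 on the client and hdfs.batchSize = 10 on the server, both against 100. The Python class below is a simplified model of those two limits, not Flume's MemoryChannel implementation; `MemoryChannelModel` and `ChannelError` are hypothetical names.

```python
from collections import deque


class ChannelError(Exception):
    """Hypothetical stand-in for the error a full channel or oversized batch raises."""


class MemoryChannelModel:
    def __init__(self, capacity: int, transaction_capacity: int):
        if transaction_capacity > capacity:
            raise ValueError("transactionCapacity must not exceed capacity")
        self.capacity = capacity
        self.transaction_capacity = transaction_capacity
        self.queue = deque()

    def put_batch(self, events: list) -> None:
        """All-or-nothing put, like one committed source transaction."""
        if len(events) > self.transaction_capacity:
            raise ChannelError(
                f"batch of {len(events)} exceeds transactionCapacity {self.transaction_capacity}")
        if len(self.queue) + len(events) > self.capacity:
            raise ChannelError("channel is full; the source must retry later")
        self.queue.extend(events)

    def take_batch(self, batch_size: int) -> list:
        """Take up to batch_size events, like one sink transaction."""
        if batch_size > self.transaction_capacity:
            raise ChannelError(
                f"batch size {batch_size} exceeds transactionCapacity {self.transaction_capacity}")
        n = min(batch_size, len(self.queue))
        return [self.queue.popleft() for _ in range(n)]
```

With capacity = 1000 and transactionCapacity = 100 as above, a sink batch of 2 or 10 is accepted, while a batch of 200 would be rejected.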

 
