Flume Installation and Testing

I. Installing Flume

  1. Prepare two machines:

wyg01.para.com(192.168.244.3)

wyg02.para.com(192.168.244.4)

  2. Install by extracting the tarball

 

[root@wyg01 app]# tar -zxvf apache-flume-1.8.0-bin.tar.gz -C /app/

 

  3. Create the config file from its template (in the conf/ directory): cp flume-env.sh.template flume-env.sh

 

  4. Edit the configuration: vi flume-env.sh

 

export JAVA_HOME=/usr/java/jdk1.8.0_181-cloudera

  5. Set the environment variables

vim /etc/profile

Add the following:

#flume

export FLUME_HOME=/app/apache-flume-1.8.0-bin

export PATH=$FLUME_HOME/bin:$PATH

 

 

  6. Apply the environment variables

source /etc/profile

  7. Verify

flume-ng version

 

II. Using Flume

Case 1: Monitoring port data

  1. Create a directory: mkdir -p /app/apache-flume-1.8.0-bin/options

 

  2. Create the configuration file

vim example.conf

Add the following:

# Name the components on this agent

a1.sources = r1

a1.sinks = k1

a1.channels = c1

# Describe/configure the source

a1.sources.r1.type = netcat

a1.sources.r1.bind = localhost

a1.sources.r1.port = 44444

# Describe the sink

a1.sinks.k1.type = logger

# Use a channel which buffers events in memory

a1.channels.c1.type = memory

# Maximum number of events the channel can store

a1.channels.c1.capacity = 1000

# Maximum number of events taken from the source or delivered to the sink per transaction

a1.channels.c1.transactionCapacity = 100

# Bind the source and sink to the channel

a1.sources.r1.channels = c1

a1.sinks.k1.channel = c1
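The capacity and transactionCapacity settings above behave like a bounded buffer that is drained in fixed-size batches. A toy Python model of that behavior (the MemoryChannel class below is illustrative, not Flume's actual API):

```python
from collections import deque

class MemoryChannel:
    """Toy model of Flume's memory channel (illustrative, not Flume's API)."""
    def __init__(self, capacity=1000, transaction_capacity=100):
        self.capacity = capacity                          # max events buffered
        self.transaction_capacity = transaction_capacity  # max events per take
        self.events = deque()

    def put(self, event):
        if len(self.events) >= self.capacity:
            raise OverflowError("channel full")           # the source would back off
        self.events.append(event)

    def take_batch(self):
        # a sink drains at most transactionCapacity events per transaction
        batch = []
        while self.events and len(batch) < self.transaction_capacity:
            batch.append(self.events.popleft())
        return batch

ch = MemoryChannel(capacity=1000, transaction_capacity=100)
for i in range(250):
    ch.put(f"event-{i}")
print(len(ch.take_batch()))  # 100
print(len(ch.take_batch()))  # 100
print(len(ch.take_batch()))  # 50
```

If sources produce faster than sinks drain for long enough, the buffer fills up to capacity and puts start failing, which is why capacity should comfortably exceed transactionCapacity.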

  3. Start Flume

flume-ng agent -n a1 -c options/ -f example.conf -Dflume.root.logger=INFO,console

 

The output shows that port 44444 is now being monitored.

  4. Open another terminal

yum install telnet

telnet localhost 44444

Once telnet is installed, you can write data to the port.
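All telnet does here is write newline-terminated lines over TCP, which the netcat source turns into events. The self-contained Python sketch below simulates both ends on a free local port; the listener merely stands in for Flume's netcat source and is not Flume itself:

```python
import socket
import threading

received = []

def fake_netcat_source(server_sock):
    # stands in for Flume's netcat source: accept one client and read its lines
    conn, _ = server_sock.accept()
    with conn:
        for line in conn.makefile("r"):
            received.append(line.rstrip("\n"))

server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server.bind(("localhost", 0))   # port 0 = any free port; Flume itself uses 44444
server.listen(1)
server.settimeout(5)
port = server.getsockname()[1]
t = threading.Thread(target=fake_netcat_source, args=(server,))
t.start()

# this part plays the role of `telnet localhost 44444`
client = socket.create_connection(("localhost", port))
client.sendall(b"hello flume\n")
client.close()
t.join()
server.close()
print(received)  # ['hello flume']
```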

 

  5. Verify

 

In the other window, the log messages are collected in real time.

Use case: a machine has no Kafka, HDFS, or other collection components installed, but its data still needs to be collected; in that situation Flume alone can do the job.

 

Case 2: Chaining two Flume agents

  1. Install Flume on the second machine

scp -r /app/apache-flume-1.8.0-bin wyg02.para.com:/app/

  2. Copy the environment configuration

scp  /etc/profile wyg02.para.com:/etc/

[root@wyg02 app]# source /etc/profile

[root@wyg02 app]# flume-ng version

 

  3. Configuration file on host wyg01.para.com

vim two_flume.conf

 

Create the file to be monitored:

touch /app/flume.txt

 

Add the following:

# Name the components on this agent

a1.sources = r1

a1.sinks = k1

a1.channels = c1

# Monitor the file flume.txt

a1.sources.r1.type = exec

a1.sources.r1.command=tail -F /app/flume.txt

# Connection point between the two agents

a1.sinks.k1.type = avro

a1.sinks.k1.hostname = wyg02.para.com

a1.sinks.k1.port = 45454

a1.channels.c1.type = memory

a1.channels.c1.capacity = 1000

a1.channels.c1.transactionCapacity = 100

a1.sources.r1.channels = c1

a1.sinks.k1.channel = c1

Note on a Flume feature: complex flows

Flume lets users build multi-hop flows in which events travel through multiple agents before reaching their final destination. It also supports fan-in and fan-out flows, contextual routing, and backup routes (failover).

For data to flow across multiple agents or hops, the sink of the previous agent and the source of the current hop must both be of type avro, with the sink pointing at the hostname (or IP address) and port of the source.
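That wiring rule can be checked mechanically. A small illustrative Python check (the dictionaries below mimic the sink and source settings from this case's configs; they are not Flume's real configuration API):

```python
def validate_hop(prev_sink, next_source):
    """Check the Flume multi-hop rule: avro sink -> avro source, matching endpoint."""
    if prev_sink["type"] != "avro" or next_source["type"] != "avro":
        return False
    return (prev_sink["hostname"] == next_source["bind"]
            and prev_sink["port"] == next_source["port"])

# settings taken from the a1 (wyg01) and a2 (wyg02) configs in this case
a1_sink = {"type": "avro", "hostname": "wyg02.para.com", "port": 45454}
a2_source = {"type": "avro", "bind": "wyg02.para.com", "port": 45454}
print(validate_hop(a1_sink, a2_source))  # True
```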

  4. Configuration file on host wyg02.para.com

cd /app/apache-flume-1.8.0-bin/options/

 

vim two_flume.conf

Add the following:

a2.sources = r1

a2.sinks = k1

a2.channels = c1

a2.sources.r1.type = avro

a2.sources.r1.bind = wyg02.para.com

a2.sources.r1.port = 45454

a2.sinks.k1.type = logger

a2.channels.c1.type = memory

a2.channels.c1.capacity = 1000

a2.channels.c1.transactionCapacity = 100

a2.sources.r1.channels = c1

a2.sinks.k1.channel = c1

  5. Start Flume on wyg02.para.com

flume-ng agent -n a2 -c options/ -f two_flume.conf -Dflume.root.logger=INFO,console

 

  6. Start Flume on wyg01.para.com

flume-ng agent -n a1 -c options/ -f two_flume.conf

 

  7. Open a second window on wyg01.para.com

[root@wyg01 ~]# echo "11111111111111" >> /app/flume.txt

[root@wyg01 ~]# echo "2222222222222222" >> /app/flume.txt

 

  8. Check on wyg02.para.com

 

This verifies that log data from wyg01 is shipped to wyg02.

Case 3: Monitoring a directory and printing its contents

  1. Edit the configuration file on wyg01.para.com

cd /app/apache-flume-1.8.0-bin/options/

vi spooling.conf

# Name the components on this agent

a1.sources = r1

a1.sinks = k1

a1.channels = c1

a1.sources.r1.type = spooldir

a1.sources.r1.spoolDir = /home/logs

a1.sources.r1.fileHeader = true

a1.sinks.k1.type = logger

a1.channels.c1.type = memory

a1.channels.c1.capacity = 1000

a1.channels.c1.transactionCapacity = 100

a1.sources.r1.channels = c1

a1.sinks.k1.channel = c1

  2. Check the files in the spooling directory (create /home/logs first if it does not exist)

cd /home/logs/

 

  3. Start Flume

flume-ng agent -n a1 -c /app/apache-flume-1.8.0-bin/options/ -f spooling.conf -Dflume.root.logger=INFO,console

 

 

  4. While Flume is running, copy a file into /home/logs/ and watch the output

Open another window on wyg01.para.com and run:

cp /etc/profile /home/logs/

 

Then check the console in the original window:

 

In the new window you can see that Flume renames each file after it has been read.

 

  5. Running the same copy command again raises an error, because the file already exists in /home/logs/.

 

After this error, the Flume agent stops.

You must restart Flume to continue.
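The failure mode above comes from how the spooling-directory source works: each fully read file is renamed (by default with a .COMPLETED suffix), and encountering a file whose completed name already exists triggers an error that stops the agent. A toy Python reproduction of that behavior:

```python
import os
import tempfile

def spool_once(spool_dir, suffix=".COMPLETED"):
    """Toy version of the spooldir source: read each new file, then rename it.
    Raises if a completed copy of the same name already exists, as Flume does."""
    events = []
    for name in sorted(os.listdir(spool_dir)):
        if name.endswith(suffix):
            continue                       # already processed, skip
        path = os.path.join(spool_dir, name)
        done = path + suffix
        if os.path.exists(done):
            # real Flume throws an exception here and the agent stops
            raise RuntimeError(f"file {name} was already processed")
        with open(path) as f:
            events.extend(f.read().splitlines())
        os.rename(path, done)              # mark the file as consumed
    return events

d = tempfile.mkdtemp()
with open(os.path.join(d, "profile"), "w") as f:
    f.write("line1\n")
print(spool_once(d))                       # ['line1']
with open(os.path.join(d, "profile"), "w") as f:
    f.write("line1\n")                     # same filename copied again
try:
    spool_once(d)
except RuntimeError as e:
    print(e)                               # file profile was already processed
```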

Case 4: HDFS sink

My environment is a three-node CDH 6.2.1 cluster. If you do not have such a cluster, you can install a Hadoop cluster and ZooKeeper instead: start HDFS on the master node with start-dfs.sh and start ZooKeeper on every node with zkServer.sh start, then run the test.

This test builds on Case 3.

 

  1. Configuration file

cd /app/apache-flume-1.8.0-bin/options/

 

 vi hdfs.conf

a1.sources = r1

a1.sinks = k1

a1.channels = c1

a1.sources.r1.type = spooldir

a1.sources.r1.spoolDir = /home/logs

a1.sources.r1.fileHeader = true

a1.sinks.k1.type = hdfs

a1.sinks.k1.hdfs.path = hdfs://wyg01.para.com:8020/tmp/%Y-%m-%d/%H%M

a1.sinks.k1.hdfs.rollCount=0

a1.sinks.k1.hdfs.rollInterval=60

a1.sinks.k1.hdfs.rollSize=10240

a1.sinks.k1.hdfs.idleTimeout=3

a1.sinks.k1.hdfs.fileType=DataStream

a1.sinks.k1.hdfs.useLocalTimeStamp=true

a1.sinks.k1.hdfs.round=true

a1.sinks.k1.hdfs.roundValue=5

a1.sinks.k1.hdfs.roundUnit=minute

a1.channels.c1.type = memory

a1.channels.c1.capacity = 1000

a1.channels.c1.transactionCapacity = 100

a1.sources.r1.channels = c1

a1.sinks.k1.channel = c1
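The round, roundValue, and roundUnit settings above round the event timestamp down to 5-minute buckets before the %Y-%m-%d/%H%M escapes in the path are expanded. A sketch of that rounding in Python (the bucket_path helper is hypothetical, for illustration only):

```python
from datetime import datetime

def bucket_path(ts, round_value=5):
    """Round the timestamp down to the nearest round_value minutes,
    mimicking hdfs.round / roundValue / roundUnit=minute for the path escapes."""
    minute = (ts.minute // round_value) * round_value
    rounded = ts.replace(minute=minute, second=0, microsecond=0)
    return rounded.strftime("/tmp/%Y-%m-%d/%H%M")

print(bucket_path(datetime(2019, 8, 1, 10, 37, 22)))  # /tmp/2019-08-01/1035
print(bucket_path(datetime(2019, 8, 1, 10, 40, 0)))   # /tmp/2019-08-01/1040
```

So all events arriving between 10:35 and 10:39 land in the same directory, which keeps the number of HDFS directories bounded.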

 

Run in the background:

nohup flume-ng agent -n a1 -c /app/apache-flume-1.8.0-bin/options/ -f hdfs.conf &

  2. Or start Flume in the foreground

flume-ng agent -n a1 -c /app/apache-flume-1.8.0-bin/options/ -f hdfs.conf

 

  3. Verify that the logs were uploaded to HDFS

 

Case 5: Uploading images and other files to HDFS

 

vi hdfs1.conf

Contents:

# ==== start ====

agent.sources = spooldirsource

agent.channels = memoryChannel

agent.sinks = hdfssink

# For each one of the sources, the type is defined

agent.sources.spooldirsource.type = spooldir

# The channel can be defined as follows.

agent.sources.spooldirsource.channels = memoryChannel

agent.sources.spooldirsource.spoolDir =/home/logs/

agent.sources.spooldirsource.deserializer = org.apache.flume.sink.solr.morphline.BlobDeserializer$Builder

agent.sources.spooldirsource.deserializer.maxBlobLength = 100000000

# Each sink's type must be defined

agent.sinks.hdfssink.type = hdfs

#Specify the channel the sink should use

agent.sinks.hdfssink.channel = memoryChannel

# ns1 is the HA (high-availability) nameservice address

# /%Y/%m/%d writes date-based directories dynamically

agent.sinks.hdfssink.hdfs.path = hdfs://10.169.112.150:8020/tmp/%Y/%m/%d

agent.sinks.hdfssink.hdfs.useLocalTimeStamp = true

#agent.sinks.hdfssink.hdfs.fileSuffix = .jpg

#agent.sinks.hdfssink.hdfs.fileSuffix = .docx

agent.sinks.hdfssink.hdfs.fileType = DataStream

# Each channel's type is defined.

agent.channels.memoryChannel.type = memory

# Other config values specific to each type of channel(sink or source)

# can be defined as well

# In this case, it specifies the capacity of the memory channel

agent.channels.memoryChannel.capacity = 100000

Run: flume-ng agent -n agent -c /app/apache-flume-1.8.0-bin/options/ -f hdfs1.conf

Then copy an image into /home/logs and run hdfs dfs -ls /tmp to check that it was uploaded; the image can also be viewed in Hue's file browser.
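The BlobDeserializer configured above reads each spooled file as one binary event of up to maxBlobLength bytes, which is what keeps an image intact instead of splitting it on newlines. A toy Python version of that idea:

```python
import io

def read_blob(stream, max_blob_length=100_000_000):
    """Toy BlobDeserializer: return one whole-file event, capped at max_blob_length."""
    data = stream.read(max_blob_length)
    return data if data else None          # None signals end of input

jpeg_bytes = b"\xff\xd8\xff\xe0" + b"\x00" * 16   # fake JPEG header plus payload
event = read_blob(io.BytesIO(jpeg_bytes))
print(len(event))  # 20
```

Note the trade-off: the whole blob is buffered in memory, so maxBlobLength should stay well below the agent's heap size.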

Case 6: Flume load balancing

Flume load balancing is simulated with three machines, planned as follows:

wyg01.para.com (192.168.244.3): collects data and sends it to wyg02 and wyg03

wyg02.para.com (192.168.244.4): receives part of wyg01's data

wyg03.para.com (192.168.244.5): receives part of wyg01's data

Note: all three machines are set up the same way and already have Flume installed.

  1. Flume configuration on the wyg01 server

[root@wyg01 conf]# cd /app/apache-flume-1.8.0-bin/options

[root@wyg01 conf]# vi load_banlancer_client.conf

# agent name


a1.channels = c1

a1.sources = r1

a1.sinks = k1 k2

# set group

a1.sinkgroups = g1

# set channel


a1.channels.c1.type = memory

a1.channels.c1.capacity = 1000

a1.channels.c1.transactionCapacity = 100

a1.sources.r1.channels = c1

a1.sources.r1.type = exec

a1.sources.r1.command = tail -F /app/taillogs/access_log

# set sink1


a1.sinks.k1.channel = c1

a1.sinks.k1.type = avro

a1.sinks.k1.hostname = wyg02.para.com

a1.sinks.k1.port = 52020

# set sink2


a1.sinks.k2.channel = c1

a1.sinks.k2.type = avro

a1.sinks.k2.hostname = wyg03.para.com

a1.sinks.k2.port = 52020

# set sink group


a1.sinkgroups.g1.sinks = k1 k2

# set load balancing

a1.sinkgroups.g1.processor.type = load_balance

a1.sinkgroups.g1.processor.backoff = true

a1.sinkgroups.g1.processor.selector = round_robin

a1.sinkgroups.g1.processor.selector.maxTimeOut=10000
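With processor.type = load_balance and a round_robin selector, events alternate between k1 and k2, and a failed sink is backed off (temporarily blacklisted) for up to maxTimeOut milliseconds. A simplified Python model of that selection logic (class and method names are illustrative, not Flume's API):

```python
import itertools

class RoundRobinSelector:
    """Toy model of Flume's load-balancing sink processor (round_robin + backoff)."""
    def __init__(self, sinks):
        self.order = itertools.cycle(sinks)
        self.backed_off = set()          # sinks temporarily blacklisted after failure

    def choose(self):
        for _ in range(len(self.backed_off) + 2):
            sink = next(self.order)
            if sink not in self.backed_off:
                return sink
        raise RuntimeError("all sinks are backed off")

    def mark_failed(self, sink):
        self.backed_off.add(sink)        # real Flume retries the sink after maxTimeOut

sel = RoundRobinSelector(["k1", "k2"])
print([sel.choose() for _ in range(4)])  # ['k1', 'k2', 'k1', 'k2']
sel.mark_failed("k2")                    # e.g. the wyg03 agent was stopped
print([sel.choose() for _ in range(3)])  # ['k1', 'k1', 'k1']
```

This mirrors what the verification steps below show: traffic alternates between wyg02 and wyg03, and after wyg03's agent is stopped everything flows to wyg02.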

  2. Flume configuration on the wyg02 server

[root@wyg02 bin]# cd /app/apache-flume-1.8.0-bin/options

[root@wyg02 conf]# vi load_banlancer_server.conf

# Name the components on this agent


a1.sources = r1

a1.sinks = k1

a1.channels = c1

# Describe/configure the source


a1.sources.r1.type = avro

a1.sources.r1.channels = c1

a1.sources.r1.bind = wyg02.para.com

a1.sources.r1.port = 52020

# Describe the sink


a1.sinks.k1.type = logger

# Use a channel which buffers events in memory


a1.channels.c1.type = memory

a1.channels.c1.capacity = 1000

a1.channels.c1.transactionCapacity = 100

# Bind the source and sink to the channel


a1.sources.r1.channels = c1

a1.sinks.k1.channel = c1

  3. Flume configuration on the wyg03 server

[root@wyg03 bin]# cd /app/apache-flume-1.8.0-bin/options

[root@wyg03 conf]# vi load_banlancer_server.conf

# Name the components on this agent


a1.sources = r1

a1.sinks = k1

a1.channels = c1

# Describe/configure the source


a1.sources.r1.type = avro

a1.sources.r1.channels = c1

a1.sources.r1.bind = wyg03.para.com

a1.sources.r1.port = 52020

# Describe the sink


a1.sinks.k1.type = logger

# Use a channel which buffers events in memory


a1.channels.c1.type = memory

a1.channels.c1.capacity = 1000

a1.channels.c1.transactionCapacity = 100

# Bind the source and sink to the channel


a1.sources.r1.channels = c1

a1.sinks.k1.channel = c1

  4. Start Flume on each of the three hosts
(1) Start Flume on wyg03.para.com:

[root@wyg03 ~]# cd /app/apache-flume-1.8.0-bin/bin/

[root@wyg03 bin]# ./flume-ng agent -c /app/apache-flume-1.8.0-bin/conf -f /app/apache-flume-1.8.0-bin/options/load_banlancer_server.conf -n a1 -Dflume.root.logger=INFO,console

 

(2) Start Flume on wyg02.para.com:

[root@wyg02 ~]# cd /app/apache-flume-1.8.0-bin/bin/

[root@wyg02 bin]#  ./flume-ng agent -c /app/apache-flume-1.8.0-bin/conf -f /app/apache-flume-1.8.0-bin/options/load_banlancer_server.conf -n a1 -Dflume.root.logger=INFO,console

 

 

(3) Start Flume on wyg01.para.com:

[root@wyg01 bin]# cd /app/apache-flume-1.8.0-bin/bin/

[root@wyg01 bin]# ./flume-ng agent -c /app/apache-flume-1.8.0-bin/conf -f /app/apache-flume-1.8.0-bin/options/load_banlancer_client.conf -n a1 -Dflume.root.logger=INFO,console

 

 

  5. Verify load balancing
(1) Open another terminal on wyg01.para.com, append content to the monitored log file, and observe wyg02.para.com and wyg03.para.com.

 

(2) As shown, wyg03.para.com has picked up a change.

 

(3) Its console shows the content that was just added.

 

(4) Append content to the log file on wyg01.para.com again and watch the other two hosts: this time the console on wyg02.para.com prints the newly added content.

 

 

 

(5) Now stop Flume on wyg03.para.com and append more content to the log file.

 

 

(6) Both wyg01.para.com and wyg02.para.com show changes.

 

(7) wyg01.para.com reports an error saying it cannot connect to wyg03.para.com, because Flume on that host was shut down earlier.

 

(8) Since one of the two receiving hosts is down, the log output is printed only on the wyg02.para.com console.

 
