Using Flume to Load MySQL Table Data into HBase

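This article walks through using Flume to sync MySQL table data into HBase, with concrete configurations and observed results for the SimpleHbaseEventSerializer and RegexHbaseEventSerializer modes; SimpleAsyncHbaseEventSerializer is listed as a third option.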

The three serialization modes of the HBase sink

  • SimpleHbaseEventSerializer
  • RegexHbaseEventSerializer
  • SimpleAsyncHbaseEventSerializer (used with AsyncHBaseSink rather than HBaseSink)

Using the SimpleHbaseEventSerializer mode

1. Create table1 in HBase

hbase(main):021:0> create 'default:table1', 'info'
Created table default:table1
Took 1.3042 seconds
=> Hbase::Table - table1

2. Flume configuration file

agent.channels = ch1
agent.sinks = hbase-sink
agent.sources = sql-source
agent.channels.ch1.type = memory
agent.sources.sql-source.channels = ch1
agent.sources.sql-source.type = org.keedio.flume.source.SQLSource


agent.sources.sql-source.hibernate.connection.url = jdbc:mysql://192.168.1.69:3306/t_hadoop
agent.sources.sql-source.hibernate.connection.user = root
agent.sources.sql-source.hibernate.connection.password = root
agent.sources.sql-source.table = t_name
agent.sources.sql-source.columns.to.select = *

agent.sources.sql-source.incremental.column.name = id
agent.sources.sql-source.incremental.value = 0

agent.sources.sql-source.run.query.delay=5000

agent.sources.sql-source.status.file.path = /home/lwenhao/flume
agent.sources.sql-source.status.file.name = sql-source.status


# Sink: HBaseSink with SimpleHbaseEventSerializer
agent.sinks.hbase-sink.type = org.apache.flume.sink.hbase.HBaseSink
# HBase table name
agent.sinks.hbase-sink.table = table1
# HBase column family name
agent.sinks.hbase-sink.columnFamily = info
agent.sinks.hbase-sink.serializer = org.apache.flume.sink.hbase.SimpleHbaseEventSerializer
# Name of the column (within the column family) that receives the event body.
# SimpleHbaseEventSerializer treats this value as a single column name; the comma-separated list is not split.
agent.sinks.hbase-sink.serializer.payloadColumn = id,sip,dip,sport,dport,protocol,flowvalue,createtime
# Bind the sink to the channel
agent.sinks.hbase-sink.channel = ch1

3. Start Flume

 bin/flume-ng agent --conf conf/ --name agent --conf-file conf/flume-hbase.conf -Dflume.root.logger=DEBUG,console

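4. Result

The data does reach HBase, but the values are not mapped to the intended fields. The reason is that SimpleHbaseEventSerializer only performs a simple mapping: it writes the entire event body into a single column. To map an event onto multiple columns, use RegexHbaseEventSerializer or write a custom serializer (SimpleAsyncHbaseEventSerializer, like the simple serializer, writes the body to a single payload column).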
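
To see why the fields end up unmapped, the behaviour can be modelled in a few lines of Python. This is a hypothetical model, not Flume code: the function `simple_serialize` and the sample row are made up for illustration, and the sketch assumes only that the serializer stores the whole event body under the one column named by payloadColumn.

```python
# Hypothetical model of the behaviour described above; this is not Flume code.
def simple_serialize(body: bytes, column_family: str, payload_column: str) -> dict:
    """Model one put: the whole event body goes under a single qualifier."""
    # payload_column is used verbatim as the qualifier; commas are not split.
    return {f"{column_family}:{payload_column}": body}


# Made-up sample row, written as a quoted CSV line (the shape the regex
# used later in this article expects).
sample_body = b'"1","10.0.0.1","10.0.0.2","80","8080","tcp","1024","2019-03-05 10:00:00"'
cells = simple_serialize(
    sample_body, "info", "id,sip,dip,sport,dport,protocol,flowvalue,createtime"
)

for column, value in cells.items():
    print(column, "=>", value.decode("utf-8"))
```

Under these assumptions the row produces exactly one cell, whose qualifier is the literal string `id,sip,dip,sport,dport,protocol,flowvalue,createtime`.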

Using the RegexHbaseEventSerializer mode

RegexHbaseEventSerializer splits each event with a regular expression and stores the captured groups in multiple columns of the HBase table.
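
Writing one capture group per column by hand gets error-prone as the column count grows. The following is a minimal sketch that generates the two serializer lines from a column list, assuming every field arrives wrapped in double quotes and separated by commas. The helper `build_serializer_lines` is hypothetical; only the property names `serializer.regex` and `serializer.colNames` come from the configuration used in this article.

```python
import re


def build_serializer_lines(columns, sink_prefix="agent.sinks.hbase-sink"):
    """Return the serializer.regex and serializer.colNames lines for quoted CSV input."""
    # One lazy capture group per column, each wrapped in double quotes.
    pattern = "^" + ",".join(['"(.*?)"'] * len(columns)) + "$"
    # Sanity check: the pattern must compile and expose one group per column.
    assert re.compile(pattern).groups == len(columns)
    return [
        f"{sink_prefix}.serializer.regex = {pattern}",
        f"{sink_prefix}.serializer.colNames = {','.join(columns)}",
    ]


columns = ["id", "sip", "dip", "sport", "dport", "protocol", "flowvalue", "createtime"]
lines = build_serializer_lines(columns)
print("\n".join(lines))
```

The generated pattern writes `"` without a backslash. A Java properties file reads `\"` and `"` as the same character, so both spellings give the same regex.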

First, clear table1:

truncate 'table1'

1. Modify the Flume configuration file

agent.channels = ch1
agent.sinks = hbase-sink
agent.sources = sql-source
agent.channels.ch1.type = memory
agent.sources.sql-source.channels = ch1
agent.sources.sql-source.type = org.keedio.flume.source.SQLSource

agent.sources.sql-source.hibernate.connection.url = jdbc:mysql://192.168.1.69:3306/t_hadoop
agent.sources.sql-source.hibernate.connection.user = root
agent.sources.sql-source.hibernate.connection.password = root
agent.sources.sql-source.table = t_name
agent.sources.sql-source.columns.to.select = *
agent.sources.sql-source.incremental.column.name = id
agent.sources.sql-source.incremental.value = 0
agent.sources.sql-source.run.query.delay=5000
agent.sources.sql-source.status.file.path = /home/lwenhao/flume
agent.sources.sql-source.status.file.name = sql-source.status

agent.sinks.hbase-sink.type = org.apache.flume.sink.hbase.HBaseSink
agent.sinks.hbase-sink.table = table1
agent.sinks.hbase-sink.columnFamily  = info
agent.sinks.hbase-sink.serializer = org.apache.flume.sink.hbase.RegexHbaseEventSerializer
agent.sinks.hbase-sink.serializer.regex = ^\"(.*?)\",\"(.*?)\",\"(.*?)\",\"(.*?)\",\"(.*?)\",\"(.*?)\",\"(.*?)\",\"(.*?)\"$
agent.sinks.hbase-sink.serializer.colNames = id,sip,dip,sport,dport,protocol,flowvalue,createtime
agent.sinks.hbase-sink.channel = ch1
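
The effect of the regex and colNames settings above can be checked offline before starting the agent. The sketch below is a Python model, not Flume code: `regex_serialize` and the sample line are hypothetical, and it assumes each event body is one line of eight double-quoted, comma-separated fields. Returning an empty result for a non-matching line is a choice made for this sketch.

```python
import re

COL_NAMES = "id,sip,dip,sport,dport,protocol,flowvalue,createtime".split(",")

# The regex from the configuration, as it reads once the properties file
# has turned each \" into a plain double quote.
PATTERN = re.compile(
    r'^"(.*?)","(.*?)","(.*?)","(.*?)","(.*?)","(.*?)","(.*?)","(.*?)"$'
)


def regex_serialize(line, column_family="info"):
    """Map each capture group to the column name at the same position."""
    match = PATTERN.fullmatch(line)
    if match is None or len(match.groups()) != len(COL_NAMES):
        return {}  # this sketch yields no cells when the line does not match
    return {
        f"{column_family}:{name}": value
        for name, value in zip(COL_NAMES, match.groups())
    }


# Made-up sample row for illustration.
sample_line = '"1","10.0.0.1","10.0.0.2","80","8080","tcp","1024","2019-03-05 10:00:00"'
for column, value in regex_serialize(sample_line).items():
    print(column, "=>", value)
```

If a real row from the source table yields no cells here, the regex or the column list most likely does not match the event format and should be adjusted first.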

2. Start Flume

 bin/flume-ng agent --conf conf/ --name agent --conf-file conf/flume-hbase.conf

3. Result

Reposted from: https://my.oschina.net/lwenhao/blog/3018565
