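Common Flume Sources, Channels, and Sinks
Flume Core Concepts
Event
- An Event is the smallest unit of data that flows through a Flume agent. An Event (represented by the `Event` interface) travels from a source to a channel and then to a sink.
- An Event carries a payload (a byte array) and an optional set of headers (string attributes).
- A Flume agent is a JVM process that hosts the components through which Events flow from an external origin to an external destination.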
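Source
- A source receives data from external clients and stores it in the channels it is configured with.
- It passes the received data, in Flume's event format, to one or more channels.
- Flume offers several ways to receive data, such as Avro and Thrift.
- A source must be associated with at least one channel. Sources fall into several types:
  - Sources that integrate with external systems: Syslog, Netcat, Spooling Directory
  - Sources that generate events automatically: Exec
  - IPC sources for agent-to-agent communication: Avro, Thrift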
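As a rough sketch of the channel association described above, the fragment below attaches one Syslog TCP source to two channels. The agent name `a1`, the component names `r1`, `c1`, `c2`, and the port number are placeholders chosen for illustration, and sinks are left out, so this is a fragment rather than a complete agent.

```properties
# Hypothetical agent "a1": one source feeding two channels
a1.sources = r1
a1.channels = c1 c2

# Syslog TCP source: a source that integrates with an external system
a1.sources.r1.type = syslogtcp
a1.sources.r1.host = localhost
a1.sources.r1.port = 5140

# A source lists one or more channels, separated by spaces
a1.sources.r1.channels = c1 c2

a1.channels.c1.type = memory
a1.channels.c2.type = memory
```

With the default replicating channel selector, each event is copied to every listed channel.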
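Channel
- A channel is a transient store: it buffers the events received from a source until they are consumed by sinks, acting as the bridge between source and sink.
- Channels are fully transactional, which keeps data consistent as it is put and taken. A channel can be connected to any number of sources and sinks.
- Supported types include:
  - Memory channel: volatile; buffered events are lost if the agent process stops
  - File channel: built on a WAL (write-ahead log)
  - JDBC channel: backed by an embedded database
- When several sinks read from the same channel, each event is delivered only once: if sink1 takes an event, sink2 does not, and if sink2 takes it, sink1 does not. Together, the output of sink1 and sink2 equals the data in the channel.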
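The difference between the memory and file channel types shows up mainly in configuration. A minimal sketch, assuming an agent named `a1`; the capacities and directory paths are illustrative values, not recommendations:

```properties
a1.channels = c1 c2

# Memory channel: fast, but events held in memory are lost if the agent stops
a1.channels.c1.type = memory
a1.channels.c1.capacity = 10000
a1.channels.c1.transactionCapacity = 1000

# File channel: events are persisted to disk through a write-ahead log
# (the two paths below are hypothetical)
a1.channels.c2.type = file
a1.channels.c2.checkpointDir = /var/flume/checkpoint
a1.channels.c2.dataDirs = /var/flume/data
```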
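Sink
- A sink delivers data to a centralized store such as HBase or HDFS.
- It consumes events from a channel and passes them on to a destination. The destination may be the source of another agent, or a final store such as HDFS or HBase.
- Sinks fall into several types:
  - Terminal sinks that store events at their final destination: HDFS, HBase
  - Sinks that simply consume events: the null sink, which discards them
  - IPC sinks for agent-to-agent communication: Avro
- A sink must be attached to exactly one channel.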
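Putting the three components together, a complete single-agent configuration might look like the sketch below. The agent and component names (`a1`, `r1`, `c1`, `k1`) and the port are placeholders; the Netcat source and logger sink are used only because they need no external system.

```properties
# Name the components of the hypothetical agent "a1"
a1.sources = r1
a1.channels = c1
a1.sinks = k1

# Source: each line of text received on the port becomes one event
a1.sources.r1.type = netcat
a1.sources.r1.bind = localhost
a1.sources.r1.port = 44444
# A source may write to several channels, hence the plural "channels"
a1.sources.r1.channels = c1

# Channel: buffers events between the source and the sink
a1.channels.c1.type = memory

# Sink: logs each event it takes from the channel
a1.sinks.k1.type = logger
# A sink reads from exactly one channel, hence the singular "channel"
a1.sinks.k1.channel = c1
```

Assuming the file is saved as `example.conf`, an agent can be started with `bin/flume-ng agent --conf conf --conf-file example.conf --name a1`.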
Flume Sources
Avro Source
Listens on Avro port and receives events from external Avro client streams. When paired with the built-in Avro Sink on another (previous hop) Flume agent, it can create tiered collection topologies. Required properties are in bold.
In short, it listens on a specified host and port and receives events from external Avro client streams.
Property Name | Default | Description |
---|---|---|
**channels** | – | |
**type** | – | The component type name, needs to be `avro` |
**bind** | – | Hostname or IP address to listen on |
**port** | – | Port # to bind to |
threads | – | Maximum number of worker threads to spawn |
selector.type | | |
selector.* | | |
interceptors | – | Space-separated list of interceptors |
interceptors.* | | |
compression-type | none | This can be “none” or “deflate”. The compression-type must match the compression-type of the sending Avro Sink or client |
ssl | false | Set this to true to enable SSL encryption. You must also specify a “keystore” and a “keystore-password”. |
keystore | – | This is the path to a Java keystore file. Required for SSL. |
keystore-password | – | The password for the Java keystore. Required for SSL. |
keystore-type | JKS | The type of the Java keystore. This can be “JKS” or “PKCS12”. |
exclude-protocols | SSLv3 | Space-separated list of SSL/TLS protocols to exclude. SSLv3 will always be excluded in addition to the protocols specified. |
ipFilter | false | Set this to true to enable ipFiltering for netty |
ipFilterRules | – | Define N netty ipFilter pattern rules with this config. |
Example:
a1.sources = r1
a1.channels = c1
a1.sources.r1.type = avro
a1.sources.r1.channels = c1
a1.sources.r1.bind = 0.0.0.0
a1.sources.r1.port = 4141
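To illustrate the tiered topology mentioned above, the fragment below sketches the sending side: an Avro sink on an upstream agent that forwards events to the Avro source listening on port 4141. The agent name `a2`, the component names, and the hostname `collector.example.com` are assumptions made for this example; the upstream agent's own source is omitted.

```properties
# Hypothetical upstream agent "a2" (its source is not shown)
a2.channels = c1
a2.sinks = k1

a2.channels.c1.type = memory

# Avro sink: sends events to the Avro source of the next-hop agent
a2.sinks.k1.type = avro
a2.sinks.k1.channel = c1
# Host and port must point at the machine and port the Avro source binds to
a2.sinks.k1.hostname = collector.example.com
a2.sinks.k1.port = 4141
```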
Exec Source
Exec source runs a given Unix command on start-up and expects that process to continuously produce data on standard out (stderr is simply discarded, unless property logStdErr is set to true). If the process exits for any reason, the source also exits and will produce no further data. This means configurations such as `cat [named pipe]` or `tail -F [file]` are going to produce the desired results whereas `date` will probably not: the former two commands produce streams of data, whereas the latter produces a single event and exits.
In short, it runs a Linux command and turns the command's output into events.
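A minimal sketch of an Exec source that follows a log file with `tail -F`, as described above. The agent and component names and the log path `/var/log/app.log` are hypothetical; the channel definition and sink are omitted.

```properties
a1.sources = r1
a1.channels = c1

# Exec source: runs the command once at start-up and reads its standard output
a1.sources.r1.type = exec
# tail -F keeps running and follows the file, so the source keeps producing events
a1.sources.r1.command = tail -F /var/log/app.log
# Log the command's stderr instead of discarding it
a1.sources.r1.logStdErr = true
a1.sources.r1.channels = c1
```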
Property Name | Default | Description |
---|---|---|