Flume Log Collection

1.  Log4j Appender

1.1.  Usage

1.1.1.  Client-side Log4j configuration file

(The highlighted entries are the settings that need to be configured.)

log4j.rootLogger=INFO,A1,R

 

 

# ConsoleAppender out

log4j.appender.A1=org.apache.log4j.ConsoleAppender

log4j.appender.A1.layout=org.apache.log4j.PatternLayout

log4j.appender.A1.layout.ConversionPattern=%d{yyyy/MM/dd HH:mm:ss} %-5p %-10C{1} %m%n

# File out

# The file appender R is replaced with the Log4jAppender provided by Flume

log4j.appender.R=org.apache.flume.clients.log4jappender.Log4jAppender

log4j.appender.R.File=${catalina.home}/logs/ultraIDCPServer.log

# Port the log events are sent to; an Avro source must be listening on this port

log4j.appender.R.Port =44444

# Host/IP the log events are sent to; that host must be running an Avro source

log4j.appender.R.Hostname =localhost

log4j.appender.R.MaxFileSize=102400KB

# log4j.appender.R.MaxBackupIndex=5

log4j.appender.R.layout=org.apache.log4j.PatternLayout

log4j.appender.R.layout.ConversionPattern=%d{yyyy/MM/dd HH\:mm\:ss} %-5p %-10C{1} %m%n

log4j.appender.R.encoding=UTF-8

 

log4j.logger.com.ultrapower.ultracollector.webservice.MessageIntercommunionInterfaceImpl=INFO,webservice

log4j.appender.webservice=org.apache.log4j.FileAppender

log4j.appender.webservice.File=${catalina.home}/logs/logsMsgIntercommunionInterface.log

log4j.appender.webservice.layout=org.apache.log4j.PatternLayout

log4j.appender.webservice.layout.ConversionPattern=%d{yyyy/MM/dd HH\:mm\:ss} %-5p [%t] %l %X - %m%n

log4j.appender.webservice.encoding=UTF-8

Note: Log4jAppender extends AppenderSkeleton and has no support for rolling over to a new file once the log reaches a given size, so the File and MaxFileSize settings above have no effect.
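A minimal appender definition that keeps only the settings the Log4jAppender actually understands would therefore look like the following sketch (the appender name and values are illustrative):

log4j.appender.flume=org.apache.flume.clients.log4jappender.Log4jAppender
log4j.appender.flume.Hostname=localhost
log4j.appender.flume.Port=44444
log4j.appender.flume.layout=org.apache.log4j.PatternLayout
log4j.appender.flume.layout.ConversionPattern=%d{yyyy/MM/dd HH:mm:ss} %-5p %-10C{1} %m%n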

1.1.2.  Flume agent configuration

agent1.sources = source1

agent1.sinks = sink1

agent1.channels = channel1

# Describe/configure source1

agent1.sources.source1.type = avro

agent1.sources.source1.bind = 192.168.0.141

agent1.sources.source1.port = 44444

 

# Describe sink1

agent1.sinks.sink1.type = FILE_ROLL

agent1.sinks.sink1.sink.directory = /home/yubojie/flume/apache-flume-1.2.0/flume-out

 

# Use a channel which buffers events in memory

agent1.channels.channel1.type = memory

agent1.channels.channel1.capacity = 1000

agent1.channels.channel1.transactionCapacity = 100

 

# Bind the source and sink to the channel

agent1.sources.source1.channels = channel1

agent1.sinks.sink1.channel = channel1

Note: the FILE_ROLL sink starts a new output file at a fixed time interval; each file contains the events the agent received during that interval.
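If a different interval is wanted, the FILE_ROLL sink exposes the rolling period directly; a sketch (60 seconds here is only an example, the default is 30):

agent1.sinks.sink1.sink.rollInterval = 60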

1.2.  Analysis

1. Simple to use; the integration effort is small.

2. The application must use Log4j as its logging library, and the Log4j jar used in the project must be version 1.2.15 or later.

3. The application must add the jars required by Flume to the project; jar conflicts are possible and may affect the running application.

4. Reliable data transfer can be provided. With the Flume Log4jAppender, no process needs to be started on the client machine: changing the log appender alone sends log events directly to the collector (see Figure 1). In that setup the data is reliable once the collector has received it, but events are lost whenever the connection between client and collector fails. The improvement is to run an agent on the client machine, so that while the collector is unreachable the events are held and are collected once the connection is restored, and no data is lost (see Figure 2); this reliability, however, requires running a process on the client.

               

1.3.  Logging code

Log.info("this message has DEBUG in it");
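For reference, a minimal class that produces the line above might look like the sketch below; the class name LogTest matches the sample data in section 2.3, while the logger wiring itself is illustrative:

import org.apache.log4j.Logger;

public class LogTest {

    // The root logger is configured with the Flume Log4jAppender above,
    // so this call ships the message to the Avro source on the collector.
    private static final Logger log = Logger.getLogger(LogTest.class);

    public static void main(String[] args) {
        log.info("this message has DEBUG in it");
    }
}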

1.4.  Sample of collected data

this message has DEBUG in it
this message has DEBUG in it

 

2.  Exec source (rejected)

The problem with ExecSource and other asynchronous sources is that the source can not guarantee that if there is a failure to put the event into the Channel the client knows about it. In such cases, the data will be lost. As a for instance, one of the most commonly requested features is the tail -F [file]-like use case where an application writes to a log file on disk and Flume tails the file, sending each line as an event. While this is possible, there's an obvious problem; what happens if the channel fills up and Flume can't send an event? Flume has no way of indicating to the application writing the log file that it needs to retain the log or that the event hasn't been sent, for some reason. If this doesn't make sense, you need only know this: Your application can never guarantee data has been received when using a unidirectional asynchronous interface such as ExecSource! As an extension of this warning - and to be completely clear - there is absolutely zero guarantee of event delivery when using this source. You have been warned.

Note: not even reliability inside the agent itself can be guaranteed.

2.1.  Usage

2.1.1.  Flume agent configuration

 
# The configuration file needs to define the sources,
# the channels and the sinks.
# Sources, channels and sinks are defined per agent,
# in this case called 'agent'
# example.conf: A single-node Flume configuration
 
# Name the components on this agent
 
agent1.sources = source1
agent1.sinks = sink1
agent1.channels = channel1
# Describe/configure source1
#agent1.sources.source1.type = avro
agent1.sources.source1.type = exec
agent1.sources.source1.command = tail -f /home/yubojie/logs/ultraIDCPServer.log
#agent1.sources.source1.bind = 192.168.0.146
#agent1.sources.source1.port = 44444
 
agent1.sources.source1.interceptors = a
agent1.sources.source1.interceptors.a.type = org.apache.flume.interceptor.HostInterceptor$Builder
agent1.sources.source1.interceptors.a.preserveExisting = false
agent1.sources.source1.interceptors.a.hostHeader = hostname
 
 
# Describe sink1
#agent1.sinks.sink1.type = FILE_ROLL
#agent1.sinks.sink1.sink.directory = /home/yubojie/flume/apache-flume-1.2.0/flume-out
agent1.sinks.sink1.type = hdfs
agent1.sinks.sink1.hdfs.path = hdfs://localhost:9000/user/
agent1.sinks.sink1.hdfs.fileType = DataStream
 
# Use a channel which buffers events in memory
agent1.channels.channel1.type = memory
agent1.channels.channel1.capacity = 1000
agent1.channels.channel1.transactionCapacity = 100
 
# Bind the source and sink to the channel
agent1.sources.source1.channels = channel1
agent1.sinks.sink1.channel = channel1

 

2.2.  Analysis

1. Collecting logs with tail requires the host to be able to run the tail command, which in practice means Linux only; Windows log collection is not supported.

2. The exec source collects asynchronously, so log events can be lost; completeness cannot be guaranteed even for data inside the node.

3. Put another way, tail-based collection depends on the host operating system providing the tail command, and a stock Windows system does not have it.

2.3.  Sample of collected data

2012/10/26 02:36:34 INFO  LogTest     this message has DEBUG 中文 in it
2012/10/26 02:40:12 INFO  LogTest     this message has DEBUG 中文 in it

2.4.  Logging code

Log.info("this message has DEBUG 中文 in it");

3.  Syslog

Passing messages using the syslog protocol doesn't work well for longer messages. The syslog appender for Log4j is hardcoded to line-wrap around 1024 characters in order to comply with the RFC. I got a sample program logging to syslog, picking it up with a syslogUdp source, with a JSON layout (to avoid new-lines in stack traces), only to find that anything but the smallest stack trace line-wrapped anyway. I can't see a way to reliably reconstruct the stack trace once it is wrapped and sent through the flume chain. (Note: it is not certain whether this applies to version 1.2.)

Syslog TCP requires an event size to be specified; the default is 2500.

Syslog UDP is unreliable transport, so data may be lost in transit.
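For longer messages the maximum event size of the syslog TCP source can be raised explicitly; a sketch using the eventSize property from the Flume user guide (the value 4096 is only an example):

agent1.sources.source1.eventSize = 4096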

3.1.  Usage

3.1.1.  Client-side sample code

import java.io.IOException;
import java.io.OutputStream;
import java.net.Socket;
import java.net.UnknownHostException;

public class SyslogTcp {

    public static void main(String[] args) {
        Socket client = null;
        OutputStream out = null;
        try {
            client = new Socket("127.0.0.1", 5140);
            out = client.getOutputStream();
            // A syslog message starts with a priority value in angle brackets.
            String event = "<4>hello\n";
            out.write(event.getBytes());
            out.flush();
            System.out.println("Message sent successfully.");
        } catch (UnknownHostException e) {
            e.printStackTrace();
        } catch (IOException e) {
            e.printStackTrace();
        } finally {
            try {
                if (out != null) {
                    out.close();
                }
            } catch (IOException e) {
                e.printStackTrace();
            }
            try {
                if (client != null) {
                    client.close();
                }
            } catch (IOException e) {
                e.printStackTrace();
            }
        }
    }
}

 

 

3.1.2.  Flume agent configuration for receiving logs

agent1.sources = source1

agent1.sinks = sink1

agent1.channels = channel1

 

# Describe/configure source1

agent1.sources.source1.type = syslogtcp

agent1.sources.source1.bind = 127.0.0.1

agent1.sources.source1.port = 5140

 

# Describe sink1

#agent1.sinks.sink1.type = avro

#agent1.sinks.sink1.channels = channel1

#agent1.sinks.sink1.hostname = 192.168.0.144

#agent1.sinks.sink1.port = 44444

 

agent1.sinks.sink1.type = FILE_ROLL

agent1.sinks.sink1.sink.directory = E:\\file-out

 

 

# Use a channel which buffers events in memory

agent1.channels.channel1.type = memory

agent1.channels.channel1.capacity = 1000

agent1.channels.channel1.transactionCapacity = 100

 

# Bind the source and sink to the channel

agent1.sources.source1.channels = channel1

agent1.sinks.sink1.channel = channel1

3.2.  Analysis

Client code must be written to incrementally collect log data and send it to the Flume agent over a socket; long messages are not handled well. Reliability can be ensured in the same way as with the Log4j appender approach.
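If the application already logs through Log4j, one alternative to hand-written socket code is the stock Log4j SyslogAppender mentioned in the quote above. Note that it sends over UDP, so a syslogudp source would be needed on the Flume side, and the unreliability and 1024-character line-wrap limitations described earlier still apply. A sketch (host and facility are illustrative):

log4j.appender.SYSLOG=org.apache.log4j.net.SyslogAppender
log4j.appender.SYSLOG.SyslogHost=127.0.0.1
log4j.appender.SYSLOG.Facility=LOCAL0
log4j.appender.SYSLOG.layout=org.apache.log4j.PatternLayout
log4j.appender.SYSLOG.layout.ConversionPattern=%-5p %c - %m%n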

 

 

4.  Log filtering interceptor (FLUME-1358)

Flume supports filtering events with a regular expression, but the 1.2.0 source code contains no concrete implementation. According to the notes on FLUME-1358, the RegexFilteringInterceptor class can be added to the code base and used.

The required changes are:

Add the class RegexFilteringInterceptor (full source is given at the end of this document).

Modify InterceptorType to add the mapping between the type name and the class:

REGEX_FILTER(org.apache.flume.interceptor.RegexFilteringInterceptor.Builder.class)

4.1.  Regex Filtering Interceptor description

This interceptor filters events selectively by interpreting the event body as text and matching the text against a configured regular expression. The supplied regular expression can be used to include events or exclude events.

Property Name | Default | Description
type | - | The component type name, has to be REGEX_FILTER
regex | ".*" | Regular expression for matching against events
excludeEvents | false | If true, regex determines events to exclude; otherwise regex determines events to include

4.2.  Usage (test configuration)

4.2.1.  Flume agent configuration for receiving logs

agent1.sources = source1

agent1.sinks = sink1

agent1.channels = channel1

 

# Describe/configure source1

agent1.sources.source1.type = avro

agent1.sources.source1.bind = localhost

agent1.sources.source1.port = 5140

 

 

agent1.sources.source1.interceptors = inter1

agent1.sources.source1.interceptors.inter1.type = REGEX_FILTER

agent1.sources.source1.interceptors.inter1.regex = .*DEBUG.*

agent1.sources.source1.interceptors.inter1.excludeEvents = false

 

 

 

 

# Describe sink1

#agent1.sinks.sink1.type = avro

#agent1.sinks.sink1.channels = channel1

#agent1.sinks.sink1.hostname = 192.168.0.144

#agent1.sinks.sink1.port = 44444

 

agent1.sinks.sink1.type = FILE_ROLL

agent1.sinks.sink1.sink.directory = E:\\file-out

 

 

# Use a channel which buffers events in memory

agent1.channels.channel1.type = memory

agent1.channels.channel1.capacity = 1000

agent1.channels.channel1.transactionCapacity = 100

 

# Bind the source and sink to the channel

agent1.sources.source1.channels = channel1

agent1.sinks.sink1.channel = channel1

5.  HDFS SINK

5.1.  Usage

Data written to HDFS is first created as a file with a .tmp suffix; when the file is closed, the .tmp suffix is removed. The rolling scheme is similar to the file sink's: a time interval, a file size, or a number of received events can be configured as the trigger for rolling to a new file. The default is to roll every 30 seconds.
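For example, all three rolling triggers can be set explicitly on the sink; the values below are illustrative (see the table in 5.2 for the defaults):

agent1.sinks.sink1.hdfs.rollInterval = 60
agent1.sinks.sink1.hdfs.rollSize = 10485760
agent1.sinks.sink1.hdfs.rollCount = 1000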

 

5.2.  Configuration options

Name | Default | Description
channel | - |
type | - | The component type name, needs to be hdfs
hdfs.path | - | HDFS directory path (e.g. hdfs://namenode/flume/webdata/)
hdfs.filePrefix | FlumeData | Name prefixed to files created by Flume in the hdfs directory
hdfs.rollInterval | 30 | Number of seconds to wait before rolling the current file (0 = never roll based on time interval)
hdfs.rollSize | 1024 | File size to trigger a roll, in bytes (0 = never roll based on file size)
hdfs.rollCount | 10 | Number of events written to the file before it is rolled (0 = never roll based on number of events)
hdfs.batchSize | 1 | Number of events written to the file before it is flushed to HDFS
hdfs.txnEventMax | 100 |
hdfs.codeC | - | Compression codec, one of: gzip, bzip2, lzo, snappy
hdfs.fileType | SequenceFile | File format: currently SequenceFile, DataStream or CompressedStream. (1) DataStream will not compress the output file; do not set codeC. (2) CompressedStream requires hdfs.codeC to be set to an available codec.
hdfs.maxOpenFiles | 5000 |
hdfs.writeFormat | - | "Text" or "Writable"
hdfs.appendTimeout | 1000 |
hdfs.callTimeout | 10000 |
hdfs.threadsPoolSize | 10 | Number of threads per HDFS sink for HDFS IO ops (open, write, etc.)
hdfs.rollTimerPoolSize | 1 | Number of threads per HDFS sink for scheduling timed file rolling
hdfs.kerberosPrincipal | - | Kerberos user principal for accessing secure HDFS
hdfs.kerberosKeytab | - | Kerberos keytab for accessing secure HDFS
hdfs.round | false | Should the timestamp be rounded down (if true, affects all time-based escape sequences except %t)
hdfs.roundValue | 1 | Rounded down to the highest multiple of this (in the unit configured using hdfs.roundUnit), less than current time
hdfs.roundUnit | second | The unit of the round-down value: second, minute or hour
serializer | TEXT | Other possible options include AVRO_EVENT or the fully qualified class name of an implementation of the EventSerializer.Builder interface
serializer.* | - |

 

 

5.3.  Sample agent configuration

 
# The configuration file needs to define the sources,
# the channels and the sinks.
# Sources, channels and sinks are defined per agent,
# in this case called 'agent'
# example.conf: A single-node Flume configuration
 
# Name the components on this agent
 
agent1.sources = source1
agent1.sinks = sink1
agent1.channels = channel1
# Describe/configure source1
#agent1.sources.source1.type = avro
agent1.sources.source1.type = exec
agent1.sources.source1.command = tail -f /home/yubojie/logs/ultraIDCPServer.log
#agent1.sources.source1.bind = 192.168.0.146
#agent1.sources.source1.port = 44444
 
agent1.sources.source1.interceptors = a
agent1.sources.source1.interceptors.a.type = org.apache.flume.interceptor.HostInterceptor$Builder
agent1.sources.source1.interceptors.a.preserveExisting = false
agent1.sources.source1.interceptors.a.hostHeader = hostname
 
 
# Describe sink1
#agent1.sinks.sink1.type = FILE_ROLL
#agent1.sinks.sink1.sink.directory = /home/yubojie/flume/apache-flume-1.2.0/flume-out
agent1.sinks.sink1.type = hdfs
agent1.sinks.sink1.hdfs.path = hdfs://192.168.98.20:9000/user/hadoop/yubojietest
agent1.sinks.sink1.hdfs.fileType = DataStream
 
# Use a channel which buffers events in memory
agent1.channels.channel1.type = memory
agent1.channels.channel1.capacity = 1000
agent1.channels.channel1.transactionCapacity = 100
 
# Bind the source and sink to the channel
agent1.sources.source1.channels = channel1

agent1.sinks.sink1.channel = channel1

 

6.  Collecting files to HDFS with multiple agents

6.1.  Preparation

1. Package the file-collection source class as a jar and place it in the flume/apache-flume-1.2.0/lib directory.

2. Create an empty fileSourceRecorder.properties file under flume/apache-flume-1.2.0/conf. (This will be changed so that the file is created automatically when it does not exist, after which this step will no longer be necessary; see the sketch below.)
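A minimal sketch of that planned change, assuming the custom FileSource keeps its recorder file in the Flume conf directory (class, method and path handling are illustrative):

import java.io.File;
import java.io.IOException;

public class RecorderFileBootstrap {

    // Creates the recorder file at startup if it does not already exist,
    // so the manual creation step above becomes unnecessary.
    public static void ensureRecorderFile(String confDir) throws IOException {
        File recorder = new File(confDir, "fileSourceRecorder.properties");
        if (!recorder.exists()) {
            recorder.getParentFile().mkdirs();
            recorder.createNewFile();
        }
    }
}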

 

6.2.  Agent configuration files

6.2.1.  agent1

# example.conf: A single-node Flume configuration     

                                                                                                                                                         

# Name the components on this agent                

                                                    

agent1.sources = source1                                                                              

agent1.sinks = sink1                                                                          

agent1.channels = channel1                        

                                                   

# Describe/configure source1                       

agent1.sources.source1.type = com.ultrapower.ultracollector.flume.source.file.FileSource                                                

 

agent1.sources.source1.path = /home/yubojie/logs/ultraIDCPServer.log

#gbk,utf-8 

agent1.sources.source1.encoding = utf-8

agent1.sources.source1.onceMaxReadByte = 999

agent1.sources.source1.cacheQueueSize = 10

agent1.sources.source1.noChangeSleepTime = 1000

agent1.sources.source1.batchCommitSize = 5

agent1.sources.source1.batchWaitTime = 500        

       

#agent1.sources.source1.type = avro

#agent1.sources.source1.bind = localhost

#agent1.sources.source1.port = 44444

                                                                                      

# Describe sink1                                                                                       

#agent1.sinks.sink1.type = logger          

#agent1.sinks.sink1.type = FILE_ROLL

#agent1.sinks.sink1.sink.directory = E:/file-out

#agent1.sinks.sink1.sink.fileName = a.log

agent1.sinks.sink1.type = hdfs

#agent1.sinks.sink1.hdfs.path = hdfs://192.168.98.20:9000/user/hadoop/yubojietest

agent1.sinks.sink1.hdfs.path = hdfs://192.168.0.153:9000/user/file

agent1.sinks.sink1.hdfs.callTimeout = 20000

agent1.sinks.sink1.hdfs.fileType = DataStream

#agent1.sinks.sink1.sink.rollInterval = 30        

               

                                          

# Use a channel which buffers events in memory        

agent1.channels.channel1.type = memory             

agent1.channels.channel1.capacity = 1000           

agent1.channels.channel1.transactionCapacity = 100

                                          

                                                  

# Bind the source and sink to the channel                                                              

agent1.sources.source1.channels = channel1

agent1.sinks.sink1.channel = channel1       

    

########################## test method ########################################

 

#########start flume agent ######### 

#agent -n agent1 -f .\conf\flume-conf.properties.template.file.signle

 

######### client send message #########

 

# $ bin/flume-ng avro-client -H localhost -p 44444 -F 'F:/1/log.log'

6.2.2.  agent2

# example.conf: A single-node Flume configuration     

                                                                                                                                                         

# Name the components on this agent                 

                                                   

agent2.sources = source1                                                                              

agent2.sinks = sink1                                                                          

agent2.channels = channel1                        

                                                   

# Describe/configure source1                       

agent2.sources.source1.type = com.ultrapower.ultracollector.flume.source.file.FileSource                                                

 

agent2.sources.source1.path = /home/yubojie/logtest/logs/ultraIDCPServer.log

#gbk,utf-8 

agent2.sources.source1.encoding = utf-8

agent2.sources.source1.onceMaxReadByte = 999

agent2.sources.source1.cacheQueueSize = 10

agent2.sources.source1.noChangeSleepTime = 1000

agent2.sources.source1.batchCommitSize = 5

agent2.sources.source1.batchWaitTime = 500        

       

#agent1.sources.source1.type = avro

#agent1.sources.source1.bind = localhost

#agent1.sources.source1.port = 44444

                                                                                     

# Describe sink1                                                                                        

#agent1.sinks.sink1.type = logger          

#agent1.sinks.sink1.type = FILE_ROLL

#agent1.sinks.sink1.sink.directory = E:/file-out

#agent1.sinks.sink1.sink.fileName = a.log

agent2.sinks.sink1.type = hdfs

#agent1.sinks.sink1.hdfs.path = hdfs://192.168.98.20:9000/user/hadoop/yubojietest

agent2.sinks.sink1.hdfs.path = hdfs://192.168.0.153:9000/user/file

agent2.sinks.sink1.hdfs.callTimeout = 20000

agent2.sinks.sink1.hdfs.fileType = DataStream

#agent1.sinks.sink1.sink.rollInterval = 30        

               

                                          

# Use a channel which buffers events in memory        

agent2.channels.channel1.type = memory             

agent2.channels.channel1.capacity = 1000           

agent2.channels.channel1.transactionCapacity = 100

                                          

                                                  

# Bind the source and sink to the channel                                                               

agent2.sources.source1.channels = channel1

agent2.sinks.sink1.channel = channel1       

    

########################## test method ########################################

 

#########start flume agent ######### 

#agent -n agent1 -f .\conf\flume-conf.properties.template.file.signle

 

######### client send message #########

 

# $ bin/flume-ng avro-client -H localhost -p 44444 -F 'F:/1/log.log'

6.3.  Startup commands

flume-ng agent -name agent1 -c conf -f ../conf/flume-conf.properties  

# agent1 monitors /home/yubojie/logs/ultraIDCPServer.log

flume-ng agent -name agent2 -c conf -f ../conf/flume-conf2.properties

# agent2 monitors /home/yubojie/logtest/logs/ultraIDCPServer.log

6.4.  Test results

1. agent1 and agent2 each monitor their own file and do not interfere with each other.

2. Each agent writes its output to HDFS as its own set of files.

7.  References

Log collection:
https://issues.cloudera.org//browse/FLUME-27
http://archive.cloudera.com/cdh/3/flume-ng-1.2.0-cdh3u5/FlumeUserGuide.html#exec-source
http://www.quora.com/Flume/What-Flume-sources-do-people-use-in-production
http://blog.youkuaiyun.com/rzhzhz/article/details/7610252

Filtering:
https://issues.apache.org/jira/secure/attachment/12537520/FLUME-1358.patch.v4.txt
https://issues.apache.org/jira/browse/FLUME-1358

 

RegexFilteringInterceptor source code

 

 

package org.apache.flume.interceptor;

import static org.apache.flume.interceptor.RegexFilteringInterceptor.Constants.DEFAULT_EXCLUDE_EVENTS;
import static org.apache.flume.interceptor.RegexFilteringInterceptor.Constants.DEFAULT_REGEX;
import static org.apache.flume.interceptor.RegexFilteringInterceptor.Constants.EXCLUDE_EVENTS;
import static org.apache.flume.interceptor.RegexFilteringInterceptor.Constants.REGEX;

import java.util.List;
import java.util.regex.Pattern;

import org.apache.flume.Context;
import org.apache.flume.Event;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

import com.google.common.collect.Lists;

public class RegexFilteringInterceptor implements Interceptor {

  private static final Logger logger = LoggerFactory
      .getLogger(RegexFilteringInterceptor.class);

  private final Pattern regex;
  private final boolean excludeEvents;

  /**
   * Only {@link RegexFilteringInterceptor.Builder} can build me.
   */
  private RegexFilteringInterceptor(Pattern regex, boolean excludeEvents) {
    this.regex = regex;
    this.excludeEvents = excludeEvents;
  }

  @Override
  public void initialize() {
    // no-op
  }

  /**
   * Returns the event if it passes the regular expression filter and null
   * otherwise.
   */
  @Override
  public Event intercept(Event event) {
    // We've already ensured here that at most one of includeRegex and
    // excludeRegex are defined.

    if (!excludeEvents) {
      if (regex.matcher(new String(event.getBody())).find()) {
        return event;
      } else {
        return null;
      }
    } else {
      if (regex.matcher(new String(event.getBody())).find()) {
        return null;
      } else {
        return event;
      }
    }
  }

  /**
   * Returns the set of events which pass filters, according to
   * {@link #intercept(Event)}.
   * @param events
   * @return
   */
  @Override
  public List<Event> intercept(List<Event> events) {
    List<Event> out = Lists.newArrayList();
    for (Event event : events) {
      Event outEvent = intercept(event);
      if (outEvent != null) { out.add(outEvent); }
    }
    return out;
  }

  @Override
  public void close() {
    // no-op
  }

  /**
   * Builder which builds new instance of the StaticInterceptor.
   */
  public static class Builder implements Interceptor.Builder {

    private Pattern regex;
    private boolean excludeEvents;

    @Override
    public void configure(Context context) {
      String regexString = context.getString(REGEX, DEFAULT_REGEX);
      regex = Pattern.compile(regexString);
      excludeEvents = context.getBoolean(EXCLUDE_EVENTS,
          DEFAULT_EXCLUDE_EVENTS);
    }

    @Override
    public Interceptor build() {
      logger.info(String.format(
          "Creating RegexFilteringInterceptor: regex=%s, excludeEvents=%s",
          regex, excludeEvents));
      return new RegexFilteringInterceptor(regex, excludeEvents);
    }
  }

  public static class Constants {
    public static final String REGEX = "regex";
    public static final String DEFAULT_REGEX = ".*";
    public static final String EXCLUDE_EVENTS = "excludeEvents";
    public static final boolean DEFAULT_EXCLUDE_EVENTS = false;
  }
}

InterceptorType source code

(The REGEX_FILTER entry below is the added line.)

package org.apache.flume.interceptor;

 

public enum InterceptorType {

 

  TIMESTAMP(org.apache.flume.interceptor.TimestampInterceptor.Builder.class),

  HOST(org.apache.flume.interceptor.HostInterceptor.Builder.class),

  REGEX_FILTER(org.apache.flume.interceptor.RegexFilteringInterceptor.Builder.class),

  ;

 

  private final Class<? extends Interceptor.Builder> builderClass;

 

  private InterceptorType(Class<? extends Interceptor.Builder> builderClass) {

    this.builderClass = builderClass;

  }

 

  public Class<? extends Interceptor.Builder> getBuilderClass() {

    return builderClass;

  }

 

}
