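In-Depth Analysis of the Logstash Source Code: From Input to Output - 优快云 Blog

Article link: https://blog.youkuaiyun.com/hail100/article/details/40373365

This article walks through the Logstash source code in detail. It starts with the bin/logstash launch script, follows the code into the running agent, and then looks at the input, filter, and output modules, to show the complete Logstash data-processing flow.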


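Logstash Source Code Analysis

I. Introduction

Logstash is a platform for shipping, processing, managing, and searching application logs and events. It lets you collect and manage application logs in one place, and it provides a web interface for querying them and computing statistics.

Logstash supports many input and output types for log processing, including file, redis, pipe, kafka, and message queues (mq).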

Source code referenced in this article: https://github.com/elasticsearch/logstash

Open-source project page (OSChina): http://www.oschina.net/p/logstash

Official website: http://www.logstash.net/

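II. Basic Implementation

Logstash is implemented in JRuby. It runs the input, filter, and output stages in their own threads to process data.

It first registers the input, filter, and output plugins named in the configuration, and then starts worker threads for each stage of the pipeline.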
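The register-then-start order described above can be sketched in Ruby. This is a toy model, not Logstash code: the `Registrable` module, the `REGISTRY` hash, the plugin classes, and `build_plugins` are all hypothetical, and the configuration is a plain hash instead of the Logstash config language. It only shows that every configured plugin is looked up and registered before any worker thread would be started.

```ruby
# Hypothetical sketch: plugins are looked up by section and name, instantiated,
# and registered before the pipeline starts any threads.
module Registrable
  def register
    @registered = true
  end

  def registered?
    @registered == true
  end
end

# Toy plugin classes (not the real Logstash plugins).
class StdinInput; include Registrable; end
class UppercaseFilter; include Registrable; end
class StdoutOutput; include Registrable; end

# Registry keyed by [section, plugin name]; real Logstash resolves names differently.
REGISTRY = {
  ["input", "stdin"] => StdinInput,
  ["filter", "uppercase"] => UppercaseFilter,
  ["output", "stdout"] => StdoutOutput
}.freeze

# Build and register every plugin listed in the config, grouped by section.
def build_plugins(config)
  config.each_with_object({}) do |(section, names), plugins|
    plugins[section] = names.map do |name|
      klass = REGISTRY.fetch([section, name]) do
        raise ArgumentError, "unknown #{section} plugin: #{name}"
      end
      plugin = klass.new
      plugin.register # registration happens before any thread is started
      plugin
    end
  end
end

config = { "input" => ["stdin"], "filter" => ["uppercase"], "output" => ["stdout"] }
plugins = build_plugins(config)
```

Failing fast on an unknown plugin name mirrors why the agent checks the configuration before it starts the pipeline: an error surfaces before any data is read.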

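III. Module Source Code Analysis

The bin/logstash script installs Logstash's dependencies, launches its components, and so on. To launch a component, it calls lib/logstash/runner.rb: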

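case $1 in
  deps) install_deps ;;
  env) env "$@" ;;
  -*)
    # First argument is an option (e.g. -f): run "agent" with all the arguments
    # Empty VENDORED_JRUBY: use the system Ruby; otherwise run the bundled JRuby jar via java
    if [ -z "$VENDORED_JRUBY" ] ; then
      exec "${RUBYCMD}" "${basedir}/lib/logstash/runner.rb" "agent" "$@"
    else
      exec "${JAVACMD}" $JAVA_OPTS "-jar" "$JRUBY_JAR" "${basedir}/lib/logstash/runner.rb" "agent" "$@"
    fi
    ;;
  *)
    # Otherwise pass the subcommand (agent, version, web, rspec, ...) straight through
    if [ -z "$VENDORED_JRUBY" ] ; then
      exec "${RUBYCMD}" "${basedir}/lib/logstash/runner.rb" "$@"
    else
      exec "${JAVACMD}" $JAVA_OPTS "-jar" "$JRUBY_JAR" "${basedir}/lib/logstash/runner.rb" "$@"
    fi
    ;;
esac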
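The dispatch rule in this script is easy to miss. When the first argument starts with `-` (for example `-f`), the script inserts `agent` in front of the arguments, so `bin/logstash -f x.conf` behaves like `bin/logstash agent -f x.conf`. Below is a hypothetical Ruby restatement of the `case` statement; the `dispatch` method and its return values are illustrative and not part of Logstash.

```ruby
# Hypothetical restatement of the bin/logstash case statement.
# Returns [action, arguments passed on].
def dispatch(argv)
  first = argv.first.to_s
  case first
  when "deps" then [:install_deps, []]
  when "env"  then [:env, argv]                 # mirrors `env "$@"`
  when /\A-/  then [:runner, ["agent"] + argv]  # option first: implicit "agent"
  else             [:runner, argv]              # explicit subcommand (or nothing)
  end
end
```

Whether `VENDORED_JRUBY` is set only decides which interpreter runs `runner.rb` (the system Ruby, or `java -jar` with the bundled JRuby jar). The argument list is the same in both branches, so the sketch leaves the interpreter out.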

lib/logstash/runner.rb accepts the following subcommands:

agent - runs the logstash agent

version - emits version info about this logstash

web - runs the logstash web ui (called Kibana)

rspec - runs tests
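runner.rb picks the subcommand by name. The agent code below is one entry of a hash that maps a command name to a lambda. Here is a minimal sketch of that table-driven dispatch. It assumes placeholder lambdas that simply return the command name and its arguments; `COMMANDS` and `run_command` are hypothetical names.

```ruby
# Hypothetical dispatch table: subcommand name -> lambda taking the remaining args.
# The bodies are placeholders; the real runner starts the agent, prints the version, etc.
COMMANDS = {
  "agent"   => lambda { |args| [:agent, args] },
  "version" => lambda { |args| [:version, args] },
  "web"     => lambda { |args| [:web, args] },
  "rspec"   => lambda { |args| [:rspec, args] }
}.freeze

def run_command(argv)
  name, *rest = argv
  command = COMMANDS[name]
  return [:unknown, name] if command.nil?
  command.call(rest)
end
```

Adding a subcommand then means adding one hash entry, and unknown names can be handled in one place.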

The code that starts the agent is as follows:

"agent" => lambda do
  require "logstash/agent"
  # Hack up a runner
  agent = LogStash::Agent.new($0)          # create an Agent instance
  begin
    agent.parse(args)                      # parse the command-line arguments
  rescue Clamp::HelpWanted => e            # help was requested: print usage and stop
    show_help(e.command)
    return 0
  rescue Clamp::UsageError => e            # invalid usage, e.g. too many arguments
    # If 'too many arguments' then give the arguments to
    # the next command. Otherwise it's a real error.
    raise if e.message != "too many arguments"
    remaining = agent.remaining_arguments
  end
  return agent.execute                     # run the agent (starts collecting events)
end
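The `rescue` clauses above treat only one usage error as recoverable: `too many arguments`, whose leftover arguments are kept for the next command. Any other usage error is re-raised. The sketch below reproduces that control flow with stand-in exception classes, because the real `Clamp::HelpWanted` and `Clamp::UsageError` come from the clamp gem. `ToyAgent` and `start_agent` are hypothetical, and the toy agent accepts at most one positional argument.

```ruby
# Stand-ins for Clamp::HelpWanted / Clamp::UsageError (the real ones live in the clamp gem).
class HelpWanted < StandardError; end
class UsageError < StandardError; end

# Hypothetical toy agent that accepts at most one positional argument.
class ToyAgent
  attr_reader :remaining_arguments

  def parse(args)
    raise HelpWanted, "help" if args.include?("--help")
    raise UsageError, "unknown option" if args.any? { |a| a.start_with?("--bogus") }
    @remaining_arguments = args.drop(1)
    raise UsageError, "too many arguments" unless @remaining_arguments.empty?
  end

  def execute
    0 # exit code
  end
end

# Mirrors the rescue structure of the "agent" lambda; returns [result, leftovers].
def start_agent(args)
  agent = ToyAgent.new
  remaining = []
  begin
    agent.parse(args)
  rescue HelpWanted
    return [:help, 0]
  rescue UsageError => e
    # Only "too many arguments" is tolerated; anything else is a real error.
    raise if e.message != "too many arguments"
    remaining = agent.remaining_arguments
  end
  [agent.execute, remaining]
end
```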

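lib/logstash/agent.rb parses the agent's options and starts the agent's data flow.

agent.execute starts the agent in the following steps:

1. Validate the configuration and fill in default settings.

2. Create a Pipeline object from the configuration, install signal handlers (SIGINT shuts the pipeline down, SIGHUP re-runs the logging configuration), and then start the pipeline. If only a configuration test was requested, it reports "Configuration OK" and returns without running the pipeline.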

begin
  pipeline = LogStash::Pipeline.new(@config_string)
rescue LoadError => e
  fail("Configuration problem.")
end
# Make SIGINT shutdown the pipeline.
trap_id = Stud::trap("INT") do
  @logger.warn(I18n.t("logstash.agent.interrupted"))
  pipeline.shutdown
end
Stud::trap("HUP") do
  @logger.info(I18n.t("logstash.agent.sighup"))
  configure_logging(log_file)
end
pipeline.configure("filter-workers", filter_workers)
# Stop now if we are only asking for a config test.
if config_test?
  report "Configuration OK"
  return
end

@logger.unsubscribe(stdout_logs) if show_startup_errors

# TODO(sissel): Get pipeline completion status.
pipeline.run
return 0
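Step 2 depends on a signal handler: SIGINT calls `pipeline.shutdown`, and `pipeline.run` returns once the pipeline has stopped. The agent uses `Stud::trap` from the stud gem. The sketch below uses Ruby's built-in `Signal.trap` instead, with a hypothetical `ToyPipeline` whose `run` method polls a flag rather than waiting on real plugin threads. It also mirrors the early return for a configuration test. `ToyPipeline` and `run_agent` are illustrative names.

```ruby
# Hypothetical pipeline: run blocks until shutdown is requested.
class ToyPipeline
  def initialize
    @stopping = false
  end

  def shutdown
    @stopping = true # only sets a flag, which is safe inside a trap handler
  end

  def stopping?
    @stopping
  end

  # Polls the flag; a real pipeline waits on its input/filter/output threads.
  def run
    sleep 0.01 until @stopping
    0
  end
end

# Mirrors the agent's order: install the INT handler, honor a config test, then run.
def run_agent(pipeline, config_test: false)
  previous = Signal.trap("INT") { pipeline.shutdown }
  return "Configuration OK" if config_test
  pipeline.run
ensure
  Signal.trap("INT", previous) if previous # restore the previous handler
end
```

The handler only sets a flag. Ruby does not allow acquiring a Mutex inside a trap handler, so handing the signal to normal code through a flag is a common pattern.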

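lib/logstash/pipeline.rb registers the input, filter, and output plugins and creates a thread for each. Its run method starts the inputs, the filters (only if any are configured), and the outputs, and then blocks in wait_inputs until the inputs finish. After that it shuts the stages down in pipeline order: first the filters, flushing any events they still hold to the outputs, and then the outputs.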

def run
  @started = true
  @input_threads = []

  start_inputs
  start_filters if filters?
  start_outputs

  @ready = true
  @logger.info("Pipeline started")
  wait_inputs
  if filters?
    shutdown_filters
    wait_filters
    flush_filters_to_output!(:final => true)
  end

  shutdown_outputs
  wait_outputs
  @logger.info("Pipeline shutdown complete.")
  # exit code
  return 0
end # def run
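The start and shutdown order in `run` can be modeled with one thread per stage and bounded queues between the stages. In this sketch, the queue size of 20, the `SHUTDOWN` sentinel object, and the uppercase "filter" are assumptions chosen for illustration, and the `flush_filters_to_output!` step is not modeled. The point is the ordering: wait for the inputs, then stop the filters, then stop the outputs.

```ruby
# Hypothetical model: input thread -> SizedQueue -> filter thread -> SizedQueue -> output thread.
SHUTDOWN = Object.new.freeze # sentinel that tells a stage to stop

def run_pipeline(lines)
  input_to_filter = SizedQueue.new(20)
  filter_to_output = SizedQueue.new(20)
  received = []

  inputs = Thread.new do
    lines.each { |line| input_to_filter << { "message" => line } }
  end

  filters = Thread.new do
    loop do
      event = input_to_filter.pop
      break if event.equal?(SHUTDOWN)
      event["message"] = event["message"].upcase # toy filter
      filter_to_output << event
    end
  end

  outputs = Thread.new do
    loop do
      event = filter_to_output.pop
      break if event.equal?(SHUTDOWN)
      received << event["message"] # toy output
    end
  end

  inputs.join                   # like wait_inputs
  input_to_filter << SHUTDOWN   # like shutdown_filters
  filters.join                  # like wait_filters
  filter_to_output << SHUTDOWN  # like shutdown_outputs
  outputs.join                  # like wait_outputs
  received
end
```

Each stage stops only when it sees the sentinel, and the sentinel is queued behind every real event, so all events reach the output before `run_pipeline` returns. The bounded queues also apply back-pressure: a fast input blocks once the filter falls 20 events behind.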

The following directories hold the input, filter, and output plugins. A new plugin of each kind is added to the corresponding directory:

lib/logstash/inputs

lib/logstash/filters

lib/logstash/outputs
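A new plugin is a Ruby file in one of these directories. The sketch below shows the path convention and the shape of a filter with `register` and `filter(event)` methods. `PLUGIN_DIRS`, `plugin_path`, `ToyFilterBase`, and `TagErrorFilter` are hypothetical. A real filter would subclass `LogStash::Filters::Base` and declare its name with `config_name`, and it would receive `LogStash::Event` objects rather than plain hashes.

```ruby
# Hypothetical mapping from plugin type to the directories listed above.
PLUGIN_DIRS = { "input" => "inputs", "filter" => "filters", "output" => "outputs" }.freeze

def plugin_path(type, name)
  dir = PLUGIN_DIRS.fetch(type) { raise ArgumentError, "unknown plugin type: #{type}" }
  "lib/logstash/#{dir}/#{name}.rb"
end

# Stand-in base class; real filters inherit from LogStash::Filters::Base.
class ToyFilterBase
  def register; end

  def filter(event)
    raise NotImplementedError
  end
end

# Hypothetical filter that would live at lib/logstash/filters/tag_error.rb:
# it adds an "error" tag to events whose message contains "ERROR".
class TagErrorFilter < ToyFilterBase
  def register
    @tag = "error"
  end

  def filter(event)
    (event["tags"] ||= []) << @tag if event["message"].to_s.include?("ERROR")
    event
  end
end
```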