日志收集Day006

原创已于 2025-01-25 16:26:10 修改 · 261 阅读

0 ·

CC 4.0 BY-SA版权

文章标签：

#运维

于 2025-01-25 01:51:44 首次发布

1.logstash的多分支案例:

input { 
  beats {
    port => 8888
    type => "beats"
  }

  tcp {
    port => 9999
    type => "tcp"
  }

  http {
    type => "http"
  }
} 


filter {
  if [type] == "beats" {
      grok {
         match => { "message" => "%{HTTPD_COMBINEDLOG}" }
         remove_field => [ "agent","log","input","host","ecs","tags" ]
      }

      geoip {
         source => "clientip"
         add_field => {"custom-type" => "thiis-beats"}
      }
      
      date {
          match => [ "timestamp", "dd/MMM/yyyy:HH:mm:ss Z" ]
      }
  } else if [type] == "tcp" {
     grok {
         # 指定加载pattern匹配模式的目录，可以是相对路径，也可以是绝对路径
         patterns_dir => ["/root/config/06-patterns"]
         # 基于指定字段进行匹配
         match => { "message" => "%{JIYU:jiyu}%{YEAR:year}"}
         add_field => {"custom-type" => "this-tcp"}
    }
  }else {
    mutate {
       add_field => { 
           "custom-type" => "thisis-http"
       } 
    }
  }

}

output { 
 stdout {} 
  
 #  elasticsearch {
 #    hosts => ["http://localhost:9200"]
 #    index => "AA"
 #  }
}

2.output的多分支案例

将上个例子output如下修改，即可实现多分支输出

output {
  if [type] == "beats" {
    elasticsearch {
      hosts => ["http://localhost:9200"]
      index => "aa"
    }
  } else if [type] == "tcp" {
    elasticsearch {
      hosts => ["http://localhost:9200"]
      index => "bb"
    }
  } else {
    elasticsearch {
      hosts => ["http://localhost:9200"]
      index => "cc"
    }
  }
}

3.filebeat和logstash多实例

运行多个实例时，需要指定--path.data=一个不存在的文件,但是这种方式会消耗更多资源，最好的办法是使用多个pipeline。

4.pipeline

1.修改pipline的配置文件：vim /app/softwares/logstash-7.17.5/config/pipelines.yml

增加如下配置：

- pipeline.id: test-pipeline-beats
  path.config: "/root/config07-beats-pipeline-es.conf"
- pipeline.id: test-pipeline-tcp
  path.config: "/root/config07-tcp-pipeline-es.conf"
- pipeline.id: test-pipeline-http
  path.config: "/root/config07-http-pipeline-es.conf"

2.将原先多分支案例拆分成多个conf文件

3.启动logstash实例 logstash

5.logstash的useragent过滤器

logstash配置

useragent模块要求数据源必须为json格式

input { 
  beats {
    port => 8888
  }
} 


filter {
   mutate {
      remove_field => [ "agent","log","input","host","ecs","tags" ]
   }

  geoip {
     source => "clientip"
  }

  date {
      match => [ "timestamp", "dd/MMM/yyyy:HH:mm:ss Z" ]
      timezone => "Asia/Shanghai"
  }

  # 用于分析客户端设备类型的插件
  useragent {
    # 指定基于哪个字段分析设备
    source => "http_user_agent"
    # 指定将解析的数据放在哪个字段，若不指定，则默认放在顶级字段中
    target => "lxc-log"
  }

}

output { 
# stdout {} 
  
 elasticsearch {
   hosts => ["http://localhost:9200"]
   index => "lxc-logstash-nginx-useragent"
 }
}

2.filebeat配置

filebeat.inputs:
- type: log
  paths:
  - /var/log/nginx/access.log
  json:
    keys_under_root: true
    add_error_key: true
    overwrite_keys: true

# 将数据输出到logstash中
output.logstash:
   # 指定logstash的主机和端口
  hosts: ["10.0.0.101:8888"]

3.采集之后便可以在kibana创建仪表板(可以先添加到Visualize 库，再添加到仪表板)

4.源日志参考：

{"@timestamp":"2023-04-06T16:17:43+08:00","host":"10.0.0.103","clientip":"110.110.110.110","SendBytes":615,"responsetime":0.000,"upstreamtime":"-","upstreamhost":"-","http_host":"10.0.0.103","uri":"/index.html","domain":"10.0.0.103","xff":"-","referer":"-","tcp_xff":"-","http_user_agent":"curl/7.29.0","status":"200"}

6.mutate插件补充（filter的一个插件）

#第一个和第二个对message字段进行拆分，添加到顶级字段 
mutate {
      # 将message字段使用"|"进行切分
      split => { "message" => "|" }
   }
   mutate {
     add_field => {
        userid => "%{[message][1]}"
        verb => "%{[message][2]}"
        svip => "%{[message][3]}"
        price => "%{[message][4]}"
     }
   }
  #重命名字段
   mutate {
     rename => {
        "verb" => "action"
     }
   }
 #设置数据类型（详见官网）
   mutate {
     convert => {
       "userid" => "integer"
       "svip" => "boolean"
       "price" => "float"
     }
   }

源日志参考：

INFO 2025-01-25 16:24:39 [com.generate_log] - DAU|7014|浏览页面|1|28542.78 
INFO 2025-01-25 16:24:42 [com.generate_log] - DAU|682|评论商品|1|24244.89 
INFO 2025-01-25 16:24:43 [com.generate_log] - DAU|1027|使用优惠券|0|15901.5