Logstash Filter Plugins in Depth: Parsing and Performance Optimization in Practice
Key points:
- The Dissect plugin's efficient, delimiter-based parsing
- The full set of field operations in the Mutate plugin
- Structured data handling with the JSON plugin
- Extensibility with the GeoIP and Ruby plugins
- Deep integration between output plugins and Elasticsearch
Dissect Plugin: An Efficient Log Parsing Approach
1 ) Core Principle
Unlike Grok, which relies on regular-expression matching, Dissect locates fixed delimiters to turn unstructured log lines into structured data. Its advantages:
- Roughly 3x faster (per the official benchmark), because it avoids the cost of regex backtracking
- Simple syntax: a pattern is just %{field} tokens separated by delimiters. For example, to parse the syslog line Apr 26 10:01:23 localhost systemd[1]: Started service, the configuration is:
filter { dissect { mapping => { "message" => "%{ts} %{+ts} %{+ts} %{src} %{prog}[%{pid}]: %{msg}" } } }
→ Output:
ts: "Apr 26 10:01:23", src: "localhost", prog: "systemd", pid: "1"
- Field operator reference (a short sketch of the skip notation follows this table):

| Notation | Purpose | Example |
|---|---|---|
| %{field} | Basic field capture | %{ts} → Apr 26 |
| %{+field} | Append the value to an existing field | %{+ts} merges the timestamp segments |
| %{?field} / %{&field} | Indirect (key-value) field naming | parses a=1&b=2 style data |
| %{} | Ignore the matched value | placeholder, not stored |

Basic pattern structure:
%{timestamp} %{+timestamp} %{+timestamp} %{source} %{program}[%{pid}]: %{message}
- %{field}: declares a field capture
- %{+field}: appends the value (merges multiple segments)
- Delimiters: the literal characters between fields (spaces, colons, etc.)
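A minimal sketch of the skip notation from the table above, assuming an access-log style line where the first token is not needed:
# Input: "GET /index.html 200 1024"
filter {
  dissect {
    # %{} matches the first token (the HTTP method) but does not store it
    mapping => { "message" => "%{} %{path} %{status} %{bytes}" }
  }
}
→ Output: path: "/index.html", status: "200", bytes: "1024"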
2 ) Advanced Features
2.1 Field Reordering
Captured fields can be recombined in a different order with add_field:
# Input: "two three one go"
filter {
  dissect {
    mapping   => { "message" => "%{a} %{b} %{c} %{d}" }
    add_field => { "order" => "%{c} %{a} %{b} %{d}" }
  }
}
→ Output: order: "one two three go"
- Dynamic field naming: parsing key-value data such as a=1&b=2
dissect { mapping => { "message" => "%{?k1}=%{&k1}&%{?k2}=%{&k2}" } }
→ Output: a: "1", b: "2"
2.2 Empty Value Handling
Missing fields are set to an empty string (e.g. address: ""):
Input: John,,Shanghai
Pattern: %{name},%{address},%{city}
Output: {"name":"John","address":"","city":"Shanghai"}
2.3 Type Conversion
Field types can be converted with convert_datatype:
dissect {
  mapping => { "message" => "%{date} %{level} %{pid} %{msg}" }
  convert_datatype => { "pid" => "int" }
}
2.4 Reordering with the Append Modifier
The append operator also accepts an order suffix (%{+field/n}), which controls where each captured segment lands in the merged value:
# Reorder "two three one" into "one two three"
filter {
  dissect {
    mapping => {
      "message" => "%{+order/2} %{+order/3} %{+order/1}"
    }
  }
}
Output:
{ "order": "one two three" }
Mutate Plugin: The Swiss-Army Knife of Data Cleansing
1 ) Core Operation Types
| Operation | Function | Example Configuration | Input → Output |
|---|---|---|---|
| convert | Convert field type | convert => { "count" => "integer" } | "123" → 123 (int) |
| gsub | Regex replacement | gsub => [ "path", "\\/", "_" ] | /var/log → _var_log |
| split | Split a string into an array | split => { "tags" => "," } | "a,b,c" → ["a","b","c"] |
| join | Join an array into a string | join => { "new_field" => "," } | ["a","b"] → "a,b" |
| merge | Merge fields (arrays or strings) | merge => { "dest" => "src" } | dest: [1] + src: 2 → [1,2] |
| rename | Rename a field | rename => { "old" => "new" } | removes old, adds new |
| update | Update only if the field exists | update => { "exist_field" => "new_val" } | skipped when the field is missing |
| replace | Force-replace or create a field | replace => { "any_field" => "value" } | written unconditionally |
| remove_field | Delete fields | remove_field => [ "tmp_field" ] | cleans up redundant data |
2 ) Engineering Examples
2.1 A Complete Field-Processing Pipeline
filter {
  mutate {
    split => { "message" => "|" }           # split into an array
    convert => { "code" => "integer" }      # convert the type
    gsub => [                               # strip special characters
      "url", "[?#]", "_",
      "user", "\\W", ""
    ]
    rename => { "user" => "username" }      # standardize the field name
    remove_field => ["debug_info"]          # drop debug-only fields
  }
}
2.2 Data Type Conversion
mutate {
convert => {
"response_time" => "float"
"status_code" => "integer"
"is_active" => "boolean"
}
}
2.3 String Processing Techniques
Regex replacement:
gsub => [
  # path normalization: /var/log/nginx => _var_log_nginx
  "path", "/", "_",
  # URL parameter cleanup: user?id=123#section => user.id=123.section
  "url", "[?#&]", "."
]
String splitting and joining:
# CSV data handling
split => { "csv_data" => "," }
# joining array elements
join => { "components" => "|" }
2.4 Field Metadata Operations
# rename a field
rename => { "old_field" => "new_field" }
# merge arrays
merge => { "dest_array" => "source_array" }
# field update strategies
update => { "existing_field" => "new_value" }   # only updates existing fields
replace => { "potential_field" => "default" }   # may create a new field
# remove sensitive information
remove_field => [ "credit_card", "auth_token" ]
JSON Plugin: Extracting Structured Data
Use Case
When a log line contains a JSON string (e.g. {"user": "Alice", "action": "login"}), it needs to be decomposed into individual fields.
Configuration Strategy
filter {
  json {
    source => "message"              # the raw JSON field
    target => "parsed"               # where to store the parsed result
    skip_on_invalid_json => true     # ignore malformed JSON
  }
}
Output Structure Comparison
| Configuration | Example Input | Output Structure |
|---|---|---|
| without target | {"user":"alice"} | {"user":"alice"} |
| with target | {"user":"alice"} | {"parsed":{"user":"alice"}} |
- Without target: parsed fields are placed at the root of the event, e.g. { "name": "test", "value": 1 }
- With target: parsed fields are nested, e.g. { "parsed": { "name": "test", "value": 1 } }
Important: when feeding the json filter through the HTTP input, disable the input's own JSON decoding to avoid double parsing:
input {
  http {
    codec => "plain"   # disable the default JSON decoding
  }
}
Note: the HTTP input chooses a codec per Content-Type, so requests sent with Content-Type: application/json are decoded by the input itself; a downstream json filter then finds no JSON string left in message and fails.
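Setting the codec alone may not be sufficient: the HTTP input also keeps a per-Content-Type codec map (additional_codecs), which by default decodes application/json bodies with the json codec. A minimal sketch, assuming the raw body should always land in message:
input {
  http {
    port  => 8080
    codec => "plain"
    additional_codecs => {}   # drop the default application/json => json mapping
  }
}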
GeoIP and Ruby Plugins: Advanced Data Processing
1 ) GeoIP: Geo-Enrichment
filter {
  geoip {
    source => "client_ip"   # the field holding the IP address
    target => "geo"         # where to store the geo data
  }
}
Sample output:
"geo": {
"city_name": "Shanghai",
"country_code": "CN",
"location": { "lon": 121.47, "lat": 31.23 }
}
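A full GeoIP lookup adds many fields; the plugin's fields option can restrict the lookup to what the dashboards actually need. A minimal sketch (the field list is illustrative):
filter {
  geoip {
    source => "client_ip"
    target => "geo"
    fields => ["city_name", "country_code2", "location"]   # keep only these lookup results
  }
}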
2 ) Ruby: Custom Logic Extensions
filter {
ruby {
code => '
event.set("message_size", event.get("message").size)
'
}
}
→ Adds a message_size field (value = the message length)
Alternatively:
filter {
ruby {
code => "
size = event.get('message').bytesize
event.set('message_size', size) # compute the log size
"
}
}
A slightly more complex example:
ruby {
  init => 'require "digest"'   # one-time setup instead of a per-event require
  code => '
    # compute a hash of the message body
    event.set("message_hash", Digest::SHA256.hexdigest(event.get("message")))
    # more involved business logic
    if event.get("[geo][country_code]") == "CN"
      event.set("timezone", "Asia/Shanghai")
    end
  '
}
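For logic that outgrows an inline code string, the ruby filter can also load an external script through path and pass settings via script_params; the script file and parameter below are hypothetical. A minimal sketch:
filter {
  ruby {
    path => "/etc/logstash/scripts/enrich.rb"    # script defining register(params) and filter(event)
    script_params => { "max_size" => 4096 }      # handed to the script's register(params)
  }
}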
Output Plugins: Data Routing and Storage
Core Plugin Comparison
| Plugin | Use Case | Key Configuration Example |
|---|---|---|
| stdout | Debugging and development | codec => rubydebug |
| file | Archiving raw logs | path => "/logs/%{+YYYY-MM-dd}.log" |
| elasticsearch | Production storage | see the detailed configuration below |
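A minimal sketch of the two lightweight outputs from the table, handy while developing a pipeline before switching to Elasticsearch:
output {
  stdout { codec => rubydebug }                   # pretty-print every event to the console
  file   { path => "/logs/%{+YYYY-MM-dd}.log" }   # archive raw events by day
}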
Elasticsearch Output Best Practices
output {
  elasticsearch {
    hosts => ["http://data-node1:9200", "http://data-node2:9200"]  # connect to data nodes only
    index => "logs-%{+YYYY.MM.dd}"          # one index per day
    document_id => "%{fingerprint}"         # custom document ID (deduplication)
    action => "update"                      # update if the document exists
    doc_as_upsert => true                   # insert if it does not
    template => "logstash-template.json"
    template_name => "logstash_custom"      # custom mapping template
  }
}
Advanced Configuration
output {
  elasticsearch {
    # cluster connection
    hosts => ["data-node1:9200", "data-node2:9200"]
    sniffing => true
    # index management
    index => "app-logs-%{service}-%{+YYYY.MM.dd}"
    template => "/etc/logstash/templates/logs-template.json"
    template_overwrite => true
    # document write strategy
    action => "update"
    document_id => "%{fingerprint}"
    doc_as_upsert => true
    # authentication and TLS
    user => "logstash_writer"
    password => "${ES_PASSWORD}"
    ssl => true
    cacert => "/path/to/ca.pem"
  }
}
Output Plugin and Elasticsearch Integration
1 ) Core Configuration Parameters
output {
  elasticsearch {
    hosts => ["es-node1:9200", "es-node2:9200"]   # data nodes only
    index => "logs-%{+YYYY.MM.dd}"                # time-based rolling index
    document_id => "%{fingerprint}"               # custom document ID
    template => "/etc/logstash/template.json"     # index template
    action => "update"                            # update mode
    doc_as_upsert => true                         # create when missing
  }
}
Key optimizations (a configuration sketch follows this list):
- Do not connect through master nodes: keeps metadata operations from being impacted
- Bulk batch sizing: pipeline.batch.size: 500 (set in logstash.yml)
- Retry mechanism: retry_on_conflict => 3 (elasticsearch output option)
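A minimal sketch of where these settings live, with all other options left at their defaults:
# logstash.yml
pipeline.batch.size: 500          # events per worker batch handed to the outputs

# pipeline configuration
output {
  elasticsearch {
    hosts => ["data-node1:9200"]
    action => "update"
    doc_as_upsert => true
    retry_on_conflict => 3        # retry partial updates that hit version conflicts
  }
}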
2 ) Index Template Example
// template.json
{
"template": "logs-*",
"settings": {
"number_of_shards": 3,
"refresh_interval": "30s"
},
"mappings": {
"properties": {
"geo.location": { "type": "geo_point" },
"timestamp": { "type": "date" }
}
}
}
Engineering Example 1
1 ) Option 1: Write Directly to Elasticsearch
// nestjs.service.ts
import { Injectable } from '@nestjs/common';
import { Client } from '@elastic/elasticsearch';
@Injectable()
export class LogService {
private readonly esClient: Client;
constructor() {
this.esClient = new Client({
nodes: ['http://es-node1:9200'],
auth: { username: 'elastic', password: 'changeme' }
});
}
async logToES(data: object) {
await this.esClient.index({
index: 'nestjs-logs',
body: { ...data, '@timestamp': new Date() }
});
}
}
2 ) Option 2: Process via a Logstash Pipeline
// send logs to the Logstash HTTP input
import { HttpService } from '@nestjs/axios';
// excerpt from a service where HttpService is injected as this.httpService
async sendToLogstash(log: any) {
  await this.httpService.post(
    'http://logstash:8080',
    log,
    { headers: { 'Content-Type': 'application/json' } }
  ).toPromise();
}
Logstash configuration (pipelines.conf):
input { http { port => 8080 } }
filter {
  # with Content-Type: application/json the HTTP input already decodes the body,
  # so the json filter below only applies when the raw JSON string arrives in message
  json { source => "message" }
  mutate { remove_field => ["message"] }
}
output { elasticsearch { ... } }
3 ) Option 3: Filebeat + Logstash
filebeat.yml:
filebeat.inputs:
- type: filestream
paths: ["/var/log/nestjs/*.log"]
output.logstash:
hosts: ["logstash:5044"]
Logstash pipeline:
input { beats { port => 5044 } }
filter {
dissect { mapping => { "message" => "[%{level}] %{timestamp} %{message}" } }
}
output { elasticsearch { ... } }
Engineering Example 2
1 ) Basic Data Writes
// nestjs-logger.service.ts
import { Injectable } from '@nestjs/common';
import { Client } from '@elastic/elasticsearch';
@Injectable()
export class LoggerService {
private esClient: Client;
constructor() {
this.esClient = new Client({ node: 'http://es-node:9200' });
}
async logEvent(data: object) {
await this.esClient.index({
index: 'app-logs',
body: {
timestamp: new Date().toISOString(),
...data
}
});
}
}
2 ) Bulk Writes with Error Retry
// elastic-bulk.service.ts
import { Client } from '@elastic/elasticsearch';
export class ElasticBulkWriter {
  private bulkQueue: object[] = [];
  constructor(private esClient: Client) {}
  async addToQueue(log: object) {
    // the Bulk API expects alternating action and document entries
    this.bulkQueue.push({ index: { _index: 'logs' } }, log);
    if (this.bulkQueue.length >= 100) await this.flush();
  }
  async flush() {
    const body = this.bulkQueue;
    this.bulkQueue = []; // reset the queue before sending
    try {
      await this.esClient.bulk({ body });
    } catch (e) {
      if (e.meta?.body?.error?.type === 'es_rejected_execution_exception') {
        this.bulkQueue.unshift(...body); // re-queue and retry after a fixed delay
        setTimeout(() => this.flush(), 3000);
      }
    }
  }
}
3 ) Index Lifecycle Management (ILM)
ILM policies are created through the Elasticsearch ILM API (see the PUT _ilm/policy example in Engineering Example 3), not in elasticsearch.yml; the YAML below is only a conceptual sketch of the phases:
ilm:
  policies:
    logs_policy:
      phases:
        hot:
          min_age: 0ms
          actions:
            rollover:
              max_size: "50GB"
        delete:
          min_age: "30d"
          actions: { delete: {} }
Engineering Example 3
1 ) Basic Log Collection Pipeline
Logstash configuration (pipeline.conf):
input {
http {
port => 8080
codec => "json"
}
}
filter {
mutate {
add_field => { "received_at" => "%{@timestamp}" }
remove_field => [ "headers" ]
}
date {
match => [ "timestamp", "ISO8601" ]
target => "@timestamp"
}
}
output {
elasticsearch {
hosts => ["es-cluster:9200"]
index => "app-logs-%{+YYYY.MM.dd}"
}
}
NestJS log service (log.service.ts):
import { Injectable } from '@nestjs/common';
import { HttpService } from '@nestjs/axios';
import { ConfigService } from '@nestjs/config';
@Injectable()
export class LogService {
constructor(
private readonly http: HttpService,
private readonly config: ConfigService
) {}
async sendLog(payload: Record<string, any>) {
const logstashUrl = this.config.get('LOGSTASH_URL');
const logEntry = {
...payload,
service: 'nestjs-gateway',
environment: this.config.get('NODE_ENV'),
timestamp: new Date().toISOString()
};
await this.http.post(logstashUrl, logEntry).toPromise();
}
}
2 ) Enhanced Log Processing Framework
Advanced processing pipeline:
filter {
  # structured field extraction (dissect cannot write into @timestamp directly,
  # so the timestamp is captured as ts and converted by the date filter below)
  dissect {
    mapping => {
      "message" => "%{service} %{level} %{trace_id} %{ts} %{payload}"
    }
  }
  date {
    match  => [ "ts", "ISO8601" ]
    target => "@timestamp"
  }
  # IP geolocation enrichment
  geoip {
    source => "client_ip"
    target => "geo"
  }
  # masking sensitive data
  mutate {
    gsub => [
      "[payload][email]", ".+@", "[REDACTED]@",
      "[payload][phone]", "\\d{4}$", ""
    ]
  }
  # error stack parsing
  if [level] == "ERROR" {
    grok {
      match => { "stack_trace" => "(?m)%{JAVASTACKTRACEPART}" }
    }
  }
}
NestJS interceptor implementation (log.interceptor.ts):
import { Injectable, NestInterceptor, ExecutionContext, CallHandler } from '@nestjs/common';
import { Observable } from 'rxjs';
import { tap } from 'rxjs/operators';
import { LogService } from './log.service';
@Injectable()
export class LoggingInterceptor implements NestInterceptor {
constructor(private logService: LogService) {}
intercept(context: ExecutionContext, next: CallHandler): Observable<any> {
const request = context.switchToHttp().getRequest();
const startTime = Date.now();
return next.handle().pipe(
tap({
next: (data) => this.logSuccess(request, data, startTime),
error: (err) => this.logError(request, err, startTime)
})
);
}
private logSuccess(req, data, startTime) {
const duration = Date.now() - startTime;
this.logService.sendLog({
type: 'REQUEST',
method: req.method,
path: req.url,
status: req.res.statusCode,
clientIp: req.ip,
duration: duration,
responseSize: data ? JSON.stringify(data).length : 0
});
}
private logError(req, error, startTime) {
const duration = Date.now() - startTime;
this.logService.sendLog({
type: 'ERROR',
method: req.method,
path: req.url,
status: error.status || 500,
error: error.message,
stack: error.stack,
duration: duration
});
}
}
3 ) Elasticsearch Index Lifecycle Management
Index template (logs-template.json):
{
"index_patterns": ["app-logs-*"],
"template": {
"settings": {
"number_of_shards": 3,
"number_of_replicas": 1,
"index.lifecycle.name": "logs_policy",
"index.codec": "best_compression"
},
"mappings": {
"dynamic": "strict",
"properties": {
"@timestamp": { "type": "date" },
"service": { "type": "keyword" },
"level": { "type": "keyword" },
"geo": { "type": "geo_point" },
"duration": { "type": "long" },
"trace_id": { "type": "keyword" },
"message": {
"type": "text",
"fields": { "keyword": { "type": "keyword", "ignore_above": 256 } }
},
"stack_trace": { "type": "text", "index": false }
}
}
}
}
Index lifecycle policy (logs_policy):
PUT _ilm/policy/logs_policy
{
"policy": {
"phases": {
"hot": {
"actions": {
"rollover": { "max_size": "50GB", "max_age": "7d" }
}
},
"warm": {
"min_age": "7d",
"actions": {
"forcemerge": { "max_num_segments": 1 }
}
},
"delete": {
"min_age": "30d",
"actions": { "delete": {} }
}
}
}
}
4 ) Elasticsearch Operations and Reliability
Performance tuning configuration
logstash.yml:
pipeline:
workers: 8
batch:
size: 125
delay: 50
queue:
type: persisted
max_bytes: 4gb
Security hardening
Elasticsearch output security settings:
elasticsearch {
hosts => ["https://secured-cluster:9200"]
user => "${LOGSTASH_USER}"
password => "${LOGSTASH_PWD}"
ssl => true
ssl_certificate_verification => true
truststore => "/path/to/truststore.jks"
truststore_password => "${TRUSTSTORE_PWD}"
}
Monitoring and alerting
Metricbeat monitoring configuration (metricbeat.yml):
metricbeat.modules:
- module: logstash
metricsets: ["node"]
period: 10s
hosts: ["localhost:9600"]
Example alert rule:
ELASTICSEARCH_LOGSTASH_QUEUE_SIZE:
query: |
max:logstash.node.pipelines.queue.queue_size{*} by {cluster_uuid, node_id} > 1000
severity: WARNING
Key Configuration Points and Documentation Pointers
1 ) Elasticsearch Connection Optimization
- Connect only to data nodes (keep master nodes out of the write path)
- Enable HTTP compression: http_compression => true (see the sketch below)
- Bulk write parameters (older output plugin versions):
  flush_size => 500        # documents per batch
  idle_flush_time => 5     # idle flush interval (seconds)
  In recent plugin versions batch sizing follows pipeline.batch.size instead.
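A minimal sketch of enabling HTTP compression on the output (host name illustrative):
output {
  elasticsearch {
    hosts => ["data-node1:9200"]
    http_compression => true   # gzip request bodies to cut network overhead
  }
}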
2 ) Template Management
Predefined index mappings (logstash-template.json):
{
"index_patterns": ["logs-*"],
"settings": {
"number_of_shards": 3,
"refresh_interval": "30s"
},
"mappings": {
"properties": {
"geo.location": { "type": "geo_point" },
"@timestamp": { "type": "date" }
}
}
}
3 ) Logstash Pipeline Tuning
pipelines.yml:
- pipeline.id: main
  pipeline.workers: 4                         # match the number of CPU cores
  queue.type: persisted                       # protects against data loss on crash
  path.config: "/etc/logstash/conf.d/*.conf"
4 ) Elasticsearch Security Configuration
elasticsearch.yml:
xpack.security.enabled: true
xpack.security.authc.api_key.enabled: true
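With API-key authentication enabled, the Logstash output can authenticate without a stored password; a minimal sketch, the key value being a placeholder environment variable:
output {
  elasticsearch {
    hosts   => ["https://es-node1:9200"]
    api_key => "${ES_API_KEY}"   # "id:api_key" pair, e.g. created with POST /_security/api_key
    ssl     => true
  }
}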
5 ) Kibana Visualization Wiring
// index_pattern.json
{
"title": "logs-*",
"timeFieldName": "timestamp",
"fields": [
{ "name": "geo.location", "type": "geo_point" }
]
}
6 ) Official Documentation Navigation
| Resource | URL |
|---|---|
| Plugin list | https://www.elastic.co/guide/en/logstash/current/input-plugins.html |
| Dissect syntax details | https://www.elastic.co/guide/en/logstash/current/plugins-filters-dissect.html |
| ES output parameter reference | https://www.elastic.co/guide/en/logstash/current/plugins-outputs-elasticsearch.html |
Glossary (Beginner-Friendly)
- Dissect/Grok: log parsing filters. Dissect splits on delimiters; Grok uses regular expressions.
- Pipeline: Logstash's processing flow (input → filter → output).
- Bulk API: Elasticsearch's high-throughput batch indexing interface.
- Geo Point: the Elasticsearch data type for geographic coordinates (longitude + latitude).
- rubydebug: a codec that prints Logstash events in a human-readable format.