Elasticsearch: Explore your data with runtime fields

License: CC 4.0 BY-SA

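Consider a large volume of log data from which you want to extract fields. Indexing all of it is time-consuming and takes a lot of disk space, while all you want for now is to explore the structure of the data without committing to a schema up front.

You know that your log data contains specific fields you want to extract. In this case, we focus on the @timestamp and message fields. With runtime fields, you can define scripts that compute the values of such fields at search time.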
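
As a rough preview of what such a definition can look like, the sketch below builds a search request body with a `runtime_mappings` section in Python. The field name `http.clientip` and the choice of the `COMMONAPACHELOG` grok pattern are assumptions made for illustration; they do not come from this section, and the request only makes sense once the index and sample data from the following steps exist.

```python
import json

# Painless source for a runtime field. grok(...).extract(...) returns the
# captured values, or null when the line does not match; the ?. operator
# guards against that, and emit() is only called when a value was found.
CLIENT_IP_SCRIPT = (
    "String clientip = grok('%{COMMONAPACHELOG}')"
    ".extract(doc[\"message\"].value)?.clientip; "
    "if (clientip != null) emit(clientip);"
)


def build_search_body(field_name="http.clientip"):
    """Return a _search request body that defines one runtime field.

    field_name is a hypothetical name chosen for this sketch.
    """
    return {
        "runtime_mappings": {
            field_name: {
                "type": "ip",
                "script": {"source": CLIENT_IP_SCRIPT},
            }
        },
        # Ask Elasticsearch to return the computed value with each hit.
        "fields": [field_name],
    }


if __name__ == "__main__":
    # Intended as the body of: GET /my-index-000001/_search
    print(json.dumps(build_search_body(), indent=2))
```

Because the field is defined in the request rather than in the index mapping, nothing is reindexed; the script runs against each matching document when the search executes.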

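Define indexed fields as a starting point

You can start with a simple example by adding the @timestamp and message fields to the my-index-000001 mapping as indexed fields. To stay flexible, use wildcard as the field type for message: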

PUT /my-index-000001/
{
  "mappings": {
    "properties": {
      "@timestamp": {
        "format": "strict_date_optional_time||epoch_second",
        "type": "date"
      },
      "message": {
        "type": "wildcard"
      }
    }
  }
}

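Above, we deliberately map message as a wildcard field. This type is intended for unstructured, machine-generated content such as log lines; for this kind of data it tends to be storage-efficient, which also helps keep ingestion lightweight.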

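Ingest some data

After mapping the fields you want to retrieve, index a few records from your log data into Elasticsearch. The following request uses the _bulk API to index raw log data into my-index-000001. Instead of indexing all of your log data, you can use a small sample to experiment with runtime fields.

The final document is not in valid Apache log format, but we can account for that case in the script.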

POST /my-index-000001/_bulk?refresh
{"index":{}}
{"timestamp":"2020-04-30T14:30:17-05:00","message":"40.135.0.0 - - [30/Apr/2020:14:30:17 -0500] \"GET /images/hm_bg.jpg HTTP/1.0\" 200 24736"}
{"index":{}}
{"timestamp":"2020-04-30T14:30:53-05:00","message":"232.0.0.0 - - [30/Apr/2020:14:30:53 -0500] \"GET /images/hm_bg.jpg HTTP/1.0\" 200 24736"}
{"index":{}}
{"timestamp":"2020-04-30T14:31:12-05:00","message":"26.1.0.0 - - [30/Apr/2020:14:31:12 -0500] \"GET /images/hm_bg.jpg HTTP/1.0\" 200 24736"}
{"index":{}}
{"timestamp":"2020-04-30T14:31:19-05:00","message":"247.37.0.0 - - [30/Apr/2020:14:31:19 -0500] \"GET /french/splash_inet.html HTTP/1.0\" 200 3781"}
{"index":{}}
{"timestamp":"2020-04-30T14:31:22-05:00","message":"247.37.0.0 - - [30/Apr/2020:14:31:22 -0500] \"GET /images/hm_nbg.jpg HTTP/1.0\" 304 0"}
{"index":{}}
{"timestamp":"2020-04-30T14:31:27-05:00","message":"252.0.0.0 - - [30/Apr/2020:14:31:27 -0500] \"GET /images/hm_bg.jpg HTTP/1.0\" 200 24736"}
{"index":{}}
{"timestamp":"2020-04-30T14:31:28-05:00","message":"not a valid apache log"}
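
If you would rather generate such a request body from your own log lines than write it by hand, the following minimal Python sketch shows the newline-delimited layout the _bulk API expects: one action line followed by one source line per document, with a trailing newline at the end. The helper name `build_bulk_body` is hypothetical, and sending the body (with the `application/x-ndjson` content type) is left out on purpose.

```python
import json


def build_bulk_body(docs):
    """Serialize docs as newline-delimited JSON for the _bulk API."""
    lines = []
    for doc in docs:
        # Action line: index into the index named in the URL, auto-generated _id.
        lines.append(json.dumps({"index": {}}))
        # Source line: the document itself, serialized on a single line.
        lines.append(json.dumps(doc))
    # The bulk body has to end with a newline character.
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    docs = [
        {
            "timestamp": "2020-04-30T14:31:27-05:00",
            "message": '252.0.0.0 - - [30/Apr/2020:14:31:27 -0500] '
                       '"GET /images/hm_bg.jpg HTTP/1.0" 200 24736',
        },
        {
            "timestamp": "2020-04-30T14:31:28-05:00",
            "message": "not a valid apache log",
        },
    ]
    # Intended as the body of: POST /my-index-000001/_bulk?refresh
    print(build_bulk_body(docs), end="")
```

Letting `json.dumps` handle the escaping avoids hand-writing the backslash-escaped quotes that appear in the message strings above.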

At this point, you can view how Elasticsearch stores your raw data.

GET my-index-000001
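
The sample ingested above ends with a line that is not a valid access-log entry, so any script that extracts fields from message has to tolerate a failed match. As a rough Python analogue of that "extract, then emit only if something was found" logic, consider the sketch below. It uses a deliberately simplified regular expression that captures only the leading client IP; this is an assumption for illustration and not the grok pattern Elasticsearch would apply.

```python
import re

# Simplified stand-in for an access-log pattern: a dotted IPv4 address, the
# ident and user fields, then the "[" that opens the timestamp.
CLIENT_IP = re.compile(r"^(?P<clientip>\d{1,3}(?:\.\d{1,3}){3}) \S+ \S+ \[")


def extract_client_ip(message):
    """Return the client IP, or None when the line does not match."""
    match = CLIENT_IP.match(message)
    return match.group("clientip") if match else None


def emitted_values(messages):
    """Collect a value only for lines where extraction succeeded."""
    values = []
    for message in messages:
        ip = extract_client_ip(message)
        if ip is not None:  # skip lines such as "not a valid apache log"
            values.append(ip)
    return values


if __name__ == "__main__":
    sample = [
        '40.135.0.0 - - [30/Apr/2020:14:30:17 -0500] '
        '"GET /images/hm_bg.jpg HTTP/1.0" 200 24736',
        "not a valid apache log",
    ]
    print(emitted_values(sample))
```

The malformed line simply yields no value instead of raising an error, which is the behavior you want from a field that is computed over every document a search touches.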
