SkyWalking scheduled storage cleanup task not taking effect: the symptoms
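In the production environment, the SkyWalking backend OAP server is deployed with Elasticsearch 6 as its storage. The symptom is that as much as two weeks of data stays in ES. It is not cleaned up as expected under the default recordDataTTL and metricsDataTTL values in the configuration file, which set retention periods of 3 days and 7 days respectively.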
| Module | Provider | Settings | Value(s) and Explanation | System Environment Variable | Default |
|---|---|---|---|---|---|
| - | - | recordDataTTL | The lifecycle of record data. Record data includes traces, top n sampled records, and logs. Unit is day. Minimal value is 2. | SW_CORE_RECORD_DATA_TTL | 3 |
| - | - | metricsDataTTL | The lifecycle of metrics data, including the metadata. Unit is day. Recommend metricsDataTTL >= recordDataTTL. Minimal value is 2. | SW_CORE_METRICS_DATA_TTL | 7 |
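To make the expected behaviour concrete, here is a minimal sketch of the retention arithmetic, assuming one bucket of data per day and a strict "older than TTL days" cutoff. The helper `expired_days` and the day-bucket layout are hypothetical illustrations, not SkyWalking code.

```python
from datetime import date, timedelta


def expired_days(data_days, today, ttl_days):
    """Return the days whose data falls outside the TTL window.

    Assumption: a day counts as expired when it is strictly older than
    `today - ttl_days`. The real boundary handling may differ.
    """
    cutoff = today - timedelta(days=ttl_days)
    return [d for d in data_days if d < cutoff]


today = date(2021, 4, 26)
# Hypothetical observation: data exists for each of the last 14 days.
data_days = [today - timedelta(days=n) for n in range(14)]

expired_records = expired_days(data_days, today, ttl_days=3)  # recordDataTTL default
expired_metrics = expired_days(data_days, today, ttl_days=7)  # metricsDataTTL default

print(len(expired_records), len(expired_metrics))
```

With two weeks of data present, a working TTL scheduler would already have removed most of these day buckets, which is why the observed retention points to the cleanup never running.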
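Checking the cleanup-related settings in the configuration file shows that, by default, the data cleanup DataKeeperExecutor should be enabled and should run every 5 minutes.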
| Module | Provider | Settings | Value(s) and Explanation | System Environment Variable | Default |
|---|---|---|---|---|---|
| - | - | enableDataKeeperExecutor | Controller of TTL scheduler. Once disabled, TTL wouldn’t work. | SW_CORE_ENABLE_DATA_KEEPER_EXECUTOR | true |
| - | - | dataKeeperExecutePeriod | The execution period of TTL scheduler, unit is minute. Execution doesn’t mean deleting data. The storage provider could override this, such as ElasticSearch storage. | SW_CORE_DATA_KEEPER_EXECUTE_PERIOD | 5 |
Following the log of the SkyWalking OAP server pod with kubectl logs -f shows an entry like this every five minutes:
2021-04-26 06:05:51,082 - org.apache.skywalking.oap.server.core.storage.ttl.DataTTLKeeperTimer -325937 [pool-10-thread-1] INFO [] - The selected first getAddress is 100.67.187.229_11800. Skip.
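The DataTTLKeeperTimer class that appears here should be the one that performs the scheduled data cleanup, so the next step is to analyze the SkyWalking 8.4.0 source code to find the cause.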
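When checking a longer log capture, the skip entries can be picked out mechanically. The following is a small sketch that parses the log line quoted above; the regular expression is written against that one sample, so the layout it assumes (timestamp, logger name, message) may not hold for other logging configurations, and `parse_skip` is a hypothetical helper name.

```python
import re

# Pattern derived from the single sample line in this article.
SKIP_PATTERN = re.compile(
    r"^(?P<timestamp>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}),\d{3} - "
    r"(?P<logger>\S+) .*"
    r"The selected first getAddress is (?P<address>\S+)\. Skip\.$"
)


def parse_skip(line):
    """Return timestamp, logger and address for a skip entry, else None."""
    match = SKIP_PATTERN.match(line.strip())
    return match.groupdict() if match else None


sample = (
    "2021-04-26 06:05:51,082 - "
    "org.apache.skywalking.oap.server.core.storage.ttl.DataTTLKeeperTimer "
    "-325937 [pool-10-thread-1] INFO [] - "
    "The selected first getAddress is 100.67.187.229_11800. Skip."
)

info = parse_skip(sample)
print(info["timestamp"], info["address"])
```

Counting such entries over an hour should give roughly twelve matches if the timer fires every 5 minutes and skips each time.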

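This article examines why the data cleanup task in the SkyWalking OAP server fails to take effect. A detailed analysis of the source code shows that a uid matching problem causes a single-node OAP to skip the cleanup. Building a custom SkyWalking OAP image and modifying the configuration finally resolved the failed cleanup mechanism.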
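The skip behaviour described in this summary can be modelled roughly as follows. This is a simplified Python model based only on the log message and the summary, not the actual SkyWalking source: the names `Node`, `is_self`, and `should_run_cleanup`, and the idea that "self" is decided by comparing a node uid with a locally configured uid, are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Node:
    address: str
    uid: str


def is_self(node: Node, local_uid: Optional[str]) -> bool:
    # Assumption: a node is "self" only if its uid equals the local uid.
    return local_uid is not None and node.uid == local_uid


def should_run_cleanup(nodes: List[Node], local_uid: Optional[str]) -> bool:
    """Only the node that finds itself first in the list runs the cleanup."""
    if nodes and not is_self(nodes[0], local_uid):
        print(f"The selected first getAddress is {nodes[0].address}. Skip.")
        return False
    return True


# Hypothetical single-node deployment.
only_node = Node(address="100.67.187.229_11800", uid="pod-uid-1")

print(should_run_cleanup([only_node], local_uid=None))         # uid not matched
print(should_run_cleanup([only_node], local_uid="pod-uid-1"))  # uid matched
```

In this model a single node whose uid never matches always skips, so no node ever performs the cleanup, which is consistent with the skip entry repeating every five minutes.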