| Version | Date | Notes |
|---|---|---|
| 1.0 | 2021.10.19 | Initial publication |

0. Background
The title comes from InfluxDB's own account of the background behind its storage engine:
The workload of time series data is quite different from normal database workloads. There are a number of factors that conspire to make it very difficult to get it to scale and perform well:
- Billions of individual data points
- High write throughput
- High read throughput
- Large deletes to free up disk space
- Mostly an insert/append workload, very few updates
The first and most obvious problem is one of scale. In DevOps, for instance, you can collect hundreds of millions or billions of unique data points every day.
To prove out the numbers, let’s say we have 200 VMs or servers running, with each server collecting an average of 100 measurements every 10 seconds. Given there are 86,400 seconds in a day, a single measurement will gener
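
The arithmetic in the quoted example can be checked with a short sketch. It uses only the figures given in the quote (200 servers, 100 measurements per server, one sample every 10 seconds, 86,400 seconds in a day); the function and parameter names are illustrative assumptions and do not come from InfluxDB.

```python
SECONDS_PER_DAY = 86_400  # figure stated in the quoted passage


def points_per_day(servers: int, measurements_per_server: int, interval_seconds: int) -> int:
    """Estimate how many data points a fleet produces in one day.

    Assumes every measurement is sampled at a fixed interval with no gaps.
    """
    # Samples one measurement produces per day on a single server.
    samples_per_measurement = SECONDS_PER_DAY // interval_seconds
    return servers * measurements_per_server * samples_per_measurement


if __name__ == "__main__":
    total = points_per_day(servers=200, measurements_per_server=100, interval_seconds=10)
    print(f"{total:,} points per day")  # 172,800,000 points per day
```

Under these assumptions, even a modest fleet lands in the "hundreds of millions of points per day" range that the quote describes.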

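This article examines the challenges that time series databases face, using the data storage problems of Prometheus and InfluxDB as examples, including the strengths and weaknesses of LSM Tree and BoltDB. It then presents solutions: adopting the Time Structured Merge Tree, a variant of the LSM-Tree; using a WAL to optimize write performance; and saving storage space through data retention policies and resampling.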



