HBase (1): OpenTSDB Table Design

Why should we learn OpenTSDB? Is it a good case study for learning how to design HBase tables? I would definitely say yes. Many good optimizations have already been applied in this open source project. So this post only covers how OpenTSDB designs its HBase tables; it does not cover how to use OpenTSDB or how to deploy it to monitor servers. Maybe I will write about that part in the future.

So first, let's go over some basic concepts in OpenTSDB.

What is OpenTSDB?

It is a distributed, scalable time series database built for modern monitoring needs. It can collect, store and serve billions of data points with no loss of precision, and it can be used together with tcollector. There are two key points here: time series, and billions of data points. So the timestamp is central to OpenTSDB, and OpenTSDB has to deal with a very large number of data points. (That's the main reason to study OpenTSDB's design: we also face big data, and time is also a significant field in our data.)

Although OpenTSDB is an open source project, it is used by many big companies, including Yahoo, eBay, Pinterest and others.

Some Concepts

  • data point: (timestamp, value)
  • metric: e.g. proc.loadavg.cpu
  • tags: e.g. hosts=haimeili, ip=127.0.0.1
  • metric + tags = time series
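To make "metric + tags = time series" concrete, here is a minimal Python sketch. The helper names `series_key` and `group_points` are hypothetical, the sample values are made up, and sorting the tags by name is an assumption made here so that the same tag set always yields the same key regardless of input order; this is not OpenTSDB's own code.

```python
from collections import defaultdict


def series_key(metric: str, tags: dict) -> str:
    """Hypothetical helper: identify a time series by its metric plus its tags.

    Tags are sorted by name, so {"ip": ..., "hosts": ...} and
    {"hosts": ..., "ip": ...} map to the same series.
    """
    tag_part = ",".join(f"{k}={v}" for k, v in sorted(tags.items()))
    return f"{metric}{{{tag_part}}}"


def group_points(points):
    """Group (metric, tags, timestamp, value) tuples into time series.

    Each series is a list of (timestamp, value) data points.
    """
    series = defaultdict(list)
    for metric, tags, timestamp, value in points:
        series[series_key(metric, tags)].append((timestamp, value))
    return dict(series)


points = [
    ("proc.loadavg.cpu", {"hosts": "haimeili", "ip": "127.0.0.1"}, 1297574486, 0.42),
    ("proc.loadavg.cpu", {"ip": "127.0.0.1", "hosts": "haimeili"}, 1297574496, 0.40),
]
print(group_points(points))
```

Both sample points land in the same series, because only the order in which the tags were supplied differs.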

OpenTSDB uses two tables to store data: tsdb and tsdb-uid. OpenTSDB 2.0 added two more tables, tsdb-meta and tsdb-tree.

tsdb-uid

This table maps UIDs to names and names to UIDs. There are only three kinds of qualifiers: metrics, tagk and tagv. Remember that the mapping goes both ways: a row keyed by a name stores its UID in the "id" column family, and a row keyed by a UID stores its name in the "name" column family. Here is an example:

[Screenshot: example rows from the tsdb-uid table]
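The two-way mapping can be sketched in Python with one dictionary per direction and per kind. This is an in-memory stand-in, not OpenTSDB's implementation: the class name `UidTable` and its methods are hypothetical, UIDs are handed out from a simple per-kind counter, and the 3-byte width follows OpenTSDB's default UID size.

```python
class UidTable:
    """Hypothetical in-memory stand-in for the tsdb-uid table.

    name_to_uid plays the role of rows keyed by name (family "id"),
    uid_to_name plays the role of rows keyed by UID (family "name").
    """

    KINDS = ("metrics", "tagk", "tagv")
    WIDTH = 3  # UID width in bytes (OpenTSDB's default)

    def __init__(self):
        self.name_to_uid = {kind: {} for kind in self.KINDS}
        self.uid_to_name = {kind: {} for kind in self.KINDS}
        # In this sketch each kind has its own counter, i.e. its own UID space.
        self.next_id = {kind: 1 for kind in self.KINDS}

    def get_or_create(self, kind: str, name: str) -> bytes:
        """Return the UID for name, assigning the next free one if it is new."""
        forward = self.name_to_uid[kind]
        if name not in forward:
            uid = self.next_id[kind].to_bytes(self.WIDTH, "big")
            self.next_id[kind] += 1
            forward[name] = uid                 # name -> UID
            self.uid_to_name[kind][uid] = name  # UID -> name
        return forward[name]

    def name_of(self, kind: str, uid: bytes) -> str:
        """Reverse lookup: UID -> name."""
        return self.uid_to_name[kind][uid]


uids = UidTable()
metric_uid = uids.get_or_create("metrics", "proc.loadavg.cpu")
print(metric_uid, uids.name_of("metrics", metric_uid))
```

Because every kind keeps its own counter here, the metric proc.loadavg.cpu and the tag key hosts can both receive the UID \x00\x00\x01 without clashing.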

tsdb

tsdb is the main table for storing data points. Its row key is a concatenation of UIDs and a timestamp.

  • Row key format: <metric uid><timestamp><tagk1 uid><tagv1 uid><tagk2 uid><tagv2 uid>…
  • The timestamp is a 4-byte Unix time normalized to a 1-hour boundary (the base time)
  • All data points of one time series within one hour are stored in one row
  • There are two qualifier formats: 2 bytes (second precision) and 4 bytes (millisecond precision). The 2-byte format is <12 bits><4 bits>: the 12 bits hold the offset in seconds from the row's base time (0 to 3599), and the 4 bits are flags. The first flag bit says whether the value is an integer (0) or a floating-point number (1), and the remaining three bits hold the value length minus one, so a value is 1 to 8 bytes long: "000" means a 1-byte value, "001" a 2-byte value, "011" a 4-byte value and "111" an 8-byte value. The 4-byte format is <4 bits><22 bits><2 bits><4 bits>: the first 4 bits are always "1111", marking a millisecond qualifier; the 22 bits hold the offset in milliseconds from the base time; the 2 bits are reserved; and the last 4 bits are the same flags as above.
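The layout in the list above can be sketched in Python with the standard `struct` module. The function names are hypothetical, and the bit positions follow the description above (12-bit second offset plus 4 flag bits; "1111" prefix, 22-bit millisecond offset, 2 reserved bits, 4 flag bits). Sorting the tag pairs by tagk UID is an assumption about how the row key is kept independent of the order in which tags arrive.

```python
import struct

FLAG_FLOAT = 0x8   # first flag bit: 1 = floating point, 0 = integer
LENGTH_MASK = 0x7  # last three flag bits: value length minus one


def base_time(ts: int) -> int:
    """Normalize a Unix timestamp (seconds) to its 1-hour boundary."""
    return ts - ts % 3600


def row_key(metric_uid: bytes, ts: int, tag_uids: list) -> bytes:
    """<metric uid><4-byte base time><tagk1><tagv1><tagk2><tagv2>...

    tag_uids is a list of (tagk_uid, tagv_uid) pairs; sorting them makes the
    key independent of the order the tags were supplied in.
    """
    key = metric_uid + struct.pack(">I", base_time(ts))
    for tagk, tagv in sorted(tag_uids):
        key += tagk + tagv
    return key


def flags(is_float: bool, length: int) -> int:
    """Build the 4 flag bits for a value of 1 to 8 bytes."""
    if not 1 <= length <= 8:
        raise ValueError("value length must be 1..8 bytes")
    return (FLAG_FLOAT if is_float else 0) | (length - 1)


def seconds_qualifier(ts: int, is_float: bool, length: int) -> bytes:
    """2-byte qualifier: <12-bit offset in seconds><4 flag bits>."""
    offset = ts - base_time(ts)  # 0..3599 fits in 12 bits
    return struct.pack(">H", (offset << 4) | flags(is_float, length))


def millis_qualifier(ts_ms: int, is_float: bool, length: int) -> bytes:
    """4-byte qualifier: <1111><22-bit offset in ms><2 reserved bits><4 flag bits>."""
    offset = ts_ms - base_time(ts_ms // 1000) * 1000  # 0..3,599,999 fits in 22 bits
    return struct.pack(">I", (0xF << 28) | (offset << 6) | flags(is_float, length))


def decode_qualifier(q: bytes):
    """Return (offset, unit, is_float, value_length) for a 2- or 4-byte qualifier."""
    if len(q) == 2:
        (v,) = struct.unpack(">H", q)
        offset, unit = v >> 4, "s"
    else:
        (v,) = struct.unpack(">I", q)
        offset, unit = (v >> 6) & 0x3FFFFF, "ms"
    f = v & 0xF
    return offset, unit, bool(f & FLAG_FLOAT), (f & LENGTH_MASK) + 1
```

For instance, struct.pack(">I", base_time(1297574486)) gives the bytes b"MWeP", and seconds_qualifier(1297574486, True, 4) gives b"Pk".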

Here is one example:

1297574486 = 2011-02-13 13:21:26 (UTC+8)
MWeP = 01001101 01010111 01100101 01010000 = 1297573200 = 2011-02-13 13:00:00 (the row key keeps only the hour; the minutes and seconds are stored in the qualifier)
Pk = 01010000 01101011: the first 12 bits, 010100000110, = 1286 (1286 seconds = 21 min 26 s); the last 4 bits, 1011, are the flags (floating-point value, 4 bytes)
1297573200 + 1286 = 1297574486
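The arithmetic of this example can be checked with a few lines of Python. `decode_example` is a hypothetical helper; it assumes a 2-byte (second-precision) qualifier and prints the result both in UTC and in UTC+8, the offset that matches the times written in the example.

```python
import struct
from datetime import datetime, timedelta, timezone


def decode_example(base_bytes: bytes, qualifier: bytes):
    """Recover the full timestamp from a row-key base time and a 2-byte qualifier."""
    (base,) = struct.unpack(">I", base_bytes)  # 4-byte big-endian base time
    (qual,) = struct.unpack(">H", qualifier)   # 2-byte column qualifier
    offset = qual >> 4                         # top 12 bits: seconds after the base time
    flag_bits = qual & 0xF                     # low 4 bits: type and length flags
    is_float = bool(flag_bits & 0x8)
    length = (flag_bits & 0x7) + 1
    return base, offset, base + offset, is_float, length


base, offset, ts, is_float, length = decode_example(b"MWeP", b"Pk")
print(base, offset, ts, is_float, length)  # 1297573200 1286 1297574486 True 4
print(datetime.fromtimestamp(ts, tz=timezone.utc))  # 2011-02-13 05:21:26+00:00
print(datetime.fromtimestamp(ts, tz=timezone(timedelta(hours=8))))  # 2011-02-13 13:21:26+08:00
```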

[Screenshot: example data from the tsdb table]

Summary

When you design a big table, consider concatenating fields into the row key to save space. If you have time-based data, think about where in the row key to put the timestamp, and whether you need to store data per second or per minute. Also, if your data is poorly formatted, too long, or drawn from a list of values, you might map it to UIDs to save space.
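A rough byte count shows why the UID mapping pays off. This is a back-of-the-envelope sketch, not a measurement: it assumes 3-byte UIDs and a 4-byte base time as described above, and the "raw" design it compares against (storing the metric and tag strings directly, without even counting separators) is hypothetical.

```python
UID_WIDTH = 3        # bytes per UID (OpenTSDB's default width)
TIMESTAMP_WIDTH = 4  # bytes for the hourly base time


def raw_key_size(metric: str, tags: dict) -> int:
    """Hypothetical naive row key: metric and tag strings stored as-is."""
    strings = [metric] + [k + v for k, v in tags.items()]
    return sum(len(s.encode("utf-8")) for s in strings) + TIMESTAMP_WIDTH


def uid_key_size(tags: dict) -> int:
    """Row key built from UIDs: metric UID, base time, one UID pair per tag."""
    return UID_WIDTH + TIMESTAMP_WIDTH + 2 * UID_WIDTH * len(tags)


tags = {"hosts": "haimeili", "ip": "127.0.0.1"}
print(raw_key_size("proc.loadavg.cpu", tags), uid_key_size(tags))  # 44 19
```

In its default (unencoded) cell format HBase stores the row key again in every cell, so this per-key difference is repeated for each cell of a row.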
