How queries on Cassandra clustering keys work


License: CC 4.0 BY-SA


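This article looks at how data is stored and how that affects query efficiency, focusing on how sorted storage speeds up queries and how different clustering keys affect data access. A concrete example shows which queries the storage order makes efficient.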

Suppose your clustering keys are

k1 t1, k2 t2, ..., kn tn

where ki is the ith key name and ti is the ith key type. The data is then stored in lexicographic order, with each dimension compared using the comparator for its type.

So (a1, a2, ..., an) < (b1, b2, ..., bn) if a1 < b1 using the t1 comparator, or a1=b1 and a2 < b2 using the t2 comparator, or (a1=b1 and a2=b2) and a3 < b3 using the t3 comparator, and so on.
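
The comparison rule above can be sketched in a few lines of Python. This is only an illustration, not Cassandra's implementation: the function name `clustering_less_than` is made up for this example, and Python's built-in ordering for `str`, `int`, and `datetime` stands in for the per-type comparators.

```python
from datetime import datetime, timezone

def clustering_less_than(a, b):
    """Return True if clustering tuple `a` sorts strictly before `b`.

    Columns are compared left to right; the first column that differs
    decides the result. Python's native ordering is an assumed stand-in
    for each column type's comparator.
    """
    for left, right in zip(a, b):
        if left != right:
            return left < right
    return False  # every compared column is equal

# Two (k1, k2, k3) tuples taken from the example rows in this article.
row_a = ("a", 2, datetime(2013, 9, 10, 13, 0, tzinfo=timezone.utc))
row_b = ("b", 1, datetime(2013, 9, 10, 13, 0, tzinfo=timezone.utc))

# k1 decides: 'a' < 'b', so row_a comes first even though its k2 is larger.
print(clustering_less_than(row_a, row_b))
```

Python's own tuple comparison follows the same rule, so `row_a < row_b` gives the same answer.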

This means that it is efficient to find all rows with a certain k1=a, since those rows are stored together. It is inefficient to find all rows with ki=x for i > 1, because they are scattered through the partition. In fact, Cassandra rejects such a query by default: the clustering key constraints it accepts specify zero or more clustering keys, starting from the first with none missing.
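
The "starting from the first with none missing" rule amounts to a prefix check. The sketch below is a simplified model, assuming equality restrictions only; the helper `is_prefix_restriction` is hypothetical, and it ignores range conditions, `IN`, secondary indexes, and `ALLOW FILTERING`.

```python
def is_prefix_restriction(clustering_columns, restricted):
    """Return True if `restricted` is exactly the first N clustering columns.

    clustering_columns: column names in declared order, e.g. ["k1", "k2", "k3"].
    restricted: names of the columns the query constrains with `=`.
    """
    restricted = set(restricted)
    prefix = clustering_columns[:len(restricted)]
    return set(prefix) == restricted

columns = ["k1", "k2", "k3"]
print(is_prefix_restriction(columns, []))            # no clustering constraint
print(is_prefix_restriction(columns, ["k1", "k2"]))  # first two keys, none missing
print(is_prefix_restriction(columns, ["k2"]))        # skips k1
print(is_prefix_restriction(columns, ["k1", "k3"]))  # skips k2
```

Under this model the first two calls describe acceptable constraints and the last two do not, which matches the rule stated above.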

For example, consider the schema

create table clustering (
    x text,
    k1 text,
    k2 int,
    k3 timestamp,
    y text,
    primary key (x, k1, k2, k3)
);

If you did the following inserts:

insert into clustering (x, k1, k2, k3, y) values ('x', 'a', 1, '2013-09-10 14:00+0000', '1');
insert into clustering (x, k1, k2, k3, y) values ('x', 'b', 1, '2013-09-10 13:00+0000', '1');
insert into clustering (x, k1, k2, k3, y) values ('x', 'a', 2, '2013-09-10 13:00+0000', '1');
insert into clustering (x, k1, k2, k3, y) values ('x', 'b', 1, '2013-09-10 14:00+0000', '1');

then they are stored in this order on disk (the order that select * from clustering where x = 'x' returns):

 x | k1 | k2 | k3                       | y
---+----+----+--------------------------+---
 x |  a |  1 | 2013-09-10 14:00:00+0000 | 1
 x |  a |  2 | 2013-09-10 13:00:00+0000 | 1
 x |  b |  1 | 2013-09-10 13:00:00+0000 | 1
 x |  b |  1 | 2013-09-10 14:00:00+0000 | 1

k1 ordering dominates, then k2, then k3.
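
As a rough check, sorting the four inserted (k1, k2, k3) tuples in Python gives the same order as the table above. This only mimics the ordering with Python's tuple comparison; the helper `ts` and the variable names are made up for the example.

```python
from datetime import datetime, timezone

def ts(hour):
    """Build the 2013-09-10 HH:00 UTC timestamps used in the inserts."""
    return datetime(2013, 9, 10, hour, 0, tzinfo=timezone.utc)

# (k1, k2, k3) in the order the rows were inserted.
inserted = [
    ("a", 1, ts(14)),
    ("b", 1, ts(13)),
    ("a", 2, ts(13)),
    ("b", 1, ts(14)),
]

# Tuple sorting is lexicographic: k1 first, then k2, then k3.
stored = sorted(inserted)

for k1, k2, k3 in stored:
    print(k1, k2, k3.strftime("%Y-%m-%d %H:%M:%S%z"))
```

The insertion order has no effect on the result; only the clustering key values do.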

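The partition key (x here, the first component of the primary key) determines which node a row lives on, while the clustering keys determine the storage order within the partition: rows are sorted by k1, then k2, then k3. So in the example above:

select * from clustering where x='x' and k1='a' is easy to serve, because the matching rows are adjacent. But for select * from clustering where x='x' and k2=1, Cassandra would first have to read the rows for every value of k1 and then pick out those with k2=1, so the storage order gives no benefit.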
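
The difference between the two queries can be illustrated on a sorted list of (k1, k2) pairs. This is a toy model of one partition, not how Cassandra reads SSTables: a k1 lookup is a binary search that yields one contiguous slice, while a k2 lookup has to examine every row. All names here are made up for the example.

```python
from bisect import bisect_left, bisect_right

# (k1, k2) pairs of the example rows, already in storage order.
stored = [("a", 1), ("a", 2), ("b", 1), ("b", 1)]

# k1 = 'a': binary search finds the start and end of one contiguous run.
k1_values = [row[0] for row in stored]
lo = bisect_left(k1_values, "a")
hi = bisect_right(k1_values, "a")
k1_matches = stored[lo:hi]

# k2 = 1: matches are scattered, so every row must be checked.
k2_positions = [i for i, row in enumerate(stored) if row[1] == 1]

print(k1_matches)    # one adjacent block of rows
print(k2_positions)  # positions with a gap between them
```

The gap in the k2 positions is the reason a constraint on k2 alone cannot use the sort order.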