Difference between partition key, composite key and clustering key in Cassandra?

Latest recommended article published on 2024-07-09 08:21:00

xfxf996

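License: CC 4.0 BY-SA

Tags: database cassandra cql

Original link: https://oldbug.net/q/1ggY8/Difference-between-partition-key-composite-key-and-clustering-key-in-Cassandra

Cassandra primary keys come in two kinds, simple and composite. The partition key determines how data is distributed across nodes, while the clustering key determines how data is sorted within a partition. In a composite primary key, the first part is the partition key and the remaining parts are clustering keys. A query must supply at least all partition key columns and may optionally add clustering keys in order. Understanding these concepts makes it easier to work with a Cassandra database.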

Translated from: Difference between partition key, composite key and clustering key in Cassandra?

I have been reading articles around the net to understand the differences between the following key types, but they are hard for me to grasp. Examples would definitely help.

primary key,
partition key, 
composite key 
clustering key

Reply #1

Reference: https://stackoom.com/question/1ggY8/Cassandra中分区键-复合键和聚类键之间的区别

Reply #2

There is a lot of confusion around this; I will try to make it as simple as possible.

The primary key is a general concept to indicate one or more columns used to retrieve data from a table.

The primary key may be SIMPLE and even declared inline:

 create table stackoverflow_simple (
      key text PRIMARY KEY,
      data text      
  );

That means it is made up of a single column.

But the primary key can also be COMPOSITE (aka COMPOUND), built from several columns.

 create table stackoverflow_composite (
      key_part_one text,
      key_part_two int,
      data text,
      PRIMARY KEY(key_part_one, key_part_two)      
  );

With a COMPOSITE primary key, the "first part" of the key is called the PARTITION KEY (in this example key_part_one is the partition key) and the second part of the key is the CLUSTERING KEY (in this example key_part_two).

Please note that both the partition key and the clustering key can be made up of several columns. Here's how:

 create table stackoverflow_multiple (
      k_part_one text,
      k_part_two int,
      k_clust_one text,
      k_clust_two int,
      k_clust_three uuid,
      data text,
      PRIMARY KEY((k_part_one, k_part_two), k_clust_one, k_clust_two, k_clust_three)      
  );

Behind these names ...

The Partition Key is responsible for data distribution across your nodes.
The Clustering Key is responsible for data sorting within the partition.
The Primary Key is equivalent to the Partition Key in a single-field-key table (i.e. Simple).
The Composite/Compound Key is just any multiple-column key.
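
The first two points can be illustrated with a toy model in Python. This is only a sketch and not how Cassandra is implemented: it assumes a hypothetical fixed list of three node names and uses an MD5 hash modulo the node count as a stand-in for the real partitioner. It shows that rows sharing a partition key end up on the same node, and that rows inside a partition are kept in clustering key order.

```python
import hashlib
from collections import defaultdict

NODES = ["node-a", "node-b", "node-c"]  # hypothetical cluster, for illustration only


def node_for(partition_key):
    """Pick a node from the hash of the partition key (toy stand-in for a partitioner)."""
    digest = hashlib.md5(repr(partition_key).encode("utf-8")).digest()
    return NODES[int.from_bytes(digest[:8], "big") % len(NODES)]


def place_rows(rows):
    """Group (partition_key, clustering_key, data) rows by node, then by partition."""
    cluster = defaultdict(lambda: defaultdict(list))
    for partition_key, clustering_key, data in rows:
        cluster[node_for(partition_key)][partition_key].append((clustering_key, data))
    for partitions in cluster.values():
        for partition_rows in partitions.values():
            partition_rows.sort()  # clustering key decides the order inside a partition
    return cluster


rows = [
    ("ronaldo", 10, "ex-football player"),
    ("han", 1, "solo"),
    ("ronaldo", 9, "football player"),
]
cluster = place_rows(rows)
for node, partitions in sorted(cluster.items()):
    print(node, dict(partitions))
```

Whichever node the hash picks, both 'ronaldo' rows are stored together and come back ordered by their clustering key (9 before 10).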

Further usage information: DATASTAX DOCUMENTATION

Small usage and content examples
SIMPLE KEY:

 insert into stackoverflow_simple (key, data) VALUES ('han', 'solo');
 select * from stackoverflow_simple where key='han';

table content

 key | data
-----+------
 han | solo

A COMPOSITE/COMPOUND KEY can retrieve "wide rows" (i.e. you can query by just the partition key, even if you have clustering keys defined):

 insert into stackoverflow_composite (key_part_one, key_part_two, data) VALUES ('ronaldo', 9, 'football player');
 insert into stackoverflow_composite (key_part_one, key_part_two, data) VALUES ('ronaldo', 10, 'ex-football player');
 select * from stackoverflow_composite where key_part_one = 'ronaldo';

table content

 key_part_one | key_part_two | data
--------------+--------------+--------------------
      ronaldo |            9 |    football player
      ronaldo |           10 | ex-football player

But you can also query with all keys (both partition and clustering) ...

 select * from stackoverflow_composite where key_part_one = 'ronaldo' and key_part_two = 10;

query output

 key_part_one | key_part_two | data
--------------+--------------+--------------------
      ronaldo |           10 | ex-football player
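
To make the two query shapes concrete, here is a small in-memory sketch in Python. ToyTable is a hypothetical helper written for this illustration, not a driver API. It models a table with one partition key column and one clustering key column, like stackoverflow_composite.

```python
class ToyTable:
    """In-memory model: partition key -> {clustering key -> data}. Illustration only."""

    def __init__(self):
        self.partitions = {}

    def insert(self, partition_key, clustering_key, data):
        # In this toy model, writing the same full primary key overwrites the old value.
        self.partitions.setdefault(partition_key, {})[clustering_key] = data

    def select(self, partition_key, clustering_key=None):
        """Return rows for a partition, optionally narrowed to one clustering key."""
        partition = self.partitions.get(partition_key, {})
        if clustering_key is None:
            # Partition-key-only query: every row of the partition, in clustering order.
            return [(partition_key, ck, partition[ck]) for ck in sorted(partition)]
        if clustering_key in partition:
            return [(partition_key, clustering_key, partition[clustering_key])]
        return []


table = ToyTable()
table.insert("ronaldo", 9, "football player")
table.insert("ronaldo", 10, "ex-football player")

print(table.select("ronaldo"))      # both rows of the partition
print(table.select("ronaldo", 10))  # only the row with key_part_two = 10
```

The partition key alone selects the whole "wide row"; adding the clustering key narrows the result to a single row.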

Important note: the partition key is the minimum specifier needed to perform a query using a where clause. If you have a composite partition key, like the following

e.g. PRIMARY KEY((col1, col2), col10, col4)

you can perform a query only by passing at least both col1 and col2; these are the two columns that define the partition key. The general rule for querying is that you must pass at least all partition key columns, and then you can optionally add each clustering key in the order in which they are declared.

So the valid queries are (excluding secondary indexes):

col1 and col2
col1 and col2 and col10
col1 and col2 and col10 and col4

Invalid:

col1 and col2 and col4
anything that does not contain both col1 and col2
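
The rule can be written down as a small check. This is a sketch of the rule as stated here, in Python, and not Cassandra's actual query planner: is_valid_where is a hypothetical helper, it only looks at which columns appear in equality restrictions, and it ignores secondary indexes, range restrictions and ALLOW FILTERING.

```python
def is_valid_where(partition_cols, clustering_cols, restricted_cols):
    """Check the rule: all partition columns, then a prefix of the clustering columns."""
    restricted = set(restricted_cols)
    known = set(partition_cols) | set(clustering_cols)
    if not restricted <= known:
        return False  # column is not part of the primary key
    if not set(partition_cols) <= restricted:
        return False  # every partition key column is mandatory
    used_clustering = [c for c in clustering_cols if c in restricted]
    # The clustering columns used must be the first N in declaration order.
    return used_clustering == list(clustering_cols[:len(used_clustering)])


PARTITION = ["col1", "col2"]
CLUSTERING = ["col10", "col4"]

queries = [
    ["col1", "col2"],
    ["col1", "col2", "col10"],
    ["col1", "col2", "col10", "col4"],
    ["col1", "col2", "col4"],
    ["col1"],
]
for cols in queries:
    print(cols, is_valid_where(PARTITION, CLUSTERING, cols))
```

The first three column sets pass; the last two fail because col4 is used without col10 and because col2 is missing.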

Hope this helps.

Reply #3

In Cassandra, the difference between primary key, partition key, composite key and clustering key always causes some confusion, so I am going to explain them below and relate them to each other. We use CQL (Cassandra Query Language) for Cassandra database access.

Note: this answer is as per the updated version of Cassandra.

Primary Key:

In Cassandra there are two different ways to declare a primary key.

CREATE TABLE Cass (
    id int PRIMARY KEY,
    name text 
);

Create Table Cass (
   id int,
   name text,
   PRIMARY KEY(id) 
);

In CQL, the order in which columns are defined for the PRIMARY KEY matters. The first column of the key is called the partition key. It has the property that all rows sharing the same partition key (even across tables, in fact) are stored on the same physical node. Also, insertions, updates and deletions on rows sharing the same partition key for a given table are performed atomically and in isolation. Note that it is possible to have a composite partition key, i.e. a partition key formed of multiple columns, by using an extra set of parentheses to define which columns form the partition key.

Partitioning and Clustering

The PRIMARY KEY definition is made up of two parts: the Partition Key and the Clustering Columns. The first part maps to the storage engine row key, while the second is used to group columns in a row.

CREATE TABLE device_check (
  device_id   int,
  checked_at  timestamp,
  is_power    boolean,
  is_locked   boolean,
  PRIMARY KEY (device_id, checked_at)
);

Here device_id is the partition key and checked_at is the clustering key.

We can have multiple clustering keys as well as multiple partition key columns, depending on the declaration.
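
One practical consequence of clustering by checked_at is that a time range inside one device's partition is a contiguous slice of the sorted rows. The Python sketch below models a single partition of device_check as a sorted list; the sample values and the helper name checks_between are made up for illustration, and the real storage engine is of course more involved.

```python
import bisect
from datetime import datetime

# One partition of device_check (device_id = 1), kept sorted by checked_at.
# Each entry is (checked_at, is_power, is_locked); the values are made up.
partition = [
    (datetime(2024, 1, 1, 8, 0), True, False),
    (datetime(2024, 1, 1, 9, 0), True, True),
    (datetime(2024, 1, 1, 10, 0), False, True),
    (datetime(2024, 1, 1, 11, 0), True, True),
]


def checks_between(rows, start, end):
    """Return rows with start <= checked_at < end using binary search on the sorted keys."""
    keys = [row[0] for row in rows]
    lo = bisect.bisect_left(keys, start)
    hi = bisect.bisect_left(keys, end)
    return rows[lo:hi]


morning = checks_between(partition, datetime(2024, 1, 1, 9, 0), datetime(2024, 1, 1, 11, 0))
for checked_at, is_power, is_locked in morning:
    print(checked_at.isoformat(), is_power, is_locked)
```

Because the rows are already ordered by the clustering key, the range lookup never has to scan the whole partition.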

Reply #4

Adding a summary answer, as the accepted one is quite long. The terms "row" and "column" are used in the context of CQL, not in terms of how Cassandra is actually implemented.

A primary key uniquely identifies a row.
A composite key is a key formed from multiple columns.
A partition key is the primary lookup to find a set of rows, i.e. a partition.
A clustering key is the part of the primary key that isn't the partition key (and defines the ordering within a partition).

Examples:

PRIMARY KEY (a): The partition key is a.
PRIMARY KEY (a, b): The partition key is a, the clustering key is b.
PRIMARY KEY ((a, b)): The composite partition key is (a, b).
PRIMARY KEY (a, b, c): The partition key is a, the composite clustering key is (b, c).
PRIMARY KEY ((a, b), c): The composite partition key is (a, b), the clustering key is c.
PRIMARY KEY ((a, b), c, d): The composite partition key is (a, b), the composite clustering key is (c, d).
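
The reading rule behind these examples is mechanical, so it can be sketched as a tiny parser in Python. This is a toy written for this list only, not the CQL grammar: parse_primary_key and split_top_level are hypothetical helpers, and they assume a well-formed clause that starts with the literal text PRIMARY KEY.

```python
def split_top_level(text):
    """Split on commas that are not nested inside parentheses."""
    parts, depth, current = [], 0, ""
    for ch in text:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
        if ch == "," and depth == 0:
            parts.append(current.strip())
            current = ""
        else:
            current += ch
    if current.strip():
        parts.append(current.strip())
    return parts


def parse_primary_key(clause):
    """Return (partition_columns, clustering_columns) for a 'PRIMARY KEY (...)' clause."""
    inner = clause.strip()[len("PRIMARY KEY"):].strip()
    inner = inner[1:-1]  # drop the outer parentheses
    parts = split_top_level(inner)
    first = parts[0]
    if first.startswith("("):
        # Extra parentheses mark a composite partition key.
        partition = [c.strip() for c in first[1:-1].split(",")]
    else:
        partition = [first]
    return partition, parts[1:]


for clause in ("PRIMARY KEY (a)", "PRIMARY KEY (a, b, c)", "PRIMARY KEY ((a, b), c, d)"):
    print(clause, "->", parse_primary_key(clause))
```

The first element inside the outer parentheses is always the partition key, whether it is a single column or a parenthesised group; everything after it is a clustering column.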

Reply #5

In database design, a compound key is a superkey that is not minimal.

A composite key is a set that contains a compound key and at least one attribute that is not a superkey.

Given table: EMPLOYEES {employee_id, firstname, surname}

Possible superkeys are:

{employee_id}
{employee_id, firstname}
{employee_id, surname}
{employee_id, firstname, surname}

{employee_id} is the only minimal superkey, which also makes it the only candidate key, given that {firstname} and {surname} do not guarantee uniqueness. Since a primary key is defined as a chosen candidate key, and only one candidate key exists in this example, {employee_id} is the minimal superkey, the only candidate key, and the only possible primary key.

The exhaustive list of compound keys is:

{employee_id, firstname}
{employee_id, surname}
{employee_id, firstname, surname}

The only composite key is {employee_id, firstname, surname}, since that key contains a compound key ({employee_id, firstname}) and an attribute that is not a superkey ({surname}).
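
The superkey and compound key lists can be reproduced by brute force. The Python sketch below takes the example's premise (only employee_id guarantees uniqueness) as given, enumerates every attribute set that contains a candidate key, and treats the non-minimal ones as compound keys in the sense used in this answer. The names all_superkeys and compound_keys are made up for the illustration.

```python
from itertools import combinations

ATTRIBUTES = ("employee_id", "firstname", "surname")
CANDIDATE_KEYS = [{"employee_id"}]  # given in the example: only employee_id is unique


def all_superkeys(attributes, candidate_keys):
    """Every attribute set that contains at least one candidate key."""
    found = []
    for size in range(1, len(attributes) + 1):
        for combo in combinations(attributes, size):
            if any(key <= set(combo) for key in candidate_keys):
                found.append(set(combo))
    return found


superkeys = all_superkeys(ATTRIBUTES, CANDIDATE_KEYS)
# Non-minimal superkeys: this answer's "compound keys".
compound_keys = [s for s in superkeys if s not in CANDIDATE_KEYS]

for key in superkeys:
    print(sorted(key), "minimal" if key in CANDIDATE_KEYS else "compound")
```

The enumeration finds four superkeys, of which only {employee_id} is minimal; the other three are the compound keys listed above.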

Reply #6

Primary Key: Is composed of partition key(s) [and optional clustering keys (or columns)]
Partition Key: The hash value of the partition key is used to determine the specific node in a cluster that stores the data
Clustering Key: Is used to sort the data in each of the partitions (or on the responsible node and its replicas)

Compound Primary Key: As said above, the clustering keys are optional in a primary key. If they aren't mentioned, it's a simple primary key. If clustering keys are mentioned, it's a compound primary key.

Composite Partition Key: Using just one column as a partition key might result in wide row issues (depending on the use case and data modeling). Hence the partition key is sometimes specified as a combination of more than one column.
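
A quick count shows why a composite partition key helps with wide rows. The workload in this Python sketch is entirely made up (one hypothetical sensor writing 1,000 readings a day for 30 days); it only compares how many rows fall into each partition when the partition key is the sensor id alone versus the pair (sensor id, day).

```python
from collections import Counter
from datetime import date, timedelta

# Made-up workload: one sensor reporting 1,000 readings per day for 30 days.
readings = [
    ("sensor-1", date(2024, 1, 1) + timedelta(days=day), reading)
    for day in range(30)
    for reading in range(1000)
]

# Partition key = (sensor_id): everything lands in one ever-growing partition.
single = Counter(sensor_id for sensor_id, _, _ in readings)

# Partition key = (sensor_id, day): each day gets its own bounded partition.
composite = Counter((sensor_id, day) for sensor_id, day, _ in readings)

print("single-column key:", len(single), "partition(s), largest =", max(single.values()))
print("composite key:", len(composite), "partition(s), largest =", max(composite.values()))
```

With the single-column key all 30,000 rows share one partition, while the composite key spreads them over 30 partitions of 1,000 rows each. The trade-off is that every query must then supply both columns of the partition key.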

Regarding the confusion about which key is mandatory and which can be skipped in a query, it helps to imagine Cassandra as a giant HashMap. In a HashMap, you can't retrieve the values without the key.
Here, the partition keys play the role of that key, so each query needs to have them specified. Without them, Cassandra won't know which node to search.
The clustering keys (columns, which are optional) help to further narrow your query after Cassandra has found the specific node (and its replicas) responsible for that specific partition key.