Cockroach Design Translation (14): Splitting / Merging Ranges


1 Splitting / Merging Ranges

Nodes split or merge ranges based on whether they exceed maximum or minimum thresholds for capacity or load. Ranges exceeding maximums for either capacity or load are split; ranges below minimums for both capacity and load are merged.
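The asymmetry in this rule (either limit triggers a split, both limits are needed for a merge) can be sketched in a few lines of Python. This is only an illustration: `Thresholds`, `Action`, and `decide` are hypothetical names, load is represented by IOps as one possible measure, and the source gives no concrete threshold values.

```python
from dataclasses import dataclass
from enum import Enum


class Action(Enum):
    SPLIT = "split"
    MERGE = "merge"
    NONE = "none"


@dataclass(frozen=True)
class Thresholds:
    # Hypothetical limits; the design text does not state actual numbers.
    max_bytes: int
    min_bytes: int
    max_iops: float
    min_iops: float


def decide(size_bytes: int, iops: float, t: Thresholds) -> Action:
    """Apply the split/merge rule to one range's capacity and load."""
    # Split if EITHER capacity or load exceeds its maximum.
    if size_bytes > t.max_bytes or iops > t.max_iops:
        return Action.SPLIT
    # Merge only if BOTH capacity and load are below their minimums.
    if size_bytes < t.min_bytes and iops < t.min_iops:
        return Action.MERGE
    return Action.NONE
```

A small range that is still busy (below `min_bytes` but above `min_iops`) is therefore left alone rather than merged.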

Ranges maintain the same accounting statistics as accounting key prefixes. These boil down to a time series of data points with minute granularity, covering everything from number of bytes to read/write queue sizes. Arbitrary distillations of the accounting stats can be determined as the basis for splitting / merging. Two sensible metrics for use with split/merge are range size in bytes and IOps (I/O operations per second). A good metric for rebalancing a replica from one node to another would be total read/write queue wait times. These metrics are gossipped, with each range / node passing along relevant metrics if they're in the bottom or top of the range it's aware of.
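One possible reading of the gossip rule is that a metric is passed along only when it would rank among the lowest or highest values the node currently knows about. The sketch below illustrates that reading; the function name `worth_gossiping` and the cutoff `k` are assumptions made for illustration and do not come from the design text.

```python
from typing import Sequence


def worth_gossiping(value: float, known: Sequence[float], k: int = 3) -> bool:
    """Return True if `value` ranks among the k lowest or k highest
    of the values this node is currently aware of."""
    if len(known) < 2 * k:
        # Too few samples to rule anything out, so pass the value along.
        return True
    ordered = sorted(known)
    # ordered[k - 1] is the k-th smallest, ordered[-k] the k-th largest.
    return value <= ordered[k - 1] or value >= ordered[-k]
```

Values in the middle of the known distribution are dropped, which keeps gossip traffic focused on the extremes that drive split, merge, and rebalance decisions.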

A range finding itself exceeding either the capacity or the load threshold splits. To this end, the range lease holder computes an appropriate split key candidate and issues the split through Raft. In contrast to splitting, merging requires a range to be below the minimum threshold for both capacity and load. A range being merged chooses the smaller of the two ranges immediately preceding and succeeding it as its merge partner.
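The choice of merge partner can be sketched as follows. `RangeInfo` and `choose_merge_partner` are hypothetical names, size in bytes is assumed to be what "smaller" refers to, and the handling of a missing neighbor (the first or last range in the keyspace) is an assumption the source does not spell out.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class RangeInfo:
    start_key: str
    size_bytes: int


def choose_merge_partner(
    prev: Optional[RangeInfo], nxt: Optional[RangeInfo]
) -> Optional[RangeInfo]:
    """Pick the smaller of the immediately preceding and succeeding ranges."""
    # A range at either end of the keyspace has only one neighbor.
    candidates = [r for r in (prev, nxt) if r is not None]
    if not candidates:
        return None
    # On a tie, min() keeps the first candidate, i.e. the preceding range.
    return min(candidates, key=lambda r: r.size_bytes)
```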

Splitting, merging, rebalancing and recovering all follow the same basic algorithm for moving data between roach nodes. New target replicas are created and added to the replica set of the source range. Then each new replica is brought up to date by either replaying the log in full or copying a snapshot of the source replica data and then replaying the log from the timestamp of the snapshot to catch up fully. Once the new replicas are fully up to date, the range metadata is updated and the old source replica(s) are deleted if applicable.

Coordinator (leaseholder replica)

    if splitting
        SplitRange(split_key): splits happen locally on range replicas and
        only after being completed locally, are moved to new target replicas.
    else if merging
        Choose new replicas on same servers as target range replicas;
        add to replica set.
    else if rebalancing || recovering
        Choose new replica(s) on least loaded servers; add to replica set.

New Replica

Bring replica up to date:

    if all info can be read from replicated log
        copy replicated log
    else
        snapshot source replica
        send successive ReadRange requests to source replica
        referencing snapshot

    if merging
        combine ranges on all replicas
    else if rebalancing || recovering
        remove old range replica(s)
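The "bring replica up to date" step can be illustrated with a small in-memory model. This is a sketch under stated assumptions, not the actual implementation: `SourceReplica`, `catch_up`, and the `(index, key, value)` log entry shape are hypothetical, and a log index stands in for the snapshot timestamp mentioned in the text.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

LogEntry = Tuple[int, str, str]       # (index, key, value); hypothetical shape
Snapshot = Tuple[int, Dict[str, str]]  # (index at snapshot time, data copy)


@dataclass
class SourceReplica:
    data: Dict[str, str] = field(default_factory=dict)
    log: List[LogEntry] = field(default_factory=list)
    first_index: int = 1   # oldest index still present in the log
    applied_index: int = 0

    def apply(self, key: str, value: str) -> None:
        self.applied_index += 1
        self.data[key] = value
        self.log.append((self.applied_index, key, value))

    def truncate_log(self, upto: int) -> None:
        # Drop entries with index <= upto, as a log compaction would.
        self.log = [e for e in self.log if e[0] > upto]
        self.first_index = upto + 1

    def snapshot(self) -> Snapshot:
        return self.applied_index, dict(self.data)


def catch_up(source: SourceReplica, snap: Optional[Snapshot] = None) -> Dict[str, str]:
    """Build a new replica's data from the full log, or from a snapshot
    plus the log entries written after that snapshot."""
    start, data = (0, {}) if snap is None else (snap[0], dict(snap[1]))
    if start + 1 < source.first_index:
        # The log no longer reaches back far enough to cover the gap.
        raise ValueError("log truncated; a (newer) snapshot is required")
    for index, key, value in source.log:
        if index > start:
            data[key] = value
    return data
```

When the log has been truncated, calling `catch_up(source)` without a snapshot fails, which corresponds to the `else` branch of the pseudocode: take a snapshot first, then replay only the tail of the log.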

Nodes split ranges when the total data in a range exceeds a configurable maximum threshold. Similarly, ranges are merged when the total data falls below a configurable minimum threshold.

TBD: flesh this out: Especially for merges (but also rebalancing) we have a range disappearing from the local node; that range needs to disappear gracefully, with a smooth handoff of operation to the new owner of its data.

Ranges are rebalanced if a node determines its load or capacity is one of the worst in the cluster based on gossipped load stats. A node with spare capacity is chosen in the same datacenter and a special-case split is done which simply duplicates the data 1:1 and resets the range configuration metadata.
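Target selection for a rebalance might look like the sketch below. The text only says that a node with spare capacity in the same datacenter is chosen; picking the node with the most spare capacity is an assumption made here, and `NodeStats` and `choose_rebalance_target` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class NodeStats:
    node_id: int
    datacenter: str
    capacity_bytes: int
    used_bytes: int

    @property
    def spare_bytes(self) -> int:
        return self.capacity_bytes - self.used_bytes


def choose_rebalance_target(
    source: NodeStats, gossiped: Iterable[NodeStats]
) -> Optional[NodeStats]:
    """Pick a node in the source's datacenter that has spare capacity."""
    candidates = [
        n for n in gossiped
        if n.datacenter == source.datacenter
        and n.node_id != source.node_id
        and n.spare_bytes > 0
    ]
    # Assumption: prefer the node with the most spare capacity.
    return max(candidates, key=lambda n: n.spare_bytes, default=None)
```

If no node in the same datacenter has spare capacity, the function returns `None` and the caller would leave the range where it is.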