CHAPTER 5: DESIGN CONSISTENT HASHING

The rehashing problem

serverIndex = hash(key) % N
在这里插入图片描述在这里插入图片描述
However, problems arise when new servers are added, or existing servers are
removed.
在这里插入图片描述
在这里插入图片描述
This means that when server 1 goes offline, most cache clients will
connect to the wrong servers to fetch data. This causes a storm of cache misses.

Consistent hashing

Quoted from Wikipedia: "Consistent hashing is a special kind of hashing such that when a hash table is re-sized and consistent hashing is used, only k/n keys need to be remapped on average, where k is the number of keys, and n is the number of slots. In contrast, in most traditional hash tables, a change in the number of array slots causes nearly all keys to be remapped [1]”.

Hash space and hash ring

在这里插入图片描述

Hash servers

在这里插入图片描述

Hash keys

在这里插入图片描述
在这里插入图片描述

Add a server

在这里插入图片描述
Remove a server
在这里插入图片描述

Two issues in the basic approach

• Map servers and keys on to the ring using a uniformly distributed hash function.
• To find out which server a key is mapped to, go clockwise from the key position until the first server on the ring is found.

First, it is impossible to keep the same size
of partitions on the ring for all servers considering a server can be added or removed
在这里插入图片描述
Second, it is possible to have a non-uniform key distribution on the ring. For instance, if servers are mapped to positions listed in Figure 5-11, most of the keys are stored on server 2. However, server 1 and server 3 have no data.
在这里插入图片描述

Virtual nodes

在这里插入图片描述
As the number of virtual nodes increases, the distribution of keys becomes more balanced.

However, more spaces are needed to store data about virtual nodes.
This is a tradeoff, and we can tune the number of virtual nodes to fit our system requirements.

Find affected keys

在这里插入图片描述
located between s3 and s4 need to be redistributed to s4.

在这里插入图片描述

keys located between s0 and s1 must be redistributed to s2.

The benefits of consistent hashing include:
• Minimized keys are redistributed when servers are added or removed.
• It is easy to scale horizontally because data are more evenly distributed.
• Mitigate hotspot key problem. Excessive access to a specific shard could cause server
overload. Imagine data for Katy Perry, Justin Bieber, and Lady Gaga all end up on the
same shard. Consistent hashing helps to mitigate the problem by distributing the data more
evenly.

评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值