A Study of memcache's Consistent Hashing Algorithm

Latest recommended article published on 2018-11-08 18:55:55

Original

Licensed under CC 4.0 BY-SA

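Background

1. When memcache is used as a distributed store, the client has to choose which storage node each key goes to.
2. Plain modulo hashing: when a machine in the cluster goes online or offline, the hit rate drops sharply, the cache has to be rebuilt, and the database is suddenly hit with a very high load.
3. Consistent hashing: when a machine in the cluster goes online or offline, the hit rate stays fairly high.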
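To make point 2 concrete, here is a minimal sketch of how many keys change server under modulo hashing when a cluster grows from 6 to 7 machines. The class and method names are hypothetical, and plain integers stand in for evenly spread key hashes.

```java
// Hypothetical demo class; integers stand in for evenly spread key hashes.
class ModuloRemapDemo {
    /** Counts hashes in [0, n) whose server index changes when the cluster is resized. */
    static int countMoved(int n, int oldServers, int newServers) {
        int moved = 0;
        for (int hash = 0; hash < n; hash++) {
            if (hash % oldServers != hash % newServers) {
                moved++;
            }
        }
        return moved;
    }

    public static void main(String[] args) {
        int n = 42_000;
        int moved = countMoved(n, 6, 7);
        System.out.println(moved + " of " + n + " keys change server");
    }
}
```

With these inputs 36,000 of the 42,000 keys (six out of every seven) map to a different server, which is why the cache has to be rebuilt almost from scratch.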

Key design goals

1. Little cache migration (keeps the hit rate high)
2. Even key distribution (load balancing)

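How the algorithm works

Basic idea

Not only the keys are hashed; the machine nodes themselves are hashed as well.

1. Take a ring covering 0 to 2^32-1.
2. Hash each machine onto the ring; the resulting points divide the ring into intervals.
3. On a read or write, hash the key and pick the machine that owns the interval the key falls into.

For diagrams, see: https://blog.youkuaiyun.com/lihao21/article/details/54193868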
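The three steps above can be sketched with a sorted map. This is only an illustration: the class `MiniRing` is hypothetical, each server gets a single point, and CRC32 is used as a stand-in hash because it conveniently yields values in 0 to 2^32-1. It is not the hash that the client library discussed later uses.

```java
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.TreeMap;
import java.util.zip.CRC32;

// Hypothetical minimal ring: one point per server, CRC32 as a stand-in hash.
class MiniRing {
    private final TreeMap<Long, String> ring = new TreeMap<>();

    /** Maps a string to a position in [0, 2^32 - 1]. */
    static long hash(String s) {
        CRC32 crc = new CRC32();
        crc.update(s.getBytes(StandardCharsets.UTF_8));
        return crc.getValue();
    }

    /** Step 2: hash the machine onto the ring. */
    void addServer(String server) {
        ring.put(hash(server), server);
    }

    /** Step 3: first server at or after the key's position, wrapping to the start of the ring. */
    String serverFor(String key) {
        Map.Entry<Long, String> e = ring.ceilingEntry(hash(key));
        return (e != null ? e : ring.firstEntry()).getValue();
    }

    public static void main(String[] args) {
        MiniRing r = new MiniRing();
        r.addServer("10.10.1.118:11411");
        r.addServer("10.10.1.208:11411");
        r.addServer("10.10.1.228:11411");
        System.out.println("user:42 -> " + r.serverFor("user:42"));
    }
}
```

The useful property is that adding a server only takes keys away from its neighbour on the ring: any key whose owner changes ends up on the new server, and every other key stays where it was.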

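Virtual nodes

Consistent hashing addresses the hit-rate problem; virtual nodes address the problem of distributing keys evenly (load balancing).

1. With an ordinary hash, the points that the servers map to are spread very unevenly around the ring.
2. With virtual nodes, each physical node (server) is assigned 100 to 200 points on the ring.
3. This evens out the distribution while still keeping cache redistribution to a minimum when servers are added or removed.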
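A small experiment can show the effect of virtual nodes. The sketch below is an assumption-laden illustration rather than the library's code: the class `VirtualNodeDemo`, the synthetic keys "key0", "key1", ... and the "server-i" naming of points are all made up for the demo, and the hash is the first four MD5 bytes read as a little-endian number.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical demo: compares key distribution for different numbers of points per server.
class VirtualNodeDemo {
    /** First four MD5 bytes read as an unsigned 32-bit little-endian number. */
    static long hash(String s) {
        try {
            byte[] d = MessageDigest.getInstance("MD5").digest(s.getBytes(StandardCharsets.UTF_8));
            return ((long) (d[3] & 0xFF) << 24) | ((long) (d[2] & 0xFF) << 16)
                    | ((long) (d[1] & 0xFF) << 8) | (d[0] & 0xFF);
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 not available", e);
        }
    }

    /** Places pointsPerServer points on the ring for every server, keyed as "server-i". */
    static TreeMap<Long, String> buildRing(List<String> servers, int pointsPerServer) {
        TreeMap<Long, String> ring = new TreeMap<>();
        for (String server : servers) {
            for (int i = 0; i < pointsPerServer; i++) {
                ring.put(hash(server + "-" + i), server);
            }
        }
        return ring;
    }

    /** Counts how many of the synthetic keys "key0".."key(n-1)" land on each server. */
    static Map<String, Integer> distribute(TreeMap<Long, String> ring, int keyCount) {
        Map<String, Integer> counts = new HashMap<>();
        for (int i = 0; i < keyCount; i++) {
            Map.Entry<Long, String> e = ring.ceilingEntry(hash("key" + i));
            String server = (e != null ? e : ring.firstEntry()).getValue();
            counts.merge(server, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String> servers = Arrays.asList("10.10.1.118:11411", "10.10.1.208:11411", "10.10.1.228:11411");
        for (int points : new int[] {1, 160}) {
            System.out.println(points + " point(s) per server: "
                    + distribute(buildRing(servers, points), 100_000));
        }
    }
}
```

Running it with 1 and then 160 points per server prints the per-server key counts for both settings, so the two distributions can be compared directly.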

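Reading the source (with consistent hashing enabled, i.e. when the Locator enum is set to CONSISTENT)

How servers partition the hash range

In the KetamaNodeLocator class:

// Partitions the hash range among the machines and stores the result in this class's ketamaNodes field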
protected void setKetamaNodes(List<MemcachedNode> nodes) {
    TreeMap<Long, MemcachedNode> newNodeMap =
            new TreeMap<Long, MemcachedNode>();
    int numReps = config.getNodeRepetitions();
    int nodeCount = nodes.size();
    int totalWeight = 0;

    if (isWeightedKetama) {
        for (MemcachedNode node : nodes) {
            totalWeight += weights.get(node.getSocketAddress());
        }
    }

    for (MemcachedNode node : nodes) {
      if (isWeightedKetama) {
          // Node placement for weighted consistent hashing @3@
          int thisWeight = weights.get(node.getSocketAddress());
          float percent = (float)thisWeight / (float)totalWeight;
          int pointerPerServer = (int)((Math.floor((float)(percent * (float)config.getNodeRepetitions() / 4 * (float)nodeCount + 0.0000000001))) * 4);
          for (int i = 0; i < pointerPerServer / 4; i++) {
              for(long position : ketamaNodePositionsAtIteration(node, i)) {
                  newNodeMap.put(position, node);
                  getLogger().debug("Adding node %s with weight %s in position %d", node, thisWeight, position);
              }
          }
      } else {
          // Ketama does some special work with md5 where it reuses chunks.
          // Check to be backwards compatible, the hash algorithm does not
          // matter for Ketama, just the placement should always be done using
          // MD5
          // Node placement for consistent hashing with KETAMA_HASH @2@
          if (hashAlg == DefaultHashAlgorithm.KETAMA_HASH) {
              for (int i = 0; i < numReps / 4; i++) {
                  for(long position : ketamaNodePositionsAtIteration(node, i)) {
                    newNodeMap.put(position, node);
                    getLogger().debug("Adding node %s in position %d", node, position);
                  }
              }
          } else {
            // Node placement when a hash algorithm other than KETAMA_HASH is configured @1@
              for (int i = 0; i < numReps; i++) {
                  newNodeMap.put(hashAlg.hash(config.getKeyForNode(node, i)), node);
              }
          }
      }
    }
    assert newNodeMap.size() == numReps * nodes.size();
    ketamaNodes = newNodeMap;
  }

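@1@: node placement with a hash algorithm other than KETAMA_HASH (the ring is still used; only the hash function differs)
1. Repeat numReps times, each time computing the hash of one virtual node.

2. The key that is hashed for each virtual node comes from config.getKeyForNode(node, i)
In the KetamaNodeKeyFormatter class: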

public String getKeyForNode(MemcachedNode node, int repetition) {
        // Carrried over from the DefaultKetamaNodeLocatorConfiguration:
        // Internal Using the internal map retrieve the socket addresses
        // for given nodes.
        // I'm aware that this code is inherently thread-unsafe as
        // I'm using a HashMap implementation of the map, but the worst
        // case ( I believe) is we're slightly in-efficient when
        // a node has never been seen before concurrently on two different
        // threads, so it the socketaddress will be requested multiple times!
        // all other cases should be as fast as possible.
        String nodeKey = nodeKeys.get(node);
        if (nodeKey == null) {
            switch(this.format) {
                case LIBMEMCACHED:
                    InetSocketAddress address = (InetSocketAddress)node.getSocketAddress();
                    nodeKey = address.getHostName();
                    if (address.getPort() != 11211) {
                        nodeKey += ":" + address.getPort();
                    }
                    break;
                case SPYMEMCACHED:
                    nodeKey = String.valueOf(node.getSocketAddress());
                    if (nodeKey.startsWith("/")) {
                        nodeKey = nodeKey.substring(1);
                    }
                    break;
                default:
                    assert false;
            }
            nodeKeys.put(node, nodeKey);
        }
        return nodeKey + "-" + repetition;
    }

The result has the form "SocketAddress-repetition index", for example 10.10.1.118:11411-0.

3. The virtual node's hash is then computed from that key, in the DefaultHashAlgorithm class:

/**
* Compute the hash for the given key.
*
* @return a positive integer hash
*/
public long hash(final String k) {
    long rv = 0;
    int len = k.length();
    switch (this) {
    case NATIVE_HASH:
        rv = k.hashCode();
        break;
    case CRC_HASH:
        // return (crc32(shift) >> 16) & 0x7fff;
        CRC32 crc32 = new CRC32();
        crc32.update(KeyUtil.getKeyBytes(k));
        rv = (crc32.getValue() >> 16) & 0x7fff;
        break;
    case FNV1_64_HASH:
        // Thanks to pierre@demartines.com for the pointer
        rv = FNV_64_INIT;
        for (int i = 0; i < len; i++) {
            rv *= FNV_64_PRIME;
            rv ^= k.charAt(i);
        }
        break;
    case FNV1A_64_HASH:
        rv = FNV_64_INIT;
        for (int i = 0; i < len; i++) {
            rv ^= k.charAt(i);
            rv *= FNV_64_PRIME;
        }
        break;
    case FNV1_32_HASH:
        rv = FNV_32_INIT;
        for (int i = 0; i < len; i++) {
            rv *= FNV_32_PRIME;
            rv ^= k.charAt(i);
        }
        break;
    case FNV1A_32_HASH:
        rv = FNV_32_INIT;
        for (int i = 0; i < len; i++) {
            rv ^= k.charAt(i);
            rv *= FNV_32_PRIME;
        }
        break;
    case KETAMA_HASH:
        byte[] bKey = computeMd5(k);
        rv = ((long) (bKey[3] & 0xFF) << 24)
          | ((long) (bKey[2] & 0xFF) << 16)
          | ((long) (bKey[1] & 0xFF) << 8)
          | (bKey[0] & 0xFF);
        break;
    default:
        assert false;
    }
    return rv & 0xffffffffL; /* Truncate to 32-bits */
}
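
The KETAMA_HASH branch can be tried in isolation. The article does not show computeMd5, so this sketch assumes it returns the MD5 digest of the key's UTF-8 bytes and uses the JDK's MessageDigest instead; the class name `KetamaHashDemo` is hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical stand-alone version of the KETAMA_HASH branch shown above.
class KetamaHashDemo {
    static long ketamaHash(String key) {
        byte[] bKey;
        try {
            // Assumption: stands in for computeMd5(k), which is not shown in the article.
            bKey = MessageDigest.getInstance("MD5").digest(key.getBytes(StandardCharsets.UTF_8));
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 not available", e);
        }
        // The first four digest bytes, with bKey[3] as the most significant byte.
        long rv = ((long) (bKey[3] & 0xFF) << 24)
                | ((long) (bKey[2] & 0xFF) << 16)
                | ((long) (bKey[1] & 0xFF) << 8)
                | (bKey[0] & 0xFF);
        return rv & 0xffffffffL;
    }

    public static void main(String[] args) {
        // MD5("") is d41d8cd98f00b204e9800998ecf8427e, so the first four bytes
        // d4 1d 8c d9 read in reverse order give d98c1dd4.
        System.out.println(Long.toHexString(ketamaHash("")));
    }
}
```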

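@2@: node placement for consistent hashing with KETAMA_HASH
1. Each node needs numReps virtual nodes. Each round produces four of them, so numReps/4 rounds are computed.

2. One round of four virtual nodes is computed by ketamaNodePositionsAtIteration(node, i)
In the KetamaNodeLocator class: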

private List<Long> ketamaNodePositionsAtIteration(MemcachedNode node, int iteration) {
    List<Long> positions = new ArrayList<Long>();
    byte[] digest = DefaultHashAlgorithm.computeMd5(config.getKeyForNode(node, iteration));
    for (int h = 0; h < 4; h++) {
      Long k = ((long) (digest[3 + h * 4] & 0xFF) << 24)
          | ((long) (digest[2 + h * 4] & 0xFF) << 16)
          | ((long) (digest[1 + h * 4] & 0xFF) << 8)
          | (digest[h * 4] & 0xFF);
      positions.add(k);
    }
    return positions;
}

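config.getKeyForNode(node, iteration) again returns "SocketAddress-repetition index" (for example 10.10.1.118:11411-0) as the key.

The computeMd5 method of the DefaultHashAlgorithm class turns that key into its MD5 digest, the byte array digest.

The digest array is 16 bytes long. It is read four bytes at a time, and each group is assembled in reverse (little-endian) order into a 32-bit number: for instance, (long) (digest[3 + h * 4] & 0xFF) << 24 shifts the last of the four adjacent bytes left by 24 bits so that it becomes the most significant byte. Each such number is the final position of a virtual node on the hash ring, so one round yields the positions of four virtual nodes.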
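
Reading four bytes in reverse order is the same as reading a little-endian 32-bit integer, so the split can also be written with a ByteBuffer. This is an equivalent sketch, not the library's code; the class `DigestPositionsDemo` is hypothetical, and MessageDigest stands in for computeMd5 under the assumption that the key is encoded as UTF-8.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical demo: splits one MD5 digest into four ring positions.
class DigestPositionsDemo {
    /** Returns the four unsigned 32-bit little-endian numbers contained in MD5(nodeKey). */
    static long[] positions(String nodeKey) {
        byte[] digest;
        try {
            digest = MessageDigest.getInstance("MD5").digest(nodeKey.getBytes(StandardCharsets.UTF_8));
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 not available", e);
        }
        ByteBuffer buf = ByteBuffer.wrap(digest).order(ByteOrder.LITTLE_ENDIAN);
        long[] out = new long[4];
        for (int h = 0; h < 4; h++) {
            // getInt is signed, so mask to keep the value in [0, 2^32 - 1].
            out[h] = buf.getInt(h * 4) & 0xffffffffL;
        }
        return out;
    }

    public static void main(String[] args) {
        for (long p : positions("10.10.1.118:11411-0")) {
            System.out.println(p);
        }
    }
}
```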

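@3@: node placement for weighted consistent hashing
1. Read each node's weight from weights (for weighted nodes this is set during initialization) and compute each node's share of the total weight.

2. From that share, compute how many virtual nodes each node should get. Because of the floor, the total number of virtual nodes may not equal the repetition count times the number of nodes; it is usually somewhat smaller, and the original weight ratio is also distorted. Weights 2, 3 and 5 with a repetition count of 13 and 3 nodes make a good test case.

3. The virtual nodes of each node are hashed in groups of four with ketamaNodePositionsAtIteration(node, i), exactly as in unweighted ketama.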
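
The test case suggested in step 2 can be worked through with the same arithmetic as the weighted branch of setKetamaNodes. The class `WeightedPointsDemo` and its method name are hypothetical; only the formula is taken from the source above.

```java
// Hypothetical demo: reproduces the pointerPerServer formula for a given weight.
class WeightedPointsDemo {
    /** Same arithmetic as the weighted branch of setKetamaNodes. */
    static int pointsForServer(int weight, int totalWeight, int numReps, int nodeCount) {
        float percent = (float) weight / (float) totalWeight;
        return (int) ((Math.floor((float) (percent * (float) numReps / 4
                * (float) nodeCount + 0.0000000001))) * 4);
    }

    public static void main(String[] args) {
        int[] weights = {2, 3, 5};
        int numReps = 13;
        int total = 0;
        for (int w : weights) {
            total += w;
        }
        int sum = 0;
        for (int w : weights) {
            int points = pointsForServer(w, total, numReps, weights.length);
            sum += points;
            System.out.println("weight " + w + " -> " + points + " points");
        }
        System.out.println("total " + sum + " vs numReps * nodeCount = " + numReps * weights.length);
    }
}
```

For weights 2, 3 and 5 this gives 4, 8 and 16 points, 28 in total instead of 13 * 3 = 39, and the ratio becomes 1:2:4 rather than 2:3:5.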

How the client maps a key

In the KetamaNodeLocator class:

public MemcachedNode getPrimary(final String k) {
    MemcachedNode rv = getNodeForKey(hashAlg.hash(k));
    assert rv != null : "Found no node for key " + k;
    return rv;
}

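1. Compute the hash of the stored object's key with hashAlg.hash(k), the same hash method used for the servers' virtual nodes (with KETAMA_HASH it reads the first four MD5 digest bytes, just as the virtual-node positions do).

2. Use that hash coordinate to find the server node on the ring that should store the key: getNodeForKey(long hash)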

MemcachedNode getNodeForKey(long hash) {
    final MemcachedNode rv;
    if (!ketamaNodes.containsKey(hash)) {
        // Java 1.6 adds a ceilingKey method, but I'm still stuck in 1.5
        // in a lot of places, so I'm doing this myself.
        SortedMap<Long, MemcachedNode> tailMap = getKetamaNodes().tailMap(hash);
        if (tailMap.isEmpty()) {
            hash = getKetamaNodes().firstKey();
        } else {
            hash = tailMap.firstKey();
        }
    }
    rv = getKetamaNodes().get(hash);
    return rv;
}

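First check whether the hash coordinate lands exactly on a server's virtual node; if so, that server node is returned directly.

If not, take the arc of the ring after the hash coordinate. If there is no server node left on that arc, the lookup has run past the tail of the ring and wraps around to the first node on the ring.

If the arc still contains server nodes, the first one is chosen as the server node that should store the key.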
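
The three cases can be checked on a tiny hand-built ring. The sketch below uses TreeMap.ceilingEntry (available since Java 6, as the source comment mentions) instead of containsKey plus tailMap; the class `RingLookupDemo`, the positions 100, 200, 300 and the node names are made up for illustration.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical demo of the lookup rule on a hand-built ring.
class RingLookupDemo {
    /** Node at the first position >= hash, wrapping to the first position on the ring. */
    static String nodeForHash(TreeMap<Long, String> ring, long hash) {
        Map.Entry<Long, String> entry = ring.ceilingEntry(hash);
        if (entry == null) {
            // Past the last virtual node: wrap around to the start of the ring.
            entry = ring.firstEntry();
        }
        return entry.getValue();
    }

    public static void main(String[] args) {
        TreeMap<Long, String> ring = new TreeMap<>();
        ring.put(100L, "A");
        ring.put(200L, "B");
        ring.put(300L, "C");
        System.out.println(nodeForHash(ring, 200L)); // exact hit on a virtual node: B
        System.out.println(nodeForHash(ring, 150L)); // next node on the arc: B
        System.out.println(nodeForHash(ring, 350L)); // wraps around: A
    }
}
```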

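Test results

Comparing the different algorithms

Steps:
1. Pass each hash algorithm enum value as a parameter to create a separate memcache client. During creation, each client builds its own set of server hash positions according to its algorithm (stored in the NodeLocator of that client's MemcachedConnection).

2. Obtain the NodeLocator produced by each algorithm.

3. Use a large volume of data collected from the production environment over a long period as the sample. For each key, compute the server that is finally selected after hashing, and record for each algorithm the time taken over the whole data set and how evenly the hits are spread across the servers.

4. Run the comparison three times.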

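Test results (perdata is costs divided by number; the values shown work out to microseconds per key, although the output labels them ns):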

1st:
FNV1A_64_HASH      costs:1184ms,      number:2387858      perdata:0.49584188004479324ns.
10.10.1.118:11411 : 555973 , 23.28%
10.16.69.133:11411 : 533208 , 22.33%
10.16.69.135:11411 : 355425 , 14.88%
10.16.6.184:11411 : 235848 , 9.88%
10.10.1.228:11411 : 346293 , 14.50%
10.10.1.208:11411 : 361111 , 15.12%
variance:0.33143952290634493

FNV1_32_HASH      costs:1174ms,      number:2387858      perdata:0.49165402632819877ns.
10.10.1.118:11411 : 489055 , 20.48%
10.16.69.133:11411 : 395165 , 16.55%
10.16.69.135:11411 : 259880 , 10.88%
10.16.6.184:11411 : 232323 , 9.73%
10.10.1.228:11411 : 277434 , 11.62%
10.10.1.208:11411 : 734001 , 30.74%
variance:0.6438545073606985

KETAMA_HASH      costs:2352ms,      number:2387858      perdata:0.9849831941430353ns.
10.10.1.118:11411 : 411977 , 17.25%
10.16.69.133:11411 : 449525 , 18.83%
10.16.6.184:11411 : 387824 , 16.24%
10.16.69.135:11411 : 346523 , 14.51%
10.10.1.228:11411 : 382838 , 16.03%
10.10.1.208:11411 : 409171 , 17.14%
variance:0.12852730715084365

NATIVE_HASH      costs:1024ms,      number:2387858      perdata:0.42883622057928067ns.
10.10.1.118:11411 : 293716 , 12.30%
10.16.69.133:11411 : 209344 , 8.77%
10.16.69.135:11411 : 694618 , 29.09%
10.16.6.184:11411 : 362735 , 15.19%
10.10.1.228:11411 : 191147 , 8.00%
10.10.1.208:11411 : 636298 , 26.65%
variance:0.7987988531108471

CRC_HASH      costs:1733ms,      number:2387858      perdata:0.7257550490858334ns.
10.10.1.118:11411 : 347616 , 14.56%
10.16.69.133:11411 : 327404 , 13.71%
10.16.6.184:11411 : 459595 , 19.25%
10.16.69.135:11411 : 481764 , 20.18%
10.10.1.228:11411 : 395621 , 16.57%
10.10.1.208:11411 : 375858 , 15.74%
variance:0.1661475366273886

FNV1_64_HASH      costs:971ms,      number:2387858      perdata:0.4066405958813296ns.
10.10.1.118:11411 : 248007 , 10.39%
10.16.69.133:11411 : 324288 , 13.58%
10.16.6.184:11411 : 1051464 , 44.03%
10.16.69.135:11411 : 239652 , 10.04%
10.10.1.228:11411 : 126840 , 5.31%
10.10.1