While reading the ClickHouse source code today, I noticed that ClickHouse uses CityHash128 as its hash algorithm for block checksums:
The first 16 bytes are the checksum from all other bytes of the block. Now only CityHash128 is used.
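As a rough illustration of the layout described in that comment, the sketch below splits a block into its 16-byte checksum prefix and the remaining payload, then recomputes the checksum. The names `split_block` and `verify_block` are hypothetical, and the hash function is passed in as a parameter. The demo uses MD5 from the standard library purely as a 16-byte stand-in; it is not CityHash128, and the byte order ClickHouse actually uses when storing the checksum is not modeled here.

```python
import hashlib
from typing import Callable, Tuple

CHECKSUM_SIZE = 16  # the quoted comment says the first 16 bytes hold the checksum


def split_block(block: bytes) -> Tuple[bytes, bytes]:
    """Return (checksum, payload) for a block laid out as checksum + payload."""
    if len(block) < CHECKSUM_SIZE:
        raise ValueError("block is shorter than the checksum prefix")
    return block[:CHECKSUM_SIZE], block[CHECKSUM_SIZE:]


def verify_block(block: bytes, hash128: Callable[[bytes], bytes]) -> bool:
    """Recompute the 16-byte digest of the payload and compare it with the prefix."""
    checksum, payload = split_block(block)
    return hash128(payload) == checksum


def md5_stand_in(data: bytes) -> bytes:
    # Stand-in only: MD5 also yields 16 bytes, but it is NOT CityHash128.
    return hashlib.md5(data).digest()


payload = b"compressed bytes would go here"
block = md5_stand_in(payload) + payload
print(verify_block(block, md5_stand_in))              # True
print(verify_block(block[:-1] + b"!", md5_stand_in))  # False
```

Passing the hash function in as an argument keeps the sketch independent of any particular CityHash binding; a real 128-bit CityHash implementation could be substituted once its output byte order is known.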
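CityHash is a hash algorithm published by Google that I had never heard of before.
The GitHub repository has not been updated in 6 years: google/cityhash: Automatically exported from code.google.com/p/cityhash
I could not find much material about it online either. As far as I can tell, the only way to use CityHash from C++ is to build and install it from source.
I did find a Python implementation of CityHash: cityhash · PyPI
It can be installed directly: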
pip install cityhash
Usage is as follows:
from cityhash import CityHash32, CityHash64, CityHash128
print(CityHash64('16'))
The result:
>>> print(CityHash64('16'))
179832329939032581
You can also call ClickHouse's built-in cityHash64 function directly:
ubuntu :) SELECT cityHash64('16') AS CityHash, toTypeName(CityHash) AS type;
SELECT
cityHash64('16') AS CityHash,
toTypeName(CityHash) AS type
┌───────────CityHash─┬─type───┐
│ 696724486834661759 │ UInt64 │
└────────────────────┴────────┘
1 rows in set. Elapsed: 0.001 sec.
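The two results differ. I took this to be an encoding issue, though a version mismatch seems at least as likely: ClickHouse bundles CityHash v1.0.2, while the cityhash package on PyPI appears to wrap a newer CityHash release.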
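One quick way to test the encoding hypothesis is to look at the bytes that the string '16' becomes before it is hashed. The sketch below uses only the standard library, and `candidate_encodings` is a hypothetical helper name. For this pure-ASCII input the common ASCII-compatible encodings all produce the same two bytes, so a mismatch would have to come from somewhere else, for example a wider encoding such as UTF-16 or a different version of the hash function.

```python
def candidate_encodings(text: str) -> dict:
    """Map encoding name -> hex of the bytes a hash function would receive."""
    names = ["ascii", "utf-8", "latin-1", "utf-16-le"]
    return {name: text.encode(name).hex() for name in names}


for name, hexed in candidate_encodings("16").items():
    print(f"{name:10s} {hexed}")
# ascii      3136
# utf-8      3136
# latin-1    3136
# utf-16-le  31003600
```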
ClickHouse also implements CityHash in its JDBC driver, though I do not know whether that implementation is correct:
clickhouse-jdbc/ClickHouseCityHash.java at master · yandex/clickhouse-jdbc
The code is as follows:
/*
* Copyright 2017 YANDEX LLC
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
/*
* Copyright (C) 2012 tamtam180
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
package ru.yandex.clickhouse.util;
/**
* @author tamtam180 - kirscheless at gmail.com
* @see http://google-opensource.blogspot.jp/2011/04/introducing-cityhash.html
* @see http://code.google.com/p/cityhash/
*
*/
/**
* NOTE: The code is modified to be compatible with CityHash128 used in ClickHouse
*/
public class ClickHouseCityHash {
    private static final long k0 = 0xc3a5c85c97cb3127L;
    private static final long k1 = 0xb492b66fbe98f273L;
    private static final long k2 = 0x9ae16a3b2f90404fL;
    private static final long k3 = 0xc949d7c7509e6557L;

    private static long toLongLE(byte[] b, int i) {
        return (((long)b[i+7] << 56) +
                ((long)(b[i+6] & 255) << 48) +
                ((long)(b[i+5] & 255) << 40) +
                ((long)(b[i+4] & 255) << 32) +
                ((long)(b[i+3] & 255) << 24) +
                ((b[i+2] & 255) << 16) +
                ((b[i+1] & 255) << 8) +
                ((b[i+0] & 255) << 0));
    }

    private static long toIntLE(byte[] b, int i) {
        return (((b[i+3] & 255L) << 24) + (