Using the CityHash Algorithm

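This article introduces CityHash, a hash algorithm developed by Google. Because the official GitHub repository has not been updated in years, C++ users need to build and install it from source. The article also covers a Python implementation and shows, with an example, that ClickHouse's CityHash64 returns a different result for the same input, which may be related to encoding.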

While reading the ClickHouse source code today, I noticed that ClickHouse uses CityHash128 as the hash algorithm for its block checksums:

The first 16 bytes are the checksum from all other bytes of the block. Now only CityHash128 is used.
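
The layout described in that comment can be sketched as follows. This is a minimal illustration, not ClickHouse's actual code: `split_block` and `verify_block` are hypothetical helper names, and `hash128` is a stand-in for whatever function produces the 16-byte CityHash128 digest in the byte order ClickHouse stores it (neither is specified in this article).

```python
from typing import Callable, Tuple

CHECKSUM_SIZE = 16  # per the quoted comment: the first 16 bytes hold the checksum


def split_block(block: bytes) -> Tuple[bytes, bytes]:
    """Split a block into (stored checksum, all remaining bytes)."""
    if len(block) < CHECKSUM_SIZE:
        raise ValueError("block is shorter than the 16-byte checksum")
    return block[:CHECKSUM_SIZE], block[CHECKSUM_SIZE:]


def verify_block(block: bytes, hash128: Callable[[bytes], bytes]) -> bool:
    """Recompute the checksum over the remaining bytes and compare with the stored one.

    hash128 is a hypothetical callable returning exactly 16 bytes; plugging in a
    real CityHash128 binding (and its byte order) is left as an assumption.
    """
    stored, rest = split_block(block)
    return hash128(rest) == stored
```

Passing the hash function in as a parameter keeps the sketch independent of any particular CityHash binding or version.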

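CityHash is a hash algorithm developed by Google; I had never heard of it before.
The GitHub repository has not been updated in six years: google/cityhash: Automatically exported from code.google.com/p/cityhash
I could not find much material about it online either. To use CityHash from C++, the only approach I found is to build and install it from source.
I did find a Python implementation of CityHash: cityhash · PyPI
It can be installed directly: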

pip install cityhash

Usage:

from cityhash import CityHash32, CityHash64, CityHash128
print(CityHash64('16'))

Result:

>>> print(CityHash64('16'))
179832329939032581

You can also call the CityHash64 function built into ClickHouse directly:

ubuntu :) SELECT cityHash64('16') AS CityHash, toTypeName(CityHash) AS type;

SELECT 
    cityHash64('16') AS CityHash, 
    toTypeName(CityHash) AS type

┌───────────CityHash─┬─type───┐
│ 696724486834661759 │ UInt64 │
└────────────────────┴────────┘

1 rows in set. Elapsed: 0.001 sec. 

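The results differ. This may be an encoding issue. Another candidate is a version mismatch: ClickHouse's documentation states that its CityHash functions use CityHash v1.0.2, which is not necessarily the version that the PyPI package wraps.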
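
One way to examine the encoding hypothesis is to look at the raw bytes each side would hash. The sketch below uses only the Python standard library and does not call CityHash at all; the list of encodings is an arbitrary choice for illustration.

```python
text = "16"

# Encode the same text under a few common encodings and show the raw bytes.
encodings = ["ascii", "utf-8", "utf-16-le", "utf-32-le"]
encoded = {name: text.encode(name) for name in encodings}

for name, raw in encoded.items():
    # bytes.hex(sep) requires Python 3.8 or newer
    print(f"{name:10s} {raw.hex(' ')}")
```

For a pure-ASCII string such as '16', ASCII and UTF-8 produce the same two bytes (0x31 0x36). If both the Python binding and ClickHouse hash UTF-8 bytes (an assumption, not verified here), encoding alone would not explain the different results.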

ClickHouse ships its own CityHash implementation in its JDBC driver; I do not know whether it is correct:
clickhouse-jdbc/ClickHouseCityHash.java at master · yandex/clickhouse-jdbc
The code is as follows:

/*
 * Copyright 2017 YANDEX LLC
 *
 * Licensed under the Apache License, Version 2.0 (the "License");
 * you may not use this file except in compliance with the License.
 * You may obtain a copy of the License at
 *
 * http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */

/*
 * Copyright (C) 2012 tamtam180
 *
 * Licensed under the Apache License, Version 2.0 (the "License");
 * you may not use this file except in compliance with the License.
 * You may obtain a copy of the License at
 *
 * http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */

package ru.yandex.clickhouse.util;

/**
 * @author tamtam180 - kirscheless at gmail.com
 * @see http://google-opensource.blogspot.jp/2011/04/introducing-cityhash.html
 * @see http://code.google.com/p/cityhash/
 *
 */

/**
 * NOTE: The code is modified to be compatible with CityHash128 used in ClickHouse
 */
public class ClickHouseCityHash {

    private static final long k0 = 0xc3a5c85c97cb3127L;
    private static final long k1 = 0xb492b66fbe98f273L;
    private static final long k2 = 0x9ae16a3b2f90404fL;
    private static final long k3 = 0xc949d7c7509e6557L;

    private static long toLongLE(byte[] b, int i) {
        return (((long)b[i+7] << 56) +
                ((long)(b[i+6] & 255) << 48) +
                ((long)(b[i+5] & 255) << 40) +
                ((long)(b[i+4] & 255) << 32) +
                ((long)(b[i+3] & 255) << 24) +
                ((b[i+2] & 255) << 16) +
                ((b[i+1] & 255) <<  8) +
                ((b[i+0] & 255) <<  0));
    }

    private static long toIntLE(byte[] b, int i) {
        return (((b[i+3] & 255L) << 24) + (
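
The `toLongLE` helper in the listing assembles eight bytes into a Java `long` with the lowest-addressed byte as the least significant, that is, little-endian. A rough Python equivalent is sketched below. The function names are hypothetical; `to_int_le` assumes the truncated `toIntLE` continues the same pattern and yields an unsigned 32-bit value, which its first term (`& 255L`) suggests but the listing does not show in full.

```python
def to_long_le(b: bytes, i: int) -> int:
    """Read 8 bytes at offset i as a little-endian signed 64-bit integer (like a Java long)."""
    return int.from_bytes(b[i:i + 8], byteorder="little", signed=True)


def to_int_le(b: bytes, i: int) -> int:
    """Read 4 bytes at offset i as a little-endian unsigned 32-bit integer (assumed behavior)."""
    return int.from_bytes(b[i:i + 4], byteorder="little", signed=False)


def to_unsigned64(value: int) -> int:
    """Reinterpret a signed 64-bit value as unsigned, the way a UInt64 column displays it."""
    return value & 0xFFFFFFFFFFFFFFFF


data = bytes([0x01, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x80])
print(to_long_le(data, 0))                 # -9223372036854775807
print(to_unsigned64(to_long_le(data, 0)))  # 9223372036854775809
print(to_int_le(data, 0))                  # 1
```

The signed-to-unsigned step matters when comparing values: Java's `long` is signed, while ClickHouse reports `cityHash64` as `UInt64`, so the same 64 bits can print as different decimal numbers. Unlike the Java code, Python slicing does not raise an error when fewer bytes remain than requested, so callers should check lengths themselves.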