HDKV: High-Dimensional Similarity Query in Key-Value Stores

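This article looks at how locality-sensitive hashing (LSH) is used for dimensionality reduction of high-dimensional data, and in particular at how stable distributions are used to build hash functions that map similar items to the same buckets. It also covers how improved hash functions such as k-means hash can be more efficient in practice, and notes construction methods that fit a dataset more closely.

The article focuses on key-value stores.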

 

Locality-sensitive hashing (LSH) is a method of performing probabilistic dimension reduction of high-dimensional data. The basic idea is to hash the input items so that similar items are mapped to the same buckets with high probability (the number of buckets being much smaller than the universe of possible input items).

 

Stable distributions

The hash function [8] $h_{\mathbf{a},b}(\boldsymbol{\upsilon}) : \mathcal{R}^d \to \mathcal{N}$ maps a $d$-dimensional vector $\boldsymbol{\upsilon}$ onto a set of integers. Each hash function in the family is indexed by a choice of random $\mathbf{a}$ and $b$, where $\mathbf{a}$ is a $d$-dimensional vector with entries chosen independently from a stable distribution and $b$ is a real number chosen uniformly from the range $[0,r]$, with $r$ a fixed bucket width. For fixed $\mathbf{a}, b$ the hash function $h_{\mathbf{a},b}$ is given by $h_{\mathbf{a},b}(\boldsymbol{\upsilon}) = \left\lfloor \frac{\mathbf{a}\cdot\boldsymbol{\upsilon}+b}{r} \right\rfloor$.
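As a rough illustration of the formula above, the following minimal sketch draws one hash function from the family. It assumes the Gaussian distribution as the stable distribution (the Gaussian is 2-stable, which corresponds to Euclidean distance); the names `make_hash`, `dim`, and `rng` and the value `r = 4.0` are illustrative choices, not taken from the original text.

```python
import math
import random


def make_hash(dim, r, rng):
    """Draw one hash function h_{a,b} from the family (illustrative helper)."""
    # Assumption: entries of a are i.i.d. standard Gaussian, a 2-stable distribution.
    a = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    # The offset b is uniform on [0, r].
    b = rng.uniform(0.0, r)

    def h(v):
        # floor((a . v + b) / r): project onto a, shift by b, cut into width-r buckets.
        dot = sum(ai * vi for ai, vi in zip(a, v))
        return math.floor((dot + b) / r)

    return h


rng = random.Random(0)          # fixed seed so the sketch is repeatable
h = make_hash(dim=4, r=4.0, rng=rng)

p = [1.0, 2.0, 3.0, 4.0]
near = [1.01, 2.0, 3.0, 4.0]    # a small perturbation of p
print(h(p), h(near))            # nearby points usually share a bucket
```

A larger `r` makes buckets wider, so more points collide; a smaller `r` separates points more aggressively. Which value is appropriate depends on the distance scale of the data.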

Other construction methods for hash functions have been proposed to better fit the data [9]. In particular, k-means hash functions perform better in practice than projection-based hash functions, but they come without any theoretical guarantee.
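To make the idea of a data-dependent hash concrete, here is a minimal sketch in which the hash value of a vector is the index of its nearest k-means centroid. This is one simple reading of "k-means hash" and is not necessarily the construction used in [9]; the helper names (`sq_dist`, `nearest`, `kmeans`, `kmeans_hash`) and the toy data are assumptions made for illustration.

```python
import random


def sq_dist(u, v):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(u, v))


def nearest(centroids, v):
    """Index of the centroid closest to v; this index serves as the hash value."""
    return min(range(len(centroids)), key=lambda i: sq_dist(centroids[i], v))


def kmeans(points, k, iters, rng):
    """Plain Lloyd iterations; returns k centroids learned from the data."""
    centroids = [list(p) for p in rng.sample(points, k)]
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            groups[nearest(centroids, p)].append(p)
        for i, g in enumerate(groups):
            if g:  # keep the old centroid if a cluster ends up empty
                centroids[i] = [sum(c) / len(g) for c in zip(*g)]
    return centroids


rng = random.Random(0)
# Two well-separated toy clusters (illustrative data only).
data = [[rng.gauss(0, 0.1), rng.gauss(0, 0.1)] for _ in range(50)]
data += [[rng.gauss(5, 0.1), rng.gauss(5, 0.1)] for _ in range(50)]

centroids = kmeans(data, k=2, iters=10, rng=rng)


def kmeans_hash(v):
    """Data-dependent hash: bucket id is the nearest learned centroid."""
    return nearest(centroids, v)
```

Because the buckets follow the shape of the data rather than random projections, dense regions get their own cells. The price, as noted above, is that no collision-probability guarantee carries over.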

 

The key idea of locality-sensitive hashing (LSH) is to hash the points using several hash functions so as to ensure that, for each function, the probability of
collision is much higher for objects which are close to each other than for those which are far apart. Then, one can determine near neighbors by hashing the
query point and retrieving the elements stored in the buckets that the query point hashes to.
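The query procedure described above can be sketched with a plain dictionary standing in for a key-value store: each key is a (table number, bucket code) pair and each value is the list of items in that bucket. This is a minimal sketch under stated assumptions, not the HDKV design itself. The class name `LSHIndex`, its parameters, and the use of Gaussian projections with bucket width `r` are all illustrative choices.

```python
import math
import random
from collections import defaultdict


class LSHIndex:
    """Toy multi-table LSH index backed by a dict used as a key-value store."""

    def __init__(self, dim, r, num_tables, hashes_per_table, seed=0):
        rng = random.Random(seed)
        self.r = r
        # For each table, a list of (a, b) pairs defining its hash functions.
        self.params = [
            [([rng.gauss(0.0, 1.0) for _ in range(dim)], rng.uniform(0.0, r))
             for _ in range(hashes_per_table)]
            for _ in range(num_tables)
        ]
        self.store = defaultdict(list)  # key -> list of (item_id, vector)

    def _keys(self, v):
        """Yield one store key per table: (table number, tuple of hash values)."""
        for t, table in enumerate(self.params):
            code = tuple(
                math.floor((sum(x * y for x, y in zip(a, v)) + b) / self.r)
                for a, b in table
            )
            yield (t, code)

    def insert(self, item_id, v):
        for key in self._keys(v):
            self.store[key].append((item_id, v))

    def query(self, q):
        """Ids of items sharing at least one bucket with q, nearest first."""
        candidates = {}
        for key in self._keys(q):
            for item_id, v in self.store.get(key, []):
                candidates[item_id] = v

        def dist(item_id):
            return math.sqrt(sum((x - y) ** 2 for x, y in zip(candidates[item_id], q)))

        return sorted(candidates, key=dist)


index = LSHIndex(dim=2, r=4.0, num_tables=4, hashes_per_table=2)
index.insert("a", [0.0, 0.0])
index.insert("b", [0.1, 0.0])
index.insert("c", [100.0, 100.0])
result = index.query([0.0, 0.0])
print(result)
```

Only the candidates found in the matching buckets are compared against the query exactly, which is where the savings over a full scan come from. Using several tables raises the chance that a true near neighbor shares at least one bucket with the query, while using several hash functions per table lowers the chance that far-apart points collide.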

Reposted from: https://www.cnblogs.com/zhangzhang/archive/2012/02/17/2355143.html
