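Other commonly used string hash functions include ELFHash and APHash, both of which are simple and effective. These functions use bit operations so that every character influences the final hash value. There are also cryptographic hash functions, represented by MD5 and SHA1, which were designed so that collisions are computationally infeasible to find (although practical collision attacks on both have since been published).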
Commonly used string hash functions include BKDRHash, APHash, DJBHash, JSHash, RSHash, SDBMHash, PJWHash, and ELFHash. I ran a small evaluation of these eight functions.
Hash function  Data 1  Data 2  Data 3  Data 4  Score 1  Score 2  Score 3  Score 4  Average
BKDRHash            2       0    4774     481    96.55      100    90.95    82.05    92.64
APHash              2       3    4754     493    96.55    88.46      100    51.28    86.28
DJBHash             2       2    4975     474    96.55    92.31        0      100    83.43
JSHash              1       4    4761     506      100    84.62    96.83    17.95    81.94
RSHash              1       0    4861     505      100      100    51.58    20.51    75.96
SDBMHash            3       2    4849     504    93.10    92.31    57.01    23.08    72.41
PJWHash            30      26    4878     513        0        0    43.89        0    21.95
ELFHash            30      26    4878     513        0        0    43.89        0    21.95
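Data 1 is the number of hash collisions among 100000 random strings made of letters and digits. Data 2 is the number of hash collisions among 100000 meaningful English sentences. Data 3 is the number of collisions when the hash values from data 1 are taken modulo 1000003 (a large prime) and stored in a linear table. Data 4 is the number of collisions when the hash values from data 1 are taken modulo 10000019 (a larger prime) and stored in a linear table.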
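The measurement behind data 3 and data 4 could be sketched roughly as follows. This is a minimal illustration, not the author's benchmark program: the helper name `count_slot_collisions` and the rule "a string collides when its slot is already occupied" are assumptions, since the post does not say exactly how collisions were counted. Note that under this rule a repeated input string also counts as a collision.

```c
#include <stdlib.h>

/* BKDRHash with seed 131, as in the appendix listing. */
unsigned int bkdr_hash(const char *str)
{
    unsigned int hash = 0;
    while (*str)
        hash = hash * 131u + (*str++);
    return hash & 0x7FFFFFFF;
}

/* Hypothetical helper: reduces each hash modulo table_size (a prime such
   as 1000003) and counts the strings that land in an occupied slot.
   Returns -1 if the occupancy array cannot be allocated. */
long count_slot_collisions(const char *const *strings, size_t n,
                           unsigned int table_size)
{
    unsigned char *used = calloc(table_size, 1);
    long collisions = 0;
    size_t i;

    if (used == NULL)
        return -1;
    for (i = 0; i < n; i++) {
        unsigned int slot = bkdr_hash(strings[i]) % table_size;
        if (used[slot])
            collisions++;      /* slot already taken by an earlier string */
        else
            used[slot] = 1;    /* first string to reach this slot */
    }
    free(used);
    return collisions;
}
```

Calling it with the 100000 test strings and `table_size` set to 1000003 or 10000019 would give figures comparable in kind to the data 3 and data 4 columns, though not necessarily the same numbers.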
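Comparing the results gives the average scores above; each average is the quadratic mean (root mean square) of the four scores. BKDRHash stands out both in measured results and in ease of implementation. APHash is also a strong algorithm. DJBHash, JSHash, RSHash, and SDBMHash each have their own strengths. PJWHash and ELFHash score lowest, with identical results, because the two algorithms are essentially the same.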
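The scores can be checked by hand. The post does not state how each per-column score was derived; the numbers are consistent with a min-max scaling in which the fewest collisions in a column scores 100 and the most scores 0, so the `score` function below is an inference from the table rather than the author's stated formula. The function names are illustrative.

```c
/* Assumed scoring rule (inferred from the table, not stated in the text):
   the column minimum maps to 100 and the column maximum maps to 0. */
double score(double collisions, double col_min, double col_max)
{
    return (col_max - collisions) / (col_max - col_min) * 100.0;
}

/* Square root by Newton's method, so the sketch needs no math library;
   sqrt() from <math.h> would serve equally well. */
double newton_sqrt(double x)
{
    double r = x > 1.0 ? x : 1.0;
    int i;

    if (x <= 0.0)
        return 0.0;
    for (i = 0; i < 60; i++)
        r = 0.5 * (r + x / r);
    return r;
}

/* Quadratic mean (root mean square) of n values; n must be positive. */
double quadratic_mean(const double *v, int n)
{
    double sum = 0.0;
    int i;

    for (i = 0; i < n; i++)
        sum += v[i] * v[i];
    return newton_sqrt(sum / n);
}
```

For the BKDRHash row, `score(2, 1, 30)`, `score(0, 0, 26)`, `score(4774, 4754, 4975)`, and `score(481, 474, 513)` come to about 96.55, 100, 90.95, and 82.05, and the quadratic mean of those four values is about 92.64, matching the table.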
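In informatics competitions, where code should be easy to write and debug, I personally find BKDRHash the most suitable function to memorize and use.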
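In a contest setting, BKDRHash could be paired with a fixed-size table along the following lines. This is only one possible arrangement: open addressing with linear probing, the table size 1000003 borrowed from the evaluation above, and the names `find_slot`, `table_insert`, and `table_contains` are all assumptions made for this sketch.

```c
#include <string.h>

#define TABLE_SIZE 1000003u   /* prime modulus, as used for data 3 */

/* Stored keys; NULL marks an empty slot. The table keeps the caller's
   pointers, so the strings must outlive the table. */
const char *slots[TABLE_SIZE];

unsigned int BKDRHash(const char *str)
{
    unsigned int seed = 131;
    unsigned int hash = 0;
    while (*str)
        hash = hash * seed + (*str++);
    return hash & 0x7FFFFFFF;
}

/* Returns the slot holding key, or the empty slot where it would go.
   Assumes the table never becomes completely full. */
unsigned int find_slot(const char *key)
{
    unsigned int i = BKDRHash(key) % TABLE_SIZE;
    while (slots[i] != NULL && strcmp(slots[i], key) != 0)
        i = (i + 1) % TABLE_SIZE;   /* linear probing */
    return i;
}

void table_insert(const char *key)
{
    slots[find_slot(key)] = key;
}

int table_contains(const char *key)
{
    return slots[find_slot(key)] != NULL;
}
```

After `table_insert("apple")`, `table_contains("apple")` returns 1 while a lookup for a string that was never inserted returns 0.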
Original work by CmYkRgB123. Suggestions, discussion, criticism, and corrections are welcome.
Appendix: C source code for the hash functions
// SDBMHash
unsigned int SDBMHash(char *str)
{
    unsigned int hash = 0;
    while (*str)
    {
        // equivalent to: hash = 65599 * hash + (*str++);
        hash = (*str++) + (hash << 6) + (hash << 16) - hash;
    }
    return (hash & 0x7FFFFFFF);
}
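The comment in SDBMHash holds because 2^6 + 2^16 - 1 = 64 + 65536 - 1 = 65599, and unsigned arithmetic wraps around the same way in both forms. The two variants can be compared directly; `sdbm_shift` and `sdbm_mul` are names made up for this illustration.

```c
/* Shift-and-subtract form, as in the SDBMHash listing. */
unsigned int sdbm_shift(const char *str)
{
    unsigned int hash = 0;
    while (*str)
        hash = (*str++) + (hash << 6) + (hash << 16) - hash;
    return hash & 0x7FFFFFFF;
}

/* Multiplication form from the comment: 65599 = 64 + 65536 - 1. */
unsigned int sdbm_mul(const char *str)
{
    unsigned int hash = 0;
    while (*str)
        hash = 65599u * hash + (*str++);
    return hash & 0x7FFFFFFF;
}
```

Both functions return the same value for any input string, so the choice between them is a matter of taste or of what the compiler optimizes best.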
// RSHash
unsigned int RSHash(char *str)
{
    unsigned int b = 378551;
    unsigned int a = 63689;
    unsigned int hash = 0;
    while (*str)
    {
        hash = hash * a + (*str++);
        a *= b;
    }
    return (hash & 0x7FFFFFFF);
}
// JSHash
unsigned int JSHash(char *str)
{
    unsigned int hash = 1315423911;
    while (*str)
    {
        hash ^= ((hash << 5) + (*str++) + (hash >> 2));
    }
    return (hash & 0x7FFFFFFF);
}
// P. J. Weinberger Hash
unsigned int PJWHash(char *str)
{
    unsigned int BitsInUnignedInt = (unsigned int)(sizeof(unsigned int) * 8);
    unsigned int ThreeQuarters = (unsigned int)((BitsInUnignedInt * 3) / 4);
    unsigned int OneEighth = (unsigned int)(BitsInUnignedInt / 8);
    unsigned int HighBits = (unsigned int)(0xFFFFFFFF) << (BitsInUnignedInt - OneEighth);
    unsigned int hash = 0;
    unsigned int test = 0;
    while (*str)
    {
        hash = (hash << OneEighth) + (*str++);
        if ((test = hash & HighBits) != 0)
        {
            hash = ((hash ^ (test >> ThreeQuarters)) & (~HighBits));
        }
    }
    return (hash & 0x7FFFFFFF);
}
// ELFHash
unsigned int ELFHash(char *str)
{
    unsigned int hash = 0;
    unsigned int x = 0;
    while (*str)
    {
        hash = (hash << 4) + (*str++);
        if ((x = hash & 0xF0000000L) != 0)
        {
            hash ^= (x >> 24);
            hash &= ~x;
        }
    }
    return (hash & 0x7FFFFFFF);
}
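With a 32-bit unsigned int, the constants that PJWHash derives at run time become OneEighth = 4, ThreeQuarters = 24, and HighBits = 0xF0000000, which are the constants hard-coded in ELFHash. The two functions then compute the same value, which is consistent with their identical rows in the results table. A compact sketch under that 32-bit assumption, with the illustrative names `pjw32` and `elf32`:

```c
/* PJWHash with its constants fixed for a 32-bit unsigned int (assumption). */
unsigned int pjw32(const char *str)
{
    const unsigned int high_bits = 0xF0000000u;
    unsigned int hash = 0, test;
    while (*str) {
        hash = (hash << 4) + (*str++);
        if ((test = hash & high_bits) != 0)
            hash = (hash ^ (test >> 24)) & ~high_bits;
    }
    return hash & 0x7FFFFFFF;
}

/* ELFHash as listed above. Clearing the bits of x clears the whole top
   nibble, because x already holds every top-nibble bit that is set. */
unsigned int elf32(const char *str)
{
    unsigned int hash = 0, x;
    while (*str) {
        hash = (hash << 4) + (*str++);
        if ((x = hash & 0xF0000000u) != 0) {
            hash ^= (x >> 24);
            hash &= ~x;
        }
    }
    return hash & 0x7FFFFFFF;
}
```

Strings of eight or more characters push bits into the top nibble and so exercise the folding branch in both functions.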
// BKDRHash
unsigned int BKDRHash(char *str)
{
    unsigned int seed = 131; // 31 131 1313 13131 131313 etc..
    unsigned int hash = 0;
    while (*str)
    {
        hash = hash * seed + (*str++);
    }
    return (hash & 0x7FFFFFFF);
}
// DJBHash
unsigned int DJBHash(char *str)
{
    unsigned int hash = 5381;
    while (*str)
    {
        hash += (hash << 5) + (*str++);
    }
    return (hash & 0x7FFFFFFF);
}
// APHash
unsigned int APHash(char *str)
{
    unsigned int hash = 0;
    int i;
    for (i = 0; *str; i++)
    {
        if ((i & 1) == 0)
        {
            hash ^= ((hash << 7) ^ (*str++) ^ (hash >> 3));
        }
        else
        {
            hash ^= (~((hash << 11) ^ (*str++) ^ (hash >> 5)));
        }
    }
    return (hash & 0x7FFFFFFF);
}
[The above is reposted from: http://explorers.javaeye.com/blog/698377]