在GCC和Visual Studio中使用hash_map

GCC与VS中hash_map的使用

最新推荐文章于 2023-03-22 22:55:31 发布

最新推荐文章于 2023-03-22 22:55:31 发布 · 333 阅读

文章标签：

#数据结构与算法 #操作系统 #c/c++

本文介绍了GCC和Visual Studio中hash_map的使用方法及其与map的区别。详细讲解了hash_map的定义与模板参数设置，并通过实测比较了hash_map与map在不同编译器下的性能表现。

在GCC和Visual Studio中使用hash_map

熟悉STL或熟悉ACM/ICPC的话，其中的set, map, multiset, multimap一定用过无数次了，它们都是用平衡二叉树（红黑树）实现的，复杂度为O(lgn)。我们也知道set, map可以通过哈希来实现，复杂度只有O(1)，可惜直到现在，unsorted_set或hash_map都没能成为C++标准的一部分（C++0x，- -b）。不过无论在GNU GCC中还是Microsoft Visual Studio中都有对hash_set, hash_map, hash_multiset, hash_multimap的支持。

GCC中的hash_map定义在<ext/hash_map>文件，namespace__gnu_cxx中。要定义一个hash_map<int, int>非常简单：

 
    #include <ext/hash_map> 

    usingnamespace__gnu_cxx; 

    hash_map<int,int> hm;

在使用map时，如果我们想要改变元素顺序，或以自定义的struct/class作为key的时候，可以设定map第三个模板参数（默认是less<Key>，即operator<）。对于hash_map，我们需要设定其第三个(hash<Key>)和第四个模板参数(equal_to<Key>, operator==)。

 
    typedeflonglongmy_type; 

    typedefintany_type; 

    structmy_hash { 

    size_toperator()(constmy_type& key)const{ 

    return(key >> 32) ^ key; 

    } 

    }; 

    structmy_equal_to { 

    booloperator()(constmy_type& lhs,constmy_type& rhs)const{ 

    returnlhs == rhs; 

    } 

    }; 

    hash_map<my_type, any_type, my_hash, my_equal_to> my_hash_map;

对与int等基本类型，系统提供有hash<int>等版本的模板特化，所以只需要指定前两个模板参数就足够了。实现了模板特化的有以下类型

 
    [const]char*, crope, wrope, [signed|unsigned]char, [unsigned]short, [unsigned]int, [unsigned]long

如果需要的话，我们也可以为其他类型实现模板特化

 
// hash_map<Key, Tp, HashFn=hash<Key>, EqualKey=equal_to<Key>, Alloc=allocator<Tp> > 

#include <cstdio> 

#include <utility> 

#include <hash_map> 

usingnamespacestd; 

usingnamespace__gnu_cxx; 



namespace__gnu_cxx { 

template<> 

structhash<pair<int,int> > { 

size_toperator()(constpair<int,int>& key)const{ 

returnkey.first * key.second; 

} 

}; 

} 

hash_map<pair<int,int>,int> hm;

Visual C++的hash_map定义在<hash_map>文件，namespacestdext中，早先在namespace std中。其实现与GCC的不同，模板参数也不一样，比如上面的例子在VC++版本如下

 
// hash_map<Key, Type, Traits=hash_compare<Key, less<Key> >, Allocator=allocator<pair<const Key, Type> > > 

> 

classhash_map 

#include <cstdio> 

#include <utility> 

#include <hash_map> 

usingnamespacestd; 

usingnamespacestdext; 



template<> 

structhash_compare<pair<int,int> > { 

// the mean number of elements per bucket 

staticconstsize_tbucket_size = 4; 

// the minimum number of buckets 

staticconstsize_tmin_buckets = 8; 

// hash function 

size_toperator()(constpair<int,int>& key)const{ 

returnkey.first * key.second; 

} 

// compare function 

booloperator()(constpair<int,int>& lhs,constpair<int,int>& rhs)const{ 

returnlhs < rhs; 

} 

}; 

hash_map<pair<int,int>,int> hm;

相比前面的hash，上面的hash_compare显然要复杂不少。

不过二者提供的方法基本一致，也和std::map和其他STL容器相似。所以对于上面定义的hash_map，我们都可以用下面的代码进行测试

 
... 

intmain() { 

intn; 

scanf("%d", &n); 

for(inti = 0; i < n; ++i) { 

hm[make_pair(i, i)] = i; 

} 

for(hash_map<pair<int,int>,int>::iterator i = hm.begin(); i != hm.end(); ++i) { 

printf("%d ", i->second); 

} 

printf("\n%d / %d\n", hm.size(), hm.bucket_count()); 

return0; 

}

n取12时，GCC 4.4.1得到的结果是

 
1 2 3 4 5 6 7 8 9 10 11 

/ 193

而Visual Studio 2010得到的结果是

 
4 8 1 3 5 7 9 11 2 6 10 

/ 8

由此我们可以看出二者在hash_map实现上的不同。__gnu_cxx::hash_map保持size<=bucket_count，而以193, 389, 769, 1543…这样近似成倍增长的质数作为bucket_count。stdext::hash_map则保持size<=bucket_size*bucket_count，bucket_count起初为min_buckets=8，不足时以8倍增长。

之所以我们要选择hash_map，是为了获得更高的效率。hash_map比map到底快多少呢，我们通过下面的程序来测试一下。通过__GNUC__和_MSC_VER这两个宏，我们可以把两个版本的代码写到一起

 
#include <map> 

#include <ctime> 

#include <cstdio> 

usingnamespacestd; 



#if defined(_MSC_VER) 

# include <hash_map> 

usingstdext::hash_map; 

#elif defined(__GNUC__) 

# include <ext/hash_map> 

using__gnu_cxx::hash_map; 

#else 

# error 悲剧啊 

#endif 



intmain() { 

clock_tS; 



S =clock(); 

for(inti = 0; i < 20; ++i) { 

hash_map<int,int> H; 

for(inti = 0; i < (1 << 20); ++i) { 

H[i] = i; 

} 

} 

printf("hash_map: %.2lfs\n", (double)(clock() - S) / CLOCKS_PER_SEC); 



S =clock(); 

for(inti = 0; i < 20; ++i) { 

map<int,int> M; 

for(inti = 0; i < (1 << 20); ++i) { 

M[i] = i; 

} 

} 

printf("map: %.2lfs\n", (double)(clock() - S) / CLOCKS_PER_SEC); 



return0; 

}

结果得到的是（gcc和VS的结果分别来自不同机器，因此没有可比性）

 
    * g++ -O2 

    hash_map: 5.39s 

    map: 12.35s 

    * VS2010 Release 

    hash_map: 9.53s 

    map: 9.44s

发现g++里hash_map确实要比map快不少，而Visual Studio 2010就是个悲剧，信hash_map不如信春哥啊。

嘛，hash_map确实可能带来一些performance，但不那么stable，所以我们可以考虑优先使用hash_map，而将map最为fallback备胎。

 
#define USE_HASH_MAP ? 

#if USE_HASH_MAP 

#include <ext/hash_map> 

typedef__gnu_cxx::hash_map<int,int> Hash; 

#else 

#include <map> 

typedefstd::map<int,int> Hash; 

#endif