Counting Word Frequencies in a Short Text

Published 2016-03-05 18:05:02


License: CC 4.0 BY-SA

Categories: C++ Basics, Introduction to Algorithms, Data Structures

Article link: https://blog.youkuaiyun.com/Tander_Tang/article/details/50809990

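This article shows how to use hash-based lookup to count how often each word occurs in an English text file. The implementation is in C++, uses a hash function with double probing, and picks a prime close to the input integer as the table size.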


An application of hash lookup: given an English text file, count the frequency of every word in the file.

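The core of the task is to look up each word, as it is read, among the words already stored: if the word is present, increment its count by 1; if it is not, insert it with a count of 1. The hash function in the C++ code below uses double probing. The table size is chosen as the largest prime that is smaller than the input integer.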
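Before the full listing, here is a minimal standalone sketch of the two ideas just described: choosing the largest prime below a requested size, and the "increment if found, insert with count 1 if not" lookup. It is not the class from this article. The names `largestPrimeBelow`, `Slot`, and `WordTable` are hypothetical, `std::hash` is used as a stand-in hash, and "double probing" is interpreted here as double hashing (a second hash value sets the probe step), which is an assumption.

```cpp
#include <cstddef>
#include <functional>
#include <string>
#include <vector>

// Largest prime strictly smaller than n; falls back to 2 when n <= 2.
int largestPrimeBelow(int n) {
    auto isPrime = [](int x) {
        if (x < 2) return false;
        for (int d = 2; d * d <= x; ++d)
            if (x % d == 0) return false;
        return true;
    };
    for (int p = n - 1; p >= 2; --p)
        if (isPrime(p)) return p;
    return 2;
}

// Hypothetical slot type: count == 0 marks an empty slot.
struct Slot {
    std::string word;
    int count = 0;
};

// Hypothetical open-addressing table using double hashing (an assumption).
class WordTable {
public:
    explicit WordTable(int requestedSize)
        : slots_(largestPrimeBelow(requestedSize)) {}

    // Increment the word's count, or insert it with count 1.
    // Returns false only if the table is full and the word is absent.
    bool add(const std::string& word) {
        const std::size_t m = slots_.size();
        const std::size_t h = std::hash<std::string>{}(word);
        std::size_t pos = h % m;
        // Step lies in [1, m-1]; since m is prime, the probe visits every slot.
        const std::size_t step = 1 + (h / m) % (m - 1);
        for (std::size_t i = 0; i < m; ++i) {
            Slot& s = slots_[pos];
            if (s.count == 0) {          // empty slot: new word
                s.word = word;
                s.count = 1;
                ++distinct_;
                return true;
            }
            if (s.word == word) {        // already stored: bump the count
                ++s.count;
                return true;
            }
            pos = (pos + step) % m;
        }
        return false;
    }

    // Number of times the word was added, or 0 if it is not stored.
    int count(const std::string& word) const {
        const std::size_t m = slots_.size();
        const std::size_t h = std::hash<std::string>{}(word);
        std::size_t pos = h % m;
        const std::size_t step = 1 + (h / m) % (m - 1);
        for (std::size_t i = 0; i < m; ++i) {
            const Slot& s = slots_[pos];
            if (s.count == 0) return 0;
            if (s.word == word) return s.count;
            pos = (pos + step) % m;
        }
        return 0;
    }

    int distinct() const { return distinct_; }

private:
    std::vector<Slot> slots_;
    int distinct_ = 0;
};
```

With this sketch, `WordTable t(12)` allocates 11 slots, and calling `t.add(w)` for each word read from the file leaves the per-word totals available through `t.count(w)`. A prime table size matters here because it guarantees that any step between 1 and size-1 eventually reaches every slot.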

#include<iostream>
#include<algorithm>
#include<fstream>
#include<iomanip>
#include<string>
using namespace std;
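class HashEntry{
public:
	string words_;                // the stored word
	int totalTimes_;              // number of times words_ has occurred
	// Orders entries by descending frequency, so sorting puts the most frequent word first.
	bool operator<(HashEntry const&a) const{
		return totalTimes_ > a.totalTimes_;
	}
};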
class HashTable{
private:
	HashEntry*hash;
	int nextPrime_;             // hash table size, a prime number
	int numberOfWords_;         // total number of distinct words
public:
	HashTable(int size);        
	int getNumberofwords(){ return numberOfWords_; }
	int hashFunction(string key);   // hash function
	voi