Huffman Tree Construction and Basic Heap Usage: 05-树9 Huffman Codes (30 points)


碧海潮声按玉箫


Article link: https://blog.youkuaiyun.com/m0_49840707/article/details/121337264


In 1953, David A. Huffman published his paper “A Method for the Construction of Minimum-Redundancy Codes”, and hence printed his name in the history of computer science. As a professor who gives the final exam problem on Huffman codes, I am encountering a big problem: the Huffman codes are NOT unique. For example, given a string “aaaxuaxz”, we can observe that the frequencies of the characters ‘a’, ‘x’, ‘u’ and ‘z’ are 4, 2, 1 and 1, respectively. We may either encode the symbols as {‘a’=0, ‘x’=10, ‘u’=110, ‘z’=111}, or in another way as {‘a’=1, ‘x’=01, ‘u’=001, ‘z’=000}, both of which compress the string into 14 bits. Another set of codes can be given as {‘a’=0, ‘x’=11, ‘u’=100, ‘z’=101}, but {‘a’=0, ‘x’=01, ‘u’=011, ‘z’=001} is NOT correct since “aaaxuaxz” and “aazuaxax” can both be decoded from the code 00001011001001. The students are submitting all kinds of codes, and I need a computer program to help me determine which ones are correct and which ones are not.
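The two facts this example relies on, the total encoded length and the prefix property, can be checked mechanically. Below is a minimal C++ sketch. `EncodedBits` and `IsPrefixFree` are hypothetical helper names that belong neither to the problem nor to the solution code. The pairwise check is a brute-force choice, which is adequate for N ≤ 63.

```cpp
#include <cassert>
#include <map>
#include <string>
using namespace std;

// Hypothetical helper: number of bits needed to encode `text`
// with the table `code` (character -> bit string).
size_t EncodedBits(const string& text, const map<char, string>& code) {
    size_t bits = 0;
    for (char c : text) bits += code.at(c).size();
    return bits;
}

// Hypothetical helper: true if no codeword is a prefix of (or equal to)
// another character's codeword. Brute force over all ordered pairs.
bool IsPrefixFree(const map<char, string>& code) {
    for (const auto& a : code)
        for (const auto& b : code)
            if (a.first != b.first &&
                b.second.compare(0, a.second.size(), a.second) == 0)
                return false;  // a.second is a prefix of b.second
    return true;
}
```

For the example, both `{a=0, x=10, u=110, z=111}` and the rejected `{a=0, x=01, u=011, z=001}` total 14 bits on "aaaxuaxz". Only the first is prefix-free, because '0' is a prefix of '01' in the second. This is why length alone cannot decide correctness.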

Input Specification:
Each input file contains one test case. For each case, the first line gives an integer N (2≤N≤63), then followed by a line that contains all the N distinct characters and their frequencies in the following format:

c[1] f[1] c[2] f[2] … c[N] f[N]
where c[i] is a character chosen from {‘0’ - ‘9’, ‘a’ - ‘z’, ‘A’ - ‘Z’, ‘_’}, and f[i] is the frequency of c[i] and is an integer no more than 1000. The next line gives a positive integer M (≤1000), then followed by M student submissions. Each student submission consists of N lines, each in the format:

c[i] code[i]
where c[i] is the i-th character and code[i] is a non-empty string of no more than 63 '0's and '1's.

Output Specification:
For each test case, print in each line either “Yes” if the student’s submission is correct, or “No” if not.

Note: The optimal solution is not necessarily generated by Huffman algorithm. Any prefix code with code length being optimal is considered correct.

Sample Input:
7
A 1 B 1 C 1 D 3 E 3 F 6 G 6
4
A 00000
B 00001
C 0001
D 001
E 01
F 10
G 11
A 01010
B 01011
C 0100
D 011
E 10
F 11
G 00
A 000
B 001
C 010
D 011
E 100
F 101
G 110
A 00000
B 00001
C 0001
D 001
E 00
F 10
G 11
Sample Output:
Yes
Yes
No
No

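This problem mainly tests knowledge of Huffman trees and heaps.
1. Building a Huffman tree with a heap, together with building a heap and inserting into and deleting from it. A heap can be built in two ways: by successive insertion, or by adjusting data that is already stored in the array.
2. Building the Huffman tree, computing its weighted path length (WPL), and deciding whether the submitted binary codes form a valid optimal prefix code (this is the difficult part).
3. The solution here builds the Huffman tree with a heap.
4. A Huffman tree can also be built in other ways, as in the code below.
The amount of code is somewhat large.
1) First, a summary of heap usage: building, insertion, deletion, and adjustment.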
I. Creating a min-heap and inserting into it:

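#include <iostream>
using namespace std;

const int MaxSize = 1001;     // maximum number of elements
const int MinData = -10001;   // sentinel: must be smaller than every element
typedef int ElemType;
typedef struct HeapStruct
{
	ElemType* elements;   // elements[0] holds the sentinel, data starts at index 1
	int size;             // current number of elements
	int capacity;         // maximum number of elements
}HeapStruct;
typedef struct HeapStruct* MinHeap;
// Create an empty min-heap
MinHeap CreateHeap()
{
	MinHeap H = new HeapStruct;
	H->elements = new ElemType[MaxSize + 1];  // +1 for the sentinel at index 0
	H->size = 0;
	H->capacity = MaxSize;
	H->elements[0] = MinData;
	return H;
}
// Insert an element into the min-heap (percolate up)
void Insert(MinHeap H, ElemType item)  // H is already a pointer, so no reference is needed
{
	int i;
	if (H->size >= H->capacity)
	{
		cout << "The min-heap is full";
		return;
	}
	i = ++H->size;  // position of the new last slot
	// Move larger parents down; the sentinel at elements[0] stops the loop at the root
	for (; H->elements[i / 2] > item; i /= 2)
	{
		H->elements[i] = H->elements[i / 2];
	}
	H->elements[i] = item;
}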

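II. Creating a max-heap, with insertion and deletion. Creation and insertion are almost the same as for the min-heap, so only deletion is shown here.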

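#include <cstdio>

typedef int ElementType;
const ElementType ERROR = -1;  /* returned when the heap is empty (assumes -1 is never stored) */
typedef struct HNode* MaxHeap;
struct HNode {
    ElementType* Data;  /* Data[1..Size] holds the elements */
    int Size;           /* current number of elements */
    int Capacity;       /* maximum number of elements */
};
bool IsEmpty( MaxHeap H ) { return H->Size == 0; }

ElementType DeleteMax( MaxHeap H )
{ /* Remove and return the largest element of max-heap H */
    int Parent, Child;
    ElementType MaxItem, X;

    if ( IsEmpty(H) ) {
        printf("The max-heap is empty");
        return ERROR;
    }

    MaxItem = H->Data[1]; /* the root holds the maximum */
    /* Take the last element and percolate it down from the root */
    X = H->Data[H->Size--]; /* note that the heap size shrinks */
    for( Parent=1; Parent*2<=H->Size; Parent=Child ) {
        Child = Parent * 2;
        if( (Child!=H->Size) && (H->Data[Child]<H->Data[Child+1]) )
            Child++;  /* Child points to the larger of the two children */
        if( X >= H->Data[Child] ) break; /* found the right position */
        else  /* move X down */
            H->Data[Parent] = H->Data[Child];
    }
    H->Data[Parent] = X;

    return MaxItem;
}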
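Section II skips insertion for the max-heap. Here is a hedged sketch of what it could look like, mirroring the min-heap `Insert` from section I. It uses `std::vector` and a sentinel at index 0. `MaxHeapSketch`, `InsertMax`, and `IsMaxHeap` are hypothetical names. Using `INT_MAX` as the sentinel is an assumption: it only needs to be at least as large as every stored value.

```cpp
#include <cassert>
#include <climits>
#include <vector>
using namespace std;

// Hypothetical 1-based max-heap; data[0] is a sentinel at least as large as
// any element, so the percolate-up loop needs no explicit "i > 1" test.
struct MaxHeapSketch {
    vector<int> data{INT_MAX};
    int size() const { return (int)data.size() - 1; }
};

// Insert: open a new last slot, then move smaller parents down into it.
void InsertMax(MaxHeapSketch& h, int item) {
    h.data.push_back(item);
    int i = h.size();
    for (; h.data[i / 2] < item; i /= 2)  // stops at i == 1 at the latest (sentinel)
        h.data[i] = h.data[i / 2];
    h.data[i] = item;
}

// Verify the max-heap order: no child is larger than its parent.
bool IsMaxHeap(const MaxHeapSketch& h) {
    for (int i = 2; i <= h.size(); ++i)
        if (h.data[i / 2] < h.data[i]) return false;
    return true;
}
```

After inserting 3, 9, 1, 7, 5 in that order, the root `data[1]` is 9 and every parent is at least as large as its children.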

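III. There are two ways to build a heap.
The first is to build by insertion: insert the elements one at a time, adjusting after each insertion so the array stays a max-heap (or min-heap).
The second is to place the N elements into the array in input order first, which already satisfies the structural property of a complete binary tree, and then adjust the nodes so that the order property holds.
The first method is simple: loop N times and insert each element. This takes O(N log N) time.
The second method places the N elements first and then adjusts them. Placing them is easy: a complete binary tree is stored level by level in an array (the children of node i are 2i and 2i+1), so the elements can simply be stored in order. What remains is the adjustment, which percolates down every non-leaf node from the parent of the last node back to the root and takes O(N) time overall. The code is as follows.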

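/* Assumes ElementType and MaxHeap as used by DeleteMax above:
   the elements are stored in H->Data[1..H->Size]. */
void PercDown( MaxHeap H, int p )
{ /* Percolate down: adjust the subtree rooted at H->Data[p] into a max-heap */
    int Parent, Child;
    ElementType X;

    X = H->Data[p]; /* take out the value stored at the subtree root */
    for( Parent=p; Parent*2<=H->Size; Parent=Child ) {
        Child = Parent * 2;
        if( (Child!=H->Size) && (H->Data[Child]<H->Data[Child+1]) )
            Child++;  /* Child points to the larger of the two children */
        if( X >= H->Data[Child] ) break; /* found the right position */
        else  /* move X down */
            H->Data[Parent] = H->Data[Child];
    }
    H->Data[Parent] = X;
}

void BuildHeap( MaxHeap H )
{ /* Rearrange H->Data[] so that it satisfies the max-heap order */
  /* Assumes all H->Size elements are already stored in H->Data[] */

    int i;

    /* Start from the parent of the last node and work back to the root (index 1) */
    for( i = H->Size/2; i>0; i-- )
        PercDown( H, i );
}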
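The following sketch builds a 1-based max-heap both ways from the same input. It is meant to show that the two construction methods in section III both produce a valid heap, though not necessarily the same array layout. It uses `std::vector` instead of the `MaxHeap` struct. `SiftDown`, `BuildBottomUp`, `BuildByInsertion`, and `SatisfiesMaxHeap` are hypothetical names.

```cpp
#include <cassert>
#include <vector>
using namespace std;

// a[0] is unused padding, so the children of i are 2i and 2i+1.
// Percolate a[p] down within a[1..n] (same logic as PercDown above).
void SiftDown(vector<int>& a, int p, int n) {
    int x = a[p], parent, child;
    for (parent = p; parent * 2 <= n; parent = child) {
        child = parent * 2;
        if (child != n && a[child] < a[child + 1]) child++;  // larger child
        if (x >= a[child]) break;
        a[parent] = a[child];
    }
    a[parent] = x;
}

// Method 2: place all elements in order, then sift down from n/2 to 1. O(N).
vector<int> BuildBottomUp(const vector<int>& values) {
    vector<int> a(1, 0);
    a.insert(a.end(), values.begin(), values.end());
    int n = (int)values.size();
    for (int i = n / 2; i > 0; --i) SiftDown(a, i, n);
    return a;
}

// Method 1: insert one element at a time, percolating it up. O(N log N).
vector<int> BuildByInsertion(const vector<int>& values) {
    vector<int> a(1, 0);
    for (int v : values) {
        a.push_back(v);
        int i = (int)a.size() - 1;
        while (i > 1 && a[i / 2] < v) { a[i] = a[i / 2]; i /= 2; }
        a[i] = v;
    }
    return a;
}

// True if no element in a[1..] is larger than its parent.
bool SatisfiesMaxHeap(const vector<int>& a) {
    for (size_t i = 2; i < a.size(); ++i)
        if (a[i / 2] < a[i]) return false;
    return true;
}
```

The bottom-up version is asymptotically cheaper because most nodes sit near the bottom of the tree and can only move a few levels down.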

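That covers the basic heap operations. Next come the basic Huffman tree operations:
initializing a tree node, building the Huffman tree, and computing its WPL (weighted path length).
The first method builds the Huffman tree with a min-heap.
The code is as follows: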

struct TreeNode {
	int weight;
	struct TreeNode* left, * right;
};
typedef struct TreeNode* HuffmanTree;
// Create a tree node with weight 0 and no children
HuffmanTree CreateTree()
{
	HuffmanTree H;
	H = new struct TreeNode;
	H->left = H->right = NULL;
	H->weight = 0;
	return H;
}
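// Build the Huffman tree by merging the two lightest trees N-1 times.
// Assumes a MinHeap whose elements are HuffmanTree pointers, with DeleteMin/Insert
// as in the complete solution below (not the int heap from section I).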
HuffmanTree buildTree(MinHeap H)
{
	HuffmanTree T;
	int num = H->size;
	for (int i = 1;i < num;i++)
	{
		T = CreateTree();
		T->left = DeleteMin(H);
		T->right = DeleteMin(H);
		T->weight = T->left->weight + T->right->weight;
		Insert(H, T);
	}
	T = DeleteMin(H);
	return T;
}
// Compute the WPL: the sum of depth * weight over all leaves; call as WPL(root, 0)
int WPL(HuffmanTree Root, int depth)
{
	if ((Root->left == NULL) && (Root->right == NULL))
		return depth * Root->weight;
	else
		return WPL(Root->left, depth + 1) + WPL(Root->right, depth + 1);
}
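The WPL can also be obtained without building the tree explicitly. Each merge of the two lightest trees pushes every leaf underneath one level deeper, so the WPL equals the sum of all merged weights. The sketch below replaces the hand-written heap with the standard library's `std::priority_queue` using `std::greater`, which makes it a min-heap. `HuffmanWPL` is a hypothetical name. For the sample weights 1 1 1 3 3 6 6, the merged weights are 2, 3, 6, 9, 12, 21, which sum to 53. This matches the total length of the first sample submission.

```cpp
#include <cassert>
#include <functional>
#include <queue>
#include <vector>
using namespace std;

// Hypothetical helper: WPL of a Huffman tree for the given frequencies,
// computed as the sum of the weights created by each merge.
long long HuffmanWPL(const vector<int>& freq) {
    priority_queue<long long, vector<long long>, greater<long long>> pq(freq.begin(), freq.end());
    long long wpl = 0;
    while (pq.size() > 1) {
        long long a = pq.top(); pq.pop();  // lightest tree
        long long b = pq.top(); pq.pop();  // second lightest tree
        wpl += a + b;                      // all leaves under the new node go one level deeper
        pq.push(a + b);
    }
    return wpl;
}
```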

The second method of building a Huffman tree (covered in the textbook):
The code is as follows:

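#include<iostream>
using namespace std;
typedef struct {
	char data;      // character (meaningful for leaves only)
	double weight;  // frequency
	int parent;     // index of the parent, -1 if none
	int lchild;     // index of the left child, -1 if none
	int rchild;     // index of the right child, -1 if none
}HTNode;
// ht[0..n0-1] must already hold the n0 leaves (data and weight);
// the n0-1 internal nodes are written to ht[n0..2*n0-2].
void CreateHT(HTNode ht[], int n0)
{
	int i, k, lnode, rnode;
	double min1, min2;
	for (i = 0;i < 2 * n0 - 1;i++)
		ht[i].parent = ht[i].lchild = ht[i].rchild = -1;
	for (i = n0;i <= 2 * n0 - 2;i++)
	{
		min1 = min2 = 32767;  // assumes every weight is below 32767
		lnode = rnode = -1;
		for (k = 0;k < i;k++)  // scan every node built so far, including ht[i-1]
			if (ht[k].parent == -1)  // only roots of the current forest
			{
				if (ht[k].weight < min1)  // new smallest: the old smallest becomes second
				{
					min2 = min1;rnode = lnode;
					min1 = ht[k].weight;lnode = k;
				}
				else if (ht[k].weight < min2)  // new second smallest
				{
					min2 = ht[k].weight;rnode = k;
				}
			}
		ht[i].weight = ht[lnode].weight + ht[rnode].weight;
		ht[i].lchild = lnode;ht[i].rchild = rnode;
		ht[lnode].parent = i;ht[rnode].parent = i;
	}
}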
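Once `CreateHT` has filled the table, each leaf's code can be read off by walking parent links up to the root. The sketch below assumes a left branch is labeled 0 and a right branch 1. `HTNodeSketch` restates the `HTNode` layout so that the block compiles on its own. `LeafCode` and `kExample` are hypothetical names. The contents of `kExample` were traced by hand from what `CreateHT` (with the scan bound `k < i`) produces for the "aaaxuaxz" weights a=4, x=2, u=1, z=1.

```cpp
#include <cassert>
#include <string>
using namespace std;

// Same layout as HTNode above, restated so this sketch is self-contained.
struct HTNodeSketch { char data; double weight; int parent, lchild, rchild; };

// Hypothetical helper: walk from leaf i up to the root, prepending '0'
// when the current node is its parent's left child and '1' otherwise.
string LeafCode(const HTNodeSketch ht[], int i) {
    string code;
    for (int c = i, f = ht[i].parent; f != -1; c = f, f = ht[f].parent)
        code.insert(code.begin(), ht[f].lchild == c ? '0' : '1');
    return code;
}

// Leaves a, x, u, z at indices 0..3; internal nodes at 4..6 (6 is the root).
const HTNodeSketch kExample[7] = {
    {'a', 4, 6, -1, -1}, {'x', 2, 5, -1, -1}, {'u', 1, 4, -1, -1},
    {'z', 1, 4, -1, -1}, {' ', 2, 5, 2, 3},   {' ', 4, 6, 1, 4},
    {' ', 8, -1, 0, 5},
};
```

This yields a=0, x=10, u=110, z=111, which is the first code set from the problem statement.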

The complete code for this problem:

#include<iostream>
#include<cstring>
using namespace std;
typedef int ElemType;
const int MinData = -1;
const int MaxSize = 100;
const int ERROR = -1;
int N;
char ch[MaxSize];
int w[MaxSize], TotalCodes;
struct TreeNode {
	int weight;
	struct TreeNode* left, * right;
};
typedef struct TreeNode* HuffmanTree;
typedef struct Hnode {
	HuffmanTree data[MaxSize];
	int size;
	int capacity;
}MinHnode;
typedef MinHnode* MinHeap;
// Create an empty min-heap of Huffman tree nodes
MinHeap CreateHeap()
{
	MinHeap H = new MinHnode;
	H->data[0] = new struct TreeNode;  // sentinel node at index 0; its weight MinData is below every frequency, so Insert stops at the root
	H->size = 0;
	H->capacity = MaxSize;
	H->data[0]->weight = MinData;
	H->data[0]->left = H->data[0]->right = NULL;
	return H;
}
// Remove and return the node with the smallest weight (percolate down)
HuffmanTree DeleteMin(MinHeap H)
{
	HuffmanTree Mintem, temp;
	int parent, child;
	Mintem = H->data[1];
	temp = H->data[H->size--];
	for (parent = 1;parent * 2 <= H->size;parent = child)
	{
		child = parent * 2;
		if ((child != H->size) && (H->data[child]->weight > H->data[child + 1]->weight))
			child++;
		if (temp->weight <= H->data[child]->weight)
			break;
		else
		{
			H->data[parent] = H->data[child];
		}
	}
	H->data[parent] = temp;
	return Mintem;
}
// Insert a Huffman tree node into the min-heap (percolate up; the sentinel at data[0] stops the loop)
void Insert(MinHeap H, HuffmanTree item)
{
	int i = ++H->size;
	while (item->weight < H->data[i / 2]->weight)
	{
		H->data[i] = H->data[i / 2];
		i /= 2;
	}
	H->data[i] = item;
}
// Create a tree node with weight 0 and no children
HuffmanTree CreateTree()
{
	HuffmanTree H;
	H = new struct TreeNode;
	H->left = H->right = NULL;
	H->weight = 0;
	return H;
}
// Build the Huffman tree by merging the two lightest trees N-1 times
HuffmanTree buildTree(MinHeap H)
{
	HuffmanTree T;
	int num = H->size;
	for (int i = 1;i < num;i++)
	{
		T = CreateTree();
		T->left = DeleteMin(H);
		T->right = DeleteMin(H);
		T->weight = T->left->weight + T->right->weight;
		Insert(H, T);
	}
	T = DeleteMin(H);
	return T;
}
// Compute the WPL: the sum of depth * weight over all leaves
int WPL(HuffmanTree Root, int depth)
{
	if ((Root->left == NULL) && (Root->right == NULL))
		return depth * Root->weight;
	else
		return WPL(Root->left, depth + 1) + WPL(Root->right, depth + 1);
}
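// Check one submission: it is correct if the codes form a prefix code
// (built here as a binary trie) and its total length equals the optimal WPL
bool judge()
{
	HuffmanTree T, p;
	char ch1, codes[MaxSize];
	int length = 0, flag = 1, j, wgh, len;
	T = CreateTree();  // root of the trie for this submission
	for (int i = 0;i < N;i++)
	{
		cin>>ch1>>codes;  // keep reading all N lines even after a failure
		len = strlen(codes);
		if (len >= N)  // with positive frequencies, an optimal code for N symbols has length <= N-1
			flag = 0;
		else {
			for (j = 0;ch1 != ch[j];j++);  // find the character's index
			wgh = w[j];  // its frequency
			p = T;
			for (j = 0;j < len;j++)
			{
				if (codes[j] == '0')  // go to (or create) the left child
				{
					if (!p->left)
						p->left = CreateTree();
					p = p->left;
				}
				else if (codes[j] == '1')  // go to (or create) the right child
				{
					if (!p->right)
						p->right = CreateTree();
					p = p->right;
				}
				// A nonzero weight marks a codeword, so an earlier code is a prefix of (or equal to)
				// this one. This marker assumes every frequency is positive.
				if (p->weight) flag = 0;
			}
			if (p->left || p->right)  // not a leaf: this code is a prefix of an earlier one
				flag = 0;
			else
				p->weight = wgh;  // mark the leaf with the character's weight
			length += len * wgh;  // accumulate the weighted code length
		}
	}
	if (length != TotalCodes)  // the total length must equal the optimal WPL
		flag = 0;
	return flag;
}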
int main()
{
	int M;
	HuffmanTree tmp, root;
	cin>>N;
	MinHeap H = CreateHeap();
	for (int i = 0;i < N;i++)
	{
		cin>>ch[i]>>w[i];  // operator>> skips whitespace, so no getchar() is needed
		tmp = CreateTree();
		tmp->weight = w[i];
		Insert(H, tmp);  // build the heap by successive insertion
	}
	root = buildTree(H);
	TotalCodes = WPL(root, 0);
	cin>>M;
	for (int i = 0;i < M;i++)
	{
		if (judge())
			cout<<"Yes"<<endl;
		else
			cout<<"No"<<endl;
	}
	return 0;
}
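`judge()` checks the prefix property by building a trie. A hedged alternative, not the author's method, sorts the codewords instead. In lexicographic order, every string that has codeword c as a prefix comes immediately after c, so comparing neighbors is enough. `PrefixFreeSorted` and `Accept` are hypothetical names. `Accept` applies the problem's rule (optimal total length plus prefix-free) and assumes each submission lists every character exactly once.

```cpp
#include <algorithm>
#include <cassert>
#include <map>
#include <string>
#include <vector>
using namespace std;

// Hypothetical helper: after sorting, a codeword that is a prefix of another
// is also a prefix of its immediate successor, so adjacent pairs suffice.
bool PrefixFreeSorted(vector<string> codes) {
    sort(codes.begin(), codes.end());
    for (size_t i = 0; i + 1 < codes.size(); ++i)
        if (codes[i + 1].compare(0, codes[i].size(), codes[i]) == 0)
            return false;  // codes[i] is a prefix of (or equal to) codes[i + 1]
    return true;
}

// Hypothetical helper: accept when the weighted length equals the optimal WPL
// and the code is prefix-free.
bool Accept(const map<char, int>& freq, const map<char, string>& sub, long long optimalWpl) {
    long long len = 0;
    vector<string> codes;
    for (const auto& kv : sub) {
        len += (long long)kv.second.size() * freq.at(kv.first);
        codes.push_back(kv.second);
    }
    return len == optimalWpl && PrefixFreeSorted(codes);
}
```

On the sample input (optimal WPL 53), this accepts the first two submissions. It rejects the third because its total length is 63, and the fourth because E=00 is a prefix of A=00000.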

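Another solution builds the heap the other way (placing all elements in the array first and then adjusting them) and then builds the Huffman tree from it.

I will write that code when I have time...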
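Here is a sketch of that unwritten variant (not the author's code). It places all leaf weights in a 1-based array, heapifies the array bottom-up with a min-heap version of `PercDown`, and then performs the usual N-1 merges. Only the weights are kept, which is enough to compute the WPL that `judge()` compares against. Building real `TreeNode`s would follow the same pattern. `WeightHeap`, `PercDownMin`, `PopMin`, `PushMin`, and `HuffmanWplBottomUp` are hypothetical names.

```cpp
#include <cassert>
#include <vector>
using namespace std;

// Hypothetical 1-based min-heap of weights: a[0] is unused padding.
struct WeightHeap {
    vector<long long> a{0};
    int size() const { return (int)a.size() - 1; }
};

// Percolate a[p] down until neither child is smaller.
void PercDownMin(WeightHeap& h, int p) {
    int n = h.size(), parent, child;
    long long x = h.a[p];
    for (parent = p; parent * 2 <= n; parent = child) {
        child = parent * 2;
        if (child != n && h.a[child + 1] < h.a[child]) child++;  // smaller child
        if (x <= h.a[child]) break;
        h.a[parent] = h.a[child];
    }
    h.a[parent] = x;
}

// Remove and return the smallest weight (the heap must be non-empty).
long long PopMin(WeightHeap& h) {
    long long top = h.a[1];
    h.a[1] = h.a.back();  // move the last element to the root
    h.a.pop_back();
    if (h.size() > 0) PercDownMin(h, 1);
    return top;
}

// Insert a weight, percolating it up toward the root.
void PushMin(WeightHeap& h, long long w) {
    h.a.push_back(w);
    int i = h.size();
    for (; i > 1 && h.a[i / 2] > w; i /= 2) h.a[i] = h.a[i / 2];
    h.a[i] = w;
}

long long HuffmanWplBottomUp(const vector<int>& freq) {
    WeightHeap h;
    for (int f : freq) h.a.push_back(f);                       // place all leaves in order
    for (int i = h.size() / 2; i > 0; --i) PercDownMin(h, i);  // bottom-up heapify, O(N)
    long long wpl = 0;
    while (h.size() > 1) {
        long long x = PopMin(h);
        long long y = PopMin(h);
        wpl += x + y;  // every leaf under the merged node gains one level
        PushMin(h, x + y);
    }
    return wpl;
}
```

For the sample weights 1 1 1 3 3 6 6 it returns 53, and for the "aaaxuaxz" weights 4 2 1 1 it returns 14.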