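This is the course project for my school's Data Structures and Algorithms class.
The problem statement is as follows:
Huffman tree
Construct a Huffman code for the following English text: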
“Effificient and robust facial landmark localisation is crucial for the deployment of real-time face analysis systems. This paper presents a new loss function, namely Rectifified Wing (RWing) loss, for regression-based facial landmark localisation with Convolutional Neural Networks (CNNs). We fifirst systemically analyse different loss functions, including L2, L1 and smooth L1. The analysis suggests that the training of a network should pay more attention to small-medium errors. Motivated by this finding, we design a piece-wise loss that amplififies the impact of the samples with small-medium errors. Besides, we rectify the loss function for very small errors to mitigate the impact of inaccuracy of manual annotation”
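The requirements are as follows:
1) Compute the probability of each character and build a Huffman tree using the probabilities as weights. Write out the construction process, draw the final Huffman tree, and derive the Huffman code of each character.
Solution: the following code computes the probability of each character and prints the characters in ascending order of probability: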
#include <iostream>
#include <string>
using namespace std;

// Huffman tree node
typedef struct HFnode {
    char data;
    float weight; // weight of the node: the character's probability
    HFnode *lchild;
    HFnode *rchild;
} *HFtree, HFnode;

// Linked queue. A circular list is not needed here, so a plain singly linked list is used.
typedef struct QueueNode {
    HFnode data;
    QueueNode *next;
} QueueNode, *QueueNodePtr;

typedef struct LinkQueue {
    // The queue has a dummy head node that stores no data.
    // Empty condition: rear == front. There is no "full" condition.
    QueueNode *front;
    QueueNode *rear;
} LinkQueue;

bool initLinkQueue(LinkQueue &Q) {
    QueueNode *p = new QueueNode;
    p->next = NULL;
    Q.rear = Q.front = p;
    return true;
}

// Enqueue at the tail
bool EnQueue(LinkQueue &Q, HFnode e) {
    QueueNode *p = new QueueNode;
    p->data = e;
    p->next = NULL;
    Q.rear->next = p;
    Q.rear = p;
    return true;
}

// Dequeue from the head; returns false if the queue is empty
bool DeQueue(LinkQueue &Q, HFnode &e) {
    if (Q.front == Q.rear)
        return false;
    QueueNode *p = Q.front->next;
    e = p->data;
    Q.front->next = p->next;
    if (Q.front->next == NULL) {
        Q.rear = Q.front; // the last element has left: point both pointers at the head node
    }
    delete p;
    return true;
}

int getFreq(char c, const string &str);
HFnode merge(HFnode HFnode1, HFnode HFnode2);
double probabilities[128]; // probabilities[c] is the probability of the character with ASCII code c
bool flags[128]; // flags[c] is true once character c has been picked; must be initialized
void test1_PrintData();
void initFlags();
int getMin();
void PreOrder(HFtree root, string code);

int main() {
string str = "Effificient and robust facial landmark localisation is crucial for the deployment of real-time face analysis systems. This paper presents a new loss function, namely Rectifified Wing (RWing) loss, for regression-based facial landmark localisation with Convolutional Neural Networks (CNNs). We fifirst systemically analyse different loss functions, including L2, L1 and smooth L1. The analysis suggests that the training of a network should pay more attention to small-medium errors. Motivated by this finding, we design a piece-wise loss that amplififies the impact of the samples with small-medium errors. Besides, we rectify the loss function for very small errors to mitigate the impact of inaccuracy of manual annotation";
    double len = str.length(); // text length, stored as a double to avoid integer division
    // Compute the probability of every ASCII character in the text
    for (int i = 0; i < 128; i++) {
        probabilities[i] = getFreq(char(i), str) / len;
    }
    // test1_PrintData(); // checkpoint 1: print every probability and check that they sum to 1
    // Start building the Huffman tree
    initFlags();
    // Build the queue
    LinkQueue Q;
    initLinkQueue(Q);
    while (true) { // repeatedly pick the unused character with the smallest non-zero probability
        int index = getMin();
        if (index == -1) {
            break;
        }
        // Build a leaf node and enqueue it (EnQueue stores a copy)
        HFnode node;
        node.data = char(index);
        node.weight = probabilities[index];
        node.rchild = node.lchild = NULL;
        EnQueue(Q, node);
    }
    HFnode e;
    while (DeQueue(Q, e)) {
        cout << e.data << ": " << e.weight << endl;
    }
}

/// Count how many times character c occurs in str
int getFreq(char c, const string &str) {
    int freq = 0;
    for (char i : str) {
        if (i == c)
            freq++;
    }
    return freq;
}

/// Initialize flags: set every entry to false
void initFlags() {
    for (bool &flag : flags) {
        flag = false;
    }
}

/// Return the index of the smallest unused non-zero entry of probabilities and mark it as used.
/// Returns -1 once every character has been picked.
int getMin() {
    int index = -1;
    for (int i = 0; i < 128; i++) {
        if (probabilities[i] > 1e-9 && !flags[i]) {
            index = i;
            break;
        }
    }
    if (index == -1)
        return index; // no usable entries are left
    for (int i = 0; i < 128; i++) {
        if (probabilities[i] < probabilities[index] && probabilities[i] > 1e-9 && !flags[i])
            index = i;
    }
    flags[index] = true;
    return index;
}
Output of the run:
C:\Users\1\Desktop\DS\cmake-build-debug\DS.exe
2: 0.00138122
B: 0.00138122
E: 0.00138122
M: 0.00138122
(: 0.00276243
): 0.00276243
1: 0.00276243
C: 0.00276243
R: 0.00276243
T: 0.00276243
L: 0.00414365
W: 0.00414365
b: 0.00414365
v: 0.00414365
N: 0.00552486
k: 0.00552486
-: 0.00690608
.: 0.00690608
,: 0.00828729
w: 0.0110497
g: 0.0138122
p: 0.0138122
y: 0.0179558
u: 0.019337
h: 0.0207182
d: 0.0220994
c: 0.0290055
m: 0.0303867
f: 0.0372928
r: 0.0428177
l: 0.0497238
n: 0.0566298
o: 0.0566298
t: 0.0662983
a: 0.0718232
s: 0.0732044
e: 0.0745856
i: 0.0773481
: 0.143646
Process finished with exit code 0
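A Huffman tree is constructed as follows:
- From the n given weights {w1,w2,w3,…,wn}, form a set of n binary trees F={T1,T2,T3,…,Tn}, where each tree Ti consists of a single root node with weight wi and empty left and right subtrees.
- Pick the two trees in F whose roots have the smallest weights and make them the left and right subtrees of a new binary tree. The weight of the new root is the sum of the weights of the two subtree roots.
- Remove these two trees from F and add the new tree to F.
- Repeat the two preceding steps until F contains only one tree. That tree is a Huffman tree.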
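The steps above can be sketched with a min-heap standing in for the set F. This is only an illustration and not the program used in this write-up: `huffmanCodeLengths` is a hypothetical helper that returns the code length (tree depth) of each symbol rather than building explicit tree nodes.

```cpp
#include <functional>
#include <queue>
#include <utility>
#include <vector>

// Hypothetical helper: returns the Huffman code length of each symbol,
// given the symbols' weights. Internal nodes get indices n, n+1, ...
std::vector<int> huffmanCodeLengths(const std::vector<double> &weights) {
    const int n = static_cast<int>(weights.size());
    std::vector<int> parent(n, -1); // parent[i] = index of node i's parent, -1 for a root

    // Min-heap of (weight, node index): the two lightest trees are always on top.
    typedef std::pair<double, int> Item;
    std::priority_queue<Item, std::vector<Item>, std::greater<Item> > heap;
    for (int i = 0; i < n; i++)
        heap.push(Item(weights[i], i));

    // Merge the two lightest trees until a single tree remains.
    while (heap.size() > 1) {
        Item a = heap.top(); heap.pop();
        Item b = heap.top(); heap.pop();
        int merged = static_cast<int>(parent.size());
        parent.push_back(-1);
        parent[a.second] = merged;
        parent[b.second] = merged;
        heap.push(Item(a.first + b.first, merged));
    }

    // The code length of a leaf is its depth, i.e. the number of ancestors.
    std::vector<int> lengths(n, 0);
    for (int i = 0; i < n; i++)
        for (int p = parent[i]; p != -1; p = parent[p])
            lengths[i]++;
    return lengths;
}
```

For the weights {5, 9, 12, 13, 16, 45}, the merges are 5+9, 12+13, 14+16, 25+30 and 45+55, which gives the code lengths {4, 4, 3, 3, 3, 1}.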
Following these rules, the resulting Huffman tree and codes are:
[Figure: the final Huffman tree (image unavailable)]
' ': 100
'n': 0000
'o': 0010
'r': 0101
'l': 0111
't': 1010
'a': 1011
's': 1100
'e': 1110
'i': 1111
'c': 00011
'm': 00110
'h': 01001
'd': 01100
'f': 11011
'p': 000101
'w': 011010
'g': 011011
'y': 110100
'u': 110101
'-': 0001000
'.': 0001001
',': 0011101
'b': 0100000
'v': 0100001
'N': 0100010
'k': 0100011
'L': 00111110
'W': 00111111
'(': 001110000
')': 001110001
'1': 001110010
'C': 001110011
'R': 001111000
'T': 001111001
'E': 0011110100
'B': 0011110101
'M': 0011110110
'2': 0011110111
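A hand-built table like this is easy to get wrong by one bit, so it helps to check two properties mechanically: no code word may be a prefix of another, and for a full binary code tree the sum of 2^(-length) over all code words (the Kraft sum) is exactly 1. A minimal sketch of such a check follows; `isPrefixFree` and `kraftSum` are hypothetical helper names and are not part of the assignment code.

```cpp
#include <cmath>
#include <cstddef>
#include <string>
#include <vector>

// True if no code word in `codes` is a prefix of (or equal to) another one.
bool isPrefixFree(const std::vector<std::string> &codes) {
    for (std::size_t i = 0; i < codes.size(); i++)
        for (std::size_t j = 0; j < codes.size(); j++)
            // Compare the first codes[i].size() characters of codes[j] with codes[i].
            if (i != j && codes[j].compare(0, codes[i].size(), codes[i]) == 0)
                return false;
    return true;
}

// Sum of 2^(-length) over all code words.
double kraftSum(const std::vector<std::string> &codes) {
    double sum = 0.0;
    for (const std::string &code : codes)
        sum += std::ldexp(1.0, -static_cast<int>(code.size()));
    return sum;
}
```

For example, {"0", "10", "110", "111"} is prefix-free with a Kraft sum of 1, whereas a table containing both "00011" and "00011111" is rejected because the first word is a prefix of the second.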
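2) Implement the Huffman-code design process above in code and print the Huffman code of each character. (Include the code and a screenshot of the run result.)
The source code is as follows: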
#include <cmath>
#include <cstdio>
#include <iostream>
#include <string>
using namespace std;

// Huffman tree node
typedef struct HFnode {
    char data;
    float weight; // weight of the node: the character's probability
    HFnode *lchild;
    HFnode *rchild;
} *HFtree, HFnode;

// Linked queue. A circular list is not needed here, so a plain singly linked list is used.
typedef struct QueueNode {
    HFnode data;
    QueueNode *next;
} QueueNode, *QueueNodePtr;

typedef struct LinkQueue {
    // The queue has a dummy head node that stores no data.
    // Empty condition: rear == front. There is no "full" condition.
    QueueNode *front;
    QueueNode *rear;
} LinkQueue;

bool initLinkQueue(LinkQueue &Q) {
    QueueNode *p = new QueueNode;
    p->next = NULL;
    Q.rear = Q.front = p;
    return true;
}

// Enqueue at the tail
bool EnQueue(LinkQueue &Q, HFnode e) {
    QueueNode *p = new QueueNode;
    p->data = e;
    p->next = NULL;
    Q.rear->next = p;
    Q.rear = p;
    return true;
}

// Dequeue from the head; returns false if the queue is empty
bool DeQueue(LinkQueue &Q, HFnode &e) {
    if (Q.front == Q.rear)
        return false;
    QueueNode *p = Q.front->next;
    e = p->data;
    Q.front->next = p->next;
    if (Q.front->next == NULL) {
        Q.rear = Q.front; // the last element has left: point both pointers at the head node
    }
    delete p;
    return true;
}

// Insert e so that the queue stays sorted by ascending weight
bool JumpQueue(LinkQueue &Q, HFnode &e) {
    // q trails p. When the loop stops, p is the first node whose weight is
    // >= e.weight (or NULL at the end of the queue) and q is the node before it.
    QueueNodePtr q = Q.front;       // starts at the dummy head node
    QueueNodePtr p = Q.front->next; // first real element
    while (p != NULL && p->data.weight < e.weight) { // test p != NULL first to avoid dereferencing NULL
        q = p;
        p = p->next;
    }
    QueueNodePtr newQNode = new QueueNode;
    newQNode->data = e;
    newQNode->next = p;
    q->next = newQNode;
    if (p == NULL)
        Q.rear = newQNode; // inserted at the tail, so rear must follow
    return true;
}

int getFreq(char c, const string &str);
HFnode merge(HFnode HFnode1, HFnode HFnode2);
double probabilities[128]; // probabilities[c] is the probability of the character with ASCII code c
bool flags[128]; // flags[c] is true once character c has been picked; must be initialized
void test1_PrintData();
void initFlags();
int getMin();
void PreOrder(HFtree root, string code);

int main() {
string str = "Effificient and robust facial landmark localisation is crucial for the deployment of real-time face analysis systems. This paper presents a new loss function, namely Rectifified Wing (RWing) loss, for regression-based facial landmark localisation with Convolutional Neural Networks (CNNs). We fifirst systemically analyse different loss functions, including L2, L1 and smooth L1. The analysis suggests that the training of a network should pay more attention to small-medium errors. Motivated by this finding, we design a piece-wise loss that amplififies the impact of the samples with small-medium errors. Besides, we rectify the loss function for very small errors to mitigate the impact of inaccuracy of manual annotation";
    double len = str.length(); // text length, stored as a double to avoid integer division
    // Compute the probability of every ASCII character in the text
    for (int i = 0; i < 128; i++) {
        probabilities[i] = getFreq(char(i), str) / len;
    }
    // test1_PrintData(); // checkpoint 1: print every probability and check that they sum to 1
    // Start building the Huffman tree
    initFlags();
    // Build the queue
    LinkQueue Q;
    initLinkQueue(Q);
    while (true) { // repeatedly pick the unused character with the smallest non-zero probability
        int index = getMin();
        if (index == -1) {
            break;
        }
        // Build a leaf node and enqueue it (EnQueue stores a copy)
        HFnode node;
        node.data = char(index);
        node.weight = probabilities[index];
        node.rchild = node.lchild = NULL;
        EnQueue(Q, node);
    }
    HFnode HFnode1, HFnode2, newHFnode;
    HFtree root = NULL;
    while (DeQueue(Q, HFnode1)) {
        if (!DeQueue(Q, HFnode2)) {
            // HFnode1 was the only tree left, so it is the finished Huffman tree
            root = &HFnode1;
            break;
        }
        newHFnode = merge(HFnode1, HFnode2);
        JumpQueue(Q, newHFnode);
    }
    // The Huffman tree rooted at root is complete: traverse it to obtain the codes
    if (root != NULL)
        PreOrder(root, "");
    printf("finish!");
}

/// Count how many times character c occurs in str
int getFreq(char c, const string &str) {
    int freq = 0;
    for (char i : str) {
        if (i == c)
            freq++;
    }
    return freq;
}

/// Initialize flags: set every entry to false
void initFlags() {
    for (bool &flag : flags) {
        flag = false;
    }
}

/// Return the index of the smallest unused non-zero entry of probabilities and mark it as used.
/// Returns -1 once every character has been picked.
int getMin() {
    int index = -1;
    for (int i = 0; i < 128; i++) {
        if (probabilities[i] > 1e-9 && !flags[i]) {
            index = i;
            break;
        }
    }
    if (index == -1)
        return index; // no usable entries are left
    for (int i = 0; i < 128; i++) {
        if (probabilities[i] < probabilities[index] && probabilities[i] > 1e-9 && !flags[i])
            index = i;
    }
    flags[index] = true;
    return index;
}

/// Merge two trees into one whose root weight is the sum of their root weights
HFnode merge(HFnode HFnode1, HFnode HFnode2) {
    // Copy both roots to the heap so that the new root can point at them
    HFnode *node1 = new HFnode(HFnode1);
    HFnode *node2 = new HFnode(HFnode2);
    HFnode newNode;
    newNode.data = '\0'; // internal nodes carry no character
    newNode.weight = HFnode1.weight + HFnode2.weight;
    newNode.lchild = node1;
    newNode.rchild = node2;
    return newNode;
}

/// Preorder traversal that prints the code of every leaf: left branch = 1, right branch = 0
void PreOrder(HFtree root, string code) {
    if (root->lchild == NULL && root->rchild == NULL) {
        cout << root->data << ": " + code << endl;
    }
    if (root->lchild != NULL)
        PreOrder(root->lchild, code + "1");
    if (root->rchild != NULL)
        PreOrder(root->rchild, code + "0");
}

/// Checkpoint 1: print the probability of every character and check that they sum to 1
void test1_PrintData() {
    double sum = 0;
    for (int i = 0; i < 128; i++) {
        if (probabilities[i] != 0) {
            cout << char(i) << ": " << probabilities[i] << endl;
            sum += probabilities[i];
        }
    }
    if (fabs(sum - 1) < 0.00001)
        printf("Checkpoint 1: Pass!");
    else
        printf("Checkpoint 1: Fail!");
}
Run result:
C:\Users\1\Desktop\DS\cmake-build-debug\DS.exe
r: 1111
h: 11101
(: 11100111
): 11100110
E: 111001011
M: 111001010
2: 111001001
B: 111001000
R: 11100011
T: 11100010
1: 11100001
C: 11100000
d: 11011
w: 110101
-: 1101001
.: 1101000
l: 1100
n: 1011
o: 1010
g: 100111
p: 100110
c: 10010
m: 10001
b: 10000111
v: 10000110
L: 10000101
W: 10000100
y: 100000
t: 0111
a: 0110
: 010
s: 0011
e: 0010
f: 00011
,: 0001011
N: 00010101
k: 00010100
u: 000100
i: 0000
finish!
Process finished with exit code 0
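The printed table can be used directly to encode and decode text. A minimal sketch, assuming a table that holds only a handful of the codes printed above; `encode` and `decode` are illustrative helper names that do not appear in the program. Because the code is prefix-free, the decoder can emit a character as soon as the bits read so far match a code word.

```cpp
#include <map>
#include <string>

// Concatenate the code words of all characters in `text`.
// Throws std::out_of_range if a character is missing from the table.
std::string encode(const std::string &text, const std::map<char, std::string> &table) {
    std::string bits;
    for (char ch : text)
        bits += table.at(ch);
    return bits;
}

// Read bits one at a time and emit a character whenever the bits
// collected so far form a complete code word.
std::string decode(const std::string &bits, const std::map<char, std::string> &table) {
    std::map<std::string, char> reverse;
    for (const auto &entry : table)
        reverse[entry.second] = entry.first;

    std::string text, current;
    for (char bit : bits) {
        current += bit;
        std::map<std::string, char>::const_iterator it = reverse.find(current);
        if (it != reverse.end()) {
            text += it->second;
            current.clear();
        }
    }
    return text;
}
```

With the six entries 'i': 0000, 'e': 0010, 's': 0011, ' ': 010, 'a': 0110 and 't': 0111 taken from the output above, the string "a tie" encodes to 0110010011100000010 (19 bits) and decodes back to "a tie".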
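3) Analyze the efficiency of the algorithm, including at least its time complexity and space complexity.
① Time complexity
The analysis starts from the entry point, main: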
int main() {
string str = "Effificient and robust facial landmark localisation is crucial for the deployment of real-time face analysis systems. This paper presents a new loss function, namely Rectifified Wing (RWing) loss, for regression-based facial landmark localisation with Convolutional Neural Networks (CNNs). We fifirst systemically analyse different loss functions, including L2, L1 and smooth L1. The analysis suggests that the training of a network should pay more attention to small-medium errors. Motivated by this finding, we design a piece-wise loss that amplififies the impact of the samples with small-medium errors. Besides, we rectify the loss function for very small errors to mitigate the impact of inaccuracy of manual annotation";
    double len = str.length(); // text length, stored as a double to avoid integer division
    // Compute the probability of every ASCII character in the text
    for (int i = 0; i < 128; i++) {
        probabilities[i] = getFreq(char(i), str) / len;
    }
    // test1_PrintData(); // checkpoint 1: print every probability and check that they sum to 1
    // Start building the Huffman tree
    initFlags();
    // Build the queue
    LinkQueue Q;
    initLinkQueue(Q);
    while (true) { // repeatedly pick the unused character with the smallest non-zero probability
        int index = getMin();
        if (index == -1) {
            break;
        }
        // Build a leaf node and enqueue it (EnQueue stores a copy)
        HFnode node;
        node.data = char(index);
        node.weight = probabilities[index];
        node.rchild = node.lchild = NULL;
        EnQueue(Q, node);
    }
    HFnode HFnode1, HFnode2, newHFnode;
    HFtree root = NULL;
    while (DeQueue(Q, HFnode1)) {
        if (!DeQueue(Q, HFnode2)) {
            // HFnode1 was the only tree left, so it is the finished Huffman tree
            root = &HFnode1;
            break;
        }
        newHFnode = merge(HFnode1, HFnode2);
        JumpQueue(Q, newHFnode);
    }
    // The Huffman tree rooted at root is complete: traverse it to obtain the codes
    if (root != NULL)
        PreOrder(root, "");
    printf("finish!");
}
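Apart from the frequency-counting loop, which scans the text once for each of the 128 ASCII codes and is therefore linear in the text length, main contains two loops that dominate the running time. The first of them enqueues the characters that occur in the text in ascending order of probability. Inside it, getMin() finds the character that occurs in the text, has not been picked yet, and has the smallest probability. Its code is: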
int getMin() {
    int index = -1;
    for (int i = 0; i < 128; i++) {
        if (probabilities[i] > 1e-9 && !flags[i]) {
            index = i;
            break;
        }
    }
    if (index == -1)
        return index; // no usable entries are left
    for (int i = 0; i < 128; i++) {
        if (probabilities[i] < probabilities[index] && probabilities[i] > 1e-9 && !flags[i])
            index = i;
    }
    flags[index] = true;
    return index;
}
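getMin() contains two loops, each of which scans the whole probability table, so one call is linear in the size of that table. Let $n$ be the length of the text and assume the worst case in which no character repeats (every character occurs exactly once). The number of distinct characters, and with it the part of the table that matters, then grows with $n$, so one call to getMin() costs $O(n)$, and the loop that calls it once per distinct character has a worst-case time complexity of $O(n^2)$.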
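As a side note on the counting step: main calls getFreq once for each of the 128 ASCII codes, and every call scans the whole text. The same counts can be collected in a single pass. A minimal sketch, where `countChars` is a hypothetical helper name that is not part of the program above:

```cpp
#include <string>
#include <vector>

// Count the occurrences of every 7-bit ASCII character in one pass.
// counts[c] is the number of times the character with code c occurs.
std::vector<int> countChars(const std::string &text) {
    std::vector<int> counts(128, 0);
    for (unsigned char ch : text) {
        if (ch < 128) // ignore bytes outside the ASCII range
            counts[ch]++;
    }
    return counts;
}
```

Dividing each count by the text length then gives the same probabilities as the getFreq loop.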
Next, consider the second loop:
    while (DeQueue(Q, HFnode1)) {
        if (!DeQueue(Q, HFnode2)) {
            // HFnode1 was the only tree left, so it is the finished Huffman tree
            root = &HFnode1;
            break;
        }
        newHFnode = merge(HFnode1, HFnode2);
        JumpQueue(Q, newHFnode);
    }
The merge() function it calls is:
/// Merge two trees into one whose root weight is the sum of their root weights
HFnode merge(HFnode HFnode1, HFnode HFnode2) {
    // Copy both roots to the heap so that the new root can point at them
    HFnode *node1 = new HFnode(HFnode1);
    HFnode *node2 = new HFnode(HFnode2);
    HFnode newNode;
    newNode.data = '\0'; // internal nodes carry no character
    newNode.weight = HFnode1.weight + HFnode2.weight;
    newNode.lchild = node1;
    newNode.rchild = node2;
    return newNode;
}
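Clearly, merge() runs in $O(1)$. The loop also calls JumpQueue(), however, which walks the sorted queue to find the insertion point and therefore takes time linear in the current queue length. Under the same assumption as before (every character occurs exactly once, so there are $n$ leaves), the loop performs $n-1$ merges, each followed by an $O(n)$ insertion, so its worst-case time complexity is $O(n^2)$.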
main also calls the preorder traversal function PreOrder(), whose code is:
/// Preorder traversal that prints the code of every leaf: left branch = 1, right branch = 0
void PreOrder(HFtree root, string code) {
    if (root->lchild == NULL && root->rchild == NULL) {
        cout << root->data << ": " + code << endl;
    }
    if (root->lchild != NULL)
        PreOrder(root->lchild, code + "1");
    if (root->rchild != NULL)
        PreOrder(root->rchild, code + "0");
}
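Under the same assumption, the tree has $2n-1$ nodes and PreOrder() visits each of them once, so it makes $O(n)$ calls. Every call copies the code string built so far, whose length is the depth of the node, so the total work is at most $O(n^2)$ for a maximally skewed tree.
In summary, the time complexity of this algorithm is $O(n^2)$.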
② Space complexity
The auxiliary space used by this algorithm consists of:
Arrays:
double probabilities[128]; // probabilities[c] is the probability of the character with ASCII code c
bool flags[128]; // flags[c] is true once character c has been picked; must be initialized
Both arrays have a fixed size of 128, so their space complexity is $O(1)$.
Queue:
// Huffman tree node
typedef struct HFnode {
    char data;
    float weight; // weight of the node: the character's probability
    HFnode *lchild;
    HFnode *rchild;
} *HFtree, HFnode;

// Linked queue. A circular list is not needed here, so a plain singly linked list is used.
typedef struct QueueNode {
    HFnode data;
    QueueNode *next;
} QueueNode, *QueueNodePtr;

typedef struct LinkQueue {
    // The queue has a dummy head node that stores no data.
    // Empty condition: rear == front. There is no "full" condition.
    QueueNode *front;
    QueueNode *rear;
} LinkQueue;
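The queue stores the characters that occur in the text together with their associated data. Its length never exceeds the number of ASCII codes, 128, so its space complexity is also $O(1)$. The same bound covers the tree itself: with at most 128 leaves it has at most 255 nodes.
In summary, not counting the input string, the space complexity of this algorithm is $O(1)$ for the fixed 128-character alphabet. If the number of distinct characters were allowed to grow with the text, as assumed in the time analysis, the queue and the tree would need $O(n)$ space instead.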