04-Tree 6. Huffman Codes

Analyzing Huffman codes and an algorithm for verifying students' submitted codes


License: CC 4.0 BY-SA


In 1953, David A. Huffman published his paper "A Method for the Construction of Minimum-Redundancy Codes", and hence printed his name in the history of computer science. As a professor who gives the final exam problem on Huffman codes, I am encountering a big problem: the Huffman codes are NOT unique. For example, given a string "aaaxuaxz", we can observe that the frequencies of the characters 'a', 'x', 'u' and 'z' are 4, 2, 1 and 1, respectively. We may either encode the symbols as {'a'=0, 'x'=10, 'u'=110, 'z'=111}, or in another way as {'a'=1, 'x'=01, 'u'=001, 'z'=000}, and both compress the string into 14 bits. Another set of codes can be given as {'a'=0, 'x'=11, 'u'=100, 'z'=101}, but {'a'=0, 'x'=01, 'u'=011, 'z'=001} is NOT correct since "aaaxuaxz" and "aazuaxax" can both be decoded from the code 00001011001001. The students are submitting all kinds of codes, and I need a computer program to help me determine which ones are correct and which ones are not.
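As a quick check of the example in the statement, the Java sketch below (Java 9+ for Map.of; the class StatementExample and method encode are illustrative names, not from the original post) encodes "aaaxuaxz" with each of the four code tables listed above. All four need 14 bits, yet the last table also maps "aazuaxax" to the same bit string, so matching total length alone cannot prove a code correct.

```java
import java.util.List;
import java.util.Map;

class StatementExample {
    // Concatenate the code word of every character in the text.
    static String encode(String text, Map<Character, String> table) {
        StringBuilder bits = new StringBuilder();
        for (char ch : text.toCharArray()) {
            bits.append(table.get(ch));
        }
        return bits.toString();
    }

    public static void main(String[] args) {
        Map<Character, String> t1 = Map.of('a', "0", 'x', "10", 'u', "110", 'z', "111");
        Map<Character, String> t2 = Map.of('a', "1", 'x', "01", 'u', "001", 'z', "000");
        Map<Character, String> t3 = Map.of('a', "0", 'x', "11", 'u', "100", 'z', "101");
        Map<Character, String> bad = Map.of('a', "0", 'x', "01", 'u', "011", 'z', "001");

        // Every table, including the invalid one, needs 14 bits for "aaaxuaxz".
        for (Map<Character, String> t : List.of(t1, t2, t3, bad)) {
            System.out.println(encode("aaaxuaxz", t).length()); // 14
        }
        // The invalid table is not prefix-free: two different strings collide.
        System.out.println(encode("aaaxuaxz", bad)); // 00001011001001
        System.out.println(encode("aazuaxax", bad)); // 00001011001001
    }
}
```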

Input Specification:

Each input file contains one test case. For each case, the first line gives an integer N (2 <= N <= 63), then followed by a line that contains all the N distinct characters and their frequencies in the following format:

c[1] f[1] c[2] f[2] ... c[N] f[N]

where c[i] is a character chosen from {'0' - '9', 'a' - 'z', 'A' - 'Z', '_'}, and f[i] is the frequency of c[i] and is an integer no more than 1000. The next line gives a positive integer M (<=1000), then followed by M student submissions. Each student submission consists of N lines, each in the format:

c[i] code[i]

where c[i] is the i-th character and code[i] is a string of '0's and '1's.

Output Specification:

For each test case, print in each line either “Yes” if the student’s submission is correct, or “No” if not.

Sample Input:

7
A 1 B 1 C 1 D 3 E 3 F 6 G 6
4
A 00000
B 00001
C 0001
D 001
E 01
F 10
G 11
A 01010
B 01011
C 0100
D 011
E 10
F 11
G 00
A 000
B 001
C 010
D 011
E 100
F 101
G 110
A 00000
B 00001
C 0001
D 001
E 00
F 10
G 11

Sample Output:

Yes
Yes
No
No

Approach:

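Two conditions decide whether a submission is a valid Huffman code:

1. Huffman codes are not unique, but their WPL (weighted path length) is: every Huffman code for the same frequencies has the same minimal WPL, so the submission's WPL must equal it.

2. No shorter code may be a prefix of a longer one (and no two characters may share the same code).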

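First, compute the reference WPL1 with a priority queue (the STL priority_queue in C++; the Java code below uses PriorityQueue), using the fact that WPL equals the sum of the weights of all non-leaf nodes.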
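The identity "WPL = sum of the weights of all non-leaf nodes" holds because each leaf's weight is included in every internal node on its path to the root, that is, once per level of depth. A minimal Java sketch of this step, assuming only the number is needed (MinWpl is a hypothetical class name), keeps plain integers in the queue instead of building tree nodes:

```java
import java.util.PriorityQueue;

class MinWpl {
    // Huffman's merge loop on frequencies only: repeatedly remove the two
    // smallest weights, push their sum back, and accumulate that sum.
    // Each sum is the weight of one internal node, so the total is the minimum WPL.
    static int minWpl(int[] freqs) {
        PriorityQueue<Integer> pq = new PriorityQueue<>();
        for (int f : freqs) {
            pq.add(f);
        }
        int wpl = 0;
        while (pq.size() > 1) {
            int merged = pq.poll() + pq.poll();
            wpl += merged;
            pq.add(merged);
        }
        return wpl;
    }

    public static void main(String[] args) {
        System.out.println(minWpl(new int[] {1, 1, 1, 3, 3, 6, 6})); // 53 (sample input)
        System.out.println(minWpl(new int[] {4, 2, 1, 1}));          // 14 ("aaaxuaxz")
    }
}
```

The original Tree.encode additionally links parent and child nodes; that tree is not needed to obtain the WPL, only the running sum of merged weights.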

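Next, compute each submission's WPL2 = sum over all leaves of (depth × weight), where a leaf's depth is the length of its code.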
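Computing WPL2 needs no tree, because a code word of length L places its character at depth L. A small sketch (SubmissionWpl is a hypothetical name; it assumes codes[i] belongs to the character with frequency freqs[i]), checked against the first and third sample submissions:

```java
class SubmissionWpl {
    // WPL of a submitted code table: a code word of length L puts its
    // character at depth L, so WPL = sum of frequency * code length.
    static int wpl(int[] freqs, String[] codes) {
        int total = 0;
        for (int i = 0; i < freqs.length; i++) {
            total += freqs[i] * codes[i].length();
        }
        return total;
    }

    public static void main(String[] args) {
        int[] freqs = {1, 1, 1, 3, 3, 6, 6}; // A..G from the sample input
        String[] first = {"00000", "00001", "0001", "001", "01", "10", "11"};
        String[] third = {"000", "001", "010", "011", "100", "101", "110"};
        System.out.println(wpl(freqs, first)); // 53
        System.out.println(wpl(freqs, third)); // 63
    }
}
```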

Then check separately whether any code is a prefix of another.
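The jude method in the code below sorts by length and compares every pair, which is O(N^2) per submission and usually cheap for N <= 63. As an alternative technique (not the author's), the codes can be sorted lexicographically and only neighbours compared: if code a is a prefix of code c, every string sorted between them must also start with a, so any conflict shows up between adjacent entries. A sketch under that idea (PrefixCheck is a hypothetical name):

```java
import java.util.Arrays;

class PrefixCheck {
    // Returns true if no code word is a prefix of (or equal to) another.
    // After a lexicographic sort, any prefix relation appears between
    // neighbouring entries, so only adjacent pairs are compared.
    static boolean isPrefixFree(String[] codes) {
        String[] sorted = codes.clone();
        Arrays.sort(sorted);
        for (int i = 0; i + 1 < sorted.length; i++) {
            if (sorted[i + 1].startsWith(sorted[i])) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        String[] ok = {"00000", "00001", "0001", "001", "01", "10", "11"};
        String[] bad = {"00000", "00001", "0001", "001", "00", "10", "11"};
        System.out.println(isPrefixFree(ok));  // true
        System.out.println(isPrefixFree(bad)); // false: "00" is a prefix of "00000"
    }
}
```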

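Output Yes only if both conditions hold (WPL2 equals WPL1 and no code is a prefix of another); otherwise output No.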
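Applying both checks to the sample input by hand (a worked illustration, not part of the original post): Huffman merges on the frequencies 1, 1, 1, 3, 3, 6, 6 produce internal nodes 1+1=2, 1+2=3, 3+3=6, 3+6=9, 6+6=12 and 9+12=21, and each submission's WPL2 follows from its code lengths (A..G in order).

```latex
\begin{aligned}
\mathrm{WPL}_1 &= 2 + 3 + 6 + 9 + 12 + 21 = 53 \\
\mathrm{WPL}_2^{(1)} = \mathrm{WPL}_2^{(2)} = \mathrm{WPL}_2^{(4)}
  &= 1\cdot 5 + 1\cdot 5 + 1\cdot 4 + 3\cdot 3 + 3\cdot 2 + 6\cdot 2 + 6\cdot 2 = 53 \\
\mathrm{WPL}_2^{(3)} &= 3\cdot(1+1+1+3+3+6+6) = 63 \neq 53
\end{aligned}
```

Submissions 1 and 2 pass both checks (Yes), submission 3 fails the WPL check (No), and submission 4 matches the minimum WPL but fails the prefix check because E = 00 is a prefix of A = 00000 (No). This matches the sample output and shows why both conditions are needed.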

Code:

import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.HashMap;
import java.util.LinkedList;
import java.util.List;
import java.util.Map;
import java.util.PriorityQueue;
import java.util.Scanner;

class Tree
{
	private Node root;
	
	
	
	public Node getRoot() {
		return root;
	}

	public void setRoot(Node root) {
		this.root = root;
	}

	static class Node implements Comparable<Node>
	{
		private char c = ' ';    // the character
		private int f = 0;       // its frequency of occurrence
		private Node parent;     // parent node
		private Node leftNode;   // left child
		private Node rightNode;  // right child

		@Override
		public int compareTo(Node o) {
			// Order nodes by ascending frequency: the lowest frequency is polled first
			return Integer.compare(f, o.f);
		}

		public char getC() {
			return c;
		}

		public void setC(char c) {
			this.c = c;
		}

		public int getF() {
			return f;
		}

		public void setF(int f) {
			this.f = f;
		}

		public Node getParent() {
			return parent;
		}

		public void setParent(Node parent) {
			this.parent = parent;
		}

		public Node getLeftNode() {
			return leftNode;
		}

		public void setLeftNode(Node leftNode) {
			this.leftNode = leftNode;
		}

		public Node getRightNode() {
			return rightNode;
		}

		public void setRightNode(Node rightNode) {
			this.rightNode = rightNode;
		}

		@Override
		public String toString() {
			return "Node [c=" + c + ", f=" + f + "]";
		}
			
	}
	
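	/* Build the Huffman tree.
	 * Returns the weighted path length (WPL) of the Huffman tree,
	 * which equals the sum of the weights of all non-leaf nodes.
	 */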
	public static int encode(PriorityQueue<Node> queue)
	{
		int WPL=0;
		// Perform size-1 merges; each merge removes the two lowest-frequency nodes
		int num= queue.size()-1;
		for(int i=0;i<num;i++)
		{
			// Poll the two lowest-frequency nodes and add a new node whose frequency is their sum
			Node left= queue.poll();
			Node right = queue.poll();
			Node newNode = new Node();
			newNode.setF(left.f+right.f);
			WPL+=left.f+right.f;
			queue.add(newNode);
			// Make the two polled nodes the left and right children of the new node
			newNode.leftNode=left;
			newNode.rightNode=right;
			// Set the new node as the parent of both children
			left.parent=newNode;
			right.parent=newNode;
		}
		// The single node left in the queue is the root of the Huffman tree
		return WPL;
	}
	// Breadth-first traversal that prints every node (not called by Main)
	public void printTreeBFS()
	{
		LinkedList<Node> queue = new LinkedList<Node>();
		if(root!=null)
		{
			queue.add(root);
			while(!queue.isEmpty())
			{
				Node node = queue.poll();
				System.out.print(node+":");
				if(node.leftNode!=null)
					queue.add(node.leftNode);
				if(node.rightNode!=null)
					queue.add(node.rightNode);
			}
		}

	}
	
	// Judge whether the submitted codes are prefix-free:
	// returns false if any code is a prefix of (or identical to) another code
	public static boolean jude(HashMap<Character,String> hashMap)
	{
		// Sort the entries by code length, shortest first, so each code only
		// needs to be compared with the codes that come after it
		List<Map.Entry<Character, String>> infoIds =
			    new ArrayList<Map.Entry<Character, String>>(hashMap.entrySet());
		Collections.sort(infoIds, new Comparator<Map.Entry<Character, String>>() {
		    public int compare(Map.Entry<Character, String> o1, Map.Entry<Character, String> o2) {
		        return o1.getValue().length() - o2.getValue().length();
		    }
		});
		for (int i = 0; i < infoIds.size(); i++) {
		    String code = infoIds.get(i).getValue();
		    for(int j=i+1;j<infoIds.size();j++)
		    {
		    	String nextcode = infoIds.get(j).getValue();
		    	if(nextcode.startsWith(code))
		    	{
		    		return false;
		    	}
		    }
		}
		
		return true;
	}
	
}


public class Main{
	
	public static void main(String[] args){
	
		Scanner scanner = new Scanner(System.in);
		// PriorityQueue of leaf nodes, ordered by ascending frequency
		PriorityQueue<Tree.Node> queue = new PriorityQueue<Tree.Node>();
		// Read the number of characters N
		int n= scanner.nextInt();
		// help[c] stores the frequency of character c
		int[] help = new int[130];
		for(int i=0 ;i<n ;i++)
		{
			char c= scanner.next().charAt(0);
			int f= scanner.nextInt();
			help[c]=f;
			Tree.Node node = new Tree.Node();
			node.setC(c);
			node.setF(f);
			queue.add(node);
		}
		// Build the Huffman tree; encode returns the minimum WPL
		int WPL = Tree.encode(queue);
		// Read the number of student submissions M
		int M= scanner.nextInt();
		for(int i=0 ;i<M;++i)
		{
			// WPL of the current submission
			int WPL2=0;
			// Codes of the current submission, keyed by character
			HashMap<Character,String> hashMap = new HashMap<Character,String>();
			for(int j=0;j<n;++j)
			{
				char c = scanner.next().charAt(0);
				String code = scanner.next();
				WPL2 +=help[c]*code.length(); 
				hashMap.put(c, code);
			}
			if(WPL==WPL2)
			{
				// Check that no shorter code is a prefix of a longer one
				if(Tree.jude(hashMap))
					System.out.println("Yes");
				else
					System.out.println("No");
			}
			else 
			{
				System.out.println("No");
			}
				
		}

	}
	

}
  

The results are correct, but one test case times out (a C++ version is linked in the references below).
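One plausible cause of the timeout (an assumption, not verified against the judge) is java.util.Scanner, which is slow on inputs with up to 1000 × 63 code lines. The sketch below keeps the same two checks but reads tokens with BufferedReader plus StringTokenizer, prints all answers at once, and swaps in a lexicographic-sort prefix check that compares only adjacent codes. HuffmanJudge is an illustrative class name; a judge that expects Main would need the class renamed.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.util.Arrays;
import java.util.PriorityQueue;
import java.util.StringTokenizer;

class HuffmanJudge {
    private final BufferedReader reader;
    private StringTokenizer tokens = new StringTokenizer("");

    HuffmanJudge(Reader in) {
        reader = new BufferedReader(in);
    }

    // Next whitespace-separated token, reading further lines when needed.
    private String next() throws IOException {
        while (!tokens.hasMoreTokens()) {
            String line = reader.readLine();
            if (line == null) {
                throw new IOException("unexpected end of input");
            }
            tokens = new StringTokenizer(line);
        }
        return tokens.nextToken();
    }

    // Reads one test case and returns one "Yes"/"No" line per submission.
    String solve() throws IOException {
        int n = Integer.parseInt(next());
        int[] freq = new int[128]; // frequency indexed by character
        PriorityQueue<Integer> pq = new PriorityQueue<>();
        for (int i = 0; i < n; i++) {
            char c = next().charAt(0);
            int f = Integer.parseInt(next());
            freq[c] = f;
            pq.add(f);
        }
        // Minimum WPL = sum of the weights of all internal (merged) nodes.
        int minWpl = 0;
        while (pq.size() > 1) {
            int merged = pq.poll() + pq.poll();
            minWpl += merged;
            pq.add(merged);
        }

        int m = Integer.parseInt(next());
        String[] codes = new String[n];
        StringBuilder out = new StringBuilder();
        for (int k = 0; k < m; k++) {
            int wpl = 0;
            for (int j = 0; j < n; j++) {
                char c = next().charAt(0);
                codes[j] = next();
                wpl += freq[c] * codes[j].length();
            }
            // The prefix check only runs when the WPL already matches.
            boolean ok = wpl == minWpl && isPrefixFree(codes);
            out.append(ok ? "Yes" : "No").append('\n');
        }
        return out.toString();
    }

    // After a lexicographic sort, a code that is a prefix of another
    // (or equal to it) sits directly before a code that starts with it.
    static boolean isPrefixFree(String[] codes) {
        String[] sorted = codes.clone();
        Arrays.sort(sorted);
        for (int i = 0; i + 1 < sorted.length; i++) {
            if (sorted[i + 1].startsWith(sorted[i])) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) throws IOException {
        // Collect all answers first, then print them with a single call.
        System.out.print(new HuffmanJudge(new InputStreamReader(System.in)).solve());
    }
}
```

Fed the sample input, this sketch should print Yes, Yes, No, No, one per line.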

References: http://blog.youkuaiyun.com/AXuan_K/article/details/45583335

http://shmilyaw-hotmail-com.iteye.com/blog/2009929