Huffman Encoding and Decoding

I read a blog post on Huffman encoding, and it seems like a decent interview question.

Huffman encoding is a classic way to encode characters. It builds an optimal binary tree from the character frequencies: high-frequency characters get short codes, while low-frequency characters can get longer ones.

Suppose we have the string "aaaaaabbbbccddd", so [a] = 6, [b] = 4, [c] = 2, [d] = 3. At each step, take the two nodes with the lowest frequencies, build a tree from them, and give its root the sum of the two frequencies. The new node is then added back into the pool.

Tree Construction:

1: The current nodes are 6, 4, 2, 3. The two smallest are 2 and 3, so the new root is 5.

2: The current nodes are 6, 5, 4. The two smallest are 4 and 5, so the new root is 9.

3: The current nodes are 6, 9. These are the only two left, so the new root is 15.

4: The current nodes are 15. Only one node is left, so the tree is complete.
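The merge steps above can be sketched on bare integer weights with a min-heap. This is only an illustration under stated assumptions: mergeOrder is a hypothetical helper name that does not appear in the full program further down, and it tracks weights only, not tree nodes.

```cpp
#include <cassert>
#include <functional>
#include <queue>
#include <vector>

// Hypothetical helper: repeatedly merge the two smallest weights and
// return the weight of each new root in the order it is created.
std::vector<int> mergeOrder(const std::vector<int>& weights) {
  assert(!weights.empty());  // at least one symbol is required
  // Min-heap: the smallest weight is always on top.
  std::priority_queue<int, std::vector<int>, std::greater<int> > pool(
      weights.begin(), weights.end());
  std::vector<int> merged;
  while (pool.size() > 1) {
    int a = pool.top(); pool.pop();
    int b = pool.top(); pool.pop();
    merged.push_back(a + b);
    pool.push(a + b);  // the new root goes back into the pool
  }
  return merged;
}
```

For the weights 6, 4, 2, 3 this returns 5, 9, 15, matching steps 1 to 3.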

            15
           /  \
        A:6    9
              /  \
           B:4    5
                 /  \
              C:2    D:3

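To get the Huffman code of a character, traverse the tree from the root to that character's leaf node, recording 0 for each left branch and 1 for each right branch. In the tree above this gives A = 0, B = 10, C = 110, D = 111.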
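Once every character has a code, encoding a string is just concatenation. The sketch below assumes the code table read off the example tree (a = 0, b = 10, c = 110, d = 111) and hardcodes it. exampleTable and encodeWithTable are hypothetical names used only for this illustration.

```cpp
#include <cassert>
#include <map>
#include <string>

// Code table read off the example tree (0 = left branch, 1 = right branch).
// Hardcoded here as an assumption; the full program derives it from the tree.
std::map<char, std::string> exampleTable() {
  std::map<char, std::string> table;
  table['a'] = "0";
  table['b'] = "10";
  table['c'] = "110";
  table['d'] = "111";
  return table;
}

// Concatenate the code of every character in text.
std::string encodeWithTable(const std::string& text,
                            const std::map<char, std::string>& table) {
  std::string bits;
  for (std::string::size_type i = 0; i < text.size(); ++i) {
    std::map<char, std::string>::const_iterator it = table.find(text[i]);
    assert(it != table.end());  // every character must have a code
    bits += it->second;
  }
  return bits;
}
```

With this table, "abcd" encodes to "010110111". The full sample string "aaaaaabbbbccddd" takes 6*1 + 4*2 + 2*3 + 3*3 = 29 bits, against 30 bits for a fixed 2-bit code per character.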

To decode a Huffman sequence, start at the root: on a 0 go left, on a 1 go right. After decoding one character (that is, on reaching a leaf), go back to the root.
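Because no code is a prefix of another, the same decoding can also be done without the tree, by collecting bits until they match a code. This is a swapped-in table-based variant, not the tree walk used in the program that follows. decodeWithTable is a hypothetical name, and the code-to-character map is assumed to be supplied by the caller.

```cpp
#include <cassert>
#include <map>
#include <string>

// Table-based decoding sketch: accumulate bits until they form a known code,
// emit that character, then start over (the analogue of returning to the root).
std::string decodeWithTable(const std::string& bits,
                            const std::map<std::string, char>& codeToChar) {
  std::string out;
  std::string current;
  for (std::string::size_type i = 0; i < bits.size(); ++i) {
    assert(bits[i] == '0' || bits[i] == '1');  // input must be a bit string
    current += bits[i];
    std::map<std::string, char>::const_iterator it = codeToChar.find(current);
    if (it != codeToChar.end()) {
      out += it->second;
      current.clear();
    }
  }
  return out;
}
```

With the codes 0, 10, 110, 111 mapped to a, b, c, d, the input "010110111" decodes to "abcd". The tree walk is usually preferable in practice because it does one pointer step per bit instead of a map lookup.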

#include <cstddef>
#include <iostream>
#include <queue>
#include <string>
#include <unordered_map>
#include <vector>
using namespace std;

// Suppose we have the string aaaaaabbbbccddd.
// The first step is to compute the frequency information and build the Huffman tree.
struct TreeNode {
  char ch;
  double freq;
  TreeNode* lchild, *rchild;
  TreeNode(char c = 0, double f = 0, TreeNode* l = NULL, TreeNode* r = NULL) : ch(c), freq(f), lchild(l), rchild(r) {}
};

// Orders the priority queue as a min-heap on frequency.
struct cmp {
  bool operator()(const TreeNode* a, const TreeNode* b) const {
    return a->freq > b->freq;
  }
};

TreeNode* createTree(const string& str) {
  if (str.empty()) return NULL;  // nothing to encode, so there is no tree
  // Count how often each character occurs.
  unordered_map<char, int> charSet;
  for (size_t i = 0; i < str.size(); ++i) {
    charSet[str[i]]++;
  }
  // Min-heap of leaf nodes keyed on relative frequency.
  priority_queue<TreeNode*, vector<TreeNode*>, cmp> que;
  for (unordered_map<char, int>::iterator p = charSet.begin(); p != charSet.end(); ++p) {
    que.push(new TreeNode(p->first, (double) p->second / str.size()));
  }
  // Merge the two least frequent nodes until a single root remains.
  while (que.size() > 1) {
    TreeNode* l = que.top(); que.pop();
    TreeNode* r = que.top(); que.pop();
    TreeNode* newNode = new TreeNode(0, l->freq + r->freq, l, r);
    que.push(newNode);
  }
  return que.top();
}

// Print each character's code by walking the tree:
// '0' is appended for a left branch, '1' for a right branch.
void encodeString(TreeNode* root, string code) {
  if (!root) return;  // empty tree
  if (!root->lchild && !root->rchild) {
    cout << root->ch << ": " << code << endl;
    return;
  }
  if (root->lchild) encodeString(root->lchild, code + '0');
  if (root->rchild) encodeString(root->rchild, code + '1');
}

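// Decode a string of '0'/'1' characters by walking the tree from the root.
string decodeString(TreeNode* root, const string& str) {
  string ret = "";
  if (!root) return ret;  // empty tree
  TreeNode* p = root;
  for (size_t i = 0; i < str.size(); ++i) {
    p = ((str[i] == '0') ? p->lchild : p->rchild);
    if (p == NULL) break;  // malformed input: no such branch
    if (p->lchild == NULL && p->rchild == NULL) {
      ret += p->ch;
      p = root;  // start over for the next character
    }
  }
  return ret;
}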


int main(void) {
  TreeNode* root = createTree("aaaaaabbbbccddd");
  string code = "";
  encodeString(root, code);  // prints a: 0, b: 10, c: 110, d: 111, one per line
  string str = decodeString(root, "010110111");
  cout << str << endl;  // prints abcd
  return 0;
}
