博客
关于我
强烈建议你试试无所不能的chatGPT,快点击我
Huffman Codes
阅读量:6001 次
发布时间:2019-06-20

本文共 5148 字,大约阅读时间需要 17 分钟。

Huffman Codes (30)

In 1953, David A. Huffman published his paper "A Method for the Construction of Minimum-Redundancy Codes", and hence printed his name in the history of computer science. As a professor who gives the final exam problem on Huffman codes, I am encountering a big problem: the Huffman codes are NOT unique. For example, given a string "aaaxuaxz", we can observe that the frequencies of the characters 'a', 'x', 'u' and 'z' are 4, 2, 1 and 1, respectively. We may either encode the symbols as {'a'=0, 'x'=10, 'u'=110, 'z'=111}, or in another way as {'a'=1, 'x'=01, 'u'=001, 'z'=000}, both compress the string into 14 bits. Another set of code can be given as {'a'=0, 'x'=11, 'u'=100, 'z'=101}, but {'a'=0, 'x'=01, 'u'=011, 'z'=001} is NOT correct since "aaaxuaxz" and "aazuaxax" can both be decoded from the code 00001011001001. The students are submitting all kinds of codes, and I need a computer program to help me determine which ones are correct and which ones are not.

Input Specification:

Each input file contains one test case. For each case, the first line gives an integer N (2 <= N <= 63), then followed by a line that contains all the N distinct characters and their frequencies in the following format:

c[1] f[1] c[2] f[2] ... c[N] f[N]

where c[i] is a character chosen from {'0' - '9', 'a' - 'z', 'A' - 'Z', '_'}, and f[i] is the frequency of c[i] and is an integer no more than 1000. The next line gives a positive integer M (<=1000), then followed by M student submissions. Each student submission consists of N lines, each in the format:

c[i] code[i]

where c[i] is the i-th character and code[i] is a string of '0's and '1's.

Output Specification:

For each test case, print in each line either “Yes” if the student’s submission is correct, or “No” if not.

Sample Input:
7A 1 B 1 C 1 D 3 E 3 F 6 G 64A 00000B 00001C 0001D 001E 01F 10G 11A 01010B 01011C 0100D 011E 10F 11G 00A 000B 001C 010D 011E 100F 101G 110A 00000B 00001C 0001D 001E 00F 10G 11
Sample Output:
YesYesNoNo
#include 
#include
#include
#include
#include
#include
using namespace::std;struct HuffmanCodesNode { bool isLeaf; int frequence; char c; int d; HuffmanCodesNode *Child[2]; HuffmanCodesNode(char ch,int f){ Child[0]=NULL; Child[1]=NULL; c=ch; frequence=f; isLeaf=true; d=0; } HuffmanCodesNode(HuffmanCodesNode *h1=NULL,HuffmanCodesNode *h2=NULL):isLeaf(false),frequence(0),c(0),d(0){ Child[0]=h1; Child[1]=h2; } void clear(){ if(Child[0]!=NULL){Child[0]->clear();delete Child[0];} if(Child[1]!=NULL){Child[1]->clear();delete Child[1];} } int computeWPL(int depth,unsigned int &WPL){ if(Child[0]==NULL && Child[1]==NULL)WPL+=depth*this->frequence; else{ Child[0]->computeWPL(depth+1, WPL); Child[1]->computeWPL(depth+1, WPL); } return 0; } friend bool operator<(HuffmanCodesNode a,HuffmanCodesNode b){ return a.frequence>b.frequence; }};unsigned int computeWPLwithFrequencyMap(map
&f){ priority_queue
q; HuffmanCodesNode *p1,*p2; unsigned int WPL=0; for (map
::iterator iter=f.begin(); iter!=f.end(); ++iter) { HuffmanCodesNode temp(iter->first, iter->second); q.push(temp); } while (q.size()>1) { p1=new HuffmanCodesNode(q.top()); q.pop(); p2=new HuffmanCodesNode(q.top()); q.pop(); HuffmanCodesNode temp(p1,p2); temp.frequence=p1->frequence+p2->frequence; q.push(temp); } p1=new HuffmanCodesNode(q.top()); p1->computeWPL(0, WPL); return WPL;}bool compare(const string& a ,const string& b){ if (a.length() < b.length()) { return true; }else if(a.length() == b.length() && a < b){ return true; } return false;}bool checkPrefix(vector
& s){ sort(s.begin(), s.end(),compare); HuffmanCodesNode *root = new HuffmanCodesNode,*p=root; int t; bool check=false; for (vector
::iterator i=s.begin();i!=s.end();++i){ p=root; for (string::iterator iter = i->begin(); iter!=i->end(); iter++) { t=*iter-'0'; if(p->Child[t]==NULL)p->Child[t]=new HuffmanCodesNode; p=p->Child[t]; if(p->isLeaf){root->clear();return false;} } if(p->Child[0]!=NULL || p->Child[1]!=NULL)check=true; if(check){root->clear();return false;} p->isLeaf = true; } root->clear(); if (!check)return true; else return false;}int main(int argc,const char* argv[]){ ios::sync_with_stdio(false); int N;cin>>N; char c;int frequence; string s; map
f; for (int i=0; i
>c; cin>>frequence; f[c]=frequence; } unsigned int WPL=computeWPLwithFrequencyMap(f); int M;cin>>M; for (int i=0; i
t; for (int j=0; j
>c; cin>>s; t_WPL+=f[c]*s.size(); t.push_back(s); } if ( t_WPL == WPL && checkPrefix(t)){ cout<<"Yes"<

Huffman code的特征有两点

1. WPL(带权路径长度)最小,这个我们可以构建一个Huffman树来计算。

2. 每个编码唯一不具二义性,也就是每个编码都不会是另一个编码的前缀。

我的代码思路是这样

我们首先把输入的权值保存在一个map里面,然后交给

unsigned int computeWPLwithFrequencyMap(map
&f)

来计算出WPL,函数是通过建立一个Huffman树然后遍历得到WPL值的,应该有更好的方法。

之后用得到的WPL来和每个同学的WPL比较,比我们大的肯定不是Huffman编码了。

之后检测是否有二义性有两个办法(我目前想到两个):

  1.按编码长度排序(升序),穷举每个编码是否是后面编码的子串,也就是字符串比较,可以通过kmp比较。

  2.建立trie树,看看每个编码的路径中是否有其他的编码。

我用的是办法2,我先用sort()函数进行了按编码长度的升序排列(偷懒了,也可以不用排序),然后依次建立trie,

把编码最后一位所在的节点进行标记。按照Huffman编码,这个肯定是叶节点,如果以后有编码路径经过它,

那这个肯定不是Huffman编码了。

 

 

 

 

转载于:https://www.cnblogs.com/weierpeng/p/4392536.html

你可能感兴趣的文章
HDU 5366 The mook jong
查看>>
Unity ScriptableObject自定义属性显示
查看>>
【开源】简单4步搞定QQ登录,无需什么代码功底【无语言界限】
查看>>
ORACLE内存管理之ASMM AMM
查看>>
移动前端常用meta标签
查看>>
非结构化数据与结构化数据提取---多线程爬虫案例
查看>>
splay版
查看>>
unity 打包编译记录
查看>>
CSS知识总结(四)
查看>>
软件工程第一次作业
查看>>
22. Generate Parentheses
查看>>
MDL相关总结
查看>>
11.表达式语言
查看>>
3.数据校验和SpringEL
查看>>
面向对象编程-何为对象
查看>>
微信公众平台开发文摘
查看>>
OAF_OAF控件系列1 - Region Type汇总(概念)
查看>>
SPSite, SPWeb Dispose and Class Design Partter
查看>>
alter table添加表约束
查看>>
C# 模拟提交 Form表单的数据
查看>>