print1427. Huffman Code

printHuffman Code

Ограничения: время – 1s/2s, память – 64MiB Ввод: input.txt или стандартный ввод Вывод: output.txt или стандартный вывод copy
Послать решение Blockly Посылки Темы Где Обсудить (0)

We know that characters are represented in binary using fixed-length codes (e.g. 7-bit ASCII or 16-bit UNICODE). Since some characters occur in text more frequently than others, can't we save bits in the long run if we use shorter codes for frequent characters and longer codes for infrequent characters? Yes; however, if one character's code is a prefix of another character's code (e.g. a=010, b=01011, c=11), there's potential ambiguity about where one character ends and the next starts in a string (e.g. whether 01011 encodes the string b or the string ac). If character codes are prefix-free-no code is a prefix of any other code-then no such ambiguity occurs. In 1952, David Huffman discovered an algorithm for producing variable-length prefix-free character codes of minimum average length, based on the frequency of characters in input text.
A Huffman tree is a binary tree whose leaves are labeled by characters, and whose subtrees have weights representing character frequencies. The Huffman code of a character labeling a leaf is obtained by traversing the path from the root to the leaf, writing 0 or 1 when traversing a left or right branch, respectively. To create a Huffman tree from input text, initialize a set containing one-node (i.e. single-leaf) trees for each distinct character in the input text, with weight equal to the frequency of that character in the text. As long as there is more than one tree in the set, repeatedly do the following:
  • Remove the smallest tree `t_1` and next smallest tree `t_2` from the set. Compare trees numerically by weight; if the weights are equal, compare them alphabetically by least character labeling a leaf.
  • Construct a new tree `t` with left subtree `t_1`, right subtree `t_2`, and weight equal to the sum of the weights of `t_1` and `t_2`.
  • Add `t` to the set.
The resulting Huffman tree has a leaf for each distinct character in the input text, from which the characters' Huffman codes can be obtained.
Input Format
The input text consists of a stream of ASCII characters. Only the graphical characters (with ASCII codes between 32 and 126 inclusive) are counted; all others (e.g. tabs, newlines and carriage returns) are ignored.
Output Format
Produce a Huffman code for each graphical ASCII character, based on the frequency they occur in the input text. Output a line containing the character, its Huffman code, and its frequency in parentheses. Sort the lines of output alphabetically by Huffman code.

Sample Input

Mississippi has a number of
repeated letters.

Sample Output

i 000 (4)
l 00100 (1)
m 00101 (1)
n 00110 (1)
o 00111 (1)
  010 (5)
u 01100 (1)
. 011010 (1)
M 011011 (1)
a 0111 (3)
e 100 (6)
p 1010 (3)
r 1011 (3)
s 110 (6)
t 1110 (3)
b 111100 (1)
d 111101 (1)
f 111110 (1)
h 111111 (1)
Source: California State Polytechnic University Programming Contest, Fall 2008
loading