Huffman Coding is one of the lossless data compression techniques. It assigns variable-length codes to the input characters, based on the frequencies of their occurence. The most frequent character is given the smallest length code.
I thought of implementing the data compression program. The key things in the implementation were:
Building frequency dictionary
Select 2 minimum frequency symbols and merge them repeatedly: Used Min Heap
Build a tree of the above process: Created a HeapNode class and used objects to maintain tree structure
Assign code to characters: Recursively traversed the tree and assigned the corresponding codes
Encode the input text. Added padding to the encoded text, if it’s not of a length of multiple of 8. Stored this padding information, in 8 bits, in the beginning of the resultant code.
Write the result to an output binary file, which will be our compressed file.
After running on a several sample text files, Compression Ratio on an average was achieved to be 2.1 : 1.
So it’s like you have your very own text file compression program.
I implemented both the compression and decompression functions. Decompressing the compressed file brought back the original state of the file, without any data loss.
Before you look at the code from beginning, first check out the outline of the 2 functions compress (line 101) and decompress (line 157) , and then look at details of the other functions called from them.
Compression (line 101)
Decompression (line 157)
Make frequency dictionary
Merge Nodes and build tree
Pad encodded text
Make byte array
Output the byte array to binary file
Read binary file
Output decoded text to txt file
The class HuffmanCoding takes complete path of the text file to be compressed as parameter. (as its data members store data specific to the input file).
The compress() function returns the path of the output compressed file.
The function decompress() requires path of the file to be decompressed. (and decompress() is to be called from the same object created for compression, so as to get code mapping from its data members)
Running the program:
Save the above code.
Create a sample text file. Or download a sample file from sample.txt (right click, save as)
Save the code below, in the same directory as the above code, and Run this python code (edit the path variable below before running. initialize it to text file path)