Data Structures, Algorithms, and Encoding
Huffman compression, invented by David A. Huffman in 1952, revolutionized data encoding through its innovative variable-length prefix coding system. Inspired by a challenge from his MIT professor, Huffman developed an algorithm that efficiently allocated shorter codes to more frequently occurring symbols. This groundbreaking technique drastically reduced the average code length for messages, transforming the digital landscape and becoming widely adopted in various applications.
File compression is the process of reducing the size of a file to save disk space and make it easier to transfer. One popular method for file compression is to use character frequency analysis, which takes advantage of the fact that some characters appear more frequently in a file than others. By replacing frequently occurring characters with shorter codes, the overall size of the file can be reduced without losing any of the original information.
Weights in Huffman compression refer to the frequencies of characters in the input data. The algorithm assigns shorter codes to more frequently occurring characters and longer codes to less frequent ones, resulting in an efficient compression scheme. For example, in the string "ABRACADABRA" the character 'A' has a weight of 5 while 'D' has a weight of 1, so 'A' receives a much shorter code. The first step is therefore to compute the weights from the input, as illustrated in the sketch below and in the file-reading code that follows.
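As a quick, self-contained illustration (separate from the project's code), the sketch below tallies the weights for a short in-memory string; the array of 128 slots, one per ASCII character, matches the ASCII range the file-reading code below assumes:

public class WeightsDemo {
    public static void main(String[] args) {
        String text = "ABRACADABRA";
        int[] weights = new int[128]; // one counter per ASCII character

        // Tally how often each character occurs
        for (char c : text.toCharArray()) {
            weights[c]++;
        }

        // Print the non-zero weights: A=5, B=2, C=1, D=1, R=2
        for (int i = 0; i < weights.length; i++) {
            if (weights[i] > 0) {
                System.out.println((char) i + "=" + weights[i]);
            }
        }
    }
}

The following code reads an ASCII file one character at a time and produces the weights: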
void generateWeights(String infName) {
    // Open the input file
    File inf = fio.getFileHandle(infName);
    int status = fio.getFileStatus(inf, true);

    // Make sure the file is readable
    if (readErrorCheck(status)) return;

    // Reset weights
    initWeights();

    BufferedReader br = fio.openBufferedReader(inf);
    int c = 0;
    try {
        // Read the input file one character at a time until EOF
        while ((c = br.read()) != -1) {
            try {
                weights[c]++;
            } catch (ArrayIndexOutOfBoundsException e) {
                // Ignore the character if it is not in the ASCII range
            }
        }
        // Increment the count for the pseudo-EOF character (index 0)
        weights[0]++;
        br.close();
    } catch (IOException e) {
        e.printStackTrace();
    }

    // Save the weights alongside the input file as a .csv
    outf = fio.getFileHandle(infName + ".csv");
    saveWeightsToFile(outf.getName());
}
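The fio helper and saveWeightsToFile belong to the surrounding project and are not shown here. As a rough sketch of what the save step might look like, here is a minimal version using java.io; the one-weight-per-line "index,weight" CSV layout is an assumption, not the project's documented format:

// Hypothetical sketch: persist the weights array as "index,weight" rows.
// Requires: import java.io.FileWriter; import java.io.IOException;
//           import java.io.PrintWriter;
void saveWeightsToFile(String outfName) {
    try (PrintWriter pw = new PrintWriter(new FileWriter(outfName))) {
        for (int i = 0; i < weights.length; i++) {
            pw.println(i + "," + weights[i]); // CSV layout is an assumption
        }
    } catch (IOException e) {
        e.printStackTrace();
    }
}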
The following code takes the weights and builds a Huffman tree:
/**
 * Builds the Huffman tree. Make sure to:
 * 1) initialize root to null (clean up any prior conversions)
 * 2) re-initialize the encodeMap
 * 3) initialize the queue
 * 4) build the tree:
 *    while the queue is not empty:
 *      pop the head of the queue into the left HuffmanTreeNode.
 *      if the queue is empty - set root = left, and return;
 *      pop the head of the queue into the right HuffmanTreeNode.
 *      create a new non-leaf HuffmanTreeNode whose children are left and right,
 *      and whose weight is the sum of the weights of the left and right children.
 *      add the new node back to the queue.
 *
 * It is assumed that the weights have been initialized prior
 * to calling this method.
 *
 * @param minimize - This is just passed to the method to initialize the queue.
 */
void buildHuffmanTree(boolean minimize) {
    HuffmanTreeNode left, right;
    root = null;
    encodeMap = new String[NUM_ASCII];
    initializeHuffmanQueue(minimize);
    while (!queue.isEmpty()) {
        left = queue.poll();
        if (queue.isEmpty()) {
            root = left;
            return;
        }
        right = queue.poll();
        queue.add(new HuffmanTreeNode(left.getWeight() + right.getWeight(), left, right));
    }
}
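buildHuffmanTree relies on a HuffmanTreeNode class and an initializeHuffmanQueue helper defined elsewhere in the project. As a rough sketch of what they might look like, here is a minimal version built on java.util.PriorityQueue ordered by weight; the exact role of the minimize flag (presumably tie-breaking between equal weights) is an assumption and is left out of the sketch:

// Hypothetical sketch; requires:
// import java.util.Comparator; import java.util.PriorityQueue;
class HuffmanTreeNode {
    private final int weight;
    private final int ordinal;   // character value for a leaf, -1 for an internal node
    private final HuffmanTreeNode left, right;

    // Leaf constructor: a single character and its weight
    HuffmanTreeNode(int ordinal, int weight) {
        this.ordinal = ordinal;
        this.weight = weight;
        this.left = null;
        this.right = null;
    }

    // Internal-node constructor, matching the call in buildHuffmanTree
    HuffmanTreeNode(int weight, HuffmanTreeNode left, HuffmanTreeNode right) {
        this.ordinal = -1;
        this.weight = weight;
        this.left = left;
        this.right = right;
    }

    int getWeight()            { return weight; }
    int getOrdinal()           { return ordinal; }
    HuffmanTreeNode getLeft()  { return left; }
    HuffmanTreeNode getRight() { return right; }
    boolean isLeaf()           { return left == null && right == null; }
}

// Hypothetical: fill the queue with one leaf per character that actually occurs.
// The tie-breaking behavior selected by "minimize" is omitted from this sketch.
void initializeHuffmanQueue(boolean minimize) {
    queue = new PriorityQueue<>(Comparator.comparingInt(HuffmanTreeNode::getWeight));
    for (int i = 0; i < NUM_ASCII; i++) {
        if (weights[i] > 0) {
            queue.add(new HuffmanTreeNode(i, weights[i]));
        }
    }
}

Once the tree is built, the encodeMap that buildHuffmanTree resets can be filled by a depth-first walk, appending '0' for each left edge and '1' for each right edge, e.g. createHuffmanCodes(root, ""). Again a sketch; the project's actual method is not shown:

void createHuffmanCodes(HuffmanTreeNode node, String code) {
    if (node.isLeaf()) {
        encodeMap[node.getOrdinal()] = code; // the path to the leaf is its code
        return;
    }
    createHuffmanCodes(node.getLeft(), code + "0");
    createHuffmanCodes(node.getRight(), code + "1");
}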
Decompression has the opposite objective: restore the original data from the compressed version. The decoder starts at the root of the same Huffman tree and reads the compressed bit stream one bit at a time, moving to the left child on a 0 and to the right child on a 1. Whenever it lands on a leaf, it emits that leaf's character and returns to the root, repeating until the EOF marker is decoded.
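Here is a minimal sketch of that decode loop, reusing the HuffmanTreeNode sketch above. It assumes the compressed bits have already been read into a String of '0'/'1' characters, and that index 0 serves as the pseudo-EOF marker counted in generateWeights; the project's bit-level I/O is not shown:

String decode(String bits) {
    StringBuilder out = new StringBuilder();
    HuffmanTreeNode node = root;
    for (int i = 0; i < bits.length(); i++) {
        // Walk one edge per bit: 0 -> left child, 1 -> right child
        node = (bits.charAt(i) == '0') ? node.getLeft() : node.getRight();
        if (node.isLeaf()) {
            if (node.getOrdinal() == 0) break; // assumed pseudo-EOF marker
            out.append((char) node.getOrdinal());
            node = root; // restart at the root for the next symbol
        }
    }
    return out.toString();
}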
You can browse the project on my GitHub: