Canonical Huffman encoded data decompression algorithm

Coded data generation or conversion – Digital code to digital code converters – To or from number of pulses

Reexamination Certificate


Details

C341S067000, C341S050000

Reexamination Certificate

active

06657569

ABSTRACT:

FIELD OF THE INVENTION
The present invention relates to digital signal processing, and more particularly to digital signal decompression.
BACKGROUND OF THE INVENTION
The advancement of fast computers in recent years has led to ever larger volumes of data being handled by computers. Typically, the data are compressed to make efficient use of available memory and to reduce data transmission times. A variety of data compression methods are known. One compression method, known as Huffman coding, is a form of statistical coding wherein the probability of each data character is translated to a sequence of bits.
Traditional Huffman coding assigns an optimal prefix-free code to each symbol based on the frequency with which the symbol appears in the file. Symbols that appear more frequently are coded in fewer bits. In standard Huffman coding this is accomplished by building a binary tree structure, which is then used to build the codewords. Decoding a codeword requires that the tree be traversed from the root until a leaf is reached. This approach requires that the binary tree be stored in the file, and requires numerous accesses to memory as the tree is traversed.
FIG. 1 illustrates one example of the “code tree” generated by Huffman encoding. Nodes are the points marked with a circle or a square. A line segment connecting two nodes is called a “branch.” The node located in the highest position is called the “root.” Further, a lower node Y connected via a “branch” to a node X is termed a “child” of the node X. Conversely, the node X is referred to as a “parent” of the node Y. A node having no “child” is called a “leaf,” and a particular character corresponds to each “leaf.” Further, the nodes other than the “leaves” are referred to as “internal nodes,” and the number of “branches” from the “root” down to a given node is called its level.
When encoding by use of the code tree, the path extending from the “root” down to a target “leaf” (corresponding to the character to be encoded) is outputted as a code. More specifically, “1” is outputted when branching off to the left from each of the nodes on the way from the “root” down to the target “leaf,” while “0” is outputted when branching off to the right. For example, in the code tree illustrated in FIG. 1, a code “00” is outputted for a character A corresponding to the “leaf” of node number 7, and a code “011” is outputted for a character B corresponding to the “leaf” of node number 8.
When decoded, a character is outputted which corresponds to a “leaf” which is reached by tracing the respective nodes from the “root” in accordance with a value of each bit of code defined as a target for decoding.
In general terms, a Huffman encoding algorithm generates the above-described code tree according to the following:
(1) Leaf nodes corresponding to the individual characters are prepared, and the frequencies of occurrence of the characters corresponding to the respective leaves are recorded.
(2) One new node is created for two nodes having the minimum occurrence frequency, and this created node is connected via branches to the two nodes. Further, a sum of the occurrence frequencies of the two nodes connected via the branches is recorded as an occurrence frequency of the newly created node.
(3) The processing set forth in item (2) is executed for the remaining nodes, i.e. the nodes not having parents, until the number of remaining nodes becomes 1.
In the code tree thus generated, a code is allocated to each character with a code length that decreases as the occurrence frequency of the character increases (approximately the logarithm of the inverse of its probability). Therefore, when the coding is performed by use of the code tree, it follows that the data can be compressed.
This data must be decompressed before it is used.
The Huffman encoding algorithm was introduced in the early 1950s. In subsequent years it has been characterized, analyzed and expanded to include canonical forms. A canonical Huffman code is an optimal prefix-free code, but the code is not generated using the standard Huffman coding. Instead, Huffman's algorithm is used to calculate the length in bits of each codeword, but a different algorithm is used to determine the actual codewords. The codewords are chosen such that a small look-up table is all that is needed to determine when a codeword has been assembled; the decoded symbol can then be accessed directly from another small look-up table containing the decoded symbols.
As shown in FIG. 2, a canonical Huffman encoded file consists of three tables: FIRSTCODE table 1, SYMBOL POINTER table 2, and SYMBOL table 3; and the compressed data. The FIRSTCODE table 1 has one entry assigned to it for each codeword length. For example, if the largest codeword length is 10 bits, then the FIRSTCODE table 1 would have 10 table entries. The FIRSTCODE table 1 is organized with the code for length 1 first, down to the code for length N. The values in this table are used to determine when a codeword has been built for decoding. Since the codewords are of varying lengths and no delimiter is used, the codeword bit patterns and the FIRSTCODE table 1 allow for determining when the end of a codeword has been reached. The value in the FIRSTCODE table 1 for a given bit length is equal to the minimum integer value of the codewords of that bit length. That is, when assembling bits, once the integer value of the assembled bits is greater than or equal to the value in FIRSTCODE, a valid codeword is available to decode.
The second table, SYMBOL POINTER table 2, contains a pointer to the first decoded symbol for each bit length. The SYMBOL POINTER table 2 contains one word for each bit length.
The SYMBOL table 3 contains the actual decoded symbol values. The length of the SYMBOL table 3 is equal to the number of symbols to decode. The SYMBOL table 3 is a look-up table of the resolved symbols indexed by address.
Each of these three tables is stored in memory. The compressed data are read and the decompression algorithm is used to extract the original data.
FIG. 2 refers to traditional canonical decoding algorithms wherein a LENGTH COUNTER 4 and working value V are reset to 0. A next bit of compressed data is added to V using an addition function 5. Using a comparison function 6, the result of the addition function 5 is compared to a FIRSTCODE value selected by the LENGTH COUNTER 4. If the comparison (V&gt;=FIRSTCODE(L)) is FALSE, then the LENGTH COUNTER 4 is incremented; the working value is multiplied by 2 using a left shift operation 7; and the next compressed bit from the compressed data 8 is added to the shifted value. Using the comparison function 6, the resultant is again compared with FIRSTCODE(L). This process is repeated until the comparison (V&gt;=FIRSTCODE(L)) is TRUE.
When the comparison is TRUE, a value from the SYMBOL POINTER table 2 is selected by the LENGTH COUNTER 4 and added to the working value V using an addition function 9. The FIRSTCODE value selected by the LENGTH COUNTER 4 is subtracted using a subtraction function 10, and the resultant value is the address for the SYMBOL table 3, whereby the SYMBOL is resolved. The algorithm is thus essentially a bit comparison process that can take up to N iterations to resolve the SYMBOL.
The decompression, or “decoding,” procedure may be described by the C-language pseudo code in Table 1 below. In the pseudo code, the function performs a traditional canonical Huffman decode algorithm as described above and returns the resolved SYMBOL.
TABLE 1
Traditional Canonical Decompression Algorithm Pseudo Code
Set the LENGTH COUNTER value L to its base value of 1
Set V to NEXTINPUTBIT()
While V &lt; FIRSTCODE(L)
  Set V to V * 2 + NEXTINPUTBIT()
  Set L to L + 1
Return SYMBOL = SYMBOL TABLE(SYMBOL POINTER TABLE(L) + V − FIRSTCODE(L))
Although effective, this software decompression is time consuming and prevents the software from performing other tasks during decompression.
SUMMARY OF THE INVENTION
The present invention provides a fast means for decompre
