Coded data generation or conversion – Digital code to digital code converters – To or from number of pulses
Reexamination Certificate
2001-10-19
2003-05-13
Young, Brian (Department: 2819)
Coded data generation or conversion
Digital code to digital code converters
To or from number of pulses
Reexamination Certificate
active
06563440
ABSTRACT:
BACKGROUND OF THE INVENTION
1. Field of the Invention
This invention relates to data compression, and specifically to decoding Huffman-encoded code words.
2. Description of the Related Art
Huffman codes are very widely used in the area of data compression and telecommunication. Some applications include JPEG picture compression and MPEG video and audio compression. Huffman codes are of variable word length, which means that the individual symbols used to compose a message arc represented (encoded) each by a distinct bit sequence of distinct length. This characteristic of the code words helps to decrease the amount of redundancy in message data, i.e., it makes data compression possible. For example, symbols A, B, C and D may be represented with following code words:
TABLE 1
An example of Huffman code table
Symbol
Code word
A
0
B
10
C
110
D
110
All code words are uniquely decodable; for example, the sequence of bits “01101110100” decodes to “ACDABA”. The set of code words is called a symbol list or alphabet. Uniqueness follows From the “prefix property” of Huffman codes; that is, the fact that if and when any leftmost or “leading” substring of a code word matches a code word in a Huffman decoding table, there is no need to check any additional bits beyond the leading substring. For example, the symbol “B” is assigned a code word of “10”. Thus, no other code words begin with “10”.
The use of Huffman codes affords compression, because distinct symbols have distinct probabilities of incidence. This property is used to advantage by tailoring the code lengths corresponding to those symbols in accordance with their respective probabilities of occurrence. Symbols with higher probabilities of incidence are coded with shorter code words, while symbols with lower probabilities are coded with longer code words. Longer code words still show up, but because of their smaller probabilities of occurrence, the overall code length of all code words in a typical bit string tends to be smaller due to the Huffman coding.
The algorithm for building Huffman code is based on a “coding tree”. Commonly-known algorithm steps are:
1. Line up the symbols by decreasing probabilities.
2. Link two symbols with least probabilities into one new symbol which probability is a sum of probabilities of two symbols.
3. Iterate step two until there is only one symbol left that has probability of unity.
4. Trace the coding tree from a root (the generated symbol with probability 1.0) to origin symbols, and assign to each lower branch
1
, and to each upper branch
0
, or vice versa.
For example, probabilities for some letters are listed in Table 2, and one of the possible Huffman trees built by applying the above algorithm to these probabilities is shown in FIG.
1
.
TABLE 2
Example of a probability distribution in a subset of an alphabet
E
2/25
X
1/25
A
2/25
M
2/25
P
1/25
L
1/25
O
2/25
F
2/25
H
1/25
U
1/25
C
1/25
D
1/25
I
1/25
N
2/25
G
1/25
space
3/25
Each “0” bit in a code word corresponds to traversing a “0” branch in the tree, which, in
FIG. 1
, is done by going up; going down traverses a “1” branch. The code word “11000” is represented on the tree by, starting on the right, at the root, and traversing one-by-one, a branch for each bit of the code word. The first two bits, “11”, correspond to the two one branches, or two down steps. The next bit, “0”, corresponds to movement up, i.e. along a zero branch, as shown by the arrow. Traversing two more zero branches, for the remaining bits, “00”, leads to the output symbol for the complete code word “11000”, which is here the letter “P”, located on the left side of FIG.
1
.
It is thus seen from
FIG. 1
that, for example, the code for letter “P” is “11000” and that there are several possible Huffman tables for any given probability distribution.
A basic difficulty in decoding Huffman codes is that the decoder cannot know a prior what is the length of an incoming code word.
Huffman codes can be detected extremely fast by dedicating enormous amounts of memory. For a set of Huffman code words whose maximum word length of N bits, 2
N
memory locations are needed, because N incoming bits are used as an address into the lookup table to find the corresponding code words. For example, the decoding symbols of Table 1 would require 2
3
=8 memory locations. All addresses that begin with “0” are used to store the symbol “A”, all addresses starting with “10“store the symbol “B” and so forth. When a code word is applied to the lookup table, decoding of the slice is performed instantly. Then, the incoming bit stream is shifted by the bit length of the code word just decoded, to bring the following code word into operable decoding position. For codes that have, for example, a maximum length of 19 bits, memory requirements grow very large.
A technique requiring less memory is bit-by-bit decoding, which proceeds as follows. One bit is taken and compared to all the possible codes with a word length of one. If a match is not found, another bit is shifted in to try to find the bit pair from among all the code words with a word length of two. This is continued until a match is found. Although this approach is very memory-efficient, it is also very slow, especially if the code word being decoded is long.
Another solution uses content-addressable memories (CAMs). A bit slice (i.e., bit string long enough to accommodate any code word and therefore equal in length to the maximum code word) is applied to the input of a CAM containing, all code words as “addresses” and memory pointers as “contents”. The CAM contains memory pointers that reference symbols and associated code word lengths in a RAM table. Once a code word is decoded, the incoming bit stream is then shifted by the length of the decoded code word, and decoding resumes. An efficiently-implemented CAM scheme is fast, but still requires extra memory for pointers. Moreover, CAMs are not readily available in all technologies. The CAM-based approach is described in U.S. Pat. No. 5,208,593 which is further discussed below.
As indicated in the above examples, a problem in using variable code word lengths is achieving balance between speed and reasonable memory usage.
Canonical Huffman codes are of special interest since they make decoding easier. PKZip (file compression/decompression utility), MPEG-1 layer III (Mp3) and the JPEG default baseline encoder all use canonical Huffman tables. Applications can also be found in other areas of multimedia and telecommunication.
Characteristic of canonical Huffman codes is that the most significant (n−1) bits of the smallest Huffman code of length n are greater in value than the largest Huffman code of length (n−1), provided that the table is of the type where almost all codes have a leading one bit. For a Huffman table composed predominantly of codes whose leading bit is zero, that is, a table derived, for example, by reversing all code word bits, a converse rule applies: The most significant (n−1) bits of the largest Huffman code of length n are smaller in value than the smallest Huffman code of length (n−1). Transforming Huffman tables to canonical format does not decrease coding efficiency, because, as can be seen from the following example in Table 3, the transformation does not change the number of bits per code word.
TABLE 3
Normal versus canonical code words
tree code
code2
code3
canonical
A
001
111
100
010
D
0000
1101
0011
0000
C
1001
0010
1011
0001
D
101
000
000
011
E
01
01
11
10
F
0001
1100
1010
0010
G
1000
0011
0010
0011
H
11
10
01
11
In accordance with the above converse rule for canonical codes, codes of length 3 (for example, 010 and 011) are always larger than the three starting bits of codes of length 4 (for example, 0000,0001,0010,0011). Code lengths are otherwise left unchanged.
Also noteworthy is that canonical codes often start with a string of ones (or zeroes) due to the above characteristic. The property of starting with one strings has been used in U.S. Pat. No. 5,208,593 (“Tong”) in the context of JPEG decoding, since JPEG Huffman tables
Cohen & Pontani, Lieberman & Pavane
Nokia Corporation
Young Brian
LandOfFree
Apparatus and method for decoding Huffman codes using... does not yet have a rating. At this time, there are no reviews or comments for this patent.
If you have personal experience with Apparatus and method for decoding Huffman codes using..., we encourage you to share that experience with our LandOfFree.com community. Your opinion is very important and Apparatus and method for decoding Huffman codes using... will most certainly appreciate the feedback.
Profile ID: LFUS-PAI-O-3045740