Image analysis – Image compression or coding – Lossless compression
Reexamination Certificate
1999-07-30
2003-12-23
Mehta, Bhavesh M. (Department: 2625)
Image analysis
Image compression or coding
Lossless compression
C382S246000, C382S248000, C382S245000
Reexamination Certificate
active
06668092
ABSTRACT:
BACKGROUND OF THE INVENTION
1. Field of the Invention
The present invention relates to data compression, and more specifically to lossless data compression using variable length coding.
2. Relevant Background
Data compression is playing an increasingly important role in technology as more and more data is collected, stored, and communicated in the modern information age. High definition televisions, cellular phones, and compact disk players are just a few examples of everyday products which use data compression to represent data.
Data compression is the process of reducing a large amount of information into a smaller-sized representation of the information. The process of compressing the information can be either lossy or lossless. Lossy compression, sometimes called irreversible compression, means that some of the original information is lost during the compression process and thus the original data cannot be perfectly restored from the compressed data representation. Lossy compression is typically used to compress images and sounds where a small amount of data loss is generally acceptable to people viewing or listening to the restored information. In a lossless compression mechanism no data is lost during compression. Lossless compression is reversible since the original information can be perfectly reconstructed from the compressed data representation. Lossless compression is mandatory for many types of data and program code information, and is well suited for text and text formatting compression, as well as images and sounds.
Typically, digital data is stored in fixed length units of bits such as 8 bits (byte). One common method of lossless compression called variable length coding involves representing fixed length data with variable length codewords. If shorter length codewords are selected for the most frequently occurring data and longer length codewords are used to represent infrequent data, the average number of bits used is typically reduced. This technique, for example, is similar to Morse code where frequently occurring letters of the alphabet are represented by short codewords (an “E” is “·”) and lesser used letters are assigned longer codewords (an “X” is “-··-”). To restore the original message from the compressed message, the codewords are simply matched to the original letters using a lookup table. In a similar fashion, fixed length binary data can be compressed using variable length binary codewords to represent data. To restore the original binary message, the binary codewords are matched to the original binary data using a lookup table.
In order for variable length coding compression to work, the code must be uniquely decodable such that the original message can be decoded in one and only one way. Consider, for example, a code mapping of {x,y,z}={1,11,0}. This code is not uniquely decodable because it is impossible to determine from a compressed message containing two sequential one's whether the original message has two x's or a y. A uniquely decodable code is said to be a prefix-free code if every codeword can be decoded without having to read beyond the immediate codeword being examined. Thus, the binary code {x,y,z}={0,10,110} is a prefix-free code since reading a “0” in the present codeword indicates a codeword ending. On the other hand, the binary code {x,y,z}={0,01,011} is not a prefix-free code since a “0” could mean the message contains x, y, or z, and reading the next codeword is necessary to decode the present codeword.
A well known procedure for generating prefix-free codes is called the Huffman coding procedure. Huffman codes are typically generated using a message tree structure dependent on the probability distribution of each letter in the message (frequency of occurrence of the letter divided by the message length). By way of the message tree structure, letters having the highest probability distribution are assigned the shortest codes. Table 1 shows a sample Huffman code mechanism for a set of letters with a hypothetical probability distribution in a message.
TABLE 1
Sample Huffman codewords
Probability
Letter
Distribution
Codeword
A
0.2
0010
B
0.4
01
C
0.4
000
D
0.8
1
E
0.2
0011
Although Huffman coding is an effective technique for achieving high levels of data compression, it has the disadvantage of generally requiring a large lookup table to encode and decode data. A lookup table is needed to reverse the compression process and restore the original data from the coded data. Thus, the lookup table must typically be stored alongside the compressed data, decreasing the effective compression factor. For example, to represent about 128 codewords, one may need codes that are as much as 16 bits in length. This will require a table of size 64K entries. The table size may be reduced by hashing or exploiting the properties of the particular Huffman code used, but such a reduction increases compute time and severely limits the code's adaptability. A large lookup table can also make Huffman coding prohibitive in many embedded systems applications, where the memory available is relatively small. Furthermore, when the characteristics of the data change, the code's optimality is lost and hence a new table may be needed, decreasing the compression efficiency and possibly compute performance.
To avoid storing large lookup tables, Golomb coding techniques have been developed. Golomb codes can be thought of as a special set of variable length prefix-free codewords optimized for non-negative numbers having an exponentially decaying geometric probability distribution. The codewords are constructed such that they can be directly decodable without the need of a lookup table.
Golomb codes are composed of two parts: an integer portion of n/m represented using a unary code, and a n modulo m portion represented using a binary code, where n is a non-negative integer within the original source data and m is a coding factor based on the probability distribution of the data. The bit length of the binary code (n mod m) can be either └log
2
m┘ or ┌log
2
m┐, where └x┘ denotes a floor function returning the greatest integer less than or equal to x, and ┌x┐ denotes a ceiling function returning the least integer greater than or equal to x. The following conditions determine the bit length for the binary code:
└log
2
m
┘ bits if
n
<2
┌log
2
m┐
−m
, and
└log
2
m
┘ bits otherwise.
Table 2 shows Golomb codewords for several values of parameter m where the unary code (integer(n/m)) is represented using zero runs followed by a one, and an inverse binary code is used to represent n mod m.
TABLE 2
Golomb codewords for various m in values
Golomb-Rice codes
m=1
m=2
m=3
m=4
m=5
m=6
n=
k=0
k=1
k=2
0
1
11
11
111
111
111
1
01
10
101
110
110
110
2
001
011
100
101
101
1011
3
0001
010
011
100
1001
1010
4
00001
0011
0101
0111
1000
1001
5
000001
0010
0100
0110
0111
1000
6
. . .
00011
0011
0101
0110
0111
7
. . .
00010
00101
0100
0101
0110
8
. . .
000011
00100
00111
01001
01011
To make the binary implementation simple, m is restricted to be a k-th power of 2 such that m=2
k
. This subset of the Golomb codes, commonly referred to as Rice codes, leads to very simple encoding and decoding procedures. The code for n is constructed by appending the k-th least significant bits of n (i.e. n mod m) to the unary representation of the number formed by the remaining higher order bits of n (i.e. integer(n/m)). Thus the binary portion of the Rice code has a fixed bit-length of k bits, and the total bit-length of any Rice codeword n is given by,
bitLength
⁢
⁢
(
n
)
=
⌊
n
2
k
⌋
+
1
+
k
.
Generally, the optimal average codeword length for an input message, known as the message entropy, is calculated as the sum of each letter's probability distribution multiplied by its self-information. That is,
H
=
Sriram Parthasarathy
Sudharsanan Subramania
Gunnison McKay & Hodgson, L.L.P.
Kassa Yosef
McKay Philip J.
Sun Microsystems Inc.
LandOfFree
Memory efficient variable-length encoding/decoding system does not yet have a rating. At this time, there are no reviews or comments for this patent.
If you have personal experience with Memory efficient variable-length encoding/decoding system, we encourage you to share that experience with our LandOfFree.com community. Your opinion is very important and Memory efficient variable-length encoding/decoding system will most certainly appreciate the feedback.
Profile ID: LFUS-PAI-O-3099532