Huffman coding for infinite symbol sets

Data processing: database and file management or data structures – Database design – Data structure types

Reexamination Certificate


Details

Classification: C341S067000, C341S107000
Status: active
Patent number: 06622145

ABSTRACT:

FIELD OF THE INVENTION
The present invention relates generally to lossless data compression, and more particularly to code design and code selection for infinite symbol sets.
BACKGROUND OF THE INVENTION
Lossless compression or entropy coding is used in communication and storage of digital information. In the method generally referred to as Huffman coding (see David A. Huffman, “A Method for the Construction of Minimum-Redundancy Codes”, Proc. of the IRE, vol. 40(10), September 1952), the source data is separated into symbols 20 (see FIG. 1). The entropy coder 22 of FIG. 1 represents symbols 20 as variable length bit strings 26 (i.e., codewords) looked up in a codebook 24. Compression relies on assigning short codewords to frequent symbols and reserving longer codewords for infrequent symbols. The performance of a coder depends on its ability to match the probability distribution of symbols in the source with the probability distribution corresponding to the codebook.
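As an illustration of the codebook lookup described above, the following is a minimal Python sketch with a hypothetical four-symbol codebook (the symbols, codewords, and implied frequencies are illustrative and not taken from the patent). Encoding concatenates variable-length codewords; decoding relies on no codeword being a prefix of another.

    # Hypothetical prefix-free codebook: frequent symbols get short codewords.
    CODEBOOK = {
        "a": "0",     # most frequent symbol
        "b": "10",
        "c": "110",
        "d": "111",   # least frequent symbol
    }

    def encode(symbols):
        """Concatenate the variable-length codewords for a sequence of symbols."""
        return "".join(CODEBOOK[s] for s in symbols)

    def decode(bits):
        """Greedily match codewords; valid only because the code is prefix-free."""
        inverse = {code: sym for sym, code in CODEBOOK.items()}
        out, current = [], ""
        for bit in bits:
            current += bit
            if current in inverse:
                out.append(inverse[current])
                current = ""
        return out

    print(encode("aabca"))           # 00101100
    print(decode(encode("aabca")))   # ['a', 'a', 'b', 'c', 'a']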
A fundamental problem is how to design a codebook given the probability distribution of the source. Two methods are in general use, one for “finite” codes and the other for “infinite” codes. Finite codes can represent a known number of symbols, each symbol having a known non-zero probability of occurrence. Infinite codes, on the other hand, are infinite in the sense that they are “infinitely expandable”, i.e., a given infinite code can be expanded as far as needed to represent a desired set of input symbols. To be a valid infinite code, no codeword can be a prefix of any other codeword. An “infinite symbol set” is a set of symbols (which may itself be finite) that is represented by an infinite code.
Finite codebooks can be designed using the Huffman algorithm to generate a codebook matched to the source statistics. This algorithm constructs the set of codewords by starting from the least probable symbol and moving upward in probability. One suboptimal but reduced-complexity finite Huffman coding example is disclosed in U.S. Pat. No. 4,560,976, entitled “Data Compression” and issued Dec. 24, 1985 to Finn. Finn describes a finite codebook consisting only of one-, two-, and three-subword-long codewords. Finn uses a symbol-appearance-counting method to continuously re-rank his symbols and to decide which symbols should be assigned to which of his three lengths of codewords.
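For concreteness, the classic bottom-up construction mentioned above can be sketched in a few lines of Python; the probabilities below are hypothetical, and the heap-based formulation is a standard presentation of the Huffman algorithm rather than anything specific to the Finn patent.

    import heapq
    import itertools

    def huffman_codebook(probabilities):
        """Build a prefix-free codebook from {symbol: probability} by repeatedly
        merging the two least probable entries, i.e., starting from the least
        probable symbols and moving upward in probability."""
        tie = itertools.count()  # tie-breaker so equal probabilities never compare dicts
        heap = [(p, next(tie), {sym: ""}) for sym, p in probabilities.items()]
        heapq.heapify(heap)
        while len(heap) > 1:
            p0, _, codes0 = heapq.heappop(heap)   # least probable group
            p1, _, codes1 = heapq.heappop(heap)   # next least probable group
            merged = {s: "0" + c for s, c in codes0.items()}
            merged.update({s: "1" + c for s, c in codes1.items()})
            heapq.heappush(heap, (p0 + p1, next(tie), merged))
        return heap[0][2]

    # Hypothetical source statistics, for illustration only.
    print(huffman_codebook({"a": 0.5, "b": 0.25, "c": 0.15, "d": 0.10}))
    # e.g. {'a': '0', 'b': '10', 'd': '110', 'c': '111'} -- labels may differ, lengths will not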
Finite-codebook methods such as those above are not well suited to infinite codes. One reason is that symbol appearance counting will not generally be able to adequately estimate the probability of occurrence of a large number of symbols having non-zero but near-zero symbol probabilities.
Several families of infinite codes have been developed based on regular structures or algorithms. Golomb codes are one such family of codes. See S. W. Golomb, “Run length encodings”, IEEE Trans. Inform. Theory, vol. 12, pp. 399-401, July 1966. Geometrically distributed (i.e., exponential) codes are another such family. See R. Gallager and D. C. Van Voorhis, “Optimal source codes for geometrically distributed integer alphabets”, IEEE Trans. Inform. Theory, vol. 21, pp. 228-230, March 1975. Other algorithmic infinite codes exist as well. See, e.g., N. Merhav et al., “Optimal Prefix Codes for Sources with Two-Sided Geometric Distributions”, IEEE Trans. Inform. Theory, vol. 46, pp. 121-135, January 2000; N. Merhav et al., “The LOCO-I Lossless Image Compression Algorithm: Principles and Standardization into JPEG-LS”, IEEE Trans. Image Processing, vol. 9, no. 8, August 2000.
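The appeal of these families is that the codewords follow directly from a formula. As an illustration (not code from any of the cited papers), the Python sketch below implements the Rice special case of a Golomb code, i.e., Golomb parameter m = 2**k with k >= 1, which sidesteps the truncated-binary remainder needed for general m.

    def rice_encode(n, k):
        """Rice code (Golomb code with parameter m = 2**k, k >= 1) for an integer n >= 0:
        the quotient n >> k is sent in unary (that many ones, then a terminating zero)
        and the remainder in k plain binary bits, so larger n gets a longer codeword."""
        q, r = n >> k, n & ((1 << k) - 1)
        return "1" * q + "0" + format(r, "0{}b".format(k))

    def rice_decode(bits, k):
        """Inverse of rice_encode for one codeword; returns (value, bits_consumed)."""
        q = 0
        while bits[q] == "1":
            q += 1
        r = int(bits[q + 1:q + 1 + k], 2)
        return (q << k) | r, q + 1 + k

    for n in range(6):
        print(n, rice_encode(n, k=2))   # 000, 001, 010, 011, 1000, 1001

Golomb-family codes of this kind are well matched to geometrically decaying symbol probabilities, which is the setting studied in the cited work above.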
The regular-structured algorithmic approach provides a way to generate an infinite number of codewords but gives limited control over the resulting probability distribution. Another publication describes a method to generate an optimal codebook for an infinite distribution through a recursive sequence of calculations. See A. Kato et al., “Huffman Coding with an Infinite Alphabet”, IEEE Trans. Inform. Theory, vol. 42, pp. 977-984, May 1996. This approach serves more as a theoretical construction since the complexity of encoding and decoding this code can be significant.
An alternative to matching a probability distribution with a fixed code is to use adaptive coding. In adaptive coding, the codeword used to represent a symbol is changed dynamically while the encoder operates. Changes in the encoder are either signaled by introducing overhead, or tracked from the decoded data. Overhead subtracts from the compression efficiency, and tracking the encoder state from decoded data is error prone. In both cases, additional complexity is introduced into the encoding and decoding operations.
Arithmetic coding can also be used to efficiently code a general probability distribution, but the additional complexity of an arithmetic coder can be a drawback in some applications.
SUMMARY OF THE INVENTION
The present disclosure addresses the problem of developing an infinite code tailored to a given probability distribution. Using the methods described herein, it is possible to generate a low-complexity code that provides good compression performance.
Generally, the described embodiments model a given probability distribution using a combination of two or more existing infinite codes. Codes of low complexity are preferred. For instance, two Golomb codes may be selected: one that matches the source distribution well for symbol indices near zero, and another that matches it well for large symbol indices. The two codes are “grafted” together such that the first is used for the symbol indices near zero and the second is used for all other indices. When properly grafted as taught herein, only a few parameters are needed to describe such a distribution.
The disclosure generalizes this idea to accommodate a wide range of existing infinite codes and any desired number of code intervals. The described embodiments pertain generally to constructing a grafted codebook, encoding symbols using a grafted codebook, and decoding codewords formed using a grafted codebook.
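To make the grafting idea concrete, the Python sketch below splices two Rice codes at a threshold T, reusing the rice_encode and rice_decode helpers from the earlier sketch. The parameter names k_small, k_large, and Q, the choice T = Q * 2**k_small, and the escape-prefix splicing rule are all illustrative assumptions; the patent's own parameterization and claimed construction may differ.

    def grafted_encode(n, k_small=1, k_large=3, Q=4):
        """Graft two Rice codes at the threshold T = Q * 2**k_small (illustrative scheme).

        Indices below T keep their ordinary Rice(k_small) codewords; their unary quotient
        is at most Q - 1, so each such codeword contains a '0' within its first Q bits.
        Indices at or above T are sent as Q ones (a pattern no small-index codeword can
        start with) followed by the Rice(k_large) codeword of n - T, so the combined
        code remains prefix-free."""
        T = Q << k_small
        if n < T:
            return rice_encode(n, k_small)
        return "1" * Q + rice_encode(n - T, k_large)

    def grafted_decode(bits, k_small=1, k_large=3, Q=4):
        """Decode one grafted codeword; returns (value, bits_consumed)."""
        T = Q << k_small
        lead = 0                                   # count leading ones, up to Q
        while lead < Q and bits[lead] == "1":
            lead += 1
        if lead < Q:                               # small-index regime
            return rice_decode(bits, k_small)
        value, used = rice_decode(bits[Q:], k_large)   # large-index regime
        return value + T, Q + used

    for n in (0, 3, 7, 8, 20):
        codeword = grafted_encode(n)
        print(n, codeword, grafted_decode(codeword))

With these illustrative defaults, indices 0 through 7 cost at most five bits while larger indices grow only at the rate of the coarser Rice(k_large) code, which is the kind of two-regime behavior the summary describes.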


