Statistical data compression/decompression method

Image analysis – Image compression or coding – Lossless compression

Reexamination Certificate


Details

C375S240230

Reexamination Certificate

active

06542644

ABSTRACT:

BACKGROUND OF THE INVENTION
This invention relates to a statistical data compression/decompression method and, more particularly, to a statistical data compression/restoration method for variable-length encoding of a source character using the probability that the source character will appear following the immediately preceding character string (context) of n characters, and for decoding such a variable-length code back into a character.
The rapid progress made recently in the development of computers has been accompanied by the need to handle large quantities of data within computers. Such data are compressed to shorten transmission time and to utilize memory more efficiently.
Various coding methods for compressing data are known. One of these methods is coding that can be adapted to a variety of data, i.e., coding that is not limited to particular data such as character codes, vector information and images. Such coding is referred to as “universal coding.” Examples of universal coding are dictionary coding, which utilizes the similarity of character strings, and statistical coding, which utilizes the frequency of appearance of characters. In the discussion that follows, which is based upon information theory, one word-sized unit of data shall be referred to as a “character” and a series of such word units shall be referred to as a “character string”.
Statistical compression is a method which, in accordance with the statistical frequency of appearance (probability of appearance) of characters in the data to be compressed, obtains compressed data by assigning short code lengths to characters having a high probability of appearance. Typical examples of statistical coding are (1) arithmetic coding (e.g., “Arithmetic Coding for Data Compression”, by Ian H. Witten, et al., Commun. of the ACM, Vol. 30, No. 6, pp. 520-540, or “An Adaptive Dependence Source Model for Data Compression Scheme”, by D. M. Abrahamson, Commun. of the ACM, Vol. 32, No. 1, pp. 77-82), and (2) Huffman coding (e.g., “Dynamic Huffman Coding”, by Donald E. Knuth, Journal of Algorithms, Vol. 6, pp. 163-180).
In Huffman coding, a code (the Huffman code) is used as the code of each character, where the Huffman code has a code length that is inversely proportional to the frequency of appearance of the character. Before the details of Huffman coding are discussed, a code tree, which is the structure of data used when a Huffman code is generated, will be described.
FIG. 18 shows an example of a Huffman tree. The points indicated by the circles and squares are nodes. The segments connecting nodes are referred to as “branches”, and the node located at the uppermost position is referred to as the “root”. An underlying node Y connected to a certain node X is referred to as the “child” of the node X. Conversely, the node X is referred to as the “parent” of the node Y. A node that does not have a “child” is referred to as a “leaf”. Each leaf has a character corresponding to it. Nodes other than “leaves” are referred to as “internal nodes”. The number of branches from the “root” to a node is referred to as the “level” of that node.
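To make this terminology concrete, the following sketch (the class and field names are illustrative and are not taken from the patent) models a code-tree node with parent and child links and computes a node's level by counting the branches back up to the root:

from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class Node:
    """A node of a code tree; only a leaf carries a character."""
    char: Optional[str] = None               # set for leaves, None for internal nodes
    children: List["Node"] = field(default_factory=list)
    parent: Optional["Node"] = None

    def add_child(self, child: "Node") -> None:
        child.parent = self
        self.children.append(child)

    @property
    def is_leaf(self) -> bool:
        return not self.children

    @property
    def level(self) -> int:
        """Number of branches on the path from the root down to this node."""
        return 0 if self.parent is None else 1 + self.parent.level

# Tiny example: a root, one internal node, and the leaf for the character A.
root, internal, leaf_a = Node(), Node(), Node(char="A")
root.add_child(internal)
internal.add_child(leaf_a)
print(leaf_a.is_leaf, leaf_a.level)           # True 2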
When encoding is performed using a code tree, the path from the “root” to the “leaf” correlated with the character to be encoded is outputted as a code. More specifically, “1” is outputted when the path branches to the left of each node from the “root” to the target “leaf”, and “0” is outputted when the path branches to the right of each node from the “root” to the target “leaf”. For example, in the code tree illustrated in FIG. 18, the code “00” is outputted for the character A, which has been correlated with the “leaf” of node number 7, and the code “011” is outputted for the character B, which has been correlated with the “leaf” of node number 8. At decoding, the character outputted is that correlated with the “leaf” reached when nodes are traced in reverse in accordance with the value of each bit of the data to be decoded.
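A minimal sketch of this path-based encoding and decoding is shown below. The small code tree is hand-built and does not reproduce the tree of FIG. 18; only the convention described above (a left branch emits “1”, a right branch emits “0”) is taken from the text, and all names are illustrative.

# Each internal node is a pair (left_child, right_child); a leaf is simply a character.
tree = ((("C", "D"), ("A", "B")), "E")        # an arbitrary small code tree

def encode_char(node, ch, bits=""):
    """Return the bit string for ch, or None if ch does not occur under this node."""
    if isinstance(node, str):
        return bits if node == ch else None
    left, right = node
    return encode_char(left, ch, bits + "1") or encode_char(right, ch, bits + "0")

def decode(root, bits):
    """Trace the tree bit by bit, emitting a character each time a leaf is reached."""
    out, cur = [], root
    for b in bits:
        cur = cur[0] if b == "1" else cur[1]
        if isinstance(cur, str):              # a leaf: output its character, restart at the root
            out.append(cur)
            cur = root
    return "".join(out)

code = "".join(encode_char(tree, c) for c in "ABE")
print(code, decode(tree, code))               # 1011000 ABE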
In accordance with Huffman coding, a code tree of the above-mentioned kind is generated through the following procedure (referred to as the “Huffman algorithm”) (see FIG. 19).
(1) A node (a leaf initially) corresponding to each character is prepared and the frequency of appearance of the corresponding character is recorded in advance for each node.
(2) One new node is created for two nodes having the lowest frequency of appearance, and the node created is connected to the two original nodes by branches. The sum of the frequencies of appearance of the two original nodes connected to the new node by the branches is recorded as the frequency of appearance of this newly created node.
(3) One new node is created for the two nodes having the next lowest frequencies of appearance, and the node created is connected to the two original nodes by branches. The sum of the frequencies of appearance of the two original nodes connected to the new node by the branches is recorded as the frequency of appearance of this newly created node. Step (3) is repeated until only a single node without a parent, namely the root, remains.
In the code tree generated through this procedure, each character is assigned a code having a code length that is inversely proportional to the frequency of appearance of this character. If encoding is performed using this code tree, therefore, the data will be compressed. There are two kinds of Huffman coding: static, in which the Huffman tree is fixed, and adaptive, in which the coding tree is modified in conformity with the frequency of appearance of each character.
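The pairing of the two lowest-frequency nodes in steps (2) and (3) is naturally expressed with a priority queue. The sketch below is a standard static-Huffman construction written for this description, not code from the patent; it builds the tree for the example probabilities used later in this section and reads off each character's code length.

import heapq
from itertools import count

def huffman_code_lengths(freqs):
    """freqs maps character -> frequency of appearance; returns character -> code length."""
    tick = count()                            # tie-breaker so the heap never compares subtrees
    heap = [(f, next(tick), ch) for ch, f in freqs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        # Join the two nodes with the lowest frequencies under a new parent whose
        # frequency is their sum (steps (2) and (3) of the Huffman algorithm).
        f1, _, left = heapq.heappop(heap)
        f2, _, right = heapq.heappop(heap)
        heapq.heappush(heap, (f1 + f2, next(tick), (left, right)))
    root = heap[0][2]

    lengths = {}
    def walk(node, depth):
        if isinstance(node, str):
            lengths[node] = depth             # a leaf: its level is the code length
        else:
            walk(node[0], depth + 1)
            walk(node[1], depth + 1)
    walk(root, 0)
    return lengths

print(huffman_code_lengths({"a": 0.2, "b": 0.1, "c": 0.05, "d": 0.15, "e": 0.5}))
# frequent characters such as "e" receive short codes, rare ones such as "c" receive long codes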
When one character is encoded in accordance with the Huffman coding scheme, a code comprising a whole number of bits is generated. With arithmetic coding, on the other hand, a fractional number of bits can be assigned to one character. In arithmetic coding, an interval [indicated by (0,1) below] of values greater than 0 but less than 1 grows successively narrower in conformity with the probability of occurrence (frequency of appearance) of each character constituting the source data to be encoded. When the processing of all characters is finished, a numerical value representing one point in the narrowed interval is outputted as the code.
By way of example, assume that the characters to be encoded are the five characters a, b, c, d and e, and that the probabilities that these characters will occur are 0.2, 0.1, 0.05, 0.15 and 0.5, respectively. In such case, intervals having interval widths conforming to these probabilities of occurrence are assigned to the respective characters, as shown in FIG. 20. If a source character string to be encoded is “abe”, then the interval (0,1) is first narrowed to the interval (0,0.2) with regard to the character “a”, as shown schematically in FIG. 21. Next, the interval (0,0.2) is partitioned into intervals conforming to the probabilities of occurrence of each of the characters, and the interval (0.04,0.06), which corresponds to the next character “b”, is selected as the interval of the character string “ab”. This interval (0.04,0.06) is then partitioned into intervals conforming to the probabilities of occurrence of each of the characters, and the interval (0.05,0.06), which corresponds to the next character “e”, is selected as the interval of the character string “abe”. Finally, a sequence of bits below the decimal point obtained when the position of any point (e.g., the lower limit) in this interval is expressed by a binary number is outputted as the encoded result.
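A rough sketch of this interval narrowing, reproducing the “abe” example, is given below. The symbol table and the choice of emitting the binary expansion of the lower limit follow the description above in simplified form; the function names are illustrative, and the sketch omits the precision handling a real arithmetic coder needs.

# Probabilities of occurrence from the example, in the fixed order of FIG. 20.
PROBS = {"a": 0.2, "b": 0.1, "c": 0.05, "d": 0.15, "e": 0.5}

def narrow(source):
    """Successively narrow the interval [low, high) for each source character."""
    low, high = 0.0, 1.0
    for ch in source:
        width = high - low
        cum = 0.0
        for sym, p in PROBS.items():
            if sym == ch:
                low, high = low + cum * width, low + (cum + p) * width
                break
            cum += p
    return low, high

def fraction_bits(x, n=12):
    """The first n bits below the binary point of the fraction x."""
    bits = []
    for _ in range(n):
        x *= 2
        bits.append("1" if x >= 1 else "0")
        x -= int(x)
    return "".join(bits)

low, high = narrow("abe")
print(low, high)                              # approximately 0.05 and 0.06
print(fraction_bits(low))                     # bits representing a point in the final interval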
In order to achieve better compression with probabilistic statistical coding, the probability of occurrence of each character is obtained in correlation with a character string (context) that has appeared immediately prior to the source character to be encoded. In this case, as shown in FIG. 22, encoding is implemented by an apparatus having a context acquisition unit (context registration unit) 1 and a variable-length encoder 2. The context acquisition unit 1 stores a character string (context) that has appeared and counts the number of these appearances using a context tree of the kind shown in FIG. 23, and obtains a conditional probability of appearance (referred
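Although the passage breaks off at this point, the idea of conditioning on the preceding characters can be sketched as follows. The snippet below is a toy order-n frequency counter kept in a dictionary rather than the context tree of FIG. 23, and all names in it are illustrative: it counts how often each character follows each length-n context and turns those counts into a conditional probability of appearance.

from collections import Counter, defaultdict

def context_counts(text, order=2):
    """Count how often each character follows each context of `order` characters."""
    counts = defaultdict(Counter)
    for i in range(order, len(text)):
        counts[text[i - order:i]][text[i]] += 1
    return counts

def conditional_probability(counts, context, ch):
    """Estimate P(ch | context) from the observed counts."""
    total = sum(counts[context].values())
    return counts[context][ch] / total if total else 0.0

counts = context_counts("abracadabra", order=2)
print(conditional_probability(counts, "ab", "r"))   # 1.0: "r" always follows "ab" here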
