Method and apparatus for compressing data string

Image analysis – Image compression or coding – Lossless compression

Reexamination Certificate

Rate now

  [ 0.00 ] – not rated yet Voters 0   Comments 0

Details

C382S229000, C341S051000

Reexamination Certificate

active

06563956

ABSTRACT:

BACKGROUND OF THE INVENTION
1. Field of the Invention
The present invention generally relates to a data compression method and a data compression apparatus, which are used for compressing data by omitting redundant parts contained in various kinds of data, such as character codes and image data and, more particularly, to a data compression method and a data compression apparatus which employ a dictionary coding scheme utilizing similarity of data sequences.
2. Description of the Related Art
Various kinds of data, such as character codes and image data, are being handled by information processors, such as computers. Consequently, the quantity of data handled by information processors is increasing. Such a large quantity of data usually contains redundant data strings. Storage capacity for storing data can be reduced in an information processor by performing compression processing for omitting such redundant parts. Further, data transmission capacity can be reduced in the information processor by using compressed data. Thus, a data transmission time can be shortened.
The LZ77 compression method and the LZ78 compression method have been known as typical data compression methods using the dictionary coding scheme. The LZ77 compression method can obtain a sufficient compression ratio by performing a simpler process, as compared with the LZ78. compression method. Consequently, the LZ77 compression method is mainly employed for practical use. Therefore, the LZ77 compression method is described hereinbelow.
Incidentally, the present invention can be applied not only to compression of character codes but to that of various kinds of data. However, hereunder, each of data represented in word units will be referred to as a “character” (of an alphabet), and data of an arbitrary number of consecutive words will be referred to as a “character string or sequence”, based on information theory.
According to the LZ77 compression method, as illustrated in
FIG. 1
, a sliding or search buffer
1
of predetermined capacity (incidentally, 16 characters in this example illustrated in this figure) is provided. A character string “defabqaaaacabcde”
2
a
having already been coded and compressed is stored in this buffer. Subsequently, an encoder searches the sliding buffer for a character string “abcd”
2
c
which is the maximum or longest match between the stored character string
2
a
and an input character string
2
b
“abcdaaaq . . . ” to be encoded. The relative address lof the position of the character string
2
c
found as the longest match is 5 (incidentally, this indicates that the character string
2
c
starts 5 characters back from the start of the input character string
2
b
). Further, the match length, namely, the length of the found character string is 4. Then, the relative address and the match length are encoded. Moreover, in the input character string
2
b,
the character string “abcd”
2
c,
which is the last longest match, is replaced with a codeword or token (
5
,
4
) and thus compressed.
Subsequently, the sliding buffer
1
is shifted four characters to the right. The character string
2
a
set in the sliding buffer is now “bqaaaacabcdeabcd”. Then, the encoder searches the next input character string “aaaq . . . ”
2
b
for a match, similarly as in the aforementioned case. Consequently, the current character string “aaa”
2
c
is found as the longest match. Thus, the occurrence or match position of this character sequence “aaa” in the buffer
1
is 13 (incidentally, this indicates that the current character string “aaa”
2
c
starts 13 characters back from the first character “a” of the character sequence “aaa” in the input character string
2
b
). Further, the match length of this longest match “aaa” is 3. Then, such an occurrence position and the match length of this character string “aaa” is encoded in the form of a codeword (
13
,
3
). Moreover, this character string “aaa” is replaced with this codeword (
13
,
3
).
According to LZ77 compression method, as the coding of the input character string proceeds, the encoder shifts the sliding buffer in this way. Therefore, LZ77 compression method is also referred to as a sliding dictionary method.
If the capacity of the sliding buffer used in such an LZ77 compression method is increased, the length of a character string found as the longest match increases. Consequently, the compression ratio is enhanced. However, as a result of the increase in the capacity of the sliding buffer, the encoder should search for an enormous number of combinations of character strings. Thus, in the case of sequentially searching the sliding buffer, the search requires a great deal of effort and time. Therefore, the LZ77 compression method is performed by actually adopting the following process. Namely, a character string (namely, a prefix) consisting of first two to four characters of an input character string and the occurrence position of the prefix are entered into a table as occasion demands, and then the prefix of the input character string is collated with the character list entered into the table, instead of collating all kinds of character strings of the sliding buffer with the input character string. The time required for such search is significantly reduced by employing this process.
The tables used for such a search are a look-up table and a hash table. A method of using a look-up table is to make a character string
2
d
to be searched for have a one-to-one correspondence to an address in a look-up table
3
, as illustrated in FIG.
2
. The past occurrence position (namely, the relative address) of the character string is stored at a corresponding address in the look-up table
3
. Thus, according to this method, the past occurrence position of the character string “ab”
2
d
is known by looking up the character string “ab”
2
d
in the table
3
once to search for this string. Therefore, this method has an advantage in that the search is achieved at a very high speed.
However, in the case that the character string to be searched for is long, the number of combinations of character strings is raised to a higher power. Thus, the look-up table should have an enormous number of addresses. Therefore, this method has a drawback in that a very large amount of memory is needed so as to allocate such an enormous addresses to the look-up table. For example, in the case that the number of characters is 1 (incidentally, 1 character consists of 8 bits), 2
8*1
(=256 bits) of memory are needed. Further, in the case that the number of characters is 2, 2
8*2
(=64 kbits) of memory are needed. Moreover, in the case that the number of characters is 3, 2
8*3
(=16 Mbits) of memory are needed. Therefore, the actual limit to the number of characters is 2. Additionally, this method has another drawback in that, when a character string to be searched for is long, only a small part of the look-up table is actually used (namely, only a small part of memory area assigned to the look-up table is used for entering the past occurrence positions of the character string into this table) and that thus, the look-up table is in a sparse state, and the efficiency of use of the look-up table is low.
In the case of a method of searching for a character string by using a hash table, as illustrated in
FIG. 3
, masking processing is performed on a codeword, which corresponds to a character string to be searched for, in such a way as to decrease the number of bits of the codeword (namely, the degeneration of the codeword is performed). Thus, a hash code
6
is generated (see
4
) so that a plurality of character strings having a common degenerated state share an area of the hash table
5
. Thus, this method features that, as compared with the method of searching for a character string using a look-up table, a longer character string can be searched for, when a search area, in which the character string is searched for, is equivalent to that used in the method using the look-up table.
However, in the case of the method using the hash table obtained in this way

LandOfFree

Say what you really think

Search LandOfFree.com for the USA inventors and patents. Rate them and share your experience with other people.

Rating

Method and apparatus for compressing data string does not yet have a rating. At this time, there are no reviews or comments for this patent.

If you have personal experience with Method and apparatus for compressing data string, we encourage you to share that experience with our LandOfFree.com community. Your opinion is very important and Method and apparatus for compressing data string will most certainly appreciate the feedback.

Rate now

     

Profile ID: LFUS-PAI-O-3087545

  Search
All data on this website is collected from public sources. Our data reflects the most accurate information available at the time of publication.