Coded data generation or conversion – Digital code to digital code converters – Adaptive coding
Reexamination Certificate
2001-01-22
2002-05-21
JeanPierre, Peguy (Department: 2819)
Coded data generation or conversion
Digital code to digital code converters
Adaptive coding
C707S793000
Reexamination Certificate
active
06392567
ABSTRACT:
BACKGROUND OF THE INVENTION
1. Field of the Invention
The present invention relates to an apparatus for compressing data strings to be compressed and a method thereof using a dictionary made by the data strings. Data to be compressed is handled by a unit. The unit may be bits, characters, words, pixels, gray levels, or some other appropriate unit. The present invention can be applied to the compression of various types of data and is not limited to the compression of a character code. In the following, however, we call the unit as a character based on information theory.
2. Description of the Related Art
The data amount being handled has been increasing as various types of data such as a character code, image data, etc., is being handled by computer in recent years. When such a great amount of data is handled, the required storage capacity can be decreased or the data can be transmitted to a remote place at high speed by omitting a redundant part of the data and compressing the remaining data.
As a conventional data compression technology, two technologies are known such as a dictionary model encoding technology using the repetition of a data string and a probability statistical model encoding technology using the appearance frequency of a data string. As the representative method of the dictionary model, LZ77 encoding and LZ78 encoding technologies are well known (“Document data compression algorithm introduction”, by Tomohiko Uematsu, CQ publication, pp. 131-208, 1995). Since the LZ77 encoding technology can obtain a sufficient compression ratio with an easy process compared with the LZ78 encoding technology, the LZ77 encoding technology has become a mainstream when these technologies are actually used.
As shown in
FIG. 1A
, a slide buffer
1
with a regular size is provided in the LZ77 encoding technology. The buffer
1
retrieves a character string that matches an input character string in the longest length in the buffer
1
, and encodes the input character string using the location and length. This encoding method is called as slide dictionary law, since the buffer
1
is sliding, as the encoding process proceeds.
When an input character string “abcdaaaq . . . ” right next to the buffer
1
is encoded in
FIG. 1A
, “abcd” is the longest matching character string among the character strings matched in the buffer
1
. Thereupon, a relative address “5 (bytes)” of the nearest location of the longest matching character string and the input character string (the matching location A against the input location B=the fifth character to the left of the input location B) is set as a matching location. Then, a code such as (matching location, matching length)=(5,4) is generated while setting a length “4 (bytes)” of the longest matching character string as a matching length. In this way, “abcd” of the first part of the input character string is replaced with (5,4). Similarly, the next character string “aaa” is replaced with a code (13,3).
However, a slide buffer that is actually used is much longer, and when character strings inside the buffer are sequentially retrieved, a long time is required in order to detect the longest matching character string. In fact, all the character strings inside the buffer are not compared with the input character string, but the appearance locations of the prefixes (about 2 to 4 characters) of character strings are registered in a table. Then, the input string is compared with a character string positioned at the location of the table where the character string is stored. A Look-Up Table (LUT) and a Hash Table are used for such a retrieval process.
FIG. 1B
shows a character string retrieval process using the LUT. LUT
2
of
FIG. 1B
sets the prefix of a character string inside the buffer
1
as an address, and stores the location (address or pointer) of the character string in the buffer
1
. At the time of the retrieval process, the area of the LUT
2
is accessed by setting the prefix of an input character string as an address, and the location of the corresponding character string is obtained.
In the case that a plurality of character strings with the same prefixes exist in the buffer
1
, a plurality of appearance locations are stored in the form of a linked list
3
. Therefore, locations of all the corresponding character strings stored in the buffer
1
can be obtained only by accessing the LUT
2
one time. Here, the prefix of two characters is used, and the area of the LUT
2
corresponding to the prefix “ab” of the input character string stores two appearance locations, utilizing the linked list
3
.
Thus, since the LUT corresponds to one character string to be retrieved, to one area of the table, and can obtain required information only by looking up a table one time, the LUT can perform a retrieval process at high speed. However, in the case of the retrieval process of a long character string, the number of the areas required for the table increases with n-th power of character strings which appear. Accordingly, the area required becomes large. If the number of characters which appear is 2
8
=256, 256
n
areas are required for the prefixes of n characters.
When the character string to be retrieved becomes long, however, an actually used (registered) area remains only part of the prepared areas, and the inside of the table becomes thinned. Therefore, in the case of the retrieval process of a long character, the memory efficiency deteriorates.
Thereupon, a character string to be retrieved is degenerated in the hash table, so that a plurality of character strings are made to share one area. Therefore, after the table is looked up, it should be checked whether the thus-obtained character string is actually retrieved. In comparison with the LUT, the hash table can retrieve a longer character string using the same table area as that of the LUT.
FIG. 1C
shows a character string retrieval process using the hash table. A hash code generation unit
4
of
FIG. 1C
generates a hash code
5
from a prefix “abc” of the input character string, and it accesses a hash table
6
while setting the code as an address. In the hash table
6
, the location inside the buffer
1
corresponding to the hash code
5
is stored in the hash table
6
. By comparing a character string “abcde” at the location with the input character string, it is checked whether the prefixes of the two strings match each other. When the two strings match each other, it is judged that a character string in accord with the input character string exists in the buffer
1
.
In the case of the hash table, too, a plurality of appearance locations are stored in the form of a linked list for a plurality of character strings with the same prefix in the buffer
1
. In either case, the liked list is used to retrieve a longest matching character string.
However, there is the following problem in the above-mentioned conventional data compression technology.
In the case that a long matching character string is retrieved by using the LUT, the inside of the table becomes thinned as mentioned above, since only part of the table is used even if a table with a large area is prepared. In a hash table, the table size becomes smaller relative to that of the LUT. However, the inside of the table becomes thinned, if there is little input data. Therefore, as for the hash table, there is the problem that a memory is not used effectively.
Further, there is another problem that since a plurality of appearance locations stored in a linked list should be traced one by one in the case of a retrieval process of the longest matching character string, a longer retrieval time is required as the number of character strings with the same prefix increases.
SUMMARY OF THE INVENTION
The object of the present invention is to offer a data compression apparatus and a string retrieval apparatus with a reasonable memory capacity corresponding to an input data amount, thereby effectively performing the longest match retrieval process in a data compression process based on the dictionary model encoding technology.
Fijitsu Limited
Jean-Pierre Peguy
Staas & Halsey
LandOfFree
Apparatus for repeatedly compressing a data string and a... does not yet have a rating. At this time, there are no reviews or comments for this patent.
If you have personal experience with Apparatus for repeatedly compressing a data string and a..., we encourage you to share that experience with our LandOfFree.com community. Your opinion is very important and Apparatus for repeatedly compressing a data string and a... will most certainly appreciate the feedback.
Profile ID: LFUS-PAI-O-2847553