Coded data generation or conversion – Digital code to digital code converters – Adaptive coding
Reexamination Certificate
2001-05-21
2004-06-29
Tokar, Michael (Department: 2819)
Coded data generation or conversion
Digital code to digital code converters
Adaptive coding
C341S063000, C341S064000, C341S076000, C341S077000, C341S087000, C341S106000
Reexamination Certificate
active
06756922
ABSTRACT:
FIELD OF THE INVENTION
This invention relates to data compression of mostly similar strings.
BACKGROUND OF THE INVENTION
To store sets of mostly similar strings supporting fast retrieval of randomly selected members, one method is to represent each string with its difference from a set-fixed reference string (which is usually also a member of the set). This representation will result in compression when the string is sufficiently similar to the reference. Thus, choosing a good reference string is central to the quality of compression when such a storage method is used. Given a set of N strings, selecting the best reference string among them requires an order of N
2 
compression tests. Such a selection technique, especially in large sets of strings, is lengthy.
DEFINTIONS AND OBJECTIVES
For any compression method, there are a few parameters that may be defined:
CompLength(S
c
, S
r
)—if S
r 
is the reference string and S
c 
a string to be compressed, then CompLength(S
c
, S
r
) is the length of the compressed representation of S
c 
with respect to S
r
.
TotalLength(S
r
)—is the total length of the compressed representation of all the strings in the set, when they are compressed using S
r 
as the reference string.
SUMMARY OF THE INVENTION
The object of the invention is to easily find such a string, S
r
, so TotalLength (S
r
) is minimal.
This object is realized in accordance with a broad aspect of the invention by a computer implemented method for selecting a string for serving as a reference string for a comparison scheme for compressing a set of strings, the method comprising the following steps:
(a) calculating preliminary compression results for every string relative to an initial reference string, and
(b) using the preliminary compression results to find a better reference string without additional compression tests.
According to one embodiment of the invention, there is provided a computer implemented method for selecting a string for serving as a reference string for a comparison scheme for compressing a set of strings, the method comprising the following steps:
(a) compressing the set of strings against a selected initial reference string so as to produce a set of compressed strings,
(b) determining a histogram of the costs of all strings in the set of compressed strings showing for each different length of string in the set of compressed strings a frequency of occurrence in the set, and an identity of at least one string whose compression length equals said different length, and
(c) using said histogram to determine a better reference string.
The invention is based on the heuristic assumption that:
if CompLength (S
1
, S
r
)−CompLength (S
2
, S
r
)=&dgr;
Then CompLength (S
1
, S
2
)≈&dgr;
In other words, a subset of strings that are different from a reference string by similar degrees will probably be compressed at a lower cost if one of them is chosen as the reference string instead. It is therefore possible to predict the results of compression with one reference string based on the actual result of compressing it with another.
The invention uses the above heuristic assumption in order to predict a good reference string or strings. There are several ways of utilizing this idea, but all are based on the same principle of calculating preliminary compression results for every string relative to an arbitrary chosen string, and then using these results to find, with very small computational cost and without additional compression tests, a better reference string.
REFERENCES:
patent: 4516246 (1985-05-01), Kenemuth
patent: 4748638 (1988-05-01), Friedman et al.
patent: 4862167 (1989-08-01), Copeland, III
patent: 5357250 (1994-10-01), Healey et al.
patent: 5406282 (1995-04-01), Nomizu
patent: 5488365 (1996-01-01), Seroussi et al.
patent: 5635932 (1997-06-01), Shinagawa et al.
patent: 5748122 (1998-05-01), Shinagawa et al.
patent: 6061007 (2000-05-01), Eastty et al.
patent: 6492920 (2002-12-01), Oki et al.
Browdy and Neimark , P.L.L.C.
International Business Machines - Corporation
Nguyen Khai M.
Tokar Michael
LandOfFree
Method and system for compression of a set of mostly similar... does not yet have a rating. At this time, there are no reviews or comments for this patent.
If you have personal experience with Method and system for compression of a set of mostly similar..., we encourage you to share that experience with our LandOfFree.com community. Your opinion is very important and Method and system for compression of a set of mostly similar... will most certainly appreciate the feedback.
Profile ID: LFUS-PAI-O-3349103