Data processing: database and file management or data structures – Database design – Data structure types
Reexamination Certificate
1998-09-08
2001-06-12
Alam, Hosain T. (Department: 2172)
Data processing: database and file management or data structures
Database design
Data structure types
Reexamination Certificate
active
06247015
ABSTRACT:
BACKGROUND OF THE INVENTION
1. Technical Field
The present invention relates to a method and system for data processing in general, and in particular to a method and system for compressing files. Still more particularly, the present invention relates to a method and system for compressing files utilizing a dictionary array.
2. Description of the Prior Art
Dictionary-based compression algorithms that are derived from the classical Lempel-Ziv scheme find their applications in various compression software, ranging from the age-old UNIX “compress” utility to the more recent gzip, pkzip, and winzip compression programs. In a dictionary-based compression algorithm, an input file is typically examined sequentially on a byte by byte basis for unique patterns. Each unique pattern found in the input file is then stored in a dictionary in association with a compact label. During the course of the examination of the input file, when a repeat pattern is found, the repeat pattern will be replaced by its corresponding compact label. Accordingly, the more repeat patterns that are found in the input file, the higher the compression ratio will be. The sequence of data, including compact labels, is subsequently stored along with the dictionary as a compressed file. To decompress the file, each compact label in the compressed file is located in the dictionary, and the corresponding pattern is inserted into the file in lieu of the compact label, thus restoring the compressed file back to its original form.
Because the dictionary is stored along with the data as part of the compressed file in the dictionary-based approach, the size of the dictionary is the limiting factor for achieving a good compression ratio. In fact, the dictionary size is such a dominating factor to the compression efficiency that a compression ratio of better than 50% is generally unachievable with randomly chosen binary files. Another drawback associated with the dictionary-based approach is that the overhead for writing and reading the dictionary to perform the compression and decompression, respectively, is relatively large.
With the ever increasing demand in data throughput rates in today's information age, data compactness is very important. For example, in applications such as palm-top computers or personal digital assistants where the space for data storage is rather limited, it is very crucial to have a relatively high data compression ratio for data that can be stored in a compressed form. As a result, any improvement in compression ratio to yield a more compact form of data for storage and transmission can result in a considerable competitive advantage. Consequently, it would be desirable to provide an improved dictionary-based method for compressing files within a data-processing system.
SUMMARY OF THE INVENTION
In accordance with a method of the present invention, a binary file commonly available to a data-compressing system during compression and to a data-decompressing system during decompression can serve as a dictionary file. A first dictionary array is initially generated utilizing the dictionary file. Each entry within the first dictionary array includes a set of unique bit patterns from the dictionary file. An input file is parsed into multiple blocks, with each block having the same length as each entry within the first dictionary array. The input file is then compared against the first dictionary array, and each entry within the first dictionary array that includes the same bit patterns as a block from the input file is marked accordingly. A second dictionary array that includes all the marked entries from the first dictionary array is subsequently generated, and this second dictionary array is utilized in the compression of the input file. During compression, any block from the input file having a bit pattern found in the second dictionary array will be replaced by a corresponding index to the second dictionary array.
All objects, features, and advantages of the present invention will become apparent in the following detailed written description.
REFERENCES:
patent: 5473326 (1995-12-01), Harrington et al.
patent: 5530645 (1996-06-01), Chu
patent: 5729228 (1998-03-01), Franaszek et al.
patent: 5815096 (1998-09-01), Smith
patent: 5838963 (1998-11-01), Griffiths
patent: 5841376 (1998-11-01), Hayashi
patent: 5956724 (1999-09-01), Griffiths
patent: 5974179 (1999-10-01), Caklovie
patent: 6075470 (2000-06-01), Little et al.
Baumgartner Jason Raymond
Malik Nadeem
Roberts Steven Leonard
Alam Hosain T.
Bracewell & Patterson L.L.P.
International Business Machines - Corporation
Shah Sanjiv
VanLeeuwen Leslie A.
LandOfFree
Method and system for compressing files utilizing a... does not yet have a rating. At this time, there are no reviews or comments for this patent.
If you have personal experience with Method and system for compressing files utilizing a..., we encourage you to share that experience with our LandOfFree.com community. Your opinion is very important and Method and system for compressing files utilizing a... will most certainly appreciate the feedback.
Profile ID: LFUS-PAI-O-2509999