Method and system for improving lossless compression efficiency

Coded data generation or conversion – Digital code to digital code converters – Adaptive coding

Reexamination Certificate

Rate now

  [ 0.00 ] – not rated yet Voters 0   Comments 0

Details

C341S067000

Reexamination Certificate

active

06657565

ABSTRACT:

BACKGROUND OF THE INVENTION
1. Technical Field
The present invention relates in general to computer-aided data compression, and in particular to a high throughput lossless compression technique.
2. Description of the Related Art
The ever increasing processing capacity of modern data processing systems has resulted in a corresponding need for improved data compression techniques. Data compression methods, most of which exploit the redundancy that is characteristic of data streams, have become an essential element of many high speed data communications and storage systems. Data compression is useful in computer-aided communications because it enables devices to transmit the same amount of data in fewer bits.
In communications, a transmitter can compress data before transmitting it, and a receiver can decompress the data after receiving it, thus increasing the effective data rate of the communications channel. In storage applications, data can be compressed before storage and decompressed upon retrieval, thereby increasing the effective capacity of a storage device. Data compression is widely utilized to reduce required storage capacity in backup utilities, spreadsheet applications, and database management systems. Other applications of data compression include magnetic and optical disk interfaces, satellite communications, computer network communications, interconnections between computers and attached devices, tape drive interfaces, and write-once media, where storage capacity is critical.
Compression techniques can be generally categorized as being either lossless or lossy. Lossless data compression transforms a body of data into a smaller body of data from which it is possible to exactly and uniquely recover the original data. Lossy data compression transforms a body of data into a smaller body of data from which an acceptable approximation of the original—as defined by a selected fidelity criterion—can be reconstructed.
Lossless compression is required for applications in which all information in the original stream must be preserved. Such applications include transmission or storage of textual data, such as a printed language, programming language source code or object code, database information, numerical data, and electronic mail. Lossless compression is also utilized for devices such as disk drive controllers that must provide exact preservation and restoration of the object data.
Among the most common approaches to lossless data compression are textual substitution methods, in which frequently-appearing data strings are replaced by shorter indices or pointers that correspond to the data strings in a dictionary or a recent history of the input data stream. For dictionary-based implementations, an encoder module and decoder module are typically utilized to maintain identical copies of a dictionary containing data strings that have appeared in the input stream. The encoder finds matches between portions of the input stream and the previously-encountered strings stored in the dictionary. The encoder then transmits the dictionary index or pointer corresponding to the string in place of the matched string.
The encoder can also update the dictionary with an entry based on the current match and the current contents of the dictionary. If insufficient space is available in the dictionary, space is created by deleting strings from the dictionary. The decoder, operating in a converse manner, receives at its input a set of indices, retrieves each corresponding dictionary entry as a “current match”, and updates its dictionary. Because the encoder and decoder operate in a “lock-step” fashion to maintain identical dictionaries, no additional communication is necessary between the encoder and decoder. Thus, the input to the encoder is a stream of bytes or characters, and the output is a sequence of pointers. Conversely, the input to the decoder is a stream of pointers and the output is a stream of bytes or characters.
Implementing conventional dictionary-based compression techniques introduces additional system complexity required for handling the data storage requirements and switching among more than one dictionary. In addition, the efficiency of dictionary-based compression techniques suffers in contexts involving variable data structures and segment sizes.
As an alternative approach to dictionary-based lossless compression techniques, a recent history of the input data stream may be utilized to process and point to matching sequences within the history. The IBM adaptive lossless data compression (ALDC) family of products utilizes a derivative of Lempel-Ziv encoding to compress data. As a general compression technique, the Lempel-Ziv algorithm integrates well into systems required to handle many different data types. This algorithm processes a sequence of bytes by keeping a recent history of the bytes processed and pointing to matching sequences within the history. Compression is achieved by replacing matching byte sequences with a copy pointer and length code that together are smaller in size than the replaced byte sequence.
The performance of any given compression method can be evaluated in terms of compression efficiency, throughput, and latency. Factors affecting the performance of compression techniques such as ALDC in terms of compression efficiency and latency include data stream content and the size of the history buffer with respect to the amount of data to be compressed. As previously explained, ALDC and other LZ
1
-based compression algorithms rely upon a history buffer to perform compression by finding redundant strings. When the data being compressed is partitioned into small segments, and each compressed segment is required to be independently decompressible, a history buffer reset is required. Resetting the history buffer significantly reduces the compression ratio of a segmented data stream. As the segmentation of a given data stream increases (resulting in smaller segments) the compression ratio correspondingly decreases.
The size of the history buffer relative to the amount of data to be compressed is a limiting factor on the compression ratio of an ALDC compressor. Unmatched symbols are processed in raw form as literals with no compression technique applied thus reducing compression efficiency.
In addition to the above-mentioned performance affecting factors (i.e., data stream content and the size of the history buffer with respect to the amount of data to be compressed), the performance of ALDC in terms of throughput is ultimately limited by the bandwidth of the compression core device. The bandwidth limitation of conventional ALDC cores is one byte per cycle. This one byte per cycle limitation is set in accordance with the compression character increment, which for reasons of compression efficiency, is set to one character (ASCII, for example) per byte. As implemented within current application specific integrated circuit (ASIC) technology, the operational frequency of the core is limited to the system clock. Thus, regardless of data content and history buffer capacity, the maximum output throughput realizable for a conventional ALDC is dependent on the operational frequency of the system in which it is incorporated.
From the foregoing, it can be appreciated that a need exists for an improved adaptive lossless compression technique that reduces the loss of compression incident to resetting the history buffer, enables compression of character sequences not included within the history buffer, and has a compression processing throughput greater than one byte per cycle of the system clock. The present invention addresses such a need by providing a compression system that would improve lossless data compression throughput while minimally impacting compression efficiency.
BRIEF SUMMARY OF THE INVENTION
A method and system for increasing compression efficiency of a lossless data compression utility are disclosed herein. The data compression utility compresses a segmented input data stream into independently decompressible data blocks, and includes a history buffer that

LandOfFree

Say what you really think

Search LandOfFree.com for the USA inventors and patents. Rate them and share your experience with other people.

Rating

Method and system for improving lossless compression efficiency does not yet have a rating. At this time, there are no reviews or comments for this patent.

If you have personal experience with Method and system for improving lossless compression efficiency, we encourage you to share that experience with our LandOfFree.com community. Your opinion is very important and Method and system for improving lossless compression efficiency will most certainly appreciate the feedback.

Rate now

     

Profile ID: LFUS-PAI-O-3176612

  Search
All data on this website is collected from public sources. Our data reflects the most accurate information available at the time of publication.