Method and apparatus for data compression using fingerprinting

Coded data generation or conversion – Digital code to digital code converters – Adaptive coding

Reexamination Certificate

Rate now

  [ 0.00 ] – not rated yet Voters 0   Comments 0

Details

C341S067000, C341S053000, C341S050000

Reexamination Certificate

active

06611213

ABSTRACT:

FIELD OF THE INVENTION
The present invention relates generally to data storage and communications systems and, more particularly, to improving the capacity and use of such data storage and communications systems.
BACKGROUND OF THE INVENTION
Conventional data compression techniques and systems encode a stream of digital data into a compressed code stream and decode the compressed code stream back into a corresponding original data stream. The code stream is referred to as “compressed” because the stream typically consists of a smaller number of codes than symbols contained in the original data stream. Such smaller codes can be advantageously stored in a corresponding smaller amount of memory than the original data. Further, the compressed code stream can be transmitted in a communications system, e.g., a wired, wireless, or optical fiber communications system, in a corresponding shorter period of time than the uncompressed original data. The demand for data transmission and storage capacity in today's communications networks, as required by the significant increase in content exchanges across such networks, is ever-increasing. Thus, data compression plays an integral role in most modem transmission protocols and communications networks.
As is well-known, two classes of compression techniques useful in the compression of data are so-called special purpose compression and general purpose compression. Special purpose compression techniques are designed for compressing special types of data and are often relatively inexpensive to implement. For example, well-known special purpose compression techniques include run-length encoding, zero-suppression encoding, null-compression encoding, and pattern substitution. These techniques generally have relatively small compression ratios due to the fact that they compress data which typically possesses common characteristics and redundancies. As will be appreciated, a compression ratio is the measure of the length of the compressed codes relative to the length of the original data. However, special purpose compression techniques tend to be ineffective at compressing data of a more general nature, i.e., data that does not possess a high degree of common characteristics and the like.
In contrast, general purpose compression techniques are not designed for specifically compressing one type of data and are often adapted to different types of data during the actual compression process. Some of the most well-known and useful general purpose compression techniques emanate from a family of algorithms developed by, J. Ziv and A. Lempel, and commonly referred to in the art as “Lempel-Ziv coding”. In particular, Ziv et al., “A Universal Algorithm for Sequential Data Compression”,
IEEE Transactions on Information Theory
, IT-23(3):337-343, May 1977 (describing the commonly denominated “LZ1” algorithm), and Ziv et al., “Compression of Individual Sequences Via Variable-Rate Coding”,
IEEE Transactions on Information Technology
, IT-24(5):530-536, September 1978 (describing the commonly denominated “LZ2” algorithm), which are each hereby incorporated by reference for all purposes. The LZ1 and LZ2 data compression schemes are well-known in the art and need not be discussed in great detail herein.
In brief, the LZ1 (also referred to and known in the art as “LZ77”) data compression process is based on the principle that a repeated sequence of characters can be replaced by a reference to an earlier occurrence of the sequence, i.e., matching sequences. The reference, e.g., a pointer, typically includes an indication of the position of the earlier occurrence, e.g., expressed as a byte offset from the start of the repeated sequence, and the number of characters, i.e., the matched length, that are repeated. Typically, the references are represented as “<offset, length>” pairs in accordance with conventional LZ1 coding. In contrast, LZ2 (also referred to and known in the art as “LZ78”) compression parses a stream of input data characters into coded values based on an adaptively growing look-up table or dictionary that is produced during the compression. That is, LZ2 does not find matches on any byte boundary and with any length as in LZ1 coding, but instead when a dictionary word is matched by a source string, a new word is added to the dictionary which consists of the matched word plus the following source string byte. In accordance with LZ2 coding, matches are coded as pointers or indexes to the words in the dictionary.
As mentioned above, the art is replete with compression schemes derived on the basic principles embodied by the LZ1 and LZ2 algorithms. For example, Terry A. Welch (see, T. A. Welch, “A Technique for High Performance Data Compression”,
IEEE Computer
, pp. 8-19, June 1984, and U.S. Pat. No. 4,558,302, issued to Welch on Dec. 10, 1985, each of which is incorporated by reference for all purposes) later refined the LZ2 coding process to the well-known “Lempel-Ziv-Welch” (“LZW”) compression process. Both the LZ2 and LZW compression techniques are based on the generation and use of a so-called string table that maps strings of input characters into fixed-length codes. More particularly, these compression techniques compress a stream of data characters into a compressed stream of codes by serially searching the character stream and generating codes based on sequences of encountered symbols that match corresponding longest possible strings previously stored in the table, i.e., dictionary. As each match is made and a code symbol is generated, the process also stores a new string entry in the dictionary that comprises the matched sequence in the data stream plus the next character symbol encounter in the data stream.
As will be appreciated and as detailed above, the essence of Lempel-Ziv coding is finding strings and substrings which are repeated in the original data stream, e.g., in a document to be transmitted. The repeated phrases in the document under compression are replaced with a pointer to a place where they have occurred earlier in the original data stream, e.g., document. As such, decoding data, e.g., text, which is compressed in this manner simply requires replacing the pointers with the already decoded text to which it points. As is well-known, one primary design consideration in employing Lempel-Ziv coding is determining whether to set a limit on how far back a pointer can reach, and what that limit should be. A further design consideration of Lempel-Ziv coding involves which substrings within the desired limit may be a target of a pointer. That is, the reach of a pointer into earlier text may be unrestricted, i.e., a so-called growing window, or may be restricted to a fixed size window of the previous “N” characters, where N is typically in the range of several thousand characters, e.g., 3 kilobytes. In accordance with this coding repetitions of strings are discovered and compressed only if they both appear in the window. As will be appreciated, the considerations made regarding such Lempel-Ziv coding design choices represent a compromise between speed, memory requirements, and compression ratio. Sliding windows do, however, present at least one drawback: sliding windows do not find strings that occur far apart, e.g., 10,000 character, in the input text.
While prior art compression schemes, such as the aforementioned LZ1, LZ2, and LZW compression methods, provide effective data compression, efforts continue to search for even greater compression to reduce storage requirements and transmission times.
SUMMARY OF THE INVENTION
The present invention provides a method and apparatus for achieving relatively low compression ratios based on our realization of using a longer history and longer common strings of the input data stream as an initial evaluation of the input data prior to applying a compression process, e.g., any Lempel-Ziv compression methodology. That is, given that typical compression processes utilize a relatively short, i.e., the most current few kilobytes, history of the input data to effect the desired compression, we hav

LandOfFree

Say what you really think

Search LandOfFree.com for the USA inventors and patents. Rate them and share your experience with other people.

Rating

Method and apparatus for data compression using fingerprinting does not yet have a rating. At this time, there are no reviews or comments for this patent.

If you have personal experience with Method and apparatus for data compression using fingerprinting, we encourage you to share that experience with our LandOfFree.com community. Your opinion is very important and Method and apparatus for data compression using fingerprinting will most certainly appreciate the feedback.

Rate now

     

Profile ID: LFUS-PAI-O-3079683

  Search
All data on this website is collected from public sources. Our data reflects the most accurate information available at the time of publication.