Coded data generation or conversion – Digital code to digital code converters – Adaptive coding
Patent
1997-08-15
1999-11-23
Young, Brian
Coded data generation or conversion
Digital code to digital code converters
Adaptive coding
341 67, H03M 700
Patent
active
059908106
DESCRIPTION:
BRIEF SUMMARY
INTRODUCTION
The present invention provides a method and apparatus for partitioning one or more blocks of data into subblocks for the purpose of communicating and storing such subblocks in an efficient manner.
BACKGROUND
Much of the voluminous amount of information stored, communicated, and manipulated by modern computer systems is duplicated within the same or a related computer system. It is commonplace, for example, for computers to store many slightly differing versions of the same document. It is also commonplace for data transmitted during a backup operation to be almost identical to the data transmitted during the previous backup operation. Computer networks also must repeatedly carry the same or similar data in accordance the requirements of their users.
Despite the obvious benefits that would flow from a reduction in the redundancy of communicated and stored data, few computer systems perform any such optimization. Some instances can be found at the application level, one example being the class of incremental backup utilities that save only those files that have changed since the most recent backup. However, even these utilities do not attempt to exploit the significant similarities between old and new versions of files, and between files sharing other close semantic ties. This kind of redundancy can be approached only by analysing the contents of the files.
The present invention addresses the potential for reducing redundancy by providing an efficient method for identifying identical portions of data within a group of blocks of data, and for using this identification to increase the efficiency of systems that store and communicate data.
SUMMARY OF THE INVENTION
To identify identical portions of data within a group of blocks of data, the blocks must be analysed. One simple approach is to divide the blocks into fixed-length (e.g. 512-byte) subblocks and compare these with each other so as to identify all identical subblocks. This knowledge can the be used to manage the blocks in more efficient ways.
Unfortunately, the partitioning of blocks into fixed-length subblocks does not always provide a suitable framework for the recognition of duplicated portions of data, as identical portions of data can occur in different sizes and places within a group of blocks of data. FIG. 1 shows how division into fixed-size subblocks of two blocks (whose only difference is the insertion of a single byte (`X`)) fails to generate identical subblocks. A comparison of the two groups of subblocks would reveal no identical pairs of subblocks even thought the two original blocks differ by just one character.
A better approach is to partition each block using the data in the block itself to determine the position of the partitions.
In an aspect of the invention, the blocks are partitioned at boundaries determined by the content of the data itself. For example, a block could be partitioned at each point at which the preceding three bytes has to a particular constant value. FIG. 2 shows how such a data dependent partitioning could turn out, and contrasts it with a fixed-length partitioning. In FIG. 3 data independent partitioning generates seven distinct subblocks whereas the data-dependent partitioning generates just four, allowing much of the similarity between the two blocks to be detected.
The fact that a partitioning is data dependent does not imply that it must incorporate any knowledge of the syntax or semantics of the data. So long as the boundaries are positioned in a manner dependent on the local data content, identical subblocks are likely to be formed from identical portions of data, even if the two portions are not identically aligned relative to the start of their enclosing blocks (FIG. 3).
Once the group of blocks has blocks has been partitioned into subblocks, the resulting group of subblocks can be manipulated in a manner that exploits the occurrence of duplicate subblocks. This leads to a variety of applications, some of which are described below. However, the application of a further aspect of the i
REFERENCES:
patent: 4698628 (1987-10-01), Herkert et al.
patent: 5235623 (1993-08-01), Sugiyama et al.
patent: 5479654 (1995-12-01), Squibb
Williams, Ross, "An algorithm for matching text (possibly original)", Newsgroup posting, comp.compression, Jan. 27, 1992.
Williams, Ross, "Parallel data compression", Newsgroup posting, comp.conmpression.research, Jun. 30, 1992.
Knuth, Donald E., "The Art of Computer Programming, vol. 3: Sorting and Searching", pp. 508-513, Addison-Wesley Publishing Company, 1973.
Williams, Ross N., "An Introduction to Digest Algorithms" Proceedings of the Digital Equipment Computer Users Society, pp. 9-18, Aug. 1994.
Williams, Ross N., "An Extremely Fast ZIV-Lempel Data Compression Algorithm", Proceedings of Data Compression Conference, pp. 362-371, Apr. 1991.
Knuth, Donald E., The Art of Computer Programming, vol. 1: Fundamental Algorithms, pp. 435-451, Addison Wesley Publishing Company, 1973.
Bell Robert P.
Traurig Greenberg
Young Brian
LandOfFree
Method for partitioning a block of data into subblocks and for s does not yet have a rating. At this time, there are no reviews or comments for this patent.
If you have personal experience with Method for partitioning a block of data into subblocks and for s, we encourage you to share that experience with our LandOfFree.com community. Your opinion is very important and Method for partitioning a block of data into subblocks and for s will most certainly appreciate the feedback.
Profile ID: LFUS-PAI-O-1226595