System and method for unorchestrated determination of data...

Data processing: database and file management or data structures – Database design – Data structure types

Reexamination Certificate

active

06810398

ABSTRACT:

COPYRIGHT NOTICE/PERMISSION
A portion of the disclosure of this patent document contains material which is subject to copyright protection. The copyright owner has no objection to the facsimile reproduction by anyone of the patent document or the patent disclosure as it appears in the United States Patent and Trademark Office patent file or records, but otherwise reserves all copyright rights whatsoever. The following notice applies to the software and data as described below, inclusive of the drawing figures where applicable: Copyright © 2000, Undoo Technologies.
BACKGROUND OF THE INVENTION
The present invention relates, in general, to the field of systems and methods for the unorchestrated determination of data sequences using “sticky byte” factoring to determine breakpoints in digital sequences. More particularly, the present invention relates to an efficient and effective method of dividing a data set into pieces in a way that generally yields near-optimal commonality.
Modern computer systems hold vast quantities of data: on the order of a billion billion (10^18) bytes in aggregate. Incredibly, this volume tends to quadruple each year, and even the most impressive advances in computer mass storage architectures cannot keep pace.
The data maintained in most computer mass storage systems has been observed to have the following interesting characteristics: 1) it is almost never random and is, in fact, highly redundant; 2) the unique sequences in this data account for only a very small fraction of the storage space the data actually occupies; 3) considerable effort is required to manage this volume of data, much of it devoted to identifying and removing redundancies (e.g. duplicate files and old versions of files) and to purging logs, archiving, etc.; and 4) large amounts of capital resources are dedicated to making unnecessary copies, saving those copies to local media and the like.
A system that factored redundant copies would reduce the number of storage volumes otherwise needed by orders of magnitude. However, a system that factors large volumes of data into their common sequences must employ a method by which to determine those sequences. Conventional methods that compare one data sequence to another typically suffer from extreme computational complexity and can, therefore, be employed only to factor relatively small data sets. Larger data sets are generally factored only by simplistic methods, such as dividing the data at arbitrary fixed sizes. These methods factor poorly under many circumstances: inserting or deleting even a single byte shifts every subsequent fixed-size boundary, so otherwise identical data no longer divides into identical pieces. The efficient factoring of large data sets has long been a persistent and heretofore intractable problem in the field of computer science.
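To make the weakness of fixed-size division concrete, the following Python sketch (not part of the original disclosure; the function names `fixed_size_chunks` and `shared_chunks` are hypothetical) splits a buffer at fixed 4096-byte offsets, inserts one byte near the start, and counts how many chunks of the edited copy still match the original. It assumes random input data and uses SHA-256 digests only as a convenient way to compare chunk contents.

```python
import hashlib
import random

def fixed_size_chunks(data: bytes, size: int) -> list[bytes]:
    """Divide data at arbitrary fixed offsets 0, size, 2*size, ..."""
    return [data[i:i + size] for i in range(0, len(data), size)]

def shared_chunks(ours: list[bytes], theirs: list[bytes]) -> int:
    """Count chunks in `theirs` whose content also occurs among `ours`."""
    known = {hashlib.sha256(c).digest() for c in ours}
    return sum(1 for c in theirs if hashlib.sha256(c).digest() in known)

if __name__ == "__main__":
    rng = random.Random(0)
    original = bytes(rng.randrange(256) for _ in range(64 * 1024))
    # Insert one byte near the front; every later fixed boundary shifts by one.
    edited = original[:100] + b"X" + original[100:]
    a = fixed_size_chunks(original, 4096)
    b = fixed_size_chunks(edited, 4096)
    print(f"{shared_chunks(a, b)} of {len(b)} chunks in the edited copy match the original")
```

On random data essentially no chunks survive the edit, because every boundary after the insertion point moves by one byte even though almost all of the content is unchanged.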
SUMMARY OF THE INVENTION
Disclosed herein is a system and method for unorchestrated determination of data sequences using “sticky byte” factoring to determine breakpoints in digital sequences such that common sequences can be identified. Sticky byte factoring provides an efficient method of dividing a data set into pieces that generally yields near optimal commonality. As disclosed herein, this may be effectuated by employing a hash function with periodic reset of the hash value or, in a preferred embodiment, a rolling hashsum. Further, in the particular exemplary embodiment disclosed herein, a threshold function is utilized to deterministically set divisions in a digital or numeric sequence, such as a sequence of data. Both the rolling hash and the threshold function are designed to require minimal computation. This low overhead makes it possible to rapidly partition a data sequence for presentation to a factoring engine or other applications that prefer subsequent synchronization across the entire data set.
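The rolling-hash-plus-threshold approach can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the polynomial (Rabin-Karp style) rolling hash, the 48-byte window, the base and modulus, and the threshold test `hash % divisor == divisor - 1` are choices made here for illustration, and `sticky_breakpoints` and `sticky_chunks` are hypothetical names. The periodic-reset hash variant mentioned above is not shown.

```python
import random

# Illustrative parameters (assumptions, not values from the disclosure).
WINDOW = 48            # number of trailing bytes the rolling hash covers
BASE = 257             # polynomial base
MOD = (1 << 61) - 1    # Mersenne prime modulus

def sticky_breakpoints(data: bytes, divisor: int = 1024) -> list[int]:
    """Return chunk end offsets. A break falls after byte i when the rolling
    hash of the WINDOW bytes ending at i passes the threshold test."""
    top = pow(BASE, WINDOW - 1, MOD)  # weight of the oldest byte in the window
    h = 0
    breaks: list[int] = []
    for i, byte in enumerate(data):
        if i >= WINDOW:
            # Drop the byte that slides out of the window.
            h = (h - data[i - WINDOW] * top) % MOD
        # Shift and add the incoming byte.
        h = (h * BASE + byte) % MOD
        # Threshold function: depends only on local content, so it is deterministic.
        if i + 1 >= WINDOW and h % divisor == divisor - 1:
            breaks.append(i + 1)
    if data and (not breaks or breaks[-1] != len(data)):
        breaks.append(len(data))  # the tail of the sequence always closes a chunk
    return breaks

def sticky_chunks(data: bytes, divisor: int = 1024) -> list[bytes]:
    """Split data at its sticky-byte breakpoints."""
    chunks, start = [], 0
    for end in sticky_breakpoints(data, divisor):
        chunks.append(data[start:end])
        start = end
    return chunks

if __name__ == "__main__":
    rng = random.Random(0)
    data = bytes(rng.randrange(256) for _ in range(256 * 1024))
    chunks = sticky_chunks(data)
    print(f"{len(chunks)} chunks, mean length {len(data) / len(chunks):.0f} bytes")
```

Because each breakpoint depends only on the WINDOW bytes that precede it, the test costs a few arithmetic operations per byte and needs no state beyond the window; with random input the expected chunk length is roughly `divisor` bytes.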
Among the significant advantages of the system and method disclosed herein is that, unlike conventional factoring systems, its calculation requires neither communication nor comparisons to perform well. This is particularly true in a distributed environment: conventional systems must communicate in order to compare one sequence to another, whereas the system and method of the present invention can be performed in isolation using only the sequence then being considered.
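The no-communication property can be illustrated by simulating two machines that each chunk their own copy of the data and publish only chunk digests. The sketch below repeats an assumed rolling hash and threshold test (window, base, modulus and divisor are illustrative assumptions; `fingerprints` and `shared_count` are hypothetical helpers). Machine B's copy differs from machine A's by one inserted byte.

```python
import hashlib
import random

WINDOW, BASE, MOD = 48, 257, (1 << 61) - 1   # assumed rolling-hash parameters

def sticky_chunks(data: bytes, divisor: int = 1024) -> list[bytes]:
    """Content-defined chunking: a chunk ends where the rolling hash of the
    last WINDOW bytes meets the threshold test (hash % divisor == divisor - 1)."""
    top = pow(BASE, WINDOW - 1, MOD)
    h, start, chunks = 0, 0, []
    for i, byte in enumerate(data):
        if i >= WINDOW:
            h = (h - data[i - WINDOW] * top) % MOD
        h = (h * BASE + byte) % MOD
        if i + 1 >= WINDOW and h % divisor == divisor - 1:
            chunks.append(data[start:i + 1])
            start = i + 1
    if start < len(data):
        chunks.append(data[start:])
    return chunks

def fingerprints(data: bytes, divisor: int = 1024) -> list[str]:
    """What each machine would publish: one digest per chunk, computed locally."""
    return [hashlib.sha256(c).hexdigest() for c in sticky_chunks(data, divisor)]

def shared_count(ours: list[str], theirs: list[str]) -> int:
    """Count chunks in `theirs` that already exist in `ours`."""
    known = set(ours)
    return sum(1 for f in theirs if f in known)

if __name__ == "__main__":
    rng = random.Random(0)
    original = bytes(rng.randrange(256) for _ in range(256 * 1024))
    # Machine A holds `original`; machine B holds a copy with one inserted byte.
    edited = original[:100_000] + b"X" + original[100_000:]
    a = fingerprints(original)   # computed on machine A, in isolation
    b = fingerprints(edited)     # computed on machine B, in isolation
    print(f"{shared_count(a, b)} of {len(b)} chunks on B already exist on A")
```

Only the chunk containing the insertion (and occasionally a neighbor, if the edit creates or removes a breakpoint within one window length) differs; every other chunk is found on both machines even though neither consulted the other while choosing breakpoints. Fixed-size division, by contrast, loses alignment for every boundary after the edit.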
In operation, the system and method of the present invention provide a fully automated means for dividing a sequence of numbers (e.g. bytes in a file) such that common elements may be found on multiple related and unrelated computer systems without the need for communication between the computers and without regard to the data content of the files. Broadly, what is disclosed herein is a data processing system and method that includes a fully automated means to partition a sequence of numeric elements (e.g. a sequence of bytes) so that common sequences may be found without the need for searching, comparing, communicating or coordinating with other processing elements in the operation of finding those sequences. The system and method of the present invention produce “sticky byte” points that partition numeric sequences with a distribution that yields subsequences of the type and size desired to optimize commonality between partitions.
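How the breakpoint distribution shapes chunk sizes can be sketched with two bounds that are common in content-defined chunking but are assumptions here, not taken from the disclosure: a `min_size` below which the threshold test is ignored, and a `max_size` at which a break is forced. The divisor sets the typical chunk size. The parameter values and the function name `bounded_sticky_chunks` are hypothetical.

```python
import random

WINDOW, BASE, MOD = 48, 257, (1 << 61) - 1   # assumed rolling-hash parameters

def bounded_sticky_chunks(data: bytes, divisor: int = 1024,
                          min_size: int = 256, max_size: int = 8192) -> list[bytes]:
    """Rolling-hash chunking with hypothetical size bounds: the threshold test
    is honoured only once a chunk reaches min_size, and a break is forced at
    max_size."""
    top = pow(BASE, WINDOW - 1, MOD)
    h, start, chunks = 0, 0, []
    for i, byte in enumerate(data):
        # The hash keeps rolling even while the threshold test is suppressed.
        if i >= WINDOW:
            h = (h - data[i - WINDOW] * top) % MOD
        h = (h * BASE + byte) % MOD
        length = i + 1 - start
        at_threshold = i + 1 >= WINDOW and h % divisor == divisor - 1
        if (length >= min_size and at_threshold) or length >= max_size:
            chunks.append(data[start:i + 1])
            start = i + 1
    if start < len(data):
        chunks.append(data[start:])
    return chunks

if __name__ == "__main__":
    rng = random.Random(0)
    data = bytes(rng.randrange(256) for _ in range(128 * 1024))
    for d in (256, 1024, 4096):
        chunks = bounded_sticky_chunks(data, d, min_size=d // 4, max_size=d * 8)
        print(f"divisor={d:5d}: {len(chunks)} chunks, mean {len(data) / len(chunks):.0f} bytes")
```

The bounds matter mostly for degenerate input. A run of zero bytes keeps this hash at zero, so the threshold test never fires and only `max_size` ends the chunk; conversely, a threshold that fired on nearly every byte would be held in check by `min_size`. The trade-off is that breaks forced at `max_size` are position-dependent, so within such runs they lose the shift resistance of content-defined breaks.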


