Methods and systems for data management using multiple...

Data processing: database and file management or data structures – Data integrity – Fragmentation – compaction and compression

Reexamination Certificate

Details

C707S698000, C707S821000

active

07844581

ABSTRACT:
Systems and methods for data management and data processing are provided. Embodiments may include systems and methods for fast data selection with reasonably high-quality results, using a faster data selection function together with a slower data selection function. Various embodiments may include systems and methods relating to data hashing and/or data redundancy identification and elimination for a data set or a string of data. In some embodiments, a first selection function is used to pre-select boundary points or data blocks/windows from a data set or data stream, and a second selection function is used to refine the boundary points or data blocks/windows. The second selection function may be better at determining the best places for boundary points or data blocks/windows in the data set or data stream. In various embodiments, data may be processed by a faster first hash function and a slower, more discriminating second hash function.
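
The two-stage selection described in the abstract can be illustrated with a minimal sketch. It is not the patented method: the abstract names no specific hash functions, so the additive rolling checksum (fast stage), the truncated SHA-1 (slow stage), the window size, the bit masks, and the function names find_boundaries, slow_hash, and split_chunks are all assumptions chosen for illustration.

```python
import hashlib


def slow_hash(window: bytes) -> int:
    """Slower, more discriminating hash (truncated SHA-1; an assumption)."""
    return int.from_bytes(hashlib.sha1(window).digest()[:4], "big")


def find_boundaries(data: bytes, window: int = 16,
                    fast_mask: int = 0x0F, slow_mask: int = 0x03) -> list:
    """Return end offsets of windows accepted by both selection stages.

    Stage 1 (fast): a rolling additive checksum pre-selects candidates.
    Stage 2 (slow): only pre-selected windows are hashed with slow_hash,
    which refines the candidate set into the final boundary points.
    """
    boundaries = []
    if len(data) < window:
        return boundaries
    rolling = sum(data[:window])  # checksum of the first window
    for end in range(window, len(data) + 1):
        if end > window:
            # Slide the window one byte: add the new byte, drop the oldest.
            rolling += data[end - 1] - data[end - 1 - window]
        if rolling & fast_mask == 0:  # cheap pre-selection
            if slow_hash(data[end - window:end]) & slow_mask == 0:
                boundaries.append(end)  # confirmed by the slower hash
    return boundaries


def split_chunks(data: bytes, boundaries: list) -> list:
    """Cut data at the given boundary offsets."""
    chunks, start = [], 0
    for b in boundaries:
        chunks.append(data[start:b])
        start = b
    if start < len(data):
        chunks.append(data[start:])
    return chunks


if __name__ == "__main__":
    sample = bytes((i * i * 31 + i * 7) % 251 for i in range(5000))
    cuts = find_boundaries(sample)
    print(len(cuts), "boundaries;", len(split_chunks(sample, cuts)), "chunks")
```

Because each decision depends only on the bytes inside the current window, inserting data at the front of a stream shifts later boundaries without changing where they fall relative to the content. The slow hash runs only on windows the fast checksum has already accepted, which is the cost saving the abstract attributes to pairing a faster function with a slower one.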

REFERENCES:
patent: 6263336 (2001-07-01), Tanaka
patent: 6658423 (2003-12-01), Pugh et al.
patent: 6810398 (2004-10-01), Moulton
patent: 7103602 (2006-09-01), Black et al.
patent: 7797323 (2010-09-01), Eshghi et al.
patent: 2005/0060643 (2005-03-01), Glass et al.
patent: 2005/0080823 (2005-04-01), Collins
patent: 2005/0091234 (2005-04-01), Hsu et al.
patent: 2005/0131939 (2005-06-01), Douglis et al.
patent: 2006/0047855 (2006-03-01), Gurevich et al.
patent: 2006/0059171 (2006-03-01), Borthakur et al.
patent: 2006/0112148 (2006-05-01), Jennings, III et al.
patent: 2006/0112264 (2006-05-01), Agarwal
patent: 2006/0235895 (2006-10-01), Rodriguez et al.
patent: 2007/0124415 (2007-05-01), Lev-Ran et al.
patent: 2007/0239947 (2007-10-01), Li et al.
patent: 2008/0005141 (2008-01-01), Zheng et al.
patent: 2008/0013830 (2008-01-01), Patterson et al.
patent: 2008/0025298 (2008-01-01), Lev-Ran et al.
patent: 2008/0034021 (2008-02-01), De Spiegeleer
patent: 2009/0228685 (2009-09-01), Wei et al.
Denehy et al., “IBM Research Report: Duplicate Management for Reference Data”, IBM, 2003, 15 pages, accessed online at <http://domino.research.ibm.com/library/cyberdig.nsf/papers/9ADD5F942230D74585256E3500578D88/$File/rj10305.pdf> on Apr. 30, 2010. (Provided by Applicant).
Bobbarjung et al., “Improving Duplicate Elimination in Storage Systems”, ACM Transactions on Storage, vol. V, No. N, Jul. 2006, 23 pages, accessed online at <http://www.cs.purdue.edu/homes/suresh/papers/acm-storage.pdf> on Apr. 30, 2010.
S. Annapureddy, M. Freedman, and D. Mazières, “Shark: Scaling File Servers via Cooperative Caching” in NSDI '05 Paper [NSDI '05 Technical Program], (2005), pp. 129-142, http://www.usenix.org/events/nsdi05/tech/full_papers/annapureddy/annapureddy.pdf.
J. Barreto and P. Ferreira, “A Replicated File System for Resource Constrained Mobile Devices” in Proceedings of IADIS International Conference on Applied Computing, (2004), pp. 1-9.
A. Broder, “Some Application of Rabin's fingerprinting method” in R. Capocelli, A. De Santis and U. Vaccaro (eds), Sequences II: Methods in Communications, Security, and Computer Science, (1993), pp. 1-10 (pp. 143-152 in book).
A. Chowdhury, O. Frieder, D. Grossman, and M. C. McCabe, “Collection Statistics for Fast Duplicate Document Detection” in ACM Trans. Inf. Syst. 20, (2002), ISSN 1046-8188, pp. 171-191 http://www.ir.iit.edu/publications/downloads/p171-chowdhury.pdf.
T. Denehy and W. Hsu, “Duplicate Management for Reference Data”, Technical report RJ 10305, IBM Research (2003), pp. 1-14 http://domino.watson.ibm.com/library/cyberdig.nsf/papers/9ADD5F942230D74585256E3500578D88/%24File/rj10305.pdf.
F. Douglis and A. Iyengar, “Application-Specific Delta-encoding via resemblance Detection”, in Proceedings of the USENIX Annual Technical Conference (2003), pp. 1-23.
K. Eshghi and H. K. Tang, “A Framework for Analyzing and Improving Content-Based Chunking Algorithms”, Technical report HPL-2005-30R1, HP Laboratories (2005), pp. 1-10. http://www.hpl.hp.com/techreports/2005/HPL-2005-30R1.html.
G. Forman, K. Eshghi, and S. Chiocchetti, “Finding Similar Files in Large Document Repositories” in KDD '05: Proceeding of the eleventh ACM SIGKDD international conference on Knowledge discovery in data mining, ACM Press, New York, NY, USA, (2005), pp. 394-400.
V. Henson and R. Henderson, “Guidelines for Using Compare-by-hash”, (2005), pp. 1-14.
N. Jain, M. Dahlin, and R. Tewari, “Taper: Tiered Approach for Eliminating Redundancy in Replica Synchronization”, Tech. Rep., Technical Report TR-05-42, Dept. of Comp. Sc., Univ. of Texas at Austin (2005), pp. 1-14, http://www.cs.utexas.edu/department/research/pubs.html.
R. Jain, “A Comparison of Hashing Schemes for Address Lookup in Computer Networks”, IEEE Transactions on Communications 40, 1570 (1992), pp. 1-5, http://citeseer.ist.psu.edu/jain92combarison.html.
P. Koopman, “32-Bit Cyclic Redundancy Codes for Internet Applications”, (2002), pp. 1-10.
P. Kulkarni, F. Douglis, J. LaVoie, and J. Tracey, “Redundancy Elimination Within Large Collections of Files” in Proceedings of the USENIX Annual Technical Conference (2004), pp. 1-14 http://scholar.google.com/url?sa=U&q=http://www.usenix.org/event/usenix04/tech/general/full_papers/kulkarni/kulkarni_html/.
P. L'Ecuyer, Tables of Linear Congruential Generators of Different Sizes and Good Lattice Structure, in Math. Comput. 68, 249 (1999), ISSN 0025-5718, pp. 249-260 http://portal.acm.org/citation.cfm?id=307102&dl=acm&coll=&CFID=15151515&CFTOKEN=6184618#.
H. Lufei, W. Shi, and L. Zamorano, “On the Effects of Bandwidth Reduction Techniques in Distributed Applications”, Proceedings of International Conference on Embedded and Ubiquitous Computing (EUC'04) (2004), pp. 1-10.
A. Muthitacharoen, B. Chen and D. Mazieres, “A Low-bandwidth Network File System”, (2001), pp. 174-187.
C. Policroniades and I. Pratt, “Alternatives for Detecting Redundancy in Storage Systems Data”, in USENIX '04: Proceedings of the USENIX Annual Technical Conference (2004), pp. 1-14 http://www.usenix.org/events/usenix04/tech/general/full_papers/policroniades/policroniades.html/.
D. R. K. Ports, A. T. Clements, and E. D. Demaine, “PersiFS: A Versioned File System with an Efficient Representation”, in SOSP '05: Proceedings of the twentieth ACM symposium on Operating systems principles, ACM Press, New York, NY, USA, (2005), pp. 1-2.
M. Rabin, “Fingerprinting by Random Polynomials” Technical report TR-15-81, Harvard University (2003), pp. 1-12.
S. Schleimer, D. S. Wilkerson, and A. Aiken, “Winnowing: Local Algorithms for Document Fingerprinting”, in SIGMOD '03: Proceedings of the 2003 ACM SIGMOD international conference on Management of data, ACM Press, New York, NY, USA, (2003), ISBN1-58113-634-X, pp. 1-10 http://citeseer.ist.psu.edu/schleimer03winnowing.html.
A. Spiridonov, S. Thaker, and S. Patwardhan, “Sharing and Bandwidth Consumption in the Low Bandwidth File System” in Tech. Rep., Department of Computer Science, University of Texas at Austin (2005), pp. 1-20 http://www.cs.utexas.edu/users/sahilt/research/LBFS.pdf.
J. Stone and M. Greenwald, “Performance of Checksums and CRCs over Real Data”, (1998), pp. 1-19.
J. Stone, R. Stewart and D. Otis, “Stream Control Transmission Protocol (SCTP) Checksum Change”, RFC 3309, The Internet Society Sep. 2002, pp. 1-17 http://www.faqs.org/ftp/rfc/pdf/rfc3309.txt.pdf.
T. Suel, P. Noel, and D. Trendafilov, “Improved File Synchronization Techniques for Maintaining Large Replicated Collections over Slow Networks” in ICDE '04: Proceedings of the 20th International Conference on Data Engineering, IEEE Computer Society, Washington, DC, USA, (2004), pp. 1-12.
A. Tridgell, “Efficient Algorithms for Sorting and Synchronization”, Ph.


Profile ID: LFUS-PAI-O-4176362
