Data processing: database and file management or data structures – Database design – Data structure types
Patent
1996-06-28
2000-09-12
Au, Amelia
Data processing: database and file management or data structures
Database design
Data structure types
707 6, 707 7, 707 3, G06F 1730
Patent
active
061191205
ABSTRACT:
A method for constructing a data structure for a data string of characters includes producing a matrix of sorted rotations of the data string. This matrix defines an A array which is a sorted list of the characters in the data string, a B array which is a permutation of the data string, and a correspondence array C which contains correspondence entries linking the characters in the A array to the same characters in the B array. A reduced A' array is computed to identify each unique character in the A array and a reduced C' array is computed to contain every s.sup.th entry of the C array. The B array is segmented into blocks of size s. During a search, the A' and C' arrays are used to index the B array to reconstruct any desired row from the matrix of rotations. Through this representation, the matrix of rotations can thus be used as a conventional sorted list for pattern matching or information retrieval applications. A data structure containing only the A', B, and C' has very little memory overhead. The B array contains the same number of characters as the original data string, and can be compressed in a block wise manner to reduce its size. The A' array is a fixed size equal to the size of the alphabet used to construct the data string, and the C' array is variable size according to the relationship n/s, where n is the number of characters in the data string and s is the size of the blocks of the B array. Accordingly, the data structure enables a tradeoff between access speed and memory overhead, the product of which is constant with respect to block size s.
REFERENCES:
patent: 5459739 (1995-10-01), Handley et al.
"Dynamic Programming Alignment of Sequences Representing Cyclic Patterns", by Jens Gregor and Michael G. Thomason, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 15, No. 2, pp. 129-135, Feb. 1993.
"Searching Genetic Databases on Splash 2", by Dzung T. Hoang, Proceedings IEEE Workshop on FPGAs for Custom Computing Machines (Cat. No. 93TH0535-5), pp. 185-191, Apr. 5, 1993.
"Rapid-2, An Objecti-Oriented Association Memory Applicable to Genome Data Processing", by Denis Archambaud, Pascal Faudemay, and Alain Greiner Proceedings of the Twenty-Seventh Annual Hawaii International Conference on System Sciences, pp. 150-159, Jan. 1994.
"A Faster Algorithm Computing String Edit Distances", William J. Masek and Michael S. Paterson, Journal of Computer and System Sciences, 20, pp. 18-31, Aug. 6, 1979.
"Synthesis and Recognition of Sequences", by S.C. Chan and A.K.C. Wong, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 13, No. 12, pp. 1245-1255, Dec. 1991.
"Efficient Systolic String Matching", by G.M. Megson, Electronic Letters, vol. 26, No. 24, pp. 2040-2042, Nov. 1990.
Au Amelia
Frederick, II Gilberto
Microsoft Corporation
LandOfFree
Computer implemented methods for constructing a compressed data does not yet have a rating. At this time, there are no reviews or comments for this patent.
If you have personal experience with Computer implemented methods for constructing a compressed data , we encourage you to share that experience with our LandOfFree.com community. Your opinion is very important and Computer implemented methods for constructing a compressed data will most certainly appreciate the feedback.
Profile ID: LFUS-PAI-O-105167