Data processing: database and file management or data structures – Database design – Data structure types
Reexamination Certificate
1999-07-06
2002-06-18
Vu, Kim (Department: 2172)
Data processing: database and file management or data structures
Database design
Data structure types
C707S793000, C707S793000, C707S793000, C711S112000, C711S161000, C709S200000
Reexamination Certificate
active
06408314
ABSTRACT:
BACKGROUND OF THE INVENTION
1. Field of the Invention
This invention generally relates to the field of data processing and more particularly to high-performance sorting of data in computer systems.
2. Description of the Related Art
Prior art methods for sorting data sets which may be too large to fit within the memory of a computer generally involve a process of sort-merging, wherein data is sorted in units as large as available memory can handle (the “internal” sort), output as “strings”, and then the strings are “merged,” by successive operations if necessary, until the result is a single fully sorted string.
There is a vast body of technology which has developed in connection with sorting and merging. Much of this technology concerns improved methods for internally sorting, improved methods for merging, and improved methods of communicating with disk drives and other I/O devices during the sort merge process. Early work in this area in the context of disk-based sort-merge operations is reflected in the commonly assigned patent, U.S. Pat. No. 4,210,961.
Making the usual assumption that one will not be writing output over the input file, it follows that at a minimum, the sort-merge process involves copying the input to the output. Where the input resides in a disk file, part of this process is simply reading the entire input file from disk. However, relatively little of the prior research activity in this area has been directed to the manner of reading the input file.
Conventionally, the input file in a sort-merge process is accessed in accordance in the normal manner provided by the operating system, in which data is read from the disk in the logical order of file contents. The actual physical blocks of data on the disk corresponding to each file are not, however, generally stored in a contiguous or linear order. In practice, there is considerable physical discontinuity of recorded data blocks, both within individual files, and from file to file in a disk file system. Indeed, even if linearly recorded at the outset, the data blocks of files in a production computer system may become highly fragmented as blocks are read, revised and written over the course of normal usage. Even a newly created file may be fragmented if its data is larger than the next free spaces made available by the operating system. In normal operation, the operating system takes care of this, maintaining a directory which keeps track of the correspondence between the blocks of data that comprise a file, and the physical location of each block on the storage media. Yet in most operating systems the physical order of blocks is generally allowed to become discontinuous and fragmented.
The result of this disorder and fragmentation of raw disk data is that the process of reading files using normal operating system calls (or any other disk access methods that operate similarly) generally results in significant disk read head repositioning during the read operation. Since this mechanical movement can be the slowest operation on the computer, sometimes by orders of magnitude, reading a disk in this manner can be highly inefficient, and the delays involved can be significant, even compared to the time required to completely sort random file contents. A sort-merge job that is constrained to read the disk in this manner will thus necessarily suffer from this significant inefficiency. Considerable improvement in sort-merge operations can be obtained if this inefficiency can be overcome.
BRIEF SUMMARY OF THE INVENTION
Accordingly, it is an object of the present invention to provide an improved sort-merge method which avoids the inefficiencies of normal disk access during input. Among the objects of the present invention, therefore, are the following:
To reduce disk read time by reducing the amount of head repositionings necessary to read the sort input;
To achieve such reduction by performing sequential rather than random reads of the input file, to the extent feasible; and
Despite having read the file in a physical sequential order, being able to keep track of the logical sequence of blocks as well, so that aspects of the original record order can be maintained in the sorted output if so specified by the user.
The foregoing and other objects of the invention are accomplished by taking into consideration the fact that records can be read in any order if they are subsequently to be sorted. Thus, input from disk can be processed by reading the working directory maintained by the operating system to determine all of the blocks associated with the input data to be sorted. The data block identities so determined are sorted in accordance with their physical location on the disk, thereby providing a sequential order for reading. The input data is read in this sequential order, and then, using largely conventional methods, sorted into one or more strings and merged as necessary to form the fully sorted output. Since the original record order in the file is known from the working directory that has been read, that order can be utilized if and as necessary, for example to preserve the original order of records with equal keys.
The manner in which the invention achieves these objects is more particularly shown by the drawings enumerated below, and by the detailed description that follows.
REFERENCES:
patent: 4210961 (1980-07-01), Whitlow et al.
patent: 4686620 (1987-08-01), Ng
patent: 4760526 (1988-07-01), Takeda et al.
patent: 5133065 (1992-07-01), Cheffetz et al.
patent: 5150473 (1992-09-01), Zulch
patent: 5163148 (1992-11-01), Walls
patent: 5210866 (1993-05-01), Milligan et al.
patent: 5239647 (1993-08-01), Anglin et al.
patent: 5263154 (1993-11-01), Eastridge et al.
patent: 5276860 (1994-01-01), Fortier et al.
patent: 5276865 (1994-01-01), Thorpe
patent: 5321832 (1994-06-01), Tanaka et al.
patent: 5347653 (1994-09-01), Flynn et al.
patent: 5357607 (1994-10-01), Sathi et al.
patent: 5367669 (1994-11-01), Holland et al.
patent: 5367698 (1994-11-01), Webber et al.
patent: 5404508 (1995-04-01), Konrad et al.
patent: 5408654 (1995-04-01), Barry
patent: 5410694 (1995-04-01), Uchida et al.
patent: 5416840 (1995-05-01), Cane et al.
patent: 5448718 (1995-09-01), Cohn et al.
patent: 5475834 (1995-12-01), Anglin et al.
patent: 5479654 (1995-12-01), Squibb
patent: 5485608 (1996-01-01), Lomet et al.
patent: 5497483 (1996-03-01), Beardsley et al.
patent: 5513314 (1996-04-01), Kandasamy et al.
patent: 5519860 (1996-05-01), Liu et al.
patent: 5546536 (1996-08-01), Davis et al.
patent: 5557770 (1996-09-01), Bhide et al.
patent: 5559991 (1996-09-01), Kanfi
patent: 5574906 (1996-11-01), Morris
patent: 5586322 (1996-12-01), Beck et al.
patent: 5592665 (1997-01-01), Lahaije
patent: 5604862 (1997-02-01), Midgely et al.
patent: 5617566 (1997-04-01), Malcolm
patent: 5649196 (1997-07-01), Woodhill et al.
patent: 5659743 (1997-08-01), Adams et al.
patent: 5684991 (1997-11-01), Malcolm
patent: 5715452 (1998-02-01), Mori et al.
patent: 5720026 (1998-02-01), Uemura et al.
patent: 5737763 (1998-04-01), Hildtich
patent: 5761667 (1998-06-01), Koeppen
patent: 5765173 (1998-06-01), Cane et al.
patent: 5802344 (1998-09-01), Menon et al.
patent: 5819802 (1998-10-01), Marion
patent: 5829045 (1998-10-01), Motoyama
patent: 5842222 (1998-11-01), Lin et al.
What Is, “NTFS (NT file system),” http://www.whatis.com/htfs.htm, web site printout dated Apr. 15, 1999.
What Is, “cluster,” http://www.whatis.com/cluster.htm, website printout dated Apr. 15, 1999.
Via-Centrale Réseaux, “NTFS Documentation,” http//www.via.ecp.fr/~regis
tfs
ew, web site printout dated Apr. 14, 1999.
Via-Centrale Résaux, “NTFS Documentation: FILE record,” http://www.via.ecp.fr/~regis
fts
ew/FILE.html™inode, web site printout dated Apr. 15, 1999.
Via-Centrale Réseaux, “NTFS Documentation: Glossary,” http://www.via.ecp.fr/~regis
tfs
ew/glossary.html#File, web site printout dated Apr. 15, 1999.
Via-Centrale Résaux, “NTFS Documentation: $MFT,” http://www.via.ecp.fr/~regis
tfs
ew/MFT.html, web site printout dated Apr. 15, 1999.
Mark Russinovich, “Inside NTFS: NT's native file system
Abramson Ronald
Hughes Hubbard & Reed LLP
Hwang Joon Hwan
Sandridge Sheryl L.
Synscort Incorporated
LandOfFree
Method of performing a high-performance sort which gains... does not yet have a rating. At this time, there are no reviews or comments for this patent.
If you have personal experience with Method of performing a high-performance sort which gains..., we encourage you to share that experience with our LandOfFree.com community. Your opinion is very important and Method of performing a high-performance sort which gains... will most certainly appreciate the feedback.
Profile ID: LFUS-PAI-O-2906819