Data processing: database and file management or data structures – Database design – Data structure types
Reexamination Certificate
1999-12-15
2003-02-11
Mizrahi, Diane D. (Department: 2175)
Data processing: database and file management or data structures
Database design
Data structure types
Reexamination Certificate
active
06519593
ABSTRACT:
FIELD OF THE INVENTION
The present invention related generally to the efficient bundle sorting.
BACKGROUND OF THE INVENTION
External memory sorting is an extensively researched area. Many efficient in-memory sorting algorithms have been adapted for sorting in external memory such as merge sort, and much of the recent research in external memory sorting has been dedicated to improving the run time performance. Over the years, numerous authors have reported the performance of their sorting algorithms and implementations (cf [Aga96, BBW86, BGK90]). We note a recent paper [ADADC
+
97] which shows external sorting of 6 GB of data in under one minute on a network of workstations. For the problem of bundle sorting where k<N/B we note that our algorithm will reduce the number of I/Os that all these algorithms perform and can hence be utilized in benchmarks. We also consider a more performance-sensitive model of external memory in which rather than just counting the I/Os for determining the performance, there is a reduced cost for sequential I/Os compared to random access I/Os. We study the tradeoffs there, and show the adaptation in our bundle sorting algorithm to arrive at an optina algorithm in that model. We also note that another recent paper [ZL98] shows in detail how to improve the merge phase of the external merge sort algorithm, a phase that is completely avoided by using our in-place algorithm.
In the general framework of external memory algorithms, Aggarwal and Vitter showed a lower bound of &OHgr;((N/B)log
M/B
k(N/B)) on the number of I/Os needed in the worst case for sorting [AV88, Vit99]. In contrast, since our algorithm relies on the number k of distinct keys for its performance, we are able to circumvent this lower bound when k<<N/B. Moreover, we prove a matching lower bound for bundle sorting which shows that our algorithm is optimal.
Finally, sorting is used not only to produce sorted output, but also in many sort-based algorithms such as grouping with aggregation, duplicate removal, sort-merge join, as well as set operations including union, intersect, and except [Gra93, IBM95]. In many of these cases the number of distinct keys is relatively small and hence bundle sorting can be used for improved performance. We identify important applications for bundle sorting, but note that since sorting is such a common procedure, there are probably many more applications for bundle sorting that we did not consider.
SUMMARY OF THE INVENTION
Many data sets to be sorted consist of a limited number of distinct keys. Sorting such data sets can be thought of as bundling together identical keys and having the bundles placed in order; we therefore denote this as bundle sorting. We describe an efficient algorithm for bundle sorting in external memory that requires at most c(N/B)log
M/B
k disk accesses, where N is the number of keys, M is the size of internal memory, k is the number of distinct keys, B is the transfer block size, and 2<c<4. For moderately sized K this bound circumvents the &THgr;((N/B)log
M/B
(N/B)) I/O lower bound known for general sorting. We show that our algorithm is optimal by proving a matching lower bound for bundle sorting. The improved running time of bundle sorting over general sorting can be significant in practice, as demonstrated by experimentation. An important feature of the new algorithm is that it is executed “in-place”, requiring no additional disk space.
The present invention discloses a method of sorting data sets including a predetermined number of distinct keys. The method is comprised of, for example, two steps. The first step is comprised of bundling the data sets where substantially identical keys, having substantially identical key values, are bundled together. The second step is comprised of ordering the bundles in a predetermined order, with respect to the order defined by the substantially identical key values for each bundle. The method is performed preferably using external memory.
REFERENCES:
patent: 5855016 (1998-12-01), Edem et al.
Lars Arge et al, On Sorting Strings in External Memory, Proceedings of the twenty eighth annual ACM symposium on theory of computing, May′1997 (pp. 540-548).
Matias Yossi
Segal Eran
Donner Irah H.
Hale and Dorr LLP
Mizrahi Diane D.
Mofiz Apu M
LandOfFree
Efficient bundle sorting does not yet have a rating. At this time, there are no reviews or comments for this patent.
If you have personal experience with Efficient bundle sorting, we encourage you to share that experience with our LandOfFree.com community. Your opinion is very important and Efficient bundle sorting will most certainly appreciate the feedback.
Profile ID: LFUS-PAI-O-3177790