Partitioning of sorted lists (containing duplicate entries)...

Data processing: database and file management or data structures – Database design – Data structure types

Reexamination Certificate


Details

Status: active
Patent number: 06366911

ABSTRACT:

BACKGROUND OF THE INVENTION
1. Technical Field
This invention generally relates to the use of multiple processors to merge sorted lists, and more specifically relates to the efficient partitioning of sorted lists containing duplicate entries making use of all the processors during the final sort and merge of the lists.
2. Background Art
The development of the EDVAC computer system of 1948 is often cited as the beginning of the computer era. Since that time, computer systems have evolved into extremely sophisticated devices, and computer systems may be found in many different settings. Computer systems typically include a combination of hardware (e.g., semiconductors, circuit boards, etc.) and software (e.g., computer programs). As advances in semiconductor processing and computer architecture push the performance of the computer hardware higher, more sophisticated computer software has evolved to take advantage of the higher performance of the hardware, resulting in computer systems today that are much more powerful than just a few years ago.
One way in which computers have become more powerful is in their ability to work synchronously in performing tasks. For instance, sorting large volumes of data into an order of ascending or descending values can be done faster when using multiple processors, or “multiprocessors,” and has been the subject of much study. It is a relatively simple process to initially divide the list of N elements into P nearly equal sized lists for each of P processors to individually sort. Once these lists are sorted, they must be merged together into one list. It is desirable that the time required to perform this merge be linear with respect to the number of processors (i.e., 2 processors yield twice the performance of 1 processor, 4 processors yield quadruple the performance of 1 processor). In other words, with N total elements to be merged using P processors, the time to merge should be proportional to N/P. This is where most of the prior art has run into difficulties. The time to merge using the most efficient prior art method is proportional to N/P·log₂P. To illustrate, consider what happens to merge performance as a computer system is upgraded from 2 to 4 processors. If the merge performance is linear, then the merge time goes from N/2 to N/4 for a twofold performance increase. If the merge performance is N/P·log₂P, then the merge time goes from N/2·log₂2 = N/2·1 = N/2 to N/4·log₂4 = N/4·2 = N/2 for no performance increase even though two additional processors were used.
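The arithmetic above can be checked directly. The following sketch (illustrative only, not part of the patent) compares the two scaling laws for a 2-to-4 processor upgrade:

```python
import math

def merge_time_linear(n, p):
    """Ideal linear merge time: proportional to N/P."""
    return n / p

def merge_time_prior_art(n, p):
    """Best prior-art merge time: proportional to (N/P) * log2(P)."""
    return (n / p) * math.log2(p)

n = 1_000_000
# Linear scaling: doubling the processors halves the merge time.
print(merge_time_linear(n, 2), merge_time_linear(n, 4))        # 500000.0 250000.0
# Prior art: upgrading from 2 to 4 processors gives no improvement.
print(merge_time_prior_art(n, 2), merge_time_prior_art(n, 4))  # 500000.0 500000.0
```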
One reference, Akl, S. G., and Santoro, N., “Optimal Parallel Merging and Sorting Without Memory Conflicts,” IEEE Trans. on Computers, Vol. C-36, No. 11, November 1987, pp. 1367-69, describes a system that can work on two lists at a time, finding the value of the element where the sum of the number of elements in the two lists below the value is equal to the sum of the number of elements in the two lists above the value. It is then a simple matter to partition the two lists at that value, so that one processor sorts the combined lists below the partition while the other processor sorts the combined lists above the partition. The two lists are then concatenated, or simply strung together, to form one new list sorted in the correct order. If there are more than two lists, the process is done simultaneously for each pair of lists, leaving half the number of lists. These lists are partitioned again to generate two pairs of lists as above, but, since the number of lists to be merged is now only half the original quantity, each of the two pairs of lists must be partitioned yet another time so that enough pairs of lists are available to keep all the processors busy. A pair of lists is first partitioned in half, and then each half is partitioned to yield a one-quarter and a three-quarter partition, so that each processor has a pair of lists to merge. This is clearly inefficient, since multiple merge phases are necessary to generate the final sorted list. If it were possible to partition all the lists at once into P partitions such that each processor could perform only a single merge phase, then a significant performance increase would result, and the desired sorted list would be generated by simple concatenation of the merged partitions. This has been attempted by partitioning based on sample sort keys.
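The two-list equal split described above can be sketched with a binary search over one list. This is an illustrative implementation of the general technique, not the authors' code; the function name is hypothetical:

```python
def partition_pair(a, b):
    """Split two sorted lists so that a[:i] + b[:j] holds the lower half of
    the combined elements and a[i:] + b[j:] holds the upper half.
    Illustrative sketch only; handles duplicates across the two lists."""
    half = (len(a) + len(b)) // 2
    lo, hi = max(0, half - len(b)), min(half, len(a))
    while lo <= hi:
        i = (lo + hi) // 2          # elements taken from a
        j = half - i                # elements taken from b
        a_left = a[i - 1] if i > 0 else float("-inf")
        a_right = a[i] if i < len(a) else float("inf")
        b_left = b[j - 1] if j > 0 else float("-inf")
        b_right = b[j] if j < len(b) else float("inf")
        if a_left <= b_right and b_left <= a_right:
            return i, j             # every lower element <= every upper element
        if a_left > b_right:
            hi = i - 1              # taking too many elements from a
        else:
            lo = i + 1              # taking too few elements from a
    raise ValueError("inputs must be sorted")
```

One processor can then merge a[:i] with b[:j] while another merges a[i:] with b[j:]; concatenating the two merged halves yields the fully sorted combined list.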
Another reference, Quinn, M. J., “Parallel Sorting Algorithms for Tightly Coupled Multiprocessors,” Parallel Computing, 6, 1988, pp. 349-367, chooses partitioning key values from the first list to be merged such that these keys partition the first list into P partitions. These keys are then used to partition the remaining lists as closely as possible to the sample key values. This leads to a load imbalance among processors due to the approximated partitions and a corresponding degradation from linear merge performance. Furthermore, if the data in the lists is skewed from a random distribution (a common occurrence in relational database operations), the resulting approximate partitions can differ greatly in size, exacerbating the load imbalance problem. Even with randomly distributed data, the literature indicates that no performance is gained beyond 10 processors when sorting 10,000 records. This is attributed to load imbalances from approximate partitioning.
The prior art has succeeded in efficiently partitioning any number of sorted lists when the lists do not contain duplicate entries. U.S. Pat. No. 5,179,699 to Iyer et al., Jan. 12, 1993, “Partitioning Of Sorted Lists For Multiprocessors Sort and Merge,” teaches such a method. When given a large list to sort, the list is initially divided amongst the available processors. Each processor sorts one of these lists. The sorted lists are then partitioned to create new lists such that each of the elements in the new lists has a value no smaller than any of the elements in the partitions before it and no larger than any of the elements in the partitions following it. The lists in each partition are then simply strung together to provide a sorted list of all the elements. Maximum use of processors is obtained, and hence a near-linear improvement in sort time is obtained when adding further processors.
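The property this kind of partitioning establishes can be sketched as follows. Note that this demo finds boundary values by naively sorting all elements globally, which is not the patent's search procedure, and it assumes distinct elements, as the method requires; the function name is illustrative:

```python
import bisect

def partition_boundaries(lists, p):
    """Per-list cut indices splitting the combined elements into p partitions
    of (nearly) equal size. Boundary values come from a naive global sort,
    purely for illustration; assumes all elements are distinct."""
    n = sum(len(lst) for lst in lists)
    all_vals = sorted(v for lst in lists for v in lst)
    return [[bisect.bisect_left(lst, all_vals[k * n // p]) for lst in lists]
            for k in range(1, p)]

lists = [[1, 4, 7], [2, 5, 8], [3, 6, 9]]
cuts = partition_boundaries(lists, 3)   # [[1, 1, 1], [2, 2, 2]]
# Each processor merges one slice of every list; simply concatenating the
# merged slices yields the fully sorted result.
bounds = [[0] * len(lists)] + cuts + [[len(lst) for lst in lists]]
result = []
for lo, hi in zip(bounds, bounds[1:]):
    result += sorted(x for lst, i, j in zip(lists, lo, hi) for x in lst[i:j])
print(result)   # [1, 2, 3, 4, 5, 6, 7, 8, 9]
```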
The Iyer method works fine when all of the elements in the sorted lists are unique. After the list to be sorted is divided between “x” processors and individually sorted by each one, “x−1” processors then run a partitioning process to create “x−1” partition boundaries, or “x” partitions. Each processor performs the same partition process on the sorted lists. What differs amongst the processors is the percentage of elements that the processor assigns to the upper and lower partitions.
For instance, given 4 sorted lists, 3 partition boundaries are drawn to create 4 partitions. Three processors then create partition boundaries: the first assigns ¼ of the elements to the upper boundary and the rest to the lower; the second assigns ½ of the elements to the upper boundary and the rest to the lower; and the third processor assigns ¾ of the elements to the upper boundary and the rest to the lower. In this way, the Iyer method creates 4 partitions which can then be concatenated together to produce one final sorted list.
The Iyer method works inconsistently if duplicate entries are contained in the sorted lists. Sometimes the method works, but there are cases where duplicate elements are inserted into more than one partitioned list. Essentially what happens is that the partition boundaries cross, and the elements where the partition boundaries cross are counted more than once. Crossing of boundaries occurs when the partitioning process running separately on each processor includes elements in more than one partition. For instance, the partition boundaries might be drawn such that both partitions 1 and 2 contain the 6th element in the first list.
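The ambiguity behind such crossings can be seen directly (an illustrative demo, not code from the patent): when a boundary value falls inside a run of duplicates, several cut positions are equally consistent with that value, so processors computing boundaries independently can choose different ones.

```python
import bisect

lst = [1, 2, 2, 2, 3]
# "Cut at value 2" does not determine a unique position: any index from 1
# through 4 splits the list consistently with that boundary value.
first_cut = bisect.bisect_left(lst, 2)   # index 1, before the run of 2s
last_cut = bisect.bisect_right(lst, 2)   # index 4, after the run of 2s
print(first_cut, last_cut)   # 1 4
```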
Whether the Iyer method works to sort lists given duplicate entries depends upon the location of the duplicate entries and the number of duplicates.
