Patent
1996-06-20
1998-08-18
Black, Thomas G.
G06F 1730
Patent
active
057970009
ABSTRACT:
A method of performing a parallel join operation on a pair of relations R1 and R2 in a system containing P processors organized into Q clusters of P/Q processors each. The system contains disk storage for each cluster, shared by the processors of that cluster, together with a shared intermediate memory (SIM) accessible by all processors. The relations R1 and R2 to be joined are first sorted on the join column. The underlying domain of the join column is then partitioned into P ranges of equal size. Each range is further divided into M subranges of progressively decreasing size to create MP tasks T.sub.m,p, the subranges of a given range being so sized relative to one another that the estimated completion time for task T.sub.m,p is a predetermined fraction that of task T.sub.m-1,p. Tasks T.sub.m,p with larger time estimates are assigned (and the corresponding tuples shipped) to the cluster to which processor p belongs, while tasks with smaller time estimates are assigned to the SIM, which is regarded as a universal cluster (cluster 0). The actual task-to-processor assignments are determined dynamically during the join phase in accordance with the dynamic longest processing time first (DLPT) algorithm. Each processor within a cluster picks its next task at any given decision point to be the one with the largest time estimate which is owned by that cluster or by cluster 0.
REFERENCES:
patent: 4920487 (1990-04-01), Baffes
patent: 4956774 (1990-09-01), Shibamiya et al.
patent: 5091852 (1992-02-01), Tsuchida et al.
patent: 5121494 (1992-06-01), Dias et al.
patent: 5179699 (1993-01-01), Iyer et al.
patent: 5307485 (1994-04-01), Bordonaro et al.
patent: 5335345 (1994-08-01), Frieder et al.
patent: 5345585 (1994-09-01), Iyer et al.
patent: 5361362 (1994-11-01), Benkeser et al.
patent: 5392430 (1995-02-01), Chen et al.
patent: 5437032 (1995-07-01), Wolf et al.
Blasgen et al., "Storage and Access in Relational Data Bases", IBM Systems Journal, vol. 16, No. 4, 1977, pp. 363-377.
Chen et al., "Schema Integration and Query Decomposition in a Distributed Database System Using a Knowledge Based Approach", Information Modelling and Knowledge Bases III: Foundations, Theory and Applications, 1992, pp. 567-585.
Chen et al., "Two-Step Approach to Optimize Parallel Execution of Multi-Join Queries", IBM Technical Disclosure Bulletin, vol. 34, No. 10B, Mar. 1992, pp. 79-81.
Date, An Introduction to Database Systems, vol. 1, 4th Edition, 1986, pp. 132-136, 266-268, 341-343, 348-349.
DeWitt et al., "The Gamma Database Machine Project", IEEE Transactions on Knowledge and Data Engineering, vol. 2, No. 1., Mar. 1990, pp. 44-62.
Dias et al., "Methods for Improving the Efficiency of Parallel Sort Merge Joins in the Presence of Data Skew", IBM Technical Disclosure Bulletin, vol. 33, No. 10A, Mar. 1991, pp. 166-170.
Hummel et al., "Factoring: A Method for Scheduling Parallel Loops", Communications of the ACM, vol. 35, No. 8, Aug. 1992, pp. 90-101.
Kruskal et al., "Allocating Independent Subtasks on Parallel Processors", IEEE Transactions on Software Engineering, vol. SE-11, No. 10, Oct. 1985, pp. 1001-1016.
Murphy et al., "Effective Resource Utilization for Multiprocessor Join Execution", Proceedings of the 15th International Conference on Very Large Data Bases, 1989, pp. 67-75.
Murphy et al. "Processor Scheduling for Multiprocessor Joins", Fifth International Conference on Data Engineering, 1991, pp. 140-148.
Polychronopoulos et al., "Guided Self-Scheduling: A Practical Scheduling Scheme for Parallel Supercomputers", IEEE Transactions on Computers, vol. C-36, No. 12., Dec. 1987, pp. 1425-1439.
Schneider et al., "Tradeoffs in Processing Complex Join Queries via Hashing in Multiprocessor Database Machines", Proceedings of the 16th VLDB Conference, 1990, pp. 469-480.
Swami et al., "Online Algorithms for Load Balancing the Join Operation", IBM Technical Disclosure Bulletin, vol. 34, No. 7B, Dec. 1991, pp. 278-280.
Tseng et al., "Parallel Database Processing on the KSR1 Computer", SIGMOD Record, vol. 22, No. 2, Jun. 1993, pp. 353-455.
Tzen et al., "Trapezoid Self-Scheduling: A Practical Scheduling Scheme for Parallel Compilers", IEEE Transactions on Parallel and Distributed Systems, vol. 4, No. 1., Jan. 1993, pp. 87-98.
Walton et al., "Data Skew and the Scalability of Parallel Joins", Proceedings of the Third IEEE Symposium on Parallel and Distributed Processing, 1991, pp. 44-51.
Wolf et al., "A Parallel Sort Merge Join Algorithm for Managing Data Skew", IEEE Transactions on Parallel and Distributed Systems, vol. 4, No. 1, Jan. 1993, pp. 70-86.
Wolf et al., "An Effective Algorithm for Parallelizing Hash Joins in the Presence of Data Skew", Proceedings of the Seventh International Conference on Data Engineering, 1991, pp. 200-209.
Wolf et al., "An Effective Algorithm for Parallelizing Sort Merge Joins in the Presence of Data Skew", Proceedings of the Second International Symposium on Databases in Parallel and Distributed Systems, 1990, pp. 103-115.
Wolf et al., "Using a Surrogate Median to Speed Up the Execution of a Parallel Sort Merge Join Algorithm", IBM Technical Disclosure Bulletin, vol. 33, No. 9, Feb. 1991, pp. 215-217.
Bhattacharya Partha Pratim
Chung Jen-Yao
Pirahesh Mir Hamid
Selinger Patricia G.
Viveros Marisa S.
Black Thomas G.
International Business Machines - Corporation
Kinnaman, Jr. W. A.
Von Buhr Maria N.
LandOfFree
Method of performing a parallel relational database query in a m does not yet have a rating. At this time, there are no reviews or comments for this patent.
If you have personal experience with Method of performing a parallel relational database query in a m, we encourage you to share that experience with our LandOfFree.com community. Your opinion is very important and Method of performing a parallel relational database query in a m will most certainly appreciate the feedback.
Profile ID: LFUS-PAI-O-1125028