Data processing: database and file management or data structures – Database design – Data structure types
Reexamination Certificate
1997-06-27
2001-10-16
Banankhah, Majid (Department: 2151)
Data processing: database and file management or data structures
Database design
Data structure types
C707S793000
Reexamination Certificate
active
06304866
ABSTRACT:
BACKGROUND OF THE INVENTION
1. Field of the Invention
The present invention relates to multiprocessing computer database systems. More particularly, the invention concerns a method, apparatus, and article of manufacture for performing an aggregate database processing task with multiple concurrently operating task execution units, using incremental and on-demand sub-task allocation. This is also referred to as “straw model” data processing.
2. Description of the Related Art
Most database tasks cannot be judged by their accuracy alone. Task completion time is of paramount importance to purchasers, testers, developers, and other users of computer programs. The time to complete a task or sub-task has a number of names, including throughput, response time, execution time, and task completion time. By any name, engineers and scientists are constantly seeking new ways to minimize their applications' task completion time.
Reducing task completion time is especially challenging in “multiprocessing environments”, which refer to computer systems where multiple computers, digital data processing units, threads, or other processes perform concurrent tasks. For ease of discussion, the term “program execution threads” is used to refer to these and other arrangements, where some or all of hardware device performs a specific task by executing an instruction stream. “Threads” is a known shorthand term. In multi-thread systems, one important consideration is how to apportion the work among the various concurrently operating threads. Typically, workload is apportioned by somehow logically dividing the data to be processed. For example, a block of data might be divided evenly into a number of parts equal to the number of available threads, so that each thread can independently process a separate portion of the data. This is called “static” or “a priori” partitioning.
Such apportionment of work is rarely completely balanced. One reason is that the extent of the data may be difficult to ascertain, or the data's processing time may be otherwise difficult to predict. Furthermore, even equisized data subparts may require comparatively different processing times. As a result of this lack of balance, some threads finish early, and other threads labor for longer periods. Still, the overall job is not complete until all threads have finished, including the last to finish its respective task. Thus, even though some threads complete their work early, other threads linger on, delaying completion of the ultimate task. One name for this phenomenon is called “workload skew”. For users seeking faster execution of their multi-thread application programs, workload skew is a significant problem area.
SUMMARY OF THE INVENTION
Broadly, the present invention concerns the performance of an aggregate database processing task in a multiprocessing computer database system. The aggregate task is executed with multiple concurrently operating task execution units, using incremental and on-demand sub-task allocation. A command is received to perform a machine-executed task divisible into multiple sub-tasks, i.e., an “aggregate task”. Then, multiple task execution units are independently operated on-demand to sequentially self-allocate and execute sub-tasks of the aggregate task. Each sub-task is sized according to a specific “granularity”, used to divide the aggregate task into sub-tasks. The granularity may be predetermined for all task execution units or repeatedly dynamically determined by each task execution unit at the time of self-allocation. Preferably, the granularity is not too large as to cause potentially uneven sub-task allocation, and not so small as to incur excessive overhead in allocating sub-tasks.
Operating “on-demand”, each task execution unit sequentially allocates and executes one sub-task at a time, then proceeding to allocate the next unallocated pice of work to itself according to the appropriate granularity. Thus, the multiprocessing system operates like multiple people drinking water from a common glass through individual straws—although each drinker works independently, all finish simultaneously, thus completing the aggregate task as expeditiously as possible.
In one implementation, each task execution unit operates as follows. First, the task execution unit selects an unexecuted sub-task of the aggregate task, where the sub-task is sized according to the appropriate granularity. Then, the task execution unit executes the selected sub-task. After executing the sub-task, the task execution units determines whether any available work of the aggregate task remains. As long as available work of the aggregate task remain, the selection and allocation steps are repeated.
In one embodiment, the invention may be implemented to provide a method for machine-executed completion of an aggregate task. In another embodiment, the invention may be implemented to provide an apparatus including a multiprocessing system for machine-executed completion of an aggregate task. In still another embodiment, the invention may be implemented to provide a programmed product comprising signal-bearing media tangibly embodying a program of machine-readable instructions executable by a digital data processing apparatus to perform method steps for machine-executed completion of an aggregate task.
The invention affords its users with a number of distinct advantages. Chiefly, the invention guides a multiprocessing system to perform an aggregate task as quickly as possible, with minimum workload skew among independently operating processing elements. Furthermore, the straw model is particularly well suited to subsequent database operations such as the nested loop join, merge join, hash join, group-by, index and-ing/or-ing, etc. The invention also provides a number of other advantages, which should be apparent from the following description of the invention.
REFERENCES:
patent: 5857180 (1999-01-01), Hallmark et al.
patent: 5903888 (1999-05-01), Cohen et al.
patent: 5918225 (1999-06-01), White et al.
patent: 5963948 (1999-10-01), Shilcrat
patent: 6115705 (2000-09-01), Larson
Bezenchek et al, “A data structure for representing aggregate data”, IEEE, pp. 22-31, 1996.*
Shatdal et al, “Adaptive parallel aggregation algorithms”, 1995.*
Shatdal et al, “Processing inparallel database systems”, 1994.*
Oracle, “Oracle8: The hub of Oracle warehouse: an Oracle technical white paper”, Jun. 1997.
Chow Jyh-Herng
Hornibrook John Frederick
Lindsay Bruce Gilbert
Peddle Geoffrey
Shekita Eugene Jon
Baker Maxham Jester & Meador
Banankhah Majid
International Business Machines - Corporation
LandOfFree
Aggregate job performance in a multiprocessing system by... does not yet have a rating. At this time, there are no reviews or comments for this patent.
If you have personal experience with Aggregate job performance in a multiprocessing system by..., we encourage you to share that experience with our LandOfFree.com community. Your opinion is very important and Aggregate job performance in a multiprocessing system by... will most certainly appreciate the feedback.
Profile ID: LFUS-PAI-O-2604173