Electrical computers and digital processing systems: virtual mac – Task management or control – Process scheduling
Reexamination Certificate
1999-10-07
2004-11-30
An, Meng-Al T. (Department: 2127)
C718S100000, C707S793000
Reexamination Certificate
active
06826753
ABSTRACT:
FIELD OF THE INVENTION
The present invention relates to parallel execution of a task by a set of processes and, in particular, to dividing a task into subtasks and assigning the subtasks to processes.
BACKGROUND OF THE INVENTION
To fully utilize the computing power of a multi-processing system, a larger task (a “parent task”) may be divided into smaller tasks (“work granules”) which are then distributed to processes (“slave processes”) running on one or more processing nodes. Each node in a multi-processing system may contain multiple processors and multiple concurrent processes. The process that divides parent tasks into work granules and distributes the work granules to slave processes on the various processing nodes is referred to herein as the coordinator process.
Databases that run on multi-processing systems typically fall into two categories: shared disk databases and shared nothing databases. A shared disk database expects all disks to be visible to all processing nodes on the computer system on which the database runs. Consequently, a coordinator process in a shared disk database may assign any work granule to a slave process on any node, regardless of the location of the disk that contains the data that will be accessed during the work granule. Shared disk databases may be run on both shared nothing and shared disk computer systems. To run a shared disk database on a shared nothing computer system, software support may be added to the operating system or additional hardware may be provided to allow processes to have direct access to remote disks.
A shared nothing database assumes that a process can only access data if the data is contained on a disk that belongs to the same node as the process. Consequently, a coordinator process in a shared nothing database can only assign a work granule to a slave process if the data to be processed in the work granule resides on a disk in the same node as the process. Shared nothing databases may be run on both shared disk and shared nothing multi-processing systems. To run a shared nothing database on a shared disk machine, a mechanism may be provided for logically dividing the disks so that each of the disks is assigned to a particular node.
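The difference between the two assignment policies can be sketched as follows. This is a minimal illustration under the assumptions above, not code from the patent; the function name and data structures are hypothetical.

```python
# Minimal sketch (hypothetical, not from the patent) of which slave
# processes may be assigned a work granule under the two architectures.

def eligible_slaves(data_node, slaves_by_node, shared_nothing=True):
    """Return the slave processes that may execute a work granule whose
    data resides on a disk belonging to `data_node`.

    Shared nothing: only slaves on the node that owns the data qualify.
    Shared disk: any slave on any node may be chosen.
    """
    if shared_nothing:
        return list(slaves_by_node.get(data_node, []))
    return [s for slaves in slaves_by_node.values() for s in slaves]

# Example: data local to node "A" in a two-node system.
slaves = {"A": ["A1", "A2"], "B": ["B1"]}
print(eligible_slaves("A", slaves))                        # ['A1', 'A2']
print(eligible_slaves("A", slaves, shared_nothing=False))  # ['A1', 'A2', 'B1']
```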
The power of database systems that run on multi-processing systems stems from the fact that many processors can work in parallel on the same task. This power would be wasted if all work were assigned to a process running on a single node within the system while processes on the other nodes remained idle. In general, the more evenly work is distributed, the greater the benefit derived from the parallelism provided by the system architecture; the more skewed the workload distribution, the less efficient the use of the multi-processing system. Ideally, work granules are distributed so that all slave processes working on the work granules of the same parent task complete their work granules at the same time.
There are numerous causes for skewed workload distributions. For example, in a shared disk database, a coordinator process may assign a work granule to a slave process on one node even though, due to the location of the data accessed by the work granule, it could be performed more efficiently by a slave process on another node. Specifically, in a shared nothing computer system, the amount of time required for a process within a node to access data on a disk within the same node (a “local access”) is significantly less than the amount of time required for the same process to access data on a disk within another node (a “remote access”). However, under the assumption that processes on all nodes have equal access to all disks, the coordinator process in some shared disk databases may assign a work granule that accesses data in a first node to a slave process running on a second node even if a slave process on the first node is available to perform the work granule. Because a slave process executing such a work granule may have to remotely access data, it may still be executing the granule after other slave processes have completed the work granules assigned to them.
The process of managing work granules to reduce skewed work load distributions is referred to herein as work load skew management. One approach to work load skew management is referred to as the 9-3-1 technique. Under the 9-3-1 technique, work is first divided among a set of slave processes into equal parts, one part for each slave process. Each part is then subdivided into work granules according to the ratio 9:3:1. For example, assume that an amount of work W is to be shared between three slave processes. The work is divided into three parts, one for each of the slave processes, and each part is further subdivided according to the ratio 9:3:1. Thus, each slave process is ultimately apportioned work granules of 9/13 × 1/3 W, 3/13 × 1/3 W, and 1/13 × 1/3 W.
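The arithmetic of the 9-3-1 split can be illustrated with a short sketch; the helper below is hypothetical and is not taken from the patent.

```python
# Hypothetical illustration of the 9-3-1 apportionment described above.

def nine_three_one_split(total_work, num_slaves):
    """Divide `total_work` equally among the slaves, then subdivide each
    slave's share into three work granules in the ratio 9:3:1."""
    per_slave = total_work / num_slaves
    ratios = (9, 3, 1)
    return [[r / sum(ratios) * per_slave for r in ratios]
            for _ in range(num_slaves)]

# Example: W = 39 units of work shared by three slave processes.
# Each slave's share of 13 units is split into granules of 9, 3, and 1.
print(nine_three_one_split(39, 3))
# [[9.0, 3.0, 1.0], [9.0, 3.0, 1.0], [9.0, 3.0, 1.0]]
```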
When the coordinator process assigns work to a slave process, it first assigns the largest of the work granules designated for that slave process, i.e. the 9/13 portion. When a slave process completes a work granule, the coordinator process assigns to it one of the remaining smaller work granules designated for it. If a slave process completes all of its work granules while another slave process has not completed its own, the coordinator “steals” a work granule from the busy slave process and assigns it to the slave process that has finished. Consequently, both slave processes continue to perform work, and the work load distribution skew is reduced, if not avoided.
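Assuming the division above, the assignment order and the “stealing” behavior might look like the following; the class and method names are invented for illustration and do not appear in the patent.

```python
from collections import deque

# Hypothetical coordinator sketch of largest-granule-first assignment and
# work "stealing"; illustrative only, not the patent's implementation.

class Coordinator:
    def __init__(self, granules_per_slave):
        # One queue per slave process, largest granule first, e.g. [9, 3, 1].
        self.queues = [deque(g) for g in granules_per_slave]

    def next_granule(self, slave):
        """Called when `slave` is idle or has finished its previous granule."""
        # Prefer the slave's own granules, largest remaining first.
        if self.queues[slave]:
            return self.queues[slave].popleft()
        # Otherwise "steal" a pending granule from the busiest slave,
        # so the requesting slave does not sit idle.
        busiest = max(self.queues, key=sum)
        if busiest:
            return busiest.popleft()
        return None  # every granule of the parent task has been handed out

# Example: three slaves whose shares were each split 9-3-1.
coord = Coordinator([[9, 3, 1], [9, 3, 1], [9, 3, 1]])
print(coord.next_granule(0))   # 9 -- slave 0 starts with its largest granule
```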
A disadvantage of the 9-3-1 technique is that it is usually difficult to both split a task into the dictated portions and split the task in a manner that accounts for other efficiency considerations. For example, a task AB entails access to data that is equally distributed between a location A and a location B. Data at location A is locally accessible from a node A; data at location B is locally accessible from a node B. Three slave processes are available to execute the task: processes A1 and A2 execute on node A, and a process B1 executes on node B. To distribute the task according to the 9-3-1 technique, the largest of the work granules designated for process A2 is assigned a portion of the task that entails access to data at both location A and location B. Data at location B is remotely accessed by process A2, causing A2 to execute its largest work granule much more slowly. The smaller work granules for process A2 are redistributed to processes A1 and B1. However, due to the slowness of the execution of process A2, the redistribution of work load does not significantly offset the disparity in execution times between processes A1 and B1 on the one hand, and process A2 on the other. Thus, in applying the 9-3-1 technique, not only were other efficiency considerations ignored, but, as a result of ignoring them, the work distribution skew may actually have been worsened.
Another disadvantage of the 9-3-1 technique is that, by allocating work to a work granule that is substantially larger than the other work granules, the technique puts “too many eggs in one basket”. If the largest work granule is executed slowly, work load skew is not only created but may be severe. This may occur even when work is divided in a manner that accounts for efficiency considerations, such as locality of data.
For example, a task C entails access to data, in the form of rows in a table, that reside at a location A. Data at location A is locally accessible from a node A. Three slave processes on node A are available to execute task C, and are assigned work according to the 9-3-1 technique. The largest work granules are assigned first to each of the three slave processes. The slave processes apply a filter to the rows they read; the rows that pass the filter are stored by the slave processes. For two of the slave processes, relatively few of the rows they read pass the filter, and relatively few of the read rows are stored. However, for the third slave process, most rows read pass the filter. The third sla
Amor Patrick A.
Dageville Benoit
Bingham Marcel K.
Hickman Palermo & Truong & Becker LLP
Oracle International Corporation
Vo Lilian