Title: Affinity scheduling of data within multi-processor computer systems
Patent Number: 6269390
Type: Reexamination Certificate
Status: active
Filed: 1996-12-17
Issued: 2001-07-31
Examiner: Banankhah, Majid A. (Department: 2151)
Classification: Electrical computers and digital processing systems: multicomputer data transferring – Computer-to-computer data routing – Least weight routing
Cross-Reference Classifications: C709S241000, C711S120000, C711S121000, C711S133000
ABSTRACT:
The present invention relates to multiprocessing systems and, more particularly, to a method for scheduling the execution of processes among processors within a multiprocessing system to minimize data transfers between processor cache memories and thereby improve the efficiency of the multiprocessing system.
BACKGROUND OF THE INVENTION
Historically, a major goal of multiprocessor (MP) computers has been to use the processors to collectively reduce the latency of jobs. This is particularly true of computationally intensive jobs in scientific applications, which can often be parallelized across the multiple processors very efficiently. Even in commercial MP servers, much effort has been expended to make operating systems and applications “multi-threaded” so as to parallelize work across multiple processors, decreasing the latency of the computations. However, for MP computers serving hundreds or thousands of simultaneous users, the vast majority of jobs are not computationally intensive, and both the throughput of the server and the latency of the individual jobs are best improved by giving each individual job “affinity” to a given CPU, so that all of the instructions and data associated with the individual job can remain readily accessible in a local cache memory.
One such method of scheduling processes is disclosed in U.S. Pat. No. 5,185,861 to Valencia, issued Feb. 9, 1993, and entitled “Cache Affinity Scheduler”. That patent discloses an affinity scheduler for a multiprocessor computer system. The affinity scheduler allocates processors to processes or jobs and schedules the processes to run on the basis of priority and processor availability, using an estimate of the amount of cache context remaining for a process to decide the run queue on which the process is to be enqueued. U.S. Pat. No. 5,185,861 is hereby incorporated by reference. Hereafter in this description, this method for allocating processors to processes will be referred to as a “process affinity scheduler”, indicating that it is the user process (or “job” or “thread”) that is given affinity to a processor.
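As a rough illustration of how a process affinity scheduler of this general kind might choose a run queue, the sketch below combines a cache-context estimate with an availability-based fallback. It is a minimal sketch under stated assumptions: the structure names, the NCPUS constant, and the cache-warmth heuristic are invented for illustration and are not the algorithm claimed in the Valencia patent.

/* Illustrative sketch of an affinity-based run-queue choice.
 * Assumed names throughout; not the patented Valencia algorithm. */
#include <stdio.h>

#define NCPUS 4

struct run_queue { int length; };      /* pending processes per CPU      */

struct process {
    int last_cpu;                      /* CPU that most recently ran it  */
    int cache_context;                 /* estimated warm cache remaining */
};

/* Prefer the last CPU while its cache is estimated to be warm;
 * otherwise enqueue on the least-loaded (most available) CPU. */
static int pick_run_queue(const struct process *p,
                          const struct run_queue rq[NCPUS])
{
    if (p->cache_context > 0)
        return p->last_cpu;            /* affinity wins: reuse warm cache */

    int best = 0;
    for (int cpu = 1; cpu < NCPUS; cpu++)
        if (rq[cpu].length < rq[best].length)
            best = cpu;                /* availability-based fallback     */
    return best;
}

int main(void)
{
    struct run_queue rq[NCPUS] = { {3}, {0}, {2}, {1} };
    struct process warm = { .last_cpu = 2, .cache_context = 5 };
    struct process cold = { .last_cpu = 2, .cache_context = 0 };

    printf("warm process -> CPU %d\n", pick_run_queue(&warm, rq)); /* CPU 2 */
    printf("cold process -> CPU %d\n", pick_run_queue(&cold, rq)); /* CPU 1 */
    return 0;
}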
It is believed that additional performance benefits can be gained by expanding the existing process affinity scheduler methods to encompass the idea of “data affinity scheduling”, wherein data is given affinity to a processor, and jobs or processes that manipulate that data extensively are run on that processor. To appreciate this performance opportunity, it must first be noted that, in order to maintain memory coherency between the processors and their various cache memories, only one processor within a multiprocessor system can modify a shared memory location at any given time. When more than one user job is manipulating the same data, data modified by a first job affinitized to a first processor (CPU A) must, following modification, be transferred to a second job affinitized to a second processor (CPU B). This transfer can be time consuming depending on the communication mechanism between the processors, the amount of data to be transferred, and the contention for the communication resource between the processors.
The assertion that data affinity increases performance, at least in certain applications, may seem counter-intuitive because computations on data that were previously done in parallel are now serialized. However, as the number of processors in a system increases, so does the communication and synchronization overhead between the processors. This overhead increase is independent of whether the communication mechanism is a shared bus, such as used by shared memory multiprocessor systems, or multiple interconnection paths, such as used by massively parallel processing (MPP) computers, because the contention for the interconnection path increases as the number of processors and the number of simultaneous users increase. Higher bus contention results in longer access latencies for transferring data blocks between processors. In addition, when the processes need to synchronize execution via a semaphore operation, the overhead of acquiring and releasing a lock cell is much greater when its memory cell is being “thrashed” between two processors. By running both synchronizing processes on the same processor, not only is greater performance obtained by caching the semaphore locally, but a code-path reduction is achieved since it is now impossible for both processes to be attempting to acquire the lock simultaneously. It is therefore believed that, for certain classes of jobs, there is a cross-over point where the latency of jobs will be shorter when they are run serially on a processor where they can share the common data and synchronization structures locally than when they are run in parallel on different processors where the application and synchronization data must be transferred over the interconnection path between the processors.
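To make this cross-over concrete with assumed numbers (an illustration, not data from the patent): suppose two jobs each require C = 5 ms of computation on shared data, and each ownership transfer of that data between CPU A and CPU B costs X = 0.5 ms. If parallel execution on two processors forces n = 20 such transfers, the elapsed time is roughly C + nX = 15 ms, whereas running the jobs serially on a single processor takes 2C = 10 ms with the data remaining cache-local. Serial execution wins whenever nX > C, that is, whenever the accumulated transfer overhead exceeds the computation time that parallelism would otherwise have hidden.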
OBJECTS OF THE INVENTION
It is therefore an object of the present invention to provide a new and useful method for scheduling tasks within a multiprocessor computer system to improve system performance.
It is another object of the present invention to provide such a scheduling method which assigns processes or tasks which manipulate the same data to the same processor within a multiprocessor system.
It is still a further object of the present invention to provide a new and useful affinity scheduler for a multiprocessor system, wherein data is given affinity to a processor and jobs or processes that are manipulating that data extensively are scheduled to run on that processor.
SUMMARY OF THE INVENTION
There is provided, in accordance with the present invention, a method for assigning processes to processors within a multiprocessor computer system which includes a plurality of processors and cache memories associated with each processor. The method affinitizes processes to processors so that processes which frequently modify the same data are affined to the same processor—the processor whose cache memory includes the data being modified by the processes.
In the described embodiment, the method of the present invention represents an improvement to a traditional affinity scheduling system wherein a process previously executed on a first processor within a multiprocessor computer system is affined to the first processor and will be scheduled for execution by the first processor during subsequent requests for execution of the process. The improvement comprises the steps of: (1) maintaining a count of the number of times the process requests access to the cache memory associated with the first processor; (2) maintaining a count of the number of times the process requests access to the cache memory associated with a second processor; and (3) changing the affinity of the process from the first processor to the second processor when the count of the number of times the process requests access to the cache memory associated with the second processor exceeds the count of the number of times the process requests access to the cache memory associated with the first processor.
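A minimal sketch of this counter-based re-affinity rule follows, assuming a per-process array of cache-access counts indexed by CPU. The names (struct proc, record_cache_access, NCPUS) and the event-driven structure are illustrative assumptions, since the summary describes the method only at the level of the counting and comparison steps.

/* Sketch of the re-affinity rule: switch a process's affinity when
 * another CPU's cache-access count exceeds the current CPU's count.
 * All identifiers are assumed names, not taken from the patent. */
#include <stdio.h>

#define NCPUS 4

struct proc {
    int affinity;                        /* CPU the process is affined to     */
    unsigned long cache_hits[NCPUS];     /* per-CPU cache-access request count */
};

/* Called whenever the process requests access to data cached on `cpu`. */
static void record_cache_access(struct proc *p, int cpu)
{
    p->cache_hits[cpu]++;

    /* Step (3): re-affinitize when another CPU's count exceeds the
     * count for the CPU the process is currently affined to. */
    if (cpu != p->affinity &&
        p->cache_hits[cpu] > p->cache_hits[p->affinity]) {
        p->affinity = cpu;               /* future runs scheduled on `cpu` */
    }
}

int main(void)
{
    struct proc p = { .affinity = 0 };

    record_cache_access(&p, 0);          /* data served from CPU 0's cache */
    record_cache_access(&p, 1);          /* counts tied: affinity unchanged */
    record_cache_access(&p, 1);          /* CPU 1 now exceeds CPU 0: switch */

    printf("process now affined to CPU %d\n", p.affinity); /* prints CPU 1 */
    return 0;
}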
The above and other objects, features, and advantages of the present invention will become apparent from the following description and the attached drawings.
REFERENCES:
patent: 5185861 (1993-02-01), Valencia
patent: 5261053 (1993-11-01), Valencia
patent: 5287508 (1994-02-01), Hejna, Jr. et al.
patent: 5317738 (1994-05-01), Cochcroft, Jr. et al.
patent: 5437032 (1995-07-01), Wolf et al.
patent: 5872972 (1999-02-01), Boland et al.
Overeinder et al., “A Dynamic Load Balancing System for Parallel Cluster Computing,” pp. 101-115, May 1996.*
Michael J. Litzkow, “Remote Unix: Turning Idle Workstations into Cycle Servers,” pp. 301-304, May 1996.*
Radhika Thekkath and Susan J. Eggers, “Impact of Sharing-Based Thread Placement on Multithreaded Architectures,” IEEE, 1994.*
James Philbin, Suresh Jagannathan, and Rajiv Mirani, “Virtual Topologies: A New Concurrency Abstraction for High-Level Parallel Languages.”*
James Philbin, “An Overview of the Sting Operating System,” NEC Software Conference, pp. 371-379, Dallas, Texas, Oct. 1992.*
Assignee: NCR Corporation
Attorney/Agent: Stover, James M.