Optimized function execution for a multiprocessor computer...

Electrical computers and digital processing systems: memory – Storage accessing and control – Shared memory area

Reexamination Certificate

Details

US Classification: C711S148000, C711S152000, C711S153000, C707S793000
Type: Reexamination Certificate
Status: active
Patent number: 06418517

ABSTRACT:

FIELD OF THE INVENTION
This invention relates generally to computer operating systems. More particularly, this invention relates to the scheduling by an operating system of functions for execution on specified processors or specified nodes of a multiprocessor computer system.
BACKGROUND OF THE INVENTION
Multiprocessor computers by definition contain multiple processors (also referred to as CPUs) that can execute multiple parts of a computer program or multiple distinct programs simultaneously, in a manner known as parallel computing. In general, multiprocessor computers execute multithreaded programs or multiple single-threaded programs faster than conventional single-processor computers, such as personal computers (PCs), which must execute programs sequentially. The actual performance advantage depends on a number of factors, including the degree to which parts of a multithreaded program or multiple distinct programs can be executed in parallel and the architecture of the particular multiprocessor computer at hand.
Multiprocessor computers may be classified by how they share information among the processors. Shared-memory multiprocessor computers offer a common physical memory address space that all processors can access. Multiple processes, or multiple threads within the same process, can communicate through shared variables in memory that allow them to read or write to the same memory location in the computer. Message-passing multiprocessor computers, in contrast, have a separate memory space for each processor, requiring processes in such a system to communicate through explicit messages to each other.
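For concreteness, the following minimal sketch (not from the patent) shows two threads in one process communicating through a shared variable, using POSIX threads; the names shared_value and writer are illustrative:

    /* Shared-memory communication in miniature: both threads read and
     * write the same memory location in the process's address space. */
    #include <pthread.h>
    #include <stdio.h>

    static int shared_value;    /* the shared memory location */
    static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;

    static void *writer(void *arg)
    {
        pthread_mutex_lock(&lock);
        shared_value = 42;      /* write to the shared location */
        pthread_mutex_unlock(&lock);
        return NULL;
    }

    int main(void)
    {
        pthread_t t;
        pthread_create(&t, NULL, writer, NULL);
        pthread_join(t, NULL);

        pthread_mutex_lock(&lock);
        printf("reader sees %d\n", shared_value);   /* read the same location */
        pthread_mutex_unlock(&lock);
        return 0;
    }

On a message-passing machine, the same exchange would instead require an explicit send and receive between the two separate address spaces.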
Shared-memory multiprocessor computers may further be classified by how the memory is physically organized. In distributed shared-memory computers, the memory is divided into modules physically placed near each processor. Although all of the memory modules are globally accessible, a processor can access memory placed nearby faster than memory placed remotely. Because the memory access time differs based on memory location, distributed shared memory systems are often called non-uniform memory access (NUMA) machines. By contrast, in centralized shared-memory computers, the memory is physically in one location. Centralized shared-memory computers are called uniform memory access (UMA) machines because the memory is equidistant in time from each of the processors. Both forms of memory organization typically use high-speed cache memory in conjunction with main memory to reduce execution time.
Multiprocessor computers with distributed shared memory are often organized into multiple nodes with one or more processors per node. The nodes interface with each other through a memory-interconnect network using a protocol such as the one described in the Scalable Coherent Interface (SCI) standard (IEEE 1596). UMA machines, in contrast, typically use a bus to interconnect all of the processors.
Further information on multiprocessor computer systems in general and NUMA machines in particular can be found in a number of works, including Computer Architecture: A Quantitative Approach (2nd ed. 1996) by D. Patterson and J. Hennessy, which is hereby incorporated by reference.
A single operating system typically controls the operation of a multinode multiprocessor computer with distributed shared memory. Examples of suitable operating systems include UNIX-based operating systems such as DYNIX/ptx, BSD, SVR4, UnixWare, and PC UNIX. For general information on such operating systems, see Bach, M. J., The Design of the UNIX Operating System, Prentice-Hall, 1986; Vahalia, U., Unix Internals: The New Frontiers, Prentice-Hall, 1996; and McKusick, M., et al., The Design and Implementation of the 4.4BSD Operating System, Addison-Wesley, 1996, all of which are hereby incorporated by reference.
One purpose of such an operating system is to schedule the execution of functions that perform periodic tasks. These periodic tasks include retransmitting dropped network packets, scheduling and managing memory, monitoring devices to avoid losing interrupts, polling devices that do not support interrupts, and remote but immediate execution of a function on a specified CPU or node. This scheduling is often done through a timeout mechanism, such as the UNIX timeout system call, which defers a specified function's execution for a specified time, or through a remote execution mechanism, which specifies the CPU or node on which a function will be executed immediately.
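The traditional UNIX kernel interface for the timeout mechanism has roughly the following shape; exact signatures vary between UNIX variants (SVR4, for example, returns an identifier for cancellation), so this is an illustrative sketch rather than any particular system's definition. POLL_TICKS and poll_device are hypothetical names:

    /* Sketch of the classic UNIX timeout interface. */
    void timeout(void (*fn)(void *), void *arg, long ticks);  /* run fn(arg) after `ticks` clock ticks */
    void untimeout(void (*fn)(void *), void *arg);            /* cancel a pending timeout */

    /* A periodic task typically reschedules itself each time it fires,
     * e.g. to poll a device that does not support interrupts. */
    #define POLL_TICKS 10   /* hypothetical polling interval */

    void poll_device(void *dev)
    {
        /* ... examine the device, retransmit dropped packets, etc. ... */
        timeout(poll_device, dev, POLL_TICKS);  /* fire again later */
    }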
Prior operating systems for UMA multiprocessor computers have used a number of timeout mechanisms that defer a function's execution a specified number of system clock ticks. A relevant mechanism to this invention includes a global priority queue data structure in the centralized memory, such as shown in FIG. 1. This data structure groups timeout handlers (timeout cells (tocells) A0, D1, F0, etc., that represent specified functions that are scheduled for execution) into “buckets.” All tocells in a given “future events” bucket are connected in a circular doubly-linked list and are scheduled to execute, or “fire,” on the same system clock tick, modulo the number of buckets. The array of buckets is indexed by the lower-order bits of a system variable tonow that is incremented on each clock tick. All the tocells in the bucket after the one indexed by tonow are scheduled to fire on the next clock tick, modulo the number of buckets. On that next clock tick, these tocells are added to the end of the group of tocells in the “current events” bucket, where they are processed by any processor (CPU) that is currently available. Each tocell is identified by a letter (which represents a pointer to the memory containing the cell), and the associated number indicates the number of clock ticks still remaining after the tocell is placed in the current events bucket. If the number is zero, then processing the tocell will cause it to fire. If the number is nonzero, such as two or three, the CPU processing the tocell will place it in the proper future events bucket two or three ahead of the one pointed to by the index, with the number reset to zero. Thus in FIG. 1, when tocell C2 is placed on the current events list, it will not fire but will be moved two future events buckets ahead, to the bucket that contains D1 and E0, and its count set to zero.
Access to the global priority queue is controlled by a global “gate” (also known as a “lock”), a system variable that is set by a CPU to indicate that the queue is currently being accessed. This lock ensures that only one CPU at a time can access the future events buckets through the timeout system call to add or remove tocells, thereby preventing the data corruption that would otherwise occur if multiple CPUs attempted to schedule functions simultaneously.
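In the model above, the gate would correspond to a single lock serializing every access to the bucket array. A minimal sketch, assuming the timeout() from the previous block (queue_gate is an illustrative name, and a pthreads mutex stands in for what a kernel would implement as a spinlock):

    #include <pthread.h>

    /* The global gate: every CPU must hold this one lock before
     * touching the shared bucket array. */
    static pthread_mutex_t queue_gate = PTHREAD_MUTEX_INITIALIZER;

    void timeout_locked(void (*fn)(void *), void *arg, long delay)
    {
        pthread_mutex_lock(&queue_gate);   /* one CPU in the queue at a time */
        timeout(fn, arg, delay);           /* the unlocked insert from the model */
        pthread_mutex_unlock(&queue_gate);
    }

A single gate keeps the queue logic simple, but it also funnels every scheduling operation on every CPU through one lock.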
Although this timeout mechanism has served adequately for UMA machines, it has a number of drawbacks. The use of a global lock results in poor system behavior during overload, i.e., when multiple CPUs are simultaneously attempting to schedule functions. The global lock also severely limits the scalability of the timeout mechanism: its performance degrades as the number of CPUs in the computer increases. And in certain circumstances, a sequence of interrupts can cause multiple instances of a tocell's scheduled function to execute concurrently, which can be avoided only with a much more complicated locking design.
An objective of the invention, therefore, is to provide a method and means for scheduling functions for execution on a multiprocessor computer without the delay inherent in prior approaches. Another objective of the invention is to provide such a method and means that permits functions to be scheduled for execution on a specified CPU (which could be the scheduling CPU or another CPU) or on a specified node of a multiprocessor computer. Yet another objective of the invention is to provide such a method and means that balances CPU loads by directing the execution of functions from overloaded CPUs to available CPUs.
