Electrical computers and digital processing systems: processing – Dynamic instruction dependency checking – monitoring or...
Reexamination Certificate
1995-09-26
2001-09-04
Chan, Eddie (Department: 2183)
Electrical computers and digital processing systems: processing
Dynamic instruction dependency checking, monitoring or...
C712S023000
Reexamination Certificate
active
06286095
ABSTRACT:
BACKGROUND
Early digital computers incorporated a single central processing unit (CPU) which was responsible for executing instructions in a program to perform some result. These uniprocessor systems generally have a very simple model of processor execution. That is, the CPU executes instructions in a way so that it appears to a programmer that the hardware is executing the instructions one at a time and in the order the instructions appear in the particular program being executed. This is called “program order” execution and is still in wide use today in personal computers etc.
To achieve greater speed, computer designers combine multiple CPUs in the same computer system. This allows the computer to execute faster as each CPU executes part of a program concurrently instead of a single CPU having to execute all of the program in sequence.
In order for the multiple CPU systems to work effectively, the CPUs must be able to communicate back and forth. Commonly, CPUs communicate through the use of shared memory that is accessible to each. One example of how shared memory can be used for communications is a “mailbox” scheme, which will be used here to illustrate the present invention. In multiple CPU computers, often one CPU is performing an operation where the result of the operation will be used by another CPU. One way the CPUs can communicate is by using a “mailbox” with a “flag”. That is, a location in memory is designated as the mailbox where a first CPU puts data for a second CPU to use. The second CPU reads a specified memory location to look for a particular bit pattern (the flag) which tells the second CPU that valid data is available in the mailbox for it to use. To understand the present invention, an understanding of how the mailbox process works is useful.
FIGS. 1 and 2
illustrate a mailbox communication technique between two CPUs.
FIG. 1
illustrates a section of program code processor
1
will execute
101
with two instructions detailed. The first instruction
103
causes processor
1
to store data in the mailbox location. While the second instruction
105
causes processor
1
to set the mailbox flag. A section of program code processor
2
will execute is shown
107
with four instructions detailed. The first instruction
109
causes processor
2
to load the mail flag and the second instruction
111
causes processor
2
to test the flag to see if it is set (indicating there is valid data in the mailbox). Branch instruction
113
causes processor
2
to loop back and reload the mail flag (instruction
109
) if the flag was not set (as determined by instruction
111
). If the flag was set, then processor
2
will continue past the branch instruction
113
and execute the next instruction
115
which causes processor
2
to load the data in the mailbox.
An example of the execution of the instructions shown in
FIG. 1
by processors
1
and
2
is illustrated in FIG.
2
. These are “sequentially-consistent” CPUs which means they execute all instructions in program order and no later operation is performed until all prior operations are complete. Processor
1
stores data in the mailbox (instruction.
103
) at time T
4
and then stores the mail flag (instruction
105
) at time T
5
. During the period indicated by time T
0
-T
3
, processor
1
would be executing other instructions not illustrated. In this example, processor
2
loads the mail flag (instruction
109
) at time T
0
and checks to see if the flag is set (instruction
111
) at time T
1
. The branch instruction (
113
) is executed at time T
2
and since the flag has not yet been set by processor
1
, the branch instruction causes processor
2
to branch to the load mail flag instruction
109
and reload the mail flag at time T
3
. The flag is checked again at time T
4
and since the flag is still not set yet the branch instruction causes processor
2
to branch back again to the load mail flag instruction which is executed at time T
6
. The flag is rechecked at time T
7
and since the flag was set by processor
1
, the branch instruction executed at time T
8
does not cause processor
2
to branch so processor
2
now loads the data in the mailbox (instruction
115
) at time T
9
.
In this example, processor
2
is assured of loading the proper mailbox data as processor
2
does not load from the mailbox until processor
1
has stored the mailbox data and set the flag.
While program order execution, as illustrated in
FIG. 2
, is still a useful execution method, it has limitations. Program order execution requires all instructions to be completed in order. Therefore if a preceding instruction cannot be completed, no later instruction can be performed. Unfortunately it is often that a particular instruction cannot be performed immediately. For example, an input/output (I/O) instruction may not be immediately executable because the I/O device is not ready with the required data. So while the CPU is waiting for the I/O device, a later instruction that could be executed must wait. In high performance computer systems such inefficiency is intolerable.
To address this bottleneck, computer designers have designed CPUs which can execute a later instruction before an unrelated prior instruction is competed. These CPUs do not maintain sequential consistency. While eliminating the constraints imposed by sequential consistency has performance benefits, it also creates problems in multiprocessor computers. For example, if a CPU is allowed to perform a later operation before all prior operations are complete, communications by using the mailbox and flag technique can fail.
A simple example of the multiprocessor mailbox problem is illustrated in
FIG. 3
showing the instructions detailed in
FIG. 1
being executed by CPUs which do not always execute instructions in a sequentially consistent way. Processor
1
performs two operations. The first operation
103
is to store data in the mailbox during time T
4
and the second operation is to store the mail flag
105
during time T
5
. Processor
2
loads the mail flag at time T
0
(
109
) and then checks to see if the mail flag is set
111
during time T
1
. However the mail flag will not be stored by processor
1
until T
5
and so the flag test performed by processor
2
will fail and the branch instruction
113
executed at time T
2
will cause processor
2
to loop back to the mail flag load. If the processor cannot immediately load the mail flag, for example it is not in cache but in main memory, the processor suspends execution of the mail flag load. However since processor
2
can perform the subsequent load of the mailbox data
115
it does so (the CPU “speculates” that this instruction will need to be performed later and therefore performs the load to save time). The CPU completes the reload of the mail flag at time T
4
and retests the flag at time T
5
. On this second try, processor
2
will find the flag set and therefore the branch instruction executed at time T
6
will not cause processor
2
to branch. Since processor
2
already completed the load of data from the mailbox during T
3
, the processor will not redo the load (it assumes its speculation was correct) but will continue on and execute the subsequent instruction
117
in the program. However, processor
2
will have read invalid data in the mailbox. This is an unacceptable result.
So some means must be made available to the programmer such that proper execution of a program can be assured. The programmer must be able to force the multiprocessor system to execute instruction sequentially and not perform a subsequent operation until a prior operation is complete. Prior art systems have resolved this problem by using a “barrier” instruction. This instruction prevents any operations after the barrier instruction from being performed before all operations preceding the barrier instruction are complete. It should be noted that although a load or store operation is begun in program order, common design practices can cause them to complete out of order. That is, a later started operation can complete before
Burger Stephen G.
Flahive Barry J.
Huck Jerome C.
Kurtze Jeff
Lee Ruby B. L.
Boyle Howard H.
Chan Eddie
Hewlett--Packard Company
Patel Gautam R.
Plettner David A.
LandOfFree
Computer apparatus having special instructions to force... does not yet have a rating. At this time, there are no reviews or comments for this patent.
If you have personal experience with Computer apparatus having special instructions to force..., we encourage you to share that experience with our LandOfFree.com community. Your opinion is very important and Computer apparatus having special instructions to force... will most certainly appreciate the feedback.
Profile ID: LFUS-PAI-O-2441717