Forwarding stored data fetched for out-of-order load/read...
Reexamination Certificate
1999-02-03
2001-02-13
Kim, Kenneth S. (Department: 2783)
Electrical computers and digital processing systems: processing
Dynamic instruction dependency checking, monitoring or...
C712S023000, C712S225000
active
06189088
ABSTRACT:
BACKGROUND
1. Technical Field
The present invention generally relates to computer processing systems and, in particular, to a method and apparatus for reordering load operations in a computer program. The invention is applicable to operations reordered when the program is generated (static reordering) as well as to operations reordered at execution time (dynamic reordering).
2. Background Description
Contemporary high-performance processors rely on superscalar, superpipelining, and/or very long instruction word (VLIW) techniques for exploiting instruction-level parallelism in programs (i.e., for executing more than one instruction at a time). In general, these processors contain multiple functional units, execute a sequential stream of instructions, are able to fetch from memory more than one instruction per cycle, and are able to dispatch for execution more than one instruction per cycle subject to dependencies and availability of resources.
The pool of instructions from which the processor selects those that are dispatched at a given point in time is enlarged by the use of out-of-order execution. Out-of-order execution is a technique by which the operations in a sequential stream of instructions are reordered so that operations appearing later are executed earlier, if the resources required by the later appearing operations are free. Thus, out-of-order execution reduces the overall execution time of a program by exploiting the availability of the multiple functional units and using resources that would otherwise be idle. Reordering the execution of operations requires reordering the results produced by those operations, so that the functional behavior of the program is the same as what would be obtained if the instructions were executed in their original sequential order.
In the case of memory-related operations, a memory load operation reads a datum from memory, loads it into a processor register, and frequently starts a sequence of operations that depend on the datum loaded. Thus, in addition to using idle resources, the early (out-of-order) initiation of memory load operations may hide delays in accessing memory, including potential cache misses.
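As a rough illustration (not taken from the specification), the following C fragment, with hypothetical names, contains a load that is independent of the arithmetic preceding it in program order; an out-of-order processor, or a scheduling compiler, may therefore initiate the load early and overlap its memory latency with the additions:

#include <stdio.h>

static int example(int a, int b, const int *p)
{
    int x = a + b;   /* program order: issued first                   */
    int y = x * 2;   /* depends on x                                  */
    int z = *p;      /* independent load: may be initiated before the */
                     /* additions complete, hiding a cache miss       */
    return y + z;
}

int main(void)
{
    int datum = 7;
    printf("%d\n", example(1, 2, &datum));   /* prints 13 */
    return 0;
}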
In general, there are two basic approaches to implementing out-of-order execution and reordering of results: dynamic reordering and static reordering. In dynamic reordering, the instructions are analyzed at execution time, and the instructions and results are reordered in hardware. In static reordering, a compiler/programmer analyzes and reorders the instructions, and the results produced by those instructions, when the program is generated; the reordering is thus accomplished in software. The two approaches can also be implemented jointly.
While significant research has been performed to support out-of-order execution in general, and out-of-order memory operations in particular, such research has concentrated primarily on uniprocessor execution, and in particular on the ordering between load and (synchronous) store operations in a single instruction stream designed to execute on a single processor. This invention deals with the problem of asynchronous memory references typically found in multiprocessing environments, and with their impact on reordering multiple read operations from a single memory cell. While such transformations (i.e., reorderings) are safe in a strict uniprocessor environment, multiprocessor environments pose additional considerations such as, for example, the possibility of writes being performed by another processor with a distinct, unknown instruction stream.
To achieve predictable and repeatable computation of programs, a requirement of ‘sequential consistency’ is described in the article by L. Lamport, “How to Make a Multiprocessor that Correctly Executes Multiprocess Programs”, IEEE Transactions on Computers, C-28(9), pp. 690-91 (September 1979). The article by Lamport defines a multiprocessor system as sequentially consistent if “the result of any execution is the same as if the operations of all processors were executed in some sequential order, and the operations of each individual processor appear in this sequence in the order specified by its program”. For static speculative execution, the order of the original logical program text is authoritative, not the reordered program text, and the compiler and hardware implementation must collaborate to generate an execution equivalent to that original order.
To achieve proper performance while simplifying coherence protocols between multiple processors in a system, several relaxations of the above described strictly sequential consistent order are possible. The types of re-ordering which are allowable depend on the memory consistency model guaranteed by a particular implementation. An overview of currently used and proposed consistency models and their characteristics is described in the article by S. Adve and K. Gharachorloo, “Shared Memory Consistency Models: A Tutorial”, Technical Report 9512, Dept. of Electrical and Computer Engineering, Rice University, Houston, Tex. (September 1995).
A basic requirement in all these relaxed models is write serialization to achieve memory coherence, i.e., all writes to the same location are serialized in some order and are performed in that order with respect to any processor. This is equivalent to sequential consistency described by Lamport wherein each memory cell is considered a memory module. We will refer to sequential consistency with respect to a single memory cell as write serial, so as to differentiate it from sequential consistency for a larger memory module. Memory coherence can be achieved by ensuring that successive load operations to the same memory location preserve a weakly ascending order of data items presented in that memory location. Thus, in a sequence of load operations, any load may present only the same or a later data item as its predecessors.
For example, consider a sequence of data items d1, d2, d3, d4, d5, d6 and so forth, written into a given memory location by a second processor. Successive load operations from that memory location by a first processor may present the same data item as returned by the first load operation, or a later item present in that memory cell. Thus, if the first load operation returned data item d2, then the second load operation may return data items d2, d3, d4 and so forth, but not data item d1. Alternatively, if the first load operation returned data item d4, then the second load operation may return data items d4, d5, d6 and so forth, but not data items d1, d2, or d3.
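The rule can be stated operationally. In the following C sketch (not part of the specification), each value returned by a load is identified by its position in the write sequence d1, d2, d3, and so forth; the observed positions must be weakly ascending for the execution to be memory coherent:

#include <stdio.h>

/* Returns 1 if the observed data-item indices are weakly ascending
   (each load saw the same or a later item than its predecessor). */
static int is_coherent(const int *observed, int n)
{
    for (int i = 1; i < n; i++)
        if (observed[i] < observed[i - 1])
            return 0;
    return 1;
}

int main(void)
{
    int allowed[]    = {2, 2, 4, 5};   /* d2, d2, d4, d5            */
    int disallowed[] = {4, 3, 5, 6};   /* d4 followed by d3         */
    printf("%d %d\n", is_coherent(allowed, 4), is_coherent(disallowed, 4));
    /* prints "1 0" */
    return 0;
}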
It is evident that in serial execution this problem is resolved automatically by the nature of time moving forward. In an out-of-order processor, however, load operations that access the same memory location may be performed out of order, such that a statically later load instruction reads a data item that appears earlier in the sequence of data items than the item read by its statically preceding load instruction, even though that preceding instruction is executed at a later time.
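A single-threaded C sketch of that timeline (hypothetical, with the second processor's write inlined between the two loads) makes the hazard concrete: the statically later load is performed first and captures d3, a new write then installs d4, and the statically earlier load captures d4, so the program-order sequence of observations d4, d3 is not weakly ascending:

#include <stdio.h>

int main(void)
{
    int cell = 3;          /* the memory location currently holds d3        */

    int later = cell;      /* statically later load, performed early: d3    */
    cell = 4;              /* write by the other processor installs d4      */
    int earlier = cell;    /* statically earlier load, performed now: d4    */

    /* program-order view: the earlier load returned d4, the later one d3 */
    printf("earlier load saw d%d, later load saw d%d -> %s\n",
           earlier, later,
           later < earlier ? "not weakly ascending" : "weakly ascending");
    return 0;
}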
One factor that limits the ability to reorder operations is ambiguous memory references. This is the case when a memory load operation appears after another memory load operation in a sequential instruction stream, and it is not possible to determine ahead of time whether the memory locations accessed by an out-of-order and an in-order load operation are different. For example, consider the following code fragment:
s=*(X+a*4+5)
u=s+4
t=*Y
v=t+8
wherein * denotes a memory access to the specified address, such that:
*Y indicates the memory location whose address is contained in Y; and
*(X+a*4+5) indicates the memory location whose address is specified by the expression X+a*4+5.
Assuming that a is a value stored in register r1 of a processor, that X and Y are in registers r2 and r9, and that s, t, u and v are assigned to registers r4, r5, r6 and r7, respectively, the above code fragment can be represented by the following instruction sequence (wherein the first register after the name of the instruction is the target of the instruction):
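A hedged register-level sketch of that sequence, written here as C statements over variables named for the registers rather than in the specification's own instruction mnemonics, is:

#include <stdint.h>

/* A sketch only: each statement corresponds to one generic instruction,
   with the destination register written first, mirroring the register
   assignment described above (r1 = a, r2 = X, r9 = Y; r4, r5, r6, r7
   receive s, t, u, v). */
static void fragment(intptr_t r1, intptr_t r2, intptr_t r9)
{
    int r4, r5, r6, r7;

    r4 = *(const int *)(r2 + r1 * 4 + 5);  /* LOAD r4 <- mem[r2 + r1*4 + 5]  (s) */
    r6 = r4 + 4;                           /* ADD  r6 <- r4 + 4              (u) */
    r5 = *(const int *)r9;                 /* LOAD r5 <- mem[r9]             (t) */
    r7 = r5 + 8;                           /* ADD  r7 <- r5 + 8              (v) */

    (void)r6;                              /* u and v would feed later code      */
    (void)r7;
}

The two LOAD operations are the ones whose relative order is at issue when the address r2 + r1*4 + 5 cannot be shown to differ from r9.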
F. Chau & Associates LLP
International Business Machines - Corporation
Kim Kenneth S.