Title: Acknowledgement mechanism for just-in-time delivery of load...
Type: Reexamination Certificate
Filed: 1999-06-25
Issued: 2002-05-21
Examiner: Pan, Daniel H. (Department: 2183)
Class: Electrical computers and digital processing systems: processing
Subclass: Dynamic instruction dependency checking, monitoring or...
Subclass: Scoreboarding, reservation station, or aliasing
U.S. Classes: 712/219; 712/240; 712/225; 711/123; 711/125; 711/132; 711/169; 711/144
Status: active
Patent number: 06393553
ABSTRACT:
BACKGROUND OF THE INVENTION
1. Technical Field
The present invention relates in general to data processing systems and in particular to a method and system for ordering load instructions. Still more particularly, the present invention relates to a method and system for implementing just-in-time delivery of data requested by load instructions.
2. Description of the Related Art
In conventional symmetric multiprocessor (SMP) data processing systems, all of the processors are generally identical. The processors all utilize common instruction sets and communication protocols, have similar hardware architectures, and are generally provided with similar memory hierarchies. For example, a conventional SMP data processing system may comprise a system memory, a plurality of processing elements that each include a processor and one or more levels of cache memory and a system bus coupling the processing elements to each other and to the system memory.
Conventional SMP data processing system processors have a number of execution units. Superscalar multiprocessors typically have more than one of each execution unit, for example two floating point units (FPUs), two fixed point units (FXUs), and two load/store units (LSUs). The processors are designed for high frequency, and their corresponding internal caches are typically very small so that they can operate at the processor's high frequency. In part due to their relatively small size, these internal caches sustain a large number of cache misses during requests for data. Data is thus stored in lower level (L2) caches to maximize processing speed. The processors typically send multiple load requests simultaneously or within close proximity to each other. This is particularly true in superscalar processors with multiple LSUs.
Traditionally, processors execute program instructions in order. In state-of-the-art processors, out-of-order execution is often employed to maximize the utilization of execution unit resources within the processor, thereby enhancing overall processor efficiency. Further, in state-of-the-art processors that support out-of-order execution, instructions may be dispatched out of program order, executed opportunistically within the execution units of the processor, and completed in program order. The performance enhancement resulting from out-of-order execution is maximized when implemented within a superscalar processor having multiple execution units capable of executing multiple instructions concurrently.
Processors today often run numerous cycles ahead of the instruction stream of the program being executed. Also, on these processors, load instructions are issued as early as possible in order to “hide” the cache access latencies and thus allow ensuing dependent load instructions to execute with minimal delay.
Additionally, compilers separate load instructions from the instructions that depend on their data. Together, these techniques lead to requests for data that may not be required immediately, as sketched below.
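The following C fragment illustrates the idea in software terms: a load is hoisted away from its use so that independent work overlaps the cache access latency. It is an illustrative sketch only; the function and variable names are hypothetical and do not come from the patent.

```c
/* Illustrative only: separating a load from its use so that independent
 * work hides part of the cache access latency. All names are hypothetical. */

/* Naive ordering: the load's result is consumed immediately, so the
 * processor may stall while the cache access is serviced. */
long scale_naive(const long *table, int idx, long scale)
{
    long v = table[idx];      /* load */
    return v * scale;         /* immediate use: load-to-use distance of one */
}

/* Scheduled ordering: the load is issued early and unrelated arithmetic is
 * placed between the load and its use. */
long scale_hoisted(const long *table, int idx, long scale, long a, long b)
{
    long v = table[idx];      /* load issued as early as possible */
    long other = a * b + a;   /* independent work overlaps the cache access */
    return v * scale + other; /* the loaded value is consumed much later */
}
```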
Finally, an L2 cache has a limited amount of wired connections for returning data. When data is sent prior to the time it is required, it utilizes valuable wired cache line resources which may be required for more immediate or important data requests.
In the prior art, load instructions may be issued out of order. Often, this results in a load queue occupying valuable cache line resources or register space for many cycles before the data is utilized by the program. When a large number of load instructions are present, the critical cache and queue resources become loaded down, resulting in less efficient processing.
When the data cache is “bombarded” with load requests, no ordering information is included. The data cache has no indication of which load instruction to process first or in what order. In traditional processors, ordering information is typically implied by a “First Come First Serve” prioritization scheme.
However, data is often not required by the processor or program at the time, or in the order, in which it is requested.
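For concreteness, the following C sketch shows the kind of first-come-first-serve request handling that conventional designs imply: requests are serviced strictly in arrival order, with no notion of when the data will actually be needed. The structure and function names are hypothetical, chosen only for illustration.

```c
/* Minimal sketch of a conventional "First Come First Serve" load queue:
 * requests are serviced strictly in arrival order. Names are hypothetical. */
#include <stddef.h>

#define FCFS_DEPTH 16

struct load_request {
    unsigned long address;   /* requested data address */
    unsigned      dest_reg;  /* destination register tag */
};

struct fcfs_queue {
    struct load_request slots[FCFS_DEPTH];
    size_t head, tail, count;
};

/* Enqueue in arrival order; returns 0 when the queue is full. */
static int fcfs_enqueue(struct fcfs_queue *q, struct load_request r)
{
    if (q->count == FCFS_DEPTH)
        return 0;
    q->slots[q->tail] = r;
    q->tail = (q->tail + 1) % FCFS_DEPTH;
    q->count++;
    return 1;
}

/* The cache always services the oldest request next, regardless of when
 * the program actually needs the data. */
static int fcfs_next(struct fcfs_queue *q, struct load_request *out)
{
    if (q->count == 0)
        return 0;
    *out = q->slots[q->head];
    q->head = (q->head + 1) % FCFS_DEPTH;
    q->count--;
    return 1;
}
```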
Thus, many hardware and software limitations exist in current methods of loading data from a data cache. A more efficient means of loading data from a data cache is clearly needed. A processor should be able to issue its data requests so that the data cache can deliver the data only when it is actually required.
It would therefore be desirable to provide a method and system for improving the efficiency of load instruction processing and the subsequent loading of data. It is further desirable to provide a method and system that allows for just-in-time delivery and/or time-ordered delivery of data during execution of an instruction set, thus allowing data to be loaded from a data cache at the time it is needed within the program execution stream.
SUMMARY OF THE INVENTION
It is therefore one object of the present invention to provide an improved data processing system.
It is another object of the present invention to provide an improved method and system for efficiently managing multiple load requests to a data cache.
It is yet another object of the present invention to provide a method and system for implementing just-in-time delivery of data requested by load instructions.
The foregoing objects are achieved as is now described. A system for time-ordered execution of load instructions is disclosed. More specifically, the system enables just-in-time delivery of data requested by a load instruction. The system consists of a processor, an L1 data cache with a corresponding L1 cache controller, and an instruction processor. The instruction processor manipulates an architected Time Dependency Field (TDF) of a load instruction to create a Distance of Dependency (DoD) bit field. The DoD bit field holds a relative dependency value which is utilized to order the load instruction in a Relative Time-Ordered Queue (RTOQ) of the L1 cache controller. The load instruction is sent from the RTOQ to the L1 data cache at a particular time so that the requested data is loaded from the L1 data cache at the time specified by the DoD bit field. In the preferred embodiment, an acknowledgement is sent to the processing unit when the time specified is available in the RTOQ.
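To make the mechanism easier to follow, here is a minimal C sketch of how a Relative Time-Ordered Queue might use a DoD value to release a load at its specified time and acknowledge the requested slot. The field widths, the cycle-counter release policy, and every identifier are assumptions made for illustration; the patent itself defines the actual hardware mechanism.

```c
/* Minimal sketch, under stated assumptions, of a Relative Time-Ordered Queue
 * (RTOQ) that releases loads according to a Distance of Dependency (DoD)
 * value. All identifiers and policies here are hypothetical. */
#include <stdbool.h>
#include <stddef.h>

#define RTOQ_DEPTH 16

struct rtoq_entry {
    unsigned long address;     /* data address requested by the load       */
    unsigned      dod;         /* relative dependency value from the TDF   */
    unsigned long release_at;  /* cycle at which to access the L1 cache    */
    bool          valid;
};

struct rtoq {
    struct rtoq_entry entries[RTOQ_DEPTH];
};

/* Place a load in the queue according to its DoD-derived release time.
 * Returns true (the "acknowledgement") when a slot for the requested time
 * is available, false when the processor must fall back to ordinary issue. */
static bool rtoq_enqueue(struct rtoq *q, unsigned long now,
                         unsigned long address, unsigned dod)
{
    unsigned long when = now + dod;          /* DoD is relative to "now"   */
    for (size_t i = 0; i < RTOQ_DEPTH; i++) {
        if (!q->entries[i].valid) {
            q->entries[i] = (struct rtoq_entry){ address, dod, when, true };
            return true;                     /* time slot acknowledged     */
        }
    }
    return false;                            /* queue full: no acknowledge */
}

/* Each cycle, release to the L1 data cache only those loads whose specified
 * time has arrived, so data is delivered just in time. */
static size_t rtoq_release(struct rtoq *q, unsigned long now,
                           unsigned long *ready, size_t max_ready)
{
    size_t n = 0;
    for (size_t i = 0; i < RTOQ_DEPTH && n < max_ready; i++) {
        if (q->entries[i].valid && q->entries[i].release_at <= now) {
            ready[n++] = q->entries[i].address;
            q->entries[i].valid = false;
        }
    }
    return n;
}
```

In this sketch the boolean returned by rtoq_enqueue plays the role of the acknowledgement described above: the processor learns immediately whether the requested delivery time is available in the queue.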
The above as well as additional objects, features, and advantages of an illustrative embodiment will become apparent in the following detailed written description.
REFERENCES:
patent: 5214765 (1993-05-01), Jensen
patent: 5278985 (1994-01-01), Odnert et al.
patent: 5307477 (1994-04-01), Taylor et al.
patent: 5404484 (1995-04-01), Schlansker et al.
patent: 5694574 (1997-12-01), Abramson et al.
patent: 5717882 (1998-02-01), Abramson et al.
patent: 5737565 (1998-04-01), Mayfield
patent: 5758051 (1998-05-01), Moreno et al.
patent: 5761515 (1998-06-01), Barton et al.
patent: 5809275 (1998-09-01), Lesartre
patent: 5895495 (1999-04-01), Arimilli et al.
patent: 5964867 (1999-10-01), Anderson et al.
patent: 5987594 (1999-11-01), Panwar et al.
patent: 5999727 (1999-12-01), Panwar et al.
patent: 6006326 (1999-12-01), Panwar et al.
patent: 6052775 (2000-04-01), Panwar et al.
patent: 6058472 (2000-05-01), Panwar et al.
patent: 6065101 (2000-05-01), Gilda
patent: 6092180 (2000-07-01), Anderson et al.
patent: 6145059 (2000-11-01), Arimilli et al.
U.S. application No. 09/338,946, Arimilli et al., filed Jun. 25, 1999.
U.S. application No. 09/344,057, Arimilli et al., filed Jun. 25, 1999.
U.S. application No. 09/344,061, Arimilli et al., filed Jun. 25, 1999.
U.S. application No. 09/344,058, Arimilli et al., filed Jun. 25, 1999.
U.S. application No. 09/344,023, Arimilli et al., filed Jun. 25, 1999.
Inventors: Arimilli, Lakshminarayanan Baba; Arimilli, Ravi Kumar; Dodson, John Steven; Lewis, Jerry Don
Attorney/Agent: Bracewell & Patterson L.L.P.; Salys, Casimer K.
Assignee: International Business Machines Corporation
Examiner: Pan, Daniel H.