Electrical computers and digital processing systems: processing – Dynamic instruction dependency checking – monitoring or...
Reexamination Certificate
1999-05-21
2004-04-27
Patel, Gautam R. (Department: 2655)
Electrical computers and digital processing systems: processing
Dynamic instruction dependency checking, monitoring or...
C712S225000, C711S122000
active
06728867
ABSTRACT:
BACKGROUND OF THE INVENTION
1. Technical Field
The present invention relates to methods for processing load operations, and in particular to methods for processing load operations prior to store operations that may target overlapping memory addresses.
2. Background Art
Currently available processors are capable of executing instructions at very high speeds. These processors typically implement pipelined, superscalar micro-architectures that can execute multiple instructions per clock cycle at clock frequencies approaching one gigahertz or more. In recent years, the instruction executing capabilities of processors have begun to outstrip computer systems' capacities to provide instructions and/or data for processing.
One bottleneck in supplying the processor with data/instructions is the relatively long latency of the load operations that transfer data from the computer's memory system into the processor's registers. A typical memory system includes a hierarchy of caches, e.g. L0, L1, L2 . . . , and a main memory. The latency of the load depends on where in the hierarchy the targeted data is found, i.e. the cache in which the load operation “hits”. For example, a load hit in the L0 cache may have a latency of 1 to 2 clock cycles. Load hits in the L1 or L2 caches may have latencies of 4 to 8 clock cycles or 10 or more clock cycles, respectively. If the data is only available from main memory, the load latency can be on the order of 100-200 clock cycles.
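As a rough illustration of how these latencies combine, the following sketch computes an average load latency from the fraction of loads satisfied at each level of the hierarchy. The per-level cycle counts are single representative values picked from the ranges quoted above, and the function name and hit fractions are hypothetical; none of these figures come from a measured system.

```python
# Representative latencies in clock cycles, picked from the ranges quoted above.
# The exact values are illustrative assumptions, not measurements.
LATENCY_CYCLES = {"L0": 2, "L1": 6, "L2": 12, "memory": 150}


def expected_load_latency(hit_fractions, latency_cycles=LATENCY_CYCLES):
    """Average load latency, given the fraction of loads satisfied at each level."""
    total = sum(hit_fractions.values())
    if abs(total - 1.0) > 1e-9:
        raise ValueError("hit fractions must sum to 1")
    # Weight each level's latency by how often a load is satisfied there.
    return sum(hit_fractions[level] * latency_cycles[level] for level in hit_fractions)
```

With hypothetical fractions of 0.5 (L0), 0.25 (L1), 0.125 (L2), and 0.125 (main memory), the average is 22.75 cycles, of which 18.75 cycles come from the main memory accesses alone.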
To avoid idling the processor, a compiler typically schedules load operations in a program flow well before the operation that uses the target data. Compiler scheduling occurs before the program is executed and, consequently, before any run-time information is available. As a result, store operations, which transfer data from the processor's registers into the memory system, can limit this load-scheduling strategy. If a compiler moves a load that returns data from a specified memory address ahead of a store that writes data to the same memory address, the load will return stale data. As long as the compiler can determine the memory addresses specified by the load and store from available information, it can determine whether it is safe to move the load ahead of the store. The process of identifying memory addresses to determine overlap is referred to as memory disambiguation.
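Once both addresses are known, the overlap test at the heart of memory disambiguation reduces to an interval comparison. The sketch below is a minimal illustration; the function name and the byte-granular address and size parameters are assumptions introduced here, not details taken from the text.

```python
def ranges_overlap(load_addr, load_size, store_addr, store_size):
    """Return True if the byte ranges of a load and a store share at least one byte.

    Each range is half-open: [addr, addr + size).
    """
    return load_addr < store_addr + store_size and store_addr < load_addr + load_size
```

A load may be moved ahead of a store only when this test is known to be false for the pair. When either address is unknown at compile time, the test cannot be evaluated and the pair remains undisambiguated.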
In many instances, it is not possible to disambiguate memory references at the time the corresponding load and store operations are scheduled. For example, the memory address referenced by an operation may depend on variables that are determined at run-time, just before the operation is executed. For load/store pairs that cannot be disambiguated at compile time, certain advanced compilers can still reschedule the load ahead of the store using an “advanced load”. In an advanced load, the load operation is scheduled ahead of a potentially conflicting store operation, and a check operation is inserted in the instruction flow, following the store operation. The load and store memory references are resolved when the corresponding instructions are executed. The check operation determines whether these dynamically-resolved memory references overlap and initiates a recovery procedure if they do.
The instruction movement that accompanies an advanced load operation is illustrated by the following instruction sequence, where LOAD, STORE, ALOAD, and CHECK represent the load, store, advanced load, and check operations, and x and y represent the undisambiguated memory references.
WITHOUT ADVANCED LOADING      WITH ADVANCED LOADING
INSTRUCTION A                 ALOAD reg2, mem[y]
.                             INSTRUCTION A
.                             .
.                             .
INSTRUCTION B                 .
STORE reg1, mem[x]            INSTRUCTION B
LOAD reg2, mem[y]             STORE reg1, mem[x]
ADD reg2, reg3                CHECK
                              ADD reg2, reg3
The advanced load adds a check operation to the program flow. The check operation takes time to complete, which can delay the time at which the ADD instruction (and any other instructions that depend on the load) is retired. Typically, operations that need to be executed quickly are implemented in hardware, since operations implemented on specially designed hardware tend to be faster than those implemented by software on a general purpose processor. In the above example, a fast check operation is necessary to avoid offsetting any latency advantage provided by the advanced load. However, hardware solutions place additional burdens on the already limited die area available on modern processors.
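The ALOAD, STORE, and CHECK sequence described above can be traced with a small software model of an address-based check. This is a sketch under stated assumptions: the class and method names are hypothetical, addresses are compared for exact equality rather than byte-range overlap, and recovery is modeled simply as reloading the register.

```python
class AdvancedLoadModel:
    """Toy model of an advanced load followed by an address-based check."""

    def __init__(self, memory):
        self.memory = dict(memory)   # address -> value
        self.registers = {}          # register name -> value
        self.pending = {}            # register -> address read by its advanced load
        self.conflicted = set()      # registers whose address was stored to later

    def aload(self, reg, addr):
        """ALOAD: load early and remember which address was read."""
        self.registers[reg] = self.memory[addr]
        self.pending[reg] = addr
        self.conflicted.discard(reg)

    def store(self, reg, addr):
        """STORE: write memory and flag any advanced load to the same address."""
        self.memory[addr] = self.registers[reg]
        for pending_reg, pending_addr in self.pending.items():
            if pending_addr == addr:
                self.conflicted.add(pending_reg)

    def check(self, reg):
        """CHECK: return True if recovery (assumed here to be a reload) was needed."""
        addr = self.pending.pop(reg)
        if reg not in self.conflicted:
            return False
        self.conflicted.discard(reg)
        self.registers[reg] = self.memory[addr]
        return True
```

The bookkeeping in `pending` and `conflicted` stands in for the state that a dedicated hardware structure would otherwise have to hold, which is the die-area cost the paragraph above refers to.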
The present invention addresses these and other problems related to processing advanced load operations.
SUMMARY OF THE INVENTION
The present invention provides a mechanism for implementing advanced load operations without the need for significant additional hardware support.
In accordance with the present invention, an advanced load is implemented by processing a first load operation to a memory address. The first load operation is subsequently checked by comparing data in a register targeted by the first load operation with data currently at the memory address.
For one embodiment of the invention, a second load operation targets data currently at the memory address, and the data returned by the second load is compared with the data provided by the first load. The load and check operations may be scheduled by a compiler, or they may be micro-operations that are scheduled on the fly by a processor.
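The value-comparison check summarized above can be sketched in a few lines. The function name, the dictionary-based register file and memory, and the recovery step (adopting the freshly loaded value) are assumptions made for illustration; this section does not specify the recovery procedure.

```python
def check_by_value(registers, memory, reg, addr):
    """Check an earlier (first) load by reloading the address and comparing data.

    Returns True if the register held stale data and recovery was needed.
    """
    current = memory[addr]            # second load: data currently at the address
    if registers[reg] == current:     # data matches, so the first load is still valid
        return False
    # Assumed recovery: adopt the current value. Instructions that already
    # consumed the stale value would also have to be re-executed.
    registers[reg] = current
    return True
```

Unlike an address-based check, this comparison needs no record of which addresses intervening stores touched. If a store wrote the same value that the first load returned, the check passes, which is harmless because the register already holds the correct data.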
REFERENCES:
patent: 4574349 (1986-03-01), Rechtschaffen
patent: 4958378 (1990-09-01), Bell
patent: 5467473 (1995-11-01), Kahle et al.
patent: 5542075 (1996-07-01), Ebcioglu et al.
patent: 5565857 (1996-10-01), Lee
patent: 5694577 (1997-12-01), Kiyohara et al.
patent: 5838943 (1998-11-01), Ramagopal et al.
patent: 5850513 (1998-12-01), Whittaker et al.
patent: 5872990 (1999-02-01), Luick et al.
patent: 5903749 (1999-05-01), Kenner et al.
patent: 6088790 (2000-07-01), Grochowski
patent: 6192464 (2001-02-01), Mittal
patent: 6202204 (2001-03-01), Wu et al.
patent: 6222552 (2001-04-01), Haas et al.
patent: 6223280 (2001-04-01), Horton et al.
patent: 6240490 (2001-05-01), Lyles, Jr. et al.
Blakely, Sokoloff, Taylor & Zafman LLP
Intel Corporation
Patel Gautam R.