Electrical computers and digital processing systems: memory – Address formation – Address mapping
Reexamination Certificate
2001-07-18
2003-06-17
Elmore, Reba I. (Department: 2187)
C711S204000, C711S213000
active
06581151
ABSTRACT:
FIELD OF THE INVENTION
This invention relates in general to the field of store forwarding, and more particularly to store forwarding in microprocessors supporting paged memory.
BACKGROUND OF THE INVENTION
It is common for modern microprocessors to operate on several instructions at the same time, within different blocks or pipeline stages of the microprocessor. Hennessy and Patterson define pipelining as, “an implementation technique whereby multiple instructions are overlapped in execution.” Computer Architecture: A Quantitative Approach, 2nd edition, by John L. Hennessy and David A. Patterson, Morgan Kaufmann Publishers, San Francisco, Calif., 1996. The authors go on to provide the following excellent illustration of pipelining:
“A pipeline is like an assembly line. In an automobile assembly line, there are many steps, each contributing something to the construction of the car. Each step operates in parallel with the other steps, though on a different car. In a computer pipeline, each step in the pipeline completes a part of an instruction. Like the assembly line, different steps are completing different parts of the different instructions in parallel. Each of these steps is called a pipe stage or a pipe segment. The stages are connected one to the next to form a pipe—instructions enter at one end, progress through the stages, and exit at the other end, just as cars would in an assembly line.”
Thus, as instructions are fetched, they are introduced into one end of the pipeline. They proceed through pipeline stages within a microprocessor until they complete execution. However, as instructions proceed through the pipeline stages, an instruction executing in an early, or upper, pipeline stage may require a result of another instruction executing ahead of it in a later, or lower, pipeline stage.
One situation in which an instruction executing in an upper pipeline stage requires a result generated by a preceding instruction executing in a lower pipeline stage is referred to as a storehit condition. A storehit condition exists when a load instruction requests store data specified by a store instruction executing ahead of the load instruction in the pipeline. That is, the load instruction specifies a load address for load data, the load address matches the store address for store data specified by a store instruction issued before the load instruction, and the store data is still in the microprocessor pipeline, i.e., has not yet been written to the microprocessor data cache or to system memory.
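The storehit condition defined above can be illustrated with a minimal software sketch. This is a model for explanation only, not the patented hardware: the `PendingStore` record, the `is_storehit` function, and the sample addresses are hypothetical names and values introduced here.

```python
from dataclasses import dataclass


@dataclass
class PendingStore:
    """A store still in the pipeline (hypothetical record for illustration)."""
    address: int  # store address
    data: int     # store data not yet written to the data cache or memory


def is_storehit(load_address, pending_stores):
    """Return True if any pending store targets the address the load requests."""
    return any(store.address == load_address for store in pending_stores)


# Two stores are still pending in the pipeline (sample values).
pending = [PendingStore(0x1000, 7), PendingStore(0x2000, 9)]

hit = is_storehit(0x2000, pending)   # load address matches a pending store
miss = is_storehit(0x3000, pending)  # no pending store at this address
```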
It has been observed that storehit conditions occur relatively frequently in modern microprocessors, particularly in x86 microprocessors. This phenomenon is largely attributed to the fact that modern compilers recognize the relatively small number of registers available in the x86 register file and the fact that virtually every contemporary x86 processor has a large built-in data cache that is essentially accessible at the same speed as the register file. Therefore, when the compilers run out of registers in the register file, they use the data cache as a huge register file. In particular, compilers have been observed to generate code that causes storehit conditions in the following situations.
First, a loop counter variable is stored in a memory location. Second, a memory location is used as a temporary location for a sequence of arithmetic operations. Third, a stack location is accessed within a very short instruction sequence due to the calling of a very short subroutine. That is, a return address is pushed, followed by a jump to the subroutine, followed by a very small number of instructions of the subroutine, followed by a pop of the return address, which generates a storehit on the location of the return address.
In a storehit condition, the load instruction must be provided with coherent data, i.e., the newest data associated with the load address. Thus, the microprocessor cannot supply the data from its data cache or go to system memory to get the data since the newest data is within the pipeline and not in the data cache or system memory. One solution is for the microprocessor to stall and wait for the storehit data to be updated in the data cache or system memory, and then provide the data to the load instruction from the data cache or system memory. However, this solution has obvious performance disadvantages. A higher performance solution is to determine the newest data matching the load address, and to forward the newest data from the stage in which the store is pending to the load instruction stage.
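The forwarding solution described above amounts to selecting the newest pending store that matches the load address and falling back to the data cache only when no store matches. The sketch below models that selection in software. The `load` function, the oldest-first ordering of the pending store list, and the dictionary standing in for the data cache are assumptions made for illustration.

```python
def load(load_address, pending_stores, data_cache):
    """Return (data, source) for a load.

    pending_stores is a list of (address, data) tuples ordered oldest first,
    so it is scanned in reverse to find the newest matching store.
    """
    for address, data in reversed(pending_stores):
        if address == load_address:
            return data, "forwarded"  # newest data is still in the pipeline
    return data_cache[load_address], "cache"  # no storehit: cache is coherent


# Two pending stores to the same address; the second one is newer.
pending = [(0x1000, 1), (0x1000, 2)]
cache = {0x1000: 0, 0x2000: 5}  # the cache still holds stale data for 0x1000

forwarded = load(0x1000, pending, cache)
from_cache = load(0x2000, pending, cache)
```

Scanning newest first matters: returning the older store's value, or the stale cache value, would give the load incoherent data.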
Forwarding storehit data is complicated by the fact that many microprocessors use a paged memory scheme. In a paged memory scheme, virtual addresses of load and store instructions must be translated into physical addresses in order to access memory properly. In order to detect a storehit and to forward the proper data, the physical address of the load must be compared with the physical addresses of the stores pending in the processor. Comparing virtual addresses will not suffice since the load and stores could have different virtual addresses and yet still be referring to the same physical address in a paged memory system.
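The aliasing problem can be made concrete with a small example. The sketch assumes 4 KB pages and a hypothetical page table in which two different virtual pages map to the same physical page. None of these values come from the patent.

```python
PAGE_SHIFT = 12                    # assumption: 4 KB pages
PAGE_MASK = (1 << PAGE_SHIFT) - 1  # low 12 bits select the byte within a page

# Hypothetical page table: virtual page number -> physical page number.
# Two distinct virtual pages alias the same physical page.
page_table = {0x00400: 0x12345, 0x7FFFF: 0x12345}


def translate(virtual_address):
    """Translate a virtual address to a physical address via the page table."""
    virtual_page = virtual_address >> PAGE_SHIFT
    offset = virtual_address & PAGE_MASK
    return (page_table[virtual_page] << PAGE_SHIFT) | offset


store_va = 0x00400ABC
load_va = 0x7FFFFABC

virtual_match = store_va == load_va                         # addresses differ
physical_match = translate(store_va) == translate(load_va)  # same location
```

A comparison of the virtual addresses would miss this storehit, while a comparison of the physical addresses detects it.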
Paging microprocessors typically employ a translation-lookaside buffer (TLB) to cache physical addresses previously translated from virtual addresses. The virtual address is provided to the TLB, which looks up the virtual address and provides the translated physical address of the virtual address if the physical address is cached in the TLB. The TLB improves data access time by avoiding having to repeat the lengthy task of translating a virtual address to its physical address for recently accessed data.
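The caching role of the TLB can be sketched as a lookup table placed in front of a slower translation step. The model below is deliberately simplified: it assumes 4 KB pages, uses a dictionary as a stand-in for the page table walk, and has no capacity limit or replacement policy, all of which are assumptions for illustration. The `TLB` class and its `walks` counter are hypothetical names.

```python
PAGE_SHIFT = 12                    # assumption: 4 KB pages
PAGE_MASK = (1 << PAGE_SHIFT) - 1


class TLB:
    """Simplified TLB model: caches virtual page -> physical page translations."""

    def __init__(self, page_table):
        self.page_table = page_table  # stand-in for the lengthy translation
        self.entries = {}             # cached translations
        self.walks = 0                # number of slow translations performed

    def translate(self, virtual_address):
        virtual_page = virtual_address >> PAGE_SHIFT
        offset = virtual_address & PAGE_MASK
        if virtual_page not in self.entries:
            # TLB miss: perform the slow translation once and cache the result.
            self.walks += 1
            self.entries[virtual_page] = self.page_table[virtual_page]
        return (self.entries[virtual_page] << PAGE_SHIFT) | offset


tlb = TLB({0x00400: 0x12345})
first = tlb.translate(0x00400ABC)   # miss: translation is performed and cached
second = tlb.translate(0x00400DEF)  # hit: same page, no second translation
```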
In order to detect a storehit condition, the physical address of the load instruction is compared with the physical address of the pending stores in the pipeline. If a storehit occurs, the newest storehit data is forwarded to the load instruction. Presently, the TLB lookup, the physical address comparison and the data forwarding are serialized. The serialized time of these operations may be the critical path for processor cycle timing purposes. Therefore, what is needed is a method for reducing the serialized time in order to reduce processor cycle time and thereby improve processor performance.
SUMMARY
The present invention provides a method and apparatus in a paging microprocessor for reducing store forwarding time by speculatively forwarding based on a physical page index comparison of a load and pending stores rather than waiting to compare the full physical addresses. Accordingly, in attainment of the aforementioned object, it is a feature of the present invention to provide a speculative store forwarding apparatus in a microprocessor pipeline. The pipeline includes first and second stages. The first stage receives load data specified by a load virtual address. The second stage stores store data pending in the pipeline for writing to a store physical address. The load virtual address includes a load virtual page number and a load physical page index. The store physical address includes a store physical page address and a store physical page index. The apparatus includes an index comparator that compares the load physical page index with the store physical page index. The apparatus also includes forwarding logic, coupled to the index comparator, which forwards the store data from the second stage to the first stage if the index comparator indicates the load physical page index matches the store physical page index.
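The index comparison described above can be sketched in software. The page index, i.e., the offset within the page, is the same in a virtual address and in its translated physical address, so it is available from the load virtual address without waiting for the TLB. The sketch assumes 4 KB pages. The function names are hypothetical, and the `verify` step, which checks the full physical address once translation completes, is an assumption about how a speculative forward might later be confirmed; it is not taken from the text above.

```python
PAGE_SHIFT = 12                    # assumption: 4 KB pages
PAGE_MASK = (1 << PAGE_SHIFT) - 1


def page_index(address):
    """Offset within the page; identical in virtual and physical addresses."""
    return address & PAGE_MASK


def speculative_forward(load_virtual_address, pending_stores):
    """Return the newest (store_physical_address, data) whose page index
    matches the load's page index, or None if no pending store matches.

    pending_stores is ordered oldest first. No TLB lookup is needed here.
    """
    load_index = page_index(load_virtual_address)
    for store_pa, data in reversed(pending_stores):
        if page_index(store_pa) == load_index:
            return store_pa, data
    return None


def verify(load_physical_address, forwarded):
    """Assumed later check: confirm the forward once the full load physical
    address is known."""
    store_pa, _ = forwarded
    return store_pa == load_physical_address


pending = [(0x12345ABC, 42)]
forwarded = speculative_forward(0x00400ABC, pending)  # index 0xABC matches

correct = verify(0x12345ABC, forwarded)   # load maps to the same page
mistaken = verify(0x54321ABC, forwarded)  # same index, different page
```

The second check shows why the forward is speculative: a matching page index does not guarantee a matching physical page.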
In another aspect, it is a feature of the present invention to provide a microprocessor supporting paged virtual memory. The microprocessor includes an index match indicator that indicates whether a physical page index of load data specified by a load instruction matches a physical page index of store data pending in the microprocessor. The microprocessor also includes forwarding logic, coupled to the index match indicator, which forwards the store data to the load instruction if the index match indicator indicates that the l
Henry G. Glenn
Hooker Rodney E.
Davis E. Alan
Elmore Reba I.
Hoffman James W.
IP-First LLC
Apparatus and method for speculatively forwarding storehit...