Method and apparatus for resolving additional load misses...

Electrical computers and digital processing systems: memory – Storage accessing and control – Hierarchical memories

Reexamination Certificate

C711S125000, C711S126000, C711S140000, C711S169000

active

06549985

ABSTRACT:

BACKGROUND OF THE INVENTION
1. Field of the Invention
This invention relates in general to the field of pipelined microprocessors, and more particularly to microprocessor data cache operations.
2. Description of the Related Art
Modern microprocessors operate on several instructions at the same time, within different blocks or pipeline stages of the microprocessor. Hennessy and Patterson define pipelining as, “an implementation technique whereby multiple instructions are overlapped in execution.” Computer Architecture: A Quantitative Approach, 2nd edition, by John L. Hennessy and David A. Patterson, Morgan Kaufmann Publishers, San Francisco, Calif., 1996. The authors go on to provide the following excellent illustration of pipelining:
A pipeline is like an assembly line. In an automobile assembly line, there are many steps, each contributing something to the construction of the car. Each step operates in parallel with the other steps, though on a different car. In a computer pipeline, each step in the pipeline completes a part of an instruction. Like the assembly line, different steps are completing different parts of the different instructions in parallel. Each of these steps is called a pipe stage or a pipe segment. The stages are connected one to the next to form a pipe—instructions enter at one end, progress through the stages, and exit at the other end, just as cars would in an assembly line.
An example of a pipeline stage, typically at the top of the pipeline, is one that fetches instructions from memory for the pipeline to execute. Another example is a stage that calculates addresses of data operands to be loaded from or stored to memory as specified by the instruction in the stage. Another example is a stage that performs arithmetic operations, such as adds or multiplies, on data operands associated with the instruction in the stage. Each of the stages is separated by a pipeline register that saves the output of the pipeline stage above the register at the end of a clock cycle and provides that output to the pipeline stage below the register at the beginning of the next clock cycle.
Typically, each stage performs its function during one processor clock cycle. Thus, every clock cycle each instruction in the pipeline progresses downward one stage along the pipeline. However, certain events or conditions prevent an instruction from executing in a given stage and prevent the instruction from progressing to the next stage in the pipeline on the next clock cycle. These conditions are referred to as “stall conditions” because the pipeline must be “stalled” until the condition is resolved. That is, all instructions above the stalled instruction in the pipeline are held in their current stage by the pipeline registers rather than being allowed to progress to the next stage. Instructions below the stalled instruction stage may continue down the pipeline. There are three main causes of stalls: resource conflicts, data hazards and cache misses.
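The stall behavior described above can be illustrated with a minimal sketch. It assumes a simplified in-order pipeline in which every stage normally takes one clock cycle and holds one instruction at a time; the function name, the `stalls` mapping, and the cycle numbering are hypothetical and are not taken from the patent.

```python
def completion_cycles(num_instructions, num_stages, stalls=None):
    """Return the cycle in which each instruction leaves the last stage.

    stalls maps (instruction index, stage index) to the number of extra
    cycles that instruction is held in that stage (hypothetical model).
    """
    stalls = stalls or {}
    # done[i][s] is the last cycle instruction i spends in stage s.
    done = [[0] * num_stages for _ in range(num_instructions)]
    for i in range(num_instructions):
        for s in range(num_stages):
            if s == 0:
                # The first stage is free once the previous instruction left it.
                start = done[i - 1][0] + 1 if i > 0 else 1
            else:
                start = done[i][s - 1] + 1
            finish = start + stalls.get((i, s), 0)
            if i > 0 and s + 1 < num_stages:
                # Held in place until the instruction ahead vacates the next stage.
                finish = max(finish, done[i - 1][s + 1])
            done[i][s] = finish
    return [row[-1] for row in done]
```

In this sketch, stalling the second of three instructions for two cycles in the third stage of a five-stage pipeline delays that instruction and the one behind it, while the instruction ahead of it completes on schedule.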
Resource conflicts occur when the hardware components in the microprocessor cannot service a given combination of instructions in simultaneous overlapped execution within the pipeline. For example, a processor may support an arithmetic instruction, such as a floating point or MMX multiply instruction. The hardware may include a multiplier circuit that requires multiple processor clock cycles to perform the multiply and the multiplier is not itself pipelined, i.e., it cannot receive a second multiply instruction until it has completed the current multiply instruction. In this case, the processor must stall the pipeline at the multiplier stage.
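The multiplier conflict above can be sketched as a simple cycle count. This is an illustration under stated assumptions, not the patent's mechanism: the instruction stream is a list of hypothetical operation names, one instruction reaches the multiplier stage per cycle, and non-multiply instructions are assumed to pass that stage in one cycle even while the multiplier is busy.

```python
def multiplier_stall_cycles(instructions, mul_latency):
    """Count cycles lost waiting for a non-pipelined multiplier."""
    busy_until = 0  # first cycle at which the multiplier is free again
    cycle = 0       # cycle in which the next instruction reaches the stage
    stall_cycles = 0
    for op in instructions:
        if op == "mul":
            if cycle < busy_until:
                # The multiplier cannot accept a second multiply yet.
                stall_cycles += busy_until - cycle
                cycle = busy_until
            busy_until = cycle + mul_latency
        cycle += 1
    return stall_cycles
```

With a three-cycle multiplier, two back-to-back multiplies cost two stall cycles in this model, whereas separating them by two unrelated instructions hides the latency entirely.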
Data hazards, or data dependencies, are another main cause of pipeline stalls. Data hazards occur when an instruction depends on the results of an instruction ahead of it in the pipeline, and therefore cannot be executed until the first instruction executes. One class of data hazards occurs when instructions access input/output (I/O) devices.
I/O devices typically include status and control registers that are read and written by the microprocessor. Some microprocessors, such as x86 processors, have dedicated instructions for accessing the registers of I/O devices, such as the x86 “in” and “out” instructions. These instructions address a separate address space of the processor bus, namely the I/O space. The other way I/O devices are accessed is by mapping them into the memory address space of the processor. Such an I/O device is referred to as a memory-mapped I/O device and the region in which the I/O device is mapped is referred to as a memory-mapped I/O region. Typically, memory mapped I/O regions are specified via registers within the microprocessor.
An example of an I/O related data hazard occurs when a first instruction writes a value to an I/O register and the next instruction reads from an I/O register on the same device, such as a store to a memory-mapped I/O region followed by a load from the same memory-mapped I/O region. Due to the nature of I/O devices, in order to ensure proper operation of the I/O device, the two instructions must be guaranteed to execute in order. That is, the read cannot be executed until the write has completed.
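The ordering rule for memory-mapped I/O can be sketched as a check performed before a load is allowed to execute. The region list, addresses, and function names below are hypothetical; the sketch covers only the I/O ordering case described above and ignores ordinary memory dependencies.

```python
def region_of(address, mmio_regions):
    """Return the index of the memory-mapped I/O region holding address, or None."""
    for index, (base, limit) in enumerate(mmio_regions):
        if base <= address < limit:
            return index
    return None


def load_must_wait(load_address, pending_store_addresses, mmio_regions):
    """True if a load targets an I/O region that an earlier, uncompleted store also targets."""
    region = region_of(load_address, mmio_regions)
    if region is None:
        return False  # not an I/O access, so this rule does not apply
    return any(region_of(address, mmio_regions) == region
               for address in pending_store_addresses)
```

For example, with a single assumed region covering 0xF000 up to 0xF100, a load from 0xF004 must wait while a store to 0xF000 is still pending, but a load from 0x1000 need not.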
Cache misses are a third common cause of pipeline stalls. Program execution speed often is affected as much by memory access time as by instruction execution time. This is readily observable from the fact that a typical system memory access might take 40 processor clock cycles, whereas a typical average execution time per instruction in a well-designed pipelined processor is between 1 and 2 processor clock cycles.
Load and store instructions are used to access memory. Load instructions read data from system memory and store instructions write data to system memory. When a memory access instruction reaches a stage in a processor pipeline where the memory access is performed, the pipeline must stall waiting for the memory access to complete. That is, during the typical 40 clock cycles of the memory access, the memory access instruction remains in its current stage until the specified data is written or read. When a stall occurs, all of the other instructions in the pipeline behind the stalled instruction also wait for the stalled memory access instruction to resolve and move on down the pipeline.
Processor designers attempt to alleviate the memory access time problem by employing cache memories within the processor. Data caches, which commonly require only one or two clock cycles per memory access, significantly reduce the negative effects of stalls caused by load and store instructions introduced by the large system memory access times. However, when a cache miss occurs, a pipeline stall must ensue.
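The benefit of a data cache can be estimated with the standard average access time formula, hit time plus miss rate times miss penalty. The sketch below uses the cycle counts mentioned above; the miss rate is an assumed input, since the text does not give one.

```python
def average_access_cycles(hit_cycles, miss_rate, memory_cycles):
    """Average cycles per memory access: hit time plus the expected miss penalty."""
    return hit_cycles + miss_rate * memory_cycles
```

With a one-cycle cache, a 40-cycle system memory access, and an assumed 5% miss rate, the average comes to roughly 3 cycles per access, compared with 40 cycles when every access goes to system memory.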
Some microprocessor designers have attempted to improve on the pipelined approach by “widening” the processor, i.e., by adding more pipelines within the processor in order to execute multiple instructions in parallel and to execute those instructions out of program order where advantageous and possible. These processors are commonly referred to as “superscalar” or “multiple-issue” processors since they issue multiple instructions at a time into multiple pipelines for parallel execution. Another term associated with the techniques employed by multiple-pipeline processors is instruction level parallelism (ILP).
Typically, processor architectures require the processor to retire instructions in-order. That is, any program-visible processor state changes must be made in the order of the program instruction sequence. However, multiple-issue processors commonly execute instructions out of order by employing reorder buffers. The processor fetches a stream of instructions of a program from memory and places the instructions into the top of the reorder buffer. The processor searches the reorder buffer looking for dependencies between the various instructions, such as data hazards or resource conflicts discussed above.
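The dependency search can be sketched in a highly simplified form. The sketch assumes each buffered instruction is a hypothetical (destination register, source registers) pair in program order, treats every earlier instruction as not yet executed, and considers only the case where an instruction reads a register written by an earlier one; real reorder buffers track considerably more state.

```python
def independent_instructions(buffer):
    """Return indices of instructions that read no register written earlier in the buffer."""
    ready = []
    written = set()  # destination registers of earlier buffered instructions
    for index, (dest, sources) in enumerate(buffer):
        if not written.intersection(sources):
            ready.append(index)
        written.add(dest)
    return ready
```

For a buffer holding `("r1", ("r2", "r3"))`, `("r4", ("r1",))`, and `("r5", ("r6",))`, the first and third instructions have no such dependency, while the second must wait for the first to produce `r1`.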
Instructions that do not have dependencies may be reordered within the reorder buffer for out of order execution. The instructions are then removed from the b
