Directoryless L0 cache for stall reduction

Electrical computers and digital processing systems: memory – Storage accessing and control – Hierarchical memories

Details

C711S122000, C711S128000, C711S168000, C711S169000

Reexamination Certificate

active

06823430

ABSTRACT:

BACKGROUND OF THE INVENTION
1. Field of the Invention
The present invention relates to computer systems and, more specifically, to a cache system in a central processing unit of a computer.
2. Description of the Prior Art
Many modern computing systems use a processor having a pipelined architecture to increase instruction throughput. In theory, pipelined processors can execute one instruction per machine cycle when a well-ordered, sequential instruction stream is being executed, even though each instruction may itself require a number of separate microinstructions. Pipelined processors operate by breaking the execution of an instruction into several stages, each of which requires one machine cycle to complete. Latency is reduced in pipelined processors by initiating the processing of a second instruction before the actual execution of the first instruction is completed; in fact, multiple instructions can be in various stages of processing at any given time. Thus, the overall instruction execution latency of the system (which, in general, can be thought of as the delay between the time a sequence of instructions is initiated and the time it finishes executing) can be significantly reduced.
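To make the throughput arithmetic concrete, the following minimal C sketch compares the cycle counts of pipelined and non-pipelined execution. The five-stage depth and 100-instruction stream are illustrative assumptions, not parameters from the patent.

```c
#include <stdio.h>

/* With S one-cycle pipeline stages, N overlapped instructions finish in
 * N + S - 1 cycles (one completes per cycle once the pipe is full),
 * versus N * S cycles when each instruction occupies the whole datapath. */
int main(void) {
    const int S = 5;    /* assumed stage count (fetch ... writeback) */
    const int N = 100;  /* instructions in a well-ordered stream     */

    int pipelined     = N + S - 1;
    int non_pipelined = N * S;

    printf("pipelined:     %d cycles\n", pipelined);      /* 104 */
    printf("non-pipelined: %d cycles\n", non_pipelined);  /* 500 */
    return 0;
}
```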
In some modern computer systems, integer and commercial instruction streams contain many loads whose targets are used immediately by the next instruction. As microprocessor frequencies have risen, pipeline depth has increased to the point that a level one data cache (L1 Dcache) load access can take many cycles, during which any following dependent instructions must stall. An additional small data cache, called a level zero (L0) cache, has been proposed to mitigate the longer L1 Dcache access; the L0 is typically a small (1-8 KB) cache with a one-cycle total load access time. However, in high-frequency pipelined designs, L0 caches have been fraught with problems, including: high miss rates (30-50%) due to their small size and direct-mapped (one-way associative) nature, the significant additional complexity of another full data cache level, high power usage due to their constant utilization, and long line fill times that create trailing-edge stalls on line references. These factors, combined with extremely high-frequency deep pipelines, have led to the general abandonment of L0 caches.
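The high miss rate of such a cache follows from its one-line-per-index mapping. The sketch below models a hypothetical 4 KB, 32-byte-line direct-mapped L0 lookup; the sizes and names are illustrative assumptions, not taken from the patent.

```c
#include <stdint.h>
#include <stdbool.h>

/* Hypothetical direct-mapped (one-way associative) L0: 4 KB, 32-byte lines. */
#define LINE_BYTES   32u
#define CACHE_BYTES  4096u
#define NUM_LINES    (CACHE_BYTES / LINE_BYTES)   /* 128 lines */

typedef struct {
    bool     valid;
    uint32_t tag;
    uint8_t  data[LINE_BYTES];
} L0Line;

static L0Line l0[NUM_LINES];

/* Each address maps to exactly one line. Any two addresses that differ by
 * a multiple of CACHE_BYTES collide and evict each other, which is the
 * conflict-miss behavior behind the 30-50% miss rates cited above. */
bool l0_hit(uint32_t addr) {
    uint32_t index = (addr / LINE_BYTES) % NUM_LINES;
    uint32_t tag   = addr / (LINE_BYTES * NUM_LINES);
    return l0[index].valid && l0[index].tag == tag;
}
```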
Therefore, there is a need for a small cache with a short load access time, a low miss rate, low power usage, and a short line fill time.
SUMMARY OF THE INVENTION
The disadvantages of the prior art are overcome by the present invention which, in one aspect, is a memory system for a computational circuit having a pipeline including at least one functional unit. An address generator generates a memory address. A coherent cache memory is responsive to the address generator and is addressed by the memory address. A cache directory is associated with the cache memory. The cache memory is capable of generating a cache memory output. A non-coherent directory-less associative memory is responsive to the address generator and is addressable by the memory address. The associative memory receives input data from the cache memory. The associative memory is capable of generating an associative memory output that is delivered to the functional unit. A comparison circuit compares the associative memory output to the cache memory output and asserts a miscompare signal when the associative memory output is not equal to the cache memory output.
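One way to picture this apparatus is as a set of signals plus a compare function. The following structural sketch uses assumed names and widths; the patent describes hardware and does not prescribe any implementation.

```c
#include <stdint.h>
#include <stdbool.h>

/* Signals of the memory system described above (names are assumptions). */
typedef struct {
    uint32_t address;      /* produced by the address generator              */
    uint64_t l1_output;    /* coherent, directory-backed cache memory output */
    uint64_t l0_output;    /* directoryless associative-memory output, which
                              is the value delivered to the functional unit  */
    bool     miscompare;   /* comparison circuit result                      */
} MemorySystemSignals;

/* The comparison circuit: assert miscompare when the associative memory's
 * output is not equal to the coherent cache memory's output. */
static inline bool compare_outputs(MemorySystemSignals *s) {
    s->miscompare = (s->l0_output != s->l1_output);
    return s->miscompare;
}
```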
In another aspect, the invention is a method of providing data to a functional unit of a pipeline. A coherent cache memory is addressed with a memory address, thereby generating a cache memory output. A non-coherent directory-less associative memory is addressed with the memory address, thereby generating an associative memory output. The associative memory output is delivered to the functional unit. The cache memory output is compared to the associative memory output. When the cache memory output is not identical to the associative memory output, the functional unit is disabled.
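A behavioral sketch of this method follows: the non-coherent L0 value is forwarded to the functional unit speculatively, and the slower coherent access either confirms it or disables the unit. All helper names, the stale-data setup, and the replay recovery step are assumptions for illustration, not details from the patent.

```c
#include <stdio.h>
#include <stdint.h>

static uint64_t l0_data = 41;  /* stale copy in the non-coherent L0 */
static uint64_t l1_data = 42;  /* current copy in the coherent L1   */

static uint64_t l0_read(uint32_t addr) { (void)addr; return l0_data; }
static uint64_t l1_read(uint32_t addr) { (void)addr; return l1_data; }

static void forward_to_unit(uint64_t v) {
    printf("speculative operand delivered: %llu\n", (unsigned long long)v);
}
static void disable_unit_and_replay(uint64_t good) {
    printf("miscompare: unit disabled, replay with %llu\n",
           (unsigned long long)good);
}

int main(void) {
    uint32_t addr = 0x1000;
    uint64_t fast = l0_read(addr);  /* one-cycle associative read         */
    forward_to_unit(fast);          /* dependent instructions start now   */

    uint64_t slow = l1_read(addr);  /* multi-cycle, directory-checked read */
    if (slow != fast)               /* outputs not identical?              */
        disable_unit_and_replay(slow);
    return 0;
}
```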
These and other aspects of the invention will become apparent from the following description of the preferred embodiments taken in conjunction with the following drawings. As would be obvious to one skilled in the art, many variations and modifications of the invention may be effected without departing from the spirit and scope of the novel concepts of the disclosure.


REFERENCES:
patent: 4774654 (1988-09-01), Pomerene et al.
patent: 5649154 (1997-07-01), Kumar et al.
patent: 5826052 (1998-10-01), Stiles et al.
patent: 6078992 (2000-06-01), Hum
patent: 6081872 (2000-06-01), Matick et al.
patent: 6138208 (2000-10-01), Dhong et al.
patent: 6282614 (2001-08-01), Musoll
patent: 6321297 (2001-11-01), Shamanna et al.
patent: 6397296 (2002-05-01), Werner
patent: 6496903 (2002-12-01), Terunuma et al.
patent: 2002/0046325 (2002-04-01), Cai et al.
