Electrical computers and digital processing systems: memory – Addressing combined with specific memory configuration or... – Addressing cache memories
Reexamination Certificate
1997-12-22
2001-06-12
Lane, Jack A. (Department: 2185)
C711S118000
active
06247094
ABSTRACT:
BACKGROUND OF THE INVENTION
1. Field of the Invention
This invention relates generally to the field of data processing systems, and, more particularly, to cache memory used in data processing systems. Specifically, the present invention relates to a cache memory architecture with way prediction.
2. Description of the Related Art
The demand for quicker and more powerful personal computers has led to many technological advances in the computer industry, including the development of faster memories. Historically, the performance of a personal computer has been directly linked to the efficiency with which data can be accessed from memory, often referred to as the memory access time. Generally, the performance of a central processing unit (CPU or microprocessor), which functions at a high speed, has been hindered by slow memory access times. Therefore, to expedite access to main memory data, cache memories have been developed for storing frequently used information.
A cache is a relatively small high-speed memory that is used to hold the contents of the most recently utilized blocks of main storage. A cache bridges the gap between fast processor cycle time and slow memory access time. Using this very fast memory, the microprocessor can reduce the number of wait states that are interposed during memory accesses. When the processor issues a load instruction to the cache, the cache checks its contents to determine if the data is present. If the data is already present in the cache (termed a “hit”), the data is forwarded to the CPU with practically no wait. If, however, the data is not present (termed a “miss”), the cache must retrieve the data from a slower, secondary memory source, which may be the main memory or, in a multi-level cache memory system, another cache. In addition, the retrieved information is also copied (i.e. stored) into the cache memory so that it is readily available to the microprocessor for future use.
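The hit and miss behavior described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the class name `SimpleCache`, the dictionary standing in for main memory, the capacity of four entries, and the first-in-first-out eviction are all hypothetical and are not taken from the patent.

```python
class SimpleCache:
    """Toy cache: a small fast store in front of a slower backing store."""

    def __init__(self, backing, capacity=4):
        self.backing = backing    # slower secondary memory (address -> data)
        self.capacity = capacity  # hypothetical number of entries
        self.lines = {}           # cached entries, kept in insertion order
        self.hits = 0
        self.misses = 0

    def load(self, address):
        # Hit: the data is already present and is returned immediately.
        if address in self.lines:
            self.hits += 1
            return self.lines[address]
        # Miss: fetch from the slower source, then keep a copy for future use.
        self.misses += 1
        data = self.backing[address]
        if len(self.lines) >= self.capacity:
            # Assumed policy: evict the oldest inserted entry (FIFO).
            oldest = next(iter(self.lines))
            del self.lines[oldest]
        self.lines[address] = data
        return data


main_memory = {addr: addr * 10 for addr in range(16)}
cache = SimpleCache(main_memory, capacity=4)
values = [cache.load(a) for a in (1, 2, 1, 3, 1)]
```

In this run the repeated loads of address 1 are served from the cache, while the first touch of each address goes to the backing store.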
Most cache memories have a similar physical structure. Caches generally have two major subsystems, a tag subsystem (also referred to as a cache tag array) and a memory subsystem (also known as a cache data array). A tag subsystem holds the addresses and determines whether there is a match for a requested datum, and a memory subsystem stores and delivers the data upon request. Thus, typically, each tag entry is associated with a data array entry, where each tag entry stores index information relating to each data array entry. Some data processing systems have several cache memories (i.e. a multi-level cache system), in which case each data array will have a corresponding tag array to store addresses.
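The pairing of a tag array with a data array can be illustrated with a small sketch. It assumes a direct-mapped organization with eight entries and one datum per entry; those choices, and the names `NUM_SETS`, `split`, `fill`, and `lookup`, are hypothetical and serve only to show how a tag comparison decides between a hit and a miss.

```python
NUM_SETS = 8  # hypothetical number of entries in each array

tag_array = [None] * NUM_SETS   # holds the address tag for each entry
data_array = [None] * NUM_SETS  # holds the data for the matching entry


def split(address):
    """Split an address into (tag, index) for a direct-mapped lookup."""
    return address // NUM_SETS, address % NUM_SETS


def fill(address, data):
    """Store data and record its tag in the corresponding tag entry."""
    tag, index = split(address)
    tag_array[index] = tag
    data_array[index] = data


def lookup(address):
    """Return (hit, data); the tag array decides whether there is a match."""
    tag, index = split(address)
    if tag_array[index] == tag:
        return True, data_array[index]
    return False, None


fill(0x2A, "payload")        # address 42 -> tag 5, index 2
hit_result = lookup(0x2A)    # same tag at index 2: hit
miss_result = lookup(0x32)   # address 50 -> tag 6, index 2: tag mismatch
```

Both addresses map to the same entry, so only the stored tag distinguishes the resident block from the one that is absent.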
Utilizing a multi-level cache memory system can generally improve the performance of a central processing unit. In a multi-level cache infrastructure, a series of caches can be linked together, where each cache is accessed serially by the microprocessor. For example, in a three-level cache system, the microprocessor will first access the L0 cache for data, and in case of a miss, it will access cache L1. If L1 does not contain the data, it will access the L2 cache before accessing the main memory. Since caches are typically smaller and faster than the main memory, the general trend is to design modern day computers using a multi-level cache system.
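The serial probing of L0, L1, and L2 before main memory can be sketched as follows. The function name `load`, the use of dictionaries as caches, and the policy of copying the data into every level that missed are assumptions made for the illustration; the text above specifies only the order in which the levels are accessed.

```python
def load(address, levels, main_memory):
    """Probe each cache level in order and fall back to main memory."""
    for depth, (name, cache) in enumerate(levels):
        if address in cache:
            data, source = cache[address], name
            break
    else:
        # Every level missed: the request goes to main memory.
        data, source = main_memory[address], "main memory"
        depth = len(levels)
    # Assumed fill policy: copy the data into each level that missed.
    for _, cache in levels[:depth]:
        cache[address] = data
    return data, source


levels = [("L0", {}), ("L1", {}), ("L2", {7: "x"})]
main_memory = {7: "x", 9: "y"}

first = load(7, levels, main_memory)   # misses L0 and L1, hits L2
second = load(7, levels, main_memory)  # now hits L0
third = load(9, levels, main_memory)   # misses every cache level
```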
To further improve the performance of a central processing unit, computer architects developed the concept of pipelines for parallel processing. The first step in achieving parallel processing is to decompose the process at hand into stages. Typically, a computer executes all the stages of the process serially. This means that the execution of all the stages of the process must be complete before the next process is begun. A computer often executes the same staged process many times in succession. Rather than simply executing each staged process serially, the microprocessor can speed up the processing through pipelining, in which the stages of the repeating process are overlapped.
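The benefit of overlapping stages can be made concrete with a simple cycle count. The sketch below assumes that every stage takes exactly one cycle and that the pipeline never stalls; the function names are hypothetical.

```python
def serial_cycles(num_tasks, num_stages):
    """Each task finishes all of its stages before the next task begins."""
    return num_tasks * num_stages


def pipelined_cycles(num_tasks, num_stages):
    """A new task enters the first stage every cycle (ideal, no stalls)."""
    if num_tasks == 0:
        return 0
    # The first task needs num_stages cycles; each later task adds one cycle.
    return num_stages + (num_tasks - 1)


serial_total = serial_cycles(4, 3)        # 4 tasks x 3 stages = 12 cycles
pipelined_total = pipelined_cycles(4, 3)  # 3 + (4 - 1) = 6 cycles
```

Under these assumptions the saving grows with the number of repetitions, because only the first task pays the full stage count.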
The concept of pipelining has now extended to memory caches as well. Pipelines can enhance the throughput of a cache memory system, where the throughput is defined as the number of cache memory access operations that can be performed in any one time period. Because caches are typically accessed serially, and can be decomposed into stages, it is possible to use pipelines to speed up the accessing process. In fact, modern data processing systems achieve even greater efficiency by applying the art of pipelining to multi-level cache memory systems.
An example of a two-level pipelined cache system is illustrated in FIG. 1, which stylistically depicts the L1 and L2 cache stages 5-30 of the Intel Pentium® Pro System Architecture. It takes three stages 5, 10, and 15 to complete an access of the L1 cache (not shown), and three additional stages 20, 25, and 30 to complete an access of the L2 cache (not shown). Each stage takes one cycle to complete. In the first stage 5, when a request for a load or store is issued, the address is provided to the L1 cache (not shown). During the second and the third stages 10, 15, the lookup takes place and, in case of a hit, the data transfer occurs. If the access is a miss in the L1 cache (not shown), then the request enters the fourth stage 20, where the address is submitted to the L2 cache (not shown). During the fifth stage 25, the lookup takes place and, if a hit, the data is transferred during the sixth stage 30. In summary, a load request that hits the L1 cache (not shown) completes in three clocks, while one that misses the L1 cache (not shown) but hits the L2 cache (not shown) completes in six clocks. If the load request misses the L2 cache (not shown), then the request is forwarded to the main memory (not shown).
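The stage sequence of FIG. 1 can be traced with a short sketch. The stage labels 5 through 30 and the one-cycle-per-stage rule come from the description above; the function name `stages_traversed` and the boolean hit flags are hypothetical. No clock count is given for a main memory access, so the sketch reports only the cycles spent in the cache stages.

```python
L1_STAGES = (5, 10, 15)   # address, lookup, data transfer for the L1 cache
L2_STAGES = (20, 25, 30)  # address, lookup, data transfer for the L2 cache


def stages_traversed(l1_hit, l2_hit):
    """Return (stages passed through, where the request is resolved)."""
    if l1_hit:
        return L1_STAGES, "L1"
    if l2_hit:
        return L1_STAGES + L2_STAGES, "L2"
    # Both caches missed: the request is forwarded to main memory.
    return L1_STAGES + L2_STAGES, "main memory"


l1_path, l1_where = stages_traversed(l1_hit=True, l2_hit=False)
l2_path, l2_where = stages_traversed(l1_hit=False, l2_hit=True)

# Each stage takes one cycle, so the clock count is the number of stages.
l1_clocks = len(l1_path)
l2_clocks = len(l2_path)
```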
FIG. 2 is a timing diagram illustrating an example of the Intel Pentium® Pro Architecture's two-stage pipelined cache being accessed by the microprocessor (not shown). As illustrated in the figure, the microprocessor (not shown) makes four different cache accesses (i.e. requests) 32-35. The first access 32 results in an L1 cache hit and, as a result, the request is completed within three stages. The second access 33, however, misses in the L1 cache (not shown), and the request is then forwarded to the L2 cache (not shown). Thus, it takes six stages to retrieve data from the L2 cache (not shown). Because the L1 and L2 caches (not shown) are pipelined, the first and the second accesses 32 and 33 complete in a total of seven clock cycles. However, in a non-pipelined cache system (not shown), this process would require nine clock cycles, because the L1 access would have to complete before the L2 access initiates. That is, the earliest the second access can initiate is during the fourth clock cycle, and not during the second clock cycle, as it does in a pipelined cache system. The third and fourth accesses 34 and 35 are shown only to further illustrate how pipelined caches can improve the throughput of cache memories by processing multiple requests simultaneously.
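The seven-cycle and nine-cycle totals for accesses 32 and 33 can be checked with a small calculation. The sketch assumes that the pipelined cache accepts one new access per clock starting at cycle 1, and that the non-pipelined cache starts an access only after the previous one has finished; the function names are hypothetical.

```python
def pipelined_finish(latencies):
    """Last completion cycle when a new access issues every cycle from cycle 1."""
    return max(
        (start + latency - 1 for start, latency in enumerate(latencies, start=1)),
        default=0,
    )


def non_pipelined_finish(latencies):
    """Last completion cycle when each access waits for the previous one."""
    return sum(latencies)


# Access 32 hits the L1 cache (3 cycles); access 33 misses L1 and hits L2 (6 cycles).
latencies = [3, 6]
pipelined_total = pipelined_finish(latencies)          # access 33 ends in cycle 2 + 6 - 1
non_pipelined_total = non_pipelined_finish(latencies)  # access 33 starts in cycle 4
```

The two totals differ by exactly the two cycles that the second access would otherwise spend waiting for the first to leave the cache.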
As the number of levels in a multi-level pipelined cache memory system has increased, so has the number of pipeline stages required to support the added levels. Generally, the number of pipeline stages required to support a cache memory is proportional to the number of clock cycles required to access that memory. For a given frequency, a pipeline with more stages requires more circuitry, which not only adds to the expense of implementing pipelines, but also hinders performance and consumes additional power. It is therefore desirable to have a cache memory architecture that reduces the required number of pipeline stages, yet achieves equal or better performance.
In a multi-level cache system, it is not uncommon to find level-one, or even level-two caches on the same silicon die as the microprocessor core. To enhance the system performance, it is often desirable to fit the maximum possible cache memories on the CPU core itself. When the cache is on the
Baweja Gunjeet D.
Chan Tim W.
Chang Cheng-Feng
Kumar Harsh
Blakely, Sokoloff, Taylor & Zafman LLP
Intel Corporation
Lane Jack A.