Memory system for permitting simultaneous processor access...

Electrical computers and digital processing systems: memory – Storage accessing and control – Hierarchical memories

Reexamination Certificate

Details

US classifications: C711S133000, C711S143000, C711S145000, C711S150000, C711S155000, C711S156000, C711S168000, C711S210000

Type: Reexamination Certificate

Status: active

Patent number: 06339813

ABSTRACT:

BACKGROUND OF THE INVENTION
1. Field of the Invention
This invention relates generally to the field of cache memory in computer systems, and more specifically to an improved method and apparatus for managing the access of cache lines during cache line replacement.
2. Discussion of the Prior Art
Computer systems generally consist of one or more processors that execute program instructions stored within a memory medium. This medium is most often built from the lowest-cost-per-bit, yet slowest, storage technology. To increase processor performance, a higher speed, yet smaller and more costly memory, known as a cache memory, is placed between the processor and final storage to provide temporary storage of recently and/or frequently referenced information. As the difference between processor speed and the access time of the final storage increases, more levels of cache memory are provided, each level backing the previous level to form a storage hierarchy. Each level of the cache is managed to maintain the information most useful to the processor. Often more than one cache memory will be employed at the same hierarchy level, for example when an independent cache is employed for each processor.
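As a rough illustration of such a hierarchy (not taken from the patent; the level names and the present_at() callback are hypothetical), a lookup simply falls through successively larger and slower levels until the data is found:
```c
#include <stdbool.h>

/* Hypothetical levels of a four-level hierarchy; the final storage
 * (main memory here) always backs every level above it. */
enum level { L1_CACHE, L2_CACHE, L3_SHARED_CACHE, MAIN_MEMORY };

/* Return the fastest level holding the requested address; each miss
 * falls through to the next, larger and slower level that backs it. */
static enum level locate(unsigned long addr,
                         bool (*present_at)(enum level, unsigned long))
{
    for (enum level lv = L1_CACHE; lv < MAIN_MEMORY; lv++)
        if (present_at(lv, addr))
            return lv;
    return MAIN_MEMORY;
}
```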
Typically only large “mainframe” computers employ memory hierarchies greater than three levels. However, systems are now being created using commodity microprocessors that benefit greatly from a third level of cache in the memory hierarchy. This level is best placed between the processor bus and the main memory, and because it is shared by all processors, and in some cases by the I/O system as well, it is called a shared cache. Each level of memory requires several times more storage than the level it backs to be performance effective; thus, for example, the shared cache may require several tens of megabytes of memory. To remain cost effective, the shared cache is implemented using low cost Dynamic Random Access Memory (DRAM), yet at the highest performance available. Transfers between this type of shared cache and the main memory therefore involve lengthy transfer periods, at least ten times longer than those typical of other caches.
Cache memory systems in computing devices have evolved into quite varied and sophisticated structures, but always they address the tradeoff between speed and both cost and complexity, while functioning to make the most useful information available to a processor as efficiently as possible. Since a cache is smaller than the next level of memory in the hierarchy, it must be continuously updated to contain only information deemed useful to the processors.
FIG. 1 illustrates a block diagram of a conventional computer system 100 implementing a shared cache level memory. The system 100 is shown as including one or more processors 101 with level 1 102 and level 2 103 local caches forming a processor node 104, each connected to a common shared memory controller 105 that provides access to a shared level 3 cache 106 and associated directory 116, and system main memory 107 representing the last level of a four-level memory hierarchy. The cache control 108 is connected to the processor address bus 109 and to the data bus 110. The processor data bus is optimized for, and primarily used for, transporting level 2 cache data lines between a level 2 cache and the level 3 111 and/or another level 2 cache 112. The main memory data bus 114 is optimized for, and primarily used for, transporting level 3 cache data lines between the level 3 cache and the main memory 113. The level 3 cache data bus 115 is used for transporting both level 3 and level 2 data traffic, but is optimized for the level 2 cache data traffic. The level 3 cache 106 is both large and shared, and is typically constructed of the highest performance dynamic random access memory (DRAM) to provide enough storage to contain several times the collective storage of the local caches. The amount of main memory storage is typically over a thousand times that of the shared cache, and is implemented using inexpensive and often lower performance DRAM with processor access latencies much longer than those of the shared cache.
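A minimal structural sketch of that topology, with struct and field names that are purely illustrative (the patent does not define any such data structures), might look like this:
```c
/* Illustrative only: a software model of the FIG. 1 components.
 * The struct and field names are assumptions, not patent terms. */
struct cache;            /* local level 1 / level 2 caches 102, 103 */
struct dram_cache;       /* shared level 3 cache 106 (DRAM based)   */
struct directory;        /* level 3 cache directory 116             */
struct dram;             /* system main memory 107                  */

struct processor_node {             /* processor node 104           */
    struct cache *l1;               /* level 1 cache 102            */
    struct cache *l2;               /* level 2 cache 103            */
};

struct shared_memory_controller {        /* shared memory controller 105 */
    struct directory  *l3_directory;     /* directory 116, used by cache control 108 */
    struct dram_cache *l3;               /* shared level 3 cache 106     */
    struct dram       *main_memory;      /* main memory 107              */
};
```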
The processors 101 request read or write access to information stored in the nearest caches 102, 103 through a local independent address and data bus (not shown) within the processor node 104. If the information is not available in those caches, then the access request is attempted on the processor's independent address and data busses 109, 110. The shared memory controller 105 and other processor nodes 104′ detect and receive the request address along with other state information from the bus, and present the address to their respective cache directories. If the requested data is found within one of the neighboring processor nodes 104′, then that node may notify the devices on the bus of the condition and forward the information to the requesting processor directly without involving the shared cache any further. Without such notification, the L3 cache controller 108 within the shared memory controller 105 will simultaneously address the shared cache directory 116 and present the DRAM row address cycle on the cache address bus 117 according to the DRAM protocol. In the next cycle, the directory contents are compared to the request address tag, and if they are equal and the cache line is valid (a cache hit), then the DRAM column address cycle is driven on the cache address bus 117 in the following cycle to read or write the cache line information. The shared memory controller 105 acknowledges processor read requests with the requested data in the case of a cache hit; otherwise the request is acknowledged to indicate retry or defer to the processor, implying that a cache miss occurred and the information will not be available for several cycles.
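That sequence can be sketched in C as follows; this is only an illustration of the overlap between the directory lookup and the DRAM row-address cycle, and the type and helper names (directory_read, dram_row_cycle, dram_column_cycle) are assumptions rather than anything defined by the patent:
```c
#include <stdbool.h>
#include <stdint.h>

struct dir_entry { uint32_t tag; bool valid; };   /* directory 116 entry */

enum ack { ACK_DATA, ACK_RETRY_OR_DEFER };

/* Trivial stand-ins for the directory and the DRAM cache port,
 * present only so the sketch is self-contained. */
static struct dir_entry directory_read(uint32_t index)
{ (void)index; struct dir_entry e = { 0, false }; return e; }
static void dram_row_cycle(uint32_t row)            { (void)row; }
static void dram_column_cycle(uint32_t col, bool w) { (void)col; (void)w; }

static enum ack l3_access(uint32_t index, uint32_t req_tag,
                          uint32_t row, uint32_t column, bool write)
{
    /* Cycle 1: address the directory and drive the DRAM row-address
     * cycle on the cache address bus 117 at the same time. */
    struct dir_entry entry = directory_read(index);
    dram_row_cycle(row);

    /* Cycle 2: compare the stored tag against the request tag. */
    if (entry.valid && entry.tag == req_tag) {
        /* Cache hit: drive the column-address cycle the following
         * cycle to read or write the cache line. */
        dram_column_cycle(column, write);
        return ACK_DATA;                /* data returned with the ack */
    }

    /* Cache miss: acknowledge retry/defer; the information will not
     * be available for several cycles. */
    return ACK_RETRY_OR_DEFER;
}
```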
Referring to FIG. 2, there is illustrated a 4-way set associative 32 MB shared cache system 200 employing 1024-byte cache lines. The temporary information stored within the cache is constantly replaced with information deemed more valuable to the processor as its demands change. Therefore the cache array 201 is partitioned into an even number of storage units called lines 202. Each line is address mapped 203 to a group of equivalent sized ranges 208 within the main memory. A high speed directory 204 contains an entry 205, which is directly mapped to an index address 203, for each cache line, and includes: a tag address 206 to keep track of which main memory range is associated with the cache line contents, in addition to independent bit(s) 207 to store state information pertaining to the line contents. The directory entries and cache lines mapped at a given index address are grouped in an associative set of four (4) to permit the storage of combinations of different tag addresses associated with the same index address 203. All four directory entries within a set are referenced in parallel for every processor request to determine which one of the four cache lines contains data for the request tag address.
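With the geometry given above (a 32 MB array, 4-way set associativity, and 1024-byte lines) the cache holds 32768 lines organized as 8192 sets, so a request address decomposes into a 10-bit line offset, a 13-bit index 203, and a tag 206 formed from the remaining high-order bits. A minimal sketch of that decomposition and of the 4-way tag compare, assuming 32-bit request addresses, is:
```c
#include <stdbool.h>
#include <stdint.h>

#define LINE_SIZE   1024u                  /* 1024-byte cache lines 202 */
#define WAYS        4u                     /* 4-way set associative     */
#define CACHE_SIZE  (32u * 1024u * 1024u)  /* 32 MB cache array 201     */
#define NUM_SETS    (CACHE_SIZE / (LINE_SIZE * WAYS))   /* 8192 sets    */

#define OFFSET_BITS 10u                    /* log2(1024) */
#define INDEX_BITS  13u                    /* log2(8192) */

struct dir_entry {                         /* directory entry 205 */
    uint32_t tag;                          /* tag address 206 */
    uint8_t  state;                        /* state bit(s) 207, bit 0 = valid */
};

/* Split a request address into the index selecting a set and the tag
 * compared against the stored tag addresses 206. */
static inline uint32_t index_of(uint32_t addr)
{
    return (addr >> OFFSET_BITS) & ((1u << INDEX_BITS) - 1u);
}

static inline uint32_t tag_of(uint32_t addr)
{
    return addr >> (OFFSET_BITS + INDEX_BITS);
}

/* All four entries of a set are examined in parallel in hardware; in
 * software that is simply a comparison across the set. Returns the
 * matching way, or -1 on a miss. */
static int lookup(const struct dir_entry set[WAYS], uint32_t addr)
{
    for (unsigned way = 0; way < WAYS; way++)
        if ((set[way].state & 1u) && set[way].tag == tag_of(addr))
            return (int)way;
    return -1;
}
```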
When a processor requests information from an address within the main memory, the tag addresses stored within the mapped directory entries are compared by comparators 209 to the processor request address tag bits 208, and when they are equal and the state bit(s) 207 indicate the information is valid, the cache is said to have been hit. Upon determination of the hit condition, the cached information is returned to the processor. If there was no match for the tag address, or the cache line was invalid, then the information would be retrieved from the next lower memory level. When the information becomes available, it is passed on to the requesting processor, as well as stored in the cache 201 through a process called line fill. Often the cache line 202 is larger than the requested information size, resulting in more information flowing into the cache than is required to fulfill the request; this is called trailing line fill. Of course, if the cache was already full of valid infor…
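A hedged sketch of that hit and line-fill behavior follows; the read_from_next_level() helper and the fixed buffer sizes are assumptions for illustration, and selection of a victim line when the set is already full is omitted:
```c
#include <stdint.h>
#include <string.h>

#define LINE_SIZE 1024u                   /* cache line 202 size */

struct cache_line {
    uint32_t tag;                         /* tag address 206 */
    uint8_t  valid;                       /* state bit 207 */
    uint8_t  data[LINE_SIZE];
};

/* Trivial stand-in for fetching a whole line from the next lower
 * memory level; included only so the sketch is self-contained. */
static void read_from_next_level(uint32_t line_addr, uint8_t out[LINE_SIZE])
{ (void)line_addr; memset(out, 0, LINE_SIZE); }

/* Serve a small read: on a hit, copy from the cached line; on a miss,
 * fetch the whole 1024-byte line (the bytes beyond the request form
 * the trailing line fill), install it, then satisfy the request. */
static void serve_read(struct cache_line *line, uint32_t addr,
                       uint32_t req_tag, uint8_t *dst, uint32_t len)
{
    uint32_t offset = addr & (LINE_SIZE - 1u);

    if (!line->valid || line->tag != req_tag) {
        read_from_next_level(addr & ~(LINE_SIZE - 1u), line->data);
        line->tag   = req_tag;
        line->valid = 1u;                 /* line fill complete */
    }
    memcpy(dst, &line->data[offset], len);
}
```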
