Electrical computers and digital processing systems: memory – Storage accessing and control – Hierarchical memories
Reexamination Certificate
1999-11-19
2002-04-09
Yoo, Do Hyun (Department: 2187)
Electrical computers and digital processing systems: memory
Storage accessing and control
Hierarchical memories
C711S119000, C711S129000, C711S128000, C711S133000, C711S153000, C711S173000, C711S202000
Reexamination Certificate
active
06370622
ABSTRACT:
BACKGROUND OF THE INVENTION
A cache is a hardware-managed buffer designed to reduce memory access latency by copying data that is likely to be accessed in the near future into faster cache memory. In the presence of an associated cache, a device that needs to access memory, such as a processor, first looks into the cache for a copy of data from the desired memory location. If a copy is found, the device uses it, thus avoiding the longer latency of accessing memory itself. Caches can be used for both read and write memory accesses, also known as load and store operations respectively. Caches are used for both data and instructions, and a system may have multiple caches.
A cache is characterized by the following aspects of its operation:
(i) when data is copied into the cache,
(ii) how the copies are organized and stored when they are in the cache,
(iii) when a copy is removed from the cache, and the replacement policy, i.e., the rules for making room for new data being copied into the cache when the cache is already full,
(iv) virtual versus physical addressing.
A cache is also defined by a number of numerical parameters that are introduced as the behavior of a conventional cache is explained.
When a processor wants to access a memory location and looks for it in its associated cache, it may find one of three situations. One possibility is that the cache does not have a copy of the memory location, a situation known as a cache-miss. Another possibility is that the cache has a copy of the data and has the correct permission for the desired type of access. This is known as a cache-hit. In the third situation, which only arises in more complex cache designs, a copy may be present in the cache, but the cache does not have the permission to grant the desired operation, typically a store operation. In this case, the cache needs to take further actions before the desired operation can complete. This third situation is sometimes known as an upgrade.
A standard cache begins with no copies in its fast memory, and brings in data whenever a cache-miss occurs. Typically, a contiguous block of memory containing the accessed location but larger than the access size is brought into the cache. For instance, the access may be for 4 bytes of data, but a 64 byte block of memory content is copied into the cache on a cache miss. Such a block of memory is referred to as a cache-line, and once it is copied into the cache, this memory block is said to be cached.
The use of a cache produces performance gain when the same data is used repeatedly, or neighboring data brought in during a cache-miss is used soon after. The former takes advantage of “temporal locality” in the memory access pattern, while the latter takes advantage of “spatial locality”. They result in cache-hits and use of the faster cache copies instead of accessing the slower memory.
A cache holds a fixed number of cache-lines entries, each containing the cached data, enough information to identify the memory address that this data comes from and some cache management state information. A standard cache is typically organized in one of three ways: (i) fully-associative, (ii) direct-mapped, or (iii) set-associative. These organizations differ in the group of the cache-line entries that can be used to store a cache-line with a particular address.
In a fully-associative cache, any cache-line entry can be used to store a copy of any memory address block. Under this strategy, checking whether a particular memory location has been copied into the cache requires comparing the address of interest against the address of every cache-line entry.
In a direct-mapped cache, a memory block with a particular address can be stored in only one particular cache-line entry. This simplifies lookup since only one location needs to be checked. Note that multiple memory blocks at distinct addresses can map to the same entry because a smaller cache has to serve a larger memory. Typically, the cache entry that is used is determined by the lower order bits of the memory block's address. In this way, contiguous cache-line granularity memory locations map to different cache entries.
The set-associative organization is intermediate between direct-mapped and fully-associative, and is actually a family of designs parameterized by the number of ways. The easiest way to think of set-associative organization is to consider an N-way set associative cache as having N direct-mapped sub-caches. Each of the N direct-mapped portion is referred to as a “way”. Checking for a particular memory block requires checking the N possible cache-line entries that may hold a copy of it. These N entries are said to be in the same set, giving rise to the notion of the number of sets in a set-associative cache.
The fully-associative and direct-mapped caches are degenerate cases of the set-associative organization where the number of sets is one or the number of ways is one respectively.
A number of events can cause a conventional cache to remove a valid copy, or alter the permissions associated with the copy. Because a cache is smaller than the actual memory that it is speeding up, it is possible that when a cache-line is brought in there is no free space in the cache to accommodate it at that time. This can happen in any of the three cache organizations. For the fully-associative organization, this happens when the cache is completely full. In the direct-mapped cache, this happens when the only entry that can accommodate this new cache-line is already in use. In an N-way set associative organization, this occurs if there are already N entries in the set to which the new cache-line maps.
When new data is copied into a cache but all the possible locations for storing it are in use, one of the existing copies has to be evicted to make room for the new data. Under the direct-mapped case, there is exactly one possible entry for the new data, so the current occupant of that entry has to be evicted. In the fully-associative and set-associative cache organizations, any one of multiple entries can be evicted. In these cases, the method of selecting a particular entry for eviction is called the replacement policy. The most common policies are either a random algorithm, or a least-recently-used (LRU) algorithm.
A random replacement algorithm picks one of the possible eviction candidates at random. An ideal least-recently-used algorithm looks at when each eviction candidate was last accessed, and evicts the one that has not been accessed for the longest time. Studies have shown that for many programs, when a location was accessed recently, the same location and its neighbors, such those in the same cache-line block, are likely to be accessed again. There are of course exceptions to this access pattern, so LRU is not an optimal policy for all situations.
Some processors and their associated caches allow software to explicitly evict or alter the permissions associated with data that has been copied into the cache. This is a second way in which cache copies are removed.
A conventional cache used in a multi-processor system may also remove a valid copy, or alter the permissions associated with the copy in respond to memory accesses of other processors. In many computer systems, two or more processors each having a dedicated cache may share a common memory across a memory bus. More generally, the processor may contain additional levels of caches. For simplicity, the term “device” is used to refer to a processor that may include additional level of caches that uses a cache.
FIG. 1
shows a typical system comprising devices
10
, associated caches
12
, shared bus
14
, and common memory
16
.
Many caches, sometimes called in-line caches, can be accessed from two different sides. A master side that is closer to the processor, and a “snooping” side that is closer to the shared bus. The cache receives requests for specific memory locations from the master side, and attempts to satisfy them from its cache entries. If the request misses in the cache, or the cache copy does not come with sufficient permission
Ang Boon S.
Chiou Derek
Hamilton Brook Smith & Reynolds P.C.
Massachusetts Institute of Technology
Song Jasmine
Yoo Do Hyun
LandOfFree
Method and apparatus for curious and column caching does not yet have a rating. At this time, there are no reviews or comments for this patent.
If you have personal experience with Method and apparatus for curious and column caching, we encourage you to share that experience with our LandOfFree.com community. Your opinion is very important and Method and apparatus for curious and column caching will most certainly appreciate the feedback.
Profile ID: LFUS-PAI-O-2914187