Title: Method and apparatus for adaptively bypassing one or more…
Classification: Electrical computers and digital processing systems: memory – Storage accessing and control – Hierarchical memories
Kind: Reexamination Certificate
Filed: 2001-01-25
Issued: 2003-11-11
Examiner: Padmanabhan, Mano (Department: 2187)
Other classifications: C711S122000, C711S144000, C711S145000
Status: Active
Patent number: 06647466
BACKGROUND OF THE INVENTION
1. Field of the Invention
This invention relates to computer systems and, more specifically, to multiple level cache memories used in computer systems.
2. Background Information
Most computer systems include one or more central processing units (CPUs), one or more main memories, and one or more input/output (I/O) subsystems, all of which may be interconnected by a system bus. The CPUs typically fetch instructions and data from the main memories and execute those instructions. Although the processing speeds of CPUs have increased dramatically over the years, the speed of memory systems has not increased at the same rate. As a result, CPUs are often forced to wait, sometimes for significant periods of time, to receive instructions and/or data from the memories. Such delays can significantly degrade the computer system's performance. To reduce such delays, cache memories, which are often simply referred to as caches, were developed.
A cache is a small, fast memory module that is placed in close proximity to the CPU. Many caches are static random access memories (SRAMs), which are faster, but more expensive, than dynamic random access memories (DRAMs), which are often used for main memory. The cache is used to store frequently accessed instructions and/or data. That is, the cache contains an image or subset of the information from main memory. Instructions and/or data can be quickly accessed by a CPU from its respective cache, thereby reducing delays. When the CPU issues a request for instructions and/or data, a search is first made to see whether the requested instructions and/or data are resident in the CPU's cache. If they are, a cache “hit” is said to occur and the desired instructions and/or data are retrieved from the cache and provided to the CPU for processing. If the requested instructions and/or data are not found in the cache, a cache “miss” is said to occur. In this case, a request must be sent to main memory for the desired instructions and/or data. While this request is being processed, the CPU remains in a wait state. If the wait is expected to be especially long, some CPUs may begin executing a new process or thread.
The desired instructions and/or data are read out of main memory and sent to the CPU. When the instructions and/or data are received at the CPU, they are typically placed in the cache. To reduce the number of cache misses, computer architects have devised numerous schemes to try to anticipate what instructions and/or data a CPU is likely to request in the future. These “anticipated” instructions and/or data are prefetched and placed in the cache.
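As a rough illustration of the hit/miss/fill flow just described, the sketch below models a single cache in front of a slow main memory. It assumes a hypothetical direct-mapped design (one legal entry per address, a scheme described in the next paragraph); the sizes and the main_memory_read stand-in are illustrative assumptions, not drawn from the patent.

#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

#define NUM_LINES 64               /* illustrative cache size, not from the patent */

/* One cache entry: a valid bit, a tag identifying which memory line is
 * held, and (here) a single data word. */
struct line { bool valid; uint32_t tag; uint32_t data; };
static struct line cache[NUM_LINES];

/* Stand-in for a slow main-memory access (hypothetical contents). */
static uint32_t main_memory_read(uint32_t addr) {
    return addr * 2654435761u;     /* arbitrary deterministic "data" */
}

/* Look the address up; on a miss, fetch from main memory and install
 * the line in the cache before returning the data. */
static uint32_t cpu_read(uint32_t addr) {
    uint32_t index = addr % NUM_LINES;   /* which cache entry */
    uint32_t tag   = addr / NUM_LINES;   /* identifies the stored line */
    struct line *l = &cache[index];

    if (l->valid && l->tag == tag)       /* cache hit: fast path */
        return l->data;

    /* Cache miss: the CPU would stall here while memory responds. */
    uint32_t data = main_memory_read(addr);
    l->valid = true;                     /* fill for future accesses */
    l->tag   = tag;
    l->data  = data;
    return data;
}

int main(void) {
    cpu_read(100);                       /* miss: fetched and cached */
    printf("0x%08x\n", cpu_read(100));   /* hit: served from the cache */
    return 0;
}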
In most computer systems, the instructions and data at main memory are organized into units typically referred to as “blocks” or “cache lines” each of which is separately addressable. Instructions and data are typically moved about the computer system in terms of one or more blocks or cache lines. Furthermore, cache lines can be mapped into a cache in a number of different ways. The three most widely used mapping procedures are known as “fully associative mapping”, “direct mapping” and “set-associative mapping”. With fully associative mapping, a cache line can be placed in any location or entry of the cache. With direct mapping, every cache line maps to a single, specific cache entry. With set-associative mapping, a particular cache line can be placed in a restrictive, predetermined set of entries within the cache. Thus, a cache line is first mapped into a particular set, and is then placed in any available or free cache entry within that set.
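The address arithmetic behind these three mapping schemes can be sketched as follows; the 64-byte line size, 256 sets, and 4 ways are assumed figures chosen only for illustration.

#include <stdint.h>
#include <stdio.h>

/* Illustrative geometry only: 64-byte lines, 256 sets, 4 ways. */
#define LINE_BYTES 64
#define NUM_SETS   256
#define WAYS       4

int main(void) {
    uint64_t addr = 0x12345678;
    uint64_t line_addr = addr / LINE_BYTES;   /* drop the byte-in-line offset */

    /* Direct mapping: the line address picks exactly one legal entry. */
    uint64_t direct_entry = line_addr % (NUM_SETS * WAYS);

    /* Set-associative mapping: the line maps to one set and may then be
     * placed in any of the WAYS free entries within that set; the tag
     * tells lines that share a set apart. */
    uint64_t set = line_addr % NUM_SETS;
    uint64_t tag = line_addr / NUM_SETS;

    /* Fully associative mapping: effectively a single set spanning every
     * entry, so any entry is legal and the whole line address is the tag. */

    printf("direct entry %llu, set %llu, tag 0x%llx\n",
           (unsigned long long)direct_entry,
           (unsigned long long)set,
           (unsigned long long)tag);
    return 0;
}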
Due to the significant advantages that they provide, nearly all, if not all, computer systems include caches. In fact, many computer systems have multiple caches organized into levels in order to improve their efficiency even further. A multilevel cache system may include, for example, two cache memory modules disposed between the CPU and the main memory. A first level (L1) cache may be directly coupled to the CPU. The L1 cache is typically very small, but very fast. A second level (L2) cache may be disposed between the L1 cache and the main memory. The L2 cache is typically larger but somewhat slower than the L1 cache, although it is still faster than accessing main memory.
In most multilevel cache systems, the contents of each cache level are also present at each higher level. That is, all of the instructions and/or data at the L1 cache are also present in the L2 cache. Each higher level cache (e.g., L2), however, typically contains additional information beyond that of the preceding level (e.g., L1). This arrangement, which is known as the “subset rule”, is used to speed up the searches of the cache levels that are typically initiated by other CPUs in multiprocessor computer systems. More specifically, if a first CPU issues a request to main memory for a particular cache line, a search may also be made of the caches associated with the other CPUs in the computer system. If the cache hierarchy associated with each CPU follows the subset rule, a search need only be made of the highest cache level for each CPU. If the cache line is not present at the highest level cache, then it cannot be present in any of the lower levels. The computer system can thus locate particular cache lines relatively quickly. If each cache level for each CPU had to be searched, the performance of the computer system could be severely degraded.
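A minimal sketch of why the subset rule pays off, assuming a hypothetical two-level hierarchy in which L2 holds a superset of L1: a probe made on behalf of another CPU needs to search only L2.

#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

enum { L1_LINES = 8, L2_LINES = 32 };

/* A stored value of 0 marks an empty slot. By the subset rule, every
 * line held in l1 also appears in l2. */
static uint64_t l1[L1_LINES];
static uint64_t l2[L2_LINES];

static bool present(const uint64_t *cache, int n, uint64_t line) {
    for (int i = 0; i < n; i++)
        if (cache[i] == line)
            return true;
    return false;
}

/* Probe on behalf of another CPU: only L2 (the highest level) is
 * searched. If the line is absent there, the subset rule guarantees it
 * is absent from L1 as well, so L1 never needs to be examined. */
static bool snoop(uint64_t line) {
    return present(l2, L2_LINES, line);
}

int main(void) {
    l1[0] = 0x40;
    l2[0] = 0x40;                   /* mirrored in L2 per the subset rule */
    printf("0x40: %d, 0x80: %d\n", snoop(0x40), snoop(0x80));
    return 0;
}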
Before issuing a request to main memory, a CPU first searches all of its cache levels for the desired cache line. In particular, each cache level is searched until either a hit is obtained or a miss occurs at the highest level. That is, a search is first made of the lowest level cache (e.g., L1). If there is no hit at the L1 cache, a search is made of the next level cache (e.g., L2) and so on. If there is no cache hit following the search of the highest level cache, then a request for the desired cache line is issued to main memory. As indicated above, the highest cache level of the other CPUs may also be searched for the requested cache line. If the cache line is not present at any of the caches, it is fetched from main memory and sent to the requesting CPU.
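This level-by-level search might be sketched as follows, again assuming a hypothetical two-level hierarchy; a return value of 0 stands for the case in which a request must be issued to main memory.

#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

enum { NLEVELS = 2, NLINES = 16 };

/* levels[0] models the L1 cache, levels[1] the L2 cache; a stored value
 * of 0 marks an empty slot (real hardware would use a valid bit). */
static uint64_t levels[NLEVELS][NLINES];

static bool probe(int level, uint64_t line) {
    for (int i = 0; i < NLINES; i++)
        if (levels[level][i] == line)
            return true;
    return false;
}

/* Search the lowest level first (L1), then L2, and so on; only after a
 * miss at the highest level is a request issued to main memory. */
static int find_level(uint64_t line) {
    for (int lvl = 0; lvl < NLEVELS; lvl++)
        if (probe(lvl, line))
            return lvl + 1;         /* hit at L1, L2, ... */
    return 0;                       /* miss everywhere: go to main memory */
}

int main(void) {
    levels[1][0] = 0x1000;          /* line resident only in L2 */
    printf("hit at level %d\n", find_level(0x1000));   /* prints 2 */
    printf("hit at level %d\n", find_level(0x2000));   /* prints 0 */
    return 0;
}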
Because more than one CPU may request a copy of the same cache line from main memory, cache coherency protocols have been developed to ensure that no CPU relies on a cache line that has become stale, typically due to a change or update performed to the cache line by some other CPU. Many cache coherency protocols associate a state with each cache line. A given cache line, for example, may be in a shared state in which copies of the cache line may be present in the cache hierarchies associated with multiple CPUs. When a cache line is in the shared state, a CPU may read from, but not write to, the respective cache line. To support write operations, a cache line may be in an exclusive state. In this case, the cache line is owned by a single CPU which may write to the cache line.
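The read/write permissions implied by these two states can be sketched as below; real coherency protocols (e.g., MESI) track more states, so this is only an illustrative reduction of the text above.

#include <stdbool.h>
#include <stdio.h>

enum line_state { INVALID, SHARED, EXCLUSIVE };

struct cache_line { enum line_state state; unsigned long data; };

/* Reads are legal in either valid state. */
static bool may_read(const struct cache_line *l) {
    return l->state == SHARED || l->state == EXCLUSIVE;
}

/* Writes require exclusive ownership; a shared line must first be
 * upgraded, which invalidates the copies held by other CPUs. */
static bool may_write(const struct cache_line *l) {
    return l->state == EXCLUSIVE;
}

int main(void) {
    struct cache_line l = { SHARED, 0 };
    printf("read %d, write %d\n", may_read(&l), may_write(&l));
    l.state = EXCLUSIVE;            /* after invalidates are acknowledged */
    printf("read %d, write %d\n", may_read(&l), may_write(&l));
    return 0;
}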
When a CPU wishes to obtain exclusive ownership over a cache line that is currently in the shared state (i.e., copies of the cache line are present in the cache hierarchies of other CPUs), invalidate requests are typically issued to those other CPUs. When an invalidate request is received by a given CPU, each level of the respective cache hierarchy is searched for the cache line specified by the invalidate request. If the specified cache line is found, it is transitioned to an invalid state. Many caches assign or associate a valid bit with each cache line stored in the cache. If the bit is asserted, then the cache line is considered to be valid and may be accessed and processed by the CPU. When a cache line is initially received from main memory, the valid bit is typically asserted and the cache line is stored in the cache. When an invalidate request is received, the valid bit of the respective cache line is deasserted, thereby reflecting that the cache line is no longer valid.
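A sketch of the invalidate handling just described, assuming a hypothetical two-level hierarchy in which each line carries a valid bit; the names and sizes are illustrative.

#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

enum { NLEVELS = 2, NLINES = 16 };

struct line { bool valid; uint64_t addr; };
static struct line hierarchy[NLEVELS][NLINES];   /* [0] = L1, [1] = L2 */

/* Incoming invalidate: every level of the local hierarchy is searched,
 * and any matching line has its valid bit deasserted. */
static void handle_invalidate(uint64_t addr) {
    for (int lvl = 0; lvl < NLEVELS; lvl++)
        for (int i = 0; i < NLINES; i++) {
            struct line *l = &hierarchy[lvl][i];
            if (l->valid && l->addr == addr)
                l->valid = false;   /* the line may no longer be used */
        }
}

int main(void) {
    hierarchy[0][0] = (struct line){ true, 0x40 };   /* copy in L1 */
    hierarchy[1][3] = (struct line){ true, 0x40 };   /* copy in L2 */
    handle_invalidate(0x40);
    printf("%d %d\n", hierarchy[0][0].valid, hierarchy[1][3].valid);
    return 0;
}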
Suppose, after invalidating a cache line within its hierarchy, a CPU issues its own request for this same…