Electrical computers and digital processing systems: memory – Storage accessing and control – Hierarchical memories
Reexamination Certificate
1998-08-20
2001-07-31
Bragdon, Reginald G. (Department: 2188)
Electrical computers and digital processing systems: memory
Storage accessing and control
Hierarchical memories
C711S158000, C711S143000, C711S146000, C711S122000
Reexamination Certificate
active
06269425
ABSTRACT:
FIELD OF THE INVENTION
The present invention relates to management of a memory cache in a manner which improves cache performance.
BACKGROUND OF THE INVENTION
In a data processing system, instructions and associated data are transferred from memory to one or more processors for processing, and then resulting data generated by the processor is returned to memory for storage. Thus, typical processing operations involve frequent and repetitive reading and writing from memory. As a result, memory access delays are often a primary limitation in the performance of a data processing system. Preferably, therefore, memory access speed should be maximized to maximize performance. However, often cost and other constraints require that the main memory be comprised of relatively long access time circuitry. To overcome the resulting performance drawbacks, memory caches are typically used.
A memory cache typically includes a relatively small, but high speed, bank of memory, which can be more rapidly accessed by the processor(s) than the main memory. Memory locations in the main memory are duplicated in the cache. When a particular memory location being accessed by the processor is duplicated in the cache—an event which is known as a cache “hit”—the processor may rapidly access the cache instead of waiting for access to main memory. The cache is managed with the goal of maximizing the fraction of accesses which are hits in the cache.
Caches are typically organized into “lines”, which are relatively long sequences of memory locations found in main memory. Typically, when a memory location accessed by a processor is not duplicated in the cache—an event which is known as a cache “miss”—an entire line containing the missed memory location, and neighboring memory locations, is brought into the cache as part of retrieving the missed location from other caches or main memory—an event which is known as a “linefill” into the cache.
Typically, each cache line is associated with multiple groups of locations in the main memory. Each cache line stores duplicates of associated groups of memory locations, as well as an indication of which groups of memory locations are currently stored in that line. Thus, when a processor requests access to a particular memory location, the cache line corresponding to that memory location is accessed to determine whether that cache line is storing the group of memory locations which includes the requested location. If so, the requested memory location is accessed in the cache. If not, a group of memory locations including the requested location is linefilled into the cache.
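As an illustration of this lookup mechanism, the following sketch models a simple direct-mapped cache in which each line records a tag identifying which group of main-memory locations it currently holds, and a miss triggers a linefill from a flat memory array. The class and member names are invented for the example and do not come from the patent.

#include <cstdint>
#include <cstring>
#include <vector>

// Illustrative direct-mapped cache: each line holds one group of memory
// locations plus a tag recording which group is currently stored there.
struct CacheLine {
    bool     valid = false;
    uint64_t tag   = 0;        // which group of locations is stored here
    uint8_t  data[64] = {};    // the duplicated memory contents (one line)
};

class DirectMappedCache {
public:
    explicit DirectMappedCache(size_t numLines) : lines_(numLines) {}

    // Returns a pointer to the cached copy of 'addr', performing a
    // linefill from 'memory' on a miss.
    uint8_t* access(uint64_t addr, const uint8_t* memory) {
        const size_t lineSize = sizeof(CacheLine::data);
        uint64_t lineNumber = addr / lineSize;
        uint64_t index = lineNumber % lines_.size();  // which cache line
        uint64_t tag   = lineNumber / lines_.size();  // which group in that line
        CacheLine& line = lines_[index];

        if (!line.valid || line.tag != tag) {
            // Cache miss: linefill the whole group containing 'addr'.
            std::memcpy(line.data, memory + lineNumber * lineSize, lineSize);
            line.tag = tag;
            line.valid = true;
        }
        // Hit (possibly just filled): return the requested byte.
        return &line.data[addr % lineSize];
    }

private:
    std::vector<CacheLine> lines_;
};

In real hardware the index and tag would be taken from bit fields of the address rather than computed by division, but the mapping shown captures the line-to-group association described above.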
Typically, an n-way associative cache stores n of the several groups of locations corresponding to a cache line in the cache at one time. When a group of memory locations is linefilled into the cache, memory contents in the same cache line may need to be replaced. If the contents of the replaced cache line have been modified, then the line has to be stored back into the corresponding group of locations in the main memory—an event which is known as a “castback” or “writeback” from the cache.
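The replacement and castback behavior of an n-way associative cache can be sketched in a similar hypothetical form: the n ways of a set are searched for a matching tag, and on a miss a victim line is chosen and, if it has been modified, written back to memory before it is replaced. The round-robin victim selection and all names here are illustrative assumptions, not the patent's policy.

#include <cstdint>
#include <cstring>
#include <vector>

// Illustrative n-way set-associative cache with write-back of dirty lines.
struct Way {
    bool     valid = false;
    bool     dirty = false;    // modified since it was linefilled
    uint64_t tag   = 0;
    uint8_t  data[64] = {};
};

class SetAssociativeCache {
public:
    SetAssociativeCache(size_t numSets, size_t ways)
        : sets_(numSets, std::vector<Way>(ways)), victim_(numSets, 0) {}

    // Access one byte; 'memory' stands in for the next level or main memory.
    uint8_t* access(uint64_t addr, uint8_t* memory, bool isWrite) {
        const size_t lineSize = sizeof(Way::data);
        uint64_t lineNumber = addr / lineSize;
        uint64_t index = lineNumber % sets_.size();
        uint64_t tag   = lineNumber / sets_.size();
        std::vector<Way>& set = sets_[index];

        // Search the n ways of this set for a hit.
        for (Way& way : set) {
            if (way.valid && way.tag == tag) {
                if (isWrite) way.dirty = true;
                return &way.data[addr % lineSize];
            }
        }

        // Miss: pick a victim way (simple round-robin for illustration).
        Way& victim = set[victim_[index]];
        victim_[index] = (victim_[index] + 1) % set.size();

        // Castback: a modified victim line must be written back first.
        if (victim.valid && victim.dirty) {
            uint64_t victimLine = victim.tag * sets_.size() + index;
            std::memcpy(memory + victimLine * lineSize, victim.data, lineSize);
        }

        // Linefill the requested group into the victim way.
        std::memcpy(victim.data, memory + lineNumber * lineSize, lineSize);
        victim.valid = true;
        victim.dirty = isWrite;
        victim.tag   = tag;
        return &victim.data[addr % lineSize];
    }

private:
    std::vector<std::vector<Way>> sets_;
    std::vector<size_t> victim_;  // next way to replace in each set
};

The dirty flag records whether a line has been modified since its linefill, which is exactly the condition under which the castback described above is required.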
In high performance data processing systems, often there are two or more caches, organized so that a processor attempts to access a memory location by first attempting to locate a duplicate of that location in a “level 1” or L1 cache. If there is a miss in the L1 cache, then an attempt is made to locate a duplicate of the desired memory location in a “level 2” or L2 cache. If there is a miss in the L2 cache, each lower level cache is sequentially checked in the same manner. If there is a hit in one of the caches, then the desired memory locations are obtained from that cache, and typically, the accessed memory locations are duplicated, along with neighboring locations completing a cache line, into the appropriate line of at least the L1 cache—although in some cases an access may be “cache-inhibited”, in which case the data is not stored in the L1 cache after retrieval. If there are misses in all of the caches, the missed location, along with neighboring locations completing a cache line, is retrieved from main memory, and filled into one or more of the caches if the access is not cache-inhibited. Similarly, if a line is cast back from a cache, the line may be written to a higher level cache, main memory, or both.
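A minimal sketch of this sequential, multi-level lookup follows, with each cache level modeled as a simple map from line number to line contents; the function names, the 64-byte line size, and the fill-into-L1 policy are assumptions made for the example.

#include <cstdint>
#include <unordered_map>
#include <vector>

// Illustrative cache level, modeled as a map from line number to line
// contents; a real cache would be set-associative as sketched above.
using CacheLevel = std::unordered_map<uint64_t, std::vector<uint8_t>>;

// Stand-in for a main-memory line fetch (hypothetical helper).
std::vector<uint8_t> readLineFromMainMemory(uint64_t lineNumber) {
    return std::vector<uint8_t>(64, 0);  // 64-byte line of zeros for the sketch
}

// Sequentially check the L1 cache, then the L2 cache, then any lower
// levels; on a miss everywhere, retrieve the line from main memory.
// The line is filled into at least the L1 cache unless the access is
// cache-inhibited.
std::vector<uint8_t> hierarchicalAccess(std::vector<CacheLevel>& levels,
                                        uint64_t lineNumber,
                                        bool cacheInhibited) {
    for (size_t i = 0; i < levels.size(); ++i) {
        auto it = levels[i].find(lineNumber);
        if (it != levels[i].end()) {
            // Hit at level i: duplicate the line into the L1 cache as well.
            if (!cacheInhibited && i > 0) {
                levels[0][lineNumber] = it->second;
            }
            return it->second;
        }
    }
    // Misses in all of the caches: retrieve the line from main memory.
    std::vector<uint8_t> line = readLineFromMainMemory(lineNumber);
    if (!cacheInhibited) {
        levels[0][lineNumber] = line;  // linefill into the L1 cache
    }
    return line;
}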
Typically, lines of instructions and data are transferred from caches and processors to other caches and processors using buffers. For instance, in one architecture two buffers are respectively connected to a level 1 cache and a level 2 cache. These buffers are also connected to main memory, a host processor, and possibly other processors via a system bus. The buffers allow for a smooth transition of data or instructions between components having different transfer rates. Each line in a conventional cache buffer strictly handles either fill commands or write back commands, and includes memory space which can store a finite number of cache lines, e.g., four. Each cache line in a buffer is, therefore, designated as a fill cache line or a write back cache line. In a multi-way associative cache, cache buffer lines may be used for fills or writebacks, and are dynamically configured for the appropriate purpose.
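The buffer arrangement can be illustrated with a small hypothetical structure in which each buffer entry holds one cache line and is dynamically marked as serving either a fill or a writeback; the four-entry size follows the example above, while the type and member names are invented.

#include <array>
#include <cstdint>

// Illustrative cache buffer sitting between a cache and the system bus.
// Each entry holds one cache line and is dynamically configured as a
// fill entry or a writeback entry; the layout here is hypothetical.
enum class EntryUse { Free, Fill, Writeback };

struct BufferEntry {
    EntryUse use = EntryUse::Free;
    uint64_t lineNumber = 0;
    uint8_t  data[64] = {};
};

struct CacheBuffer {
    std::array<BufferEntry, 4> entries;  // e.g., space for four cache lines

    // Claim a free entry for a fill or a writeback; returns nullptr if the
    // buffer is full and the transfer must wait.
    BufferEntry* allocate(EntryUse use, uint64_t lineNumber) {
        for (BufferEntry& e : entries) {
            if (e.use == EntryUse::Free) {
                e.use = use;
                e.lineNumber = lineNumber;
                return &e;
            }
        }
        return nullptr;
    }

    // Release an entry once the line has reached its destination (the cache
    // for a fill, main memory or a higher level cache for a writeback).
    void release(BufferEntry& e) { e.use = EntryUse::Free; }
};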
In addition to the use of caches to improve memory access performance, other well known techniques have been used to improve the performance of data processing systems. One technique is to divide a processing task into independently executable sequences of instructions called threads. Using this technique in a single-processor system, when the processor, for any number of reasons, cannot continue processing or execution of a thread, the processor switches to another thread and continues processing of that thread, rather than stalling. For example, when a cache miss stalls processing of one thread, the processor may switch to other threads which are able to continue processing. By the time the processor returns to the stalled thread, the missed location may have been linefilled into the cache, and that thread can resume with minimal additional delay. Furthermore, the processor may switch threads on a timed basis even where no threads are stalled. Alternatively, in a multi-processor system, each processor may handle one or more threads executing in parallel, so that when one thread or one processor is stalled, the other threads or processors may continue. For the purposes of this application, the terms “thread” and “multithreading” will be used to refer generally to any processing system, whether comprising a single processor or multiple processors, which executes multiple threads.
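A rough sketch of this switch-on-stall behavior, assuming a simple round-robin choice among threads that are not waiting on a linefill, is shown below; the structure and field names are hypothetical.

#include <cstddef>
#include <vector>

// Illustrative hardware-multithreading policy: when the current thread
// stalls on a cache miss, switch to another thread that is ready to run.
struct Thread {
    bool waitingOnLinefill = false;  // stalled until its miss is serviced
};

// Pick the next thread to run, starting after the current one, so that a
// stalled thread does not idle the processor (simple round-robin choice).
size_t selectNextThread(const std::vector<Thread>& threads, size_t current) {
    for (size_t step = 1; step <= threads.size(); ++step) {
        size_t candidate = (current + step) % threads.size();
        if (!threads[candidate].waitingOnLinefill) {
            return candidate;
        }
    }
    return current;  // all threads are stalled; stay on the current one
}

The same selection could also be driven by a timer, matching the timed thread switching mentioned above.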
The term “multithreading”, when used in the context of software, is used to refer to a particular organization scheme which can be used in writing computer programs. In this software context, therefore, “multithreading” does not relate to the manner in which computer programs are physically executed. Thus, software “multithreading” is different from the kind of “multithreading” discussed in this application. The kind of multithreading to which this application relates, may be referred to as “hardware multithreading”, i.e., processor configurations permitting a single processor to switch between multiple threads of instructions upon various conditions, or permitting multiple processors to process multiple threads of instructions. Thus, in this application, the terms “thread” and “multithreading” will be understood to refer to “hardware multithreading”, and not methods for software organization.
While the technique of multithreading, and the use of memory caches, both enhance the performance of a data processing system, combining these techniques raises substantial complexities. In particular, as noted above, when a memory access misses in the L1 cache, the required memory contents must be accessed from a higher level cache or main memory. When the memory contents are located, they must be delivered to the requesting processor, and also filled into one of the caches. While this process appears straightforward, in a mu
Freerksen Donald Lee
Mounes-Toussi Farnaz
Bragdon Reginald G.
International Business Machines - Corporation
Wood Herron & Evans