Sectored least-recently-used cache replacement

Electrical computers and digital processing systems: memory – Storage accessing and control – Hierarchical memories

Reexamination Certificate

Details

C711S129000, C711S133000, C711S118000

Reexamination Certificate

active

06823427

ABSTRACT:

BACKGROUND OF THE INVENTION
1. Field of the Invention
This invention relates to caches and, more particularly, to a least-recently-used cache replacement system.
2. Description of the Related Art
Since main system memory is typically designed for density rather than speed, microprocessor designers have added caches to their designs to reduce the microprocessor's need to directly access main memory. A cache is a small memory that is more quickly accessible than the main memory. Computer systems may have a number of different levels of caches. For example, a computer system may have a “level one” (L1) cache and a “level two” (L2) cache. These caches are typically integrated with the microprocessor. Caches are typically constructed of fast memory cells such as static random access memories (SRAMs) which have faster access times than the memories used for the main system memory (typically dynamic random access memories (DRAMs) or synchronous dynamic random access memories (SDRAMs)). The faster SRAMs are not typically used for main system memory because of their low density and high cost.
Many other types of caching are also possible. For example, the main system memory may act as a cache for the system's slower direct access storage devices (e.g., hard disk drives). Other devices, such as hard drives, may also include internal caches to improve their performance.
When a microprocessor needs data from memory, it typically first checks its L1 cache to see if the required data has been cached. If not, the L2 cache is checked. At the same time, the data may be requested from memory, in case there is a miss in the L2 cache. If the L2 cache is storing the data, it provides the data to the microprocessor (typically at a much higher rate and lower latency than the main system memory is capable of), and if the data was requested from memory, that request may be cancelled. If the data is not cached in the L1 or L2 caches (referred to as a “cache miss”), the data is read from main system memory or some type of mass storage device (e.g., a hard disk drive). Relative to accessing the data from the L1 cache, accesses to memory take many more clock cycles. Similarly, if the data is not in the main system memory, accessing the data from a mass storage device takes even more cycles.
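The lookup order described above can be made concrete with a toy model. The following C sketch reduces each cache level to a single entry so the control flow stays visible; the names (lookup, fill, dram_read) and the single-entry geometry are illustrative assumptions, not the patent's design.

#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

struct level { bool valid; uint64_t addr; uint32_t data; };

static struct level l1, l2;              /* toy single-entry "caches" */

static bool lookup(struct level *c, uint64_t addr, uint32_t *data)
{
    if (c->valid && c->addr == addr) { *data = c->data; return true; }
    return false;
}

static void fill(struct level *c, uint64_t addr, uint32_t data)
{
    c->valid = true; c->addr = addr; c->data = data;
}

static uint32_t dram_read(uint64_t addr)
{
    return (uint32_t)(addr * 7u);        /* stand-in for a slow main-memory access */
}

static uint32_t load(uint64_t addr)
{
    uint32_t data;
    if (lookup(&l1, addr, &data))        /* L1 hit: fastest path */
        return data;
    if (lookup(&l2, addr, &data)) {      /* L2 hit: slower than L1, faster than memory */
        fill(&l1, addr, data);           /* promote the line into L1 */
        return data;
    }
    data = dram_read(addr);              /* miss in both caches: read main memory */
    fill(&l2, addr, data);               /* allocate in both levels on the way back */
    fill(&l1, addr, data);
    return data;
}

int main(void)
{
    printf("%u\n", load(0x100));         /* first access misses and goes to memory */
    printf("%u\n", load(0x100));         /* second access hits in L1 */
    return 0;
}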
Caches typically operate on the principle of locality of reference, which states that the data most recently used (and the data in that locality) is more likely to be accessed than the rest of the data. This principle holds because computer software typically has loops and branches that cause previously executed code to be re-executed. By storing recently accessed instructions and data in a cache, system performance may be increased because the microprocessor need not wait for the instructions and data to be read from main memory.
Microprocessor and computer system architects have taken the locality of reference principle one step further by using techniques such as branch prediction to proactively store instructions and data in the cache before they are actually needed by the microprocessor. In addition, when an instruction or byte of data is read from memory, additional bytes following the instruction or data are read and cached. Once again, the principle of locality of reference dictates that these instruction and data bytes are more likely to be needed by the processor than other data or instructions.
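As a rough illustration of fetching the neighboring bytes, the C sketch below computes the full line that a single-byte request would bring in, plus the following line that a simple sequential prefetcher might also request; the 64-byte line size is an assumption made for the example.

#include <stdint.h>
#include <stdio.h>

#define LINE_SIZE 64u   /* assumed cache-line size in bytes (commonly 32 or 64) */

int main(void)
{
    uint64_t addr = 0x7ffd1234u;                          /* byte the CPU asked for */
    uint64_t line = addr & ~(uint64_t)(LINE_SIZE - 1);    /* line actually fetched  */
    uint64_t next = line + LINE_SIZE;                     /* possible prefetch      */

    printf("requested byte: 0x%llx\n", (unsigned long long)addr);
    printf("line fetched:   0x%llx..0x%llx\n",
           (unsigned long long)line,
           (unsigned long long)(line + LINE_SIZE - 1));
    printf("next-line prefetch candidate: 0x%llx\n", (unsigned long long)next);
    return 0;
}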
There are several different ways to map the system memory into the cache. One common approach utilizes an n-Way set-associative cache, in which the cache is segmented into sets. Each set contains n cache lines. A cache line is a sequential group of bytes (e.g., 32 or 64). For efficiency purposes, cache memory transactions are typically in cache lines rather than in single bytes. Cacheable locations in main memory may each be assigned to one of the sets of cache lines. As a result, each location may be cached in any one of the n locations within its assigned set. One special case of the n-Way set-associative cache is the direct-mapped cache. In a direct-mapped cache, n=1, and thus each memory location maps to only one location in the cache. Another special case of the n-Way set-associative cache is the fully associative cache. In this case, n=m, where m is the number of lines in the cache (and thus there is only one “set”). In this case, each memory location may map to any of the cache locations.
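For illustration, the following C sketch shows how an address might be decomposed for an n-way set-associative cache: the low bits select the byte within the line, the next bits select the set, and the remaining bits form the tag that is compared against the n ways of that set. The 64-byte line and 128-set geometry are assumptions, not values taken from the patent.

#include <stdint.h>
#include <stdio.h>

#define LINE_SIZE   64u     /* bytes per cache line        */
#define NUM_SETS    128u    /* number of sets in the cache */
#define OFFSET_BITS 6u      /* log2(LINE_SIZE)             */
#define INDEX_BITS  7u      /* log2(NUM_SETS)              */

int main(void)
{
    uint64_t addr = 0x0004BEEF40u;

    uint64_t offset = addr & (LINE_SIZE - 1);                  /* byte within the line */
    uint64_t set    = (addr >> OFFSET_BITS) & (NUM_SETS - 1);  /* which set it maps to */
    uint64_t tag    = addr >> (OFFSET_BITS + INDEX_BITS);      /* identifies the line  */

    /* A line with this address may live in any of the n ways of set 'set';
     * a lookup compares 'tag' against the tags stored in those n ways. */
    printf("addr 0x%llx -> set %llu, tag 0x%llx, offset %llu\n",
           (unsigned long long)addr, (unsigned long long)set,
           (unsigned long long)tag, (unsigned long long)offset);
    return 0;
}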
Two basic performance criteria for caches are hit ratio (i.e., the ratio of the memory accesses that are found in the cache to the total number of memory accesses) and search speed (i.e., how quickly a hit or miss determination can be made). In a direct-mapped cache, search speed is optimized at the cost of hit ratio. This is because it is relatively easy to determine hits/misses (since a memory location only maps to one cache line, only that line needs to be checked) but more difficult to have a high hit ratio since multiple memory locations map to a single cache line. Conversely, fully-associative caches optimize hit ratios while sacrificing search speed. Allowing all memory locations to map to any cache line improves the probability that there will be a hit but greatly increases the complexity of searches since all cache lines must be searched for each memory location. Set-associative caches attempt to compromise between the two by offering more associativity (and thus higher hit ratios) than direct-mapped caches while also offering faster search speeds than fully-associative caches.
Since cache size is limited by a number of factors (including die size, power consumption, and cost), care must be taken when loading information into the cache. One particular area of concern for the designer arises when deciding on a policy for overwriting or invalidating existing instructions and data in a cache to make room for new instructions and data. Thus, in set-associative caches where n>1 (and thus there is a choice as to which line a particular memory location is cached in), there needs to be some way to choose which of the possible cache lines to fill with new data. A common solution is to track the relative order of access to each cached memory location and then replace the least recently used instructions or data with new instructions or data. This solution is based on the principle that recently accessed cache lines are more likely to be accessed again. Other solutions include random replacement and first-in first-out techniques.
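A deliberately simplified way to picture the LRU policy is to give each way a timestamp of its last access and evict the oldest (or any invalid) way, as in the C sketch below; the 4-way geometry and the timestamp scheme are illustrative assumptions rather than the mechanism claimed by the patent.

#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

#define WAYS 4

struct way {
    bool     valid;
    uint64_t tag;
    uint64_t last_used;    /* value of a global access counter at the last access */
};

/* Pick a way to replace: reuse an invalid way if one exists,
 * otherwise evict the way that was used longest ago. */
static int choose_victim(const struct way set[WAYS])
{
    int oldest = 0;
    for (int w = 0; w < WAYS; w++) {
        if (!set[w].valid)
            return w;
        if (set[w].last_used < set[oldest].last_used)
            oldest = w;
    }
    return oldest;
}

int main(void)
{
    struct way set[WAYS] = {
        { true, 0x10, 42 },    /* last used at "time" 42 */
        { true, 0x20,  7 },    /* least recently used    */
        { true, 0x30, 99 },
        { true, 0x40, 55 },
    };
    printf("evict way %d\n", choose_victim(set));   /* prints way 1 */
    return 0;
}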
On average, least-recently-used (LRU) cache replacement algorithms provide better performance than other algorithms. However, in order to determine the LRU cache line in an n-way set-associative cache, conventional approaches require a significant amount of complex hardware, including counters and n-way multiplexers, to implement the LRU algorithm. Additionally, status bits for each cache entry track the usage of each entry. The number of status bits required to implement a conventional LRU algorithm for a set with n ways is typically n*log2(n). When a new entry is made in the set, the status bits are scanned to determine which of the cache lines is the least recently used or invalid. The least recently used or invalid line is then evicted to make room for the new entry. Drawbacks of a conventional LRU replacement algorithm include the amount of hardware and the number of status bits required to implement the algorithm, as well as the time and hardware required to scan for invalid entries in the set.
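The bookkeeping described in this paragraph can be sketched as follows in C: with n = 4 ways, each way carries a log2(n) = 2-bit rank, for n*log2(n) = 8 status bits per set, and finding a victim requires scanning every way for an invalid entry or for rank 0. All names and the 4-way geometry are assumptions made for illustration.

#include <stdbool.h>
#include <stdio.h>

#define WAYS       4
#define RANK_BITS  2                       /* log2(WAYS)                        */
#define STATUS_BITS_PER_SET (WAYS * RANK_BITS)

struct way {
    bool     valid;
    unsigned rank;                         /* 0 = least recently used           */
};

/* Promote way 'hit' to most-recently-used, demoting the ways that were ahead. */
static void touch(struct way set[WAYS], int hit)
{
    unsigned old = set[hit].rank;
    for (int w = 0; w < WAYS; w++)
        if (set[w].rank > old)
            set[w].rank--;
    set[hit].rank = WAYS - 1;
}

/* The scan the passage calls out as a drawback: every way's status bits
 * must be examined to find an invalid way or the way with rank 0. */
static int find_victim(const struct way set[WAYS])
{
    for (int w = 0; w < WAYS; w++)
        if (!set[w].valid)
            return w;
    for (int w = 0; w < WAYS; w++)
        if (set[w].rank == 0)
            return w;
    return 0;                              /* unreachable if ranks stay consistent */
}

int main(void)
{
    struct way set[WAYS] = {
        { true, 0 }, { true, 1 }, { true, 2 }, { true, 3 }
    };

    touch(set, 0);                         /* way 0 becomes most recently used  */
    printf("status bits per set: %d\n", STATUS_BITS_PER_SET);
    printf("victim: way %d\n", find_victim(set));   /* way 1 now holds rank 0   */
    return 0;
}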
SUMMARY
Various embodiments of methods and systems for implementing a least recently used (LRU) cache replacement algorithm are disclosed. In a first embodiment, a computer system that includes a processor, a system memory, and an N-way set-associative cache coupled to the processor is disclosed. The N-way set-associative cache includes a memory that is logically divided into at least one set. Each set includes N ways (i.e., cache lines) that are each configured to store a line (e.g., a data or instruction line) from the system memory. For example, i
