Virtual set cache that redirects store data to correct...
Type: Reexamination Certificate
Filed: 2001-08-15
Issued: 2003-09-16
Examiner: Sparks, Donald (Department: 2186)
Classification: Electrical computers and digital processing systems: memory – Storage accessing and control – Hierarchical memories
U.S. Classes: C711S118000, C711S205000, C711S206000, C711S207000, C711S137000, C711S200000
Status: active
Patent number: 06622211
ABSTRACT:
FIELD OF THE INVENTION
This invention relates in general to the field of microprocessor caches, and more particularly to virtual set caches.
BACKGROUND OF THE INVENTION
Modern microprocessors include data caches for caching within the microprocessor the most recently accessed data to avoid having to load the data from physical memory or to store the data to physical memory, since accessing physical memory takes an order of magnitude longer than accessing the data cache. For efficiency reasons, data caches do not cache data on a byte granularity basis. Instead, data caches typically cache data on a cache line granularity basis. A common cache line size is 32 bytes.
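To make the line-granularity point concrete, here is a minimal C sketch assuming the 32-byte line size mentioned above; the names and constants are illustrative, not taken from any particular design.

#include <stdint.h>

#define LINE_SIZE 32u  /* common cache line size in bytes */

/* Align an address down to the start of its 32-byte cache line.
 * The low 5 bits merely select a byte within the line. */
static uint32_t line_base(uint32_t addr)   { return addr & ~(LINE_SIZE - 1u); }
static uint32_t byte_offset(uint32_t addr) { return addr &  (LINE_SIZE - 1u); }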
Data caches are smaller than physical memory. Consequently, when a data cache caches a line of data, it must also save the address of the data in order to later determine whether it has the data when a new instruction executes that accesses data in memory. The saved address of the data is referred to as a tag. When the new instruction accesses a memory address, the data cache compares the new memory address with the addresses, or tags, it has stored to see if a match occurs. If the new address matches one of the tags, then the data is in the cache, and the cache provides the data to the requesting portion of the microprocessor, rather than the microprocessor fetching the data from memory. The condition where the data is in the data cache is commonly referred to as a cache hit.
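As a sketch of the tag matching just described, the hypothetical C fragment below stores line-aligned addresses as tags and scans them on each access; the next paragraph explains why this exhaustive scan is impractical in hardware.

#include <stdbool.h>
#include <stdint.h>

#define NUM_LINES 2048  /* illustrative number of cached lines */

/* A flat tag store: one saved line address (tag) per cached line. */
struct flat_cache {
    uint32_t tag[NUM_LINES];    /* line-aligned addresses of the cached data */
    bool     valid[NUM_LINES];
};

/* A lookup compares the new address against every stored tag. */
static bool is_hit(const struct flat_cache *c, uint32_t addr)
{
    uint32_t line_addr = addr & ~31u;  /* compare at 32-byte line granularity */
    for (int i = 0; i < NUM_LINES; i++)
        if (c->valid[i] && c->tag[i] == line_addr)
            return true;               /* cache hit: data is in the cache */
    return false;                      /* cache miss */
}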
Data caches store hundreds of tags for hundreds of cache lines cached in the data cache. Comparing a new address with the hundreds of tags stored in the cache would take too long and make the cache too slow. Therefore, caches are arranged as arrays of sets. Each set includes a cache line, or more often, multiple cache lines. A common arrangement for caches is to have four cache lines in a set. Each of the four cache lines is said to be in a different cache way. A cache with four cache lines in a set is commonly referred to as a four-way set associative cache. Typically, when a new cache line is to be stored into a set, the least recently used cache line of the set is chosen for replacement by the new cache line.
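A minimal sketch of least-recently-used victim selection within one four-way set; real hardware typically uses pseudo-LRU bits rather than per-way counters, so the counter scheme here is purely illustrative.

#include <stdint.h>

#define NUM_WAYS 4  /* four cache lines per set */

/* One set of a four-way set associative cache. lru_age[w] grows the
 * longer way w goes unused; the largest value marks the least recently
 * used line, which is chosen for replacement. */
struct cache_set {
    uint32_t tag[NUM_WAYS];
    uint8_t  lru_age[NUM_WAYS];
};

static int way_to_replace(const struct cache_set *set)
{
    int victim = 0;
    for (int w = 1; w < NUM_WAYS; w++)
        if (set->lru_age[w] > set->lru_age[victim])
            victim = w;  /* older, i.e., less recently used */
    return victim;
}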
By arranging the cache as an array of sets, the time required to compare the new address with the addresses stored in the cache is reduced to an acceptable amount as follows. When a cache line is stored into the cache, the cache does not allow the cache line to be stored into any arbitrary one of the sets in the array. Instead, the set into which the cache line may be stored is limited based on the address of the cache line. The lower order bits of the new address are used to select only one of the sets in the array. The address bits used to select one of the sets from the array are referred to as the index. Since the cache is smaller than the physical memory, only the lower order bits are needed for the index. That is, since the number of cache lines stored in the cache is much smaller than the number of cache lines stored in memory, a fewer number of address bits are needed to index the cache than to index physical memory. Once a set in the cache is selected by the index, the cache need only compare the tags of the cache lines in the selected set with the new address to determine whether a cache hit has occurred.
The number of address bits needed for the index depends upon the number of sets in the array. For example, if the cache has 512 sets, then nine address bits are needed to index the array of sets. Which of the address bits are used for the index depends upon the size of a cache line. For example, if the cache line size is 32 bytes, the lower 5 bits of the address are not used, since those bits only select a byte within the cache line. Hence, for a cache with 512 sets of 32-byte cache lines, address bits 13:5 may be used as the index.
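Putting these pieces together, a hedged C sketch of a set-indexed lookup with the example geometry (512 sets, four ways, 32-byte lines): bits 13:5 select the set, and only the four tags in that set are compared.

#include <stdbool.h>
#include <stdint.h>

#define NUM_SETS   512  /* 9 index bits */
#define NUM_WAYS   4
#define LINE_SHIFT 5    /* 32-byte lines: bits 4:0 select a byte in the line */

struct set {
    uint32_t tag[NUM_WAYS];   /* upper address bits of each cached line */
    bool     valid[NUM_WAYS];
};

static bool lookup(const struct set cache[NUM_SETS], uint32_t addr)
{
    uint32_t index = (addr >> LINE_SHIFT) & (NUM_SETS - 1); /* bits 13:5  */
    uint32_t tag   = addr >> (LINE_SHIFT + 9);              /* bits 31:14 */
    for (int w = 0; w < NUM_WAYS; w++)  /* 4 comparisons, not hundreds */
        if (cache[index].valid[w] && cache[index].tag[w] == tag)
            return true;
    return false;
}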
Modern microprocessors also support the notion of virtual memory. In a virtual memory system, program instructions access data using virtual addresses. The virtual addresses are rarely the same as the physical address of the data, i.e., the address of the location in physical memory where the data is stored. The physical address is used on the processor bus to access physical memory. Furthermore, the data specified by the virtual memory address may not even be present in physical memory at the time the program instruction accesses the data. Instead, the data may be present in secondary storage, typically on a disk drive.
The operating system manages the swapping of the data between disk storage and physical memory as necessary to execute program instructions. The operating system also manages the assignment of virtual addresses to physical addresses, and maintains translation tables used by the microprocessor to translate virtual addresses into physical addresses. Modern microprocessors employ a translation lookaside buffer (TLB), which caches the physical address translations of the most recently accessed virtual address to avoid having to access the translation tables to perform the translations.
Typical virtual memory systems are paging memory systems. In a paging memory system, physical memory is divided into pages, typically of 4 KB each. Consequently, only the upper bits of the virtual address need be translated to the physical address, and the lower bits of the virtual address are untranslated. That is, the lower bits are the same as the physical address bits, and serve as a physical byte offset from the base address of the physical page. The base address of the physical page is translated from the upper bits of the virtual address. For example, in a paging system with 4 KB pages, the lower 12 bits of the virtual address, i.e., bits 11:0, are untranslated, and are physical address bits. Accordingly, if the virtual address is 32 bits, the upper 20 bits of the virtual address, i.e., bits 31:12, are translated based on the translation tables, and are cached in the TLB.
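A simplified C sketch of the translation path just described, assuming 4 KB pages and a hypothetical fully associative TLB: bits 11:0 pass through untranslated, while the translation of bits 31:12 is looked up.

#include <stdbool.h>
#include <stdint.h>

#define TLB_ENTRIES 64  /* illustrative TLB size */
#define PAGE_SHIFT  12  /* 4 KB pages */

/* One cached translation: virtual page number -> physical frame number. */
struct tlb_entry { uint32_t vpn, pfn; bool valid; };

static bool tlb_lookup(const struct tlb_entry tlb[TLB_ENTRIES],
                       uint32_t vaddr, uint32_t *paddr)
{
    uint32_t vpn = vaddr >> PAGE_SHIFT;  /* bits 31:12: translated */
    for (int i = 0; i < TLB_ENTRIES; i++)
        if (tlb[i].valid && tlb[i].vpn == vpn) {
            *paddr = (tlb[i].pfn << PAGE_SHIFT)            /* page base */
                   | (vaddr & ((1u << PAGE_SHIFT) - 1u));  /* bits 11:0: untranslated */
            return true;   /* TLB hit */
        }
    return false;          /* TLB miss: the translation tables must be walked */
}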
One side effect of a virtual memory system is that two different programs may access the same physical location in memory using two different virtual addresses. Consequently, caches ensure data coherency by using the physical address to keep track of the cached data. That is, the tags are physical addresses. Additionally, physical addresses should be used for the index. However, using physical addresses for the index may be detrimental to performance, for the reason now described.
The desire for larger caches continues, and the increase in integration densities of microprocessor integrated circuits has enabled modern microprocessors to employ relatively large caches. Borrowing from the examples above, assume a 64 KB four-way set associative cache with 32-byte cache lines in a paging system with 4 KB pages. Each set comprises 128 bytes of data in the four cache lines of the set. This results in 512 sets in the array. As was seen from the example above, the index would be address bits 13:5. However, we also observe that address bits 13:12 are translated address bits, i.e., virtual address bits, not physical address bits.
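The arithmetic behind these figures, written out as illustrative C constants:

#define CACHE_BYTES (64u * 1024u)                     /* 64 KB cache   */
#define LINE_BYTES  32u                               /* 32-byte lines */
#define WAYS        4u                                /* four-way sets */
#define SETS        (CACHE_BYTES / LINE_BYTES / WAYS) /* = 512 sets    */

/* 512 sets require 9 index bits. With bits 4:0 selecting a byte in the
 * line, the index is bits 13:5; with 4 KB pages only bits 11:0 are
 * physical before translation, so index bits 13:12 are virtual. */
_Static_assert(SETS == 512u, "9 index bits");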
One solution is to wait for the TLB to translate virtual address bits 13:12 and use the translated physical address bits 13:12 as the upper two bits of the index. However, this solution has the performance disadvantage that it takes longer to index the cache to obtain or store data, since we must wait for the TLB to perform its translation before physical address bits 13:12 can be used to index the cache. Potential consequences are that either the cycle time of the microprocessor must be increased, or another stage must be added to the microprocessor pipeline to accommodate the additional TLB lookup time without lengthening the cycle time.
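A sketch of this serialized, physically indexed lookup, reusing the types and helpers from the earlier fragments (struct set, struct tlb_entry, tlb_lookup); the key point is that set selection cannot begin until the TLB has supplied the physical address.

/* Physically indexed lookup: the index needs physical bits 13:12, so
 * every access first waits for the TLB translation. */
static bool phys_indexed_lookup(const struct tlb_entry tlb[TLB_ENTRIES],
                                const struct set cache[NUM_SETS],
                                uint32_t vaddr)
{
    uint32_t paddr;
    if (!tlb_lookup(tlb, vaddr, &paddr))     /* serialize behind the TLB */
        return false;                        /* (miss handling elided)   */
    uint32_t index = (paddr >> 5) & 0x1FFu;  /* physical bits 13:5       */
    uint32_t tag   = paddr >> 14;            /* physical bits 31:14      */
    for (int w = 0; w < NUM_WAYS; w++)
        if (cache[index].valid[w] && cache[index].tag[w] == tag)
            return true;
    return false;
}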
To avoid the performance penalty associated with waiting for the TLB to provide the translated physical address bits needed for the index, the microprocessor may use some of the virtual address bits in the index, such as virtual address bits 13:12 in the example above. A cache that uses some virtual address bits for its index is referred to as a virtual set cache.
Inventors: G. Glenn Henry, Rodney E. Hooker
Attorneys: E. Alan Davis, James W. Huffman
Assignee: IP-First L.L.C.
Examiners: Mehdi Namazi, Donald Sparks