Electrical computers and digital processing systems: memory – Storage accessing and control – Hierarchical memories
Reexamination Certificate
2002-01-23
2002-10-29
Kim, Matthew (Department: 2186)
C711S128000, C711S207000
active
06473835
ABSTRACT:
FIELD OF THE INVENTION
This invention relates generally to computer temporary storage and, more particularly, to computer processor cache operations.
BACKGROUND OF THE INVENTION
The central processor of a computer manipulates data in main storage that is a copy of data maintained in a non-volatile storage unit, such as direct access storage devices (DASD) comprising magnetic or magneto-optical disk drive units. In manipulating data, system programmers work with “real” addresses of data in main storage that are associated with corresponding data that are cross-referenced with virtual addresses in the DASD storage. The central processor unit (CPU) is typically implemented as part of an integrated circuit processor chip, which is constructed with various registers, data lines, main memory, and associated electronic components that are incorporated into a single package for mounting to a circuit board or module.
Data values with real addresses in main memory volatile storage must be synchronized, or cross-referenced, with the virtual addresses of the DASD storage. Such cross-referencing is generally accomplished with a Page Frame Table (PFT) kept in the main memory. The PFT is organized with a real address (RA) column and a virtual address (VA) column. In each row of the PFT, an entry in the RA data column cross-references main memory data to a DASD data location specified by the entry in the VA data column. With a load reference to an RA and the PFT, the VA of the data in the DASD can be found. In this way, data can be retrieved and stored between the main storage and the DASD storage. A typical size for main storage is 1 to 16 gigabytes (GB) of memory. Data pages of 4K bytes are cycled into main storage with each DASD data operation, so that each entry in the PFT references a 4K page of data (unless otherwise specified, references to size of data storage will be understood to be made with reference to bytes of data).
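The cross-referencing described above can be sketched as a small lookup table. This is only an illustration under stated assumptions, not the patent's implementation: the class name `PageFrameTable`, its methods, and the example addresses are hypothetical. Only the 4K page size and the pairing of an RA entry with a VA entry in each row come from the text.

```python
PAGE_SIZE = 4 * 1024  # 4K-byte pages, as stated in the text


class PageFrameTable:
    """Hypothetical PFT: each row pairs a real page with a virtual (DASD) page."""

    def __init__(self):
        self._rows = {}  # real page number -> virtual page number

    def map_page(self, real_addr, virtual_addr):
        """Record that the page holding real_addr is backed by the page holding virtual_addr."""
        self._rows[real_addr // PAGE_SIZE] = virtual_addr // PAGE_SIZE

    def virtual_address(self, real_addr):
        """Return the VA for real_addr, or None if the page has no PFT row."""
        page, offset = divmod(real_addr, PAGE_SIZE)
        virtual_page = self._rows.get(page)
        if virtual_page is None:
            return None
        return virtual_page * PAGE_SIZE + offset


pft = PageFrameTable()
pft.map_page(real_addr=0x3000, virtual_addr=0x7A000)  # made-up example addresses
print(hex(pft.virtual_address(0x3010)))  # same offset within the mapped 4K page
```

Because each row covers a whole 4K page, the sketch keeps the byte offset within the page unchanged and translates only the page number.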
The main storage in a computer typically has an access cycle time of approximately 500 processor cycles, so going to the PFT to retrieve and store data 4K at a time delays data operations and wastes processor resources. It is anticipated that processors will soon have a cycle time of only 1 nanosecond (ns) or less for execution of instructions. Even at this speed, waiting 500 processor cycles (500 ns) to retrieve a page of data remains inefficient.
Cache storage is used to temporarily store smaller blocks of data and can be accessed more quickly than main memory. Data can be stored and retrieved in cache memory rather than going to the main memory storage for each data reference. Thus, cache can speed up processing. Cache on the same chip as the processor can be accessed within a few processor cycles, but is limited in size because of fabrication and cost constraints. While the PFT may contain 1M entries, a cache is more expensive to implement than main storage, and therefore has fewer entries than the PFT. Therefore, it is conventional to use multiple levels of cache, on multiple chips. Cache references are generally to 4K pages of data.
A cache directory indicates whether the data identified by an RA is stored in the cache, which is generally regarded as part of main memory. Thus, a processor instruction that references an RA can be provided to the cache. If the RA is in the cache directory, then the processor knows it can get the referenced data from the main memory cache storage rather than from the DASD. If the RA is not in the cache directory, a condition called a cache miss, then the processor must check with the PFT to find the VA of the data and then, using the VA, obtain the data from the DASD. Thus, data references must proceed through a translation process that checks the referenced RA to find if it is located in the cache directory, and to retrieve the data from DASD (via the PFT) if it is not in main storage. Similarly, a store reference to an RA must proceed through a translation process to ensure storage at the proper DASD location.
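The load path described above (check the cache directory, and on a miss go through the PFT to the DASD) can be sketched as follows. This is a simplified illustration, not the patent's mechanism: the function `load` and the use of plain dictionaries for the cache, the PFT, and the DASD are assumptions, as is the step of copying a fetched page into the cache after a miss.

```python
PAGE_SIZE = 4 * 1024  # 4K-byte pages, as stated in the text


def load(real_addr, cache, pft, dasd):
    """Return (page_data, source) for a load reference to real_addr.

    cache: dict of real page number -> page data (stands in for directory + data cache)
    pft:   dict of real page number -> virtual page number
    dasd:  dict of virtual page number -> page data
    """
    page = real_addr // PAGE_SIZE
    if page in cache:             # RA found in the cache directory: a hit
        return cache[page], "cache"
    virtual_page = pft[page]      # cache miss: consult the PFT for the VA
    data = dasd[virtual_page]     # use the VA to obtain the data from the DASD
    cache[page] = data            # assumption: keep a copy for later references
    return data, "dasd"


# Made-up example contents.
cache, pft, dasd = {}, {3: 0x7A}, {0x7A: b"page contents"}
print(load(0x3010, cache, pft, dasd)[1])  # first reference misses the cache
print(load(0x3020, cache, pft, dasd)[1])  # same page is now served from the cache
```

The sketch raises `KeyError` for a page with no PFT row; how such a reference is handled is not covered by the text.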
It is preferable to implement a cache on the same chip as a processor, and cache storage typically includes a cache directory of contents, a data cache with the actual data values, and a corresponding address cache of RA and VA information. The cache operates under control of a cache controller. The address cache is typically referred to as a translation look-aside buffer (TLB), because it stores addresses and translates between RA and VA information. To accommodate a large number of addresses in the limited width of the processor data bus, it is typical to index the RA space in the cache with only the low-order bits of an address. For example, if four bits are used to index the cache, an RA can reference a cache with sixteen rows of data, entries whose low order bits are numbered from zero to fifteen. These sixteen rows of RA can be viewed as a first column of the cache TLB. To reference more than sixteen entries, the cache must contain additional columns.
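The four-bit indexing example above can be illustrated with a short sketch. The names `INDEX_BITS`, `NUM_ROWS`, and `cache_row`, and the sample addresses, are hypothetical. The only point carried over from the text is that the low-order bits of the address select one of sixteen rows.

```python
INDEX_BITS = 4              # four low-order bits index the cache, per the example
NUM_ROWS = 1 << INDEX_BITS  # 2**4 = 16 rows, numbered 0 through 15


def cache_row(real_addr):
    """Select a cache row from the low-order INDEX_BITS bits of the address."""
    return real_addr & (NUM_ROWS - 1)


# Two different addresses that share the low-order bits 0101 select the same row.
print(cache_row(0b0001_0101), cache_row(0b0010_0101))
```

Any two addresses that agree in their low-order four bits map to the same row, which is why additional columns are needed to hold more than sixteen entries.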
If an entry in the four-bit indexed cache has an RA ending in ‘0101’, for example, and if a second entry is received also with an RA ending in ‘0101’, then a second column of cache will be needed to properly enter the additional data value in the cache. The cache directory will still be sixteen bits (16 rows), but the TLB will comprise two columns, each of sixteen rows. It should be apparent that a relatively large TLB might be necessary to ensure that the real address of the requested data can be generated by the TLB without going to the PFT. In the example, the TLB entry must identify which of the two cache directory columns corresponds to the requested data value, but the cache directory will still reference only the four low-order bits, in accordance with the cache directory indexing scheme. That is, the TLB must be sufficiently wide (for example, 16 bits wide) to identify the RA of the cache entry, whereas the cache directory need be only as wide as the number of low-order bits used to index the cache (that is, wide enough to address one column, or four bits in the example).
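The two-column arrangement in this example can be sketched as a table with sixteen rows and two columns, where each slot records the full RA so that a lookup can tell which column holds the requested entry. This is an illustrative sketch only: the class `TwoColumnCache` and its methods are hypothetical, and what happens when both columns of a row are occupied is deliberately left unmodeled.

```python
INDEX_BITS = 4  # four low-order bits select the row, as in the example
WAYS = 2        # two columns per row


class TwoColumnCache:
    """Hypothetical 16-row, two-column table keyed by the low-order RA bits."""

    def __init__(self):
        self.rows = [[None] * WAYS for _ in range(1 << INDEX_BITS)]

    def _row(self, real_addr):
        return self.rows[real_addr & ((1 << INDEX_BITS) - 1)]

    def insert(self, real_addr):
        """Store real_addr and return its column, or None if the row is full."""
        row = self._row(real_addr)
        for column, held in enumerate(row):
            if held is None or held == real_addr:
                row[column] = real_addr
                return column
        return None  # both columns occupied; replacement is not modeled here

    def find(self, real_addr):
        """Return the column holding real_addr, or None if it is absent."""
        for column, held in enumerate(self._row(real_addr)):
            if held == real_addr:
                return column
        return None


table = TwoColumnCache()
first = table.insert(0b0001_0101)   # RA ending in 0101 goes into the first column
second = table.insert(0b0010_0101)  # second RA ending in 0101 needs the second column
print(first, second, table.find(0b0010_0101))
```

Storing the full RA in each slot reflects the point made above: the row index alone uses only four bits, but identifying the correct column requires comparing the wider address.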
The additional columns of data cache are referred to as levels of associativity. It has been found to be most efficient if a TLB is constructed as a two-way or four-way associative table (that is, two or four columns of TLB are used to contain the address of the corresponding PFT address entries). A four-way TLB, for example, will be approximately one-fourth the size of the corresponding PFT.
In a conventional processor, it is known to include a second-level cache with a TLB and a corresponding data cache, referred to as L2 TLB and L2 data cache, respectively, for increased data efficiency. The L2 cache may not be on the same chip as the processor and L1 cache. If a miss in the first-level (L1) cache occurs, the address is provided to the L2 TLB and checked against L2 cache. The L2 cache has been found useful because access operations and TLB access operations need to be performed in approximately the same time, and a typical miss rate for L1 cache is 3%. If there is only L1 cache, then each cache miss results in a PFT operation, which requires a wait of 500 processor cycles. The resulting wait time of 500 ns with each L1 miss is too great a penalty with a 3% L1 miss rate.
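The cost of having only a first-level cache can be put in rough numbers. The sketch below is a back-of-the-envelope calculation, not a figure from the patent: it assumes the average stall per reference is simply the miss rate multiplied by the miss penalty, and the function name `average_stall_cycles` is hypothetical. The 3% miss rate, the 500-cycle penalty, and the 1 ns cycle time are the values given in the text.

```python
MISS_PENALTY_CYCLES = 500  # wait for a PFT operation, per the text
CYCLE_TIME_NS = 1.0        # anticipated processor cycle time, per the text
L1_MISS_RATE = 0.03        # typical first-level miss rate, per the text


def average_stall_cycles(miss_rate, penalty_cycles):
    """Average cycles lost per reference, assuming every miss pays the full penalty."""
    return miss_rate * penalty_cycles


stall_cycles = average_stall_cycles(L1_MISS_RATE, MISS_PENALTY_CYCLES)
stall_ns = stall_cycles * CYCLE_TIME_NS
print(f"average stall per reference: about {stall_cycles:.0f} cycles ({stall_ns:.0f} ns)")
```

Under this simple model, a first-level cache that hits within a few cycles is still dominated by the stall time contributed by its misses.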
The L2 cache provides a larger data space than the L1 cache, but is sufficiently small to provide relatively fast access times (on the order of 3 processor cycles). A typical size for a large L2 data cache currently is 4M to 8M of data. The miss rate for the L2 cache is usually on the order of 0.1%. It may be desirable to include yet another cache, a level-three (L3) cache and corresponding L3 TLB. The L3 cache typically would be larger than the L2 cache, on the order of 128M of data (64K rows, referencing 4K pages). The L3 cache, however, probably could not fit on the same chip with the processor and the L1 cache, along with the L2 cache. This limits the data transfer speed, because the on
Anderson Matthew D.
Kim Matthew
Truelson Roy W.