Method and apparatus for minimizing dcache index match...

Electrical computers and digital processing systems: memory – Storage accessing and control – Hierarchical memories

Details

U.S. Classifications: C711S202000, C711S203000, C711S206000, C711S210000, C711S216000

Type: Reexamination Certificate

Status: active

Patent Number: 06253285

ABSTRACT:

BACKGROUND OF THE INVENTION
In general, main memory access is relatively slow compared to central processing unit (CPU) execution times. Therefore, most CPU architectures include one or more caches. A cache is a high-speed memory which can be associated with a small subset of referenced main memory. Because most memory reference patterns only require a small subset of the main memory contents, a relatively smaller, high-speed cache can service many of the memory references.
For example, instruction caches can improve efficiency because software programs often loop within a small section of code. Holding those instructions in a high-speed, local instruction cache allows them to be accessed much faster. Data caches can likewise improve efficiency because data access tends to follow the principle of locality of reference. Requiring each access to go to the slower main memory would be costly. The situation can be even worse in a multi-processor environment, where several CPUs may contend for a common bus.
Data cache systems in some configurations comprise both a data store and a tag array. The data store holds data copied from the main memory. Each tag array location holds a tag, or physical page address, for a block of consecutive data held in the data store in association with the tag location.
During a memory access, a virtual page address from the CPU core is translated by a page translator into a physical page address. The remainder of the address, or a portion thereof, is used to index into the tag array. The tag retrieved from the indexed tag array entry is compared with the translated physical page address: a match indicates that the referenced data is in the data store, while a mismatch indicates that the data must be retrieved from main memory. Page translation occurs in parallel with the tag array lookup, minimizing delay.
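To make this lookup sequence concrete, below is a minimal C sketch of the flow just described; the structure names, the 64-byte block size, the 8 KB page, and the tlb_translate() helper are assumptions for illustration, not details taken from the patent.

```c
/* A minimal sketch of the lookup described above (illustrative names
 * and sizes, not the patent's design): a direct-mapped cache with
 * 64-byte blocks, an 8 KB page, and an assumed tlb_translate() helper. */
#include <stdint.h>
#include <stdbool.h>

#define NUM_SETS 128u                      /* index drawn from bits <12:6> */

typedef struct {
    uint64_t tag;                          /* physical page number */
    bool     valid;
    uint8_t  data[64];                     /* one block of the data store */
} cache_line_t;

static cache_line_t lines[NUM_SETS];

uint64_t tlb_translate(uint64_t vpn);      /* page translator (assumed) */

bool cache_lookup(uint64_t va, uint8_t **block_out) {
    unsigned idx = (unsigned)(va >> 6) & (NUM_SETS - 1); /* index tag array and data store */
    uint64_t ppn = tlb_translate(va >> 13);              /* in parallel in hardware */
    cache_line_t *line = &lines[idx];
    if (line->valid && line->tag == ppn) {               /* comparator: match => hit */
        *block_out = line->data;
        return true;
    }
    return false;                                        /* mismatch => go to main memory */
}
```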
A need also exists in multiprocessor systems to test the contents of the data cache system from outside the CPU. Several processors may reference the same physical address in memory. Besides looking up its own local cache, each CPU must check the caches of other CPUs in the system. Failure to do so would result in data incoherency between the individual caches as each CPU reads and writes to its own local copy of the same data from main memory.
To prevent this incoherency, a CPU sends “probes” to other CPUs during a memory reference. Each data cache system receiving a probe uses a physical address provided by the probe to look into its own tag array. If the data resides in its data store, the data cache system responds to the probing CPU accordingly allowing ownership arbitration to take place.
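The probe path can be sketched the same way; because the probe carries a physical address, the tag array is indexed with physical bits and compared directly. This reuses the illustrative structures from the sketch above.

```c
/* Probe handling sketch: the probing CPU supplies a physical address,
 * so the tag array is indexed with physical bits and the tags are
 * compared directly. Reuses the assumed cache_line_t/lines from above. */
bool probe_hit(uint64_t pa) {
    unsigned idx = (unsigned)(pa >> 6) & (NUM_SETS - 1);
    cache_line_t *line = &lines[idx];
    /* A hit tells the probing CPU that this cache holds a copy, so
     * ownership arbitration can take place. */
    return line->valid && line->tag == (pa >> 13);
}
```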
A problem with caches is that they are susceptible to reference patterns in which memory references collide such that the entire cache is not utilized, e.g., where two memory addresses with different page addresses but the same index value are referenced. Due to the common index, each memory reference will cause different data to be loaded to the same cache location, negating any beneficial effect of the cache. Unfortunately, these reference patterns, also known as “power-of-two stride” patterns, are somewhat common in many important software applications.
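For a concrete illustration, assume a 9-bit index drawn from address bits <14:6> with 64-byte blocks; any two addresses exactly 2^15 bytes apart then share an index and repeatedly evict one another:

```c
/* Power-of-two stride collision, illustrated with an assumed 9-bit
 * index over address bits <14:6> and 64-byte blocks. */
#include <stdio.h>
#include <stdint.h>

static unsigned plain_index(uint64_t addr) {
    return (unsigned)(addr >> 6) & 0x1FF;  /* bits <14:6> */
}

int main(void) {
    uint64_t a = 0x10000;                  /* arbitrary example address */
    uint64_t b = a + (1ull << 15);         /* 2^15-byte stride */
    /* Both map to the same set, so they evict each other while the
     * rest of the cache goes unused. */
    printf("index(a)=%u index(b)=%u\n", plain_index(a), plain_index(b));
    return 0;
}
```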
Set associative caches partially solve this problem by having more than one storage location for each index value, although they incur the additional cost of multiple port lookups into the cache tag and data arrays and additional hardware to decide in which of the locations to store a tag. For example, in a 2-way set associative cache, for each index value there are two possible locations into which data can be loaded. Thus it is not necessary to write over the previously loaded data. Of course, this does not fully resolve the problem if the power-of-two stride pattern comprises three or more colliding addresses.
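A 2-way lookup can be sketched as follows, again reusing the assumed structures from the earlier sketch; note that a third colliding address would still force an eviction.

```c
/* Illustrative 2-way set-associative lookup: each index now selects a
 * set of two lines, so two colliding addresses can coexist. */
typedef struct {
    cache_line_t way[2];                   /* two locations per index value */
} cache_set_t;

static cache_set_t sets2[NUM_SETS];

bool set_assoc_lookup(uint64_t va, uint8_t **block_out) {
    unsigned idx = (unsigned)(va >> 6) & (NUM_SETS - 1);
    uint64_t ppn = tlb_translate(va >> 13);
    for (int w = 0; w < 2; w++) {          /* both ways compared in parallel in hardware */
        cache_line_t *line = &sets2[idx].way[w];
        if (line->valid && line->tag == ppn) {
            *block_out = line->data;
            return true;
        }
    }
    return false;                          /* a third colliding address still evicts */
}
```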
Another method for dealing with the power-of-two stride problem hashes addresses into different locations such that collisions generated by 2^m (or close to 2^m) reference patterns, for some integer m, are minimized. For example, U.S. Pat. No. 5,509,135 (Steely), “Multi-Index Multi-Way Set-Associative Cache”, uses a different hashing function for each of the ways within a set. In another implementation targeted for direct-mapped caches, U.S. Pat. No. 5,530,958 (Agarwal), “Cache Memory System and Method With Multiple Hashing Functions and Hash Control Storage”, a first hashing function is applied to create a cache index. If this results in a cache miss, a second hashing function is then applied, resulting in a different index, and so on.
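As an illustration of index hashing in general (deliberately not the scheme of either cited patent), a hash might XOR-fold higher address bits into the index so that addresses 2^15 apart land in different sets:

```c
/* Illustrative index hashing (not the scheme of either cited patent):
 * XOR-fold higher address bits into a 9-bit index so that addresses
 * 2^15 apart are spread across different sets. */
static unsigned hashed_index(uint64_t addr) {
    unsigned low  = (unsigned)(addr >> 6)  & 0x1FF;  /* plain index bits <14:6> */
    unsigned high = (unsigned)(addr >> 15) & 0x1FF;  /* bits <23:15> */
    return low ^ high;       /* 2^15-stride neighbors now differ in the index */
}
```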
SUMMARY OF THE INVENTION
In the context of a modern microprocessor pipeline, load-to-use latency can be minimized for data cache hits by using a portion of the virtual address directly to index the cache. If the index includes only the unmapped portion of the address, this method can be safely used because the index bits are, in effect, physical address bits.
The goal of most hashing techniques, however, is to incorporate the higher-order bits of the virtual address into the index. If those virtual page address bits are used to index the primary cache as discussed above, a single physical address can be cached in multiple locations. Sufficient store activity, i.e., stores through the various virtual addresses which map to the same physical address, can then lead to memory coherence problems.
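The hazard can be seen with two hypothetical virtual addresses that the operating system maps to the same physical page: if they differ in VA<13>, a virtually-indexed cache places the same physical data in two different lines, and stores through one mapping leave the other copy stale.

```c
/* Synonym hazard, illustrated: two assumed virtual addresses that map
 * to the same 8 KB physical page but differ in VA<13> land in two
 * different lines of a virtually-indexed cache. */
#include <stdio.h>
#include <stdint.h>

static unsigned virt_index(uint64_t va) {
    return (unsigned)(va >> 6) & 0x1FF;    /* VA<14:6>, two mapped bits included */
}

int main(void) {
    uint64_t va1 = 0x0000;                 /* hypothetical mapping #1 */
    uint64_t va2 = 0x2000;                 /* hypothetical mapping #2, same physical page */
    printf("line %u vs line %u\n", virt_index(va1), virt_index(va2));
    return 0;                              /* prints two different lines */
}
```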
Cache inefficiency resulting from power-of-two strides can be reduced by using a hash function operating on part of the virtual address to construct the cache index. However, hashing the virtual address to create the whole cache index would require that probes check all 2^n possible combinations, where n is the size of the index, which is impractical.
The present invention solves this problem by employing a duplicate tag structure which is physically indexed and covers every cache location formed by concatenating each combination of the mapped virtual address bits with the unmapped address bits. This guarantees that only one copy of any given physical address is resident in the cache at a time, independent of the number of virtual mappings. The resolution of multiple virtual references to the same physical reference is referred to as synonym processing.
Given a duplicate tag structure or other means for synonym processing, the two bits VA<14,13> can be arbitrarily hashed using the upper virtual address bits, and the duplicate dcache tag structure will automatically solve all synonym processing issues.
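A sketch of the duplicate-tag idea, with assumed structure names: for a given physical address, a physically-indexed duplicate tag array is consulted at every combination of the two hashed bits concatenated with the unmapped index bits, so at most one of the four candidate locations can hold the address.

```c
/* Duplicate-tag sketch (assumed names): for a physical address, check
 * all four combinations of the two hashed bits VA<14,13> concatenated
 * with the unmapped index bits PA<12:6>. */
#include <stdint.h>
#include <stdbool.h>

extern uint64_t dtag[512];                 /* duplicate tag array, one entry per set */
extern bool     dtag_valid[512];

int find_synonym(uint64_t pa) {
    unsigned unmapped = (unsigned)(pa >> 6) & 0x7F;  /* PA<12:6>, same for all synonyms */
    for (unsigned hi = 0; hi < 4; hi++) {  /* all combinations of the two mapped bits */
        unsigned idx = (hi << 7) | unmapped;
        if (dtag_valid[idx] && dtag[idx] == (pa >> 13))
            return (int)idx;               /* at most one location may hold this address */
    }
    return -1;                             /* the block is not cached */
}
```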
Accordingly, a preferred embodiment of the present invention comprises hashing means, a data store, a tag array, a page translator, a comparator and a duplicate tag array. The hashing means hashes an index portion of a virtual address with a virtual page portion of the virtual address to form a cache index. The data store comprises a plurality of data blocks for holding data. The tag array comprises a plurality of tag entries corresponding to the data blocks, and both the data store and tag array are addressed with the cache index. The tag array provides a plurality of physical address tags corresponding to physical addresses of data resident within corresponding data blocks in the data store addressed by the cache index. The page translator translates a tag portion of the virtual address to a corresponding physical address tag. The comparator verifies a match between the physical address tag from the page translator and the plurality of physical address tags from the tag array, a match indicating that data addressed by the virtual address is resident within the data store. Finally, the duplicate tag array resolves synonym issues caused by hashing.
In the preferred embodiment, the hashing means is such that addresses which are equivalent mod 2^15 are pseudo-randomly displaced within the cache. The cache index comprises nine bits, and data blocks in the data store comprise sixty-four bytes.
In addition, the physical address tag is a physical page number, and a physical page comprises 8 KB. The preferred hashing means maps VA<14, 15 XOR 13, 12:6>.
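Read as a bit-level mapping (one plausible reading of the notation, with assumed helper names), the nine-bit index takes VA<14> as its top bit, VA<15> XOR VA<13> as the next bit, and the unmapped bits VA<12:6> as the low seven bits:

```c
/* Sketch of the stated mapping VA<14, 15 XOR 13, 12:6> (one reading of
 * the notation): a nine-bit cache index whose top bit is VA<14>, whose
 * next bit is VA<15> XOR VA<13>, and whose low seven bits are the
 * unmapped bits VA<12:6>. */
#include <stdint.h>

static unsigned preferred_index(uint64_t va) {
    unsigned b14  = (unsigned)(va >> 14) & 1u;
    unsigned bxor = (unsigned)((va >> 15) ^ (va >> 13)) & 1u;
    unsigned low  = (unsigned)(va >> 6) & 0x7Fu;  /* VA<12:6>, physical bits */
    return (b14 << 8) | (bxor << 7) | low;        /* nine-bit cache index */
}
```

Because VA<15> participates in the index, two addresses that are equal mod 2^15 now map to different sets, which is the displacement behavior described above.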
