Method and apparatus for reducing latency in set-associative...

Electrical computers and digital processing systems: memory – Address formation – Generating prefetch – look-ahead – jump – or predictive address

Reexamination Certificate


Details

U.S. Classification: C711S128000, C711S204000
Type: Reexamination Certificate (active)
Patent Number: 06418525


BACKGROUND OF THE INVENTION
1. Technical Field
The present invention relates to computer memory systems; and in particular to a method and apparatus for reducing access latency in set-associative caches.
2. Discussion of Related Art
Cache memory is typically a small, high-speed buffer located between the central processing unit (CPU) and main memory. The cache temporarily holds those contents of main memory believed to be currently in use. Decisions about which cache contents to replace are generally based on a least recently used (LRU) algorithm: the cache locations that were least recently used are replaced with the contents of main memory that were most recently referenced. Information in cache memory can be accessed in far less time than information in main memory, so the CPU wastes less time waiting for instructions and/or operands to be fetched from or stored in cache.
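The LRU replacement policy just described can be modeled in a few lines; the following is a minimal sketch, with the cache capacity and access sequence chosen arbitrarily for illustration:

```python
from collections import OrderedDict

class LRUCache:
    """Tiny model of LRU replacement: when a new line must be
    installed in a full cache, the least recently used entry is evicted."""
    def __init__(self, capacity):
        self.capacity = capacity
        self.lines = OrderedDict()  # address -> data, oldest first

    def access(self, addr, data=None):
        if addr in self.lines:                 # hit: mark as most recently used
            self.lines.move_to_end(addr)
            return self.lines[addr]
        if len(self.lines) >= self.capacity:   # miss in a full cache:
            self.lines.popitem(last=False)     # evict the LRU entry
        self.lines[addr] = data                # install the new line
        return data

cache = LRUCache(2)
cache.access(0x100, "a")
cache.access(0x200, "b")
cache.access(0x100)          # touching 0x100 makes 0x200 the LRU entry
cache.access(0x300, "c")     # evicts 0x200, not 0x100
```

The `OrderedDict` keeps the recency ordering that real hardware approximates with per-set LRU state bits.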
A direct-mapped cache limits the storage of the contents of any particular location in main memory to a single specific location in cache. In contrast, an M-way set-associative cache maps the contents of each main memory location into any of M locations in cache; essentially, the M-way set-associative cache is a combination of M identical direct-mapped caches. However, access and retrieval from M-way set-associative caches is more complex: on every memory access, each of the M constituent direct-mapped caches must be searched, and the appropriate data selected and multiplexed to the output if there is a match. If a miss occurs, a choice must be made among the M possible cache lines as to which one must be evicted and rewritten with more recently used contents of main memory.
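The contrast between the two mappings can be sketched as follows; the total line count and associativity are hypothetical values chosen only for illustration:

```python
# Cache of 8 lines total (sizes illustrative only).
NUM_LINES = 8   # direct-mapped: 8 rows of one line each
M = 4           # set-associative: M candidate locations per congruence class

def direct_mapped_slot(block_addr):
    # Exactly one legal cache location for each memory block.
    return block_addr % NUM_LINES

def set_associative_slots(block_addr):
    # The block may reside in any of M locations (ways) of one row.
    rows = NUM_LINES // M
    row = block_addr % rows
    return [(row, way) for way in range(M)]

one_slot = direct_mapped_slot(0x2A)        # a single index
m_slots = set_associative_slots(0x2A)      # M candidate (row, way) pairs
```

The M candidate slots are why every set-associative access must search M tags, while the direct-mapped cache checks exactly one.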
FIG. 1 illustrates a virtually-tagged 4-way set-associative cache memory of the prior art comprising a cache directory 10, a cache array 12, a directory mux 14 and an array mux 16. The cache directory 10 comprises virtual addresses for each corresponding location in the cache array 12. The cache array 12 stores the contents of the main memory location pointed to by the corresponding location or block in the cache directory 10. A set is defined as a column in the cache array 12 and the corresponding column in the cache directory 10. A congruence class is defined as a row in the cache array 12 and the corresponding row in the cache directory 10. A block or a location is defined as the intersection of a particular set (column) and a particular congruence class (row). A location or block comprises one or more bytes of data.
An address 18 supplied to the cache memory comprises a directory tag 20, a congruence class 22 and a block offset 24. The directory tag 20 is used to select the desired set (column) in the cache directory 10 via the directory mux 14. The congruence class 22 is used to select the desired congruence class (row) of both the cache directory 10 and the cache array 12. The block offset 24 is used to select the desired byte within the desired block or location. The output of the directory mux 14 is used to select the desired set (column) of the cache array 12 via the array mux 16.
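The split of an address into these three fields can be sketched with simple bit arithmetic; the field widths below are hypothetical, chosen only to make the example concrete:

```python
# Hypothetical 32-bit address layout (widths illustrative only):
# [ tag | congruence class index | block offset ]
OFFSET_BITS = 6    # 64-byte blocks
INDEX_BITS  = 7    # 128 congruence classes

def split_address(addr):
    offset = addr & ((1 << OFFSET_BITS) - 1)                 # byte within block
    index  = (addr >> OFFSET_BITS) & ((1 << INDEX_BITS) - 1) # row select
    tag    = addr >> (OFFSET_BITS + INDEX_BITS)              # directory compare
    return tag, index, offset

tag, index, offset = split_address(0x12345678)
```

The index field drives the row select of both directory and array in parallel, while the tag field is held for the directory compare.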
The latency in accessing associative caches is higher than the latency in accessing direct-mapped caches due to the necessity of comparing the address against the tags stored across multiple sets of the cache directory 10. If a match occurs, the set associated with the matching tag is used to select output from the corresponding set of the cache array 12. The output of the cache array 12 is ultimately routed to registers and functional units. The so-called "late select problem" refers to the need for addresses to go through a cache directory 10 lookup and potentially address translation (if a physically-tagged cache is used) before the appropriate set of the cache array 12 can be selected. Thus, the late select problem adversely impacts latency in a set-associative cache.
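The late-select dependency can be modeled as follows; the directory row, array row, and tag values are hypothetical stand-ins for the hardware structures described above:

```python
M = 4  # 4-way set-associative, as in FIG. 1

def late_select_read(directory_row, array_row, addr_tag):
    """Model of the late-select path: the array output cannot be
    chosen until the tag compare across all M sets completes."""
    for way in range(M):                  # compare the tag against every set
        if directory_row[way] == addr_tag:
            return array_row[way]         # matching way drives the array mux
    return None                           # miss: no set can be selected

# One congruence class: M tags and the M cache lines they guard.
directory_row = [0x1A, 0x2B, 0x3C, 0x4D]
array_row = ["line0", "line1", "line2", "line3"]

data = late_select_read(directory_row, array_row, 0x3C)   # hit in way 2
miss = late_select_read(directory_row, array_row, 0x99)   # no tag matches
```

In hardware the M compares run in parallel, but the array mux select is still serialized behind them, which is the latency penalty the invention targets.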
Therefore, it would be advantageous if set selection information could be made available prior to searching the cache directory and translating the address.
Further details regarding caches can be found in the following references, which are hereby incorporated by reference:
1. U.S. Pat. No. 5,634,119 to Emma et al.
2. Chang, Sheldon S. L., Electrical and Computer Engineering III (1983).
3. Smith, Alan J., "Cache Memories," ACM Computing Surveys, Vol. 14 (1982).
4. Cekleov, M. and Dubois, M., "Virtual-Address Caches," IEEE Micro (1997).
SUMMARY OF THE INVENTION
In accordance with illustrative embodiments of the present invention, a method for reducing access latency in set-associative caches is provided wherein data is read from locations of a memory selectable through at least one selecting cache, the method comprising the steps of generating set selection information, and storing the set selection information in a location that enables the set selection information to be made available for retrieval of data from the memory prior to the arrival of memory select information from the selecting cache.
In another aspect, an apparatus for reducing access latency in set-associative caches is provided, comprising a storage for storing set selection information; an M-way set-associative cache receiving an address and outputting M sets of data determined by the address; and a multiplexer for multiplexing one of the set selection information and a set-associative address, wherein said set selection information is made available prior to said set-associative address for accessing said data.
In a further aspect, an apparatus for reducing power consumption of set-associative caches is provided, comprising a set selection storage for storing set selection information; an M-way set-associative cache comprising an array and a directory, the directory outputting a set-associative tag portion of an address to the array; and a multiplexer for multiplexing one of said tag portion of the address from said directory and said set selection information for outputting one of the M sets of data.
Further in accordance with the present invention, a method of increasing the access speed of a set-associative memory using data addresses is provided. The addresses comprise an offset portion, a congruence class index, and a tag portion. The set-associative memory comprises an array and a directory. The array stores data and is partitioned into a plurality of array congruence classes. Each array congruence class is partitioned into array sets, and each array set comprises a cache line, which comprises a plurality of data. The directory is partitioned into a plurality of directory congruence classes. Each directory congruence class is partitioned into directory sets, each comprising a directory entry. The directory entry comprises an address tag and other status information, including valid bits, parity, etc. The directory is partitioned such that there is a one-to-one correspondence between the directory entries and the cache lines, so that each address tag is associated with one of the cache lines.
Preferably, the method comprises the steps of: accessing contents of sets of a single array congruence class using the congruence class index, the single array congruence class being specified by the congruence class index; accessing contents of sets of a single directory congruence class using the congruence class index, the single directory congruence class being specified by the congruence class index; generating set selection information; utilizing the set selection information to select the sets of the array congruence class; outputting the data from the cache line in the selected set; comparing the tag portion to the address tags of the selected sets of the directory congruence class; comparing the selected set to the set selection information if one of the address tags in the selected congruence class is equal to the tag portion of the address; and outputting a first control signal to indicate that the access was unsuccessful, and that the data output from the cache line is invalid, if none of the address tags in the selected congruence class is equal to the tag portion of the address.
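The access flow of these steps can be sketched as follows; a hedged model in which the set-selection storage is a simple table indexed by congruence class, and all names and structures are hypothetical stand-ins for the hardware described:

```python
M = 4

def predicted_access(pred_table, directory, array, tag, index):
    """Speculatively read the array using stored set selection
    information, then verify the guess against the directory compare."""
    guess = pred_table[index]                  # set selection information
    data = array[index][guess]                 # array read starts early
    for way in range(M):                       # directory compare proceeds
        if directory[index][way] == tag:
            if way == guess:
                return data, True              # prediction confirmed
            pred_table[index] = way            # repair the stored selection
            return array[index][way], False    # fall back to late select
    return None, False                         # miss: output data is invalid

# One congruence class of a 4-way cache (values illustrative only).
directory = {0: [0x10, 0x20, 0x30, 0x40]}
array = {0: ["d0", "d1", "d2", "d3"]}
pred_table = {0: 1}                            # currently predicts way 1

hit = predicted_access(pred_table, directory, array, 0x20, 0)        # guess correct
mispredict = predicted_access(pred_table, directory, array, 0x30, 0) # guess wrong
```

The point of the scheme is visible in the control flow: when the guess is confirmed, the data was available before the directory compare finished; only a misprediction pays the full late-select latency.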
