Electrical computers and digital processing systems: memory – Storage accessing and control – Hierarchical memories
Reexamination Certificate
1999-12-22
2002-10-29
Hudspeth, David (Department: 2651)
C711S138000, C711S170000
active
06473834
ABSTRACT:
FIELD OF THE INVENTION
The present invention is directed to computer processors, and more particularly, to a queuing mechanism in a computer processor for preventing stalling of cache reads during return of multiple data words.
BACKGROUND
FIG. 1 is a block diagram of a prior art computer system that may comprise one or more central processing units (CPU), only one of which is illustrated in FIG. 1 at 10. The CPU 10 comprises a code unit (CU) 16, an execution unit (EU) 18, a reference unit (RU) 20, and a first-level cache (FLC) 22. The FLC 22 interfaces to a second-level cache (SLC) 12, which, in turn, interfaces to a main memory 14.
The code unit 16 retrieves instructions from the main memory 14 and partially decodes them. The reference unit 20 resolves memory references in the instructions decoded by the code unit 16. The execution unit 18 executes the decoded instructions after any memory references have been resolved and the data has been retrieved from the main memory 14 or one of the caches 12, 22.
When the reference unit 20 attempts to resolve a memory reference from an instruction, it passes a virtual memory address to an address conversion unit (ACU) (not shown in FIG. 1) that translates the virtual address into an absolute address. The ACU then passes the absolute address to the FLC 22. If the FLC 22 determines that the data at the referenced address is already present in its cache memory, the data is retrieved from the cache memory and passed to the execution unit 18. If the data is not present in the FLC, then the FLC initiates a request for the data to the SLC 12. If the data is present in the SLC 12, the SLC will retrieve the data from its cache memory and pass it to the execution unit 18. If the data is not present in the SLC 12, then the SLC will initiate a fetch operation to retrieve the data from the main memory 14. The data retrieved from the main memory 14 will then be passed to the execution unit 18, and a copy of the data will be stored in the SLC 12.
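The two-level lookup described above can be sketched as follows. This is a minimal, illustrative model, not the patent's implementation; the class and attribute names are assumptions, and the copy-back-on-miss behavior follows the text (data fetched from a lower level is copied into the level above on the way back).

```python
class Level:
    """A hypothetical cache/memory level; `backing` is the next level down."""

    def __init__(self, name, backing=None):
        self.name = name
        self.store = {}          # address -> data word
        self.backing = backing   # SLC below FLC, main memory below SLC

    def read(self, addr):
        if addr in self.store:                  # hit: serve from this level
            return self.store[addr], self.name
        data, src = self.backing.read(addr)     # miss: ask the next level
        self.store[addr] = data                 # keep a copy on the return trip
        return data, src

main = Level("main memory")
main.store[0x10] = "data-word"
flc = Level("FLC", backing=Level("SLC", backing=main))

print(flc.read(0x10))   # first access is satisfied from main memory
print(flc.read(0x10))   # second access hits in the FLC
```

On the first read the request falls through both caches to main memory and each level keeps a copy; the second read is a first-level hit.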
Data is fetched from the main memory 14 and stored in the FLC 22 and SLC 12 in four-word sets, i.e., each cache line comprises four words. Each word comprises six bytes of data. The FLC 22 is implemented as a two-way set associative cache memory, and the SLC 12 is implemented as a one-way set associative cache memory. Each cache memory contains a first random-access memory (RAM) (not shown) for storing the four-word data sets fetched from the main memory 14, and a second RAM (not shown) for storing the cache tag values associated with each four-word data set in the first RAM.
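The tag-RAM/data-RAM split and the two-way set-associative organization can be illustrated with a small sketch. The four-word line size and two ways are from the text; the set count, address split, and function names are assumptions made for the example.

```python
WORDS_PER_LINE = 4   # each cache line holds four words (per the text)
NUM_SETS = 8         # assumed set count for illustration
WAYS = 2             # FLC is two-way set associative

def split_address(addr):
    """Split a word address into (tag, set index, word-in-line)."""
    word = addr % WORDS_PER_LINE
    set_index = (addr // WORDS_PER_LINE) % NUM_SETS
    tag = addr // (WORDS_PER_LINE * NUM_SETS)
    return tag, set_index, word

# The two RAMs described in the text: one for tags, one for data lines.
tag_ram = [[None] * WAYS for _ in range(NUM_SETS)]
data_ram = [[None] * WAYS for _ in range(NUM_SETS)]

def lookup(addr):
    tag, s, w = split_address(addr)
    for way in range(WAYS):
        if tag_ram[s][way] == tag:      # tag match in either way: hit
            return data_ram[s][way][w]
    return None                          # miss

def fill(addr, four_words, way):
    tag, s, _ = split_address(addr)
    tag_ram[s][way] = tag
    data_ram[s][way] = list(four_words)

fill(0x40, ["w0", "w1", "w2", "w3"], way=0)
print(lookup(0x41))   # -> 'w1' (hit, word 1 of the filled line)
print(lookup(0x80))   # -> None (different tag, miss)
```

A one-way (direct-mapped) cache such as the SLC is the same structure with WAYS = 1, so each set holds exactly one line.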
FIG. 2 is a block diagram providing further details of the computer system of FIG. 1. As shown, the FLC 22 receives memory addresses and associated routing information from the ACU 24 of the reference unit 20 via a bus. Data retrieved from the FLC 22 or the SLC 12 is passed to the other units 16, 18, 20 of the CPU 10 via a processor return path 28. The FLC 22 and SLC 12 interface to the processor return path 28 via respective buses 34 and 38. Function f1 and multiplexer 26 represent the priority scheme for control of the processor return path 28. Only one of the caches 12, 22 can have access to the processor return path 28 at a time. The width of the processor return path is one word. Thus, both the FLC 22 and the SLC 12 must pass the four words of a given cache line to the processor return path 28 one word at a time. The SLC 12 has priority over the FLC 22.
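The fixed-priority arbitration performed by function f1 and multiplexer 26 reduces to a simple rule: whenever the SLC has a word to return, it wins the path. A one-function sketch, with the function name as an assumption:

```python
def arbitrate(slc_word, flc_word):
    """Fixed-priority select for the one-word return path:
    the SLC always wins when it has data; otherwise the FLC may drive it."""
    if slc_word is not None:
        return slc_word
    return flc_word

print(arbitrate("slc-w0", "flc-w0"))   # -> 'slc-w0' (SLC has priority)
print(arbitrate(None, "flc-w0"))       # -> 'flc-w0' (path free for the FLC)
```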
Logic implementing a second function, f0, in combination with a memory reference table (MRT), controls the flow of address and data information among the ACU 24, SLC 12, FLC 22, and processor return path 28, as described more fully below.
In use, the ACU 24 issues a memory request to the FLC 22. The request includes a memory address and routing information that specifies to which other part of the processor (EU, RU, or CU) the data should be routed over the processor return path 28. If there is a hit in the FLC 22 (i.e., the data is present), then the data is read out of the FLC 22 and delivered to the processor return path 28 via bus 34. A signal indicating whether a “hit” occurred is provided to the logic, f0.
If there is no hit, the logic, f0, forwards the request to the SLC 12 and makes an entry in the MRT 30. The entry comprises a job number associated with the request, a word number (i.e., the address of the requested memory word), and the routing information for the requested word.
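The miss path above amounts to two bookkeeping steps: record who asked in the MRT, and forward the address to the SLC. The entry fields (job number, word address, routing) come from the text; the data structures and function name are assumptions for illustration.

```python
from collections import namedtuple

# One MRT entry, with the three fields described in the text.
MRTEntry = namedtuple("MRTEntry", ["job", "word_addr", "route"])

mrt = []                 # pending-request table
slc_request_queue = []   # addresses forwarded to the SLC

def flc_miss(job, word_addr, route):
    """On an FLC miss: remember the requester, then ask the SLC."""
    mrt.append(MRTEntry(job, word_addr, route))
    slc_request_queue.append(word_addr)

flc_miss(job=7, word_addr=0x123, route="EU")
print(mrt[0])   # the pending entry awaiting the SLC's return
```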
The SLC 12 returns four data words at a time, one word per clock, to both the FLC 22 and the processor return path 28 (via bus 38). More specifically, the four words that are read out of the SLC 12 are stored as a new cache line in the FLC 22. The MRT 30 is then accessed to determine which pending requests are satisfied by the four words returned from the SLC 12. It is possible that the four words returned by the SLC 12 satisfy more than one pending request entry in the MRT 30. When a match is found in the MRT 30, the requested word, along with its routing information, is sent to the processor return path 28.
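Matching a returned line against the MRT can be sketched as a scan over pending entries: every entry whose word address falls within the aligned four-word line is satisfied and removed. This is an illustrative model only; names and the line-alignment assumption are mine.

```python
WORDS_PER_LINE = 4

def satisfy(mrt, line_addr, line_words):
    """Return (word, route) pairs for each MRT entry satisfied by this
    returned cache line, and prune those entries from the table."""
    sent, remaining = [], []
    for entry in mrt:
        offset = entry["word_addr"] - line_addr
        if 0 <= offset < WORDS_PER_LINE:               # entry hits this line
            sent.append((line_words[offset], entry["route"]))
        else:
            remaining.append(entry)                    # still pending
    mrt[:] = remaining
    return sent

mrt = [{"word_addr": 0x101, "route": "EU"},
       {"word_addr": 0x103, "route": "RU"},
       {"word_addr": 0x200, "route": "CU"}]
print(satisfy(mrt, 0x100, ["a", "b", "c", "d"]))  # two entries satisfied at once
print(len(mrt))                                    # one entry still pending
```

Note that a single returned line can satisfy several entries, which is exactly the multi-clock occupancy of the return path that the next paragraph identifies as a bottleneck.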
Because the SLC 12 has priority over the FLC 22 for control of the processor return path 28, a bottleneck can occur when a cache line (four words) returned by the SLC 12 satisfies multiple pending request entries in the MRT 30. In this situation, the processor return path 28 will be tied up for multiple clocks as the words for each satisfied entry are sent over the processor return path 28 one word at a time. During this time, because the FLC 22 cannot access the processor return path 28, it will hold up any new memory requests from the ACU 24 that hit in the FLC 22 (via a “hold” signal sent to the ACU by the logic, f0). Subsequent requests, even if they would ultimately have been forwarded to the SLC 12 as a result of an FLC “miss,” are also held up. Thus, subsequent requests that would have been forwarded to the SLC 12 are delayed until the FLC 22 can service the first request (i.e., until the processor return path becomes available). This increases the FLC-to-SLC latency. Consequently, there is a need for an improved cache design that overcomes this limitation and reduces the likelihood that memory requests to the FLC and SLC will be held up when the processor return path is busy. The present invention satisfies this need.
SUMMARY OF THE INVENTION
In a data processing system comprising a first level cache, a second level cache, and a processor return path, wherein only one of the first level cache and second level cache can control the processor return path at a given time, an improvement comprises a queue disposed between an output of the first level cache and the processor return path to buffer data output from the first level cache, so that the first level cache can continue to process memory requests even though the second level cache has control of the processor return path. Preferably, the queue comprises a first-in, first-out queue. According to a further aspect of the present invention, the processor return path of the system accepts one word per clock cycle, the second level cache outputs two data words per clock cycle, and the system further comprises a second queue disposed between the output of the second level cache and the processor return path for buffering data output from the second level cache so that it can be provided to the processor return path one word per clock cycle.
A method according to the present invention, for use in a system comprising a first level cache, a second level cache, and a processor return path, wherein only one of the first level cache and second level cache can access the processor return path at a given time, comprises the step of buffering data output from the first level cache to be passed to the processor return path, so that the first level cache can continue to process memory requests even though the second level cache has access to the processor return path.
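The claimed improvement can be sketched as a FIFO that decouples the FLC output from the shared return path: while the SLC holds the path, FLC hits are queued instead of stalling the ACU. This is a simplified clock-by-clock model under my own assumptions, not the patent's circuit.

```python
from collections import deque

flc_queue = deque()      # the FIFO between the FLC output and the return path
return_path_log = []     # words that actually crossed the return path, in order

def flc_hit(word, route):
    """An FLC hit never stalls: its result is buffered in the FIFO."""
    flc_queue.append((word, route))

def clock(slc_word=None):
    """One clock: the SLC has priority; otherwise drain one queued FLC word."""
    if slc_word is not None:
        return_path_log.append(slc_word)
    elif flc_queue:
        return_path_log.append(flc_queue.popleft())

# SLC occupies the path for two clocks; FLC hits queue up instead of stalling.
flc_hit("x", "EU"); clock(slc_word=("s0", "RU"))
flc_hit("y", "CU"); clock(slc_word=("s1", "RU"))
clock(); clock()     # path free: the buffered FLC words drain one per clock
print(return_path_log)
```

Without the queue, both FLC hits would have held up the ACU for those two clocks; with it, the FLC keeps accepting requests and its results drain as soon as the path frees up.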
Hurlock Steven T.
Naddeo Stanley P.
Hudspeth David
Rode Lise A.
Starr Mark T.
Tzeng Fred F.
Unisys