Electrical computers and digital processing systems: memory – Storage accessing and control – Hierarchical memories
Reexamination Certificate
1997-06-25
2001-11-13
Nguyen, Hiep T. (Department: 2187)
Electrical computers and digital processing systems: memory
Storage accessing and control
Hierarchical memories
C711S131000, C711S137000, C711S168000
active
06317810
ABSTRACT:
BACKGROUND
1. Field of Invention
This invention relates to retrieving data from computer memory. Specifically, this invention relates to a technique of improving data bandwidth of the processing unit of a computer by prefetching data anticipated to be needed by subsequent instructions of a computer program.
2. Description of Related Art
Modern computer systems utilize a hierarchy of memory elements in order to realize an optimum balance between the speed, size, and cost of computer memory. Most such computer systems employ one or more DRAM arrays as primary memory and typically include a larger, but much slower, secondary memory such as, for instance, a magnetic storage device or CD ROM. A small, fast SRAM cache memory is typically provided between the central processing unit (CPU) and primary memory. This fast cache memory increases the data bandwidth of the computer system by storing information most frequently needed by the CPU. In this manner, information most frequently requested during execution of a computer program may be rapidly provided to the CPU from the SRAM cache memory, thereby eliminating the need to access the slower primary and secondary memories. Although fast, the SRAM cache memory is very expensive and should thus be of minimal size in order to reduce cost. Accordingly, it is advantageous to maximize the frequency with which information requested by the CPU is stored in cache memory.
FIG. 1 is an illustration of a general purpose computer 10 including a CPU 12 having an on-board, or internal, cache memory 14. Typically, the internal cache 14 is divided into an instruction cache (I$), in which the most frequently requested instructions are stored, and a data cache (D$), in which the most frequently requested data is stored. The computer also includes an external cache (E$) 16 and a primary memory 18. During execution of a computer program, the computer program instructs the CPU 12 to fetch instructions by incrementing a program counter within the CPU 12. In response thereto, the CPU 12 fetches the instructions identified by the program counter. If the instruction requests data, an address request specifying the location of that data is issued. The CPU 12 first searches the internal cache 14 for the specified data. If the specified data is found in the internal cache 14, hereafter denoted as a cache hit, that data is immediately provided to the CPU 12 for processing.
If, on the other hand, the specified data is not found in the internal cache 14, the external cache 16 is then searched. If the specified data is not found in the external cache 16, then the primary memory 18 is searched. The external cache 16 and primary memory 18 are controlled by an external cache controller 20 and a primary memory controller 22, respectively, both of which may be housed within the CPU 12. If the specified data is not found in the primary memory 18, access is requested to system bus 24 which, when available, routes the address request to a secondary memory 26 via an I/O controller 28.
When the specified data is located in memory external to the CPU 12, i.e., in either the external cache 16, the primary memory 18, or the secondary memory 26, the data specified by the address request is routed to the CPU 12 for processing and, in addition, a corresponding row of data is loaded into the internal cache 14. In this manner, subsequent address requests identifying other information in that row will result in an internal cache hit and, therefore, will not require access to the much slower external memory. Latencies associated with accessing primary memory may thus be hidden, thereby increasing the data bandwidth of the CPU 12.
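The row-fill behavior described above can be illustrated with a minimal sketch. It is not taken from the patent: the class InternalCache, the function read, and the row size of four words are hypothetical names and values chosen only for illustration, and external memory is modeled as a flat list.

```python
ROW_SIZE = 4  # words per cache row; an illustrative assumption


class InternalCache:
    """Toy internal cache that is filled one whole row at a time."""

    def __init__(self):
        self.rows = {}  # row number -> list of the words in that row

    def lookup(self, address):
        """Return (hit, value); value is None on a miss."""
        row = self.rows.get(address // ROW_SIZE)
        if row is None:
            return False, None
        return True, row[address % ROW_SIZE]

    def fill(self, address, external_memory):
        """Load the whole row containing `address` from external memory."""
        row_number = address // ROW_SIZE
        base = row_number * ROW_SIZE
        self.rows[row_number] = external_memory[base:base + ROW_SIZE]


def read(address, cache, external_memory, log):
    """Serve one address request, recording "hit" or "miss" in `log`."""
    hit, value = cache.lookup(address)
    if hit:
        log.append("hit")
        return value
    log.append("miss")
    cache.fill(address, external_memory)
    return external_memory[address]


external_memory = list(range(100, 116))  # 16 words, i.e. four rows
cache = InternalCache()
log = []
values = [read(a, cache, external_memory, log) for a in (4, 5, 6, 7, 8)]
```

In this sketch the request for address 4 misses and loads its row, the requests for addresses 5 through 7 then hit in the internal cache, and address 8 misses again because it lies in the next row.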
The processing of an address request through a memory hierarchy is illustrated in FIG. 2. First, the CPU program counter (PC) is incremented to specify a new address and, in response thereto, a corresponding instruction is fetched (step 40). Where, for instance, the instruction requests data, an address request specifying that data is provided to the data cache (D$) of the internal cache 14 for searching (step 42). If the specified data is in the data cache (a D$ hit), as tested at step 44, the specified data is immediately provided to the CPU (step 46). If the specified data is not in the data cache (a D$ miss), the external cache is searched for the specified data (step 48).
If the specified data is found in the external cache (an E$ hit), as tested at step 50, then the specified data is loaded into the data cache (step 52) and processing proceeds to step 44. If the specified data is not found in the external cache, then primary memory is searched (step 54). If the specified data is found in primary memory, as tested at step 56, it is loaded into the data cache (step 52) and provided to the CPU for processing; otherwise the specified data is retrieved from secondary memory (step 58) and loaded into the data cache and provided to the CPU.
As shown in FIG. 1, there are additional devices connected to the system bus 24. For example, FIG. 1 illustrates an input/output controller 30 operating as an interface between a graphics device 32 and the system bus 24. In addition, the figure illustrates an input/output controller 34 operating as an interface between a network connection circuit 36 and the system bus 24.
Since the access speeds of primary memory, e.g., DRAM, are not increasing as quickly as the processing speeds of modern CPUs, it is becoming increasingly important to hide primary memory latencies. As discussed above, primary memory latencies are hidden every time there is an internal cache hit, for when there is such a hit, the requested information is immediately provided to the CPU for processing without accessing primary memory.
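The effect of the internal cache hit rate on observed latency can be estimated with the standard average access time relation, in which every request pays the hit time and each miss additionally pays a miss penalty. The sketch below is illustrative only: the function name average_access_time and the cycle counts are assumptions, not figures from the patent.

```python
def average_access_time(hit_rate, hit_time, miss_penalty):
    """Average cycles per request: all requests pay hit_time, misses add miss_penalty."""
    if not 0.0 <= hit_rate <= 1.0:
        raise ValueError("hit_rate must be between 0 and 1")
    return hit_time + (1.0 - hit_rate) * miss_penalty


# Assumed values: a 1-cycle internal cache hit and a 40-cycle primary memory penalty.
lower_hit_rate = average_access_time(hit_rate=0.875, hit_time=1, miss_penalty=40)
higher_hit_rate = average_access_time(hit_rate=0.9375, hit_time=1, miss_penalty=40)
```

Under these assumed numbers, halving the miss rate removes half of the penalty term, which is why a higher hit rate hides more of the primary memory latency.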
The data bandwidth of a computer system may also be increased by providing an additional parallel pipeline such that, for instance, two data requests may be performed per cycle. To accommodate the additional pipeline, the existing data cache may be dual ported or an additional data cache may be provided in parallel to the existing data cache. Each of these options, however, effectively doubles the cost of data cache memory. For instance, dual porting the existing data cache, while not significantly increasing the total size of the data cache, results in halving the effective data cache memory available for each of the pipelines. On the other hand, providing in parallel an additional data cache similar in size to the existing data cache, while preserving the effective cache memory available for each pipeline, undesirably results in a doubling of the effective size of the data cache. As a result, there is a need to accommodate an additional parallel pipeline without doubling the cost of data cache memory.
SUMMARY
A central processing unit (CPU) of a computer has a data caching unit which includes a novel dual-ported prefetch cache configured in parallel with a conventional single-ported data cache. The CPU further includes first and second parallel pipelines for processing instructions of a computer program. The data cache is coupled to receive data requests from the first pipeline and the prefetch cache, which is much smaller than the data cache, is coupled to receive data requests from both the first pipeline and the second pipeline. If a data cache miss occurs, a row of data corresponding to the data request address is fetched from external memory, e.g., an external cache, a primary memory, or a secondary memory, and then stored in the data cache and the prefetch cache. Thereafter, if a prefetch cache hit occurs, a prefetch address is derived from the current data request and, in some embodiments, from additional information such as, for instance, instruction loop heuristics of a computer program. A row of data corresponding to this derived prefetch address is fetched from external memory and loaded into the prefetch cache. This prefetching operation frequently results in the prefetch cach
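The arrangement summarized above can be approximated in a short sketch. Several points are assumptions made only for illustration and are not stated in the text: the prefetch address is taken to be the next sequential row, a miss from either pipeline fills both caches, neither cache has a capacity limit, and the names DataCachingUnit, load, and ROW_SIZE are hypothetical.

```python
ROW_SIZE = 4  # words per row; an illustrative assumption


class DataCachingUnit:
    """Toy model of a data cache in parallel with a dual-ported prefetch cache."""

    def __init__(self, external_memory):
        self.external_memory = external_memory  # flat list of words
        self.data_cache = {}      # row number -> row; serves pipeline 1 only
        self.prefetch_cache = {}  # row number -> row; serves pipelines 1 and 2

    def _fetch_row(self, row_number):
        base = row_number * ROW_SIZE
        return self.external_memory[base:base + ROW_SIZE]

    def _derive_prefetch_row(self, row_number):
        # ASSUMPTION: prefetch the next sequential row. The text only says the
        # address is derived from the current request and optional heuristics.
        return row_number + 1

    def load(self, address, pipeline):
        """Return (value, outcome) for a data request from pipeline 1 or 2."""
        row_number, offset = divmod(address, ROW_SIZE)
        if row_number in self.prefetch_cache:
            value = self.prefetch_cache[row_number][offset]
            # A prefetch cache hit triggers a prefetch of the derived row.
            nxt = self._derive_prefetch_row(row_number)
            in_range = nxt * ROW_SIZE < len(self.external_memory)
            if in_range and nxt not in self.prefetch_cache:
                self.prefetch_cache[nxt] = self._fetch_row(nxt)
            return value, "prefetch hit"
        if pipeline == 1 and row_number in self.data_cache:
            return self.data_cache[row_number][offset], "data cache hit"
        # Miss: fetch the row from external memory and store it in both caches.
        row = self._fetch_row(row_number)
        self.data_cache[row_number] = row
        self.prefetch_cache[row_number] = row
        return row[offset], "miss"


unit = DataCachingUnit(list(range(32)))
first = unit.load(0, pipeline=1)
outcomes = [unit.load(a, pipeline=2)[1] for a in range(1, 32)]
```

In this sketch a sequential walk through memory misses only on the first request; every later request finds its row already in the prefetch cache because each hit prefetches the following row.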
Chiacchia Denise
Lauterbach Gary
Lopez-Aguado Herbert
Lynch William L.
Nguyen Hiep T.
Pennie & Edmonds LLP
Sun Microsystems Inc.