Queue-less and state-less layered local data cache mechanism

Electrical computers and digital processing systems: memory – Storage accessing and control – Hierarchical memories

Reexamination Certificate

Details

C711S117000, C711S141000

active

06418513

ABSTRACT:

CROSS-REFERENCES TO RELATED APPLICATIONS
The present invention is related to the following applications filed concurrently with this application: U.S. patent application Ser. No. 09/340,076 entitled “LAYERED LOCAL CACHE MECHANISM WITH SPLIT REGISTER LOAD BUS AND CACHE LOAD BUS”; U.S. patent application Ser. No. 09/340,075 entitled “LAYERED LOCAL CACHE WITH IMPRECISE RELOAD MECHANISM”; U.S. patent application Ser. No. 09/340,074 entitled “LAYERED LOCAL CACHE WITH LOWER LEVEL CACHE OPTIMIZING ALLOCATION MECHANISM”; U.S. patent application Ser. No. 09/340,073 entitled “METHOD FOR UPPER LEVEL CACHE VICTIM SELECTION MANAGEMENT BY A LOWER LEVEL CACHE”; U.S. patent application Ser. No. 09/340,082 entitled “LAYERED LOCAL CACHE WITH LOWER LEVEL CACHE UPDATING UPPER AND LOWER LEVEL CACHE DIRECTORIES”; U.S. patent application Ser. No. 09/340,078 entitled “HIGH PERFORMANCE STORE INSTRUCTION MANAGEMENT VIA IMPRECISE LOCAL CACHE UPDATE MECHANISM”; U.S. patent application Ser. No. 09/340,079 entitled “HIGH PERFORMANCE LOAD INSTRUCTION MANAGEMENT VIA SYSTEM BUS WITH EXPLICIT REGISTER LOAD AND/OR CACHE RELOAD PROTOCOLS”; U.S. patent application Ser. No. 09/340,080 entitled “METHOD FOR LAYERING LOCAL INSTRUCTION CACHE MANAGEMENT”; and U.S. patent application Ser. No. 09/340,081 entitled “METHOD FOR LAYERING LOCAL TRANSLATION CACHE MANAGEMENT”.
BACKGROUND OF THE INVENTION
1. Field of the Invention
The present invention generally relates to computer systems, and more specifically to an improved method of accessing memory values (operand data or instructions) used by a processor of a computer system. In particular, the present invention makes more efficient use of a multi-level cache hierarchy, and ports values directly to, e.g., a rename register, instruction buffer, or translation table of the processor without the need for load queues or reload buffers in high level caches.
2. Description of Related Art
The basic structure of a conventional computer system includes one or more processing units connected to various input/output devices for the user interface (such as a display monitor, keyboard and graphical pointing device), a permanent memory device (such as a hard disk, or a floppy diskette) for storing the computer's operating system and user programs, and a temporary memory device (such as random access memory or RAM) that is used by the processor(s) in carrying out program instructions. The evolution of computer processor architectures has transitioned from the now widely-accepted reduced instruction set computing (RISC) configurations, to so-called superscalar computer architectures, wherein multiple and concurrently operable execution units within the processor are integrated through a plurality of registers and control mechanisms.
The objective of superscalar architecture is to employ parallelism to maximize or substantially increase the number of program instructions (or “micro-operations”) simultaneously processed by the multiple execution units during each interval of time (processor cycle), while ensuring that the order of instruction execution as defined by the programmer is reflected in the output. For example, the control mechanism must manage dependencies among the data being concurrently processed by the multiple execution units, and the control mechanism must ensure that integrity of sequentiality is maintained in the presence of precise interrupts and restarts. The control mechanism preferably provides instruction deletion capability such as is needed with instruction-defined branching operations, yet retains the overall order of the program execution. It is desirable to satisfy these objectives consistent with the further commercial objectives of minimizing electronic device count and complexity.
An illustrative embodiment of a conventional processing unit for processing information is shown in FIG. 1, which depicts the architecture for a PowerPC™ microprocessor 12 manufactured by International Business Machines Corp. (IBM, assignee of the present invention). Processor 12 operates according to reduced instruction set computing (RISC) techniques, and is a single integrated circuit superscalar microprocessor. As discussed further below, processor 12 includes various execution units, registers, buffers, memories, and other functional units, which are all formed by integrated circuitry.
Processor 12 is coupled to a system bus 20 via a bus interface unit (BIU) 30 within processor 12. BIU 30 controls the transfer of information between processor 12 and other devices coupled to system bus 20, such as a main memory 18. Processor 12, system bus 20, and the other devices coupled to system bus 20 together form a host data processing system. Bus 20, as well as various other connections described, includes more than one line or wire, e.g., the bus could be a 32-bit bus. BIU 30 is connected to a high speed instruction cache 32 and a high speed data cache 34. A lower level (L2) cache (not shown) may be provided as an intermediary between processor 12 and system bus 20. An L2 cache can store a much larger amount of information (instructions and operand data) than the on-board caches can, but at a longer access penalty. For example, the L2 cache may be a chip having a storage capacity of 512 kilobytes, while the processor may be an IBM PowerPC™ 604-series processor having on-board caches with 64 kilobytes of total storage. A given cache line usually has several memory words, e.g., a 64-byte line contains eight 8-byte words.
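The line geometry in the example above (a 64-byte line holding eight 8-byte words) can be illustrated with a short sketch. It is not taken from the patent; the function name `split_address` and the flat byte-address model are assumptions made only for illustration.

```python
# Illustrative sketch only: splits a byte address using the example geometry
# from the text (64-byte cache line, 8-byte words). Names are hypothetical.
LINE_BYTES = 64                             # line size from the example in the text
WORD_BYTES = 8                              # word size from the example in the text
WORDS_PER_LINE = LINE_BYTES // WORD_BYTES   # eight words per line


def split_address(addr):
    """Return (line_number, word_index, byte_in_word) for a byte address."""
    line_number = addr // LINE_BYTES        # which cache-line-sized block
    offset_in_line = addr % LINE_BYTES      # byte position inside that line
    word_index = offset_in_line // WORD_BYTES
    byte_in_word = offset_in_line % WORD_BYTES
    return line_number, word_index, byte_in_word


if __name__ == "__main__":
    print(WORDS_PER_LINE)
    print(split_address(0x1234))
```

Under these assumptions, every address in the same 64-byte block yields the same line number, which is why a single line fill brings in all eight words at once.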
The output of instruction cache 32 is connected to a sequencer unit 36 (instruction dispatch unit). In response to the particular instructions received from instruction cache 32, sequencer unit 36 outputs instructions to other execution circuitry of processor 12, including six execution units, namely, a branch unit 38, a fixed-point unit A (FXUA) 40, a fixed-point unit B (FXUB) 42, a complex fixed-point unit (CFXU) 44, a load/store unit (LSU) 46, and a floating-point unit (FPU) 48.
The inputs of FXUA 40, FXUB 42, CFXU 44 and LSU 46 also receive source operand information from general-purpose registers (GPRs) 50 and fixed-point rename buffers 52. The outputs of FXUA 40, FXUB 42, CFXU 44 and LSU 46 send destination operand information for storage at selected entries in fixed-point rename buffers 52. CFXU 44 further has an input and an output connected to special-purpose registers (SPRs) 54 for receiving and sending source operand information and destination operand information, respectively. An input of FPU 48 receives source operand information from floating-point registers (FPRs) 56 and floating-point rename buffers 58. The output of FPU 48 sends destination operand information to selected entries in floating-point rename buffers 58.
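The operand flow described above (sources read from the GPRs or the rename buffers, results written to rename buffer entries) can be sketched as a toy model. This is a simplification for illustration, not the processor's actual design: the class and method names are hypothetical, the register count of 32 is an assumption, the rename buffer is modeled as an unbounded mapping, and the `complete` step that copies a result into the GPR is an assumed behavior that the text above does not describe.

```python
# Toy model of the fixed-point operand paths; all names are hypothetical.
class FixedPointRegisters:
    def __init__(self, num_gprs=32):        # 32 is an assumption, not from the text
        self.gprs = [0] * num_gprs          # architected GPR values
        self.rename = {}                    # GPR number -> pending result

    def write_result(self, reg, value):
        """An execution unit sends a destination operand to a rename entry."""
        self.rename[reg] = value

    def read_source(self, reg):
        """Read a source operand: pending rename entry if present, else the GPR."""
        return self.rename.get(reg, self.gprs[reg])

    def complete(self, reg):
        """Assumed completion step: move the pending result into the GPR."""
        if reg in self.rename:
            self.gprs[reg] = self.rename.pop(reg)


if __name__ == "__main__":
    regs = FixedPointRegisters()
    regs.write_result(3, 42)
    print(regs.read_source(3), regs.gprs[3])
    regs.complete(3)
    print(regs.gprs[3])
```

The point of the sketch is only that a dependent instruction can pick up a result from the rename buffer before the architected register has been updated.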
As is well known by those skilled in the art, each of execution units 38-48 executes one or more instructions within a particular class of sequential instructions during each processor cycle. For example, FXUA 40 performs fixed-point mathematical operations such as addition, subtraction, ANDing, ORing, and XORing utilizing source operands received from specified GPRs 50. Conversely, FPU 48 performs floating-point operations, such as floating-point multiplication and division, on source operands received from FPRs 56. As its name implies, LSU 46 executes floating-point and fixed-point instructions which either load operand data from memory (i.e., from data cache 34) into selected GPRs 50 or FPRs 56, or which store data from selected GPRs 50 or FPRs 56 to memory 18.
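The division of work among the execution units named above can be summarized as a small lookup sketch. Only the pairings stated in the text are included (fixed-point arithmetic and logic on FXUA, floating-point multiplication and division on FPU, loads and stores on LSU); the operation labels and the function `route` are hypothetical and do not correspond to real PowerPC mnemonics or to the sequencer's actual logic.

```python
# Illustrative routing table; operation labels are hypothetical, and only the
# unit assignments stated in the text are listed.
ROUTES = {
    "add": "FXUA",
    "subtract": "FXUA",
    "and": "FXUA",
    "or": "FXUA",
    "xor": "FXUA",
    "fp_multiply": "FPU",
    "fp_divide": "FPU",
    "load": "LSU",
    "store": "LSU",
}


def route(operation):
    """Return the execution unit that handles the given operation class."""
    try:
        return ROUTES[operation]
    except KeyError:
        raise ValueError("no unit listed for operation: " + operation)


if __name__ == "__main__":
    for op in ("add", "fp_divide", "store"):
        print(op, "->", route(op))
```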
Processor 12 may include other registers, such as configuration registers, memory management registers, exception handling registers, and miscellaneous registers, which are not shown. Processor 12 carries out program instructions from a user application or the operating system, by routing the instructions and operand data to the appropriate execution units, buffers and registers, and by sending the resulting output to the system memory device (RAM), or
