Cache management using a buffer for invalidation requests

Patent number: 06651143
Type: Reexamination Certificate (active)
Filed: 2000-12-21
Issued: 2003-11-18
Examiner: Sparks, Donald (Department: 2187)
Classification: Electrical computers and digital processing systems: memory – Storage accessing and control – Hierarchical memories
Cross-reference classes: C711S144000, C711S145000, C711S134000

ABSTRACT:
FIELD OF THE INVENTION
This invention relates generally to the field of computer processing and more specifically to a method and apparatus for eliminating unnecessary detection of hardware cache misses.
BACKGROUND OF THE INVENTION
Computer architecture refers to the physical structure and interconnections of the registers, logical and arithmetic units, control units, and other hardware within a computer. All computers have at least one processor, and more complex computers, such as servers, have many processors working together. There are also at least two kinds of memory devices associated with the computer: an internal volatile memory, called random access memory, which is erased when the computer is turned off; and an external memory, called a hard drive, which permanently stores the programs, also called applications, to be executed by a processor when called. Of course, there are a number of peripheral devices such as monitors, Internet connections, keypads, mice, other pointing devices, other optical and magnetic drives, connections to other computers, etc. This invention is concerned with keeping the processor(s) busy.
The processor retrieves the applications or programs from the external memory into the internal memory. When data and/or instructions are needed for the application, the processor may retrieve the data/instructions from internal memory into its registers for arithmetic and logical processing. When the processor needs data/instructions from memory, it is idle until the data/instructions are available. As processor speeds become faster and faster, computer architects have directed an aspect of research and development toward keeping the processor occupied and its registers filled for the next operation. One of many approaches taken by computer architects has been to minimize the time required to retrieve data/instructions from external and internal memory into the processor's registers. Incorporating smaller, high-speed memory units called caches nearer the processor is an implementation of this approach. These caches, moreover, may be hierarchical, meaning that a level one (L1) cache is very close to the processor and is very fast, accessible in only one or a few processor cycles. There may be an L1 cache for instructions and a different L1 cache for data. There may also be level two (L2) and/or level three (L3) caches, with the higher number denoting a larger, more distant, and perhaps slower cache that is still closer and faster than either internal or external memory. Thus, when a processor needs data/instructions that are not readily available in its registers, it accesses its L1 cache by generating a control signal to access the cache directory and the data array in which the data is actually stored. A typical entry in a cache's directory, also called the tag array, includes the cache coherency state of the data/instruction and a block address corresponding to the data in a data array of the cache. The address of the requested data/instruction is compared with the address in the cache's tag array.
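The directory lookup described above can be sketched as a small direct-mapped cache model. This is an illustrative sketch only, not the patent's implementation; all names, the set count, and the block size are assumptions. A request hits only when the stored block address (tag) matches and the coherency state is not invalid.

```python
# Minimal direct-mapped cache directory sketch (illustrative assumptions,
# not taken from the patent).

NUM_SETS = 8        # number of directory entries (assumed)
BLOCK_SIZE = 64     # bytes per cache block (assumed)

class DirectoryEntry:
    def __init__(self):
        self.state = "I"   # coherency state; "I" = Invalid
        self.tag = None    # block address stored in this entry

def lookup(directory, address):
    """Return True on a cache hit: tags match and the entry is valid."""
    block = address // BLOCK_SIZE
    entry = directory[block % NUM_SETS]
    return entry.tag == block and entry.state != "I"

directory = [DirectoryEntry() for _ in range(NUM_SETS)]

# Install the block for address 0x1000 in state Exclusive.
blk = 0x1000 // BLOCK_SIZE
directory[blk % NUM_SETS].tag = blk
directory[blk % NUM_SETS].state = "E"

print(lookup(directory, 0x1000))  # True: tag matches, state valid
print(lookup(directory, 0x2000))  # False: different tag -> cache miss
```

The same comparison is what the tag-array interrogation performs in hardware: a tag match with a non-invalid state is a hit; anything else is a miss.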
A cache miss occurs if the addresses of the data/instructions do not match; a cache hit occurs if the addresses do match and the state is not invalid. If there is an L1 cache miss, the processor interrogates the L2 cache directory for the address of the requested data/instructions. If there is an L2 cache miss, the processor checks whether the data/instruction is in the next level's cache directory, and so on until, if the data/instructions are not in any cache, it retrieves the data/instructions from memory. Access to an L1 cache typically takes on the order of one or just a few processor cycles, with L2 and L3 cache accesses taking more cycles. Interrogating each of these caches may take a long time and actually degrade processor performance in the absence of nonblocking caches. Introducing resource conflicts may prevent the cache controllers from responding at optimum speed to incoming requests.
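The level-by-level interrogation can be illustrated with a short sketch that probes each cache in turn and falls back to memory only after missing at every level. The cycle counts here are made-up round numbers for illustration, not figures from the patent; they only show why each unnecessary interrogation adds latency.

```python
# Hierarchical lookup sketch: probe L1, then L2, then L3, then memory.
# Latencies are illustrative assumptions, not figures from the patent.

LATENCY = {"L1": 1, "L2": 10, "L3": 40, "memory": 200}

def find_data(address, caches):
    """caches maps a level name to the set of block addresses it holds.
    Returns (level where data was found, total cycles spent searching)."""
    cycles = 0
    for level in ("L1", "L2", "L3"):
        cycles += LATENCY[level]          # cost of interrogating this level
        if address in caches.get(level, set()):
            return level, cycles          # hit: stop searching
    cycles += LATENCY["memory"]           # missed everywhere: go to memory
    return "memory", cycles

caches = {"L1": {0x10}, "L2": {0x20}, "L3": {0x30}}
print(find_data(0x10, caches))  # ('L1', 1)
print(find_data(0x30, caches))  # ('L3', 51): L1 and L2 probes were wasted
print(find_data(0x99, caches))  # ('memory', 251): every probe was wasted
```

The last two calls make the motivation concrete: every directory interrogation that is guaranteed to miss still costs cycles, which is exactly the waste this invention aims to eliminate.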
Managing the data in the caches has become a science in and of itself. There is always a cache management scheme, an example of which is that the most recently used (MRU) data and/or instructions are stored in the L1 cache. When the L1 cache gets full, the oldest data/instructions may spill over to fill the L2 cache, and so on. There are other cache management schemes, such as in U.S. Pat. No. 6,098,152, entitled Method and Apparatus for Miss Sequence Cache Block Replacement Utilizing a Most Recently Used State, issued to Mounes-Toussi on Aug. 1, 2000. Caches, moreover, may be accessed by different processors, so the same data/instructions, whether accessed by different processors or within different caches, must be checked before use to determine whether the data is valid. For instance, if processor 1 has data in its cache and processor 2 is executing an operation to change that data, then processor 2 should wait until processor 1 is guaranteed not to access stale data. Maintaining valid data/instructions in the various caches is accomplished by a cache coherency scheme, an example of which is MESI. Each entry in a cache is tagged to indicate its state, i.e., whether the data/instruction is Modified, Exclusive, Shared, or Invalid. Modified data is data that is being modified by a processor, so another processor should wait until the modification is complete. Exclusive data means that the processor having the data in its cache has exclusive control of the data. Shared data is shared by other processors; and Invalid data should not be used by any processor. There are many cache coherency schemes; the MESI protocol above is only one example.
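As a rough illustration of the MESI tagging just described, the sketch below shows how one processor's write invalidates another cache's copy of the same block. It is a deliberately simplified model under assumed names: a real protocol also involves bus snooping and many more state transitions than shown here.

```python
# Simplified MESI tagging sketch (assumed names; real coherency protocols
# also involve bus snooping and additional transitions).

class CacheLine:
    def __init__(self, state="I"):
        self.state = state  # one of "M", "E", "S", "I"

    def readable(self):
        # Invalid data should not be used by any processor.
        return self.state != "I"

def write(lines, writer):
    """Processor `writer` modifies its copy; all other copies become Invalid."""
    for owner, line in lines.items():
        line.state = "M" if owner == writer else "I"

# Both processors start with a Shared copy of the same block.
lines = {"P1": CacheLine("S"), "P2": CacheLine("S")}
write(lines, "P2")             # processor 2 modifies the block
print(lines["P2"].state)       # 'M': P2 holds the modified copy
print(lines["P1"].readable())  # False: P1's copy is now stale (Invalid)
```

This mirrors the processor-1/processor-2 example above: once processor 2's write is tagged Modified, processor 1's Shared copy is marked Invalid and must not be used until it is refetched.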
Computer architectures come in a myriad of arrangements today wherein multiple processors may share caches and/or memory. Shown in FIG. 1a is an example of a computer architecture 100 having a simple processor complex 102 which further comprises a processor 104 and a hierarchy of private caches: an L1 cache 110, an L2 cache 120, up through multiple cache levels to an Ln cache 140. The last cache level, Ln 140, is typically connected over an internal bus 106 to memory 150 or another system bus interface (not shown).
FIG. 1b is an example of a computer architecture 100 having a multiprocessor processor complex 102, each processor complex 102 having two or more processors 104. Each processor 104 has a private L1 cache 110, and the two or more processors 104 within the same processor complex 102 may share an L2 cache 120, and/or an L3 cache 130, and/or an L4 cache on the various levels interconnecting different processors.
Each of the processor complexes 102 may then be configured into other computer architectures 100 in which the processor complexes 102 may share memory and/or higher-level caches. FIG. 1c is an example of a computer architecture 100 referred to as Non-Uniform Memory Architecture (NUMA), characterized by distributed shared mapped memory. The computer's memory 150 is distributed in that each processor complex 102 is connected on an internal bus 106 to a local memory 150 with unique addresses. The local memory 150 of another processor complex 102 would have different addresses, so that one processor complex 102 accesses an address stored in another's local memory by going through that processor complex 102 via an interconnection 108.
Another computer architecture 100, shown in FIG. 1d, is referred to as Cache Only Memory Architecture (COMA), in which each processor complex 102 has an attraction memory 150. An attraction memory 150 is basically a large cache and can be considered the lowest-level cache in the processor complex 102. Access to the attraction memory 150 is over an internal bus 106 through its respective processor complex 102.
Yet another computer architecture 100, the Symmetric MultiProcessor (SMP) architecture of FIG. 1e, is one in which the processor complexes 102 are interconnected on an internal bus 106 through an interconnection 108 and a system bus 11
Ojanen Karuna
Peugh Brian R.
Sparks Donald