Cache management using a buffer for invalidation requests

Electrical computers and digital processing systems: memory – Storage accessing and control – Hierarchical memories

Reexamination Certificate


Details

U.S. Classification: 711/144, 711/145, 711/134

Type: Reexamination Certificate (active)

Patent number: 06651143

ABSTRACT:

FIELD OF THE INVENTION
This invention relates generally to the field of computer processing and, more specifically, to a method and apparatus for eliminating unnecessary detection of hardware cache misses.
BACKGROUND OF THE INVENTION
Computer architecture refers to the physical structure and interconnections of the registers, logical and arithmetic units, control units, and other hardware within a computer. All computers have at least one processor, and more complex computers, such as servers, have many processors working together. There are also at least two kinds of memory devices associated with the computer: an internal volatile memory, called random access memory, which is erased when the computer is turned off; and an external memory, called a hard drive, which permanently stores the programs, also called applications, to be executed by a processor when called. Of course, there are a number of peripheral devices such as monitors, Internet connections, keypads, mice, other pointing devices, other optical and magnetic drives, connections to other computers, etc. This invention is concerned with keeping the processor(s) busy.
The processor retrieves the applications or programs from the external memory into the internal memory. When data and/or instructions are needed by the application, the processor may retrieve the data/instructions from internal memory into its registers for arithmetic and logical processing. While the processor waits for data/instructions from memory, it is idle. As processor speeds grow ever faster, computer architects have directed an aspect of research and development toward keeping the processor occupied and its registers filled for the next operation. One of many approaches taken by computer architects has been to minimize the time required to retrieve data/instructions from external and internal memory into the processor's registers. Incorporating smaller high-speed memory units, called caches, nearer the processor is an implementation of this approach. These caches, moreover, may be hierarchical, meaning that a level one (L1) cache is very close to the processor and is very fast, accessible in only one or a very few processor cycles. There may be one L1 cache for instructions and a different L1 cache for data. There may also be level two (L2) and/or level three (L3) caches, with the higher number denoting a larger, more distant, and perhaps slower cache that is still closer and faster than either internal or external memory. Thus, when a processor needs data/instructions not readily available in its registers, it accesses its L1 cache by generating a control signal to access the cache directory and the data array in which the data is actually stored. A typical entry in a cache's directory, also called the tag array, includes the cache coherency state of the data/instruction and a block address corresponding to the data in a data array of the cache. The address of the requested data/instruction is compared with the address in the cache's tag array. A cache miss occurs if the addresses do not match; a cache hit occurs if the addresses match and the state is not invalid. If there is an L1 cache miss, the processor interrogates the L2 cache directory for the address of the requested data/instructions. If there is an L2 cache miss, the processor checks whether the data/instruction is in the next level's cache directory, and so on until, if the data/instructions are not in any cache, it retrieves them from memory. Access to an L1 cache typically takes on the order of one or just a few processor cycles, with L2 and L3 cache accesses taking more cycles. Interrogating each of these caches may take a long time and actually degrade processor performance in the absence of nonblocking caches. The resulting resource conflicts may prevent the cache controllers from responding at optimum speed to incoming requests.
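The directory lookup and miss cascade described above can be made concrete with a short sketch in C. Everything here is illustrative: the structure names, the direct-mapped organization, and the three-level hierarchy are assumptions for the example, not details from the patent.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* One directory (tag array) entry: a block address plus a valid bit.
   A real entry also carries the coherency state (see MESI below). */
struct tag_entry {
    uint64_t block_addr;
    bool     valid;
};

/* A direct-mapped cache level: the low bits of the block address
   select a set, and the stored address must match exactly. */
struct cache_level {
    struct tag_entry *tags;
    size_t            num_sets;
};

/* A hit requires a valid entry whose address matches the request. */
static bool lookup(const struct cache_level *c, uint64_t block_addr)
{
    const struct tag_entry *e = &c->tags[block_addr % c->num_sets];
    return e->valid && e->block_addr == block_addr;
}

/* The miss cascade: interrogate L1, then L2, then L3. Returns the
   level that hit (1..3), or 0 when every directory missed and the
   request must go to memory. Each probe costs cycles even when it
   misses -- the overhead the invention seeks to avoid. */
static int find_block(struct cache_level levels[3], uint64_t block_addr)
{
    for (int i = 0; i < 3; i++)
        if (lookup(&levels[i], block_addr))
            return i + 1;
    return 0; /* not cached anywhere: fetch from memory */
}
```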
Managing the data in the caches has become a science in and of itself. There is always a cache management scheme, an example of which is that the most recently used (MRU) data and/or instructions are stored in the L1 cache. When the L1 cache gets full, the oldest data/instructions may spill over to fill the L2 cache, and so on. There are other cache management schemes, such as in U.S. Pat. No. 6,098,152, entitled "Method and Apparatus for Miss Sequence Cache Block Replacement Utilizing a Most Recently Used State," issued Aug. 1, 2000 to Mounes-Toussi. Caches, moreover, may be accessed by different processors, so the same data/instructions, whether accessed by different processors or held in different caches, must be checked before use to determine whether the data is valid. For instance, if processor 1 has data in its cache and processor 2 is executing an operation to change that data, then processor 2 should wait until processor 1 is guaranteed not to access stale data. Maintaining valid data/instructions in the various caches is accomplished by a cache coherency scheme, an example of which is MESI. Each entry in a cache is tagged to indicate its state, i.e., whether the data/instruction is Modified, Exclusive, Shared, or Invalid. Modified data is data that is being modified by a processor, so another processor should wait until the modification is complete. Exclusive data means that the processor having the data in its cache has exclusive control of the data. Shared data is shared by other processors; and Invalid data should not be used by any processor. There are many cache coherency schemes; the MESI protocol above is only one example.
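These four states map naturally onto a small enum. The sketch below is a simplified reading of MESI (the helper names are mine, not the patent's): a local read may use any valid copy, while a local write needs exclusive ownership, which is why writing a Shared line first triggers invalidation requests to the other caches.

```c
#include <stdbool.h>

/* The four MESI coherency states carried in each directory entry. */
enum mesi_state {
    MESI_MODIFIED,   /* only copy, and it is dirty             */
    MESI_EXCLUSIVE,  /* only copy, still clean                 */
    MESI_SHARED,     /* other caches may hold a clean copy too */
    MESI_INVALID     /* this copy must not be used             */
};

/* A local read may proceed on any valid copy. */
static bool can_read(enum mesi_state s)
{
    return s != MESI_INVALID;
}

/* A local write needs exclusive ownership (M or E); a Shared line
   must first be invalidated in the other caches, producing the
   invalidation requests named in this patent's title. */
static bool can_write(enum mesi_state s)
{
    return s == MESI_MODIFIED || s == MESI_EXCLUSIVE;
}
```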
Computer architectures come in a myriad of arrangements today wherein the multiple processors may share caches and/or memory. Shown in FIG. 1a is an example of a computer architecture 100 having a simple processor complex 102, which further comprises a processor 104 and a hierarchy of private caches: an L1 cache 110, an L2 cache 120, and up through multiple cache levels to an Ln cache 140. The last cache level, Ln 140, is typically connected over an internal bus 106 to memory 150 or another system bus interface (not shown).
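The private hierarchy of FIG. 1a can be pictured as a chain of levels ending at memory, as in the sketch below. The types are assumptions and the sizes and latencies are placeholders, not values from the patent; the point is only the topology, in which each level hands misses to the next, larger and slower one.

```c
#include <stddef.h>

/* One private cache level in the complex of FIG. 1a. */
struct cache {
    const char   *name;        /* "L1", "L2", ..., "Ln"                  */
    size_t        size_bytes;  /* grows with distance from the CPU       */
    unsigned      latency;     /* rough access cost in processor cycles  */
    struct cache *next;        /* next, slower level; NULL stands for
                                  the internal bus 106 out to memory 150 */
};

/* Placeholder chain: L1 -> L2 -> Ln -> memory. */
static struct cache ln = { "Ln", 8u << 20,   40, NULL };
static struct cache l2 = { "L2", 256u << 10, 10, &ln };
static struct cache l1 = { "L1", 32u << 10,   1, &l2 };
```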
FIG. 1b is an example of a computer architecture 100 having multiprocessor processor complexes 102, each processor complex 102 having two or more processors 104. Each processor 104 has a private L1 cache 110, and the two or more processors 104 within the same processor complex 102 may share an L2 cache 120, and/or an L3 cache 130, and/or an L4 cache on the various levels interconnecting different processors.
Each of the processor complexes 102 may then be configured into other computer architectures 100 in which the processor complexes 102 may share memory and/or higher level caches.
FIG. 1c is an example of a computer architecture 100 referred to as Non-Uniform Memory Architecture (NUMA), characterized by distributed shared mapped memory. The computer's memory 150 is distributed in that each processor complex 102 is connected on an internal bus 106 to a local memory 150 with unique addresses. The local memory 150 of another processor complex 102 has different addresses, so a processor complex 102 reaches an address held in another's local memory by going through the owning processor complex 102 via an interconnection 108.
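Because each local memory owns a unique address range, a complex can decide from the address alone whether a reference is local or must travel over the interconnection 108. Below is a minimal sketch assuming a fixed, equal-sized range per node; the patent does not spell out the mapping, so the constants and names are illustrative only.

```c
#include <stdint.h>

#define NUM_NODES      4            /* assumed number of complexes */
#define BYTES_PER_NODE (1ull << 32) /* assumed 4 GiB local memory  */

/* Home node of a physical address, i.e., which complex's local memory
   owns it. Assumes phys_addr < NUM_NODES * BYTES_PER_NODE. */
static unsigned home_node(uint64_t phys_addr)
{
    return (unsigned)(phys_addr / BYTES_PER_NODE);
}

/* A remote reference must go through the owning complex via the
   interconnection; a local one stays on the internal bus. */
static int is_remote(uint64_t phys_addr, unsigned my_node)
{
    return home_node(phys_addr) != my_node;
}
```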
Another computer architecture 100 is shown in FIG. 1d and referred to as Cache Only Memory Architecture (COMA), in which each processor complex 102 has an attraction memory 150. An attraction memory 150 is basically a large cache and can be considered the lowest level cache in the processor complex 102. Access to the attraction memory 150 is over an internal bus 106 through its respective processor complex 102.
Yet another computer architecture 100 is called the Symmetric MultiProcessor (SMP) architecture of FIG. 1e, in which the processor complexes 102 are interconnected on an internal bus 106 through an interconnection 108 and a system bus 11
