Reexamination Certificate
1998-06-18
2001-11-06
Kim, Matthew (Department: 2186)
Electrical computers and digital processing systems: memory
Storage accessing and control
Hierarchical memories
C711S144000, C711S145000
Reexamination Certificate
active
06314496
ABSTRACT:
TECHNICAL FIELD
The present invention relates generally to computer processor technology. In particular, the present invention relates to cache coherency for a shared memory multiprocessor system.
BACKGROUND ART
A state-of-the-art microprocessor architecture may have one or more caches for storing data and instructions local to the microprocessor. A cache may be disposed on the processor chip itself, or it may reside external to the processor chip and be connected to the microprocessor by a local bus permitting the exchange of address, control, and data information. By storing frequently accessed instructions and data in a cache, a microprocessor gains faster access to those instructions and data, resulting in higher throughput.
Conventional microprocessor-cache architectures were developed for use in computer systems having a single computer processor. Consequently, conventional microprocessor-cache architectures are inflexible in multiprocessor systems in that they do not contain circuitry or system interfaces which would enable easy integration into a multiprocessor system while ensuring cache coherency.
A popular multiprocessor computer architecture consists of a plurality of processors sharing a common memory, with each processor having its own local cache. In such a multiprocessor system, a cache coherency protocol is required to assure the accuracy of data among the local caches of the respective processors and main memory. For example, if two processors are currently storing the same data block in their respective caches, then a write to that data block by one processor may affect the validity of the copy stored in the cache of the other processor, as well as the copy stored in main memory. One possible protocol for solving this problem would be for the system to immediately update all cached copies of the block, as well as main memory, upon a write to any one copy. Another possible protocol would be to locate all other cached copies of the block and mark them invalid when one processor writes to its own cached copy. Which protocol a designer actually uses has implications for the efficiency of the multiprocessor system as well as for the complexity of the logic needed to implement it. The first protocol requires significant bus bandwidth to update the data in all the caches, but memory is always current. The second protocol requires less bus bandwidth, since only a single bit is needed to invalidate the appropriate data blocks. A cache coherency protocol can range from simple (e.g., a write-through protocol) to complex (e.g., a directory-based cache protocol). In choosing a cache coherency protocol for a multiprocessor computer system, the system designer must perform the difficult exercise of trading off many factors which affect efficiency, simplicity, and speed. Hence, it would be desirable to provide a system designer with a microprocessor-cache architecture having uniquely flexible tools facilitating development of cache coherency protocols in multiprocessor computer systems.
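The trade-off between the two protocols can be shown with a small sketch. The C fragment below is not from the patent; the names (cache_line_t, write_update, write_invalidate) and the fixed one-block, four-CPU layout are illustrative assumptions. It shows why a write-update scheme spends bus bandwidth on every cached copy while a write-invalidate scheme needs only a valid bit per copy.

/* Minimal sketch, not from the patent: contrasts write-update and
 * write-invalidate handling of one shared block across four caches. */
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

#define NUM_CPUS 4

typedef struct {
    uint32_t addr;   /* block address tag               */
    uint32_t data;   /* block contents (one word here)  */
    bool     valid;  /* is this cached copy usable?     */
} cache_line_t;

static cache_line_t cache[NUM_CPUS]; /* one copy of the block per CPU */
static uint32_t     main_memory;     /* backing copy of that block    */

/* Protocol 1: update every cached copy and main memory on each write. */
static void write_update(int writer, uint32_t value)
{
    for (int cpu = 0; cpu < NUM_CPUS; cpu++)
        if (cache[cpu].valid)
            cache[cpu].data = value;   /* bus traffic per cached copy */
    main_memory = value;               /* memory is always current    */
    (void)writer;
}

/* Protocol 2: invalidate the other copies; only the writer stays valid. */
static void write_invalidate(int writer, uint32_t value)
{
    for (int cpu = 0; cpu < NUM_CPUS; cpu++)
        if (cpu != writer)
            cache[cpu].valid = false;  /* a single bit per copy       */
    cache[writer].data  = value;
    cache[writer].valid = true;        /* memory is now stale         */
}

int main(void)
{
    for (int cpu = 0; cpu < NUM_CPUS; cpu++)
        cache[cpu] = (cache_line_t){ .addr = 0x100, .data = 7, .valid = true };
    main_memory = 7;

    write_invalidate(0, 42);
    printf("cpu1 copy valid after invalidate: %d\n", cache[1].valid);

    write_update(0, 99);
    printf("memory after update: %u\n", (unsigned)main_memory);
    return 0;
}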
A present-day designer who wishes to construct a multiprocessor system using a conventional microprocessor as a component must deal with the inflexibility of current microprocessor technology. Present-day microprocessors were built with specific cache protocols in mind and provide minimal flexibility to the external system designer. For example, one common problem is that the cache of a microprocessor is designed so that movement of a data block out of the cache automatically sets the cache state for that block to a predetermined state. This does not give a designer of a multiprocessor system the flexibility to set the cache to an arbitrary state in order to implement a desired cache protocol. Because of this, significant complexity is necessarily added to the design of a cache protocol.
SUMMARY DISCLOSURE OF THE INVENTION
In accordance with the present invention, a computing apparatus connectable to a cache and a memory includes a system port configured to receive a command having an address part, identifying data stored in the cache which is associated with data stored in the memory, and a next coherence state part, indicating a next state of the data in the cache. The computing apparatus further includes an execution unit configured to execute the command to change the state of the data stored in the cache according to the next coherence state part of the command. The data may be blocks of memory, where a block can be any addressable unit of memory, including a byte, a word, or many words.
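The shape of such a command can be sketched in C. The identifiers below (probe_command_t, coherence_state_t, execute_probe) are assumptions made for illustration and do not appear in the patent; the sketch only shows a command carrying an address part and a next coherence state part that an execution unit applies to the addressed block.

/* Illustrative sketch only; names are assumed, not taken from the patent. */
#include <stdint.h>
#include <stdio.h>

typedef enum {
    STATE_INVALID,
    STATE_CLEAN,         /* exclusive copy of the data outside main memory   */
    STATE_CLEAN_SHARED,  /* clean, and at least one other cache holds a copy */
    STATE_DIRTY,
    STATE_DIRTY_SHARED
} coherence_state_t;

/* The command carries an address part and a next coherence state part. */
typedef struct {
    uint64_t          address;
    coherence_state_t next_state;
} probe_command_t;

/* One cached block: its tag and its current coherence state. */
typedef struct {
    uint64_t          tag;
    coherence_state_t state;
} cache_block_t;

/* The execution unit applies the next-state part to the addressed block. */
static void execute_probe(cache_block_t *block, const probe_command_t *cmd)
{
    if (block->tag == cmd->address)
        block->state = cmd->next_state;
}

int main(void)
{
    cache_block_t   block = { .tag = 0x2000, .state = STATE_DIRTY };
    probe_command_t cmd   = { .address = 0x2000, .next_state = STATE_INVALID };

    execute_probe(&block, &cmd);
    printf("state after probe: %d\n", (int)block.state);
    return 0;
}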
The computing apparatus may be connectable to the cache internally and/or externally. In the preferred embodiment, the computing apparatus is a microprocessor having an internal data cache disposed on the same chip as the execution unit and system port, and a cache port disposed on the chip and configured to connect the computing apparatus to a cache located externally to the chip. The computing apparatus may be a microprocessor or other processor for computing.
The computing apparatus receives a command on the system port from an external system; the command is executed by the execution unit. The external system includes any system outside of the processor capable of exchanging data with the processor. The external system may be a bus structure including some logical circuitry connecting the processor to the main system memory. The external system may be a memory management system connecting multiple processors and main memory in a shared memory multiprocessor system. The external system logic could be complex enough that the external system has its own processor, and it might include bus structures, switched network structures, or both.
The command submitted to the computing apparatus by the external system may be an atomic probe command. The atomic probe command further includes a data movement part identifying a condition for movement of the data out of the cache, and the execution unit is further configured to direct delivery of the data to the system port according to the data movement part of the command. The data movement part specifies one of several modes of data movement. The execution unit may direct delivery of the data in accordance with the data movement part of the command only if the data is found in the cache and the coherency state of that data is valid, or, alternately, only if the coherency state of the data is dirty.
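A rough sketch of how the data movement part might gate delivery is shown below. The mode names (MOVE_NONE, MOVE_IF_HIT, MOVE_IF_DIRTY) and the helper should_deliver are hypothetical; the sketch only illustrates the two delivery conditions described above, a valid hit versus a dirty copy.

/* Hypothetical sketch of conditional data movement; names are assumed. */
#include <stdbool.h>
#include <stdio.h>

typedef enum { STATE_INVALID, STATE_CLEAN, STATE_CLEAN_SHARED,
               STATE_DIRTY, STATE_DIRTY_SHARED } coherence_state_t;

/* Assumed data movement modes carried in the probe command. */
typedef enum {
    MOVE_NONE,      /* change state only; return no data                  */
    MOVE_IF_HIT,    /* deliver the block if it is present and valid       */
    MOVE_IF_DIRTY   /* deliver the block only if the cached copy is dirty */
} data_movement_t;

/* Result of looking the probe address up in the cache. */
typedef struct {
    bool              hit;
    coherence_state_t state;
} lookup_result_t;

/* Returns true if the block should be delivered to the system port. */
static bool should_deliver(data_movement_t mode, lookup_result_t r)
{
    switch (mode) {
    case MOVE_IF_HIT:
        return r.hit && r.state != STATE_INVALID;
    case MOVE_IF_DIRTY:
        return r.hit && (r.state == STATE_DIRTY || r.state == STATE_DIRTY_SHARED);
    default:
        return false;
    }
}

int main(void)
{
    lookup_result_t clean_hit = { .hit = true, .state = STATE_CLEAN };
    printf("deliver on MOVE_IF_HIT:   %d\n", should_deliver(MOVE_IF_HIT, clean_hit));
    printf("deliver on MOVE_IF_DIRTY: %d\n", should_deliver(MOVE_IF_DIRTY, clean_hit));
    return 0;
}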
The next coherence state part of the probe command specifies the next state in which to set the data in the cache. The computing apparatus may change the state of the data in accordance with the next coherence state part of the command by setting the state of the data in the cache to a clean state, designating that the cache has the exclusive copy of the data outside of main memory.
Alternately, the computing apparatus may change the state of the data in accordance with the next coherence state part of the command by setting the state of the data in the cache to a clean/shared state, indicating that there is at least one more copy of the data in a cache of another computing apparatus and that the data in the cache is clean.
Alternately, the computing apparatus may change the state of the data in accordance with the next coherence state part of the command by setting the state of the data in the cache to invalid.
Alternately, the computing apparatus may change the state of the data in accordance with the next coherence state part of the command by setting the state of the data in the cache so as to transition to a next state conditioned on the current state of the data.
Alternately, the computing apparatus may change the state of the cache in accordance with the next coherence state part of the command by setting the state of the data in the cache so that if the current state of the data is clean, then the next state of the data is clean/shared; if the current state of the data is dirty, then the next state of the data is invalid; and if the current state of the data is dirty/shared, then the next state of the data is clean/shared.
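The conditional transition described in this last alternative can be written as a small table-like function. The names below are illustrative assumptions rather than the patent's terminology; the function simply maps clean to clean/shared, dirty to invalid, and dirty/shared to clean/shared, and, as an assumption not stated in the patent, leaves any other state unchanged.

/* Sketch of the conditioned next-state rule; names are assumed. */
#include <stdio.h>

typedef enum { STATE_INVALID, STATE_CLEAN, STATE_CLEAN_SHARED,
               STATE_DIRTY, STATE_DIRTY_SHARED } coherence_state_t;

/* Next state depends on the block's current state; states not listed
   in the text are left unchanged here as an assumption. */
static coherence_state_t conditional_next_state(coherence_state_t current)
{
    switch (current) {
    case STATE_CLEAN:        return STATE_CLEAN_SHARED;
    case STATE_DIRTY:        return STATE_INVALID;
    case STATE_DIRTY_SHARED: return STATE_CLEAN_SHARED;
    default:                 return current;
    }
}

int main(void)
{
    printf("dirty -> %d\n", (int)conditional_next_state(STATE_DIRTY));
    return 0;
}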
Keller James B.
Kessler Richard E.
Razdan Rahul
Anderson Matthew D.
Compaq Computer Corporation
Conley, Rose & Tayon, P.C.
Kim Matthew