Multiprocessing system employing pending tags to maintain...

Electrical computers and digital processing systems: memory – Storage accessing and control – Hierarchical memories

Reexamination Certificate

Rate now

  [ 0.00 ] – not rated yet Voters 0   Comments 0

Details

C711S146000

Reexamination Certificate

active

06272602

ABSTRACT:

BACKGROUND OF THE INVENTION
1. Field of the Invention
This invention generally relates to multiprocessor computer systems that employ cache subsystems, and more specifically, to maintaining cache coherence for pending transactions.
2. Description of the Relevant Art
A processor is a device that is configured to perform an operation upon one or more operands to produce a result. The operation is performed in response to an instruction executed by the processor. A computer system includes a processor and other components such as system memory, buses, caches, and input/output (I/O) devices. Multiprocessing computer systems include two or more processors, which may be employed to perform computing tasks. A particular computing task may be performed upon one processor while other processors perform unrelated computing tasks. Alternatively, components of a particular computing task may be distributed among multiple processors to decrease the time required to perform the computing task as a whole.
A cache memory is a high-speed memory unit interposed in the memory hierarchy of a computer system between a slower system memory and a processor to improve effective memory transfer rates and accordingly improve system performance. The cache is usually implemented by semiconductor memory devices having speeds that are comparable to the speed of the processor, while the system memory utilizes a less costly, lower speed technology. The cache memory typically includes a plurality of memory locations that stores a block or a “line” of two or more words. The words may be of a variable number of bytes and may be parts of data or instruction code. Each line in the cache has associated with it an address tag that uniquely identifies the address of the line. The address tags are typically included within a tag array device. The address tag may include a part of the address bits and additional bits that are used for identifying the state of the line. Accordingly, a processor may read from or write directly into one or more lines in the cache if the lines are present in the cache and if they are valid. This event in which the processor deals directly with the cache is referred to as cache “hit”. For example, when a read request originates in the processor for a new word, whether data or instruction, an address tag comparison is made to determine whether a valid copy of the requested word resides in a line of the cache memory. If present, the data is used directly from the cache, i.e. a cache read hit is completed. If not present, a line containing the requested word is retrieved from the system memory and stored in the cache memory. The requested line is simultaneously supplied to the processor. This event is referred to as a cache read miss.
Similarly, the processor may also write directly into the cache memory instead of the system memory. For example, when a write request is generated, an address tag comparison is made to determine whether the line into which data is to be written resides in the cache. If the line is present (and is valid), the data is written directly into the line. This event is referred to cache write hit. In many systems, a “dirty” bit for the line is then set. The dirty bit indicates the data stored within the line has been modified, and thus, before the line is deleted from the cache memory, overwritten, or replaced the modified data must be written into the system memory. If the line into which the data is to be written does not exist in the cache memory, the line is either fetched into the cache from the system memory to allow the data to be written into the cache, or the data is written directly into the system memory. This event is referred to as cache write miss.
In a multiprocessor system, copies of the same line of memory can be present in the caches of multiple processors. Cache coherence is the requirement that ensures that writes and reads by processors are observed by other processors in a well-defined order, in spite of the fact that each processor directly writes or reads its copy of the line in its cache. When a processor writes to a copy of the line in its cache, all other copies of the same line must either be invalidated or updated with the new value, so that subsequent reads by those other processors will observe the newly written value of the cache line. Similarly, when a processor incurs a read miss, it must be given the most recent value of the line, which could be a dirty line in another cache instead of in main memory.
Data coherence in a multiprocessor shared-memory system is typically maintained through employment of a snooping protocol or the use of a directory-based protocol. In a directory-based protocol, a directory of which processors have copies of each cache line is maintained. This directory is used to limit the number of processors that must snoop a given request for a cache line. The use of directories reduces the snoop traffic and thus allows larger systems to be built. However, use of directories increases the system's latency (which is caused by the directory lookup), hardware cost and complexity.
In a snooping protocol, each processor broadcasts all of its requests for cache lines, typically on snooping buses, to all other processors which then look up in their cache tags (“snoop”) to determine what action must be taken. For example, when a processor presents a write cycle on the bus to write data into the system memory, a cache controller device of another processor determines whether a corresponding line of data is contained within the cache. If a corresponding line is not contained within the cache, the cache controller takes no additional action and the write cycle is allowed to complete. If a corresponding line of data is contained within the cache, the cache controller determines whether the line of data is modified or not, typically by looking up the entries in the tag array. If the corresponding line is not dirty (e.g., the line is in a “shared” state), the line is marked invalid and the write cycle is allowed to complete. In some systems, if the line is dirty, the processor with a dirty line must respond with a copy of the line and simultaneously invalidate its own copy (referred to as a “copyback-invalidate”). If the line is dirty, and the request is read, the processor with a dirty line must respond with a copy of the line and mark the line shared.
Accordingly, snooping reads the cache tags, and sometimes modifies the cache tags when the snooped transaction is completed. However, there are several reasons why it is often desirable to de-couple snooping from the completion of the transaction when the processors cache tags are modified.
Generally speaking, a read or write transaction may take a relatively long time to complete. Instead of completing each individual transaction before starting the next one, performance (transaction throughput) may be improved by allowing subsequent transactions to start on the broadcast snooping bus before the previous transactions have completed. Therefore, each processor may have multiple transactions (it's own read or write bus requests as well as broadcast requests from other processors) pending completion. To maintain coherence, it may be necessary to keep these pending transactions in order and therefore a Pending Transaction Queue (PTQ) may be used between the snooping bus and the processor cache. Transactions are placed in this PTQ after they have appeared on the bus and before they have been completed (completion means that they have had their desired effect on the cache).
On a large multiprocessor system, the traffic on the bus may be very heavy. Generally, only a small portion of such heavy traffic is intended for any given processor in the system, as determined by snooping. For example, requests from other processors for lines that are determined by snooping to be not present in the processor's cache are irrelevant to the processor and need not be queued in the PTQ. Accordingly, de-coupled snooping may be performed in a duplicate set of tags. In some systems, the dupl

LandOfFree

Say what you really think

Search LandOfFree.com for the USA inventors and patents. Rate them and share your experience with other people.

Rating

Multiprocessing system employing pending tags to maintain... does not yet have a rating. At this time, there are no reviews or comments for this patent.

If you have personal experience with Multiprocessing system employing pending tags to maintain..., we encourage you to share that experience with our LandOfFree.com community. Your opinion is very important and Multiprocessing system employing pending tags to maintain... will most certainly appreciate the feedback.

Rate now

     

Profile ID: LFUS-PAI-O-2517858

  Search
All data on this website is collected from public sources. Our data reflects the most accurate information available at the time of publication.