Use of a cache coherency mechanism as a doorbell indicator...

Electrical computers and digital processing systems: memory – Storage accessing and control – Hierarchical memories

Reexamination Certificate

Details

C711S118000, C711S122000, C711S142000, C711S143000, C711S146000, C710S022000, C710S100000

active

06785775

ABSTRACT:

BACKGROUND OF THE INVENTION
1. Field of the Invention
The present invention relates generally to data processing systems employing multiple instruction processors and more particularly relates to multiprocessor data processing systems employing a hardware doorbell type interface to indicate a new entry on a server work queue.
2. Description of the Prior Art
It is known in the art that the use of multiple instruction and input/output processors operating out of common memory can produce problems associated with the processing of obsolete memory data by a first processor after that memory data has been updated by a second processor. The first attempts at solving this problem tended to use logic to lock processors out of memory spaces being updated. Though this is appropriate for rudimentary applications, as systems become more complex, the additional hardware and/or operating time required for the setting and releasing of locks cannot be justified, except for security purposes. Furthermore, reliance on such locks directly prohibits certain types of applications such as parallel processing.
The use of hierarchical memory systems tends to further compound the problem of data obsolescence. U.S. Pat. No. 4,056,844 issued to Izumi shows a rather early approach to a solution. The system of Izumi utilizes a buffer memory dedicated to each of the processors in the system. Each processor accesses a buffer address array to determine if a particular data element is present in its buffer memory. An additional bit is added to the buffer address array to indicate invalidity of the corresponding data stored in the buffer memory. A set invalidity bit indicates that the main storage has been altered at that location since loading of the buffer memory. The validity bits are set in accordance with the memory store cycle of each processor.
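The invalidity-bit scheme described above can be illustrated with a minimal sketch. It is not taken from the Izumi patent: the class and function names are hypothetical, a dictionary stands in for the buffer address array, and the sketch assumes the storing processor keeps an up-to-date copy of the data it writes.

```python
class ProcessorBuffer:
    """Hypothetical model of one processor's buffer memory and address array."""

    def __init__(self):
        # address -> [data, invalidity_bit]; the dict stands in for the
        # buffer address array plus the buffer memory it indexes.
        self.entries = {}

    def load(self, address, main_storage):
        # Loading from main storage clears the invalidity bit for the entry.
        self.entries[address] = [main_storage[address], False]

    def lookup(self, address):
        # A usable hit needs the entry present and its invalidity bit clear.
        entry = self.entries.get(address)
        if entry is None or entry[1]:
            return None
        return entry[0]

    def mark_invalid(self, address):
        entry = self.entries.get(address)
        if entry is not None:
            entry[1] = True


def store(address, value, main_storage, writer, all_buffers):
    """Store cycle: update main storage, then flag every other buffer's copy."""
    main_storage[address] = value
    for buf in all_buffers:
        if buf is not writer:
            buf.mark_invalid(address)
    # Assumption: the writer's own buffer holds the new value.
    writer.entries[address] = [value, False]


main_storage = {0x10: 1}
buf_a, buf_b = ProcessorBuffer(), ProcessorBuffer()
buf_a.load(0x10, main_storage)
buf_b.load(0x10, main_storage)

store(0x10, 2, main_storage, writer=buf_b, all_buffers=[buf_a, buf_b])
stale = buf_a.lookup(0x10)      # entry is present but flagged invalid
buf_a.load(0x10, main_storage)  # reload from main storage
fresh = buf_a.lookup(0x10)
```

After the store by the second processor, the first processor's lookup misses until it reloads the location from main storage, which is the behavior the invalidity bit is meant to guarantee.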
U.S. Pat. No. 4,349,871 issued to Lary describes a bussed architecture having multiple processing elements, each having a dedicated cache memory. According to the Lary design, each processing unit manages its own cache by monitoring the memory bus. Any invalidation of locally stored data is tagged to prevent use of obsolete data. The overhead associated with this approach is partially mitigated by the use of special purpose hardware and through interleaving the validity determination with memory accesses within the pipeline. Interleaving of invalidity determination is also employed in U.S. Pat. No. 4,525,777 issued to Webster et al.
Similar bussed approaches are shown in U.S. Pat. No. 4,843,542 issued to Dashiell et al. and in U.S. Pat. No. 4,755,930 issued to Wilson, Jr. et al. In employing each of these techniques, the individual processor has primary responsibility for monitoring the memory bus to maintain currency of its own cache data. U.S. Pat. No. 4,860,192 issued to Sachs et al. also employs a bussed architecture but partitions the local cache memory into instruction and operand modules.
U.S. Pat. No. 5,025,365 issued to Mathur et al. provides a much enhanced architecture for the basic bussed approach. In Mathur et al., as with the other bussed systems, each processing element has a dedicated cache resource. Similarly, the cache resource is responsible for monitoring the system bus for any collateral memory accesses which would invalidate local data. Mathur et al. provide a special snooping protocol which improves system throughput by updating local directories at times not necessarily coincident with cache accesses. Coherency is assured by the timing and protocol of the bus in conjunction with timing of the operation of the processing element.
An approach to the design of an integrated cache chip is shown in U.S. Pat. No. 5,025,366 issued to Baror. This device provides the cache memory and the control circuitry in a single package. The technique lends itself primarily to bussed architectures. U.S. Pat. No. 4,794,521 issued to Ziegler et al. shows a similar approach on a larger scale. The Ziegler et al. design permits an individual cache to interleave requests from multiple processors. This design resolves the data obsolescence issue by not dedicating cache memory to individual processors. Unfortunately, this imposes a performance penalty in many applications because it tends to produce queuing of requests at a given cache module.
The use of a hierarchical memory system in a multiprocessor environment is also shown in U.S. Pat. No. 4,442,487 issued to Fletcher et al. In this approach, each processor has dedicated and shared caches at both the L1 or level closest to the processor and at the L2 or intermediate level. Memory is managed by permitting more than one processor to operate upon a single data block only when that data block is placed in shared cache. Data blocks in dedicated or private cache are essentially locked out until placed within a shared memory element. System level memory management is accomplished by a storage control element through which all requests to shared main memory (i.e. L3 level) are routed. An apparent improvement to this approach is shown in U.S. Pat. No. 4,807,110 issued to Pomerene et al. This improvement provides prefetching of data through the use of a shadow directory.
A further improvement to Fletcher et al. is seen in U.S. Pat. No. 5,023,776 issued to Gregor. In this system, performance can be enhanced through the use of store around L1 caches used along with special write buffers at the L2 intermediate level. This approach appears to require substantial additional hardware and entails yet more functions for the system storage controller.
Inherent in architectures which employ cache memory is that the storage capacity of the cache is substantially less than that of the memory located at lower levels in the hierarchy. As a result, memory locations within the cache memory must often be cleared for use by other data quantities more recently needed by the instruction processor. For store-in cache memories, this means that those quantities modified by the instruction processor must first be rewritten to system memory before the corresponding location is available to store newly requested data. This “flushing” process tends to delay the availability of the newly requested data.
Newer Input/Output interface protocols, such as InfiniBand, require the use of queue structures in main system memory to hold work request entries and a Doorbell type interface to inform the hardware that a new entry has been added. For best performance, both the queue data and the Doorbell are to be located in the virtual address space of the application. In a typical system there can be many applications, each with multiple work queues, that a single hardware unit must support.
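The work queue and Doorbell arrangement can be sketched as a small simulation. This is an illustration only, not the InfiniBand interface or the mechanism claimed by the invention: `WorkQueue`, `HardwareUnit`, `ring_doorbell`, and `post_work_request` are hypothetical names, and a Python deque stands in for a queue structure in main system memory.

```python
from collections import deque


class WorkQueue:
    """Queue of work request entries held in (simulated) main system memory."""

    def __init__(self, queue_id):
        self.queue_id = queue_id
        self.entries = deque()


class HardwareUnit:
    """Hypothetical I/O hardware that looks at a queue only when signalled."""

    def __init__(self):
        self.queues = {}
        self.pending = deque()  # ids of queues whose Doorbell was rung
        self.completed = []

    def register(self, queue):
        self.queues[queue.queue_id] = queue

    def ring_doorbell(self, queue_id):
        # Stands in for the application's write to the queue's Doorbell.
        self.pending.append(queue_id)

    def service(self):
        # Drain only the signalled queues; idle queues are never examined.
        while self.pending:
            queue = self.queues[self.pending.popleft()]
            while queue.entries:
                request = queue.entries.popleft()
                self.completed.append((queue.queue_id, request))


def post_work_request(hardware, queue, request):
    queue.entries.append(request)           # 1. add the entry to the queue
    hardware.ring_doorbell(queue.queue_id)  # 2. inform the hardware


hw = HardwareUnit()
send_q, recv_q = WorkQueue(0), WorkQueue(1)
hw.register(send_q)
hw.register(recv_q)

post_work_request(hw, send_q, "send buffer A")
post_work_request(hw, send_q, "send buffer B")
hw.service()
```

In this sketch the hardware never reads `recv_q`, because no Doorbell was rung for it. That is the property that distinguishes a Doorbell interface from polling every queue.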
The current state of the art for hardware Doorbells requires a single memory mapped register allocated on a software page boundary (typically 4k bytes) so the Operating System can manage the location in its normal virtual-to-physical address translation mechanism. This wastes most of the page space allocated for each Doorbell and requires a very large memory mapped space to be assigned to the hardware when multiple queues are in use. A lesser used option is to not use a Doorbell but to require the hardware to poll each queue for flags indicating added entries. This requires additional memory bandwidth for the polling, and the time between successive examinations of any single queue increases with the number of queues enabled.
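The two costs described above can be put into rough numbers. The sketch below is back-of-the-envelope arithmetic under stated assumptions: the 4096-byte page comes from the text, while the 8-byte register width, the queue count, and the per-queue poll time are illustrative values, not figures from the patent.

```python
PAGE_BYTES = 4096  # "typically 4k bytes" per the text


def doorbell_page_waste(num_queues, register_bytes=8, page_bytes=PAGE_BYTES):
    """Return (mapped_bytes, wasted_bytes) for one Doorbell register per page.

    register_bytes=8 is an assumed register width used only for illustration.
    """
    mapped = num_queues * page_bytes
    used = num_queues * register_bytes
    return mapped, mapped - used


def polling_revisit_interval(num_queues, poll_time_us):
    """Time between visits to one queue when queues are polled round-robin.

    poll_time_us is an assumed cost of checking a single queue's flag.
    """
    return num_queues * poll_time_us


mapped, wasted = doorbell_page_waste(1024)
wasted_fraction = wasted / mapped
interval_us = polling_revisit_interval(1024, 0.5)
```

With 1024 queues, the page-per-Doorbell scheme maps 4 MiB of address space to hold 8 KiB of registers. Under the polling alternative, the interval between checks of any one queue grows linearly with the number of queues enabled.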
SUMMARY OF THE INVENTION
The present invention overcomes the problems found in the prior art by providing a method of and apparatus for the cache memory coherency hardware to assist in generating the Doorbell type indication within a server platform.
The preferred mode of the present invention includes up to four main memory storage units. Each is coupled directly to each of up to four “pods.” Each pod contains a level three cache memory coupled to each of the main memory storage units. Each pod may also accommodate up to two input/output modules.
Each pod may contain up to two sub-pods, wherein each sub-pod may contain up to two instruction processors. Each instruction
