Classification: Electrical computers and digital processing systems: memory – Storage accessing and control – Shared memory area
Type: Reexamination Certificate
Filed: 2000-09-29
Issued: 2003-02-04
Examiner: Kevin Verbrugge (Department 2187)
Status: Active
Patent Number: 06516393
FIELD OF THE INVENTION
This invention relates to computers and computer systems, and in particular to mechanisms for resolving address contention and prioritizing access to resources within a shared memory system.
BACKGROUND OF THE INVENTION
Multiprocessor systems can take many forms, and individual designs may contain many unique features. Common among multiprocessor systems is the requirement to resolve shared address conflicts. A shared address conflict occurs when one or more processors attempt to update shared data. Since resolving this type of conflict necessitates serialized access, system designers avoid scenarios where this type of activity occurs. For example, a processing unit may be assigned a private address space by the operating system, enabling that processor to work uninhibited by conflicts. Even in this environment, an idle processor will commonly obtain new work from a queue stored in a shared address space. As the speed and number of processors increase, this coordination of workload becomes more critical. Some workloads, however, require interaction among many processors, and efficient resolution of conflicts is needed even with relatively slow processors. For example, large databases are maintained for many businesses, and these databases can be updated by several applications running simultaneously. Conflict resolution often becomes the limiting factor of the system. It is desired to have a multiprocessor system that minimizes these shared conflicts, but also minimizes the performance impact when they do occur.
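The shared work queue and the serialization it forces can be illustrated with a short sketch. The following C fragment is not from the patent; the names (struct shared_queue, get_work) and the simple spinlock are illustrative assumptions, chosen only to show why concurrent updates to a shared queue head must be serialized.

/* Minimal illustration (not from the patent): idle processors pulling work
 * from a queue held in shared memory.  Because several processors may update
 * the shared head concurrently, access must be serialized -- here with a
 * simple atomic spinlock -- which is exactly the kind of shared-address
 * conflict described above.  Initialize lock with ATOMIC_FLAG_INIT.        */
#include <stdatomic.h>

struct work_item { struct work_item *next; int payload; };

struct shared_queue {
    atomic_flag lock;            /* serializes updates to the shared head   */
    struct work_item *head;      /* shared data: the source of the conflict */
};

static struct work_item *get_work(struct shared_queue *q)
{
    /* Every idle processor funnels through this lock, so the faster and
     * more numerous the processors, the more this serialization costs.     */
    while (atomic_flag_test_and_set_explicit(&q->lock, memory_order_acquire))
        ;                        /* spin: another processor owns the queue  */

    struct work_item *item = q->head;
    if (item)
        q->head = item->next;

    atomic_flag_clear_explicit(&q->lock, memory_order_release);
    return item;                 /* NULL when no work is available          */
}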
Technological advances have created faster processors while also producing denser, but relatively slower, memory. Cache hierarchies, layers of faster but smaller-capacity memories, have been added to offset some of this impact and reduce access delay. Since a cache is a subset of the total memory a processor can access, a directory is required to keep track of which blocks of main memory are held in the cache. Since all updates to memory must be visible to all processors in a shared memory multiprocessor system, changes to data in the caches must be made available to all processors and devices in the system. A common method in the art has been to add tags to the directories of the caches that indicate the ownership state of each block in the cache (directory-based cache coherence). This ownership state indicates a processor's write authority over the block. If a processor wishes to update a block of data in a cache, it must first obtain exclusive rights via some interprocessor communication. Once it has exclusive authority, the processor may change the directory ownership state of the contested block and proceed with its updates. What is important is that interprocessor communication is required to pass ownership of shared blocks between processors, and this communication can add significant delay to the overall delay associated with accessing data.

Access to the interprocessor communication is usually serialized in order to ensure that only one processor at a time can update the contested block. This usually means that processors must request priority in some manner to use the required resources. A good priority design, one that ensures fair access, is critical to ensure proper work distribution and to avoid starvation of requesters. As the number of memory requesters increases, it becomes more difficult to maintain equal access to memory resources, which can impede the scalability of the multiprocessor system. A priority system that can reduce the negative impact of the processor interconnect and its associated traffic is desired.

Priority designs have been used that enter requests in a centralized queue or similar ordering mechanism to ensure requests are presented in the same order they are received. The queue maintains the order while the memory system completes each request presented by this queuing system. This solution guarantees the order, but requires the order to be set before any resource is evaluated for availability or conflict. An example would be the availability of cache interleaves. With this solution, no request could bypass a request that was stalled because its target cache interleave was unavailable. This means that the additional latency for that one request becomes added latency for all requests queued behind it. Also, the requests in the queue may not have an address conflict with the stalled request and therefore do not benefit from the forced serialization. Additional patches to avoid this queuing effect could be employed at the input of the stack. For example, creating multiple stacks based on address ranges would require checking the address before entry into the stack. The effectiveness of this solution would be limited by how much hardware, in the form of physical arrays or other memory devices, could be made available for this purpose. Also, all improvements of this kind would negatively impact the nominal latency by adding checking and categorization of requests before priority.
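The directory-based coherence flow described above can be sketched roughly as follows. The state names, struct dir_entry, and the helper cross_interrogate are illustrative assumptions rather than the patent's terminology; the point is only that a processor must obtain exclusive authority, via interprocessor communication, before it may update a block.

/* Hedged sketch of a directory-based coherence check.  A processor must hold
 * EXCLUSIVE authority for a block before updating it; obtaining that
 * authority requires interprocessor communication, which is where the
 * serialization cost appears.                                               */
#include <stdbool.h>

enum ownership { INVALID, SHARED, EXCLUSIVE };   /* per-block directory tag  */

struct dir_entry {
    unsigned long block_addr;    /* which block of main memory is cached     */
    enum ownership state;        /* current write authority for this block   */
};

/* Assumed helper: ask the other processors to give up their copies.  This
 * stub stands in for the expensive, serialized interprocessor step.         */
static bool cross_interrogate(unsigned long block_addr)
{
    (void)block_addr;
    return true;                 /* stand-in: authority is eventually granted */
}

static bool acquire_write_authority(struct dir_entry *e)
{
    if (e->state == EXCLUSIVE)
        return true;                      /* already holds write authority   */

    if (!cross_interrogate(e->block_addr))
        return false;                     /* could not obtain exclusivity    */

    e->state = EXCLUSIVE;                 /* directory updated; store may proceed */
    return true;
}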
Some other priority schemes use sampling in an attempt to reduce some of the complex interactions that can cause request starvation. The sample, or snapshot, tags the requests outstanding at a given time and ensures that all of these requests are satisfied before a new sample is taken. Since a satisfied request in the current snapshot cannot create a visible request until the snapshot is emptied, some starvation scenarios may be avoided. However, snapshot designs depend on the requests not having dependencies between them, which, in some implementations, may not hold and can lead to a deadlock condition: a request in the snapshot waiting for a request not in the snapshot. This class of solution does not attempt to improve access among contentious requests; it merely limits the scope of the problem to a scale presumed to be manageable, and is therefore likely to add to the nominal latency without a guarantee of success.
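A rough sketch of such a snapshot scheme follows; the names (struct requester, take_snapshot, pick_next) are assumptions for illustration and do not come from the patent. Note that nothing in it prevents a tagged request from depending on an untagged one, which is the deadlock hazard noted above.

/* Illustrative snapshot arbitration: outstanding requests are tagged into a
 * sample, and no new sample is taken until every tagged request completes.  */
#define MAX_REQ 16

struct requester {
    int valid;         /* request outstanding                               */
    int in_snapshot;   /* tagged in the current sample                      */
};

static void take_snapshot(struct requester req[MAX_REQ])
{
    for (int i = 0; i < MAX_REQ; i++)
        req[i].in_snapshot = req[i].valid;       /* tag what is pending now  */
}

static int pick_next(struct requester req[MAX_REQ])
{
    int pending = 0;
    for (int i = 0; i < MAX_REQ; i++)
        if (req[i].valid && req[i].in_snapshot)
            pending++;

    if (pending == 0)
        take_snapshot(req);      /* current sample drained; start a new one  */

    for (int i = 0; i < MAX_REQ; i++)
        if (req[i].valid && req[i].in_snapshot)
            return i;            /* only snapshotted requests are visible    */
    return -1;                   /* nothing eligible this cycle              */
}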
A Least Recently Used (LRU) priority algorithm may be used to ensure that all processors have fair access. In order to limit the latency of the priority request, a partial-LRU is used. This partial-LRU uses fewer bits and allows for quicker calculation of priority. In this system, requests are arbitrated and presented to a pipelined structure. A request moves through this pipeline and initiates a cache access and the associated directory lookup, checks resource availability, and checks whether any other request has the same address locked. If there is no owner, the current requester assumes ownership by setting a lock. This lock remains active until the request has been satisfied. Once a lock is set, all subsequent requests to the same address block their memory access and set a resource-need for the owning requester to complete. This resource-need prevents further pipeline accesses until the owning request completes. The owning request is then free to change the ownership status of the line, if necessary, and return the requested data to the processor.

Such a system works well until heavy address activity, such as the interprocessor synchronization described earlier, occurs. In those cases, many requests attempt to access the same address. They all enter the pipeline and set their resource-need for the owning processor; when the owning processor completes, the remaining requests all vie for priority again, a new owner sets its lock, and all subsequent requests then set a resource-need for the new owner. Each request busies the pipe, and other resources, only to set its resource-need for the newly designated owner. Once the new owner completes, the process starts again. With each completion, the priority mechanism is exercised again and resources are busied, causing increased traffic and latency. In addition, a completed processor may issue another request to the same address before all processors have accessed the data. Since the priority logic has been optimized for the best case, and due to the inherent latency of request generation after a lock is cleared, the new request can beat those still waiting.
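The lock and resource-need behavior described above can be approximated with the following sketch. The structures and function names (pipeline_pass, owner_completes) are assumptions for illustration, not the patent's design; the sketch shows how each blocked request consumes a pipeline pass only to record a need for the current owner, and how the whole pattern repeats after every completion.

/* Illustrative sketch of the lock / resource-need behaviour.  Each request
 * that finds the address locked burns a pipeline pass just to record a need
 * for the current owner; when that owner completes, all waiters re-arbitrate,
 * a single new owner wins, and the rest burn another pass -- the traffic and
 * latency problem this invention is directed at.                            */
#include <stdbool.h>

struct address_lock {
    bool locked;
    int  owner;                 /* requester id currently holding the line   */
};

struct request {
    int  id;
    bool need_set;              /* waiting for 'need_for' to complete        */
    int  need_for;
};

/* One pipeline pass for one request; returns true if it became the owner.   */
static bool pipeline_pass(struct request *r, struct address_lock *lk)
{
    if (!lk->locked) {
        lk->locked = true;      /* no owner: this request takes the lock     */
        lk->owner  = r->id;
        return true;
    }
    r->need_set = true;         /* blocked: record a resource-need and wait   */
    r->need_for = lk->owner;
    return false;               /* pipeline cycle consumed with no progress   */
}

/* Owner finishes: the lock clears and every waiter re-enters priority, so
 * the whole pattern repeats once per completion.                             */
static void owner_completes(struct address_lock *lk)
{
    lk->locked = false;
}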
Inventors: Michael Fee; Pak-kin Mak
Attorney/Agent: Lynn L. Augspurger; Cantor-Colburn LLP
Assignee: International Business Machines Corporation
Examiner: Kevin Verbrugge