System and method for terminating lock-step sequences in a...

Electrical computers and digital processing systems: memory – Storage accessing and control – Shared memory area

Reexamination Certificate

Details

C711S118000, C711S151000

active

06560682

ABSTRACT:

TECHNICAL FIELD OF THE INVENTION
The present invention is directed, in general, to multiprocessor systems and, more specifically, to a system and method for disrupting a lock-step sequence condition in a server containing multiple processor units.
BACKGROUND OF THE INVENTION
Increasingly, state-of-the-art computer applications implement high-end tasks that require multiple processors for efficient execution. Multiprocessor systems allow parallel execution of multiple tasks on two or more central processor units (“CPUs”). A typical multiprocessor system may be, for example, a network server. Preferably, a multiprocessor system is built using widely available commodity components, such as the Intel Pentium® Pro processor (also called the “P6” processor), PCI I/O chipsets, Pentium® Pro processor bus topology, and standard memory modules, such as SIMMs and DIMMs. There are numerous well-known multiprocessor system architectures, including symmetrical multiprocessing (“SMP”), non-uniform memory access (“NUMA”), cache-coherent NUMA (“CC-NUMA”), clustered computing, and massively parallel processing (“MPP”).
A symmetrical multiprocessing (“SMP”) system contains two or more identical processors that independently process as “peers” (i.e., no master/slave processing). Each of the processors (or CPUs) in an SMP system has equal access to the resources of the system, including memory access. A NUMA system contains two or more equal processors that have unequal access to memory. NUMA encompasses several different architectures that can be grouped together because of their non-uniform memory access latency, including replicated memory cluster (“RMC”), MPP, and CC-NUMA. In a NUMA system, memory is usually divided into local memories, which are placed close to processors, and remote memories, which are not close to a processor or processor cluster. Shared memories may be allocated into one of the local memories or distributed between two or more local memories. In a CC-NUMA system, multiple processors in a single node share a single memory and cache coherency is maintained using hardware techniques. Unlike an SMP node, however, a CC-NUMA system uses a directory-based coherency scheme, rather than a snoopy bus, to maintain coherency across all of the processors. RMC and MPP have multiple nodes or clusters and maintain coherency through software techniques. RMC and MPP may be described as NUMA architectures because of the unequal memory latencies associated with software coherency between nodes.
All of the above-described multiprocessor architectures require some type of cache coherence apparatus, whether implemented in hardware or in software. High speed CPUs, such as the Pentium® Pro processor, utilize an internal cache and, typically, an external cache to maximize the CPU efficiency. Because an SMP system usually operates only one copy of the operating system, the interoperation of the CPUs and memory must maintain data coherency. In this context, coherency means that, at any one time, there is but a single valid value for each datum. It is therefore necessary to maintain coherency between the CPU caches and main memory.
One popular coherency technique uses a “snoopy bus.” Each processor maintains its own local cache and “snoops” on the bus to look for read and write operations between other processors and main memory that may affect the contents of its own cache. If a first processor attempts to access a datum in main memory that a second processor has modified and is holding in its cache, the second processor will interrupt the memory access of the first processor and write the contents of its cache into memory. Then, all other snooping processors on the bus, including the first processor, will see the write operation occur on the bus and update their cache state information to maintain coherency.
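The snooping behavior described above can be illustrated with a minimal Python sketch. It is not taken from the patent: the `Bus` and `Cache` classes, the two-state ("M" for modified, "S" for shared) model, and the direct invalidation on writes are all simplifying assumptions made only for illustration.

```python
class Bus:
    """Hypothetical shared bus plus main memory; every cache sees every read."""

    def __init__(self):
        self.memory = {}   # address -> value held in main memory
        self.caches = []   # all caches attached to this bus

    def read(self, requester, addr):
        # Every other cache snoops the read before memory answers.
        for cache in self.caches:
            if cache is not requester:
                cache.snoop_read(addr)
        return self.memory.get(addr, 0)


class Cache:
    """Hypothetical per-processor cache with states "M" (modified) and "S" (shared)."""

    def __init__(self, bus):
        self.bus = bus
        self.lines = {}    # address -> [value, state]
        bus.caches.append(self)

    def write(self, addr, value):
        # Assumption: other copies are simply dropped, then the line is held modified.
        for other in self.bus.caches:
            if other is not self:
                other.lines.pop(addr, None)
        self.lines[addr] = [value, "M"]

    def read(self, addr):
        if addr not in self.lines:
            self.lines[addr] = [self.bus.read(self, addr), "S"]
        return self.lines[addr][0]

    def snoop_read(self, addr):
        # If this cache holds the datum modified, write it back so the reader sees it.
        line = self.lines.get(addr)
        if line and line[1] == "M":
            self.bus.memory[addr] = line[0]
            line[1] = "S"


bus = Bus()
first, second = Cache(bus), Cache(bus)
second.write(0x40, 7)          # second processor modifies the datum in its cache
value = first.read(0x40)       # first processor's read triggers the write-back
```

In this sketch the write-back happens inside `snoop_read`, which stands in for the second processor interrupting the first processor's access; real bus protocols carry more states and signals than this model does.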
Another popular coherency technique is “directory-based cache coherency.” Directory-based caching keeps a record of the state and location of every block of data in main memory. For every shareable memory address line, there is a “presence” bit for each coherent processor cache in the system. Whenever a processor requests a line of data from memory for its cache, the presence bit for that cache in that memory line is set. Whenever one of the processors attempts to write to that memory line, the presence bits are used to invalidate the cache lines of all the caches that previously used that memory line. All of the presence bits for the memory line are then reset and the specific presence bit is set for the processor that is writing to the memory line. Therefore, the processors need not all reside on a common snoop bus because the directory maintains coherency for the individual processors.
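The presence-bit bookkeeping described above can be sketched in a few lines of Python. The `Directory` class, its method names, and the use of an integer bitmask per memory line are assumptions chosen for illustration and are not drawn from the patent.

```python
class Directory:
    """Hypothetical directory: one presence bit per cache for each memory line."""

    def __init__(self, num_caches):
        self.num_caches = num_caches
        self.presence = {}  # line address -> bitmask of caches holding the line

    def read(self, cache_id, addr):
        # A cache fetching the line sets its own presence bit.
        self.presence[addr] = self.presence.get(addr, 0) | (1 << cache_id)

    def write(self, cache_id, addr):
        # Use the presence bits to find every other cache that must be invalidated.
        bits = self.presence.get(addr, 0)
        invalidated = [i for i in range(self.num_caches)
                       if ((bits >> i) & 1) and i != cache_id]
        # Reset all presence bits, then set only the writer's bit.
        self.presence[addr] = 1 << cache_id
        return invalidated


directory = Directory(num_caches=4)
directory.read(0, 0x100)
directory.read(2, 0x100)
to_invalidate = directory.write(1, 0x100)   # caches 0 and 2 held the line
```

Because the directory alone decides which caches to invalidate, nothing in this sketch requires the caches to observe one another's traffic.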
From the foregoing description, it can be seen that from time to time, two or more processors will attempt to access data from the same location at the same time. In the normal operation of a multiprocessor system, this may result in one or more processors being “retried.” That is, a processor performs a memory access to a certain memory location and the memory access is denied because the memory location is temporarily unavailable. When this occurs, the processor retries the memory access within a very short period of time and usually succeeds in accessing the memory location during the retry.
It is known, however, that two or more processors may occasionally get trapped in an endlessly repeating cycle of retries that fails to ever access the desired memory location. This condition may be referred to as a “lock-step sequence.” The circumstances leading to a lock-step sequence are complex, and proving that a multiprocessor design is not susceptible to a lock-step condition is difficult due to the design complexity and the number of possible states in the system. In its essentials, a lock-step sequence may be recognized as a group of CPUs trying to access a line of data in memory that has been locked out by another CPU that has control over that line. Each of the locked out CPUs retries the line and fails, thereby causing another retry to be scheduled. The sequencing of the retries by the CPUs is such that the CPU that has actual control over the line is prevented from unlocking the line because the memory controller is always busy servicing the retry requests of the locked out CPUs.
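A toy Python model can show how retry timing alone may starve the CPU that controls the line. Everything in it is an assumption made for illustration rather than a detail from the patent: the function name `cycles_until_unlock`, a controller that services exactly one request per cycle, a fixed retry delay, and an arbitration rule in which the owning CPU's unlock is serviced only in a cycle with no retry due.

```python
def cycles_until_unlock(retry_start, retry_delay=2, max_cycles=1000):
    """Return the cycle in which the owning CPU unlocks the line, or None.

    retry_start maps each locked-out CPU to the cycle of its first retry.
    Assumed model: one request serviced per cycle, and the owner's unlock
    is serviced only when no retry is due in that cycle.
    """
    next_retry = dict(retry_start)
    for cycle in range(max_cycles):
        due = [cpu for cpu, t in next_retry.items() if t <= cycle]
        if not due:
            return cycle  # controller is idle, so the owner unlocks the line
        # Service the oldest pending retry; it is denied and rescheduled.
        cpu = min(due, key=lambda c: next_retry[c])
        next_retry[cpu] = cycle + retry_delay
    return None  # the owner was never serviced within max_cycles


# Two CPUs whose retries interleave perfectly keep the controller busy forever.
stuck = cycles_until_unlock({"cpu0": 0, "cpu1": 1}, retry_delay=2)

# A slightly different retry delay leaves a free cycle and the line is unlocked.
freed = cycles_until_unlock({"cpu0": 0, "cpu1": 1}, retry_delay=3)
```

In this model the only difference between the frozen case and the working case is the timing of the retries, which mirrors the observation above that the condition depends on how the retries are sequenced.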
In this situation, a great deal of bus traffic appears to be occurring, but no actual work is being accomplished by many, if not all, of the CPUs. The applications being run by the multiprocessor system are instead “frozen” in place. As noted before, this condition is difficult to reproduce and correct due to the complexity of the timing of memory requests that cause the condition. The result is that many types of multiprocessor systems will from time to time lock up and require operator intervention to clear the condition. This causes much frustration and reduces the overall processing efficiency of the system.
Therefore, there is a need in the art for improved multiprocessor systems that can more effectively avoid intermittent frozen states that result from lock-step sequences among two or more processors. In particular, there is a need in the art for systems, circuits, and methods that are able to clear a lock-step condition within a relatively short time period and without the need for operator intervention.
SUMMARY OF THE INVENTION
The lock-step sequence problems inherent in the prior art are overcome by the present invention. In one embodiment of the present invention, a control circuit is provided for use in a processing system containing a plurality of processors coupled to a main memory by a first common bus, wherein the control circuit perturbs a lock-step sequence of memory requests received from the processors. The control circuit comprises a memory request generator, adapted to be coupled to the first common bus, for generating at least one memory request operable to terminate the lock-step sequence of memory requests.
In one embodiment of the present invention, the at least one memory request is generated pseudo-ra
