Multiprocessor system having controller for controlling the...

Electrical computers and digital processing systems: memory – Storage accessing and control – Hierarchical memories

Reexamination Certificate

Details

C711S118000, C711S124000

active

06631447

ABSTRACT:

BACKGROUND OF THE INVENTION
The present invention relates to an improved high performance multiprocessor computer system, and more particularly to a cache memory coherency control for distributed cache memories to be used therein.
There is significant ongoing research and development on scalable shared-memory multiprocessor systems capable of efficiently operating anywhere from tens to several thousand processors. Many of these systems adopt a so-called Non-Uniform Memory Access (NUMA) architecture, which has a distributed memory system configuration. The reason is that when a single memory is shared by several thousand processors, the system cannot achieve its full performance because concurrent accesses to the shared memory are likely to create a bottleneck. The NUMA architecture is intended to solve this problem by distributing the shared memory.
In addition, as processor operating frequencies continue to increase, main memory access latency has become an important factor in determining system performance. To reduce this latency, it is preferable to place the main memory in the vicinity of the processors. In this respect also, a distributed memory system configuration (NUMA) having a local memory for each processor is preferable. Such a configuration leaves room for further significant improvement in latency, since the operating frequency of the local memories can be raised along with the operating frequencies of the processors. Typical examples of such distributed memory systems are listed below.
(1) DASH System at Stanford University: Daniel Lenoski, et al., “The DASH Prototype: Implementation and Performance”, Proc. 19th Int. Symp. on Computer Architecture, 1992.
(2) SCI (Scalable Coherent Interface): David B. Gustavson, “The Scalable Coherent Interface and Related Standards Projects”, IEEE MICRO, pp. 10-22, 1992.
(3) IBM RP3 (Research Parallel Processor): “The IBM Research Parallel Processor Prototype (RP3): Introduction and Architecture”, Proc. of the 1985 Int. Conf. on Parallel Processing, pp. 764-771, 1985.
An important problem to be solved in any distributed memory system is cache coherency control, which must be implemented for the cache memories distributed across several thousand processors. This mechanism is needed to keep the cached data held in the cache memories of the respective processors consistent.
Conventionally, a multiprocessor system consisting of several processors generally adopts a cache coherence protocol referred to as bus snooping. In this scheme, each processor is coupled to a shared bus and maintains cache coherence by monitoring transactions on the shared bus. Namely, when a particular processor wishes to read particular data, it broadcasts the address of that data on the shared bus. Each of the other processors snoops transactions on the shared bus, and any processor that finds an updated version of the desired data in its own cache memory transfers that data to the requesting processor.
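The read path described above can be sketched in a few lines. This is a minimal illustrative model, not the mechanism of any particular system: the class names `SnoopingCache` and `SharedBus`, the dirty flag, and the fallback to main memory are all assumptions introduced here, and state transitions in the supplying cache are deliberately omitted.

```python
class SnoopingCache:
    """One processor's cache attached to a shared bus (hypothetical model)."""

    def __init__(self, name):
        self.name = name
        self.lines = {}  # address -> (value, dirty flag)

    def snoop_read(self, address):
        """Return the value if this cache holds an updated (dirty) copy, else None."""
        entry = self.lines.get(address)
        if entry is not None and entry[1]:
            return entry[0]
        return None


class SharedBus:
    """Shared bus that broadcasts every read address to all attached caches."""

    def __init__(self, memory):
        self.memory = memory  # address -> value, stands in for main memory
        self.caches = []

    def attach(self, cache):
        self.caches.append(cache)

    def read(self, requester, address):
        """Broadcast the address; return (value, name of the supplier)."""
        for cache in self.caches:
            if cache is requester:
                continue
            value = cache.snoop_read(address)
            if value is not None:
                # Another cache held the updated copy and supplies it.
                requester.lines[address] = (value, False)
                return value, cache.name
        # No cache held an updated copy, so main memory supplies the data.
        value = self.memory[address]
        requester.lines[address] = (value, False)
        return value, "memory"
```

Note that every read visits every other cache, which is the cost that grows with the number of processors.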
However, when this bus snooping system is applied directly to a shared-memory multiprocessor system having as many as several thousand processors, the following problems may occur. A first problem is that too much time elapses between broadcasting the data address to the several thousand processors and receiving the cache coherency reports from all of them. Consequently, even if the latency of an access to a local memory is reduced by the distributed memory configuration, the delay in the cache coherency check prevents the data from being used immediately. A second problem is that the load on the shared bus becomes excessive. Namely, every time a processor reads data from or writes data to memory, a broadcast is issued to every other processor. As a result, too many transactions must be executed on the shared bus when the system is viewed as a whole. In addition, the frequency of cache coherence procedures performed by the shared-bus snooping unit in each processor increases and becomes a bottleneck, so that the shared bus system cannot achieve its full performance.
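The scaling of the second problem can be made concrete with simple arithmetic. The sketch below assumes, purely for illustration, that every memory access produces exactly one bus broadcast and that every other processor performs one snoop lookup per broadcast; the function names are hypothetical.

```python
def bus_broadcasts(num_processors, accesses_per_processor):
    """Total broadcasts placed on the shared bus by all processors."""
    return num_processors * accesses_per_processor


def snoop_lookups_per_processor(num_processors, accesses_per_processor):
    """Snoop lookups one processor must perform for the other processors' accesses."""
    return (num_processors - 1) * accesses_per_processor


def total_snoop_lookups(num_processors, accesses_per_processor):
    """Snoop lookups summed over the whole system (grows quadratically)."""
    return num_processors * snoop_lookups_per_processor(
        num_processors, accesses_per_processor
    )
```

Under these assumptions, going from 4 to 1000 processors at one access each raises the system-wide lookup count from 12 to 999000.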
Two prior art cache coherency protocol approaches are known for solving the problems described above: the directory-based protocol approach and the software-controlled protocol approach. In the directory-based protocol approach, each distributed memory has a directory which keeps track of the cached data for all of the caches in the system. Use of this directory eliminates the need to provide a means for broadcasting to all of the processors or a bus-snooping mechanism.
The directory-based protocol approach has two variants: the mapping protocol approach and the distributed link protocol approach.
By way of example, the foregoing DASH system adopts a mapping protocol approach. The directory for the mapping protocol approach consists of cache presence bits which indicate the cache memories that hold a copy of shared data. Thus, each directory entry needs as many presence bits as there are cache memories in the system. As modifications of this mapping method, a limit mapping method and a group mapping method are also known. The limit mapping method reduces the number of bits required to indicate cache presence by limiting the number of cache memories which are allowed to hold a copy of data in the shared memory. In the group mapping method, a group of several processors is defined as the unit to which a cache presence bit is assigned, thereby decreasing the number of presence bits required. Within each group, cache coherence can be implemented by means of the bus snooping protocol. The above-mentioned DASH system adopts, in practice, the group mapping method.
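The difference between full mapping and group mapping can be illustrated with a small bit-vector sketch. The names `full_map_bits`, `group_map_bits`, and `PresenceDirectoryEntry` are hypothetical, and the encoding (one bit per group, lowest bit for group 0) is an assumption made for the example rather than the layout used by DASH.

```python
import math


def full_map_bits(num_caches):
    """Full mapping: one presence bit per cache memory."""
    return num_caches


def group_map_bits(num_caches, group_size):
    """Group mapping: one presence bit per group of caches."""
    return math.ceil(num_caches / group_size)


class PresenceDirectoryEntry:
    """Directory entry for one shared-memory block (hypothetical layout)."""

    def __init__(self, num_caches, group_size=1):
        self.group_size = group_size
        self.width = group_map_bits(num_caches, group_size)
        self.bits = 0  # bit g is set when some cache in group g holds a copy

    def add_sharer(self, cache_id):
        """Record that the given cache now holds a copy."""
        self.bits |= 1 << (cache_id // self.group_size)

    def sharer_groups(self):
        """Groups that must be notified when the block is invalidated."""
        return [g for g in range(self.width) if (self.bits >> g) & 1]
```

With a group size of 1 the entry degenerates to full mapping. With larger groups the entry is smaller, but the directory only knows which group to notify, which is why coherence within a group is left to another mechanism such as bus snooping.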
The distributed link protocol, which is another of the directory-based protocols, has been adopted by the aforementioned SCI system. The distributed link protocol provides each data item in the shared memory and in the cache memories with link information, so that a linked list is formed connecting the shared memory data with every copy of it held in the cache memories. For example, if a particular processor issues a request to delete a cached copy of particular shared memory data, the cache coherence control follows the link information of the shared memory data until it finds the first copy and deletes it. When that first copy has further link information, the next copy can be traced through that link information and deleted in turn. According to this method, the amount of directory information can advantageously be reduced in comparison with the mapping protocol method.
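The link traversal can be sketched as a walk over a linked list whose head is stored with the shared memory data. This is a simplified, singly linked model with hypothetical names (`CacheCopy`, `MemoryLine`); it is an assumption for illustration and does not reproduce the actual SCI list structure or message exchange.

```python
class CacheCopy:
    """A copy of a memory line held in one cache, with a link to the next copy."""

    def __init__(self, cache_id, next_copy=None):
        self.cache_id = cache_id
        self.next_copy = next_copy
        self.valid = True


class MemoryLine:
    """Shared-memory data item that stores only a link to the first copy."""

    def __init__(self):
        self.head = None

    def add_copy(self, cache_id):
        """A new sharer is linked in at the head of the list."""
        self.head = CacheCopy(cache_id, self.head)
        return self.head

    def invalidate_all(self):
        """Follow the links from memory, deleting each copy in turn."""
        visited = []
        node = self.head
        while node is not None:
            node.valid = False
            visited.append(node.cache_id)
            node = node.next_copy
        self.head = None
        return visited
```

The memory side stores a single link regardless of how many caches share the data, which is where the saving in directory information comes from. The cost is that deletion proceeds one copy at a time along the list.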
Another important cache coherence protocol, different from the directory-based protocol, is the software-controlled protocol, which is adopted by the above-mentioned IBM RP3 system. The software-controlled protocol provides functions for assigning an attribute that distinguishes cachable from non-cachable data in units of pages, for example 4K bytes, and for invalidating a particular cache memory entry from the user's program. For example, a local data item specific to a particular task is assigned the cachable attribute, while a data item shared between tasks is designated non-cachable. Then, when a task is transferred from the processor currently executing it to another processor, the local data cached in the cache memory of the first processor is completely invalidated. Since this ensures that no copy of the local data is present in the other cache memories, there is no need for a cache coh
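The page-level cachable attribute and the invalidate-on-migration step described in this paragraph can be sketched as follows. The class `SoftwareManagedCache`, its method names, and the dictionary standing in for main memory are assumptions introduced for illustration; they do not represent the RP3 interface.

```python
PAGE_SIZE = 4096  # 4K-byte pages, as in the example given in the text


class SoftwareManagedCache:
    """Cache whose coherence is left to software (hypothetical model)."""

    def __init__(self):
        self.cachable_pages = set()  # page numbers marked cachable by software
        self.entries = {}            # address -> cached value

    def mark_cachable(self, address):
        """Give the page containing this address the cachable attribute."""
        self.cachable_pages.add(address // PAGE_SIZE)

    def read(self, address, memory):
        """Read through the cache only when the page is cachable."""
        if address // PAGE_SIZE not in self.cachable_pages:
            return memory[address]  # non-cachable: always go to memory
        if address not in self.entries:
            self.entries[address] = memory[address]
        return self.entries[address]

    def invalidate_all(self):
        """Called by software when the task moves to another processor."""
        self.entries.clear()
```

In this sketch shared data never enters the cache, so it can never be stale, while task-local data is cached and then discarded in bulk when the task migrates.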

