High-performance communication method and apparatus for...

Electrical computers and digital processing systems: memory – Storage accessing and control – Shared memory area

Reexamination Certificate

Details

C711S141000, C711S162000, C709S216000, C710S054000

active

06295585

ABSTRACT:

BACKGROUND OF THE INVENTION
This invention relates generally to the field of parallel computing and more particularly to a method of providing high performance recoverable communication between the nodes in a parallel computing system.
As it is known in the art, large scale parallel computers have historically been constructed with specialized processors and customized interconnects. The cost of building specialized processors in terms of components and time to market caused many computer manufacturers to re-evaluate system designs. Currently many vendors in the market are attempting to provide performance similar to that of custom designs using standard processors and standard networks. The standard processors and networks are generally marketed and sold as clustered computer systems.
By using standard components and networks, clustered systems have the advantage of providing a parallel computing system having a much lower cost design at a decreased time to market. However, because the standard network protocol is used, a communication overhead is incurred that translates into poor overall parallel system performance.
Much of the performance loss associated with standard networks arises because existing network hardware cannot guarantee message delivery and order. Because network hardware does not provide these guarantees, software solutions are required to detect and handle errors incurred during message transmission.
Network software typically comprises many layers of protocol. These network layers are executed by the operating system and work together to detect dropped messages and transmission errors and to recover from these events, among other tasks. Because the operating system is linked to the network software, there is no provision for direct access by a given application program to the network. Accordingly, because there is no direct link between the application program and the network, performance is further reduced by the overhead of the network software interface.
One method for providing high performance communication was described in U.S. Pat. No. 4,991,079, entitled “Real-Time Data Processing System”, by Dann et al., assigned to Encore Computer Corporation, issued on Feb. 5, 1991 (hereinafter referred to as the Encore patent).
The Encore patent describes a write-only reflective memory system that provides a form of networking better suited for parallel computing than standard networks, called a write-only reflective memory data link. The reflective memory system includes a real time data processing system in which each of a series of processing nodes is provided with its own data store partitioned into a local section and a section which is to be shared between the nodes. The nodes are interconnected by a data link. Whenever a node writes to an address in the shared portion of the data store, the written data is communicated (i.e. ‘reflected’) to all of the nodes via the data link. The data in each address of the shared data store can only be changed by one of the nodes which has been designated as a master node for the corresponding address. Because each address containing shared data can only be written to by one node, collisions between different nodes attempting to change a common item of data cannot occur.
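The reflective write behavior described above can be illustrated with a small model. This is only a sketch under stated assumptions, not the Encore design: the class name `ReflectiveMemory`, its methods, and the `masters` table are hypothetical, and the data link is modeled as a simple in-process loop rather than real hardware.

```python
class ReflectiveMemory:
    """Toy model of a write-only reflective memory link (hypothetical names)."""

    def __init__(self, num_nodes, shared_size, masters):
        # masters[addr] is the single node allowed to write shared address addr.
        if len(masters) != shared_size:
            raise ValueError("one master per shared address is required")
        self.masters = list(masters)
        # Each node holds its own full copy of the shared section.
        self.shared = [[0] * shared_size for _ in range(num_nodes)]

    def write(self, node, addr, value):
        # Only the designated master may change an address, so two nodes
        # can never collide on the same shared data item.
        if self.masters[addr] != node:
            raise PermissionError(f"node {node} is not master of address {addr}")
        # Reflect the write to every node's copy (stand-in for the data link).
        for copy in self.shared:
            copy[addr] = value

    def read(self, node, addr):
        # Reads are served from the node's local copy; nothing crosses the link.
        return self.shared[node][addr]
```

For example, with three nodes and two shared addresses mastered by nodes 0 and 1, a write by node 0 to address 0 becomes visible in every node's local copy, while the same write attempted by node 2 is rejected.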
The Encore system, although it describes a method for providing high performance parallel computing, provides no mechanism for ensuring recoverable communication. Accordingly, because there are no hardware mechanisms for providing error recovery, the support must still be provided by software. As a result, the Encore system incurs a similar communication overhead that translates into reduced parallel system performance.
SUMMARY OF THE INVENTION
The current invention provides an interconnect for parallel computing systems having high performance and recoverable communication in the presence of errors.
In accordance with one aspect of the invention, a method for providing shared memory in a network including a plurality of nodes coupled by a data link includes the steps of allocating a portion of memory at each of the plurality of nodes to provide a shared memory for storing a plurality of data items, wherein a subset of the data items of the shared memory are writable by a subset of the plurality of nodes. The method includes the step of maintaining, in the shared memory of each of the plurality of nodes, at least one data structure corresponding to at least one item of data to be shared by the corresponding node, the data structure comprising data item access information for each of a subset of the plurality of nodes sharing the data item.
In accordance with another aspect of the invention, a networked computer system includes a plurality of nodes coupled by a data link and a memory having a first and second portion, the first portion comprising a plurality of local memory portions, the second portion accessible by each of the plurality of nodes. The networked computer system also includes means, coupled to said second portion of the memory, for storing a plurality of data items, each data item to be shared by a subset of the plurality of nodes. The networked computer system further includes means for providing access to each of the plurality of data items by the corresponding subset of nodes, where the means for providing access comprises, for each data item, a synchronization structure stored in the second portion of memory. With such an arrangement, multiple nodes in a cluster system may access a shared data item while maintaining coherency.
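One way to picture a per-item synchronization structure holding access information for each sharing node is sketched below. The summary does not say how that information is laid out or used, so everything here is an assumption: the class `SyncStructure`, its methods, and the ticket ordering (borrowed from the idea behind Lamport's bakery algorithm, which the source does not mention) are hypothetical. The model is sequential and does not address concurrent ticket selection.

```python
class SyncStructure:
    """Per-item access information, one slot per sharing node (hypothetical layout)."""

    def __init__(self, sharers):
        # One ticket slot per node that shares the data item; 0 means "not requesting".
        self.tickets = {node: 0 for node in sharers}

    def request(self, node):
        # A node only ever writes its own slot, which fits a shared memory
        # in which each address has a single writer.
        if node not in self.tickets:
            raise PermissionError(f"node {node} does not share this data item")
        if self.tickets[node] == 0:
            self.tickets[node] = max(self.tickets.values()) + 1

    def holder(self):
        # The requesting node with the lowest ticket currently has access.
        waiting = [(ticket, node) for node, ticket in self.tickets.items() if ticket > 0]
        return min(waiting)[1] if waiting else None

    def release(self, node):
        if self.holder() != node:
            raise RuntimeError(f"node {node} does not hold the data item")
        self.tickets[node] = 0
```

In this sketch, if node 2 requests access before node 0, `holder()` reports node 2 until it releases, after which node 0 gains access. Every node can compute the same answer from the shared slots alone, without any extra messages.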


REFERENCES:
patent: 4007450 (1977-02-01), Haibt et al.
patent: 4432057 (1984-02-01), Daniell et al.
patent: 4829422 (1989-05-01), Morton et al.
patent: 4991079 (1991-02-01), Dann et al.
patent: 5113510 (1992-05-01), Hills
patent: 5117350 (1992-05-01), Parrish et al.
patent: 5146607 (1992-09-01), Sood et al.
patent: 5255369 (1993-10-01), Dann
patent: 5313620 (1994-05-01), Cohen et al.
patent: 5313638 (1994-05-01), Ogle et al.
patent: 5317749 (1994-05-01), Dahlen
patent: 5469558 (1995-11-01), Lieberman et al.
patent: 5475858 (1995-12-01), Gupta et al.
patent: 5491811 (1996-02-01), Arimilli et al.
patent: 5535365 (1996-07-01), Barriuso et al.
patent: 5537569 (1996-07-01), Masubuchi
patent: 5584017 (1996-12-01), Pierce et al.
patent: 5588132 (1996-12-01), Cardoza
patent: 5903763 (1999-05-01), Takahashi
Carl J. Beckmann, et al., “Fast Barrier Synchronization Hardware”, Super Computing Conference, Nov. 12-16, 1990, pp. 180-189.
John M. Mellor-Crummey, et al., “Synchronization Without Contention”, Fourth International Conference on Architectural Support for Programming Languages and Operating Systems, Apr. 8-11, 1991, pp. 269-274.
James R. Goodman, et al., “Efficient Synchronization Primitives for Large-Scale Cache-Coherent Multiprocessors”, Third International Conference on Architectural Support for Programming Languages and Operating Systems, Apr. 3-6, 1989, pp. 64-75.
Lucci, et al., “Reflective Memory Multiprocessor”, IEEE Electronic Library, Jan. 1995, pp. 85-94.
