Reexamination Certificate
2007-07-17
2011-11-08
Baderman, Scott (Department: 2114)
Error detection/correction and fault detection/recovery
Data processing system error or fault handling
Reliability and availability
C714S011000, C714S013000
active
08055940
ABSTRACT:
A system and method detect communication errors among multiple nodes in a concurrent computing environment. One or more barrier synchronization points (checkpoints) or regions are used to check for a communication mismatch. A barrier synchronization point can be placed anywhere in the concurrent computing program. Once a node reaches a barrier synchronization point, it is not allowed to communicate with another node regarding data that is needed to execute the concurrent computing program, even if the other node has not yet reached the barrier synchronization point. Regions can be used in addition to, or instead of, barrier synchronization points to detect a communication mismatch. The concurrent program on each node is separated into one or more regions. Two nodes communicate with each other when their regions are compatible. If their regions are not compatible, a communication mismatch occurs.
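The two checks described in the abstract can be illustrated with a small, single-process sketch. It is not the patented implementation: the `Node` class, its method names, and the `CommunicationMismatch` exception are hypothetical, and the sketch assumes that two regions are "compatible" exactly when their region ids are equal, which the abstract does not specify.

```python
class CommunicationMismatch(Exception):
    """Raised when a data exchange between two nodes must be refused."""


class Node:
    """Toy stand-in for one node of a concurrent program (hypothetical API)."""

    def __init__(self, name):
        self.name = name
        self.region = 0             # current region id (assumed to be an int)
        self.at_checkpoint = False  # True while waiting at a barrier point
        self.inbox = []             # payloads accepted from other nodes

    def enter_region(self, region):
        self.region = region

    def reach_checkpoint(self):
        self.at_checkpoint = True

    def leave_checkpoint(self):
        self.at_checkpoint = False

    def send(self, receiver, payload):
        # Checkpoint rule: a node at the barrier exchanges no program data,
        # even if its peer has not reached the barrier yet.
        for node in (self, receiver):
            if node.at_checkpoint:
                raise CommunicationMismatch(f"{node.name} is at a checkpoint")
        # Region rule (assumption: compatible means equal region ids).
        if self.region != receiver.region:
            raise CommunicationMismatch(
                f"{self.name} is in region {self.region}, "
                f"{receiver.name} is in region {receiver.region}"
            )
        receiver.inbox.append(payload)
```

In this sketch, a call such as `a.send(b, data)` succeeds only while neither node is at a checkpoint and both nodes are in the same region; any other combination raises `CommunicationMismatch`, which a caller could catch to begin recovery.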
REFERENCES:
patent: 4816989 (1989-03-01), Finn et al.
patent: 4914657 (1990-04-01), Walter et al.
patent: 5768538 (1998-06-01), Badovinatz et al.
patent: 5987477 (1999-11-01), Schmuck et al.
patent: 6029205 (2000-02-01), Alferness et al.
patent: 6216174 (2001-04-01), Scott et al.
patent: 6336161 (2002-01-01), Watts
patent: 6430600 (2002-08-01), Yokote
patent: 6651242 (2003-11-01), Hebbagodi et al.
patent: 6718484 (2004-04-01), Kodera
patent: 6834358 (2004-12-01), Korenevsky et al.
patent: 7117248 (2006-10-01), Jordan, Jr.
patent: 7191294 (2007-03-01), Nakamura et al.
patent: 7305582 (2007-12-01), Moser et al.
patent: 7610510 (2009-10-01), Agarwal et al.
patent: 2005/0050374 (2005-03-01), Nakamura et al.
patent: 2005/0278620 (2005-12-01), Baldwin et al.
patent: 2006/0200730 (2006-09-01), Daugherty
patent: 2007/0150714 (2007-06-01), Karstens
patent: 2007/0174484 (2007-07-01), Lussier et al.
patent: 2007/0260909 (2007-11-01), Archer et al.
Johnson, T. et al., “Cyclical cascade chains: a dynamic barrier synchronization mechanism for multiprocessor systems,” Proceedings of the 15th International Parallel and Distributed Processing Symposium, pp. 2061-2068 (2001).
Klaiber, Alexander et al., “A Comparison of Message Passing and Shared Memory Architectures for Data Parallel Programs,” retrieved online at http://citeseer.ist.psu.edu/cache/papers/cs/7993/http:zSzzSzstudents.cs.byu.eduzSz~clementzSzcs584zSzklaiber.pdf/klaiber94comparison.pdf (1994).
Invitation to Pay Additional Fees for Application No. PCT/US2007/016170, dated Feb. 19, 2008.
Ellis Edric
Martin Jocelyn Luke
Baderman Scott
Butler Sarai
Nelson Mullins Riley & Scarborough LLP
The MathWorks, Inc.
Recoverable error detection for concurrent computing programs does not yet have a rating. At this time, there are no reviews or comments for this patent.