System and method for providing a fault tolerant distributed...

Error detection/correction and fault detection/recovery – Data processing system error or fault handling – Reliability and availability

Reexamination Certificate

Details

C714S039000, C709S241000, C709S241000, C709S241000

active

06618817

ABSTRACT:

FIELD OF THE INVENTION
The present invention relates generally to distributed computing environments, and more specifically to a fault tolerant distributed computing framework in a mission critical environment.
BACKGROUND OF THE INVENTION
Today, it is quite common to have complex computer systems with multiple computers connected through one or more networks. Typically, applications are distributed among the multiple computers and communicate using one of several industry standard distributed computing frameworks. In general, a distributed computing framework provides a specification for how objects interact and communicate with each other. The communication may occur within one process, between two different processes on one computer, or across the network to processes running on different computers. These frameworks allow the inter-process and network communication layers to be completely transparent to the application developer. Therefore, application developers may easily scale applications across multiple machines with various architectures and various operating systems. The distributed computing frameworks also facilitate inter-operability between software components created by different vendors by clearly defining interfaces for the software components.
Currently, the Distributed Component Object Model (DCOM) defined by the Microsoft Corporation, of Redmond, Wash., is one of the most popular distributed computing frameworks for enterprise applications. Typically, applications using DCOM reside on personal computers (PCs). In some enterprises, however, it may be desirable to extend the distributed applications to a variety of embedded systems, such as heating, ventilating, and air conditioning (HVAC) controllers, data loggers, and programmable logic controllers (PLCs).
In some situations, it may be desirable for some DCOM applications residing on personal computers to operate in a mission critical environment, such as industrial automation and building automation. However, there are problems with using existing distributed computing frameworks for embedded systems and mission critical systems. For instance, both embedded systems and mission critical systems typically must meet higher reliability standards than typical PC applications. These higher reliability standards require a system to recover from errors or faults without affecting the operation of the system as a whole, and to do so without the intervention of a human technician.
Prior attempts at achieving high reliability for embedded systems and mission critical systems have focused on creating proprietary software for each different type of system. While these proprietary solutions offer some fault tolerant characteristics, they have a disadvantage: the software must be modified for each different system.
Therefore, given the shortcomings associated with the prior art proprietary software solutions, there is a present need for a fault tolerant distributed computing framework that provides high reliability without requiring the software for each different system to be modified.
SUMMARY OF THE INVENTION
In accordance with the present invention, a system and method provide a fault tolerant distributed computing framework that allows the system to detect failures and to gracefully recover from the failures. In addition, the present invention allows the system to inter-operate with existing applications and objects that operate in an existing distributed computing framework, such as DCOM.
The fault tolerant system of the present invention provides inter-operability to applications and objects that operate in an existing distributed computing framework. The fault tolerant system includes two layers. The first layer includes an application proxy operable to communicate with the applications as if the applications were communicating through the existing distributed computing framework, and an object stub operable to communicate with the objects as if the objects were communicating through the existing distributed computing framework. The second layer includes a fault detection mechanism that communicates through the first layer to determine whether any one of a plurality of objects has experienced a failure. The fault tolerant system further includes a fault recovery mechanism for recovering from the failure detected by the fault detection mechanism.
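The layering described above can be illustrated with a minimal Python sketch. It is not the patented implementation and does not use DCOM. Every class and method name here (ObjectStub, ApplicationProxy, FaultDetector, FaultRecovery, ping, restart) is hypothetical, and the failure is simulated with a flag. The sketch assumes that recovery means re-creating the failed object, which is only one of many possible recovery policies.

```python
class ObjectFailure(Exception):
    """Raised by the hypothetical stub when its object is not responding."""


class ObjectStub:
    """First layer, object side: owns an object and forwards calls to it."""

    def __init__(self, name, factory):
        self.name = name
        self._factory = factory      # callable that builds a fresh object
        self._obj = factory()
        self.alive = True            # set to False to simulate a failure

    def invoke(self, method, *args):
        if not self.alive:
            raise ObjectFailure(self.name)
        return getattr(self._obj, method)(*args)

    def ping(self):
        # Liveness check used by the second layer (assumed mechanism).
        return self.alive

    def restart(self):
        # Assumed recovery policy: replace the object with a new instance.
        self._obj = self._factory()
        self.alive = True


class ApplicationProxy:
    """First layer, application side: the application calls the proxy
    as if it were talking to the object directly."""

    def __init__(self, stub):
        self._stub = stub

    def call(self, method, *args):
        return self._stub.invoke(method, *args)


class FaultDetector:
    """Second layer: checks objects through the first layer's stubs."""

    def __init__(self, stubs):
        self._stubs = list(stubs)

    def failed(self):
        return [stub for stub in self._stubs if not stub.ping()]


class FaultRecovery:
    """Recovers every object the detector reported as failed."""

    def recover(self, failed_stubs):
        for stub in failed_stubs:
            stub.restart()
        return [stub.name for stub in failed_stubs]


class Counter:
    """Toy object standing in for a real distributed component."""

    def __init__(self):
        self.value = 0

    def increment(self):
        self.value += 1
        return self.value
```

In this sketch an application would build an ApplicationProxy over an ObjectStub and call methods through it, while a FaultDetector periodically polls the same stubs and hands any failures to FaultRecovery. Restarting the toy Counter discards its previous state; whether and how state is preserved across recovery is left open here.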


REFERENCES:
patent: 5640564 (1997-06-01), Hamilton et al.
patent: 6018805 (2000-01-01), Ma et al.
patent: 6185695 (2001-02-01), Murphy et al.
patent: 6249821 (2001-06-01), Agatone et al.
patent: 6349342 (2002-02-01), Menges et al.
patent: 6370654 (2002-04-01), Law et al.
patent: 6438705 (2002-08-01), Chao et al.
patent: 6513112 (2003-01-01), Craig et al.
Wang et al. Reliability and Availability Issues in Distributed Component Object Model. IEEE. Sep. 11-12, 1997. Pp. 59-63.*
Microsoft Corporation. DCOM Technical Overview. Microsoft. Nov. 1996. Pp. 1-27.*
Horstmann et al. DCOM Architecture. Jul. 23, 1997. Pp. 1-44.*
Intrinsyc. deviceCOM: A functional extension of COM/DCOM for specialized distributed embedded Windows systems. Intrinsyc. Aug. 1999. Pp. 2-20.
