Electrical computers and digital processing systems: multicomput – Multicomputer synchronizing
Reexamination Certificate
2000-08-04
2004-09-28
Wiley, David (Department: 2144)
C709S205000
active
06799222
ABSTRACT:
CROSS-REFERENCE TO RELATED APPLICATIONS
This application is based upon and claims the benefit of priority from the prior Japanese Patent Application No. 11-229355, filed Aug. 13, 1999, the entire contents of which are incorporated herein by reference.
BACKGROUND OF THE INVENTION
The present invention relates to a method for synchronizing a program, such as a deterministic program, by using a reliable ordered multicast in a distributed computer system having a plurality of computers connected by a network, and also concerns such a distributed computer system and a computer as well as a storage medium.
First, an explanation will be given of a deterministic program, a reliable ordered multicast and a synchronizing process that are used in the present specification.
The deterministic program is explained as follows: As illustrated in FIG. 1, a deterministic program 12 is designed in such a manner that, upon application of an input to a computer 10, the output and the next status are determined by the status 11 of the computer 10 at that time. In other words, in the deterministic program 12, once the input is determined, the next status 11 and the output are uniquely determined. More specifically, it refers to a program in which no reference is made to undefined values or random numbers. The concept of such a deterministic property has been widely used in the field of automata theory.
As illustrated in FIG. 2, the characteristic of the deterministic program is that once the initial status and an input string have been determined, the operation is uniquely determined. Hereinafter, the deterministic program is referred to simply as the program.
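The property described for FIG. 1 and FIG. 2 can be sketched as a pure transition function. This is only an illustration: the names `step` and `run`, and the choice of a running total as the status, are assumptions made here and do not appear in the specification.

```python
from typing import Iterable, List, Tuple

State = int  # hypothetical status: a single running total


def step(state: State, value: int) -> Tuple[State, int]:
    """One deterministic transition: (status, input) -> (next status, output)."""
    next_state = state + value
    output = next_state * 2
    return next_state, output


def run(initial: State, inputs: Iterable[int]) -> Tuple[State, List[int]]:
    """Feed an input string to the program, collecting every output."""
    state: State = initial
    outputs: List[int] = []
    for value in inputs:
        state, out = step(state, value)
        outputs.append(out)
    return state, outputs


first = run(0, [3, 1, 4])
second = run(0, [3, 1, 4])
assert first == second  # same initial status and same input string give the same run
```

Because `step` reads nothing except its arguments (no clock, no random numbers, no uninitialized values), repeating a run with the same initial status and input string reproduces both the final status and the output string.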
Moreover, the reliable ordered multicast is explained as follows: In an environment such as a distributed computer system in which a plurality of computers are connected through a network, the respective computers are allowed to operate independently. Therefore, a special scheme is required so as to operate these computers in a synchronized manner. The reliable ordered multicast, which is one of such schemes, is a protocol which distributes data from each computer to all the computers, and which ensures that the order of arrivals of pieces of data is the same in all the computers.
Referring to FIG. 3, a specific example is given of the reliable ordered multicast. Data A, which was transmitted from a computer 10-2 at time t20, is received by all the computers 10-1, 10-2 and 10-3 at times t11, t21 and t31 through a reliable ordered multicast, not shown. Data B, transmitted from a computer 10-3 at time t30, is received by all the computers 10-1, 10-2 and 10-3 at times t12, t22 and t32. In this case, data A and data B are received by the respective computers 10-1, 10-2 and 10-3, and the reliable ordered multicast controls the system so that the order of receipt of these two pieces of data is the same in all the computers 10-1, 10-2 and 10-3.
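The ordering guarantee in the FIG. 3 example can be illustrated with a small simulation. The specification does not say how the multicast achieves a common order, so the central `Sequencer` below is purely an assumption chosen for brevity, as are the class names. Retransmission of lost messages (the "reliable" part) is not modeled.

```python
import random
from typing import Dict, List, Tuple


class Sequencer:
    """Hypothetical central sequencer that stamps each message with a global number."""

    def __init__(self) -> None:
        self._next = 0

    def stamp(self, data: str) -> Tuple[int, str]:
        stamped = (self._next, data)
        self._next += 1
        return stamped


class Receiver:
    """Delivers stamped messages strictly in sequence-number order."""

    def __init__(self) -> None:
        self._expected = 0
        self._held: Dict[int, str] = {}
        self.delivered: List[str] = []

    def on_arrival(self, seq: int, data: str) -> None:
        self._held[seq] = data
        # Release every message that is now contiguous with what was delivered.
        while self._expected in self._held:
            self.delivered.append(self._held.pop(self._expected))
            self._expected += 1


sequencer = Sequencer()
stamped = [sequencer.stamp("A"), sequencer.stamp("B")]  # A from 10-2, B from 10-3

receivers = {name: Receiver() for name in ("10-1", "10-2", "10-3")}
rng = random.Random(0)
for receiver in receivers.values():
    arrivals = stamped[:]
    rng.shuffle(arrivals)  # the network may reorder messages differently per receiver
    for seq, data in arrivals:
        receiver.on_arrival(seq, data)

orders = [r.delivered for r in receivers.values()]
```

Whatever order the messages physically arrive in, each receiver holds back a message until all lower-numbered ones have been delivered, so every computer delivers A and B in the same order.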
Moreover, the synchronizing process is explained as follows: In the distributed computer system, any of the computers may fail independently of the others. If the failure of a single computer causes a malfunction in the entire system, the operating ratio of the distributed computer system becomes lower than the operating ratio of any one of the computers.
In order to prevent such a problem, it is necessary to multiplex processes that relate to the entire system. The synchronizing process makes it possible to set the operating ratio of the distributed computer system higher than the operating ratio of any one of the computers. For example, when a distributed computer system constituted by 10 computers, each having an operating ratio of 99%, is not multiplexed at all, the operating ratio of the distributed computer system is approximately 90%. When a process is instead multiplexed on two computers each having an operating ratio of 99%, the process has an operating ratio of approximately 99.99%.
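The two figures quoted above follow from elementary probability, assuming the computers fail independently (an assumption the example implies but does not spell out). The function names below are illustrative only.

```python
def series_availability(per_computer: float, count: int) -> float:
    """All `count` computers must be up for the system to be up."""
    return per_computer ** count


def replicated_availability(per_computer: float, replicas: int) -> float:
    """The process is up unless every replica is down at the same time."""
    return 1.0 - (1.0 - per_computer) ** replicas


unmultiplexed = series_availability(0.99, 10)  # roughly 0.904, i.e. about 90%
duplexed = replicated_availability(0.99, 2)    # roughly 0.9999, i.e. about 99.99%
```

The first case multiplies ten 99% terms together, while the second subtracts the 1% x 1% chance that both replicas are down simultaneously.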
Next, referring to FIG. 4, an explanation will be given of a synchronizing method using the reliable ordered multicast. In this example, in a distributed computer system having computers 10-1, 10-2 and 10-3, a program execution is multiplexed by using the reliable ordered multicast.
As illustrated in FIG. 4, first, all the computers 10-1, 10-2 and 10-3 are started with a predetermined initial status 11 in which, for example, all variables are set to zero. Data to be input is always distributed to all the computers 10-1, 10-2 and 10-3 through a reliable ordered multicast 13, and is input to the respective programs 12. Here, the output from any one of the computers is taken as the output (in FIG. 4, computer 10-1). The reliable ordered multicast 13 gives the input string of each program the same order, so that all the computers 10-1, 10-2 and 10-3 are maintained in the same status 11, with their output strings also being the same because of the deterministic property of the program. In other words, the execution of the program is multiplexed.
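A minimal sketch of the FIG. 4 arrangement, under the assumption that the multicast has already fixed one order for the inputs. The `Replica` class and its running-sum program are hypothetical stand-ins for the program 12.

```python
from typing import List


class Replica:
    """One computer running the deterministic program from an all-zero status."""

    def __init__(self) -> None:
        self.status = 0
        self.outputs: List[int] = []

    def apply(self, value: int) -> None:
        # Hypothetical program: the status is a running sum, the output is the new sum.
        self.status += value
        self.outputs.append(self.status)


replicas = {name: Replica() for name in ("10-1", "10-2", "10-3")}
ordered_inputs = [5, 2, 7]  # the order already agreed on by the multicast

for value in ordered_inputs:
    for replica in replicas.values():
        replica.apply(value)

system_output = replicas["10-1"].outputs  # any single replica's output will do
```

Since every replica starts from the same status and consumes the same ordered input string, their statuses and output strings stay identical, which is why the output of any one of them can stand for the whole system.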
Next, an explanation will be given of the difference between a system in which synchronizing is made by the reliable ordered multicast and a system in which synchronizing is made by using a master/slave method. In the master/slave method, while a program is being executed on a master computer, its status is transferred to a slave computer periodically, and in the event of any fault of the master, execution is switched to the program on the slave side; thus, a synchronizing process is achieved.
However, in the case of the master/slave method, backtracking occurs at each takeover, with the result that the switching process at the time of a computer fault becomes complex and time-consuming.
In contrast, when the reliable ordered multicast is applied, no backtracking occurs at the time of a computer fault, so that the switching process is simple and not time-consuming.
Moreover, in the master/slave system, overhead is required for copying each status regularly; however, in the application of the reliable ordered multicast, no such copying overhead is required.
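The backtracking and copying overhead attributed to the master/slave method can be made concrete with a toy model. Everything here is an assumption for illustration: the function name, the running-sum program, and the idea that the slave recovers by replaying the inputs received since the last copy (the specification does not describe the recovery procedure).

```python
from typing import List


def run_master_slave(inputs: List[int], checkpoint_every: int, fail_after: int) -> dict:
    """Master sums inputs, copies its status to the slave every `checkpoint_every`
    inputs, then fails after `fail_after` inputs have been processed."""
    master_status = 0
    slave_status = 0  # last status copied from the master
    copied_at = 0     # number of inputs covered by that copy
    copies = 0
    for count, value in enumerate(inputs[:fail_after], start=1):
        master_status += value
        if count % checkpoint_every == 0:
            slave_status, copied_at = master_status, count
            copies += 1  # each copy is overhead paid during normal operation
    # On takeover the slave resumes from an older status and must redo this work.
    replay = inputs[copied_at:fail_after]
    return {"slave_status": slave_status, "replay": replay, "copies": copies}


result = run_master_slave([1, 2, 3, 4, 5], checkpoint_every=2, fail_after=5)
```

In this run the slave takes over from the status copied after the fourth input, so the fifth input has to be handled again, and two status copies were paid for along the way. With multicast-based replication every computer has already processed every input, so neither cost arises.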
In this manner, with respect to processes relating to the reliability and performance of the entire system, it is preferable to use the reliable ordered multicast so as to carry out synchronizing.
The master/slave method, on the other hand, is suitable for cases in which a non-deterministic program is executed or in which executing a program on the slave side is not preferable.
The synchronizing method using the reliable ordered multicast is based upon the premise that all the computers are operated from the beginning. However, in actual operation, there are cases in which a synchronizing process has to be started in the middle of the operation. For example, such cases include those in which a failed computer is recovered and those in which a computer is newly added. In these cases, it is necessary to expand the synchronizing process.
Referring to FIG. 5, an explanation will be given of a conventional method for expanding the synchronizing process by the use of the reliable ordered multicast. In FIG. 5, at Step 1, the reliable ordered multicast 13 is temporarily stopped. Next, at Step 2, the status 11 is copied on the computer 10-3 to be included. Next, at Step 3, the group of the reliable ordered multicast 13 is expanded and resumed.
In this method, if the status is not copied correctly, the operation of the newly included computer diverges from that of the other computers. Of course, since the copy is made after the reliable ordered multicast has been temporarily stopped to keep the status unchanged, such an event will never occur in principle, and all the computers start with the same operation when the reliable ordered multicast is resumed.
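The three steps of the conventional expansion in FIG. 5 can be sketched as follows. The `Group` class, its `expand` method and the dictionary-based status are hypothetical; they model only the stop, copy and resume sequence, not a real multicast protocol.

```python
import copy
from typing import Dict, List


class Group:
    """Hypothetical multicast group whose members apply inputs in one agreed order."""

    def __init__(self, members: List[str]) -> None:
        self.status: Dict[str, Dict[str, int]] = {m: {"total": 0} for m in members}
        self.running = True

    def multicast(self, value: int) -> None:
        if not self.running:
            raise RuntimeError("multicast is stopped")
        for status in self.status.values():
            status["total"] += value

    def expand(self, source: str, newcomer: str) -> None:
        self.running = False                                        # Step 1: stop
        self.status[newcomer] = copy.deepcopy(self.status[source])  # Step 2: copy
        self.running = True                                         # Step 3: expand, resume


group = Group(["10-1", "10-2"])
group.multicast(4)
group.expand(source="10-1", newcomer="10-3")
group.multicast(6)
```

Because no input can be delivered while the status is being copied, the newcomer 10-3 starts from exactly the status the existing members hold, and all three computers agree after the multicast resumes.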
However, in the case when th
Finnegan Henderson Farabow Garrett & Dunner L.L.P.
Gerezgiher Yemane M.
Wiley David