Method and system for process state management using...

Error detection/correction and fault detection/recovery – Data processing system error or fault handling – Reliability and availability

Reexamination Certificate

Details

C714S015000

active

06185702

ABSTRACT:

BACKGROUND OF THE INVENTION
1. Field of the Invention
The present invention relates to a process state management method and a process state management system, and more particularly to a method and a system for managing process states using checkpoints in cases where one process is generated from another process.
2. Description of the Background Art
Conventionally, checkpoint-based program execution has been known as a method for improving the reliability of program execution in a computer. In this method, the states of the processes that are the executing entities of a program are acquired, either regularly or irregularly, according to prescribed checkpoint timings during execution of the program, and the program is re-executed from the process states acquired at the nearest checkpoint when a trouble occurs during the program execution. Here, a checkpoint is defined as a time at which the processing to acquire the process states is carried out when the program execution is viewed in a time sequence, and a checkpoint timing is defined as the time range from one checkpoint to the next checkpoint.
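The checkpoint-and-re-execute cycle described above can be illustrated with a minimal single-process sketch. The function name `run_with_checkpoints`, the fixed checkpoint interval, and the single simulated fault are assumptions made for illustration only and do not come from the patent text.

```python
import copy

def run_with_checkpoints(steps, state, checkpoint_every, fail_at=None):
    """Apply each step to `state`, saving a copy every `checkpoint_every` steps.

    If `fail_at` is given, a single simulated trouble occurs just before that
    step index; execution then rolls back to the nearest saved checkpoint and
    re-executes from there.
    """
    saved_index, saved_state = 0, copy.deepcopy(state)  # initial checkpoint
    rollbacks = 0
    i = 0
    while i < len(steps):
        if i == fail_at and rollbacks == 0:
            # Simulated trouble: discard the current state and restore the
            # process state acquired at the nearest checkpoint.
            i, state = saved_index, copy.deepcopy(saved_state)
            rollbacks += 1
            continue
        state = steps[i](state)
        i += 1
        if i % checkpoint_every == 0:
            # Checkpoint: acquire the process state at this point in time.
            saved_index, saved_state = i, copy.deepcopy(state)
    return state, rollbacks

# Ten increments, a checkpoint every three steps, one fault before step 7.
final_state, rollbacks = run_with_checkpoints(
    [lambda s: s + 1] * 10, 0, checkpoint_every=3, fail_at=7
)
```

In this run the fault rolls execution back to the checkpoint taken after step 6, so only one step is repeated and the final state matches a fault-free run.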
Now, in a system in which one process operates independently, it is sufficient to acquire the states of that process alone at its checkpoints, but in a case where a plurality of processes operate in relation to one another, such as through inter-process communications, it is insufficient to acquire the process states of a single process alone at the checkpoints. Namely, in order to prevent a contradiction from occurring at a time of re-execution, the process states of the plurality of mutually related processes need to be acquired at each checkpoint. In the following, for the sake of convenience, a checkpoint for each process is referred to as a local checkpoint, and a set of local checkpoints for mutually related processes is referred to as a distributed checkpoint.
As described, in a case where a plurality of processes operate in relation to one another, such as through inter-process communications, it is necessary to acquire the process states of these mutually related processes consistently (without contradiction). This point will now be illustrated in further detail by referring to FIGS. 1A, 1B and 1C.
Namely, FIGS. 1A, 1B and 1C show examples of a distributed checkpoint. More specifically, FIGS. 1A, 1B and 1C show three types of distributed checkpoints CH1, CH2, and CH3 in a case where processing is carried out while each one of three processes p1, p2 and p3 carries out message passing. In FIGS. 1A, 1B, and 1C, a symbol m indicates a message, and the two numerals suffixed to this symbol m indicate a message transmission side process number and a message reception side process number respectively.
In FIG. 1A, at the distributed checkpoint CH1, there are no contradicting states for any message when the process states are acquired according to local checkpoints ch11, ch12 and ch13, so that the message passing can be carried out correctly even when the processing is restarted by rolling back to the nearest checkpoint. However, in FIG. 1B, for a message m32 at the distributed checkpoint CH2, despite the fact that the process p3 is still in a state of not transmitting this message at the local checkpoint ch13, the process p2 is in a state of already receiving this message at the local checkpoint ch12. For this reason, when a trouble occurs in any one process and the processing is to be restarted by rolling back to the distributed checkpoint CH2, contradicting states regarding the message m32 arise. Similarly, for the distributed checkpoint CH3 of FIG. 1C, contradicting states regarding a message m23 arise.
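The two kinds of contradicting states described for FIGS. 1B and 1C can be expressed as a small check over per-message flags. This is only a sketch: the class `MessageAtCheckpoint` and the functions `classify` and `is_consistent` are hypothetical names, and the flags simply encode what the text says about m32 and m23.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class MessageAtCheckpoint:
    name: str
    sent: bool      # True if the sender's local checkpoint follows the transmission
    received: bool  # True if the receiver's local checkpoint follows the reception

def classify(message):
    """Classify a message relative to one distributed checkpoint."""
    if message.received and not message.sent:
        return "orphan"      # already received but not yet transmitted
    if message.sent and not message.received:
        return "in-transit"  # already transmitted but not yet received
    return "ok"

def is_consistent(messages):
    """A distributed checkpoint is treated as consistent if it has no orphan message."""
    return all(classify(m) != "orphan" for m in messages)

# Flags taken from the description of FIG. 1B (m32) and FIG. 1C (m23).
m32_at_ch2 = MessageAtCheckpoint("m32", sent=False, received=True)
m23_at_ch3 = MessageAtCheckpoint("m23", sent=True, received=False)
```

An orphan message cannot be repaired after the fact, whereas an in-transit message can be preserved by storing it together with the process states.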
The conventionally proposed methods for guaranteeing the consistency of distributed checkpoints deal with the message passing, and include a synchronous checkpointing method and an asynchronous checkpointing method.
As a scheme for acquiring process states according to synchronous checkpointing, there is a scheme disclosed in K. Mani Chandy and L. Lamport: “Distributed Snapshots: Determining Global States of Distributed Systems”, ACM Trans. on Computer Systems, Vol. 3, No. 1, pp. 63-75 (February 1985). This scheme deals with message passing as the inter-process communication, similarly to the examples described above, and defines a consistent distributed checkpoint as “a state without a message which is not yet transmitted and already received”. Here, a message which is not yet transmitted and already received is exemplified by the message m32 in the case of FIG. 1B described above.
Also, at CH3, m23 will be lost, so that such a message which is already transmitted and not yet received will be stored as acquired information. As a specific algorithm for this, messages that cause contradictions are detected by exchanging messages called markers at a time of storing process states according to distributed checkpoints, and these messages are stored together with the process states so that consistent states can be constructed as a whole.
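The marker exchange can be sketched as a small, deterministic, single-threaded simulation. It is a simplified illustration of the marker idea from the cited paper, not the scheme of the present patent, and it assumes reliable first-in first-out channels. The names `Process`, `Network`, and the use of transferable balances as process state are assumptions chosen so that the result is easy to check.

```python
from collections import deque

class Process:
    def __init__(self, name, peers, balance):
        self.name = name
        self.balance = balance                        # application state
        self.recorded_balance = None                  # local checkpoint, once taken
        self.recording = {p: False for p in peers}    # incoming channels being recorded
        self.in_transit = {p: [] for p in peers}      # messages stored per incoming channel

class Network:
    def __init__(self, balances):
        names = list(balances)
        self.procs = {n: Process(n, [p for p in names if p != n], balances[n])
                      for n in names}
        self.channels = {(s, r): deque() for s in names for r in names if s != r}

    def send(self, src, dst, amount):
        self.procs[src].balance -= amount
        self.channels[(src, dst)].append(("msg", amount))

    def _record(self, proc, marker_from=None):
        # Store the local state, start recording every incoming channel except
        # the one the marker arrived on, and send a marker on every outgoing channel.
        proc.recorded_balance = proc.balance
        for peer in proc.recording:
            proc.recording[peer] = peer != marker_from
            self.channels[(proc.name, peer)].append(("marker", None))

    def start_snapshot(self, initiator):
        self._record(self.procs[initiator])

    def deliver(self, src, dst):
        kind, amount = self.channels[(src, dst)].popleft()
        proc = self.procs[dst]
        if kind == "marker":
            if proc.recorded_balance is None:
                self._record(proc, marker_from=src)
            else:
                proc.recording[src] = False           # channel state is now complete
            return
        if proc.recording[src]:
            proc.in_transit[src].append(amount)       # transmitted before, received after
        proc.balance += amount

    def snapshot_total(self):
        return sum(p.recorded_balance + sum(sum(v) for v in p.in_transit.values())
                   for p in self.procs.values())

net = Network({"p1": 10, "p2": 10})
net.send("p2", "p1", 5)      # message is in the channel when the snapshot starts
net.start_snapshot("p1")
net.deliver("p2", "p1")      # the message arrives and is stored as in transit
net.deliver("p1", "p2")      # the marker reaches p2, which records its own state
net.deliver("p2", "p1")      # p2's marker closes the recording at p1
```

The recorded balances plus the stored in-transit message add up to the original total, which is the sense in which the stored information forms a consistent state as a whole.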
Also, in a general operating system, there are cases where a currently operating process generates a copy of itself when a new process is to be generated. For example, in UNIX, the fork system call corresponds to this function, by which a process with the same content as the process that called the fork system call is generated. Here, the process that called the fork system call is called a parent process, and the process newly generated from the parent process is called a child process.
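The parent and child relationship created by the fork system call can be seen in a short sketch using Python's `os.fork`, which is available on UNIX-like systems only. The function name `fork_and_report` and the variable `inherited` are illustrative assumptions.

```python
import os

def fork_and_report():
    """Fork once; in the parent, return (parent_pid, child_pid, child_exit_code)."""
    parent_pid = os.getpid()
    inherited = "from-parent"  # the child starts with a copy of this value
    pid = os.fork()
    if pid == 0:
        # Child process: it has the same content as the parent at the time
        # of the fork, so it sees the same value of `inherited`.
        os._exit(0 if inherited == "from-parent" else 1)
    # Parent process: fork() returned the identifier of the child.
    _, status = os.waitpid(pid, 0)
    return parent_pid, pid, os.WEXITSTATUS(status)
```

The child exits with code 0 only if it observes the value that the parent set before the fork.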
FIG. 2 shows an exemplary checkpoint in a case of generating a new process in the synchronous checkpointing. In FIG. 2, a process A generates distributed checkpoints CP(n) and CP(n+1), and between these, the process A also generates a process B by using the fork system call. At this point, at CP(n+1), the process A is unrelated to the process B so that no checkpoint is generated for the process B. However, afterwards, the processes A and B come to have a relationship through messages m1 and m2. Then, when a trouble (fault) F1 occurs later on, the process A is going to be rolled back to CP(n+1) and restarted from there, but the process B has no corresponding checkpoint, so that the process state has not been acquired for the process B and therefore it is impossible to restart the process B correctly.
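The situation of FIG. 2 can be replayed with a toy checkpoint store. The class `CheckpointStore`, its method names, and the placeholder state strings are hypothetical; the sequence of events follows the description above.

```python
class CheckpointStore:
    def __init__(self):
        self.saved = {}  # checkpoint id -> {process name: acquired state}

    def take(self, checkpoint_id, related_processes):
        # Only the processes regarded as mutually related at this time are stored.
        self.saved[checkpoint_id] = dict(related_processes)

    def missing_on_rollback(self, checkpoint_id, live_processes):
        """Return the live processes that have no state at the given checkpoint."""
        snapshot = self.saved[checkpoint_id]
        return sorted(name for name in live_processes if name not in snapshot)

store = CheckpointStore()
store.take("CP(n)", {"A": "state of A at CP(n)"})
live = {"A", "B"}  # process A generates process B by fork after CP(n)
store.take("CP(n+1)", {"A": "state of A at CP(n+1)"})  # B is unrelated, so not stored
# Messages m1 and m2 then relate A and B, and the fault F1 forces a rollback.
cannot_restart = store.missing_on_rollback("CP(n+1)", live)
```

Here `cannot_restart` contains only B, the process for which no state was acquired at CP(n+1).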
Thus, in the synchronous checkpointing method for distributed checkpoints that deal with a plurality of processes, it is impossible to acquire process states consistently in a case where a new process is generated from some process, and for this reason, it is impossible to restart a newly generated process correctly in a case where a trouble occurs during the program execution and the restart is required.
SUMMARY OF THE INVENTION
It is therefore an object of the present invention to provide a method and a system for process state management which are capable of acquiring process states consistently even in a case where a new process is generated from some process, while using the synchronous checkpointing method.
According to one aspect of the present invention there is provided a method for managing process states by acquiring process states of a group of processes which are mutually related, comprising the steps of: (a) prohibiting a new process generation during a process state acquisition; and (b) prohibiting a process state acquisition during a new process generation.
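One possible reading of steps (a) and (b) is plain mutual exclusion between process generation and process state acquisition. The sketch below assumes a single mutex and an in-memory list of process names; the class `ProcessStateManager` and its methods are hypothetical and are not taken from the patent.

```python
import threading

class ProcessStateManager:
    def __init__(self):
        self._lock = threading.Lock()
        self.processes = ["p1"]
        self.checkpoints = []

    def generate_process(self, name):
        # (b) No process state acquisition can run while this lock is held.
        with self._lock:
            self.processes.append(name)

    def acquire_states(self):
        # (a) No new process generation can run while this lock is held.
        with self._lock:
            snapshot = tuple(self.processes)
            self.checkpoints.append(snapshot)
            return snapshot

manager = ProcessStateManager()
workers = [threading.Thread(target=manager.generate_process, args=(f"child{i}",))
           for i in range(4)]
for w in workers:
    w.start()
for w in workers:
    w.join()
snapshot = manager.acquire_states()
```

Because both operations take the same lock, every acquired snapshot contains either all of a newly generated process's registration or none of it.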
According to another aspect of the present invention there is provided a method for managing process states by acquiring process states of a group of processes which are mutually related, comprising the steps of: notifying from a first process an identifier of a second process and notifying from the second process the identifier of the second process, when the second process is generated from the first process; judging whether a notice of the identifier of the second process from the second process is prior
