Facilitating a restart operation within a data processing...

Error detection/correction and fault detection/recovery – Data processing system error or fault handling – Reliability and availability

Reexamination Certificate

Details

C714S015000, C714S023000, C707S793000

active

06754842

ABSTRACT:

FIELD OF THE INVENTION
The invention relates to the field of data processing and, more particularly, to a data processing system and method to allow a restart following a system failure.
BACKGROUND OF THE INVENTION
In the operation of a data processing system, for example one running IBM's OS/390™ operating system available from International Business Machines Corporation, one or more resource managers are provided to manage the resources of the data processing system. The resources may include, for example, both volatile and non-volatile storage, such as online memory and direct access storage device (DASD) storage, as well as resource managers such as queue managers and database managers, which perform insert, delete, increment and decrement operations. Conventionally, such resource managers or systems are provided with a recovery log to store information needed to facilitate a restart of a resource manager in the event of a failure relating to the computer system. It will be appreciated that such a failure may relate to a loss of power or to the failure of a hardware device such as on-board memory or a DASD holding a database.
U.S. Pat. No. 4,648,031 illustrates that it is known to write, at specific operating points, a recovery log that is stored in non-volatile storage. Conventionally, the recovery log comprises a chronological record of processing events that have occurred within the data processing system and, typically, identifies the units of work that have been undertaken by the data processing system. A queue manager contains a recovery manager, which co-ordinates a number of recovery operations, including the retrieval from the recovery log of the log records required to effect a restart.
Conventionally, a restart comprises a series of phases. The first phase is commonly referred to as the status rebuild phase. During the status rebuild phase, the status of incomplete units of work is established, the forward log range of the recovery log that must be traversed is established, and the backward log range of the recovery log is established together with a starting point for media recovery.
During a second phase, commonly known as the forward recovery phase, the recovery log is traversed forward from the starting point established during the status rebuild phase to the tail end of the recovery log. During a third phase, conventionally known as the backward recovery phase, the recovery log is traversed backward from the tail end of the log to the starting point established in the status rebuild phase.
During the forward and backward traversals, appropriate action is taken to render, for example, queues in a transaction consistent status, that is, the queues are recovered to a known condition. Any such action for a unit of work is known as a recovery process.
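The three phases described above can be sketched roughly as follows. This is only an illustrative model, not the patented implementation: the `LogRecord` fields, the record kinds and the function names are all assumptions, and the sketch uses a single starting point for both traversals, whereas the text describes separately established forward and backward log ranges.

```python
from dataclasses import dataclass


@dataclass
class LogRecord:
    lsn: int      # hypothetical log sequence number (position in the log)
    uow_id: str   # unit of work the record belongs to
    kind: str     # assumed kinds: "begin", "update", "commit", "backout"


def status_rebuild(log):
    """Phase 1: find incomplete units of work and the traversal start point."""
    first_lsn = {}
    for rec in log:
        if rec.kind == "begin":
            first_lsn[rec.uow_id] = rec.lsn
        elif rec.kind in ("commit", "backout"):
            # The unit of work completed, so it no longer needs recovery.
            first_lsn.pop(rec.uow_id, None)
    start = min(first_lsn.values()) if first_lsn else None
    return set(first_lsn), start


def restart(log):
    """Return the incomplete units of work and the order in which log
    records are visited in the forward and backward recovery phases."""
    incomplete, start = status_rebuild(log)
    if start is None:
        return incomplete, [], []
    in_range = [rec for rec in log if rec.lsn >= start]
    forward = [rec.lsn for rec in in_range]             # phase 2
    backward = [rec.lsn for rec in reversed(in_range)]  # phase 3
    return incomplete, forward, backward
```

Note that the starting point is set by the oldest incomplete unit of work, so one long-running unit of work widens the range that both traversals must cover.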
It will be appreciated that the elapsed time taken to effect a restart, and hence the speed of restart processing, is important to any business. For example, if the restart of a database takes one hour, then that resource, which may be an insurance database, is not available for that hour and business cannot be conducted using it.
In some circumstances the most significant restart variable in a transaction processing system is the time spent processing log information to provide transaction consistency and data integrity after a restart has been completed. Furthermore it will be appreciated that the introduction of old data files into a resource manager for a restart will require that these data files undergo media recovery operations, and incomplete units of work will need to be recovered or completed as part of the restart operation.
It will be appreciated that if a restart operation encounters one or more units of work that have been in progress for a relatively long period of time, for example a day or two or, in a worse case, a week or more, the forward and backward recovery times can be considerable.
For example, if it is discovered during a restart that there is a single incomplete unit of work that has been indoubt for two weeks, the restart process will take a considerable period of time or, in the worst case, a restart using that pending unit of work may not be possible because the required log data may not be available. Conventionally, during the restart process, all log records relating to the indoubt unit of work have to be read during forward recovery in order to lock the incomplete updates defined by the unit of work, which prevents access to the data until the unit of work has been committed. If a unit of work is, as in this example, a number of weeks old, then earlier log records for that unit of work may have been archived in off-line storage. The need to re-load and access such archived log records further lengthens the restart time. Even once the archived log records have been loaded, the restart may still take several hours because such records are typically stored on tape and must be read serially.
If a single unit of work has been incomplete for two weeks and has a status of Inflight, restart may again take a considerable period of time, because it may involve an extended backward recovery phase, or a restart may not be possible. During the restart process, all log records relating to the Inflight unit of work have to be read during backward recovery to back out all of the updates defined by that unit of work. As described above in relation to extended forward recovery times, there may be a need to retrieve old log records from an archive that is stored on magnetic tape.
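The different treatment of indoubt and Inflight units of work can be illustrated with a small sketch. It assumes a simplified log of `(uow_id, key, old_value)` update records and a `status` map of incomplete units of work. Both are hypothetical structures chosen for illustration and are not taken from the specification.

```python
def recover(log, status):
    """Illustrative forward and backward passes over a simplified log.

    log:    list of (uow_id, key, old_value) update records, oldest first
            (a hypothetical record layout).
    status: maps the id of each incomplete unit of work to "indoubt"
            or "inflight".
    """
    # Forward recovery: every update made by an indoubt unit of work is
    # locked so that the data cannot be accessed until it is resolved.
    locked = set()
    for uow_id, key, _old_value in log:
        if status.get(uow_id) == "indoubt":
            locked.add(key)

    # Backward recovery: updates made by an inflight unit of work are
    # backed out, newest first, so the oldest saved value wins.
    restored = {}
    for uow_id, key, old_value in reversed(log):
        if status.get(uow_id) == "inflight":
            restored[key] = old_value

    return locked, restored
```

In both passes every record of the incomplete unit of work must be read, which is why an old unit of work whose records have been archived lengthens the restart.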
It is an object of the present invention to mitigate at least some of the problems of the prior art.
SUMMARY OF THE INVENTION
Accordingly, a first aspect of the present invention provides a data processing method for a data processing system having a recovery log storing log records that can be used during recovery from a failure of the data processing system, the method comprising the steps of:
retrieving a unit of work from the recovery log;
determining whether or not the unit of work meets at least one predetermined criterion; and
removing the unit of work from the recovery log if the unit of work meets the predetermined criterion.
Preferably, an embodiment is provided in which the predetermined criterion relates to the age of the unit of work.
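A minimal sketch of the retrieve, test and remove steps with an age-based criterion might look like the following. The `UnitOfWork` type, the `started` timestamp and the seven-day threshold are assumptions made for illustration, and the recovery log is modelled simply as a list of units of work rather than as log records in persistent storage.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class UnitOfWork:
    uow_id: str        # hypothetical identifier
    started: datetime  # assumed timestamp of the unit of work's first record


def prune_old_units(recovery_log, now, max_age):
    """Split the log into units of work to keep and units of work to remove.

    A unit of work meets the (assumed) criterion when it is older than max_age.
    """
    kept, removed = [], []
    for uow in recovery_log:          # retrieve each unit of work
        if now - uow.started > max_age:   # test the age criterion
            removed.append(uow)           # remove it from the log
        else:
            kept.append(uow)
    return kept, removed


# Example: with a seven-day threshold, only the two-week-old unit is removed.
example_log = [
    UnitOfWork("old", datetime(2000, 1, 1)),
    UnitOfWork("new", datetime(2000, 1, 14)),
]
example_kept, example_removed = prune_old_units(
    example_log, now=datetime(2000, 1, 15), max_age=timedelta(days=7)
)
```

Returning the removed units of work, rather than discarding them silently, leaves room for the further criterion described next, such as asking for confirmation before removal.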
Whether or not a unit of work is removed from a recovery log may depend upon that unit of work meeting a further criterion. Suitably, an embodiment provides a method further comprising the steps of outputting a message relating to the unit of work, the message requesting an indication of any preferred course of action for that unit of work, and receiving an input identifying the preferred course of action in relation to that unit of work.
It will be appreciated that the above step of outputting may output the message to a display device and solicit input from a user, or the message may be output to a message queue to solicit a response from an application.
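The message-queue variant of this exchange could be sketched as below, using Python's standard `queue.Queue` as a stand-in for a real message queue. The message layout, the action names ("commit", "backout", "keep") and the default of keeping the unit of work when no reply arrives are all assumptions, not details from the specification.

```python
import queue


def request_action(uow_id, outbound, inbound, timeout=1.0):
    """Ask what to do with a unit of work and return the chosen action.

    outbound and inbound are queue.Queue objects standing in for the
    message queues; the dictionary message format is hypothetical.
    """
    outbound.put({"uow_id": uow_id, "question": "commit, backout or keep?"})
    try:
        reply = inbound.get(timeout=timeout)
    except queue.Empty:
        # Assumed policy: with no answer, leave the unit of work untouched.
        return "keep"
    if reply.get("uow_id") != uow_id:
        # A reply about a different unit of work is ignored in this sketch.
        return "keep"
    return reply.get("action", "keep")
```

Defaulting to "keep" is a conservative choice for the sketch: nothing is removed from the recovery log unless a response explicitly asks for it.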
Accordingly, a first aspect of the present invention provides a data processing method for facilitating a restart within a data processing system following a failure, the data processing system comprising, within persistent storage, a recovery log containing recovery log records which can be used during recovery from the failure of the data processing system, the log records relating to units of work undertaken by the data processing system, the method comprising the steps of:
retrieving, from the recovery log, a recovery log record relating to a unit of work;
determining whether or not the unit of work meets at least one predetermined criterion; and
performing a recovery process if the unit of work meets the predetermined criterion.
As recognised above, a significant problem associated with restart, that is, recovery from a failure, is units of work that have been incomplete or performing update activities that span a significant period of time

Profile ID: LFUS-PAI-O-3321194
