Error detection/correction and fault detection/recovery – Data processing system error or fault handling – Reliability and availability
Reexamination Certificate
1999-02-22
2003-03-25
Iqbal, Nadeem (Department: 2184)
C711S112000
active
06539495
ABSTRACT:
TECHNICAL FIELD
This invention relates in general to the field of data processing and, in particular, to the duplexing of cache structures located within a coupling facility of a computing environment.
CROSS REFERENCE TO RELATED APPLICATIONS
This application contains subject matter which is related to the subject matter of the following patents/applications which are assigned to the same assignee as this application. Each of the below listed patents/applications is hereby incorporated herein by reference in its entirety:
“Castout Processing For Duplexed Cache Structures”, Elko et al., Ser. No. 09/255,383, filed herewith;
“Method And System For Reconfiguring A Storage Structure Within A Structure Processing Facility,” Allen et al., U.S. Pat. No. 5,515,499, Issued May 7, 1996;
“Multiple Processor System Having Software For Selecting Shared Cache Entries Of An Associated Castout Class For Transfer To A DASD With One I/O Operation,” Elko et al., U.S. Pat. No. 5,493,668, Issued on Feb. 20, 1996;
“Software Cache Management Of A Shared Electronic Store In A Sysplex,” Elko et al., U.S. Pat. No. 5,457,793, Issued Oct. 10, 1995;
“Method, System And Program Products For Managing Changed Data Of Castout Classes,” Elko et al., Ser. No. 09/251,888, Filed: Feb. 19, 1999;
“Sysplex Shared Data Coherency Method,” Elko et al., U.S. Pat. No. 5,537,574, Issued Jul. 16, 1996;
“Method And Apparatus For Coupling Data Processing Systems,” Elko et al., U.S. Pat. No. 5,317,739, Issued May 31, 1994;
“In A Multiprocessing System Having A Coupling Facility, Communicating Messages Between The Processors And The Coupling Facility In Either A Synchronous Operation Or An Asynchronous Operation”, Elko et al., U.S. Pat. No. 5,561,809, Issued on Oct. 1, 1996;
“Mechanism For Receiving Messages At A Coupling Facility”, Elko et al., U.S. Pat. No. 5,706,432, Issued Jan. 6, 1998;
“Coupling Facility For Receiving Commands From Plurality Of Hosts For Activating Selected Connection Paths To I/O Devices And Maintaining Status Thereof”, Elko et al., U.S. Pat. No. 5,463,736, Issued Oct. 31, 1995;
“A Method And System For Managing Data and Users of Data in a Data Processing System,” Allen et al., U.S. Pat. No. 5,465,359, Issued on Nov. 7, 1995;
“Shared Access Serialization Featuring Second Process Lock Steal And Subsequent Write Access Denial To First Process,” Insalaco et al., U.S. Pat. No. 5,305,448, Issued on Apr. 19, 1994;
“Method Of Managing Resources In One Or More Coupling Facilities Coupled To One Or More Operating Systems In One Or More Central Programming Complexes Using A Policy,” Allen et al., U.S. Pat. No. 5,634,072, Issued On May 27, 1997;
“Partial Page Write Detection For A Shared Cache Using A Bit Pattern Written At The Beginning And End Of Each Page”; Narang et al., U.S. Pat. No. 5,455,942, Issued Oct. 3, 1995;
“Method For Managing Database Recovery From Failure Of A Shared Store In a System Including A Plurality Of Transaction-Based Systems Of The Write-Ahead Logging Type”, Narang et al., U.S. Pat. No. 5,280,611, Issued Jan. 18, 1994; and
“Method And Apparatus Of Distributed Locking For Shared Data, Employing A Central Coupling Facility”, U.S. Pat. No. 5,339,427, Issued Aug. 16, 1994.
BACKGROUND ART
A cache structure is a high-speed cache shared by one or more independently-operating computing units of a computing environment. In particular, cache structures are located within a remote facility, referred to as a coupling facility, that is coupled to the one or more independently-operating computing units. The computing units store and retrieve data from the cache structures.
Coupling facility cache structures can be configured in several different modes of operation, one of which is a store-in mode. Store-in mode caches are used, for example, by the DB2 database management facility of International Business Machines Corporation. A key attribute of the store-in mode is that changed data may be stored into the non-volatile memory of the coupling facility using the high-performance coupling facility links. This avoids the delay in the execution of database transactions that results when the data is written to secondary storage (e.g., direct access storage devices (DASD)) using normal input/output (I/O) operations, and is an advantage of the coupling facility cache.
Subsystems that cache changed data in a coupling facility cache face a unique recovery/availability problem, which is not faced by those that either do not cache data or cache only unchanged data. For example, when a data item is modified and the changed version is written only to the coupling facility cache structure, a subsequent failure of the coupling facility cache structure can cause the only existing current level of the data item to be lost. This results in a loss of data integrity. This window of exposure exists from the time the data item is written to the coupling facility cache until it is eventually cast out to permanent storage, which may be a considerable time. At any given instant, a significant percentage of data stored in the coupling facility cache structure may be in this changed state, and thus vulnerable to loss should the coupling facility structure be lost.
To recover from such failures, subsystems have made use of recovery logs, which are hardened on permanent storage. Basically, during normal operation, as a given subsystem instance modifies a data item, it first writes a description of the data item update to its own recovery log along with a unique ordering indication (typically, a timestamp) showing when the update to the data was made relative to the other updates. Then, when the log update is complete, it writes the updated data item to the coupling facility cache structure. Given this, if the cache structure fails, a recovery process can reconstruct the most current version of the data by merging the recovery logs of all subsystem instances so that updates made by all instances can be observed; locating the most current copy of each data item in the log, using the ordering information associated with each of the logged updates; and writing the most current copy of each of the data items to permanent storage.
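The log-then-write sequence and the merge-based recovery described above can be illustrated with a minimal sketch. It is not the logging format of any actual subsystem: the names `LogRecord`, `Subsystem`, `update` and `recover` are hypothetical, the cache and permanent storage are modeled as plain dictionaries, and an integer stands in for the timestamp used as the ordering indication.

```python
from dataclasses import dataclass, field


@dataclass
class LogRecord:
    timestamp: int  # ordering indication (assumed to be unique across instances)
    key: str        # identifies the data item
    value: str      # the updated content of the data item


@dataclass
class Subsystem:
    name: str
    log: list = field(default_factory=list)  # this instance's own recovery log

    def update(self, cache, key, value, timestamp):
        # Step 1: harden a description of the update in the recovery log.
        self.log.append(LogRecord(timestamp, key, value))
        # Step 2: only then write the changed item to the shared cache.
        cache[key] = value


def recover(subsystems):
    """Rebuild the most current copy of each item after a cache failure."""
    # Merge the logs of all instances so that every update is observed,
    # ordered by the logged ordering indication.
    merged = sorted(
        (rec for s in subsystems for rec in s.log),
        key=lambda rec: rec.timestamp,
    )
    permanent_storage = {}
    for rec in merged:
        # Later records overwrite earlier ones, leaving the most current copy.
        permanent_storage[rec.key] = rec.value
    return permanent_storage


# Usage: two instances update shared items, then the cache structure is lost.
shared_cache = {}
a, b = Subsystem("A"), Subsystem("B")
a.update(shared_cache, "page1", "v1", timestamp=1)
b.update(shared_cache, "page1", "v2", timestamp=2)
a.update(shared_cache, "page2", "x", timestamp=3)
shared_cache.clear()  # simulated failure of the cache structure
recovered = recover([a, b])
```

Even in this toy form, recovery has to read and sort every log record from every instance, which hints at why the real process can take a long time.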
While the above approach allows the data to be recovered following the failure of a coupling facility cache structure, it is not an adequate solution for providing continuous availability of the shared data and of the coupling facility cache structure across such failures. The log merge and recovery update processing can take a long time, during which time the database is entirely unavailable for use by end users.
Thus, a need exists for a recovery technique that allows recovery from a failure with little or no perceived unavailability of the data to the end users. A further need exists for a mechanism that allows selected data to be duplexed. A yet further need exists for a mechanism that allows duplexing to be turned on and off automatically. A yet further need exists for a technique that enables a switch from duplex mode to simplex mode to be performed quickly.
SUMMARY OF THE INVENTION
The shortcomings of the prior art are overcome and additional advantages are provided through the provision of a duplexing method. In one embodiment, the duplexing method includes writing data to a primary instance of a data structure; and selectively writing a portion of the data to a secondary instance of the data structure, wherein the secondary instance is usable as a copy of the primary instance, but contains less data than the primary instance.
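As a rough illustration of this embodiment, the sketch below writes every item to a primary instance and selectively writes a portion of the data to a secondary instance. The class `DuplexedCache`, its methods, and the choice of "changed data only" as the selection criterion are assumptions made for the example (the criterion is suggested by the background discussion of changed data being the data at risk); the sketch is not the coupling facility's interface.

```python
class DuplexedCache:
    """Toy model of a primary and a secondary structure instance."""

    def __init__(self):
        self.primary = {}     # holds all data written to the structure
        self.secondary = {}   # holds only the selectively duplexed portion
        self.duplexed = True  # False once the structure runs in simplex mode

    def write(self, key, value, changed):
        # Every write goes to the primary instance.
        self.primary[key] = value
        # Assumption: only changed data is selected for duplexing, since
        # unchanged data can be re-read from permanent storage.
        if self.duplexed and changed:
            self.secondary[key] = value

    def fail_over(self):
        # On loss of the primary, the secondary becomes the only instance
        # and the structure continues in simplex mode.
        self.primary = self.secondary
        self.secondary = {}
        self.duplexed = False


# Usage: one unchanged and one changed item, followed by loss of the primary.
cache = DuplexedCache()
cache.write("page1", "clean copy", changed=False)
cache.write("page2", "updated copy", changed=True)
secondary_before = dict(cache.secondary)
primary_size_before = len(cache.primary)
cache.fail_over()
```

After the failover, the surviving instance still holds the changed item, so no log merge is needed to restore it in this simplified model.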
The duplexing capability of the present invention advantageously provides for improved availability of data, such as cache structure data. Duplexing can be initiated on a per-structure basis, either manually or automatically. Once duplexing is initiated, the operating system drives the structure users to temporarily quiesce access to the structure; allocate a secondary structure instance in, for example, a different coupling facility from the primary structure instance; copy any necessary structure data from the primary instance to the secondary instance, establishing a duplexed copy of the structur
Elko David Arlen
Jones Steven Bruce
Josten Jeffrey W.
Narang Inderpal Singh
Nick Jeffrey M.
Heslin Rothenberg Farley & Mesiti P.C.
International Business Machines Corporation
Iqbal Nadeem
Kinnaman, Jr. Esq. William A.
Schiller, Esq. Blanche E.