Electrical computers and digital processing systems: memory – Storage accessing and control – Specific memory composition
Reexamination Certificate
2000-08-30
2003-12-23
Sparks, Donald (Department: 2187)
Electrical computers and digital processing systems: memory
Storage accessing and control
Specific memory composition
C711S111000, C711S114000, C709S231000, C709S238000
Reexamination Certificate
active
06668304
ABSTRACT:
FIELD OF THE INVENTION
The present invention relates generally to data transactions, and specifically to distributed transaction support of data written to a non-volatile memory.
BACKGROUND OF THE INVENTION
A transaction is a sequence of information exchange and related work (such as database updating) that is treated as a unit of atomicity for ensuring data integrity. In a transaction, data is transformed from one consistent state to another. For a transaction to be completed and data changes to be made permanent (or “committed”), a transaction has to be made atomic with respect to failure, i.e., it has to be completed in its entirety. If something happens before the transaction is successfully completed, the transaction is aborted, and any changes to the data must be undone, so that the effect is as if the transaction never existed. Hereinbelow, the term “complete” when applied to a transaction meansi that the transaction is either committed or aborted.
The inherent difficulty of transaction support is exacerbated when participants in the transaction are part of a distributed system. It is then necessary to ensure that the transaction is committed or that it is aborted atomically and consistently by all of the participants. For example, some of the participants in a transaction may fail, and it is possible in a distributed system that some of the other participants may not know of the failure. Also, participants who have recovered after a failure must determine;the fate of the transaction.
In Concurrency Control and Recovery in Database Systems, by Bernstein et al. (Addison-Wesley, 1987), which is incorporated herein by reference, a description is given in chapter 7 of atomic commitment protocols (ACPs) which ensure transaction consistency over multiple sites of a distributed system. The authors describe a two-phase-commit (2PC) protocol as an example of an ACP. The 2PC protocol comprises a first phase wherein all participants of a transaction are polled as to whether the transaction should be committed or aborted. In a second phase of the 2PC protocol a coordinator of the transaction decides, on the basis of the poll, if the transaction is to be committed or aborted, and transmits that decision to the participants.
Methods for efficiently storing data, and recovering the stored data in the event of a computer system failure, are known in the art. The methods rely on storing information additional to the data to a non-volatile memory, typically a disk, and using the additional information to recover the stored data when the failure occurs.
U.S. Pat. No. 5,345,575 to English et al., whose disclosure is incorporated herein by reference, describes a disk controller comprising a memory. The memory contains a table mapping logical addresses of data blocks stored on a disk to labels identifying physical storage locations. In addition, to writing the data to a storage location, the disk controller writes the associated logical address of each storage location, a time stamp, and data indicating where in a sequence of data blocks a specific data block occurs. The additional information is used to recover from system failures by reading from substantially the whole disk.
U.S. Pat. No. 5,481,694 to Chao et al., whose disclosure is incorporated herein by reference, describes an electronic data storage system comprising a memory, a plurality of magnetic disk units, and a controller. The memory comprises a table cross-referencing logical addresses with physical addresses on the disk units, a list of physical addresses containing obsolete data, and a list of physical addresses for segments on the disk units which are able to receive data. When data are written to the disk units, a tag comprising the logical address and a sequence number for multiblock writes is written with the data. To recover from a system failure, a checkpoint log and checkpoint segments stored on the disk units recover the table and lists.
In an article by de Jonge et al., “The Logical Disk: A New Approach to Improving File Systems,” in Proceedings of the 14th
Symposium on Operating Systems Principles
, pp. 15-28 (December 1993), which is incorporated herein by reference, the authors describe a logical disk wherein an interface is defined for disk storage which separates file management and disk management. The interface uses logical block numbers and block lists, and supports multiple file systems. The authors claim to support an Atomic Recovery Unit (ARU). During recovery all logical disk commands belonging to the same ARU are treated as a single invisible operation. Thus, the logical disk will always recover to either a state that existed before, or to a state that existed after performing all operations of an ARU. However, concurrent ARUs are not supported.
In an article by English et al., “Loge: a self-organizing disk controller,” in Proceedings of the USENIX Winter 1992 Technical Conference, pp. 237-251 (January 1992), which is incorporated herein by reference, the authors describe a system for storing data to a disk using a translation table and an allocation map. A trailer tag comprising a block address and a time stamp is written to the disk together with the stored data. The information in the trailer tag enables the system to recover from a failure.
In an article by Chao et al., “Mime: a high performance parallel storage device with strong recovery guarantees,” HPL-CSP-92-9 (published by the Hewlett-Packard Company, November 1992), which is incorporated herein by reference, the authors describe a disk storage architecture similar to that of Loge, as described above. In Mime, the trailer tag comprises a block address, a sequence number for multiblock writes, and a last-packet-in-multiblock-write flag. As in Loge, the trailer tag information enables the system to recover from a failure.
Mime supports atomic multi-block writes with a limited form of transaction support in the form of a visibility group. Mime guarantees that in the case of a failure all block writes within an active visibility group are aborted.
SUMMARY OF THE INVENTION
It is an object of some aspects of the present invention to provide an improved system for supporting data transactions.
It is a further object of some aspects of the present invention to; provide an improved system for performing concurrent data transactions when participants in the transaction are distributed over a network.
In preferred embodiments of the present invention, one or more storage devices, preferably non-volatile disks, are used for storing data contents of transactions initiated by one or more clients of the storage devices. Each of the storage devices is managed by control circuitry, preferably a storage server, which writes the data contents of the transactions to selected block-frames of the storage device. Such storage devices are herein termed transaction supporting logical disks (TSLDS). The storage servers have volatile memory in which they hold data structures whose values are used, inter alia, to track transaction data written to the TSLDs and to link dynamically the physical and logical addresses of the block-frames to which the data are written.
In order to safeguard a TSLD against failure of a specific storage server, values in the data structures of the server are stored to that TSLD at periodic intervals, using checkpoint operations. Between checkpoint operations, values in the data structures are also stored together with the data contents of the transactions in the block-frames of each TSLD. Preferably, values in the data structures, and the checkpoint data, enable block-frames to be conveniently found in the event of a storage server failure, so that the data contents within the block-frames can be recovered. In the event of a failure, the storage server reads the stored checkpoint data and “replays” the process of TSLD operations, including committing and aborting transactions, since the last checkpoint was performed. The replaying process enables the storage server to recover its state and the state of any ongoing transactions at the time of
Gold Israel
Satran Julian
Sheinwald Dafna
Darby & Darby
Peugh Brian R.
Sparks Donald
LandOfFree
Transaction support on logical disks does not yet have a rating. At this time, there are no reviews or comments for this patent.
If you have personal experience with Transaction support on logical disks, we encourage you to share that experience with our LandOfFree.com community. Your opinion is very important and Transaction support on logical disks will most certainly appreciate the feedback.
Profile ID: LFUS-PAI-O-3172561