Data processing: database and file management or data structures – Database design – Data structure types
Reexamination Certificate
1998-12-09
2002-04-16
Black, Thomas (Department: 2171)
C707S793000, C714S013000
active
06374264
ABSTRACT:
BACKGROUND OF THE INVENTION
1. Technical Field
The present invention relates to the field of database management systems generally and, more particularly, to method and apparatus for detecting and recovering from data corruption of a database by codewording regions of the database and by logging information about reads of the database.
2. Description of the Related Arts
A database is a usefully organized collection of data that is fundamental to many software applications. The database is associated with a database manager and, together with its software application, comprises a database management system (DBMS). In recent years, extensible database systems such as Illustra (now part of the Informix Universal Server) have been developed which allow the integration of application code with database system code. In these systems, the application code has direct access to the buffer cache and other internal structures of the DBMS. Similarly, application programs in many object oriented database (OODB) systems have direct access to an object cache in their address space. This OODB architecture was developed to minimize the cost of accesses to data, for example, to support the needs of Computer Aided Design (CAD) systems. Finally, several recently developed storage management systems provide memory resident or memory mapped architectures. For example, the Dali main-memory storage manager described in Bohannon et al., “The Architecture of the Dali Main-Memory Storage Manager,” Multimedia Tools and Applications, 4, 115-151 (1997) is designed to provide applications with fast, direct access to data by keeping the entire database in volatile main memory. In all these systems, direct access to data (either in the database buffer cache or in a memory-mapped portion of the database) by application programs is critical to providing fast response times. The alternative to memory mapping is to access data via a server process, but this is an unacceptable solution due to the high cost of inter-process communication. Application code is typically less trustworthy than database system code, and there is therefore a significant risk that “wild writes” and other programming errors can affect persistent data in systems that allow applications to access such data directly. Since the systems described above are increasingly popular, the risk of wild writes is growing. Additionally, there is a risk of damage due to software faults in the DBMS itself. It is therefore important to develop techniques that can mitigate the risk of corruption.
In our parent U.S. patent application Ser. No. 08/766,096, filed Dec. 16, 1996, now U.S. Pat. No. 5,845,292 and entitled “System and Method for Restoring a Distributed Checkpointed Database,” we describe the application of multiple checkpoints and the maintenance of a stable log record stored on a server for tracking operations to be made to the multiple checkpoints in a distributed environment. A companion parent application, U.S. patent application Ser. No. 08/767,048, now U.S. Pat. No. 5,845,849, entitled “System and Method for Restoring a Multiple Checkpointed Database in View of Loss of Volatile Memory” filed the same day describes recovery processes at multiple levels of a DBMS in the event of loss of volatile memory. The '048 and the '096 applications should be deemed to be incorporated by reference herein as to their entire contents. Both of these applications relate to the preservation and restoration of a database (or distributed database), for example, stored in main volatile memory of a data processor.
The problem of detecting and recovering from corruption of data in a database system still remains to be solved in a pragmatic manner without adding considerable overhead to the DBMS. Data corruption may be physical or logical, and it may be direct or indirect. Data is “directly” corrupted by “unintended” updates: in the physical case, by wild writes due to programming errors, as explained above; in the logical case, by incorrectly coded updates or input errors (human errors). Once data is directly corrupted, it may be read by a process, which then issues writes based on the value read. Data written in this manner is indirectly corrupted, and the process involved is said to have carried the corruption. While this process may be a database maintenance process, we focus on transaction-carried corruption, a problem in which the carrying process is executing transactions.
Direct physical corruption can be mostly prevented with hardware memory protection, using the virtual memory support provided by most operating systems. One approach involves mapping the entire database in a protected mode, and selectively un-protecting and re-protecting pages as they are updated. However, this can be very expensive, for example, on standard UNIX systems. An alternative to the hardware approach would be programming language techniques such as type-safe languages or sandboxing. (Sandboxing is a technique whereby an assembly language programmer adds code immediately before a write to ensure that the instruction is not affecting protected space.) However, type-safe languages have yet to be proven in high-performance situations, and sandboxing may perform poorly on certain architectures. Finally, communication across process domain boundaries to a database server process provides protection, but such communication is orders of magnitude slower than access in the same process space, even with highly tuned implementations. The concern over physical corruption is further motivated by the increasing number of systems in which application code has direct access to system buffers, including extensible systems, object databases, and memory-mapped or in-memory architectures. In addition, some work has raised concern over damage to data due to faults in the DBMS itself.
Integrity constraints are widely studied and prevent certain cases of logical corruption in which rules about the data would be violated. However, it is an object of the present invention to deal with those cases in which integrity constraints and other input validation techniques fail and, whether due to programming error or invalid input, unintended updates are made to the database. We consider such cases inherently impossible to prevent, and instead assume that the problem is detected later, usually when a database user notices incorrect output (on a bank statement, for example).
Thus, there appears a genuine need in the art of database management systems to provide an improved method and apparatus for detecting and recovering from corruption of a database.
SUMMARY OF THE INVENTION
According to the present invention, it is a principle to apply several new techniques for the prevention or detection of corruption. In particular, a Read Prechecking scheme associates a one-word codeword with each region of data, and prevents transaction-carried corruption by verifying that the codeword matches the data each time it is read. A Data Codeword scheme, a less expensive variant of Read Prechecking, allows detection of direct physical corruption by asynchronously auditing the codewords. This scheme is also referred to herein as deferred codeword maintenance and involves performing codeword updates during a process called “log flushing” at the same time as data is flushed to disc from main memory.
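A minimal sketch of the codeword idea follows. It makes several assumptions that the text above does not specify: the codeword is taken to be the XOR of the 32-bit words in a region, regions are a fixed 64 bytes, and codewords are updated immediately on each write (so the deferred maintenance at log flush is not modelled). The names `ProtectedStore`, `CorruptionError`, and `audit` are hypothetical.

```python
import struct

REGION = 64  # bytes per protected region (assumed size)


class CorruptionError(Exception):
    """Raised when a region no longer matches its codeword (hypothetical name)."""


def codeword(region: bytes) -> int:
    """One-word codeword: XOR of the region's 32-bit words (an assumed function)."""
    cw = 0
    for (word,) in struct.iter_unpack("<I", region):
        cw ^= word
    return cw


class ProtectedStore:
    def __init__(self, size):
        assert size % REGION == 0
        self.data = bytearray(size)
        self.codewords = [0] * (size // REGION)  # XOR of an all-zero region is 0

    def _region(self, idx):
        return bytes(self.data[idx * REGION:(idx + 1) * REGION])

    def _span(self, offset, length):
        return range(offset // REGION, (offset + length - 1) // REGION + 1)

    def write(self, offset, payload):
        """Sanctioned update: change the data, then recompute affected codewords."""
        self.data[offset:offset + len(payload)] = payload
        for idx in self._span(offset, len(payload)):
            self.codewords[idx] = codeword(self._region(idx))

    def read(self, offset, length):
        """Read Prechecking: verify every touched region before returning data."""
        for idx in self._span(offset, length):
            if codeword(self._region(idx)) != self.codewords[idx]:
                raise CorruptionError(f"region {idx} fails its codeword check")
        return bytes(self.data[offset:offset + length])

    def audit(self):
        """Asynchronous audit: list the regions whose codeword does not match."""
        return [i for i in range(len(self.codewords))
                if codeword(self._region(i)) != self.codewords[i]]


store = ProtectedStore(256)
store.write(10, b"hello")
store.data[70] ^= 0xFF   # simulated wild write that bypasses write()
print(store.audit())     # the audit reports region 1
```

In this sketch, `read` corresponds to the prechecking variant (a corrupted region is never returned to a transaction), while calling only `audit` periodically corresponds to the cheaper variant that detects direct corruption after the fact.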
For detecting indirect logical or physical corruption, it is a feature of the present invention to log information about reads (Read Logging). Interestingly, any negative impact of Read Logging is limited, as the actual values read are not logged according to one embodiment of the present invention, just the identity of the item read and optionally a checksum of the value. Moreover, it is an extension of the present invention to apply codewords as well to protect the read log records.
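The following sketch illustrates how a read log of this kind might be used to find transaction-carried corruption. It is an illustration under assumptions, not the patented procedure: the record layout, the use of CRC-32 as the optional checksum, and the names `LogRecord`, `Database`, and `trace_corruption` are all hypothetical. The tracing rule is deliberately conservative: any transaction that read a suspect item is treated as a carrier, and everything it wrote afterwards is treated as suspect.

```python
import zlib
from dataclasses import dataclass
from typing import Optional


@dataclass
class LogRecord:
    txn: int
    kind: str                       # "read" or "write"
    item: str
    checksum: Optional[int] = None  # checksum of the value read, not the value


class Database:
    def __init__(self, items):
        self.items = dict(items)
        self.log = []

    def read(self, txn, item):
        value = self.items[item]
        # Log only the item's identity and a checksum, never the value itself.
        self.log.append(LogRecord(txn, "read", item, zlib.crc32(value)))
        return value

    def write(self, txn, item, value):
        self.items[item] = value
        self.log.append(LogRecord(txn, "write", item))


def trace_corruption(log, corrupted_items):
    """Walk the log forward from a set of items known to be directly corrupted."""
    suspect_items = set(corrupted_items)
    carriers = set()
    for rec in log:
        if rec.kind == "read" and rec.item in suspect_items:
            carriers.add(rec.txn)
        elif rec.kind == "write" and rec.txn in carriers:
            suspect_items.add(rec.item)
    return carriers, suspect_items


db = Database({"a": b"1", "b": b"2", "c": b"3"})
db.read(1, "a"); db.write(1, "b", b"9")   # txn 1 reads a, writes b
db.read(2, "c"); db.write(2, "c", b"4")   # txn 2 never touches a or b
db.read(3, "b"); db.write(3, "d", b"5")   # txn 3 reads b, writes d

carriers, suspects = trace_corruption(db.log, {"a"})
print(sorted(carriers), sorted(suspects))
```

If item `a` is later found to have been corrupted, the trace marks transactions 1 and 3 as carriers and items `a`, `b`, and `d` as suspect, while transaction 2 is unaffected. The logged checksum could additionally be compared against a known-good value to rule out reads that saw uncorrupted data; that refinement is not shown.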
When corruption is detected rather than prevented, techniques for corruption recovery are employed to restore the database to an uncorrupted state. As will be further described herein, once codewording of data and read logging is perform
Bohannon Philip L.
Rastogi Rajeev
Seshadri Srinivasan
Silberschatz Abraham
Sudarshan Sundararajarao
Black Thomas
Chen Te Yu
Lucent Technologies - Inc.