Data processing: database and file management or data structures – Database design – Data structure types
Reexamination Certificate
1998-12-09
2002-09-10
Trammell, James P. (Department: 2161)
Data processing: database and file management or data structures
Database design
Data structure types
Reexamination Certificate
active
06449623
ABSTRACT:
BACKGROUND OF THE INVENTION
1. Technical Field
The present invention relates to the field of database management systems generally and, more particularly, to method and apparatus for detecting and recovering from data corruption of a database by codewording regions of the database and by logging information about reads of the database.
2. Description of the Related Arts
A database is a collection of data organized usefully and fundamental to many software applications. The database is associated with a database manager and together with its software application comprises a database management system (DBMS). In recent years, extensible database systems such as illustrate (now part of the Informix Universal Server) have been developed which allow the integration of application code with database system code. In these systems, the application code has direct access to the buffer cache and other internal structures of the DBMS. Similarly, application programs in many object oriented database (OODB) systems have direct access to an object cache in their address space. This OODB architecture was developed to minimize the cost of accesses to data, for example, to support the needs of Computer Aided Design (CAD) systems. Also, several recently developed storage management systems provide memory resident or memory mapped architectures. For example, the Dali main-memory storage manager described in Bohannon et al., “The Architecture of the Dali Main-Memory Storage Manager,”
Journal of Multimedia Tools and Applications
, 4(2), pp. 115-151 (1997) is designed to provide applications with fast, direct access to data by keeping the entire database in volatile main memory. In all these systems, direct access to data (either in the database buffer cache or in a memory-mapped portion of the database) by application programs is critical to providing fast response times. The alternative to memory mapping is to access data via a server process, but this presents an unacceptable solution due to the high cost of inter-process communication. Application code is typically less trustworthy than database system code, and there is therefore a significant risk that “wild writes” and other programming errors can affect persistent data in systems that allow applications to access such data directly. Since the systems described above are increasingly popular, the risk of wild writes and associated physical corruption is growing. Additionally, there is a risk of damage due to software faults in the DBMS itself. It is therefore important to develop techniques that can mitigate the risk of corruption.
In our parent U.S. patent application Ser. No. 08/66,096, filed Dec. 16, 1996 and entitled “System and Method for Restoring a Distributed Checkpointed Database,” we describe the application of multiple checkpoints and the maintenance of a stable log record stored on a server for tracking operations to be made to the multiple checkpoints in a distributed environment. A companion parent application, U.S. patent application Ser. No. 08/67,048, entitled “System and Method for Restoring a Multiple Checkpointed Database in View of Loss of Volatile Memory” filed the same day describes recovery processes at multiple levels of a DBMS in the event of loss of volatile memory. The '048 and the '096 applications should be deemed to be incorporated by reference herein as to their entire contents. Both of these applications relate to the preservation and restoration of a database (or distributed database), for example, stored in main volatile memory of a data processor.
The problem of detecting and recovering from corruption of data in a database system still remains to be solved in a pragmatic manner without adding considerable overhead to the DBMS. Data corruption may be physical or logical and it may be direct or indirect. Data is “directly” corrupted in a physical corruption sense by “unintended” updates, such as wild writes as explained above due to programming errors in the physical case, or arising from incorrectly coded updates or input errors (human errors) in the logical case. Once data is directly corrupted, it may be read by a process, which then issues writes based on the value read. Data written in this manner is indirectly corrupted, and the process involved is said to have carried the corruption. While this process may be a database maintenance process, we focus on transaction-carried corruption, a problem in which the carrying process is executing transactions.
Direct physical corruption can be mostly prevented with hardware memory protection, using the virtual memory support provided by most operating systems. One approach involves mapping the entire database in a protected mode, and selectively un-protecting and re-protecting pages as they are updated. However, this can be very expensive, for example, on standard UNIX systems. An alternative to the hardware approach would be programming language techniques such as type-safe languages or sandboxing. (Sandboxing is a technique whereby an assembly language programmer adds code immediately before a write to ensure that the instruction is not affecting protected space.) However, type-safe languages have yet to be proven in high-performance situations, and sandboxing may perform poorly on certain architectures. Finally, communication across process domain boundaries to a database server process provides protection, but such communication is orders of magnitude slower than access in the same process space, even with highly tuned implementations. The concern over physical corruption is further motivated by the increasing number of systems in which application code has direct access to system buffers, including extensible systems, object databases, and memory-mapped or in-memory architectures. Finally, some work has raised concern over damage to data due to faults in the DBMS itself.
Integrity constraints are widely studied and prevent certain cases of logical corruption in which rules about the data would be violated. However, it is an object of the present invention to deal with those cases in which integrity constraints and other input validation techniques fail, and whether due to programming error or invalid input, unintended updates are made to the database. We consider such cases inherently impossible to prevent, and instead assume that the problem is detected later, usually when a database user notices incorrect output (on a bank statement, for example).
In the field of accounting systems and audits, it is known from L. A. Bjork, Jr., “Generalized Audit Trail Requirements and Concepts for Data Base Applications”,
IBM Systems Journal
, No. 3, 1975, pp. 229-245, how to create and maintain an audit trail—a history of activities by transaction, posted because of operations on specific data. Bjork describes that a time dimension can be added to a stored record such that supplemental information in the form of a descriptor and time frame are maintained. For example, the time frame when the information was created and stored, and each version of a data field, is maintained along with the action. He refers to “create” (creation of data), “reference” (when reference is made to x at time t), and update (when created data is updated) as descriptors all having time dimension. A further descriptor is “refer to prior generation” (when data now updated is referred to by a prior generation.” Also, C. T. Davies, Jr. in his article “Data Processing Spheres of Control” appearing in the
IBM Systems Journal
, Vol. 17, No. 2, 1978, pp. 179-198 describes “in-process recovery” as the control of recording and subsequent use of data required to return to a previous point in a process, and that the process that created or last modified data elements be determined from a journal. System recovery is obtained via establishing checkpoints that represent an early state in a data base. Once a search backward is conducted to find an error, a checkpoint behind the error is obtained from which to rebuild. These accounting processes, for example, typified by the recovery from a payro
Bohannon Philip L.
Rastogi Rajeev
Seshadri Srinivasan
Silberschatz Abraham
Sudarshan Sundararajarao
Lucent Technologies Inc
Morgan & Finnegan
Trammell James P.
Wang Mary
LandOfFree
Method and apparatus for detecting and recovering from data... does not yet have a rating. At this time, there are no reviews or comments for this patent.
If you have personal experience with Method and apparatus for detecting and recovering from data..., we encourage you to share that experience with our LandOfFree.com community. Your opinion is very important and Method and apparatus for detecting and recovering from data... will most certainly appreciate the feedback.
Profile ID: LFUS-PAI-O-2845144