Method and apparatus for database fault tolerance with...

Data processing: database and file management or data structures – Database design – Data structure types

Reexamination Certificate

Details

C707S793000, C707S793000

active

06421688

ABSTRACT:

BACKGROUND
The present invention relates to the field of high-performance, fault-tolerant database systems using off-the-shelf database servers. More particularly, this invention provides non-intrusive, non-stop database services for computer applications employing modern database servers.
Data service is an essential service for electronic commerce and information service applications. Among the different data service technologies, such as flat files, indexed files, multiple linked files and databases, databases are the most preferred. Database servers are complex software systems providing efficient means for data storage, processing and retrieval for multiple concurrent users. Typically, a database server is implemented on top of an existing operating system, a lower-level software system providing management facilities for efficient use of hardware computing and communication components. The correct execution of a database server relies on the correct operation of storage systems, computer servers, networking equipment and operating systems. Therefore, the failure of any one component can affect the normal operation of the database service.
In general, the degree of fault tolerance of a database service depends on how quickly and reliably the service can detect and recover from a failure or fault. Fault tolerance capability can be evaluated in two ways: 1) the ability to detect a database service failure; and 2) the ability to mask a detected service failure in real time. A database service failure can be either a hardware failure or a software failure. In view of the tight binding among hardware, operating system and database server, it is often difficult to pinpoint where a failure started. Hardware-related fault tolerance technologies use hardware redundancy to prevent component malfunctions. For example, redundant array of inexpensive disks (RAID) technology is well known and functions by replicating hardware disks and distributing replicated data over several disks, using disk mirroring and striping at the hardware level to increase the availability of data and to prevent data storage failures. Hardware-related failures have become minimal due to the dramatic increase in the reliability of hardware components and the dramatic decrease in their cost. Recent statistics show that software-related failures account for the majority of database service failures. Hardware replication technologies can resolve hardware-related problems, but cannot provide fault resilience against operating system and database system malfunctions. Commercial database applications usually do not interface with hardware directly, but indirectly through the operating system. In general, malfunctions of commercial database applications are not detected by the hardware, which continues to operate irrespective of a database malfunction. There are a few hardware-aware multi-processor database systems available that can detect hardware and database malfunctions directly, but they are prohibitively costly. They require extensive changes to database server software. They lack continuous operation capability because components cannot be repaired while database accesses are allowed, and because it is impossible to pause and update one processor while letting users access the data through another processor. Examples of these systems include Compaq's Nonstop-SQL and Stratus Computer's proprietary database systems.
In a multi-user environment with multiple redundant database servers, race conditions are major problems that cause database lockups and inconsistent data content across the database servers. The ability to run multiple independent redundant database servers concurrently without a race condition is referred to herein as instant transaction replication. There are several database fault tolerance approaches to resolving such race conditions.
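The inconsistency described above can be illustrated with a minimal sketch. It assumes two hypothetical replicas that receive the same two updates from concurrent users but apply them in different orders; the function names and the starting balance are invented for illustration and do not come from the patent.

```python
# Hypothetical illustration: two replicas receive the same two updates
# from concurrent users, but apply them in different orders.

def add_ten(balance):
    return balance + 10

def double(balance):
    return balance * 2

def apply_all(initial, updates):
    """Apply each update function in the given order and return the result."""
    value = initial
    for update in updates:
        value = update(value)
    return value

replica_a = apply_all(100, [add_ten, double])   # (100 + 10) * 2
replica_b = apply_all(100, [double, add_ten])   # (100 * 2) + 10
same_order = apply_all(100, [add_ten, double])  # same order as replica_a

print(replica_a, replica_b)     # the two replicas have diverged
print(replica_a == same_order)  # identical ordering gives identical content
```

The sketch only shows why ordering matters; it says nothing about how any particular system enforces a common order.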
One method provides non-concurrent transaction replication using high-speed replicated servers. U.S. Pat. No. 5,812,751 to Ekrot, et al. discloses a primary server and a monitoring standby server connected to the same storage system. Once the primary system fails, the standby server automatically takes over the storage system and assumes the operations previously performed by the failed primary server. This method also requires extensive changes to database server software. More importantly, the shared data storage system is a single point of failure: any software malfunction that leaves a mark on the data storage contents will lead to an unrecoverable database system. As the user access frequency increases, multiple transactions can be lost in the event of a primary database server crash. This method also does not allow “on-line repair” or continuous operation of the database.
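The primary/standby arrangement described above can be sketched roughly as follows. This is a simplified illustration, not the mechanism of the cited patent: the class names, the heartbeat timeout and the explicit `now` timestamps are all assumptions made so the example runs deterministically.

```python
# Hypothetical sketch of a standby server that monitors a primary and
# takes over a shared storage system when heartbeats stop arriving.

class SharedStorage:
    """Single storage system used by whichever server currently owns it."""
    def __init__(self):
        self.data = {}
        self.owner = "primary"

class StandbyMonitor:
    def __init__(self, storage, timeout):
        self.storage = storage
        self.timeout = timeout      # seconds of silence tolerated (assumed value)
        self.last_seen = 0.0

    def heartbeat(self, now):
        """Record that the primary was alive at time `now`."""
        self.last_seen = now

    def check(self, now):
        """Take over the storage if the primary has been silent too long."""
        silent_for = now - self.last_seen
        if self.storage.owner == "primary" and silent_for > self.timeout:
            self.storage.owner = "standby"
            return True
        return False

storage = SharedStorage()
storage.data["order-1"] = "paid"        # written by the primary
monitor = StandbyMonitor(storage, timeout=3.0)

monitor.heartbeat(now=1.0)
took_over_early = monitor.check(now=2.0)   # primary still considered alive
took_over_late = monitor.check(now=10.0)   # silence exceeds the timeout

print(took_over_early, took_over_late, storage.owner)
```

Both servers read and write the same `storage.data` object, which mirrors the point made in the text: whatever state the failed primary left in the shared storage is exactly what the standby inherits.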
U.S. Pat. No. 5,745,753 to Mosher, Jr. discloses another non-concurrent approach, a remote data duplicate facility (RDF) that maintains virtual synchronization of the backup database with the local database. The RDF includes an extractor process executed by the local computer system, and a receiver process and a plurality of update processes executed by the remote system. The extractor process sends sequentially numbered audit trail records to the remote receiver process while the application performs online database restructuring on the local computer. The update processes of the remote system, based on the incoming audit trail records, perform the same operations on the backup database. The transaction manager stores a stop update audit record in the local audit trail at the point where each online database restructuring successfully completes. Both local and remote processes use these sequentially numbered audit records to cross-check whether any database operation is out of order. Acknowledge and redo commands pass back and forth between the RDF and the local databases. A crash of the local computer system can result in incomplete transaction logs for all replication servers, rendering the entire system unrecoverable.
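The role of the sequentially numbered audit records can be sketched as below. This is a loose illustration under stated assumptions, not the RDF protocol itself: the `Receiver` class, the record layout `(sequence, key, value)` and the `"ack"`/`"redo"` replies are hypothetical names chosen for the example.

```python
# Hypothetical sketch: a receiver applies audit records to a backup
# database only when they arrive in sequence, and asks for a resend
# ("redo") when it detects a gap or an out-of-order record.

class Receiver:
    def __init__(self):
        self.expected = 1   # next sequence number the backup may apply
        self.backup = {}

    def receive(self, record):
        seq, key, value = record
        if seq != self.expected:
            # Out of order: do not apply, request the missing record.
            return ("redo", self.expected)
        self.backup[key] = value
        self.expected += 1
        return ("ack", seq)

receiver = Receiver()
replies = [
    receiver.receive((1, "a", 10)),
    receiver.receive((3, "c", 30)),   # record 2 has not arrived yet
    receiver.receive((2, "b", 20)),
    receiver.receive((3, "c", 30)),   # resent after the redo request
]
print(replies)
print(receiver.backup)
```

If the sender crashes before the missing records are resent, the backup stops at the last in-sequence record, which is one way to picture the incomplete logs mentioned in the text.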
A conventional software method for computer system fault tolerance is file system replication or disk mirroring. A file system is the basic building block that an operating system uses to manage storage such as hard disks. A replicated file system uses a special storage device driver that presents an identical appearance to user programs, so they “think” the storage device remains the same, while it duplicates each disk I/O request to both the local file system and a remote backup file system. This method is highly transparent to users since it does not require any modification to existing application programs. In the event of a primary system failure, the replicated file system can be used to restart all services. U.S. Pat. No. 5,764,903 to Yu discloses a virtual disk driver used between a primary server and a secondary server that are mirrored over a network. In that system, a disk write request hands control to the operating system, which in turn invokes the virtual disk driver. The virtual disk driver monitors both the primary and secondary disk write operations; control does not return to the calling application until the disk write is committed to both the primary and secondary disks. This method can protect most desktop applications, such as word processors, spreadsheets and graphics editors, from most hardware faults and operating system errors. However, it cannot protect users from database server crashes, because the timing of database server command execution is not maintained.
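The write path described above can be sketched with in-memory stand-ins for the two disks. The `Disk` and `MirroredDisk` classes are hypothetical and only model the one property stated in the text: a write returns to the caller only after both copies have been committed.

```python
# Hypothetical sketch of a mirroring driver: the caller sees one disk,
# while every write is duplicated to a primary and a secondary disk.

class Disk:
    """In-memory stand-in for a block device."""
    def __init__(self):
        self.blocks = {}

    def write(self, block_no, data):
        self.blocks[block_no] = data

    def read(self, block_no):
        return self.blocks[block_no]

class MirroredDisk:
    """Presents the same write/read interface as a single Disk."""
    def __init__(self, primary, secondary):
        self.primary = primary
        self.secondary = secondary

    def write(self, block_no, data):
        # Both writes complete before control returns to the caller.
        self.primary.write(block_no, data)
        self.secondary.write(block_no, data)

    def read(self, block_no):
        return self.primary.read(block_no)

primary, secondary = Disk(), Disk()
disk = MirroredDisk(primary, secondary)
disk.write(0, b"spreadsheet contents")

print(primary.blocks == secondary.blocks)
print(disk.read(0))
```

Because `MirroredDisk` exposes the same methods as `Disk`, the calling program needs no changes, which corresponds to the transparency the text attributes to this method.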
U.S. Pat. No. 5,781,910 to Gostanian, et al. discloses distributed transaction database replicas. A single database manager employs an atomic commit protocol such as the two-phase commit (2PC) protocol. The basic idea behind 2PC is to determine a unique decision for all replicas with respect to either committing or aborting a transaction and then executing that decision at all replicas.
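The basic idea of 2PC stated above can be sketched as follows. This is a minimal, single-process illustration that ignores timeouts, logging and coordinator failure; the `Replica` class and the `two_phase_commit` function are hypothetical names, not taken from the cited patent.

```python
# Hypothetical sketch of two-phase commit: phase 1 collects a vote from
# every replica, phase 2 executes the single resulting decision everywhere.

class Replica:
    def __init__(self, name, can_commit=True):
        self.name = name
        self.can_commit = can_commit   # whether this replica will vote yes
        self.data = {}
        self.pending = None

    def prepare(self, txn):
        """Phase 1: stage the transaction and vote."""
        self.pending = dict(txn)
        return self.can_commit

    def commit(self):
        """Phase 2 (commit): make the staged changes visible."""
        self.data.update(self.pending)
        self.pending = None

    def abort(self):
        """Phase 2 (abort): discard the staged changes."""
        self.pending = None

def two_phase_commit(replicas, txn):
    votes = [replica.prepare(txn) for replica in replicas]
    decision = "commit" if all(votes) else "abort"
    for replica in replicas:
        if decision == "commit":
            replica.commit()
        else:
            replica.abort()
    return decision

healthy = [Replica("r1"), Replica("r2")]
first = two_phase_commit(healthy, {"x": 1})

mixed = [Replica("r3"), Replica("r4", can_commit=False)]
second = two_phase_commit(mixed, {"x": 2})

print(first, [r.data for r in healthy])
print(second, [r.data for r in mixed])
```

A single "no" vote aborts the transaction at every replica, so all replicas end in the same state whichever way the decision goes.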
