Data redundancy methods and apparatus

Patent number: 6,557,123
Type: Reexamination Certificate (active)
Filed: 1999-08-02
Issued: 2003-04-29
Examiner: Decady, Albert (Department: 2133)
U.S. Class: 711/114 (Error detection/correction and fault detection/recovery – Pulse or data error handling – Data formatting to improve error detection correction...)
INCORPORATION BY REFERENCE
Incorporated by reference herein are Appendices A, B and C, which are submitted on a compact disc and contain computer program listings. The compact disc contains the following files:
Name of file: ApndxA.txt; date of creation: Nov. 4, 2002; size: 13 Kbytes;
Name of file: ApndxB.txt; date of creation: Nov. 15, 2002; size: 18 Kbytes; and
Name of file: ApndxC.txt; date of creation: Nov. 18, 2002; size: 22 Kbytes.
BACKGROUND OF THE INVENTION
1. Field of the Invention
The present invention relates generally to data redundancy methods and apparatus. Various aspects relate more particularly to redundancy data generation, data restoration, data storage, redundancy adjustability, data communication, computer network operations, and code discovery techniques.
2. Description of the Related Art
With the explosive growth of the Internet and of mission-critical applications, the importance of preserving data integrity and ensuring 24×7 continuous access to critical information cannot be overstated. Information is now recognized as a key organizational asset, essential to an organization's operation and market competitiveness. Access to critical information on a continuous basis is a mandatory requirement for survival in the business world. Critical applications involving military operations, communications, audio-visual systems, medical diagnoses, ISPs (Internet Service Providers) and Web sites, or financial activities, for example, depend upon the continuous availability of essential data.
Downtime is extremely costly. Customers, vendors, employees, and prospects can no longer conduct essential business or critical operations. There is also a “lost opportunity” cost to storage failures: business lost to competitors. Well-documented studies place the cost of downtime in the tens of thousands (or even millions) of dollars per hour.
The need for large amounts of reliable online storage is fueling demand for fault-tolerant technology. According to International Data Corporation, the market for disk storage systems grew by 12 percent in 1998, topping $27 billion. More telling than that figure, however, is the growth in capacity shipped, which rose 103 percent in 1998. Much of this explosive growth can be attributed to the space-eating demands of endeavors such as year 2000 testing, installation of data-heavy enterprise resource planning applications and the deployment of widespread Internet access.
Disk drive manufacturers publish Mean Time Between Failure (MTBF) figures as high as 800,000 hours (91 years). On examination, however, such claims prove largely unrealistic: the practical life of a disk drive is 5 to 7 years of continuous use. Many Information Technology managers are aware that disk drives fail with great frequency, which is the most likely reason why companies place such emphasis on periodic storage backup, and why there is such a large market for tape systems.
The industry answer to help satisfy these needs has been the use of conventional RAID (“Redundant Arrays of Inexpensive Disks”) storage. In general, RAID storage reduces the risk of data loss by either replicating critical information on separate disk drives, or spreading it over several drives with a means of reconstructing information if a single drive is lost.
There are basically four elements of RAID: 1) mirroring data (i.e., creating an exact copy every time information is written to storage), 2) performing checksum calculations (parity data), 3) striping information in equal-sized pieces across multiple drives, and 4) keeping a standby hot spare should one drive fail. Some configurations combine several of these elements. RAID storage systems are usually designed with redundant power supplies and the ability to swap out failed drives, power supplies and fans while the system continues to operate. Sophisticated RAID systems even contain redundant controllers to share the workload and provide automatic fail-over capabilities should one malfunction.
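As an illustration of the parity element, the following minimal Python sketch (an assumed illustration for this discussion, not the patent's own encoding) computes XOR parity across equal-sized data stripes and reconstructs a single lost stripe from the survivors:

    from functools import reduce

    def xor_blocks(blocks):
        # Byte-wise XOR of equal-length blocks.
        return bytes(reduce(lambda a, b: a ^ b, col) for col in zip(*blocks))

    def make_parity(stripes):
        # The parity stripe is the XOR of all data stripes.
        return xor_blocks(stripes)

    def reconstruct(surviving_stripes, parity):
        # XOR-ing the survivors with the parity recovers the one lost stripe.
        return xor_blocks(surviving_stripes + [parity])

    # Example: three data stripes on three drives, parity on a fourth.
    stripes = [b"AAAA", b"BBBB", b"CCCC"]
    parity = make_parity(stripes)
    rebuilt = reconstruct([stripes[0], stripes[2]], parity)  # drive 2 fails
    assert rebuilt == stripes[1]

A single-parity scheme of this kind tolerates exactly one lost drive per stripe group, which is why the multiple-failure scenarios discussed below defeat it.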
Conventional RAID storage configurations have proven to be the best hedge against the possibility of a single drive failure within an array. If more than one drive in a RAID array fails, however, or a service person accidentally removes the wrong drive when attempting to replace a failed drive, the entire RAID storage system becomes inoperable. And the likelihood of multiple drive failures in large disk arrays is significant. The resultant cost of inaccessibility to mission-critical information can be devastating in terms of lost opportunity, lost productivity and lost customers.
Accidents can contribute to multiple drive failures in RAID storage. Service personnel have been known to remove the wrong drive during a replacement operation, crashing an entire RAID storage system. In poorly engineered RAID systems, replacing a failed drive can sometimes create a power glitch, damaging other drives. General data center administrative and service operations also present opportunities for personnel to inadvertently disable a drive.
It is well-known that the likelihood of a drive failure increases as more drives are added to a RAID disk storage system. The larger the RAID storage system (i.e., the more disk drives it has), the greater the chance that two or more drives could become inoperable at one time. Here, the term “time” means the duration from the instant a drive fails until it is replaced and its data and parity information are rebuilt. In remote locations, during holidays, or even during graveyard shifts, the “time” to drive recovery could be several hours. Thus, multiple drive failures do not have to occur at exactly the same instant to have a devastating effect on mission-critical storage.
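A rough back-of-envelope sketch of this window of vulnerability (an assumed model of independent drive failures, not taken from the patent) estimates the chance that a second of the n−1 surviving drives fails before the first is replaced and rebuilt:

    def p_second_failure(n_drives, mtbf_days, window_days):
        # Approximate probability that a second drive fails during the
        # repair window; valid when window_days << mtbf_days.
        return (n_drives - 1) * window_days / mtbf_days

    # 20-drive array, 5-year (1800-day) per-drive MTBF, 1-day repair window.
    print(p_second_failure(20, 5 * 360, 1))   # ~0.0106 per incident

Under these assumptions, roughly one drive-failure incident in a hundred escalates to a double failure, and the figure grows with both array size and repair time.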
Given the plausible assumptions that drives fail independently at random times with a certain MTBF, and that they stay down for a certain time after failing, the following conclusions may be drawn for large arrays of disks: (1) the frequency of single-drive failures increases linearly with the number of disks n; (2) the frequency of two drives failing together (a second failing before the first is reconstructed) increases as n(n−1), or almost as the square of the number of disks; (3) the frequency of three drives failing together increases as n(n−1)(n−2), or almost as the cube; and so forth.
These multiple failures, though still less frequent than single-disk failures, rapidly become more important as the number of disks in a RAID grows. The following table illustrates the behavior of the one-, two- and three-drive failure MTBFs, given that the single-drive MTBF divided by the downtime is very much greater than the number of drives:
# of Drives    1     2     3     4      5      10     15      20
MTBF           a     a/2   a/3   a/4    a/5    a/10   a/15    a/20
MTB2F          —     b     b/3   b/6    b/10   b/45   b/105   b/190
MTB3F          —     —     c     c/4    c/10   c/120  c/455   c/1140
Here a << b << c are mean time constants for the failure of one disk, the coincident failure of two disks, and the coincident failure of three disks, respectively. If the one-disk MTBF is five 360-day years and the downtime is one day, then a = 5 years, b = 4,500 years, and c = 5,400,000 years. If the MTBF is reduced to 1 year and the downtime increased to two days, then a = 1 year, b = 90 years, and c = 10,800 years.
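The constants and table entries can be checked with a short Python sketch. The closed forms b = m^2/(2d) and c = m^3/(3d^2) are inferred here from the worked numbers in the text (m being the per-drive MTBF and d the downtime, both in days); they are an assumption consistent with those numbers, not formulas quoted from the patent:

    from math import comb

    def failure_constants(mtbf_days, downtime_days):
        m, d = mtbf_days, downtime_days
        a = m                   # mean time between single-drive failures
        b = m**2 / (2 * d)      # mean time between coincident 2-drive failures
        c = m**3 / (3 * d**2)   # mean time between coincident 3-drive failures
        return a, b, c

    # First example from the text: 5-year (360-day years) MTBF, 1-day downtime.
    a, b, c = failure_constants(5 * 360, 1)
    print(a / 360, b / 360, c / 360)   # -> 5.0 4500.0 5400000.0 (years)

    # Table entries for an n-drive array scale as a/n, b/C(n,2), c/C(n,3).
    n = 20
    print(f"a/{n}, b/{comb(n, 2)}, c/{comb(n, 3)}")   # -> a/20, b/190, c/1140

The denominators in the table are simply the binomial coefficients C(n,1), C(n,2), and C(n,3): the number of ways one, two, or three of the n drives can fail together.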
The consequences of a multiple-drive failure can be devastating.
Typically, if more than one drive fails, or a service person accidentally removes the wrong drive when attempting to replace a failed drive, the entire RAID storage system is out of commission. Access to critical information is not possible until the RAID system is reconfigured, tested, and a backup copy restored. Transactions and information written since the last backup may be lost forever.
Thus, the possibility of a multiple-drive failure is very high for mission-critical applications that run continuously, 24 hours a day. Moreover, the larger a RAID storage system, the greater the potential of suffering multiple-drive failures. And the chances increase significantly for remote locations, where the response time to replace a failed drive can extend to several hours or even days.
Inventors: Dickson, Lawrence J.; Land, Kris; Wiencko, Joseph A., Jr.
Examiners: Abraham, Esaw; Decady, Albert
Law firm: Hogan & Harston LLP
Assignee: Inostor Corporation