Error detection/correction and fault detection/recovery – Data processing system error or fault handling – Reliability and availability
Reexamination Certificate
1998-09-14
2001-02-27
Wright, Norman M. (Department: 2785)
C714S048000, C714S042000, C714S043000, C714S718000
active
06195767
ABSTRACT:
BACKGROUND
1. The Field of the Invention
This invention relates to the detection of corruption occurring in data written to storage media relying on a defective Floppy Diskette Controller (“FDC”), where an undetected data error causes data corruption and, more particularly, to novel systems and methods for inspection and warning to enable prompt restoration of data corrupted by defective FDCs.
2. The Background Art
Computers are now used to perform functions and maintain data critical to many organizations. Businesses use computers to maintain essential financial and other business data. Computers are also used by government to monitor, regulate, and even activate national defense systems. Maintaining the integrity of the stored data is essential to the proper functioning of these computer systems, and data corruption can have serious (even life-threatening) consequences.
Most of these computer systems include diskette drives for storing and retrieving data on floppy diskettes. For example, an employee of a large financial institution might have a personal computer that is attached to the main system. In order to avoid processing delays on the mainframe, the employee may routinely transfer data files from the host system to his local personal computer and then back again, temporarily storing data on a local floppy diskette. Similarly, an employee with a personal computer at home may occasionally decide to take work home, transporting data away from and back to the office on a floppy diskette.
Data transfer to and from a floppy diskette is controlled by a device called a Floppy Diskette Controller (“FDC”). The FDC is responsible for interfacing the computer's Central Processing Unit (“CPU”) with the physical diskette drive. Significantly, since the diskette is spinning, it is necessary for the FDC to provide data to the diskette drive at a specified data rate. Otherwise, the data will be written to the wrong location on the diskette.
The design of the FDC accounts for situations when the data rate is not adequate to support the rotating diskette. Whenever this situation occurs, the FDC aborts the operation and signals the CPU that a data underrun condition has occurred. Unfortunately, however, it has been found that a design flaw in many FDCs makes it impossible to detect all data underrun conditions. This flaw has, for example, been found in the NEC 765, INTEL 8272 and compatible Floppy Diskette Controllers. Specifically, data loss and/or data corruption can occur during data transfers to or from diskettes (or even tape drives and other media which employ the FDC), whenever the last data byte of a sector being transferred is delayed for more than a few microseconds. Furthermore, if the last byte of a sector write operation is delayed too long then the next (physically adjacent) sector of the diskette will be destroyed as well.
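To put the timing constraint in perspective, the following minimal sketch computes how long one byte occupies the data stream at a given data rate. The two data rates are commonly cited values for MFM diskette formats and are assumptions, not figures taken from this document; the actual tolerance of a given FDC for a late byte is smaller than the full byte period.

```python
# Illustrative only: these data rates are assumed, commonly cited values
# for MFM diskette formats, not figures stated in this document.
ASSUMED_DATA_RATES_BPS = {
    "double density": 250_000,
    "high density": 500_000,
}


def byte_window_us(bits_per_second: int) -> float:
    """Return the duration of one byte on the medium, in microseconds."""
    return 8_000_000 / bits_per_second


for name, rate in ASSUMED_DATA_RATES_BPS.items():
    print(f"{name}: {byte_window_us(rate):.0f} us per byte")
```

Under these assumptions, a byte lasts only a few tens of microseconds, so a delay of a few microseconds can consume a large share of the available window.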
For example, it has been found that these FDCs cannot detect a data underrun on the last byte of a diskette read or write operation. Consequently, if the FDC is preempted during a data transfer to the diskette (thereby delaying the transfer), and an underrun occurs on the last byte of a sector, the following occurs: (1) the underrun flag does not get set, (2) the last byte written to the diskette is made equal to the previous byte written, and (3) Cyclic Redundancy Check (“CRC”) is generated on the altered data. The result is that incorrect data is written to the diskette and validated by the FDC.
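The three effects listed above can be illustrated with a small software model. This is a hedged sketch, not the controller's actual logic: the function `defective_fdc_write` is hypothetical, and the CRC routine (CRC-CCITT with an initial value of 0xFFFF) is an assumption chosen only to show that a checksum computed over altered data still validates.

```python
import binascii


def crc16(data: bytes) -> int:
    # CRC-CCITT (polynomial 0x1021) from the standard library. The initial
    # value 0xFFFF is an assumption made for illustration.
    return binascii.crc_hqx(data, 0xFFFF)


def defective_fdc_write(sector: bytes, last_byte_late: bool):
    """Hypothetical model of the flawed write path.

    Returns (stored_data, stored_crc, underrun_flag).
    """
    stored = bytearray(sector)
    if last_byte_late and len(stored) >= 2:
        # (2) The last byte written repeats the previous byte.
        stored[-1] = stored[-2]
    # (3) The CRC is generated over the altered data.
    stored_crc = crc16(bytes(stored))
    # (1) The underrun flag is not set for the last byte.
    underrun_flag = False
    return bytes(stored), stored_crc, underrun_flag


original = bytes(range(256)) * 2  # one 512-byte sector
stored, stored_crc, flag = defective_fdc_write(original, last_byte_late=True)

crc_ok_on_readback = crc16(stored) == stored_crc  # the CRC check passes
data_matches = stored == original                 # the data is still wrong
print(flag, crc_ok_on_readback, data_matches)     # False True False
```

Because the CRC is consistent with the corrupted sector, only a comparison against the source data reveals the error. If the last two bytes of the original sector happen to be equal, the model produces no change at all.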
Conditions under which this problem may occur can be identified by simply identifying those conditions that can delay data transfer to or from the diskette drive. In general, this requires that the computer system be engaged in “multi-tasking” operation or in overlapped input/output (“I/O”) operation. Multi-tasking is the ability of a computer operating system to simulate the concurrent execution of multiple tasks. Importantly, concurrent execution is only “simulated” because there is usually only one CPU in today's personal computers, and it can only process one task at a time. Therefore, a system interrupt is used to rapidly switch between the multiple tasks, giving the overall appearance of concurrent execution.
MS-DOS and PC-DOS, for example, are single-task operating systems. Therefore, one could argue that the problem described above would not occur. However, there are a number of standard MS-DOS and PC-DOS operating environments that simulate multi-tasking and are susceptible to the problem. The following environments, for example, have been found to be prime candidates for data loss and/or data corruption due to defective FDCs: local area networks, 327x host connections, high density diskettes, control print screen operations, and terminate and stay resident (“TSR”) programs. The problem has also been found to occur as a result of virtually any interrupt service routine. Thus, unless the MS-DOS and PC-DOS operating systems disable all interrupts during diskette transfers, they are also susceptible to data loss and/or corruption.
The UNIX operating system is a multi-tasking operating system, and it is extremely simple to create a situation that can cause the problem within UNIX. One of the simpler examples is to begin a large transfer to the diskette and place that task in the background. After the transfer has begun, begin processing the contents of a very large file in a way that requires the use of a higher-priority Direct Memory Access (“DMA”) channel than the floppy diskette controller's DMA channel, e.g., video updates or multi-media activity. Video access forces the video buffer memory refresh logic on DMA channel 1, along with the video memory access, which preempts the FDC operations from occurring on DMA channel 2 (which is lower priority than DMA channel 1). This type of example creates the classic overlapped I/O environment and can force the FDC into an undetectable error condition. More rigorous examples could include the concurrent transfer of data to or from a network or tape drive using a high-priority DMA channel while the diskette transfer is active. Clearly, the number of possible error-producing scenarios in this environment is effectively unlimited, and such scenarios are quite likely to arise.
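The preemption described above can be sketched as a fixed-priority arbiter. This is a simplified model under stated assumptions: a lower channel number means higher priority (matching the ordering of channel 1 over channel 2 given above), time is counted in arbitrary cycles, and the names `arbitrate` and `fdc_wait_cycles` are hypothetical.

```python
def arbitrate(requests):
    """Grant the bus to the lowest-numbered requesting channel.

    Assumes a fixed-priority scheme in which a lower channel number
    means higher priority.
    """
    return min(requests) if requests else None


def fdc_wait_cycles(video_busy_cycles: int) -> int:
    """Count the cycles the FDC (channel 2) waits for a bus grant."""
    waited = 0
    cycle = 0
    while True:
        requests = {2}          # the FDC request stays pending
        if cycle < video_busy_cycles:
            requests.add(1)     # hypothetical video activity on channel 1
        if arbitrate(requests) == 2:
            return waited
        waited += 1
        cycle += 1


print(fdc_wait_cycles(0))   # 0: no competing traffic, FDC is served at once
print(fdc_wait_cycles(5))   # 5: FDC waits as long as channel 1 keeps requesting
```

In this model the FDC's wait grows directly with the duration of the higher-priority activity, which is how a pending last byte can be delayed past its deadline.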
For all practical purposes, the OS/2 and newer Windows operating systems can be regarded as UNIX derivatives. In other words, they suffer from the same problems that UNIX does. There are, however, two significant differences between these operating systems and UNIX. First, they both semaphore video updates with diskette operations in an effort to avoid forcing the FDC problem to occur. However, any direct access to the video buffer, in either real or protected mode, during a diskette transfer will bypass this safeguard and result in the same condition as under UNIX. Second, OS/2 incorporates a unique command that attempts to avoid the FDC problem by reading back every sector that is written to the floppy diskette in order to verify that the operation completed successfully. This command is an extension to the MODE command (MODE DSKT VER=ON). With these changes, data loss and/or data corruption should occur less frequently than before, but it is still possible for the FDC problem to destroy data that is not related to the current sector operation.
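The limitation of read-back verification can be illustrated with a small simulation. This is a sketch under stated assumptions, not the OS/2 implementation: `SimulatedDiskette` and `write_with_verify` are hypothetical names, and the model deliberately isolates the adjacent-sector effect by leaving the written sector itself intact.

```python
SECTOR_SIZE = 512


class SimulatedDiskette:
    """Hypothetical in-memory diskette; not a real driver interface."""

    def __init__(self, sectors: int):
        self.sectors = [bytes(SECTOR_SIZE) for _ in range(sectors)]

    def write(self, index: int, data: bytes, long_delay: bool = False):
        self.sectors[index] = data
        if long_delay and index + 1 < len(self.sectors):
            # Models a long delay destroying the physically adjacent sector.
            self.sectors[index + 1] = b"\xff" * SECTOR_SIZE

    def read(self, index: int) -> bytes:
        return self.sectors[index]


def write_with_verify(disk, index, data, **kwargs) -> bool:
    """Write one sector, read it back, and compare with the source data."""
    disk.write(index, data, **kwargs)
    return disk.read(index) == data


disk = SimulatedDiskette(sectors=4)
neighbor = b"\x11" * SECTOR_SIZE
disk.write(1, neighbor)

ok = write_with_verify(disk, 0, b"\x22" * SECTOR_SIZE, long_delay=True)
neighbor_intact = disk.read(1) == neighbor
print(ok, neighbor_intact)  # True False
```

The verification of sector 0 succeeds even though sector 1 has been overwritten, because the read-back only covers the sector that was just written.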
There are a host of other operating systems that are susceptible to the FDC problem just like DOS, Windows, Windows 95, Windows NT, OS/2, and UNIX. However, these systems may not have an installed base as large as DOS, Windows, OS/2 or UNIX, and there may, therefore, be little emphasis on addressing the problem. Significantly, as long as the operating systems utilize the FDC and service system interrupts, the problem can manifest itself. This can, of course, occur in computer systems which use virtually any operating system.
Some in the computer industry have suggested that the FDC problem is extremely rare and difficult to reproduce. This is similar to the argument presented during the 1994 defective INTEL Pentium scenario. Error rates for the defective Pentium ranged from microseconds to tens-of-thousand
Pate Pierce & Baird