Method and apparatus for implementing a highly efficient,...

Data processing: database and file management or data structures – Database design – Data structure types


Details

Classification: C707S793000
Type: Reexamination Certificate
Status: active
Patent number: 06560615


FIELD OF THE INVENTION
This invention pertains to backing up data, and more particularly to a technique for speeding up backup operations.
BACKGROUND OF THE INVENTION
The task of quickly and efficiently backing up modern file storage systems is becoming more and more difficult, given the relentless growth in storage capacities. Storage hardware manufacturers now routinely produce compact, affordable rack-mounted systems holding hundreds of gigabytes or even several terabytes. Software products make it equally routine to wrap the entire contents of such a system inside a single volume if so desired, supporting literally billions of individual files while maintaining high availability and fast failure recovery for that volume.
Backup technology has not come close to keeping up with this explosive growth in storage. In particular, all major backup vendors still rely essentially on brute-force searches to discover which files have changed since the last time they were archived. These searches waste an enormous amount of time and effort: statistically, only about 20% of a system's files are likely to have changed on any given day. Worse, each and every file's metadata (data about the file) block must be read into memory and examined to see whether the file needs backup. Not only does this consume massive numbers of I/O operations and processor cycles that could otherwise go toward servicing file system requests from users, but about 80% of the effort is a complete waste of time.
An even bigger problem than the massive inefficiency described above is time. More and more organizations are discovering that they can no longer back up changes made to their data in a 24-hour period—there are too many files to search through, and too few hours in the day.
Normally, a backup agent performs a tree walk over the complete set of files in a volume. Each file encountered whose data or metadata bits (also called the file's archive bits) are turned on is locked and its contents are stored to tape. Once the data is safely on tape, the backup agent turns off the file's archive bits, unlocks the file, and continues its tree walk.
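As an illustration only (the code below is a hypothetical sketch, not text from the patent), the tree walk might look like this in Python on Windows, where the FILE_ATTRIBUTE_ARCHIVE flag plays the role of the archive bits and `tape` is any writable byte stream standing in for the tape device:

    import ctypes
    import os
    import stat

    def incremental_backup(volume_root, tape):
        # Walk every file in the volume; cost is proportional to the total
        # number of files, not to the number of changed files.
        for dirpath, _subdirs, filenames in os.walk(volume_root):
            for name in filenames:
                path = os.path.join(dirpath, name)
                attrs = os.stat(path).st_file_attributes  # Windows-only field
                if not attrs & stat.FILE_ATTRIBUTE_ARCHIVE:
                    continue  # unmodified file: inspected, but wasted effort
                with open(path, "rb") as f:  # lock (simplified here) and copy
                    tape.write(f.read())
                # Turn the archive bit back off via the Win32 API.
                ctypes.windll.kernel32.SetFileAttributesW(
                    path, attrs & ~stat.FILE_ATTRIBUTE_ARCHIVE)

Note that even the unchanged files still cost a directory traversal and an os.stat call apiece, which is exactly the wasted effort described above.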
As system administrators and users have discovered, this “normal” approach to incremental backups has critical problems. The worst of these is that the time required to walk a volume's file tree is proportional to the number of files present. This number can easily be in the billions in modern systems, and even the fastest processors and disk I/O systems will not be able to inspect all of a volume's files in any given 24-hour period. As if that were not enough, arbitrarily long “quiet” periods are possible during which the backup agent encounters nothing but unmodified files. The tape backup system cannot be kept busy during these potentially numerous and extended periods of time.
In fact, shoe-shining can occur during these quiet times due to the physical characteristics of tape drives. When the system runs out of data to write, the tape must be slowed and stopped, then rewound to a point significantly before the point of last write so that the tape can be brought up to streaming speed upon (eventual) receipt of the next write buffer. This back-and-forth motion over the tape heads reminds people of how shoes are buffed and polished. Shoe-shining only serves to wear down the tape head, strip oxide from the medium, and significantly reduce the overall backup throughput.
One alternative to the “normal” approach is to utilize a Change Journal, as described for the Microsoft® Windows® 2000 operating system. (“Microsoft” and “Windows” are registered trademarks of Microsoft Corporation in the United States and/or other countries.) In the article “Keeping an Eye on Your NTFS Drives: The Windows 2000 Change Journal Explained,” published in the Microsoft Systems Journal, September 1999, Jeffrey Cooperstein and Jeffrey Richter say that the Windows® 2000 Change Journal is “ . . . a database that contains a list of every change made to the files or directories on an NTFS 5.0 volume. Each volume has its own Change Journal database that contains records reflecting the changes occurring to that volume's files and directories.”
The Change Journal is implemented as a single, system-protected-and-maintained, fixed-maximum-size sparse file. Each time a file is changed in some way, an entry is appended to this special file. Change Journal entries include a 64-bit Update Sequence Number (USN), the file's character string name, the time of the change, and the type of change that was made. Entries cannot span file blocks (typically 4K bytes), so some wasted space is possible per block. Entries are supposed to average about 100 bytes, but can be significantly larger if changed files have long pathnames. There may be multiple entries for any given file, as each change to the file is appended to the Change Journal as a separate record. Each change to a file requires not only that a distinct entry be added to the Change Journal, but that the file's entry in the volume's Master File Table (MFT) be persistently updated with that new entry's USN. Change Journals are disabled by default on Windows® NT 5.0 volumes. All applications have equal access to a volume's Change Journal, and any one of them may at any time enable or disable it. All records in the Change Journal are deleted each and every time it is disabled.
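To make the record layout concrete, the following hypothetical Python sketch models an entry with the fields described above, along with the rule that entries may not span 4K-byte blocks; it is an illustration, not Microsoft's actual on-disk format:

    import struct
    from dataclasses import dataclass

    BLOCK = 4096  # entries may not span file blocks of this size

    @dataclass
    class JournalEntry:
        usn: int        # 64-bit Update Sequence Number
        timestamp: int  # time of the change
        reason: int     # bitmask encoding the type of change
        path: str       # the file's character-string name

        def pack(self):
            name = self.path.encode("utf-16-le")
            header = struct.pack("<qqII", self.usn, self.timestamp,
                                 self.reason, len(name))
            return header + name  # 24-byte header + name: ~100 bytes typical

    def append_entry(journal, entry):
        record = entry.pack()
        used = journal.tell() % BLOCK
        if used + len(record) > BLOCK:             # record would span blocks,
            journal.write(b"\0" * (BLOCK - used))  # so pad and waste the tail
        journal.write(record)

A file with a long pathname inflates its records well past the 100-byte average, and every change appends yet another record, which is the storage overhead criticized below.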
The Change Journal has several limitations. First, it is not guaranteed to be accurate (or even available) at any given point in time. Since it can be disabled at any time by any application (causing all its records to be purged), it cannot be relied upon for mission-critical applications such as backup. Second, enumerating all changed files requires a full scan through the Change Journal, in which every changed file may contribute large numbers of entries. If only some of the entries in the Change Journal are to be used to back up files, processing time and memory must be wasted skipping over the irrelevant entries. Third, with a (conservative) estimate of 100 bytes per entry, memory and persistent storage overhead will be high. This problem is compounded by the fact that a single file may generate multiple entries, further lengthening the Change Journal. Fourth, each and every addition of a Change Journal record for a file requires that the file's entry in the Master File Table (MFT) be atomically and persistently updated (i.e., updated as a single transaction and retained even if the system should fail). Atomic transactions should be avoided as much as possible, yet the Change Journal requires one for every entry, regardless of how many entries a single file generates. Finally, the Change Journal's representation of file changes requires a large amount of memory.
U.S. Pat. No. 5,684,991 to Malcolm, issued Nov. 4, 1997, titled “Modification Metadata Set, Abstracted from Database Write Requests,” describes another approach to speed up backup operations. According to Malcolm, whenever a write command is issued to write data to storage, a data set is added to a database identifying the subset of the file that was written. Where multiple data sets relate to the same area of a file, all but the most recent can be discarded. Where multiple data sets relate to contiguous areas of a file, they can be merged into a single data set. The database can then be used to simplify backup operations.
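Malcolm's discard-and-merge rules amount to keeping a set of non-overlapping modified extents per file. A brief, hypothetical Python sketch of that bookkeeping:

    def record_write(extents, offset, length):
        # extents: sorted list of (start, end) half-open byte ranges already
        # recorded for one file. A new write absorbs every range it overlaps
        # or touches, so stale data sets for the same area are discarded and
        # contiguous areas are merged into a single data set.
        new_start, new_end = offset, offset + length
        kept = []
        for start, end in extents:
            if end < new_start or start > new_end:   # disjoint: keep as-is
                kept.append((start, end))
            else:                                    # overlapping/contiguous
                new_start = min(start, new_start)
                new_end = max(end, new_end)
        kept.append((new_start, new_end))
        kept.sort()
        return kept

For example, record_write([(0, 4096)], 4096, 4096) yields the single extent [(0, 8192)], and a later write falling entirely inside that range changes nothing.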
But the focus of Malcolm is to speed up backups by archiving only those parts of a file that have changed since the last backup operation. First, while Malcolm may speed up backup operations, recovering an archived file will generally be slower: to recreate a file from tape, each separate archive operation must be scanned to determine whether any portion of the file is saved on that tape. Conceivably, recreating a file could require reading a segment from every tape. Second, Malcolm specifies no structure for the database that could improve performance. Without a structure specifically designed for performance, searching and maintaining the database can itself become a bottleneck.
