Data processing: database and file management or data structures – Database design – Data structure types
Reexamination Certificate
2001-09-11
2004-04-13
Breene, John (Department: 2177)
C707S793000, C714S015000, C714S020000
active
06721764
ABSTRACT:
BACKGROUND OF THE INVENTION
1. Field of the Invention
The present invention is related to the field of methods and apparatus for maintaining a consistent file system and for creating read-only copies of the file system.
2. Background Art
All file systems must maintain consistency in spite of system failure. A number of different consistency techniques have been used in the prior art for this purpose.
One of the most difficult and time-consuming issues in managing any file server is making backups of file data. Traditional solutions have been to copy the data to tape or other off-line media. With some file systems, the file server must be taken off-line during the backup process to ensure that the backup is completely consistent. A recent advance in backup is the ability to quickly “clone” a file system (i.e., a prior art method for creating a read-only copy of the file system on disk) and perform a backup from the clone instead of from the active file system. This type of file system allows the file server to remain on-line during the backup.
File System Consistency
A prior art file system is disclosed by Chutani, et al. in an article entitled “The Episode File System”, USENIX, Winter 1992, at pages 43-59. The article describes the Episode file system, which is a file system using meta-data (i.e., inode tables, directories, bitmaps, and indirect blocks). It can be used as a stand-alone or as a distributed file system. Episode supports a plurality of separate file system hierarchies and refers to them collectively as an “aggregate”. In particular, Episode provides a clone of each file system for slowly changing data.
In Episode, each logical file system contains an “anode” table. An anode table is the equivalent of an inode table used in file systems such as the Berkeley Fast File System. It is a 252-byte structure. Anodes are used to store all user data as well as meta-data in the Episode file system. An anode describes the root directory of a file system, including auxiliary files and directories. Each such file system in Episode is referred to as a “fileset”. All data within a fileset is locatable by iterating through the anode table and processing each file in turn. Episode creates a read-only copy of a file system, herein referred to as a “clone”, and shares data with the active file system using Copy-On-Write (COW) techniques.
Episode uses a logging technique to recover a file system after a system crash. Logging ensures that the file system meta-data are consistent. A bitmap table contains information about whether each block in the file system is allocated or not. The bitmap table also indicates whether or not each block is logged. All meta-data updates are recorded in a log “container” that stores the transaction log of the aggregate. The log is processed as a circular buffer of disk blocks. The transaction logging of Episode uses logging techniques originally developed for databases to ensure file system consistency. This technique uses carefully ordered writes and a recovery program that is supplemented by database techniques.
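Episode's actual log format is not described here, so the following Python sketch only illustrates the general redo-logging idea in the paragraph above: meta-data updates are appended to a fixed-size circular log, and after a crash a recovery pass reapplies the updates of transactions whose commit record reached the log. The record layout, the commit-marker scheme, and every name in the sketch (`CircularLog`, `log_transaction`, `recover`, the `bitmap[7]` style keys) are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

@dataclass
class LogRecord:
    txn_id: int
    key: Optional[str] = None    # meta-data item updated (hypothetical naming)
    value: Optional[int] = None  # new value of that item
    commit: bool = False         # True for a transaction's commit marker

class CircularLog:
    """Fixed number of slots reused in a ring, standing in for a circular buffer of disk blocks.

    Assumes a slot is overwritten only after the updates it records have reached
    their home locations on disk, as a real log must guarantee.
    """

    def __init__(self, capacity: int) -> None:
        self.slots: List[Optional[LogRecord]] = [None] * capacity
        self.head = 0  # next slot to write; wraps to 0 at the end

    def append(self, record: LogRecord) -> None:
        self.slots[self.head] = record
        self.head = (self.head + 1) % len(self.slots)

    def records_in_order(self) -> List[LogRecord]:
        # Once the ring has wrapped, the oldest surviving record sits at head.
        ordered = self.slots[self.head:] + self.slots[:self.head]
        return [r for r in ordered if r is not None]

def log_transaction(log: CircularLog, txn_id: int, updates: Dict[str, int],
                    crash_before_commit: bool = False) -> None:
    """Record each meta-data update, then a commit marker (unless simulating a crash)."""
    for key, value in updates.items():
        log.append(LogRecord(txn_id, key, value))
    if not crash_before_commit:
        log.append(LogRecord(txn_id, commit=True))

def recover(log: CircularLog, metadata: Dict[str, int]) -> Dict[str, int]:
    """Redo the updates of committed transactions; uncommitted ones are ignored."""
    records = log.records_in_order()
    committed = {r.txn_id for r in records if r.commit}
    state = dict(metadata)
    for r in records:
        if not r.commit and r.txn_id in committed and r.key is not None:
            state[r.key] = r.value
    return state

log = CircularLog(capacity=8)
log_transaction(log, 1, {"bitmap[7]": 1, "anode[3].size": 4096})
log_transaction(log, 2, {"bitmap[8]": 1}, crash_before_commit=True)
print(recover(log, {"bitmap[7]": 0, "bitmap[8]": 0, "anode[3].size": 0}))
# {'bitmap[7]': 1, 'bitmap[8]': 0, 'anode[3].size': 4096}
```

Recovery time in this model depends on the amount of log to replay rather than on the size of the file system, which is the property that makes logging faster to recover than a full scan; it still requires a recovery pass, as the next paragraph notes.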
Other prior art systems, including JFS of IBM and VxFS of Veritas Corporation, use various forms of transaction logging to speed the recovery process, but still require a recovery process.
Another prior art method is called the “ordered write” technique. It writes all disk blocks in a carefully determined order so that damage is minimized when a system failure occurs while performing a series of related writes. This technique attempts to ensure that any inconsistencies that occur are harmless, for instance, a few unused blocks or inodes being marked as allocated. The primary disadvantage of this technique is that the restrictions it places on disk write order make it hard to achieve high performance.
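The toy Python model below sketches why write order matters. It assumes a hypothetical three-write sequence for adding a block to a file (mark the block allocated in the bitmap, write the data, then link the block into the inode) and classifies the on-disk state after a crash following each prefix of the sequence. The `ToyDisk` structure and step names are illustrative assumptions, not taken from any particular file system.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Set

@dataclass
class ToyDisk:
    allocated: Set[int] = field(default_factory=set)      # bitmap: blocks marked in use
    data: Dict[int, bytes] = field(default_factory=dict)  # block contents
    inode_ptrs: List[int] = field(default_factory=list)   # one inode's block pointers

def mark_allocated(disk: ToyDisk, blk: int) -> None:
    disk.allocated.add(blk)

def write_data(disk: ToyDisk, blk: int) -> None:
    disk.data[blk] = b"payload"

def link_into_inode(disk: ToyDisk, blk: int) -> None:
    disk.inode_ptrs.append(blk)

Step = Callable[[ToyDisk, int], None]
ORDERED: List[Step] = [mark_allocated, write_data, link_into_inode]
UNORDERED: List[Step] = [link_into_inode, write_data, mark_allocated]

def classify(disk: ToyDisk) -> str:
    referenced = set(disk.inode_ptrs)
    if referenced - disk.allocated:
        return "harmful"   # inode points at a block the bitmap still calls free
    if disk.allocated - referenced:
        return "harmless"  # leaked block: marked allocated but unreferenced
    return "consistent"

def crash_outcomes(order: List[Step], blk: int = 5) -> List[str]:
    """Classify the on-disk state after a crash following each prefix of the write order."""
    outcomes = []
    for n in range(len(order) + 1):
        disk = ToyDisk()
        for step in order[:n]:
            step(disk, blk)
        outcomes.append(classify(disk))
    return outcomes

print(crash_outcomes(ORDERED))    # ['consistent', 'harmless', 'harmless', 'consistent']
print(crash_outcomes(UNORDERED))  # ['consistent', 'harmful', 'harmful', 'consistent']
```

With the careful order, the worst crash leaves only a leaked block; with the reversed order, an inode can reference a block the bitmap considers free, which could later be handed out to another file. Enforcing such orders is what constrains write scheduling and costs performance.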
Yet another prior art method, an elaboration of the ordered write technique, is referred to as the “ordered write with recovery” technique. In this method, inconsistencies can be potentially harmful. However, the order of writes is restricted so that inconsistencies can be found and fixed by a recovery program. Examples of this method include the original UNIX file system and the Berkeley Fast File System (FFS). This technique does not relax the disk ordering constraints enough to eliminate their performance penalty. Another disadvantage is that the recovery process is time consuming: its duration is typically proportional to the size of the file system. For example, recovering a 5 GB FFS file system requires an hour or more.
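As a minimal sketch of why such recovery scales with file system size, the Python function below performs the kind of whole-volume consistency scan a recovery program runs: it visits every inode and block pointer, rebuilds the allocation bitmap, and compares it with the on-disk bitmap. The layout (inode number mapped to a list of block numbers, a set-based bitmap) and the `recovery_scan` name are hypothetical simplifications; real recovery programs check far more.

```python
from typing import Dict, List, NamedTuple, Set

class ScanResult(NamedTuple):
    bitmap: Set[int]    # repaired allocation bitmap
    leaked: Set[int]    # marked allocated but referenced by no inode; freed
    unmarked: Set[int]  # referenced but marked free; now marked allocated
    work: int           # inodes + block pointers + bitmap entries visited

def recovery_scan(inodes: Dict[int, List[int]], bitmap: Set[int], num_blocks: int) -> ScanResult:
    referenced: Set[int] = set()
    work = 0
    for ptrs in inodes.values():  # visit every inode...
        work += 1
        for blk in ptrs:          # ...and every block pointer it holds
            work += 1
            referenced.add(blk)
    work += num_blocks            # compare against every bitmap entry
    return ScanResult(
        bitmap=set(referenced),
        leaked=bitmap - referenced,
        unmarked=referenced - bitmap,
        work=work,
    )

result = recovery_scan({1: [10, 11], 2: [12]}, bitmap={10, 11, 13}, num_blocks=16)
print(sorted(result.leaked), sorted(result.unmarked), result.work)  # [13] [12] 21
```

The `work` count grows linearly with the number of inodes, pointers, and blocks, so a file system ten times larger takes roughly ten times longer to check, regardless of how little was being written when the crash occurred.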
File System Clones
FIG. 1 is a prior art diagram for the Episode file system illustrating the use of copy-on-write (COW) techniques for creating a fileset clone. Anode 110 comprises a first pointer 110A having a COW bit that is set. Pointer 110A references data block 114 directly. Anode 110 comprises a second pointer 110B having a COW bit that is cleared. Pointer 110B of anode 110 references indirect block 112. Indirect block 112 comprises a pointer 112A that references data block 124 directly. The COW bit of pointer 112A is set. Indirect block 112 comprises a second pointer 112B that references data block 126. The COW bit of pointer 112B is cleared.
A clone anode 120 comprises a first pointer 120A that references data block 114. The COW bit of pointer 120A is cleared. The second pointer 120B of clone anode 120 references indirect block 122. The COW bit of pointer 120B is cleared. In turn, indirect block 122 comprises a pointer 122A that references data block 124. The COW bit of pointer 122A is cleared.
As illustrated in FIG. 1, every direct pointer (110A, 112A-112B, 120A, and 122A) and every indirect pointer (110B and 120B) in the Episode file system contains a COW bit. Blocks that have not been modified since the clone was created are contained in both the active file system and the clone, and have set (1) COW bits. The COW bit is cleared (0) when the block referenced by the pointer has been modified and, therefore, is part of the active file system but not the clone.
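To make the pointer relationships of FIG. 1 easier to follow, the Python sketch below encodes the pointers as the text describes them and derives which data blocks the active fileset and the clone share. The `Ptr` tuple and the dictionary layout are illustrative choices, not Episode's on-disk format.

```python
from typing import Dict, List, NamedTuple, Set

class Ptr(NamedTuple):
    name: str    # reference numeral of the pointer in FIG. 1
    target: int  # anode, indirect block, or data block it references
    cow: bool    # state of its COW bit as described for FIG. 1

# Pointers from FIG. 1, keyed by the anode or indirect block that holds them.
FIG1: Dict[int, List[Ptr]] = {
    110: [Ptr("110A", 114, True), Ptr("110B", 112, False)],   # active anode
    112: [Ptr("112A", 124, True), Ptr("112B", 126, False)],   # active indirect block
    120: [Ptr("120A", 114, False), Ptr("120B", 122, False)],  # clone anode
    122: [Ptr("122A", 124, False)],                           # clone indirect block
}

def data_blocks(root: int) -> Set[int]:
    """Data blocks reachable from an anode, descending through indirect blocks."""
    found: Set[int] = set()
    for ptr in FIG1.get(root, []):
        if ptr.target in FIG1:  # target holds pointers itself: an indirect block
            found |= data_blocks(ptr.target)
        else:
            found.add(ptr.target)
    return found

active, clone = data_blocks(110), data_blocks(120)
shared = active & clone
print(sorted(shared), sorted(active - clone))  # [114, 124] [126]

# In the active tree, a direct pointer's COW bit is set exactly when its block is shared.
for holder in (110, 112):
    for ptr in FIG1[holder]:
        if ptr.target not in FIG1:
            assert ptr.cow == (ptr.target in shared), ptr.name
```

Only the active fileset's pointers carry set COW bits; the clone's pointers (120A, 122A) stay cleared even though they reference shared blocks, which matches the description of how clones are created.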
When a clone is created in Episode, the entire anode table is copied, along with all indirect blocks that the anodes reference. The new copy describes the clone, and the original copy continues to describe the active file system. In the original copy, the COW bits in all pointers are set to indicate that they point to the same data blocks as the clone. Thus, when anode 110 in FIG. 1 was cloned, it was copied to clone anode 120, and indirect block 112 was copied to clone indirect block 122. In addition, the COW bit of pointer 112A was set to indicate that indirect blocks 112 and 122 both point to data block 124. In FIG. 1, data block 124 has not been modified since the clone was created, so it is still referenced by pointers 112A and 122A, and the COW bit in 112A is still set. Data block 126 is not part of the clone, and so pointer 112B, which references it, does not have its COW bit set.
When an Episode clone is created, every anode and every indirect block in the file system must be copied, which consumes many megabytes and takes a significant amount of time to write to disk.
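The clone-creation procedure described above can be sketched in Python as follows: every anode and every indirect block is duplicated, and each original pointer to a data block gets its COW bit set because that block is now shared. The `Volume` and `Pointer` types, the toy allocator starting at block 1000, and the `structures_copied` counter are hypothetical; for simplicity the clone table reuses the original anode numbers as keys, whereas FIG. 1 labels the clone anode 120.

```python
import itertools
from dataclasses import dataclass, field
from typing import Dict, Iterator, List

@dataclass
class Pointer:
    target: int             # block number referenced
    cow: bool = False       # copy-on-write bit
    indirect: bool = False  # True if the target is an indirect block

@dataclass
class Volume:
    anodes: Dict[int, List[Pointer]] = field(default_factory=dict)
    indirect_blocks: Dict[int, List[Pointer]] = field(default_factory=dict)
    next_free: Iterator[int] = field(default_factory=lambda: itertools.count(1000))  # toy allocator
    structures_copied: int = 0  # anodes + indirect blocks duplicated by cloning

def _copy_pointers(vol: Volume, ptrs: List[Pointer]) -> List[Pointer]:
    """Duplicate one anode's or indirect block's pointers for the clone."""
    copies: List[Pointer] = []
    for p in ptrs:
        if p.indirect:
            # Every indirect block is copied, so it is not shared with the clone.
            new_blk = next(vol.next_free)
            vol.indirect_blocks[new_blk] = _copy_pointers(vol, vol.indirect_blocks[p.target])
            vol.structures_copied += 1
            copies.append(Pointer(new_blk, indirect=True))
        else:
            # The data block is now shared: tag the active (original) pointer as COW.
            p.cow = True
            copies.append(Pointer(p.target))
    return copies

def create_clone(vol: Volume) -> Dict[int, List[Pointer]]:
    """Copy the entire anode table; the copies describe the read-only clone."""
    clone: Dict[int, List[Pointer]] = {}
    for anode_id, ptrs in vol.anodes.items():
        clone[anode_id] = _copy_pointers(vol, ptrs)
        vol.structures_copied += 1
    return clone

# Roughly the active side of FIG. 1 before data block 126 was written.
vol = Volume()
vol.indirect_blocks[112] = [Pointer(124)]
vol.anodes[110] = [Pointer(114), Pointer(112, indirect=True)]
clone = create_clone(vol)
print(vol.structures_copied)  # 2: one anode and one indirect block
```

Because `structures_copied` grows with the total number of anodes and indirect blocks, the cost of creating a clone in this model scales with the size of the whole fileset, which is the drawback noted above.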
A fileset “clone” is a read-only copy of an active fileset, wherein the active fileset is readable and writable. Clones are implemented using COW techniques and share data blocks with an active fileset on a block-by-block basis. Episode implements cloning by copying each anode stored in a fileset. When a fileset is first cloned, the writable anode of the active fileset and the cloned anode both point to the same data block(s). However, the disk addresses for direct and indirect blocks in the original anode are tagged as COW. Thus, an update to the writable fileset does not affect the clone. When a COW block is modified, a new block is allocated in the file system and updated with the modification. The COW flag in the pointer to this new block is cleared.
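The write path just described can be sketched in Python for a single direct pointer: if the pointer's COW flag is set, the modification goes to a newly allocated block and the flag is cleared; otherwise the block is updated in place. The `Disk` and `Pointer` types and the toy allocator starting at block 500 are hypothetical, and the sketch omits the case where the indirect block holding the pointer is itself tagged COW.

```python
import itertools
from dataclasses import dataclass, field
from typing import Dict, Iterator

@dataclass
class Pointer:
    target: int        # disk block currently referenced
    cow: bool = False  # set while the block is shared with a clone

@dataclass
class Disk:
    blocks: Dict[int, bytes] = field(default_factory=dict)
    free: Iterator[int] = field(default_factory=lambda: itertools.count(500))  # toy allocator

def write_block(disk: Disk, ptr: Pointer, data: bytes) -> None:
    """Update the block behind an active-fileset pointer without disturbing the clone."""
    if ptr.cow:
        new_blk = next(disk.free)       # shared block: allocate a new one instead of overwriting
        disk.blocks[new_blk] = data
        ptr.target = new_blk            # redirect only the active fileset's pointer
        ptr.cow = False                 # the new block belongs to the active fileset alone
    else:
        disk.blocks[ptr.target] = data  # unshared block: safe to update in place

disk = Disk(blocks={124: b"old"})
active = Pointer(124, cow=True)  # active fileset's pointer after cloning
clone = Pointer(124)             # clone's pointer to the same block
write_block(disk, active, b"new")
print(active, disk.blocks[clone.target])  # Pointer(target=500, cow=False) b'old'
write_block(disk, active, b"newer")       # second write lands in place on block 500
print(disk.blocks[500])                   # b'newer'
```

Only the first write after cloning pays for an allocation; later writes to the same block proceed in place because the COW flag has been cleared.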
The prior art Episode system creates clones that duplicate the entire inode file and all of t
Hitz David
Lau James
Malcolm Michael
Rakitzis Byron
Network Appliance Inc.
Swernofsky Law Group PC