Scalable distributed file system

Data processing: database and file management or data structures – Database design – Data structure types

Reexamination Certificate

active

06173293

ABSTRACT:

FIELD OF THE INVENTION
This invention relates generally to file systems, and more particularly to file systems distributed over multiple computer systems.
BACKGROUND OF THE INVENTION
In modern computer systems, large collections of data are usually organized on disk storage as files. If the number of files is large, then the files may be distributed over multiple computer systems. Users' programs access the files by requesting file services from one or more file systems. The file systems also perform administrative actions such as controlling coherent access by the clients, communicating with physical storage components, maintaining redundant copies, and recovering from failure.
In most file systems, the files comprise user data and metadata. The metadata are all information required to manage the user data, such as names, locations, dates, file sizes, access protection, and so forth. The organization of the user data is usually managed by the client programs.
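The split between user data and metadata can be illustrated with a minimal sketch. The class names, field names, and the `store` helper below are hypothetical and chosen only for illustration; they are not taken from the patent or from any particular file system.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass
class FileMetadata:
    """Information the file system keeps in order to manage the user data."""
    name: str
    location: str        # hypothetical storage address, e.g. "disk0:block42"
    modified: datetime
    size: int            # size of the user data in bytes
    mode: int            # access-protection bits


@dataclass
class StoredFile:
    metadata: FileMetadata
    user_data: bytes     # opaque to the file system; organized by client programs


def store(name: str, location: str, user_data: bytes, mode: int = 0o644) -> StoredFile:
    """Build a file record, deriving the size and date metadata from the data."""
    meta = FileMetadata(
        name=name,
        location=location,
        modified=datetime.now(timezone.utc),
        size=len(user_data),
        mode=mode,
    )
    return StoredFile(metadata=meta, user_data=user_data)
```

In this sketch the file system only interprets `metadata`; the bytes in `user_data` are passed through unchanged, which mirrors the statement that the organization of user data is left to the client programs.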
It is laborious to administer a large distributed file system that serves a large and growing user community. For instance, to store more files, and to serve more users, one must add more disks and more server computers. Each of these components requires human attention. To simplify the distribution of files, groups of files or “volumes” are often manually assigned to particular disks. Then, the files can manually be moved or replicated when components fill up, fail, or become throughput bound.
Joining many thousands of files distributed over many disks into a redundant array of independent disks (RAID) is only a partial solution; administration problems still arise when the system grows so large as to require multiple RAIDs and multiple server processors.
In the prior art, there have been numerous attempts to construct distributed file systems that are scalable. Scalable in this context means that the file system can be adjusted to any desired size without changing the underlying architecture of the system. Some of these prior art file systems are now described to illustrate the need for a better scalable file system.
The Cambridge File Server (CFS), described by Birrell et al. in “A universal file server,” IEEE Transactions on Software Engineering, SE-6(5):450-453, September 1980, takes a two-layered approach to building a distributed file system. There, the layers provide the users with two abstractions: files and indexes. File systems built on the two layers can use these abstractions to implement a distributed file system. As a characteristic, the CFS manages the entire distributed file system from a single server computer. Controlling data flow from a single server is simple, but in situations where a single server cannot handle the task, the CFS falls short. Also, a single server based system is vulnerable to failure.
The Network File System (NFS), as described by Sandberg et al. in “Design and implementation of the Sun network file system,” Proceedings of the Summer USENIX Conference, pages 119-130, June 1985, is not a file system in itself, but rather a remote file access protocol. The NFS protocol provides a weak notion of cache coherence, and its stateless design requires client users to make many unnecessary and frequent accesses to the servers to maintain a marginal level of coherence in the data.
The Andrew File System (AFS), described by Howard et al. in “Scale and performance in a distributed file system,” ACM Transactions on Computer Systems, 6(1):51-81, February 1988, and its offshoot DCE/DFS as described by Kazar et al. in “DEcorum file system architectural overview,” Proceedings of the Summer USENIX Conference, pp. 151-164, June 1990, provide better cache performance and data coherence than NFS. AFS is designed for a different kind of scalability than will be described herein. The AFS has a global name space and security architecture that allows client computers to connect to many separate file servers using a wide area network.
The Echo file system described by Mann et al. in “A coherent distributed file cache with directory write-behind,” ACM Transactions on Computer Systems, 12(2):123-164, May 1994, is log-based. The Echo file system replicates data for reliability, and access paths are allowed to span multiple disks for availability. In addition, the Echo file system provides coherent caching.
However, the Echo file system cannot easily be scaled. There, each volume can only be managed by a single server computer. Failover, in the case of a hardware failure, can only be to a predetermined backup server. A volume can only span as many disks as can be connected to a single server. Although there is an internal layering of file services on top of a disk service, the Echo file system requires both layers to execute in the same address space on the same machine.
The VMS Cluster file system, described by Strecker et al. in “VAXclusters: A closely-coupled distributed system,” ACM Transactions on Computer Systems, 4(2):130-146, May 1986, off-loads file system processing to individual servers that are members of a cluster, i.e., a plurality of closely-coupled computers.
Each server in the cluster executes its own instance of the file system program in conjunction with a shared physical disk. Synchronization is provided by a distributed lock service. The shared physical disk is accessed either through a special-purpose cluster interconnect (CI) to which a disk controller can be directly connected, or through an ordinary local area network (LAN) such as Ethernet, and a processor acting as a disk server.
The Spiralog file system described by Johnson et al. in “Overview of the Spiralog file system,” Digital Technical Journal, 8(2):5-14, 1996, also off-loads processing of its file system to individual members of a cluster of interconnected servers that run above a shared storage system layer.
The interface between layers in the Spiralog file system differs from the VMS cluster file system because the lower layer is neither file-like, nor simply disk-like. Instead, Spiralog provides an array of stably-stored bytes, and permits atomic actions to update arbitrarily scattered sets of bytes within the array. Spiralog's split between layers simplifies the file system, but complicates the storage system considerably. Spiralog does not scale easily, nor does Spiralog tolerate hardware faults readily. A Spiralog volume can only span the disks connected to a single server, and the volume becomes unavailable when the server suffers a failure.
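The idea of a byte array that accepts atomic updates to scattered byte ranges can be sketched as follows. This is an in-memory illustration only: it does not model stable storage, and the class and method names are hypothetical rather than Spiralog's actual interface.

```python
class ByteArrayStore:
    """A fixed-size byte array whose scattered updates are all-or-nothing."""

    def __init__(self, size: int) -> None:
        self._data = bytearray(size)

    def read(self, offset: int, length: int) -> bytes:
        return bytes(self._data[offset:offset + length])

    def atomic_update(self, writes: list[tuple[int, bytes]]) -> None:
        """Apply every (offset, chunk) write, or none of them."""
        # Validate all writes first so a bad one cannot leave a partial update.
        for offset, chunk in writes:
            if offset < 0 or offset + len(chunk) > len(self._data):
                raise ValueError(f"write at offset {offset} is out of range")
        # Stage the changes on a copy, then swap the copy in as one step.
        staged = bytearray(self._data)
        for offset, chunk in writes:
            staged[offset:offset + len(chunk)] = chunk
        self._data = staged
```

A caller might update two distant ranges in one action, for example `store.atomic_update([(0, b"ab"), (10, b"cd")])`. If any write in the list is out of range, the array is left exactly as it was.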
Though designed as a cluster file system, Calypso, described by Devarakonda et al. in “Recovery in the Calypso file system,” ACM Transactions on Computer Systems, 14(3):287-310, August 1996, is more similar to Echo than the VMS cluster file system. Like Echo, Calypso stores its files on multi-ported disks, i.e., disks that can be accessed by multiple servers. One of the servers directly connected to each disk acts as a file server for data stored on that disk; when the server fails, another server takes over. Other servers in a Calypso cluster access the current server as file system clients. Like Echo, the client computers can maintain coherent caches using a multiple-reader/single-writer locking protocol.
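A multiple-reader/single-writer locking protocol can be sketched as a small lock table kept by the server. This is a generic illustration of the rule "many readers or one writer", not the protocol actually used by Calypso or Echo; the class name, method names, and client identifiers are hypothetical.

```python
class ReaderWriterLockTable:
    """Per-file lock table: any number of readers, or exactly one writer."""

    def __init__(self) -> None:
        self._readers: dict[str, set[str]] = {}  # file name -> reading clients
        self._writer: dict[str, str] = {}        # file name -> writing client

    def try_read(self, file: str, client: str) -> bool:
        """Grant a read lock unless a different client holds the write lock."""
        writer = self._writer.get(file)
        if writer is not None and writer != client:
            return False
        self._readers.setdefault(file, set()).add(client)
        return True

    def try_write(self, file: str, client: str) -> bool:
        """Grant the write lock only if no other client reads or writes."""
        writer = self._writer.get(file)
        if writer is not None and writer != client:
            return False
        other_readers = self._readers.get(file, set()) - {client}
        if other_readers:
            return False
        self._writer[file] = client
        return True

    def release(self, file: str, client: str) -> None:
        """Drop every lock the client holds on the file."""
        self._readers.get(file, set()).discard(client)
        if self._writer.get(file) == client:
            del self._writer[file]
```

A client that holds a read lock may safely serve reads from its cache, because no other client can obtain the write lock until all other readers have released theirs. A real system would also need to revoke locks from failed clients, which this sketch omits.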
Shillner et al., in “Simplifying distributed file systems using a shared logical disk,” Technical Report TR-524-96, Dept. of Computer Science, Princeton University, 1996, describe a distributed file system on top of a shared logical disk. There, a lower layer uses multiple servers cooperating to implement a single logical disk. In an upper layer, multiple independent servers execute the same file system code on top of the logical disk to provide access to shared files. However, the logical disk layer does not provide redundancy. The system can recover from a failure in a local server, but dynamic reconfiguration of other failed servers is not possible.
Their file system uses careful ordering of operations that write file metadata, but the writes are not logged. Their technique avoids the need for a full metadata scan to restore consistency after a server fail
