Error detection/correction and fault detection/recovery – Data processing system error or fault handling – Reliability and availability
Reexamination Certificate
1999-06-18
2003-10-07
Beausoliel, Robert (Department: 2184)
active
06631478
ABSTRACT:
BACKGROUND OF THE INVENTION
1. Field of the Invention
The present invention relates to computer networks, and more specifically to providing a memory management technique for implementing a high performance stable storage system in a computer network.
2. Background
Service availability and fault tolerance have become important features in building reliable communication systems, networking systems, and reliable distributed transactions. In such an environment, it is extremely important to provide continuous, non-interrupted service to end users. Further, in the event of a service process crash, it is essential that the crashed process be restarted as quickly as possible so that end users do not experience any disruption in service. Conventionally, in order to restart a crashed service process quickly, and reinitialize it correctly, certain critical application data relating to the service process is stored in a stable storage system which is part of the system or network on which the service process is running.
A stable storage system is an abstraction of a perfect storage system which is able to survive certain system failures such as, for example, communication failures, application failures, operating system failures, processor failures, etc. In order to ensure the integrity of key system data and to avoid data inconsistency (caused, for example, by a process crash which occurs in the middle of a system operation), client processes or applications store key data within a stable storage system so that the data can be rolled back to a previously consistent state if a failure occurs. Typically, a stable storage system provides atomic read and write operations to stored data, and keeps the data intact even when failures occur. For example, in a router system, the network state information such as, for example, Forwarding Information Base (FIB), reservation state, and multi-cast group information are stored in stable storage systems in order to restore packet forwarding processes quickly in the event of a process crash.
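The rollback behavior described above can be illustrated with a minimal sketch. It assumes a simple shadow-copy scheme, which is one common way to obtain a previously consistent state and is not taken from the specification; the class `StableCell` and the sample FIB entries are hypothetical.

```python
import copy


class StableCell:
    """Toy stand-in for a stable store: updates are staged, then committed or discarded."""

    def __init__(self, initial):
        self._committed = copy.deepcopy(initial)  # last consistent state
        self._working = copy.deepcopy(initial)    # state currently being modified

    def write(self, key, value):
        self._working[key] = value                # touches only the working copy

    def commit(self):
        self._committed = copy.deepcopy(self._working)

    def rollback(self):
        self._working = copy.deepcopy(self._committed)

    def read(self, key):
        return self._committed.get(key)           # readers see committed data only


# Hypothetical forwarding entries: prefix -> outgoing interface.
fib = StableCell({"10.0.0.0/8": "eth0"})
fib.write("10.0.0.0/8", "eth1")
fib.write("192.168.0.0/16", "eth2")
fib.rollback()  # simulate a crash in the middle of the update
```

After the rollback, a reader still sees the entry for 10.0.0.0/8 pointing at eth0, and the half-finished second entry is not visible.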
Traditional stable storage systems use either a file system-based storage system, or a reliable RAM disk. Examples of the traditional file system-based storage systems include the well known Andrew File System RVM and LibFT, from Lucent Technologies of Murray Hill, N.J. An example of a conventional file system-based stable storage system is shown in FIG. 1 of the drawings.
As shown in FIG. 1, stable storage system 104 comprises a block of “non-volatile” memory, such as, for example, a hard drive. Typically, the stable storage system 104 is configured to operate independently of the network operating system in order to preserve the data within the stable storage system in the event of an operating system or network crash. A plurality of clients 102, which represent various applications or processes running on the network, write and/or read essential data to and from stable storage system 104. The data is stored within a plurality of data files 110. An access manager 106 manages the data which is sent to and retrieved from data files 110. Additionally, the access manager manages a plurality of log or backup files 108 which are used for tracking data which is written to the data files 110.
While the file-system based approach to stable storage may be resilient to process and OS failures, this approach imposes high performance penalties because of multiple inter-process buffer copying and disk I/O latency. Further, the file system based stable storage system does not support fast, incremental updating of state fragment data such as, for example, FIB entries or TCP message sequence numbers. Rather, conventional disk storage systems support sequential access of data which is managed using sector allocations. To update data within a data file 110, the new data is typically appended to the end of the data file. In order to retrieve this new data appended to the end of the data file, the entire data file must be accessed sequentially. This slow method of accessing and updating data is undesirable, particularly in systems such as embedded systems (e.g., router systems) where fast recovery is essential to avoiding service disruption.
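The cost of the append-and-scan pattern can be seen in a small sketch. It assumes a hypothetical fixed-size record of two 32-bit fields and uses an in-memory buffer in place of a data file on disk; the record layout and function names are illustrative only.

```python
import io
import struct

RECORD = struct.Struct("<II")  # hypothetical record: (key, value), 32 bits each


def append_update(log, key, value):
    """Update a value by appending a new record to the end of the log."""
    log.seek(0, io.SEEK_END)
    log.write(RECORD.pack(key, value))


def lookup(log, key):
    """Return (latest value, records scanned); the whole log is read sequentially."""
    log.seek(0)
    latest, scanned = None, 0
    while True:
        chunk = log.read(RECORD.size)
        if len(chunk) < RECORD.size:
            break
        k, v = RECORD.unpack(chunk)
        scanned += 1
        if k == key:
            latest = v  # a later record overrides an earlier one
    return latest, scanned


log = io.BytesIO()  # stands in for a data file on disk
for seq in range(1000):
    append_update(log, 7, seq)  # e.g. one sequence number updated 1000 times
value, scanned = lookup(log, 7)
```

Here a single logical value has been updated 1000 times, and recovering its current value requires reading all 1000 records. On a real disk, each of those reads also carries I/O latency.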
Alternatively, RAM disks may be used for implementing stable storage systems. However, the reliable RAM disk approach is undesirable for most commercial systems since it requires installation and support of additional hardware.
While conventional file-system based stable storage systems may be used for storage of essential application data, there exists a continual need to provide improved techniques for implementing stable storage in conventional computer systems or networks.
SUMMARY OF THE INVENTION
According to specific embodiments of the invention, a technique is provided for implementing a high performance stable storage system which provides stable and fast storage services to applications built on top of one or more operating system (OS) kernels in a computer network.
According to a specific embodiment of the invention, a unique high performance stable storage hierarchy is provided comprising two levels. A set of byte-addressable stable memory regions (SMRs) forms the first level stable storage. Each stable storage memory region (SMR) includes a data structure for storing desired or essential data related to one or more client processes. The SMR is configured to provide an access interface which supports atomic access to its data structure. The SMR is also configured to be resilient to application failures. For example, if a client dies or crashes, the data contained within the SMR will still be accessible to other applications or network components. Further, the SMR is configured to be byte addressable, and configured to share at least a portion of its address space with at least one client process. The term “byte-addressable” refers to the capability of a client to write data into an SMR data buffer directly using pointers (instead of appending logs to a file like in traditional stable storage systems).
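As a rough illustration of byte-addressable access, the following sketch writes a fixed-size entry directly at an offset in a memory region and then updates it in place. It rests on several assumptions: an anonymous `mmap` mapping stands in for an SMR (a real SMR would be managed so that it outlives a crashed client), the entry layout is hypothetical, and Python offsets take the place of the pointers mentioned above.

```python
import mmap
import struct

ENTRY = struct.Struct("<IIH")  # hypothetical FIB entry: prefix, next hop, port
SLOTS = 4

# Anonymous mapping used only as a stand-in for a stable memory region.
region = mmap.mmap(-1, ENTRY.size * SLOTS)


def write_entry(buf, slot, prefix, next_hop, port):
    """Store an entry directly at its slot offset; nothing is appended."""
    ENTRY.pack_into(buf, slot * ENTRY.size, prefix, next_hop, port)


def read_entry(buf, slot):
    """Read an entry back from its slot offset without scanning other slots."""
    return ENTRY.unpack_from(buf, slot * ENTRY.size)


write_entry(region, 2, 0x0A000000, 0x0A000001, 3)
write_entry(region, 2, 0x0A000000, 0x0A000002, 5)  # incremental update in place
```

The second call overwrites the same ten bytes rather than growing a log, so reading slot 2 costs the same no matter how many times the entry has been updated.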
The second level of the high performance stable storage hierarchy includes any traditional file system based stable storage system which is configured to communicate with one or more SMRs. The data contained in an SMR can be flushed into (and loaded from) the second level stable storage device atomically upon request.
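One conventional way to make a flush to a file system appear atomic is to write a temporary file and then rename it over the target. The sketch below assumes that approach; the specification does not state how the flush is implemented, and the function names and the file name `smr.img` are hypothetical.

```python
import os
import tempfile


def flush_atomically(data, path):
    """Write data so that readers see either the old file or the new one, never a mix."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=directory)  # same directory, so same filesystem
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
            f.flush()
            os.fsync(f.fileno())  # push the bytes to the storage device
        os.replace(tmp, path)     # rename over the target (atomic on POSIX)
    except BaseException:
        if os.path.exists(tmp):
            os.unlink(tmp)        # leave no partial file behind
        raise


def load(path):
    """Load a previously flushed image back into memory."""
    with open(path, "rb") as f:
        return f.read()


workdir = tempfile.mkdtemp()
target = os.path.join(workdir, "smr.img")  # hypothetical image file name
flush_atomically(b"state-v1", target)
flush_atomically(b"state-v2", target)
```

If the process dies before the rename, the previous image is still intact under the target name, which is the property a recovering client needs.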
In accordance with a specific embodiment of the present invention, the plurality of SMRs form a high performance fault resilient “cache” layer to traditional file system based stable storage systems. On most platforms where processors and operating systems are much more reliable than applications, this layer itself can boost the system availability considerably without the typical performance penalties incurred by traditional stable storage systems. This performance gain is especially important for applications which perform fast incremental state updates for small transactions.
An alternate embodiment of the present invention provides a data storage system implemented in a computer system. The computer system includes an operating system and at least one CPU. The data storage system includes at least one SMR managed by the operating system. The SMR includes at least one first data structure for storing data related to a client process. Further, the SMR is configured or designed to support atomic access of data within the first data structure. The SMR is also configured or designed to support incremental updating of client process data within the first data structure. The incremental updating is implemented using a pointer-based data transfer mechanism. The SMR is further configured or designed to allow at least one other client process to access data within the SMR data structure. The SMR may also include a memory based semaphore for providing exclusive access to the SMR when desired (e.g. when writing data to the SMR data structure). The SMR may also include a reference counter for keeping track of the number of client processes accessing the SMR.
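The semaphore and reference counter described above might be sketched as follows. This is a single-process approximation under stated assumptions: threads stand in for client processes, `threading.Semaphore` stands in for the memory based semaphore, and the `Region` class and its method names are hypothetical.

```python
import threading


class Region:
    """Toy SMR handle with an exclusive-access semaphore and a reference counter."""

    def __init__(self, size):
        self.data = bytearray(size)
        self._sem = threading.Semaphore(1)  # binary semaphore guarding writes
        self._count_lock = threading.Lock()
        self.refcount = 0

    def attach(self):
        with self._count_lock:
            self.refcount += 1

    def detach(self):
        with self._count_lock:
            self.refcount -= 1
            return self.refcount == 0       # True when the last client has left

    def write(self, offset, payload):
        with self._sem:                     # exclusive access while writing
            self.data[offset:offset + len(payload)] = payload


smr = Region(16)
smr.attach()
smr.attach()


def client(offset):
    for _ in range(100):
        smr.write(offset, b"ABCD")


workers = [threading.Thread(target=client, args=(o,)) for o in (0, 4)]
for w in workers:
    w.start()
for w in workers:
    w.join()

first_left = smr.detach()
last_left = smr.detach()
```

The counter lets the owner of the region tell when no client is using it any longer, while the semaphore serializes writers without restricting which offsets they update.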
An additional aspect of the above-described
Ma Qingming
Wang Zhenyu
Beausoliel Robert
Beyer Weaver & Thomas LLP
Bonzo Bryce P.
Cisco Technology Inc.