Error detection/correction and fault detection/recovery – Data processing system error or fault handling – Reliability and availability
Reexamination Certificate
2001-08-15
2004-12-28
Beausoliel, Robert (Department: 2113)
C713S002000, C717S170000
active
06836859
BACKGROUND OF THE INVENTION
1. Field of the Invention
The present invention relates to the field of data storage systems. More specifically, embodiments of the present invention relate to methods and systems for providing automatic software versioning for controller units of a data storage system.
2. Related Art
FIG. 1A illustrates a system 10 that includes a host computer or server 12 that interfaces with a disk storage system 14. The disk storage system 14 is capable of storing large amounts of data, e.g., multiple terabytes, and is designed to operate with a high degree of reliability. One such storage system is the “StorEdge T3 Array” which is commercially available from Sun Microsystems, Inc., of Mountain View, Calif. To maintain the high degree of reliability and large storage capacity, a fault tolerant storage system 16 is employed along with multiple redundant controller units 18a and 18b (which are also called “a partner pair”). The fault tolerant storage system may be a disk array subsystem. The disk array subsystem 16 contains an array of individual disk units arranged to provide redundancy. The controllers 18a-b operate in a master-slave fashion. The controller units 18a-b interface with the host system 12 and, in so doing, allow the disk array subsystem 16 to be viewed by the host system 12 as one large single volume.
In the past, the software application 20 used by the controllers 18a-b was loaded into the disk array subsystem 16 and, upon booting, the controllers 18a-b would automatically download this software application into their respective volatile memories 22 and 24, e.g., random access memory (RAM). The application could then function to make the disk array subsystem 16 appear to the host system 12 as one single volume. Unfortunately, the process of downloading the application from the disk array subsystem 16 on each boot-up is very time consuming and therefore inefficient.
FIG. 1B illustrates another system 26 having a similar complement of components as system 10, except the controllers 18a-18b are different. In this system, the controllers 18a-18b contain a respective non-volatile memory 32 and 34 which contains the software application described above. The benefit of this design 26 is that the application no longer needs to be loaded from the disk array subsystem 16 upon each boot. Rather, the application is directly accessed by each controller from its own internal non-volatile memory, e.g., 32 and 34. The use of non-volatile memory to serve this purpose increases the overall efficiency of the controllers 18a-b.
A drawback of system 26 is that the version of the software used to control the controllers 18a-b is no longer associated with the disk system 14, but rather becomes associated with each individual controller separately. This may lead to several potentially dangerous conditions. For example, a partner pair could have mutually incompatible software versions operating on the two controllers, which could lead to data integrity problems. This situation could occur if one controller was replaced (due to malfunction) and the replacement controller (in the typical case) contains a different software version from the remaining controller. Another example occurs when a controller is installed in a system that is configured to operate at an up-level software version, resulting in a conflict of software versions residing within the partner pair. Such version confusion can lead to data corruption or complete storage system failure.
SUMMARY OF THE INVENTION
Described herein are a method and system for performing computer controlled software versioning between multiple controllers in a storage system. The storage system includes a fault tolerant storage system and multiple redundant controllers that allow the disk array to be viewed as a large disk system by a host computer or server. The fault tolerant storage system has stored thereon a preferred version of software to be used by the controllers. This software may be updated by replacing the copy stored in the fault tolerant storage system. The controllers each contain non-volatile memory. On boot, a controller compares the software version in its non-volatile memory to the preferred version in the fault tolerant storage system. If they are different (e.g., the software on the fault tolerant storage system was updated or the controller was updated with a non-preferred software version), then the controller copies the disk array version into its non-volatile memory and then re-boots. To preserve redundancy, one controller is typically left operational while the other is re-booted. Computer controlled versioning provides: (1) lockstep software updates between the controllers based on a software version that is associated with (or tied to) the disk system as a whole; and (2) a central store from which the controllers may obtain the preferred software version.
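The staggered update described above, in which one controller stays operational while its partner re-boots, can be illustrated with a minimal sketch. The function name `rolling_update`, the dictionary layout, and the version strings are hypothetical and chosen only for illustration; the source does not prescribe any particular ordering rule, so the alphabetical order used here is an assumption.

```python
from typing import Dict, List


def rolling_update(stored: Dict[str, str], preferred: str) -> List[str]:
    """Bring stale controllers to the preferred version one at a time.

    `stored` maps a controller name to the version held in its
    non-volatile memory (hypothetical layout). Returns the re-boot order.
    """
    reboot_order: List[str] = []
    for name in sorted(stored):  # assumed ordering, not from the source
        if stored[name] == preferred:
            continue  # already in lockstep with the disk array copy
        # Every other controller stays up while this one re-boots.
        partners_online = len(stored) - 1
        if partners_online < 1:
            raise RuntimeError("no partner left operational during re-boot")
        stored[name] = preferred  # copy from the central store
        reboot_order.append(name)
    return reboot_order


pair = {"18a": "1.0", "18b": "1.1"}
order = rolling_update(pair, preferred="1.1")
```

Here only controller 18a is stale, so it alone is re-booted and 18b remains available to the host throughout.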
A special flash update mechanism is also described with respect to an implementation that uses flash memory as the non-volatile memory. According to this method, each controller has two flash memories for level 2 and level 3 of its boot sequence. On boot, when level 1 of the boot sequence is booting, level 1 software is used to select the most recent valid version of the software stored on the two flash memories of level 2. That selected version is then used to boot level 2. Likewise, on boot, when level 2 is booting, level 2 software selects the most recent valid version of the software stored on the two flash memories of level 3. That selected version is then used to boot level 3. If no valid versions are available, then an error condition exists.
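The selection rule for each boot level can be sketched as follows. This is an illustration under stated assumptions, not the patented implementation: the source does not say how validity is determined or how versions are ordered, so the CRC-32 check, the integer version number, and the names `FlashImage`, `make_image`, and `select_image` are all hypothetical.

```python
import zlib
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class FlashImage:
    version: int    # assumed: larger number means more recent
    payload: bytes
    checksum: int   # assumed: CRC-32 recorded when the image was written

    def is_valid(self) -> bool:
        # Assumed validity test: recomputed CRC-32 matches the recorded one.
        return zlib.crc32(self.payload) == self.checksum


def make_image(version: int, payload: bytes) -> FlashImage:
    """Build an image whose recorded checksum matches its payload."""
    return FlashImage(version, payload, zlib.crc32(payload))


def select_image(banks: Sequence[Optional[FlashImage]]) -> FlashImage:
    """Pick the most recent valid image from the flash memories of one level."""
    valid = [img for img in banks if img is not None and img.is_valid()]
    if not valid:
        # Mirrors the error condition when no valid version is available.
        raise RuntimeError("no valid image in either flash memory")
    return max(valid, key=lambda img: img.version)


older = make_image(1, b"level-2 code, older")
newer_corrupt = FlashImage(2, b"level-2 code, newer", checksum=0)
chosen = select_image([older, newer_corrupt])
```

With two flash memories per level, an interrupted or corrupted write to one leaves the other as a fallback, which is why the newer but invalid image above is skipped in favor of the older valid one.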
More specifically, embodiments of the present invention are directed toward a method of providing version control within a fault tolerant system having the following steps: a) invoking a boot sequence of a first controller that is coupled to a storage system having stored thereon a preferred application version; b) during the boot sequence, comparing the preferred application version with a stored application version stored within a memory of the first controller; c) provided the stored application version is different from the preferred application version, storing the preferred application version into the memory and causing the first controller to re-boot to thereby execute the preferred application version after re-boot; and d) provided the stored application version is the same as the preferred application version, causing the first controller to execute the stored application version. Embodiments also include the above and wherein the memory is a programmable non-volatile memory and wherein the memory is a flash memory and wherein the storage system is a disk array system.
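Steps a) through d) for a single controller can be expressed as a short sketch. The `Controller` class, the `boot` function, and the version strings are hypothetical names introduced for illustration; the re-boot is modeled simply as another pass through the comparison.

```python
from dataclasses import dataclass


@dataclass
class Controller:
    """Hypothetical model of one controller and its non-volatile memory."""
    name: str
    stored_version: str
    reboots: int = 0


def boot(controller: Controller, preferred_version: str) -> str:
    """Run steps a) through d) and return the version that ends up executing."""
    # Steps a) and b): each pass of the loop is one boot sequence that
    # compares the stored version with the preferred version.
    while controller.stored_version != preferred_version:
        # Step c): store the preferred version in memory, then re-boot.
        controller.stored_version = preferred_version
        controller.reboots += 1
    # Step d): the versions match, so the stored version is executed.
    return controller.stored_version


replacement = Controller(name="18a", stored_version="1.0")
running = boot(replacement, preferred_version="1.1")
```

A replacement controller that arrives with a different version therefore re-boots once and then runs the preferred version; a later boot with matching versions proceeds without any copy.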
Embodiments also include the above and wherein step a) includes the following steps: a1) executing a first level wake-up boot sequence; a2) during the first level boot sequence, checking two application versions that are associated with a second level boot sequence and selecting a most recent valid version; and a3) executing the most recent valid version as the second level boot sequence. Embodiments also include a fault tolerant storage system implemented in accordance with the above.
Berg Jerry
Chen Chin-Te
Chu Jerry
Beausoliel Robert
Kudirka & Jobse LLP
McCarthy Christopher S.
Sun Microsystems Inc.