Error detection/correction and fault detection/recovery – Data processing system error or fault handling – Reliability and availability
Reexamination Certificate
2001-06-20
2004-06-15
Beausoliel, Robert (Department: 2113)
C714S042000, C714S057000
active
06751758
ABSTRACT:
A portion of the disclosure of this patent document contains command formats and other computer language listings, all of which are subject to copyright protection. The copyright owner, EMC Corporation, has no objection to the facsimile reproduction by anyone of the patent document or the patent disclosure, as it appears in the Patent and Trademark Office patent file or records, but otherwise reserves all copyright rights whatsoever.
FIELD OF THE INVENTION
The invention relates generally to the detection and correction of errors in a data storage environment, and more particularly to a system and method for augmenting and simplifying the task of service professionals who handle such errors for data storage systems.
BACKGROUND OF THE INVENTION
As is known in the art, computer systems generally include a central processing unit (CPU), a memory subsystem, and a data storage subsystem. According to a network or enterprise model of the computer system, the data storage system, whether associated with or in addition to a local computer system, may include a large number of independent storage devices or disks housed in a single enclosure or cabinet. This array of storage devices is typically connected to several computers over a network or via dedicated cabling. Such a model allows for the centralization of data that is to be shared among many users and also allows for a single point of maintenance for the storage functions associated with the many host processors.
The data storage system stores critical information for an enterprise that must be available for use substantially all of the time. If an error occurs on such a data storage system, it must be fixed as soon as possible because such information is at the heart of the commercial operations of many major businesses. A recent economic survey from the University of Minnesota, known as the Bush-Kugel study, indicates that many businesses are devastated after just a few days (2 to 6) without access to their critical data. The survey showed that 25% of such businesses were immediately bankrupt after such a critical interruption and fewer than 7% remained in the marketplace after 5 years.
Recent innovations by EMC Corporation of Hopkinton, Mass. provide business continuity solutions that are at the heart of many enterprises' data storage infrastructure. Nevertheless, the systems (including devices and software) being implemented are complex and vulnerable to errors that must be quickly serviced for the continuity to be maintained.
EMC has been using a technique for responding to errors as they occur by “calling home” to report the errors. The data storage system is equipped with a modem and a service processor (typically a laptop computer) for error response. Sensors that are built into its storage systems monitor things such as temperature, vibration, and tiny fluctuations in power, as well as unusual patterns in the way data is being stored and retrieved—over 1,000 diagnostics in all. Periodically (about every two hours), an EMC data storage system checks its own state of health. If an error is noted, a machine-implemented “call home” is made to customer service over a line dedicated for that purpose.
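The periodic self-check and "call home" cycle described above can be illustrated with a minimal Python sketch. It is not EMC's implementation: the names `ErrorReport`, `run_health_check`, and `call_home`, the error codes, and the `send` callback standing in for the dedicated line are all hypothetical and chosen only for illustration. The two-hour interval is the one figure taken from the text.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional

# "About every two hours" per the text; a scheduler (not shown) would use this.
CHECK_INTERVAL_SECONDS = 2 * 60 * 60


@dataclass
class ErrorReport:
    """One error noted during a health check (hypothetical structure)."""
    system_id: str
    error_code: str
    detail: str


# A diagnostic returns None when healthy, or a short detail string on error.
Diagnostic = Callable[[], Optional[str]]


def run_health_check(system_id: str,
                     diagnostics: Dict[str, Diagnostic]) -> List[ErrorReport]:
    """Run every diagnostic once and collect a report for each one that fails."""
    reports = []
    for error_code, probe in diagnostics.items():
        detail = probe()
        if detail is not None:
            reports.append(ErrorReport(system_id, error_code, detail))
    return reports


def call_home(reports: List[ErrorReport],
              send: Callable[[ErrorReport], None]) -> int:
    """Forward each report through `send` (a stand-in for the dedicated line).

    Returns the number of reports sent; nothing is sent when all is healthy.
    """
    for report in reports:
        send(report)
    return len(reports)


if __name__ == "__main__":
    # Hypothetical diagnostics: one healthy, one reporting a problem.
    diagnostics = {
        "TEMP_HIGH": lambda: None,
        "POWER_FLUCT": lambda: "small fluctuation in supply voltage",
    }
    received: List[ErrorReport] = []
    call_home(run_health_check("array-001", diagnostics), received.append)
    for report in received:
        print(report)
```

Keeping the diagnostics in a dictionary keyed by error code means a new check can be added without changing the reporting path, which is one plausible way to manage a large set of diagnostics.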
Every day, an average of 3,500 such calls home for help reach EMC's customer service center in Hopkinton. About one-third of the calls from EMC's machines trigger the dispatch of a customer engineer to fix some problem. Remarkably, under such a service program, many problems are resolved before the owner of the data storage system is even aware that there has been a problem. However, some error codes that result in calls home are the result of known problems for which a design engineering fix is pending, or minor problems that do not require immediate attention. Such calls can be deferred or ignored so that more urgent and important errors may be dealt with. This is known as screening or filtering errors.
However, when filtering errors, it is important not to introduce costly mistakes. If the error for which screening is intended is not properly filtered, then expensive, wasteful, unnecessary, and burdensome calls home continue. If, on the other hand, an important error is wrongly ignored, then that could cause harm. These two situations could occur at the same time, flooding the customer service center with unimportant calls while important errors are ignored.
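The two failure modes just described can be made concrete with a short Python sketch of screening. It assumes, purely for illustration, that filter entries are keyed by an exact error code and map to a "defer" or "ignore" action. The names `Action`, `screen`, and `triage` and the codes `E100`, `E200`, and `E999` are hypothetical and do not come from the patent.

```python
from enum import Enum
from typing import Dict, Iterable, List, Tuple


class Action(Enum):
    ESCALATE = "escalate"  # pass to customer service as usual
    DEFER = "defer"        # known problem, fix pending
    IGNORE = "ignore"      # minor problem, no immediate attention needed


def screen(error_code: str, filter_log: Dict[str, Action]) -> Action:
    """Return the action for one error code.

    Only an exact match is screened. Anything not listed is escalated,
    so an unknown (possibly important) error is never silently dropped.
    """
    return filter_log.get(error_code, Action.ESCALATE)


def triage(error_codes: Iterable[str],
           filter_log: Dict[str, Action]) -> Tuple[List[str], List[str]]:
    """Split incoming error codes into (escalated, screened) lists."""
    escalated: List[str] = []
    screened: List[str] = []
    for code in error_codes:
        if screen(code, filter_log) is Action.ESCALATE:
            escalated.append(code)
        else:
            screened.append(code)
    return escalated, screened


if __name__ == "__main__":
    # Hypothetical filter log: two known, low-priority error codes.
    filter_log = {"E100": Action.DEFER, "E200": Action.IGNORE}
    escalated, screened = triage(["E100", "E999", "E200"], filter_log)
    print("escalated:", escalated)
    print("screened:", screened)
```

Defaulting to escalation addresses the second failure mode (an important error wrongly ignored) at the cost of the first: a mistyped filter entry simply fails to match, and the unwanted calls continue until the entry is corrected.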
What is needed is a way to screen for known errors occurring in a data storage system in a simple and clear manner, while reducing the risk that mistakes are created. Furthermore, it would be an advancement in the art if such a screening tool could be administered on a remote basis so it could handle data storage systems located anywhere in the world.
SUMMARY OF THE INVENTION
The present invention is a data storage management system and method that includes a simple, clearly presented tool for screening out or filtering errors occurring in a data storage system.
In one embodiment, the invention includes a method that is useful in a data storage system with more than one storage device. The method provides steps for the management of errors related to a data storage system. The method includes receiving at an error response station a message about an error related to the data storage system, and providing a graphical user interface (GUI) for enabling the selective entry of error handling information in response to receiving the message. The method may be further useful for suppression of such handling information, diagnosing such messages, and taking or recommending corrective action.
In another embodiment the invention includes a system capable of performing the method and computer-executed logic capable of carrying out the method.
In another embodiment, the above-specified techniques are enabled to be remotely deployed to manage response to errors occurring on data storage systems anywhere in the world.
BRIEF DESCRIPTION OF THE DRAWINGS
The above and further advantages of the present invention may be better understood by referring to the following description taken in conjunction with the accompanying drawings in which:
FIG. 1 is a block diagram of a data storage management system including filter logic that operates with the filter log for operating the present invention and including a data storage system, a service processor, and a remote error response station;
FIG. 2 is a block diagram of the architecture for the logic shown in FIG. 1 as implemented on the service processor of FIG. 1;
FIG. 3 is a block diagram of the architecture for the logic shown in FIG. 1 as implemented on the error response station of FIG. 1;
FIG. 4 is a schematic representation of contents comprising the filter log of FIG. 1;
FIG. 5 is a schematic representation of contents comprising an embodiment of the filter log of FIGS. 1 and 4;
FIG. 6 is a flow logic diagram of the method of this invention using the filter logic and filter log on the system of FIG. 1;
FIG. 7 is another flow logic diagram and is a continuation of the illustration of the method begun in FIG. 6;
FIG. 8 is another flow logic diagram and is a continuation of the illustration of the method begun in FIG. 6;
FIG. 9 is another flow logic diagram and is a continuation of the illustration of the method begun in FIG. 6;
FIG. 10 is another flow logic diagram and is a continuation of the illustration of the method begun in FIG. 6;
FIG. 11 is another flow logic diagram and is a continuation of the illustration of the method begun in FIG. 6;
FIG. 12 is an example of a graphical user interface (GUI) tool useful for creating error handling information denoted as a filter entry for the filter log of FIG. 1 and useful for the method shown in FIGS. 6-11;
FIG. 13 is another example of a graphical user interface (GUI) tool useful for placing error handling information in the filter log of FIG. 1 and useful for the method shown in FIGS. 6-11;
FIG. 14 is another example of a graphical user interface (GUI) tool useful for placing error handling information in the filter log of FIG. 1
Alipui Gilbert
Britz-Artzi Hagit
Sharp Timothy
Beausoliel Robert
EMC Corporation
Fitzgerald Leanne J.
Gunther John M.
Perkins Robert Kevin