Reexamination Certificate
2001-03-30
2004-10-26
Baderman, Scott (Department: 2184)
Error detection/correction and fault detection/recovery
Data processing system error or fault handling
Reliability and availability
C714S038110
active
06810495
ABSTRACT:
BACKGROUND OF THE INVENTION
1. Field of the Invention
The present invention generally relates to a method and system for software rejuvenation, and more particularly to a method and system for transparent symptom-based selective software rejuvenation.
2. Description of the Related Art
Software failures are now known to be a dominant source of system outages. One common form of software failure is due to “software aging” in which a resource, such as memory usage, is increasingly consumed and which eventually causes the system to fail. Preventing such aging by restarting the system (or subsystem) is known as “software rejuvenation.”
The background and related art pertaining to software rejuvenation are described in detail in the above-mentioned copending U.S. patent application Ser. Nos. 09/442,003 and 09/442,001.
The second of these applications (i.e., copending application Ser. No. 09/442,001) deals with prediction of resource exhaustion due to software aging effects, and teaches that resource exhaustion can be predicted using trend analysis on recently measured symptoms of the system being monitored. Specific trend analysis techniques used include linear regression, polynomial regression, and a modification of “Sen's slope estimate.” These methods attempt to predict when a resource, or set of resources, approaches a state in which resource exhaustion is imminent and a software rejuvenation should be scheduled. However, copending application Ser. No. 09/442,001 does not teach how to select which trending method to use.
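As a rough illustration of trend-based prediction, the following sketch computes the standard Sen (Theil-Sen) slope estimate, i.e., the median of all pairwise slopes, and extrapolates it to a resource limit. The "modification" of Sen's slope estimate mentioned above is not described here, so the unmodified estimator is shown instead. The function names, the sample data, the capacity value, and the choice to anchor the extrapolated line at the latest sample are all assumptions made for illustration.

```python
from statistics import median

def sen_slope(times, values):
    """Median of all pairwise slopes (the standard, unmodified Sen estimator)."""
    slopes = [
        (values[j] - values[i]) / (times[j] - times[i])
        for i in range(len(times))
        for j in range(i + 1, len(times))
        if times[j] != times[i]
    ]
    return median(slopes)

def predict_exhaustion(times, values, capacity):
    """Extrapolate a line with the Sen slope from the latest sample to `capacity`.

    Returns None when the trend is flat or decreasing, i.e., when no
    exhaustion is predicted. Anchoring at the latest sample is an assumption.
    """
    slope = sen_slope(times, values)
    if slope <= 0:
        return None
    return times[-1] + (capacity - values[-1]) / slope

# Hypothetical memory-usage samples (hours, megabytes) with one outlier at t=3.
hours = [0, 1, 2, 3, 4]
mem_mb = [100, 110, 120, 190, 140]
print(sen_slope(hours, mem_mb))                # slope in MB per hour
print(predict_exhaustion(hours, mem_mb, 500))  # predicted exhaustion hour
```

Because the estimate is a median, the single outlier at hour 3 does not shift the slope, which is one reason a robust estimator may be attractive for noisy symptom data.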
Furthermore, the suggested trending methods may not always be effective. For example, while polynomial regression may adequately fit the data observed in the recent past, it is not always a good predictor since polynomial values extrapolated into the future are not necessarily monotone. Further, such estimates are often unstable.
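The non-monotone behavior can be seen in a small sketch. It uses an exact quadratic fit through three points (an interpolating polynomial, which is the special case of polynomial regression with zero residual) rather than a full regression routine. The function name and the sample values are hypothetical.

```python
def interpolating_polynomial(points):
    """Return f(t), the Lagrange polynomial passing exactly through `points`."""
    def f(t):
        total = 0.0
        for i, (ti, yi) in enumerate(points):
            term = yi
            for j, (tj, _) in enumerate(points):
                if j != i:
                    term *= (t - tj) / (ti - tj)
            total += term
        return total
    return f

# Hypothetical usage samples: still growing, but slightly more slowly.
samples = [(0, 0.0), (1, 30.0), (2, 50.0)]
fit = interpolating_polynomial(samples)

for t in (3, 5, 7):
    print(t, fit(t))
```

The fitted curve here is y = -5t² + 35t. It matches the three increasing samples exactly, yet its extrapolation peaks at 61.25 (t = 3.5) and then falls back to 0 at t = 7. With a capacity of 100, this model would never predict exhaustion even though the observed usage is rising.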
Thus, prior to the present invention, there has been no method of scheduling rejuvenation times by predicting resource exhaustion times from the best predictor, selected from a multitude of possible types of predictors (models). Further, while it is noted that similar notions are used in classical statistics to select the best model from amongst a set of possible models (see, e.g., Chapter 6 of Applied Regression Analysis, Second Edition, Norman Draper and Harry Smith, John Wiley & Sons, Inc., 1981), prior to the present invention, such approaches have not been used to predict software resource exhaustion times and to avoid disruptive software failures by scheduling rejuvenation times.
Moreover, the preferred types of predictors to consider, how to set their parameters, and how to choose between different predictors depend very much upon the software rejuvenation context. Indeed, the details of selecting appropriate classes of models and appropriate penalty functions are not straightforward. Hence, no such easy consideration (let alone recognition of the problem) has been undertaken prior to the present invention.
SUMMARY OF THE INVENTION
In view of the foregoing and other problems, drawbacks, and disadvantages of the conventional methods and structures, an object of the present invention is to provide a method and structure having a prediction module for a software rejuvenation agent operating in a computing environment.
In a first aspect of the present invention, a method (and computer system where at least one software component thereof is restarted based on projection of resource exhaustion) for selecting the most suitable projection method from among a class of projection methods includes providing M fitting modules which take measured symptom data associated with the system as input and produce M scores, wherein M is an integer, selecting the fitting module producing the best score, and, from the selected module, producing a prediction of the resource exhaustion time.
Thus, the inventive prediction module increases system availability in two ways: it avoids disruptive system crashes by scheduling software rejuvenations at times prior to estimated resource exhaustion, and it avoids unnecessary rejuvenations caused by poor prediction of resource exhaustion times.
In the invention, multiple fitting modules are run against recently collected symptom time series data sets from the system being monitored (these are called “symptom parameters” in copending application Ser. No. 09/442,001). Examples of measured symptoms that can be monitored in the exemplary application include memory usage, number of processes, etc. Such symptoms depend on, for example, the operating system, the applications being run, etc. As would be known by one of ordinary skill in the art taking the present application as a whole, other symptoms may be measured, depending on the operating system.
Multiple measured symptoms can also be combined into a single, aggregate measured symptom. Associated with each fitting module is a score (or penalty) that measures how effectively the fitting module fits (describes) the collected data. The fitting module having the best score is selected as being the most reliable module for describing the behavior of the measured symptom.
Associated with each fitting module is a prediction module that predicts when the system will exhaust resources associated with the measured symptoms. The prediction module corresponding to the fitting module with the best score is selected as the most reliable predictor of resource exhaustion for that symptom.
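The select-then-predict flow described above can be sketched as follows. This is a minimal illustration with M = 2 hypothetical fitting modules (a constant model and a least-squares line). The score used here (residual sum of squares plus a per-parameter penalty) is an assumption, since this text does not specify the penalty functions, and all function names and data are invented for the example.

```python
def fit_constant(ts, ys):
    """Fitting module 1: 'no trend' model y = mean (one parameter)."""
    mean = sum(ys) / len(ys)
    return (lambda t: mean), 1

def fit_linear(ts, ys):
    """Fitting module 2: ordinary least-squares line y = a + b*t (two parameters)."""
    n = len(ts)
    t_bar, y_bar = sum(ts) / n, sum(ys) / n
    sxy = sum((t - t_bar) * (y - y_bar) for t, y in zip(ts, ys))
    sxx = sum((t - t_bar) ** 2 for t in ts)
    b = sxy / sxx
    a = y_bar - b * t_bar
    return (lambda t: a + b * t), 2

def score(model, n_params, ts, ys, penalty=1.0):
    """Assumed score, lower is better: residual sum of squares plus complexity penalty."""
    rss = sum((y - model(t)) ** 2 for t, y in zip(ts, ys))
    return rss + penalty * n_params

def first_crossing(model, start, capacity, horizon=1000):
    """Prediction module: first integer time >= start where the model reaches capacity."""
    for t in range(start, start + horizon + 1):
        if model(t) >= capacity:
            return t
    return None  # no exhaustion predicted within the horizon

def select_and_predict(ts, ys, capacity, fitters):
    """Run every fitting module, keep the best-scoring one, and predict from it."""
    scored = []
    for name, fitter in fitters.items():
        model, n_params = fitter(ts, ys)
        scored.append((score(model, n_params, ts, ys), name, model))
    _, best_name, best_model = min(scored, key=lambda s: s[0])
    return best_name, first_crossing(best_model, ts[-1], capacity)

fitters = {"constant": fit_constant, "linear": fit_linear}
ts = [0, 1, 2, 3, 4]
ys = [10, 21, 29, 41, 50]  # hypothetical symptom samples
print(select_and_predict(ts, ys, capacity=100, fitters=fitters))
```

On flat data the penalty term makes the simpler constant model win, and no exhaustion time is reported. That is the kind of case in which an unnecessary rejuvenation would be avoided.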
These predictions (e.g., one for each measured symptom) are input to the software rejuvenation agent which then schedules rejuvenations based on the predictions, as well as other considerations (e.g., there may be rules stating that two simultaneous rejuvenations are not permitted, or rules preferring that rejuvenations be scheduled during “non-prime shifts”, etc.).
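A scheduling step of this kind might look like the sketch below. It assumes hourly time slots counted from midnight of day 0, a fixed safety margin before each predicted exhaustion time, a "prime shift" of 08:00 to 18:00, and the rule that no two rejuvenations share a slot. These rules, names, and values are illustrative assumptions, not the agent's actual policy.

```python
def schedule_rejuvenations(predictions, now, lead_time=2, prime_hours=range(8, 18)):
    """Pick one rejuvenation hour per component.

    predictions: {component: predicted exhaustion hour, or None if none predicted}
    Returns {component: scheduled hour}. Hours are integers counted from
    midnight of day 0, so hour % 24 gives the time of day.
    """
    schedule = {}
    taken = set()  # slots already used (no simultaneous rejuvenations)
    pending = [(c, e) for c, e in predictions.items() if e is not None]
    # Handle the most urgent component first.
    for comp, exhaust in sorted(pending, key=lambda ce: ce[1]):
        latest = exhaust - lead_time
        candidates = [h for h in range(now, latest + 1) if h not in taken]
        if not candidates:
            # Too late to keep the safety margin: take the first free slot.
            h = now
            while h in taken:
                h += 1
            candidates = [h]
        # Prefer non-prime-shift slots, and the latest one among them,
        # so that components are not restarted earlier than necessary.
        off_prime = [h for h in candidates if h % 24 not in prime_hours]
        chosen = max(off_prime) if off_prime else max(candidates)
        schedule[comp] = chosen
        taken.add(chosen)
    return schedule

# Hypothetical predictions, evaluated at hour 10 of day 0.
print(schedule_rejuvenations({"web": 40, "db": 30, "cache": None}, now=10))
```

Components with no predicted exhaustion (here, "cache") are simply left unscheduled.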
Thus, the present invention improves upon previous approaches to scheduling rejuvenation times by predicting resource exhaustion times from the best predictor, selected from a multitude of possible types of predictors (models). Further, the invention addresses which types of predictors to consider, how to set their parameters, and how to choose between different predictors, all of which depend very much upon the software rejuvenation context.
REFERENCES:
patent: 5715386 (1998-02-01), Fulton, III et al.
patent: 5748882 (1998-05-01), Huang
patent: 6112136 (2000-08-01), Paul et al.
patent: 6172673 (2001-01-01), Lehtinen et al.
patent: 6182249 (2001-01-01), Wookey et al.
patent: 6363332 (2002-03-01), Rangarajan et al.
patent: 6374368 (2002-04-01), Mitchell et al.
patent: 6415189 (2002-07-01), Hajji
patent: 6594784 (2003-07-01), Harper et al.
patent: 6598184 (2003-07-01), Merget et al.
patent: 6629266 (2003-09-01), Harper et al.
patent: 2001/0042227 (2001-11-01), Stephenson et al.
patent: 2002/0087612 (2002-07-01), Harper et al.
patent: 2002/0087913 (2002-07-01), Harper et al.
patent: 2002/0144178 (2002-10-01), Castelli et al.
patent: 2003/0023719 (2003-01-01), Castelli et al.
patent: 2003/0037290 (2003-02-01), Price et al.
patent: 2003/0079154 (2003-04-01), Park et al.
patent: 0701209 (1996-03-01), None
Bao, Sun, Trivedi “Adaptive rejuvenation: Degradation Model Scheme” IEEE: Proceedings of the 2003 International Conference on Dependable Computing.*
Bobbio, Sereno “Fine Grained Software Rejuvenation Models” Date unknown.*
Li, Vaidyanathan, Trivedi “An Approach for Estimation of Software Aging in a Web Server” IEEE: Proceedings of the 2002 International Symposium on Empirical Software Engineering.*
Vaidyanathan, Kalyanaraman, et al., “A Measurement-Based Model for Estimation of Resource Exhaustion in Operational Software Systems”, Nov. 1-4, 1999; International Symposium on Software Reliability Engineering 1999 Proceedings; pp. 84-93.
Garg, Sachin, et al., “Analysis of Software Rejuvenation Using Markov Regenerative Stochastic Petri Net”, Oct. 24-27, 1995; International Symposium on Software Reliability Engineering, IEEE, 1995, Proceedings; pp. 180-187.
Wang, Yi-Min, et al., “Checkpointing and Its Applications”, Jun. 27-
Castelli Vittorio
Harper Richard E.
Heidelberger Philip
Baderman Scott
Bonzo Bryce P.
McGinn & Gibb PLLC
Zarick, Esq. Gail H.