Reexamination Certificate
2010-03-01
2011-11-15
Maskulinski, Michael (Department: 2113)
Error detection/correction and fault detection/recovery
Data processing system error or fault handling
Reliability and availability
C714S047100, C714S048000
active
08060782
ABSTRACT:
Correlating activity events to identify a root cause of a process failure. Activity event data is received from a process executing on a computing device. The activity event data corresponds to a plurality of activity events. Each of the activity events has a correlation identifier, a resolution status, and an occurrence time value associated therewith. Each of the activity events is assigned to one of a plurality of event groups based on the correlation identifier of the activity event. Thereafter, at least one of the event groups is determined to have an activity event with a resolution status indicating failure of the process. One of the activity events within the determined event group is selected as a root cause activity event based on the occurrence time values. In some embodiments, the root cause activity event is identified to a user of the computing device.
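The steps in the abstract can be sketched in a few lines of Python. This is only an illustrative sketch, not the patented implementation: the `ActivityEvent` class, its field names, the "success"/"failure" status strings, and the sample data are all hypothetical. The abstract says only that the root cause is selected "based on the occurrence time values"; the sketch assumes this means the earliest failing event in a group.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class ActivityEvent:
    """Hypothetical record for one activity event."""
    name: str
    correlation_id: str   # events sharing this value belong to one group
    status: str           # assumed values: "success" or "failure"
    occurred_at: float    # occurrence time, e.g. seconds since some epoch


def group_by_correlation(events):
    """Assign each event to an event group keyed by its correlation identifier."""
    groups = defaultdict(list)
    for event in events:
        groups[event.correlation_id].append(event)
    return dict(groups)


def find_root_causes(events):
    """Return {correlation_id: root cause event} for every group with a failure.

    Assumption: the root cause is the earliest event whose status is "failure".
    """
    root_causes = {}
    for correlation_id, group in group_by_correlation(events).items():
        failures = [e for e in group if e.status == "failure"]
        if failures:
            root_causes[correlation_id] = min(failures, key=lambda e: e.occurred_at)
    return root_causes


# Made-up sample data: txn-1 fails twice, txn-2 succeeds throughout.
SAMPLE_EVENTS = [
    ActivityEvent("open_connection", "txn-1", "success", 10.0),
    ActivityEvent("send_request", "txn-1", "failure", 12.0),
    ActivityEvent("authenticate", "txn-1", "failure", 11.5),
    ActivityEvent("open_connection", "txn-2", "success", 10.2),
    ActivityEvent("authenticate", "txn-2", "success", 11.0),
]

if __name__ == "__main__":
    for cid, event in find_root_causes(SAMPLE_EVENTS).items():
        print(f"{cid}: root cause candidate is {event.name} at t={event.occurred_at}")
```

Choosing the earliest failure reflects the intuition that later failures in the same correlated group are often consequences of the first one; a different time-based rule could be substituted in `find_root_causes` without changing the grouping step.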
REFERENCES:
patent: 6072777 (2000-06-01), Bencheck et al.
patent: 6694364 (2004-02-01), Du et al.
patent: 7047291 (2006-05-01), Breese et al.
patent: 7529974 (2009-05-01), Thibaux et al.
patent: 2003/0195959 (2003-10-01), Labadie et al.
patent: 2004/0049565 (2004-03-01), Keller et al.
patent: 2008/0059839 (2008-03-01), Hamilton et al.
patent: 2009/0313508 (2009-12-01), Yan et al.
Fu et al., “Quantifying Temporal and Spatial Correlation of Failure Events for Proactive Management”, Retrieved at <<http://www.cs.nmt.edu/~song/Publications/FC-srds07.pdf>>, 26th IEEE International Symposium on Reliable Distributed Systems, 2007, pp. 175-184.
Glerum et al., “Debugging in the (Very) Large: Ten Years of Implementation and Experience”, Retrieved at <<http://www.sigops.org/sosp/sosp09/papers/glerum-sosp09.pdf>>, SOSP, Oct. 11-14, 2009, Big Sky, Montana, USA, pp. 1-17.
Kiciman, Emre, “Using Statistical Monitoring to Detect Failures in Internet Services”, Retrieved at <<http://research.microsoft.com/pubs/75094/kiciman-thesis.pdf>>, Sep. 2005, pp. 183.
Crowell, Christopher, “Event Correlation and Root Cause Analysis”, Retrieved at <<http://ca.com/files/WhitePapers/event_correlation_and_root_cause_analysis.pdf>>, Mar. 2004, pp. 15.
Maruyama et al., “Model-Based Fault Localization in Large-Scale Computing Systems”, Retrieved at <<http://matsu-www.is.titech.ac.jp/~naoya/publications/ipdps08.pdf>>, Apr. 2008, pp. 12.
Caspi Ziv
Dar Ziv Ron E.
Frenkel Itai
Orlin Yifat
Sloutsky Alexander
Maskulinski Michael
Microsoft Corporation
Root cause problem identification through event correlation