Error detection/correction and fault detection/recovery – Data processing system error or fault handling – Reliability and availability
Reexamination Certificate
2011-08-23
2011-08-23
Guyton, Philip (Department: 2113)
C714S004110, C714S004120, C714S004210, C714S013000, C714S047100
active
08006124
ABSTRACT:
Provided are a large-scale cluster monitoring system and a method for automatically building and restoring it. The monitoring system can be built automatically, and the monitoring environment can be rebuilt automatically when a node fails. The large-scale cluster monitoring system includes a CM server, a DB server, GM nodes, NA nodes, and a DB agent. The CM server manages the nodes in a large-scale cluster system. The DB server stores monitoring information, that is, the state information of the nodes in each group. Each GM node collects the monitoring information for the nodes in its group and stores it in the DB server. Each NA node accesses the CM server to obtain GM node information, collects the state information of the nodes in its group, and transfers that state information to the corresponding GM node. The DB agent monitors the per-group node monitoring information stored in the DB server to detect a possible node failure.
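The data flow described in the abstract can be illustrated with a minimal sketch. This is not the patented implementation: the class names, method names, and the staleness rule used by the DB agent (a node is suspected when its last record is older than `max_age` seconds) are all assumptions made for illustration, and the automatic build/restore step is not modeled.

```python
from dataclasses import dataclass, field


@dataclass
class DBServer:
    """Stores the latest monitoring record per (group, node)."""
    records: dict = field(default_factory=dict)

    def store(self, group, node, state, timestamp):
        self.records[(group, node)] = (state, timestamp)


@dataclass
class GMNode:
    """Group master: collects state for its group and stores it in the DB."""
    group: str
    db: DBServer

    def receive(self, node, state, timestamp):
        self.db.store(self.group, node, state, timestamp)


@dataclass
class CMServer:
    """Cluster manager: knows which GM node serves each group."""
    gm_by_group: dict = field(default_factory=dict)

    def register_gm(self, gm):
        self.gm_by_group[gm.group] = gm

    def lookup_gm(self, group):
        return self.gm_by_group[group]


@dataclass
class NANode:
    """Node agent: asks the CM server for its GM node, then reports state."""
    name: str
    group: str

    def report(self, cm, state, timestamp):
        gm = cm.lookup_gm(self.group)
        gm.receive(self.name, state, timestamp)


class DBAgent:
    """Hypothetical detector: flags nodes whose last record is stale."""

    def __init__(self, db, max_age):
        self.db = db
        self.max_age = max_age  # assumed threshold, in seconds

    def suspected_failures(self, now):
        return sorted(
            key for key, (_state, ts) in self.db.records.items()
            if now - ts > self.max_age
        )


# Usage: two nodes report at t=100; only node-a reports again at t=160.
db = DBServer()
cm = CMServer()
cm.register_gm(GMNode(group="g1", db=db))
a = NANode("node-a", "g1")
b = NANode("node-b", "g1")
a.report(cm, {"cpu": 0.2}, timestamp=100)
b.report(cm, {"cpu": 0.9}, timestamp=100)
a.report(cm, {"cpu": 0.3}, timestamp=160)

agent = DBAgent(db, max_age=30)
print(agent.suspected_failures(now=170))  # [('g1', 'node-b')]
```

Timestamps are passed in explicitly so the example is deterministic; a real deployment would presumably use wall-clock time and network transport between the components.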
REFERENCES:
patent: 6088727 (2000-07-01), Hosokawa et al.
patent: 6594786 (2003-07-01), Connelly et al.
patent: 6718486 (2004-04-01), Roselli et al.
patent: 6983317 (2006-01-01), Bishop et al.
patent: 7287180 (2007-10-01), Chen et al.
patent: 7447940 (2008-11-01), Peddada
patent: 7480816 (2009-01-01), Mortazavi et al.
patent: 2007/0206611 (2007-09-01), Shokri et al.
patent: 2008/0201470 (2008-08-01), Sayama
patent: 2003-0051930 (2003-06-01), None
patent: 1020050066133 (2005-06-01), None
Xue et al. “AOCMS: An Adaptive and Scalable Monitoring System For Large-Scale Clusters.” Proc. of the 2006 IEEE Asia-Pacific Conf on Services Computing. Dec. 2006.
Park et al. “The Cluster Monitoring and Controlling Method with Scalable Communication Framework.” Proc of the Eighth Intl Conf on High-Performance Computing in Asia-Pacific Region. 2005.
Matthew L. Massie et al., “The ganglia distributed monitoring system: design, implementation, and experience”, Parallel Computing 30 (2004), pp. 817-840.
Jeong Jin-Hwan
Kim Chang-Soo
Kim Hag-Young
Lee Yong-Ju
Park Choon-Seo
Electronics and Telecommunications Research Institute
Guyton Philip
Staas & Halsey, LLP