Electrical computers and digital processing systems: memory – Storage accessing and control – Shared memory area
Reexamination Certificate
2002-08-28
2004-12-14
Moazzami, Nasser (Department: 2187)
C711S169000, C714S011000
active
06832298
ABSTRACT:
FIELD OF THE INVENTION
The present invention relates to an operation control method for computer systems of the server class, and in particular to a server system operation control method that implements high-speed processing, typified by failover processing during system problems and cloning processing during high loads, to enhance operability and reliability within the same system.
BACKGROUND OF THE INVENTION
Businesses operating on the Internet may directly lose business opportunities when their system cannot be accessed because it is down, or when response times are poor because of a sudden increase in accesses to the server system. Methods typified by failover and cloning, which improve operability, have already been proposed as techniques to shorten these time losses as much as possible.
The term “failover” refers to a method of switching from the present main system to a standby system, which takes over the processing, when a problem has occurred in the processing of the present system. The term “cloning” refers to a method used when the main system is subjected to heavy loads: when processing has backed up (been delayed) in the main system, a standby system shares a portion of the processing load.
Specific examples of these methods are described in “Sun (™) Enterprise (™) Cluster Failover” white paper issued by Sun Microsystems Inc.
The structure of the server system based on the technology of the related art is shown in FIG. 2.
In this figure, the reference numeral 202 denotes the main server system in charge of the normal processing in this system. Reference numeral 203 denotes the standby server system to take over the processing when an error has occurred in the main server 202.
Reference numeral 204 is a shared disk which is shared by the main server 202 and the standby server system 203. Reference numeral 205 is a network, such as a LAN or the Internet.
Numeral 201 is a client terminal for accessing the server system by way of the same network 205 and requesting processing.
As shown in this drawing, functions such as failover and cloning are implemented in the related art assuming the sharing of information between the main server 202 and the standby server system 203 by the shared disk 204 in a cluster type system.
The take-over processing in the server system shown in FIG. 2 is now described while referring to FIG. 3.
FIG. 3 shows the time-wise process flow, from top to bottom, of the mutual interaction among the client terminal 201, the main server system 202, the standby server system 203 and the shared disk 204, which make up the main elements in this processing.
First of all, as shown in the processing request and normal response 301, during normal operation the main server system 202 performs processing according to the processing request from the client terminal 201, and the results are sent back to the client terminal 201 as the response.
This processing is repeated as processing requests are generated from the client terminal 201.
The present server save processing 302 is also conducted during normal operation.
In preparation for a case where the main server system 202 is unable to respond to any inquiries due to problems with the hardware, the OS (operating system) or software, such that the status information in its main memory cannot be searched, the main server system 202 writes its own required status information on the shared disk 204 at a specified timing.
This process could be performed every time an event caused by a change in status occurs. However, the overhead required for accessing the disk is generally high and would reduce the processing capability of the main server system 202, so this solution is not practical.
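The trade-off between saving on every status change and saving only at a specified timing can be illustrated with a minimal Python sketch. It is not taken from the patent: the class name PeriodicCheckpointer, the JSON file standing in for the shared disk 204, and the interval parameter are all assumptions made for illustration.

```python
import json
import os


class PeriodicCheckpointer:
    """Hypothetical stand-in for save processing 302 on the main server system."""

    def __init__(self, path, interval_s):
        self.path = path              # file standing in for the shared disk 204
        self.interval_s = interval_s  # assumed minimum time between saves, in seconds
        self.last_saved_at = None
        self.writes = 0               # number of disk writes actually performed

    def on_status_change(self, status, now):
        """Called on every status change; writes only if the interval has elapsed."""
        if self.last_saved_at is not None and now - self.last_saved_at < self.interval_s:
            return False  # skipped: this change exists only in main memory
        # Write to a temporary file, then rename, so a reader never sees a partial save.
        tmp = self.path + ".tmp"
        with open(tmp, "w") as f:
            json.dump(status, f)
        os.replace(tmp, self.path)
        self.last_saved_at = now
        self.writes += 1
        return True


def load_status(path):
    """Read back the most recently saved status."""
    with open(path) as f:
        return json.load(f)
```

With an interval of 10 seconds, a change arriving 5 seconds after the last save is not written. This keeps disk overhead low, but that change then exists only in the main memory of the main server system.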
The main system operation check request (hereafter “main system operation check”) and the correct response 303 are operation monitoring processes of the main server system 202 run by the standby server system 203. These processes also run during normal operation.
The communication to check the operating status of the main server system 202 is performed at each timing specified by the standby server system 203. The main server system 202 responds to this communication with a reply that there are no errors, and a check is thereby made that the main server system 202 is operating correctly.
In the figure, 304 indicates a point where a problem has occurred in the main server system 202.
Operation 305 is an operation status check of the main server system 202 made by the standby server system 203 after the problem has occurred; it indicates that the standby server system 203 has detected the error.
The error response shows a case where there is absolutely no response or the response is delayed due to an error.
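The distinction between a correct response and an error response in the main system operation check can be sketched as follows. This is only an illustration under stated assumptions: the function name main_system_check, the probe callable, and the timeout parameter are hypothetical and do not appear in the patent, and the probe is assumed to give up on its own rather than block forever.

```python
import time


def main_system_check(probe, timeout_s, clock=time.monotonic):
    """Run one operation check and classify the reply as "normal" or "error".

    probe is a hypothetical callable that contacts the main system and
    returns True when it replies that there are no errors.
    """
    started = clock()
    try:
        replied_ok = probe()
    except Exception:
        return "error"  # absolutely no response (for example, a connection failure)
    elapsed = clock() - started
    if not replied_ok or elapsed > timeout_s:
        return "error"  # the main system reported a problem or replied too late
    return "normal"
```

The clock is passed in as a parameter so that the delayed-response case can be exercised without actually waiting.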
Operation 306, on the other hand, shows the interval after the problem occurs, from the processing request issued by the client terminal 201 until the processing of the main server system 202 is taken over by the standby server system 203, which performs the take-over processing.
Here, an error response indicates that a response is not returned within a specified time.
The standby server system 203, having detected a problem in the main server system 202 in the operation 305, commences the take-over processing as shown in operation 307. In that process, in order to restore the processing of the main server system 202, the status information stored in operation 302 by the main server system 202 on the shared disk 204 is loaded from the shared disk 204 in operation 308.
The standby server system 203 restores the processing status of the main server system 202 by using this status information. After preparing to take over the processing from the main server system 202, it completes the take-over processing in operation 309.
The standby server system 203 then starts processing as the main server system as shown by the configuration of operation 310. As a result of operation 306, it responds to reprocessing requests from the client terminal 201 and other processing requests.
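The take-over sequence of operations 307 to 310 in the related art can be summarized in a short Python sketch. The class StandbyServer, the load_saved_status callable standing in for a read of the shared disk 204, and the "processed" counter used as status information are all hypothetical, chosen only to make the sequence concrete.

```python
class StandbyServer:
    """Hypothetical model of the standby server system 203."""

    def __init__(self, load_saved_status):
        # Callable that reads the saved status from the shared disk 204 (assumption).
        self.load_saved_status = load_saved_status
        self.role = "standby"
        self.status = None

    def take_over(self):
        """Operations 307 to 309: load the saved status, restore it, finish take-over."""
        saved = self.load_saved_status()  # operation 308: load from the shared disk
        self.status = dict(saved)         # restore the main system's processing status
        self.role = "main"                # operation 310: now acting as the main system

    def handle(self, request):
        """Serve a client request; only allowed once the take-over has completed."""
        if self.role != "main":
            raise RuntimeError("standby system does not serve client requests")
        self.status["processed"] += 1
        return f"{request}:done#{self.status['processed']}"
```

In this sketch the standby system resumes from whatever status was last saved, so a reprocessing request from the client terminal continues from that saved count rather than from the state the main system held in memory when it failed.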
These methods of the related art have the following problems and are unable to meet user needs for high operability.
(1) Restoring the processing of the main server system 202 by using the standby server system 203 requires time for accessing the shared disk 204 and for performing the processing.
(2) The most recent information present in the main memory of the main server system 202 when the system problem occurred does not appear on the shared disk 204 or is impossible to load, so there are limits on how recent a status can be restored.
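Problem (2) can be made concrete with a small calculation. The sketch below assumes, purely for illustration, that saves to the shared disk happen at fixed multiples of a save interval starting at time 0 and that a status change occurring exactly at a save time is captured; the function names are hypothetical.

```python
def last_save_time(save_interval_s, failure_time_s):
    """Time of the last save at or before the failure, assuming saves at 0, T, 2T, ..."""
    return (failure_time_s // save_interval_s) * save_interval_s


def changes_lost_at_failure(change_times, save_interval_s, failure_time_s):
    """Status changes made after the last save and up to the failure.

    These changes existed only in the main memory of the main system,
    so a standby system restoring from the shared disk cannot recover them.
    """
    last_save = last_save_time(save_interval_s, failure_time_s)
    return [t for t in change_times if last_save < t <= failure_time_s]
```

For example, with a save interval of 5 seconds and a failure at 13 seconds, the last save is at 10 seconds, so a status change made at 12 seconds is lost and the standby system can restore the status only as of 10 seconds.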
The present invention therefore has the object of resolving the above-described problems and providing a system of high operability by shortening access failures and response times by means of failover, cloning, etc.
SUMMARY OF THE INVENTION
A server system operation control method of the present invention using a single shared memory type multiprocessor system made up of plural processors, a main memory device, an external memory device and a single shared main memory multiprocessor and a connection means for mutually connecting these components is characterized in that,
at least two logical units are defined with each unit made up of any number of processors and a portion of a main memory device, one logical unit is defined as a main logical unit and the other is defined as a standby logical unit; a memory segment is provided on the main memory device to be accessible from both the main logical unit and the standby logical unit, and an information storage space is provided on the memory segment to store information for take-over of control from the main logical unit to the standby logical unit; and
the main logical unit stores information required for take-over of control to the information storage space as the information is made, and
the standby logical unit searches information stored in the information storage space when a take-over re
Fujii Hiroaki
Kawashimo Tatsuya
Miki Yoshio
Takamura Akihiro
A. Marquez, Esq. Juan Carlos
Fisher Esq. Stanley P.
Hitachi , Ltd.
Moazzami Nasser
Reed Smith L.L.P.