Electrical computers and digital processing systems: multicomputer – Computer-to-computer protocol implementing – Computer-to-computer data streaming
Reexamination Certificate
1998-10-27
2002-05-21
Harrell, Robert B. (Department: 2152)
active
06393485
BACKGROUND OF THE INVENTION
1. Technical Field
The present invention relates generally to a distributed data processing system and in particular to a method and apparatus for managing a server system within a distributed data processing system. Still more particularly, the present invention relates to a method and apparatus for managing a clustered computer system.
2. Description of Related Art
A clustered computer system is a type of parallel or distributed system that consists of a collection of interconnected whole computers and is used as a single, unified computing resource. The term “whole computer” in this definition means the normal combination of elements making up a stand-alone, usable computer: one or more processors, an acceptable amount of memory, input/output facilities, and an operating system. Another distinction between clusters and traditional distributed systems concerns the relationship between the parts. Modern distributed systems use an underlying communication layer that is peer-to-peer: there is no intrinsic hierarchy or other structure, just a flat list of communicating entities. At a higher level of abstraction, however, they are popularly organized into a client-server paradigm, which yields a valuable reduction in system complexity. Clusters, in contrast, typically have a peer-to-peer relationship.
Three technical trends explain the popularity of clustering. First, microprocessors are increasingly fast. The faster microprocessors become, the less important massively parallel systems become: it is no longer necessary to use supercomputers or aggregations of thousands of microprocessors to achieve suitably fast results. A second trend that has increased the popularity of clustered computer systems is the growth of high-speed communications between computers. The introduction of standardized communication facilities such as the Fibre Channel Standard (FCS), Asynchronous Transfer Mode (ATM), the Scalable Coherent Interconnect (SCI), and switched Gigabit Ethernet is raising inter-computer bandwidth from 10 Mbits/second to hundreds of Mbytes/second and even Gigabytes per second. Finally, standard tools have been developed for distributed computing. The requirements of distributed computing have produced a collection of software tools that can be adapted to managing clusters of machines. Some, such as the Internet communication protocol suite (TCP/IP and UDP/IP), are so common as to be ubiquitous de facto standards. High-level facilities built on this base, such as intranets, the Internet, and the World Wide Web, are similarly becoming ubiquitous. In addition, other tool sets for multisite administration have become common. Together, these form an effective base to tap for creating cluster software.
In addition to these three technological trends, there is a growing market for computer clusters. In essence, the market is asking for highly reliable computing; another way of stating this is that computer networks must have “high availability.” For example, if a computer is used to host a web site, its usage is not necessarily limited to normal business hours: it may be accessed around the clock, every day of the year, so there is no safe time to shut it down for repairs. A clustered computer system is useful here because if one computer in the cluster shuts down, the others in the cluster automatically assume its responsibilities until it can be repaired, with no down-time exhibited or detected by users.
Businesses need “high availability” for other reasons as well. For example, business-to-business intranet use involves connecting businesses to subcontractors or vendors. If the intranet's file servers go down, work by multiple companies is strongly affected. If a business has a mobile workforce, that workforce must be able to connect with the office to download information and messages. If the office's server goes down, the effectiveness of that work force is diminished.
A computer system is highly available when no replaceable piece is a single point of failure and, overall, it is sufficiently reliable that one can repair a broken part before something else breaks. The basic technique used in clusters to achieve high availability is failover. The concept is simple enough: one computer (A) watches over another computer (B); if B dies, A takes over B's work. Thus, failover involves moving “resources” from one node to another (a node is another term for a computer). Many different kinds of things are potentially involved: physical disk ownership, logical disk volumes, IP addresses, application processes, subsystems, print queues, the collection of cluster-wide locks in a shared-data system, and so on.
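The watchdog-style failover described above can be sketched in a few lines. This is a hypothetical illustration, not the patent's implementation: the node structure, heartbeat interval, and miss threshold are all assumptions made for the example.

```python
import time

# Hypothetical failover sketch: node A polls node B's last-heartbeat
# timestamp; once B has missed enough heartbeats, A is declared the
# owner of B's resources (IP addresses, disk volumes, processes, ...).

HEARTBEAT_INTERVAL = 1.0   # assumed seconds between expected heartbeats
MISS_THRESHOLD = 3         # assumed missed beats before declaring a node dead

class Node:
    def __init__(self, name, resources):
        self.name = name
        self.resources = list(resources)       # e.g. ["ip:10.0.0.5", "vol:/data"]
        self.last_heartbeat = time.monotonic() # updated by heartbeat()
        self.alive = True

    def heartbeat(self):
        # A live node refreshes its timestamp periodically.
        self.last_heartbeat = time.monotonic()

def watch(watcher, watched, now=None):
    """Return True if `watcher` just failed `watched` over (took its resources)."""
    now = time.monotonic() if now is None else now
    missed = (now - watched.last_heartbeat) / HEARTBEAT_INTERVAL
    if watched.alive and missed >= MISS_THRESHOLD:
        watched.alive = False
        watcher.resources.extend(watched.resources)  # A takes over B's work
        watched.resources.clear()
        return True
    return False
```

In a real cluster the timestamp would be refreshed by messages over the network rather than a local call, but the decision rule (consecutive missed heartbeats exceed a threshold) is the same shape.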
Resources depend on one another. The relationship matters because, for example, it will not help to move an application to one node when the data it uses is moved to another. It will not even help to move them both to the same node if the application is started before the necessary disk volumes are mounted. In modern cluster systems such as IBM HACMP and Microsoft “Wolfpack”, the resource relationship information is maintained in a cluster-wide data file. Resources that depend upon one another are organized as a resource group and are stored as a hierarchy in that data file. A resource group is the basic unit of failover.
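The dependency hierarchy of a resource group implies an ordering when the group is failed over: prerequisites such as disks and IP addresses must come online before their dependents such as the application. A minimal sketch, using a hypothetical group layout and Python's standard-library topological sorter rather than any real cluster's data-file format:

```python
from graphlib import TopologicalSorter  # Python 3.9+

def online_order(depends_on):
    """depends_on maps each resource to the set of resources it needs first.

    static_order() yields prerequisites before their dependents, which is
    exactly the order in which to bring resources online after a failover.
    """
    return list(TopologicalSorter(depends_on).static_order())

# Hypothetical resource group: the app needs an IP address and a mounted
# disk volume; the volume in turn needs its physical disk.
group = {
    "app":           {"ip_address", "disk_volume"},
    "disk_volume":   {"physical_disk"},
    "ip_address":    set(),
    "physical_disk": set(),
}
```

Taking the group offline on the failing node would use the reverse of this order: stop the application before unmounting its volumes.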
With reference now to the figures, and in particular with reference to FIG. 1, a pictorial representation of a distributed data processing system in which the present invention may be implemented is depicted. Distributed data processing system 100 is a network of computers in which the present invention may be implemented. Distributed data processing system 100 contains one or more public networks 101, which are the medium used to provide communications links between the various devices, client computers, and server computers connected within distributed data processing system 100. Network 100 may include permanent connections, such as Token Ring, Ethernet, 100 Mb Ethernet, Gigabit Ethernet, FDDI ring, ATM, and high-speed switches, or temporary connections made through telephone connections. Client computers 130 and 131 communicate with server computers 110, 111, 112, and 113 via public network 101.
Distributed data processing system 100 optionally has its own private communications network 102. Communications on network 102 can be carried out through a number of means: standard networks just as in 101, shared memory, shared disks, or anything else. In the depicted example, a number of servers 110, 111, 112, and 113 are connected both through the public network 101 and through private network 102. Those servers make use of the private network 102 to reduce the communication overhead resulting from heartbeating each other and running membership and n-phase commit protocols.
In the depicted example, all servers are connected to a shared disk storage device 124, preferably a RAID device for better reliability, which is used to store user application data. Data are made highly available in that when a server fails, the shared disk partition and logical disk volume can be failed over to another node so that data will continue to be available. The shared disk interconnection can be a SCSI bus, Fibre Channel, or IBM SSA. Alternatively, each server machine can also have a local data storage device 120, 121, 122, and 123. FIG. 1 is intended as an example, and not as an architectural limitation for the processes of the present invention.
Referring to FIG. 2a, Microsoft's first commercially available clustering product, the Microsoft Cluster Server (MSCS) 200, code-named “Wolfpack”, is designed to provide high availability for NT Server-based applications. The initial MSCS supports failover capability in a two-node (202, 204), shared-disk (208) cluster. Each MSCS cluster consists of one or two nodes. Each node runs its own copy of Microsoft Cluster Server. Each node also has one or more Resource Monitors that int
Chao Ching-Yun
Goal Patrick M.
McCarty Richard James
Harrell Robert B.
International Business Machines Corporation
LaBaw Jeffrey S.
Tkacs Stephen R.
Yee Duke W.