Electrical computers and digital processing systems: multicomput – Computer-to-computer data routing – Least weight routing
Reexamination Certificate
1997-11-24
2002-07-23
Courtenay, III, St. John (Department: 2151)
C709S241000
active
06424988
ABSTRACT:
The present invention relates generally to multicomputer systems, and more particularly, to a multicomputer system employing a microkernel-based serverized distributed operating system.
BACKGROUND OF THE INVENTION
Related Art
Microkernel-based operating system architectures have been employed to distribute operating system services among loosely-coupled processing units in a multicomputer system. For example, in an earlier microkernel-based “serverized” operating system, a set of modular computer software-based system servers sits on top of a minimal computer software microkernel which provides the system servers with fundamental services such as processor scheduling and memory management. The microkernel may also provide an inter-process communication facility that allows the system servers to call each other and to exchange data regardless of where the servers are located in the system. The system servers manage the other physical and logical resources of the system, such as devices, files and high level communication resources. Often, it is desirable for a microkernel to be interoperable with a number of different conventional operating systems. In order to achieve this interoperability, computer software-based system servers may be employed to provide an application programming interface to a conventional operating system.
The block diagram drawing of FIG. 1 shows an illustrative multicomputer system. The term “multicomputer” as used herein shall refer to a distributed non-shared memory multiprocessor machine comprising multiple sites. A site is a single processor and its supporting environment or a set of tightly coupled processors and their supporting environment. The sites in a multicomputer may be connected to each other via an internal network (e.g., Intel MESH™ interconnect), and the multicomputer may be connected to other machines via an external network (e.g., Ethernet network). Each site is independent in that it has its own private memory, interrupt control, etc. Sites use messages to communicate with each other. A microkernel-based “serverized” operating system is well suited to provide operating system services among the multiple independent non-shared memory sites in a multicomputer system.
An important objective in certain multicomputer systems is to achieve a single-system image (SSI) across all sites of the system. An advantage of an SSI is that, from the point of view of the user, the application developer and, for the most part, the system administrator, the multicomputer system appears to be a single computer even though it actually comprises multiple independent computer sites running in parallel and communicating with each other over a high speed interconnect. Some of the benefits of an SSI include simplified installation and administration, ease-of-use, open system solutions (i.e., fewer compatibility issues), exploitation of multisite architecture while preserving conventional APIs, and ease of scalability. Possible beneficial features of an SSI include a global naming process, global file access, distributed boot facilities and global STREAMS facilities.
In one earlier system, an SSI is provided which employs a process directory (or name space) which is distributed across multiple sites. Each site maintains a fragment of the process directory. The distribution of the process directory across multiple sites ensures that no single site is unduly burdened by the volume of message traffic accessing the directory. There are challenges in implementing a distributed process directory. For example, such a distributed process directory should be effective in implementing global atomic operations. A global atomic operation (GAO) describes a category of functions which are applied to each process in a set of processes identified in the SSI.
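The per-site fragment arrangement can be pictured with a minimal Python sketch. It assumes, for illustration only, that each fragment is simply the set of PIDs of the processes currently resident on its site. The names `DirectoryFragment` and `apply_global_operation` are hypothetical and do not come from the patent text.

```python
from typing import Callable, Dict, List, Set


class DirectoryFragment:
    """Hypothetical per-site slice of the distributed process directory."""

    def __init__(self, site: str) -> None:
        self.site = site
        self.pids: Set[int] = set()


def apply_global_operation(
    fragments: List[DirectoryFragment], op: Callable[[int], None]
) -> Dict[str, List[int]]:
    """Apply `op` to every PID in every fragment, one site at a time.

    Returns the PIDs touched on each site. This is the naive traversal:
    it makes no attempt to cope with processes that move between sites
    while the traversal is running.
    """
    touched: Dict[str, List[int]] = {}
    for fragment in fragments:
        touched[fragment.site] = []
        for pid in sorted(fragment.pids):
            op(pid)
            touched[fragment.site].append(pid)
    return touched
```

With two fragments, one call visits the processes of site A and then those of site B, which is the traversal that process migration can later disturb.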
GAOs typically are applied to a set of processes from what is often referred to as a “consistent snapshot” of the system process directory state. The processes that are operated upon by a GAO are often referred to as target processes. A consistent snapshot generally refers to a view of the directory which identifies the processes in the entire SSI at a discrete point in time. However, since process creation and process deletion events occur frequently, a process directory is a dynamic or “living” object whose contents change frequently. Therefore, the consistent snapshot rule generally is relaxed somewhat such that a consistent snapshot may contain all processes which exist both before and after the snapshot is taken. For the purposes of a GAO, it can be assumed that processes which were destroyed during a consistent snapshot were destroyed prior to it, and processes created during the consistent snapshot were created subsequent to it.
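The relaxed rule can be stated as a set relation. The sketch below is one possible reading, not a definition taken from the patent: a snapshot is acceptable if it includes every process that exists both before and after it is taken. The upper bound (nothing outside the union of the two states) is an added assumption, and the function name is hypothetical.

```python
from typing import Set


def satisfies_relaxed_snapshot(
    before: Set[int], after: Set[int], snapshot: Set[int]
) -> bool:
    """Check a snapshot against the relaxed consistency rule.

    before:   PIDs in the directory just before the snapshot starts
    after:    PIDs in the directory just after the snapshot ends
    snapshot: PIDs the snapshot actually reported
    """
    # Processes alive throughout the snapshot must be reported.
    must_include = before & after
    # Assumption: a snapshot never reports a PID that existed at neither end.
    may_include = before | after
    return must_include <= snapshot <= may_include
```

Under this reading, a process destroyed or created while the snapshot is in progress may be reported or omitted, while a process that exists throughout must be reported.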
An example of a GAO is what is referred to as sending a signal, which is a mechanism by which a process may be notified of, or affected by, an event occurring in the system. Some application program interfaces (APIs) which are provided to the programmer as part of a UNIX specification, for instance, deliver a signal to a set of processes as a group; such an API, for example, mandates that all processes that match the group criteria receive the signal. The delivery of a signal to a set of processes as a group is an example of a GAO. The processes in the group are examples of target processes.
In a multicomputer system that employs a distributed process directory, GAOs, which must be applied to multiple target processes, may have to traverse process directory fragments on multiple sites in the system. This traversal of directory fragments on different sites in search of processes targeted by an operation can be complicated by the migration of processes between sites while the GAO is still in progress. In other words, a global atomic operation and process migration may progress simultaneously. The proper application of a global atomic operation is to apply it exactly once to each target process. As processes migrate from site to site during the occurrence of a GAO, however, there arises a need to ensure that a migrating process is neither missed by a GAO nor has the GAO applied to it more than once.
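The double-application half of the problem can be reproduced in a few lines. The sketch below is illustrative only and is not the mechanism claimed by the patent: it assumes fragments are plain sets of PIDs visited in order, and it swaps in a simple per-operation "already applied" set as the guard. The function `traverse` and its parameters are hypothetical names.

```python
from typing import Callable, List, Optional, Set


def traverse(
    fragments: List[Set[int]],
    op: Callable[[int], None],
    dedupe: bool,
    after_site: Optional[Callable[[int], None]] = None,
) -> None:
    """Visit each site's fragment in order and apply `op` to its PIDs.

    dedupe:     if True, skip PIDs this operation has already handled
    after_site: optional hook run after each site is finished, used here
                to simulate a migration happening mid-operation
    """
    applied: Set[int] = set()
    for index, fragment in enumerate(fragments):
        # sorted() copies the PIDs, so the fragment may change safely later.
        for pid in sorted(fragment):
            if dedupe and pid in applied:
                continue
            applied.add(pid)
            op(pid)
        if after_site is not None:
            after_site(index)
```

If PID 2 moves from the first fragment to the second after the first site has been visited, the traversal without the guard applies the operation to PID 2 twice, and the guarded traversal applies it once. The guard does nothing for the opposite case, in which a process moves into a fragment that has already been visited and is missed.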
The problem of a GAO potentially missing a migrating process will be further explained through an example involving the global getdents (get directory entries) operation. The getdents operation is used to obtain a “consistent snapshot” of the system process directory. The getdents operation is a global atomic operation. The timing diagram of FIG. 2 illustrates the example. At time=t, process manager server “A” (PM A) on site A initiates a migration of a process from PM A on site A to the process manager server “B” (PM B) on site B (dashed lines). This process migration involves the removal of the process identification (PID) for the migrating process from the process directory fragment on site A and the insertion of the PID for the migrating process into the process directory fragment on site B. Meanwhile, also at time=t, an object manager server (OM) has broadcast a getdents request to both PM A and PM B. At time=t1, PM B receives and processes the getdents request and returns the response to the OM. This response by PM B does not include a process identification (PID) for the migrating process which has not yet arrived at PM B. At time=t2, PM B receives the migration request from PM A. PM B adds the PID for the migrating process to the directory fragment on site B and returns to PM A a response indicating the completion of the process migration. PM A removes the PID for the migrating process from the site A directory fragment. At time=t3, PM A receives and processes the getdents request and returns the response to the OM. This response by PM A does not include the PID for the migrating process since that process has already migrated to PM B on site B. Thus, the global getdents operation missed the migrating process which was not yet represented by a PID in the site B directory fragment when PM B process
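The FIG. 2 timeline can be replayed as a small single-threaded simulation. This is a sketch under stated assumptions: directory fragments are modeled as Python sets, message delivery is reduced to the order of statements, and the PIDs (100, 101, 200) and the function name are invented for illustration.

```python
from typing import Set, Tuple


def simulate_getdents_race() -> Tuple[Set[int], Set[int]]:
    """Replay the t1, t2, t3 ordering described for FIG. 2.

    Returns (snapshot seen by the OM, PIDs actually in the directory).
    """
    migrating_pid = 101                  # hypothetical PID of the migrating process
    fragment_a = {100, migrating_pid}    # site A directory fragment
    fragment_b = {200}                   # site B directory fragment

    # t1: PM B answers getdents before the migration request arrives.
    response_b = set(fragment_b)

    # t2: PM B receives the migration request and inserts the PID;
    #     PM A then removes the PID from its own fragment.
    fragment_b.add(migrating_pid)
    fragment_a.discard(migrating_pid)

    # t3: PM A answers getdents after the process has left site A.
    response_a = set(fragment_a)

    # The OM merges both responses into its view of the directory.
    snapshot = response_a | response_b
    actual = fragment_a | fragment_b
    return snapshot, actual
```

In this replay the merged snapshot omits PID 101 even though the process exists before, during and after the operation, which is the miss the example describes.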
Courtenay III St. John
Rode Lise A.
Starr Mark T.
Unisys Corporation
Woodcock & Washburn LLP