Multiplex communications – Fault recovery – Bypass an inoperative station
Reexamination Certificate
1998-09-04
2001-02-20
Cangialosi, Salvatore (Department: 2732)
C327S292000, C714S001000
active
06192027
ABSTRACT:
FIELD OF INVENTION
This invention relates generally to Fibre Channel Loop topology in a computer system, and more particularly to structure and method for by-passing a failed controller connected in a dual-active mode Fibre Channel Loop.
BACKGROUND OF THE INVENTION
In disc array systems, for example in JBOD (Just a Bunch Of Disks), RAID (Redundant Array of Independent Disks), or other systems having a plurality of devices, one or more controllers (for example disc controllers) are provided as interfaces between the host system and one or more devices (such as RAID disc devices). A Dual-Active system configuration provides maximum data availability and integrity between the Host system and the disk storage. During normal operation, the availability of two RAID controllers communicating with the host provides greater data transfer bandwidth. In the event of a controller failure, the failover processor provides full data availability and integrity. Fibre Channel is a high-speed I/O interface protocol that can be carried over two categories of physical layer: copper or fibre optic cable.
When two Fibre Channel disk array controllers are used in a Dual Active system configuration, it is important that the Dual Active system be able to continue normal operation even when either one of the two controllers has failed for any reason. Failure of one controller may result, for example, from a defective electronic component in the controller, or from loss of power to the controller, such as may occur if the controller power supply fails. Typically, a controller has an interface to the host system (either the I/O system host in the event that there are a plurality of I/O systems, or to an overall system host), and an interface to devices. In the discussion that follows, we will consider a host server system and a plurality of disk drives. For such a configuration, two areas are particularly problematic relative to ensuring the system's Fibre Channel Loop (FCL) resiliency during controller failure: (1) the controller's Fibre Channel (FC) Loop connection to the host servers; and (2) the controller's Fibre Channel (FC) Loop connection to the disk drives.
One possible approach to maintaining FCL resiliency is now described relative to the typical Fibre storage system 30 in FIG. 1. In this multi-hub system 30, the FC Loop resiliency problem may be somewhat solved by managing each of the controller's FC Loops by a separate external FC Hub 34, 35 for each host and/or a separate external hub 50, 51, 52, 53 for each disc channel. The external hub 34 connects host server 31 to first host port (Hport 1) 36, 38 associated with controllers 40, 41 and external hub 35 connects host server 32 to second host port (Hport 2) 37, 39 of controllers 40, 41. In like manner hubs 50, 51, 52, 53 connect disc ports (Dport 1, Dport 2, Dport 3, Dport 4) 42, 43, 44, 45, 46, 47, 48, 49 with disk drive loops 1-4 (Disk Loop 1, Disk Loop 2, Disk Loop 3, Disk Loop 4) 54, 55, 56, 57.
This configuration somewhat solves the FCL resiliency problem because conventional FC Hubs, such as hubs 34, 35, 50-53, have typically been designed to connect multiple Loop agents together within a single Loop. In this configuration, each of the Hubs 34, 35, 50-53 should recognize the failure of any of its Loop agents (i.e. Hports 36-39, Dports 42-49, or Disk Loops 54-57, or Host servers 31, 32) based on the loss of meaningful FC signal, then bypass the failed Loop agent while ensuring adequate FC signal strength and quality in order for the Loop to continue normal operations. Such normal operation should be guaranteed even if the FC is implemented with maximum standard FC cable length, copper or optical fibre.
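The bypass behaviour expected of a conventional hub can be illustrated with a minimal sketch. The class names, the `signal_ok` flag, and the agent labels below are hypothetical and used only for illustration; the model reduces "loss of meaningful FC signal" to a single boolean per port and ignores signal strength and cable length entirely.

```python
from dataclasses import dataclass, field


@dataclass
class HubPort:
    agent: str               # label of the Loop agent attached to this port
    signal_ok: bool = True   # hypothetical flag: meaningful FC signal present


@dataclass
class Hub:
    ports: list = field(default_factory=list)

    def loop_order(self):
        """Agents still in the loop, in port order, with failed ports bypassed."""
        return [p.agent for p in self.ports if p.signal_ok]

    def next_hop(self, agent):
        """Agent that receives `agent`'s output once bypasses are applied."""
        order = self.loop_order()
        i = order.index(agent)
        return order[(i + 1) % len(order)]


# Hub 34 joins host server 31 with host ports 36 and 38.
hub = Hub([HubPort("host 31"), HubPort("Hport 36"), HubPort("Hport 38")])
hub.ports[1].signal_ok = False   # the agent on port 36 stops sending a signal
```

With port 36 marked as failed, `hub.next_hop("host 31")` skips it and returns `"Hport 38"`, so the remaining agents still form a closed loop.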
In order to accomplish these requirements the FC Hub ports 36-39 should meet at least the following two criteria. First, each FC Hub port (i.e. hub ports 36-39) should be able to intelligently discriminate between FC K28.5 characters on a FC clock frequency (within certain standard predetermined voltage levels) and random signal noise, to determine the proper operation and coherency of the Loop agent. Second, each Hub (i.e. Hubs 34-35 and Hubs 50-53) should be able to sync to (that is, synchronize with) the Loop's FC signal clock phase and frequency, then re-drive the Loop's FC signal clock with adequate signal strength and quality.
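The first criterion can be sketched in software, although a real hub port would do this in hardware. The sketch below only looks for the two 10-bit encodings of the K28.5 character in a received bit string; it does not model clock frequency or voltage levels, and the function names and the `min_hits` threshold are assumptions made for illustration.

```python
# 8b/10b encodings of K28.5 for negative and positive running disparity.
K28_5 = ("0011111010", "1100000101")


def count_k28_5(bits: str) -> int:
    """Count K28.5 characters on 10-bit boundaries, trying every alignment."""
    best = 0
    for offset in range(10):
        words = [bits[i:i + 10] for i in range(offset, len(bits) - 9, 10)]
        best = max(best, sum(w in K28_5 for w in words))
    return best


def looks_coherent(bits: str, min_hits: int = 2) -> bool:
    """Treat the stream as coherent if enough K28.5 characters are seen.

    min_hits is an arbitrary illustrative threshold, not a value from the text.
    """
    return count_k28_5(bits) >= min_hits


# A stream with two K28.5 characters separated by filler characters,
# and a stream of alternating bits that contains none.
stream = ("0011111010" + "1010101010" * 3) * 2
noise = "01" * 40
```

Here `looks_coherent(stream)` is true and `looks_coherent(noise)` is false. A port that relied only on signal presence would accept both streams, which is the weakness the first criterion is meant to address.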
In order to meet these two criteria, external FC Hubs 34-35, 50-53 must be of high quality, and high-quality Hubs are, given the current state of the art, relatively expensive and bulky. Typically, each external hub would be implemented as an external enclosure measuring about 2″×12″×12″ and requiring its own AC power cable connection for operation. For example, the INTRA LINK 1000 Hub made by VIXEL of California, USA, could be used for this application. However, even if the expense and bulk of such high-quality external Hubs could be tolerated, the system 30 would still be vulnerable to certain Loop agent failure scenarios. For example, if the agent is transmitting random FC signals but has failed logically, the Hub would keep the failed agent on the FC loop, and the random incoherent transitions might eventually bring the loop down and break the connection between the rest of the loop agents. In addition, if a power loss to any of the external FC Hubs 34-35, 50-53 in the system 30 occurs, the connections of the Hub experiencing the power loss are immediately broken and the multi-hub system 30 will suffer from a single-point failure, that is, a single failure that is able to bring down the system, contrary to the intent of a dual-active system to protect against such single-point failure.
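The single-point failure caused by hub power loss can be made concrete with a small model of the host-side connections described for FIG. 1. The dictionary layout and the function name are hypothetical; only the hub, host, and controller numerals come from the description.

```python
# Each external host hub joins one host server to one host port on each
# of the two controllers 40 and 41, as described for FIG. 1.
HOST_HUB_LINKS = {
    34: {"host": 31, "controllers": {40, 41}},
    35: {"host": 32, "controllers": {40, 41}},
}


def reachable_controllers(host, powered_hubs):
    """Return the controllers a host can still reach through powered hubs."""
    reachable = set()
    for hub, link in HOST_HUB_LINKS.items():
        if hub in powered_hubs and link["host"] == host:
            reachable |= link["controllers"]
    return reachable
```

With both hubs powered, `reachable_controllers(31, {34, 35})` yields both controllers. If hub 34 alone loses power, `reachable_controllers(31, {35})` is empty: host server 31 is cut off from both controllers even though neither controller has failed.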
Therefore there is a need for a Fibre Channel Loop topology, structure, and method that provides the desired loop agent failure resiliency or redundancy without the expense of providing a separate external hub for each FC loop.
There is also a need for a Fibre Channel Loop topology, structure, and method that provides the desired loop agent failure resiliency or redundancy without the size and bulk associated with the plurality of external hubs.
There is also a need for a Fibre Channel Loop topology, structure, and method that will not experience a single-point failure in the event of a power loss to some system components; that is, there remains a need for resiliency and redundancy in the event of power failure.
SUMMARY OF THE INVENTION
The inventive structure and method provide communications loop resiliency in the event of a controller failure or problem, and are particularly useful in the environment of a dual-active ported fibre channel disk drive storage I/O server system. The failed controller is bypassed by a special circuit, referred to as a loop resiliency circuit, which detects the failed controller on the basis of its output signal and selectively routes the signal received by the failed controller from the next downstream device connected on the loop to the output of the controller, so that the next upstream device can continue to operate on the Fibre Channel (or other) loop. Embodiments of the invention provide for dual-active ported devices, that is, devices that are configured to communicate through separate pairs of input/output ports and are typically coupled to two separate controllers. By maintaining the loop coherency in spite of the failed controller, the dual-ported discs remain accessible to the other controller and hence to the system as a whole.
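The selection performed by the loop resiliency circuit can be pictured as a two-input multiplexer. The following is a behavioural sketch only, not the circuit itself: the function name, the string stand-ins for signals, and the validity check passed in as `output_is_valid` are all assumptions made for illustration.

```python
def lrc_output(controller_in, controller_out, output_is_valid):
    """Select the signal that is passed on to the next device on the loop.

    controller_in   -- signal arriving at the controller from the loop
    controller_out  -- signal the controller itself is driving
    output_is_valid -- hypothetical predicate judging the controller's output
    """
    if output_is_valid(controller_out):
        return controller_out   # healthy controller stays in the loop
    return controller_in        # failed controller is bypassed


def is_valid(signal):
    """Illustrative check: anything other than the marker 'noise' is valid."""
    return signal != "noise"
```

For a healthy controller, `lrc_output("frame A", "frame B", is_valid)` returns `"frame B"`. For a failed one, `lrc_output("frame A", "noise", is_valid)` returns `"frame A"`, so the incoming signal is forwarded unchanged and the loop stays closed.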
In one embodiment, the invention provides a device bypass circuit (loop resiliency circuit) for a fibre channel communications loop which includes a first clock recovery circuit coupled to a first input port for receiving a first device signal from a first device and for generating a first regenerated device signal; a second clock recovery circuit coupled to a second
Cangialosi Salvatore
Flehr Hohbach Test Albritton & Herbert LLP
International Business Machines - Corporation