Error detection/correction and fault detection/recovery – Pulse or data error handling – Digital data error correction
Reexamination Certificate
2000-07-12
2003-07-29
Tu, Christine T. (Department: 2133)
C714S799000
active
06601210
ABSTRACT:
FIELD OF THE INVENTION
The present invention relates generally to systems and methods for high-speed data transfer, and specifically to methods and devices for verifying the integrity of data and detecting data faults in such a system.
BACKGROUND OF THE INVENTION
Error-checking is an integral element of high-reliability data transfer systems. To ensure reliable operation, it is necessary to check all data and control paths for bit errors, which may occur due to noise, cosmic rays or other causes. One of the most common, standard methods for error checking is based on Cyclic Redundancy Codes (CRCs), which are computed by applying a predefined polynomial to each block of transmitted data. Typically, before sending a data packet over a network or other communication link, the transmitting device, such as a source node, computes the CRC for the bits in the packet and appends it to the packet. Upon receiving the packet, the receiving device recomputes the CRC and compares it to the transmitted CRC. A discrepancy indicates that an error has occurred.
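The append-and-recheck cycle described above can be sketched as follows. This is a minimal illustration, not the patent's circuit: it assumes the CRC-32 polynomial implemented by Python's `zlib.crc32` as a stand-in for whatever polynomial a given network uses, and the function names `append_crc` and `crc_matches` and the 4-byte big-endian trailer layout are hypothetical.

```python
import zlib

CRC_LEN = 4  # assumed trailer size: one 32-bit CRC


def append_crc(payload: bytes) -> bytes:
    """Transmitter side: compute the CRC over the payload and append it."""
    crc = zlib.crc32(payload)
    return payload + crc.to_bytes(CRC_LEN, "big")


def crc_matches(packet: bytes) -> bool:
    """Receiver side: recompute the CRC and compare it to the transmitted one."""
    payload, trailer = packet[:-CRC_LEN], packet[-CRC_LEN:]
    return zlib.crc32(payload) == int.from_bytes(trailer, "big")


packet = append_crc(b"example payload")

# Simulate a single bit error on the link by flipping one bit of the first byte.
corrupted = bytes([packet[0] ^ 0x01]) + packet[1:]

print(crc_matches(packet))     # the intact packet passes the check
print(crc_matches(corrupted))  # the corrupted packet shows a discrepancy
```

A mismatch tells the receiver only that the packet is bad, not where along the route the error was introduced, which is the limitation the following paragraphs address.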
In switch fabrics and other complex networks, it is generally desirable not only to detect erroneous packets, but also to determine at what point the error in the packet was introduced, in order to take remedial action. Since a packet typically makes multiple hops through the network between its source and its destination, multiple CRC checks are required. It is therefore necessary to check the CRC at every input of every device in the network. An approach of this sort has been adopted in the “InfiniBand” network architecture, which has been advanced by a consortium led by a group of industry leaders (including Intel, Sun, Hewlett Packard, IBM, Compaq, Dell and Microsoft). This approach does not provide a complete solution to the problem of error location, however, since the CRC check at the device input is incapable of distinguishing between network link errors and device errors that occurred in devices along the packet's path.
FIG. 1 is a block diagram that schematically illustrates a switching device 20, as is known in the art, offering a solution to the problem of error location. Device 20 comprises a plurality of input circuits 22 and output circuits 26, interconnected by a switching core 24. Each of the input circuits comprises a CRC checker 30, typically a logic circuit, which computes the CRC of the bits in each received packet and compares it to the CRC transmitted with the packet by the preceding device on the packet's route. In order to distinguish internal errors from external errors, a parity generator 32 computes a parity bit and attaches it to each unit of data to be conveyed through the device. The data, together with the associated parity bits, are then stored in a buffer 34 while awaiting the attention of switching core 24. Output circuits 26 may include additional or alternative data buffers, as is known in the art.
A parity checker 40 recomputes the parity of each data unit passed to output circuit 26 by switching core 24. A discrepancy in the parity bit indicates that an error has occurred in data storage or transfer within device 20. Routing logic 42 modifies the header of the data packet, typically in order to indicate the source and destination that the packet is to take over its next hop. A CRC calculator 44 computes a new CRC for the packet, reflecting the change in the packet header, and the packet is then transmitted to the next device over the network. The separation of CRC and parity functions enables device 20 to distinguish internal from external data errors. Addition of the parity bit, however, requires additional data lines and storage capacity inside the device. When the data unit size is one byte, this overhead is greater than 10%.
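The overhead figure follows from simple arithmetic: one parity bit for every 8 data bits is 1/8 = 12.5% extra storage and wiring. The sketch below illustrates the per-unit parity scheme and that ratio. It assumes even parity, which the text does not specify, and the helper names are hypothetical.

```python
def even_parity_bit(byte: int) -> int:
    """Return the bit that makes the total count of 1 bits even (assumed convention)."""
    return bin(byte & 0xFF).count("1") % 2


def parity_ok(byte: int, parity: int) -> bool:
    """Checker side: recompute the parity and compare it to the stored bit."""
    return even_parity_bit(byte) == parity


def parity_overhead(data_bits: int) -> float:
    """Fraction of extra bits when one parity bit accompanies each data unit."""
    return 1 / data_bits


# Generator side: attach a parity bit to every byte before buffering.
stored = [(b, even_parity_bit(b)) for b in b"\x0f\xa5\x01"]

# A single bit flipped inside the device is caught by the parity checker.
byte, parity = stored[0]
flipped = byte ^ 0x10

print(parity_overhead(8))          # 0.125, i.e. 12.5% for one-byte units
print(parity_ok(byte, parity))     # True
print(parity_ok(flipped, parity))  # False
```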
SUMMARY OF THE INVENTION
It is an object of the present invention to provide improved devices and methods for verifying data integrity in a data transmission system.
It is a further object of some aspects of the present invention to provide devices and methods for distinguishing between internal device errors and external link errors, with reduced overhead relative to approaches known in the art.
In preferred embodiments of the present invention, a switching device in a network comprises CRC checking logic both in its input circuits and in its output circuits. For each block of data passing through the device, typically in the form of a data packet, the CRC is thus checked twice: once as it enters the device and once before it exits. A discrepancy between the entry and exit checks, in which the entry check passes but the exit check fails, indicates that a fault has occurred inside the device. When both the entry and exit checks fail, by contrast, the fault can be identified as having occurred in a link or other device before the packet reached the current device. Internal and external faults are thus distinguished without the added overhead of using a parity bit.
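The double check can be simulated in a few lines. This is only a sketch under stated assumptions: `zlib.crc32` stands in for the network's CRC, the packet carries a 4-byte trailer, and `pass_through_device` with its `internal_fault` switch is a hypothetical model of corruption in the buffer or switching core, not the patent's hardware.

```python
import zlib
from typing import Tuple


def crc_ok(packet: bytes) -> bool:
    """Recompute the CRC over the payload and compare it to the 4-byte trailer."""
    return zlib.crc32(packet[:-4]) == int.from_bytes(packet[-4:], "big")


def flip_bit(packet: bytes, index: int) -> bytes:
    """Return a copy of the packet with the lowest bit of one byte inverted."""
    return packet[:index] + bytes([packet[index] ^ 0x01]) + packet[index + 1:]


def pass_through_device(packet: bytes, internal_fault: bool) -> Tuple[bool, bool]:
    """Return (entry_ok, exit_ok) for one packet crossing the modeled device."""
    entry_ok = crc_ok(packet)         # check in the input circuit
    if internal_fault:
        packet = flip_bit(packet, 0)  # hypothetical corruption inside the device
    exit_ok = crc_ok(packet)          # check in the output circuit
    return entry_ok, exit_ok


payload = b"header+data"
good = payload + zlib.crc32(payload).to_bytes(4, "big")

print(pass_through_device(good, internal_fault=False))               # clean packet
print(pass_through_device(good, internal_fault=True))                # fault inside the device
print(pass_through_device(flip_bit(good, 2), internal_fault=False))  # fault before arrival
```

A result of `(True, False)` corresponds to an internal fault, and `(False, False)` to a fault introduced before the packet reached the device.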
Preferably, when a CRC error is detected in a packet, the packet header information is reported to a network management entity, referred to herein as a fabric manager, together with an identification of the fault as internal or external. Alternatively, only internal errors, only external errors, or neither internal nor external errors are reported in this manner. The fabric manager uses the header information to identify the route that the packet has taken through the network, so as to diagnose the fault and to take corrective action as appropriate.
While preferred embodiments are described herein with reference to particular switching devices in packet-switched networks, the principles of the present invention are equally applicable to other types of data transmission devices and systems. Furthermore, although these embodiments are based on CRCs, other methods of error checking, as are known in the art, may similarly be used.
There is therefore provided, in accordance with a preferred embodiment of the present invention, a data transmission device, including:
input circuitry, configured to receive a block of data over a network, the block containing an error-checking code, the input circuitry including input error-checking logic, which is adapted to detect a discrepancy between the data and the error-checking code and to generate a first error signal indicating whether the discrepancy was detected in the input circuitry;
output circuitry, configured to transmit the block of data, received by the input circuitry, over the network, and including output error-checking logic, which is adapted to detect the discrepancy between the data and the error-checking code and to generate a second error signal indicating whether the discrepancy was detected in the output circuitry; and
a comparator, coupled to receive and compare the first and second error signals, so as to identify a source of the discrepancy.
Preferably, the error-checking code includes a Cyclic Redundancy Code (CRC).
Further preferably, when the error signals indicate that the discrepancy was detected in the output circuitry but not the input circuitry, the comparator identifies the source of the discrepancy as being in the device. Most preferably, when the error signals indicate that the discrepancy was detected in both the input circuitry and the output circuitry, the comparator identifies the source of the discrepancy as being external to the device.
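The comparator's decision rule amounts to a small truth table over the two error signals. The sketch below is one possible rendering in software. The names `FaultSource` and `compare` are hypothetical, and the case of an input error without an output error is not addressed in the text, so it is labeled as unspecified here rather than assigned a meaning.

```python
from enum import Enum


class FaultSource(Enum):
    NONE = "no discrepancy detected"
    INTERNAL = "inside the device"
    EXTERNAL = "external to the device"
    UNSPECIFIED = "combination not covered by the text"


def compare(input_error: bool, output_error: bool) -> FaultSource:
    """Map the first (input) and second (output) error signals to a fault source."""
    if output_error and not input_error:
        return FaultSource.INTERNAL   # clean on entry, bad on exit
    if input_error and output_error:
        return FaultSource.EXTERNAL   # already bad on entry
    if not input_error and not output_error:
        return FaultSource.NONE
    return FaultSource.UNSPECIFIED    # input error only: an assumption, not in the source


for signals in [(False, False), (False, True), (True, True), (True, False)]:
    print(signals, "->", compare(*signals).value)
```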
Preferably, the data block includes a packet including routing information, and the device includes a buffer, which is coupled to receive the routing information and to transfer the routing information to a processor responsive to one or more of the error signals.
In a preferred embodiment, the input and output circuitry respectively include multiple input and output ports, and the input and output error-checking logic include error checkers associated respectively with the multiple ports, and the device includes a switching core, coupled to transfer the block of data from one of the input ports that receives the block to a designated one of the output ports.
There is also provided, in accordance with a preferred embodi
Mellanox Technologies, LTD
Woodcock & Washburn LLP
Data integrity verification in a switching network