Basic cell for N-dimensional self-healing arrays


Reexamination Certificate


Details

C714S003000, C714S012000


active

06789212

ABSTRACT:

BACKGROUND
1. Field of the Invention
The present invention relates generally to repairable processor arrays and more particularly to an automatically repairable chain of processors.
2. Description of Related Art
Multi-processor arrays may contain millions, possibly even billions, of transistors. With such large numbers, the likelihood that at least one individual transistor fails may be non-negligible.
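The scale argument can be made concrete. Assuming, purely for illustration, that each of n transistors fails independently with the same probability p, the chance that at least one fails is 1 - (1 - p)^n, which grows quickly with n. The sketch below computes this; the function name `prob_any_failure` and the sample value of p are hypothetical and are not taken from the patent.

```python
import math


def prob_any_failure(n_transistors: int, p_fail: float) -> float:
    """Probability that at least one of n transistors fails.

    Assumes (hypothetically) independent failures with identical
    probability p_fail. Computes 1 - (1 - p)^n via log1p/expm1 so that a
    tiny p and a huge n do not lose precision.
    """
    if not 0.0 <= p_fail <= 1.0:
        raise ValueError("p_fail must be in [0, 1]")
    if p_fail == 1.0:
        return 1.0 if n_transistors > 0 else 0.0
    return -math.expm1(n_transistors * math.log1p(-p_fail))


if __name__ == "__main__":
    # Illustrative per-transistor probability only, not data from the patent.
    for n in (10**6, 10**9):
        print(n, prob_any_failure(n, 1e-9))
```

With p = 1e-9, a million transistors give roughly a 0.1% chance of at least one failure, while a billion give roughly 63%. Real failure mechanisms are neither independent nor uniform, so this is only an order-of-magnitude intuition.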
Clearly, it is not acceptable to replace the entire array of processors because of a single failure, and in many cases it is not feasible even to replace an individual processor or element of the system, particularly if the failed processor is part of an array of many processors implemented on a single substrate. Therefore, a means of detecting failure and taking corrective action becomes increasingly important.
Multi-processor arrays have been built since the 1970s, generally with large numbers of very simple processors. Today's technology offers many ways to implement many processors on a single chip. It is expensive to manually replace individual processor chips, especially chips with hundreds of pins and chips in ball grid array (BGA) surface-mount packages. Thus extreme efforts are made to detect any failure during manufacturing and qualification. For example, ‘full-scan’ testing procedures build test circuitry into every register, almost the most expensive approach conceivable. Such circuitry allows every register to be tested for ‘stuck-at’ faults, which occur when a node that should switch between two states remains stuck at one of them. The tests are performed at various stages of manufacture, typically near the packaging stage. The farther along in manufacturing a failure is found, the more expensive it becomes, so every effort is made to eliminate failures as early as possible. By the time the system is deployed, failures have reached their maximum cost, and manually repairing or replacing such components is prohibitively expensive.
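The stuck-at idea can be illustrated with a toy model. The sketch below assumes a hypothetical `Register` whose single faulty bit ignores writes; `find_stuck_bits` drives every bit to both 0 and 1 and reports any bit that fails to follow. It deliberately omits the shift mechanics of a real scan chain and does not model any particular full-scan tool.

```python
from dataclasses import dataclass
from typing import List, Optional, Set


@dataclass
class Register:
    """Hypothetical n-bit register with at most one stuck-at fault."""
    width: int
    stuck_bit: Optional[int] = None  # index of the faulty bit, or None if healthy
    stuck_value: int = 0             # logic value the faulty bit is stuck at

    def write_read(self, pattern: List[int]) -> List[int]:
        """Write a bit pattern, then read it back; a stuck bit ignores the write."""
        readback = list(pattern)
        if self.stuck_bit is not None:
            readback[self.stuck_bit] = self.stuck_value
        return readback


def find_stuck_bits(reg: Register) -> Set[int]:
    """Drive every bit to 0 and then to 1; report bits that fail to follow.

    Because each bit is exercised at both logic values, a single stuck-at-0
    or stuck-at-1 fault is always exposed by one of the two patterns.
    """
    bad: Set[int] = set()
    for value in (0, 1):
        pattern = [value] * reg.width
        readback = reg.write_read(pattern)
        bad |= {i for i, (w, r) in enumerate(zip(pattern, readback)) if w != r}
    return bad


if __name__ == "__main__":
    print(find_stuck_bits(Register(8)))
    print(find_stuck_bits(Register(8, stuck_bit=3, stuck_value=1)))
```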
SUMMARY OF THE INVENTION
A method of testing for faults, excising faulty cells, and reconnecting an otherwise broken array of cells is described, with examples presented in terms of a basic cell architecture that supports cell excision and net healing. A cell replacement mechanism is developed, and the limiting cases of 100% and 0% replacement are considered along with their associated costs. Thus the system allows either replacement of bad cells or bypassing of bad cells, each with its own cost and operational trade-offs. Both level-sensitive and edge-sensitive excision mechanisms are described, and the consequences of each are discussed. The invention applies to processor arrays with one cell per physical chip or many cells per chip, and handles uni-directional or bi-directional data flows. The limiting cases of all uni-directional busses and all bi-directional busses are treated. The invention is generally both interface independent and technology independent. The extension to N dimensions is developed, and the two-dimensional case is presented diagrammatically.
An object of the invention is to repair a chain of processing elements, as shown in FIG. 1, without having to manually reconfigure the chain. This benefit is traded off against the minimal cost of the associated circuitry described herein and of the software recovery procedures needed to synchronize operation of the healed chain. It may also be traded off against ‘full-scan’ testing.
An apparatus in accordance with the present invention is an extended processing element having an extended upstream interface and an extended downstream interface. It includes (i) a processing cell having a cell upstream interface and a cell downstream interface, where the processing cell performs the processing operations of the extended processing element; and (ii) bypass circuitry for bypassing the processing cell, connected to at least one select line to receive a select signal, and connected between the extended upstream interface and the cell upstream interface and between the extended downstream interface and the cell downstream interface. In response to an active select signal, the bypass circuitry connects the extended upstream interface directly to the extended downstream interface; in response to an inactive select signal, it connects the cell upstream interface to the extended upstream interface and the cell downstream interface to the extended downstream interface.
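The select-line behaviour of the bypass circuitry amounts to a two-row truth table. In this sketch the `Port` names, the `bypass_connections` and `extended_element` helpers, and the treatment of data as a single uni-directional integer word are all assumptions for illustration; bi-directional buses and the level-sensitive versus edge-sensitive variants are not modelled.

```python
from enum import Enum
from typing import Callable, FrozenSet, Tuple


class Port(Enum):
    EXT_UP = "extended upstream interface"
    EXT_DOWN = "extended downstream interface"
    CELL_UP = "cell upstream interface"
    CELL_DOWN = "cell downstream interface"


def bypass_connections(select_active: bool) -> FrozenSet[Tuple[Port, Port]]:
    """Port pairs joined by the bypass circuitry for a given select signal."""
    if select_active:
        # Cell excised: the upstream neighbour talks straight to the downstream one.
        return frozenset({(Port.EXT_UP, Port.EXT_DOWN)})
    # Cell in circuit: each extended interface is joined to its cell interface.
    return frozenset({(Port.EXT_UP, Port.CELL_UP), (Port.CELL_DOWN, Port.EXT_DOWN)})


def extended_element(select_active: bool, cell: Callable[[int], int], word: int) -> int:
    """Data-flow view of the same switch for one uni-directional word."""
    return word if select_active else cell(word)


if __name__ == "__main__":
    for select in (False, True):
        pairs = sorted((a.name, b.name) for a, b in bypass_connections(select))
        print(select, pairs, extended_element(select, lambda x: x * 2, 21))
```

An active select reduces the element to a wire, which is what lets a chain route around a failed cell without any change to its neighbours.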
In a chain of extended processing elements, each element has a processing unit (cell) that carries out the processing operations of the extended processing element, and bypass circuitry that connects the processing unit to the element's extended upstream and extended downstream interfaces when the bypass circuitry is not activated, and connects the extended upstream interface directly to the extended downstream interface when the bypass circuitry is activated. The chain is formed by connecting the upstream and downstream interfaces of adjacent elements to each other. A method in accordance with the present invention includes (i) receiving information indicating that the upstream processing element requires testing; (ii) testing the upstream processing element to determine whether its cell responds correctly; and (iii) activating the bypass circuitry of the upstream processing element, connecting its upstream interface to its downstream interface, if the upstream processing element does not respond correctly.
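The three method steps can be sketched as a known-answer test applied across a chain. The test vector `TEST_INPUT`, the `expected` function, and the `Element`, `heal` and `run` names are hypothetical; the text does not specify how the test is performed, and which neighbouring element drives the test is not modelled here.

```python
from dataclasses import dataclass
from typing import Callable, List, Set

# Hypothetical known-answer test; the patent does not specify the test vector.
TEST_INPUT = 0x5A


def expected(word: int) -> int:
    """Assumed correct behaviour of every cell in this sketch."""
    return word + 1


@dataclass
class Element:
    cell: Callable[[int], int]  # processing unit (cell) of the extended element
    bypassed: bool = False      # True when the bypass select line is active

    def forward(self, word: int) -> int:
        # An active bypass joins the extended upstream and downstream
        # interfaces, so the word skips the cell entirely.
        return word if self.bypassed else self.cell(word)


def heal(chain: List[Element], needs_test: Set[int]) -> None:
    """Step (i): needs_test lists the elements flagged for testing.
    Step (ii): apply the known-answer test to each flagged cell.
    Step (iii): activate the bypass of any cell that answers incorrectly."""
    for i in sorted(needs_test):
        element = chain[i]
        if element.cell(TEST_INPUT) != expected(TEST_INPUT):
            element.bypassed = True


def run(chain: List[Element], word: int) -> int:
    """Pass a word down the chain, starting at the most upstream element."""
    for element in chain:
        word = element.forward(word)
    return word


if __name__ == "__main__":
    good = lambda x: x + 1
    faulty = lambda x: x + 2  # simulated broken cell
    chain = [Element(good), Element(faulty), Element(good)]
    heal(chain, {0, 1, 2})
    print([e.bypassed for e in chain], run(chain, 10))  # prints [False, True, False] 12
```

After healing, the three-element chain behaves like a two-element chain, which is why software recovery procedures are needed to synchronize operation of the healed chain.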
An advantage of the present invention is that a physical processor chain can be “healed” without manually excising the failed cell and manually repairing the break, saving the time and cost of a manual repair.
Another advantage is that a chain can be reconfigured by excising some elements and restoring others as needed for a particular processing task. This can save power if the unneeded elements are powered down.
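Task-driven reconfiguration can reuse the same select lines. The sketch below assumes a hypothetical policy in `configure_for_task`: faulty elements are always bypassed, the first `needed` healthy elements stay in the chain, and any remaining healthy elements are bypassed and counted as candidates for power-down. The patent text here does not prescribe a selection policy.

```python
from typing import List, Tuple


def configure_for_task(healthy: List[bool], needed: int) -> Tuple[List[bool], int]:
    """Return (select_lines, powered_down) for a chain.

    select_lines[i] is True when element i is bypassed. Faulty elements are
    always bypassed; of the healthy ones, only the first `needed` stay in the
    chain, and the rest are bypassed so they can be powered down.
    Raises ValueError if too few healthy elements remain.
    """
    if needed < 0:
        raise ValueError("needed must be non-negative")
    select: List[bool] = []
    kept = 0
    for ok in healthy:
        keep = ok and kept < needed
        if keep:
            kept += 1
        select.append(not keep)
    if kept < needed:
        raise ValueError(f"only {kept} healthy elements, {needed} required")
    # Count healthy elements that were excised only because they are not needed.
    powered_down = sum(1 for ok, bypassed in zip(healthy, select) if ok and bypassed)
    return select, powered_down


if __name__ == "__main__":
    print(configure_for_task([True, False, True, True], 2))
```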


