System for recovering data in a multiprocessor system...

Electrical computers and digital processing systems: support – Clock control of data processing system – component – or data...

Reexamination Certificate

Details

C709S237000, C716S030000

active

06668335

ABSTRACT:

STATEMENT REGARDING FEDERALLY SPONSORED RESEARCH OR DEVELOPMENT
Not applicable.
BACKGROUND OF THE INVENTION
1. Field of the Invention
The present invention generally relates to a computer system comprising a plurality of pipelined, superscalar microprocessors. More particularly, the invention relates to communication of data between multiple processors. More particularly still, the invention relates to the recovery of data transmitted in an asynchronous clock domain along different point to point data paths between processors.
2. Background of the Invention
It is often desirable to include multiple processors in a single computer system. This is especially true for computationally intensive applications and applications that otherwise can benefit from having more than one processor simultaneously performing various tasks. It is not uncommon for a multi-processor system to have two, four, or more processors working in concert with one another. Typically, each processor couples to at least one and perhaps three or four other processors.
Such systems usually require data and commands (e.g., read requests, write requests, etc.) to be transmitted from one processor to another. As processor and bandwidth capabilities increase, the size of the data and command packets also increases. In transmitting this information between processors, it may be desirable to deliver these data packets in contiguous form. That is, the data is preferably transmitted in parallel between respective processors. To accomplish this, a signal path between the processors must exist for each bit of information in a packet. A 32-bit long packet therefore would require 32 separate signal paths between processors.
Routing of multiple, parallel signal paths is difficult in congested printed wiring board configurations. As more components are added to circuit boards, little room is left for signal traces, especially multiple traces that are preferably parallel and of equal length. These routing difficulties exist even in multi-layer board designs. It may be difficult to guarantee that individual bits in a data packet sent at the same time from one processor will arrive at their destination at the same time because signal traces are rarely equal in length. In an extreme case, it may be desirable to intentionally divide the signal paths for a single packet into multiple branches since routing of smaller sub-branches may be easier than routing all the signal paths together. For instance, the 32-bit packet discussed above may be split into two 16-bit packets. Splitting data in this manner makes trace routing less troublesome, but raises issues of signal integrity because the separated signals must be recombined at the destination to form the original data packets. One way to help ensure the data is captured correctly is to send a clock signal with each branch of the data packet. The clock signals may be used to locate data transitions and to account for differences in path lengths between the branches of the data packet. The clock signal may be sampled at the receiver to locate clock edges and correctly extract the data. A mechanism must be created to receive data from these two 16-bit branches and recombine the data into its original, 32-bit form.
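The split and recombination described above can be illustrated with a minimal Python sketch. It is not taken from the patent: the function names `split_packet` and `recombine` are hypothetical, and the choice to route the upper 16 bits on one branch and the lower 16 bits on the other is an assumption made only for illustration.

```python
def split_packet(packet: int) -> tuple[int, int]:
    """Split a 32-bit packet into two 16-bit branches (high half, low half).

    Assumption: the upper 16 bits travel on one branch and the lower 16 bits
    on the other. The source text does not say how bits are assigned.
    """
    if not 0 <= packet <= 0xFFFFFFFF:
        raise ValueError("packet must fit in 32 bits")
    return (packet >> 16) & 0xFFFF, packet & 0xFFFF


def recombine(high: int, low: int) -> int:
    """Rebuild the original 32-bit packet from its two 16-bit branches."""
    return ((high & 0xFFFF) << 16) | (low & 0xFFFF)


packet = 0xDEADBEEF
high, low = split_packet(packet)
restored = recombine(high, low)
```

The sketch only models the bit arithmetic. It says nothing about timing, which is the harder part of the problem and the subject of the paragraphs that follow.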
The above problem is exacerbated if the data is transmitted at a clock frequency that is different from the processor's internal clock frequency. The receiver must not only recombine the data that has been split among different transmission paths, it must also read and hold the data until the processor is ready to pull the data into the internal clock domain. A number of problems may arise in accomplishing these steps. First, there is no guarantee that data that was aligned as it left the transmitting processor is still aligned when it arrives at the receiving processor. Second, if a clock signal is sent with each transmission path, there is no guarantee that the receiving processor will obtain the same result from sampling the separate clocks. For instance, in the example given above where the 32-bit packet is divided into two separate paths, the clock signals from each 16-bit group may be sampled at exactly the same time, but because of skew, different results may be obtained. Even if one could guarantee that the data in the two separate branches of the packet arrive at exactly the same time, the clock signal for one branch may be sampled before a clock edge while the other may be sampled after a clock edge. The end result may be incorrectly combined data. Third, because of the asynchronous nature of the transmitted signals, it is highly likely that, in waiting to pull captured data into the processor's clock domain, the captured data may be overwritten by incoming data. While buffers may be used to solve these timing and skew problems, unwanted latency may be induced.
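The second problem, two forwarded clocks sampled at the same instant yielding different values, can be reproduced with a small idealized model. The following Python sketch is an illustration only: the helper `clock_level`, the period, the skew value, and the sample times are all assumptions, and the clocks are modeled as ideal square waves with a 50% duty cycle.

```python
def clock_level(t: float, period: float, skew: float = 0.0) -> int:
    """Level of an ideal 50% duty-cycle square wave at time t.

    The wave is high during the first half of each period. `skew` delays the
    whole waveform, modeling a longer path for one branch (assumed values).
    """
    phase = ((t - skew) % period) / period
    return 1 if phase < 0.5 else 0


PERIOD = 6.0  # forwarded clock period, arbitrary time units (assumed)
SKEW = 0.4    # extra delay on the second branch (assumed)

# Sample both clocks at the same instant, just after clock A's rising edge.
t_near_edge = 0.2
a_near = clock_level(t_near_edge, PERIOD)        # A has already risen
b_near = clock_level(t_near_edge, PERIOD, SKEW)  # B has not risen yet

# Sample both clocks well away from any edge: the results agree.
t_mid = 1.5
a_mid = clock_level(t_mid, PERIOD)
b_mid = clock_level(t_mid, PERIOD, SKEW)
```

Near the edge the two samples disagree even though they were taken simultaneously, which is the situation the paragraph above describes. Away from the edge they agree.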
It is therefore desirable to develop a data capture scheme that successfully reconstructs and re-synchronizes data at a receiving processor. The capture scheme preferably offers reliable data transfer between processors while minimizing latency and maximizing bandwidth. The capture scheme may also indirectly improve the manufacturability of printed wiring boards and processor hardware by easing the requirements for parallel, equal-length data paths.
BRIEF SUMMARY OF THE INVENTION
The problems noted above are solved in large part by an input data recovery scheme that may be implemented in a multiprocessor system comprising a communications link configured to transmit data packets from a transmitting processor to a receiving processor. The communications link includes a conduction path for each data bit in the data packet. The conduction paths are grouped into separate bundles and routed along different paths, and a forwarded clock signal is sent with each bundle. The forwarded clock signal is transmitted on a differential pair of conduction paths. At the receiving processor, the data in the separate bundles is recombined to recreate the original data packet. The processors operate with a clock frequency that is at least three times as fast as the clock frequency of the forwarded clock signal, and data is transmitted on both rising and falling edges of the forwarded clock signal.
The receiving processor contains a recovery circuit which samples the forwarded clock signals to locate corresponding clock edges in the separate forwarded clock signals to indicate when the data on the conduction paths may be pulled into the processor clock domain. The recovery circuit includes a delay locked loop (“DLL”) circuit, a sampling circuit, a finite state machine, and data capture logic. A DLL circuit is coupled to each forwarded clock signal to create a delayed copy of the forwarded clock signal. The clock signal is delayed so that the clock edges in the delayed clock signal are aligned with the center of the data window for data transmitted with the forwarded clock signal.
The recovery circuit also includes a sampling circuit configured to sample the delayed clock signal at the processor clock frequency to locate rising and falling edges in the delayed clock signal. The sampling circuit comprises a chain of flip-flops configured to sample the delayed clock signal and generate a string of sequential samples of the clock signal. The sampling circuit also includes a bank of logic gates configured to set a bit at the output of one of the logic gates indicating that an edge transition occurs between any two of the three sequential samples. Shift registers are coupled to each logic gate and are configured to shift the output of the associated logic gate at every processor clock cycle. A multiplexer is coupled to each shift register and is configured to extract data from a bit location in the shift register as specified by a clock ratio input. This clock ratio is based on the ratio of the transmission and processor clock frequencies and also on the length of the flip-flop chain through which the clock signals are sampled. This information is used to take advantage of the periodic nature of the forwarded clock signal and allo
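The edge-location idea in the sampling circuit, taking sequential samples of the delayed clock at the processor clock rate and flagging a transition between adjacent samples, can be sketched in software. This Python model is a simplified illustration, not the circuit described in the patent: the functions `sample_clock` and `detect_edges` are hypothetical, the clock ratio is assumed to be an even integer, and sampling is assumed ideal (no metastability). The shift registers, multiplexers, and clock ratio input are not modeled.

```python
def sample_clock(period_in_proc_cycles: int, n_samples: int, phase: int = 0) -> list[int]:
    """Sample an ideal forwarded clock once per processor clock cycle.

    Assumption: the forwarded clock period is an even whole number of
    processor cycles and the clock is high for the first half of the period.
    """
    if period_in_proc_cycles % 2 != 0:
        raise ValueError("this sketch assumes an even clock ratio")
    half = period_in_proc_cycles // 2
    return [
        1 if (i + phase) % period_in_proc_cycles < half else 0
        for i in range(n_samples)
    ]


def detect_edges(samples: list[int]) -> list[str]:
    """Compare each pair of adjacent samples, as a bank of gates might.

    Returns "R" for a rising edge, "F" for a falling edge, "-" for no edge.
    """
    edges = []
    for prev, cur in zip(samples, samples[1:]):
        if prev == cur:
            edges.append("-")
        elif cur == 1:
            edges.append("R")
        else:
            edges.append("F")
    return edges


# Processor clock six times faster than the forwarded clock (assumed ratio).
samples = sample_clock(period_in_proc_cycles=6, n_samples=12)
edge_string = "".join(detect_edges(samples))
```

With these assumed values, `samples` is two full periods of three high samples followed by three low samples, and the edge markers repeat every three processor cycles, once per half period of the forwarded clock.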
