Method for reducing tuning etch in a clock-forwarded interface

Electrical computers and digital processing systems: support – Synchronization of clock or timing signals, data, or pulses – Using delay

Reexamination Certificate

Details

C713S503000, C713S600000, C438S689000

active

06754838

ABSTRACT:

CROSS-REFERENCE TO RELATED APPLICATIONS
Not applicable.
STATEMENT REGARDING FEDERALLY SPONSORED RESEARCH OR DEVELOPMENT
Not applicable.
BACKGROUND OF THE INVENTION
1. Field of the Invention
The present invention generally relates to a computer system comprising a plurality of pipelined, superscalar microprocessors. More particularly, the invention relates to communication of data between a core logic chipset and multiple processors. More particularly still, the invention relates to the recovery of data transmitted along different point-to-point data paths between components in the chipset and the processors.
2. Background of the Invention
It is often desirable to include multiple processors in a single computer system. This is especially true for computationally intensive applications and for applications that otherwise can benefit from having more than one processor simultaneously performing various tasks. It is not uncommon for a multi-processor system to have 2 or 4 or more processors working in concert with one another. Typically, each processor couples to at least one and perhaps three or four other processors.
Such systems usually require data and commands (e.g., read requests, write requests, etc.) to be transmitted from one processor to another. As processor and bandwidth capabilities increase, the size of the data and command packets also increases. In transmitting this information between processors, it may be desirable to deliver these data packets in contiguous form. That is, the data is preferably transmitted along parallel data traces between respective processors. To accomplish this, a signal path between the processors must exist for each bit of information in a packet. A 32-bit packet therefore requires 32 separate signal paths between processors.
Many modern multi-processor systems rely on a core logic chipset to literally direct data traffic between processors and the outside world. A conventional core logic chipset includes, among other things, a memory controller and I/O interface circuitry. Older chipsets would also control cache memory, but newer designs are delegating this role to the processors to which the cache memories are connected. To improve bandwidth and reduce latency, chipsets are being designed with point-to-point, switched data transfer architectures rather than shared bus architectures. The switched architecture allows direct connection between two devices and aids performance by allowing for higher clock rates and also permits scalable bandwidth.
To take advantage of the direct, point-to-point connections between devices in the chipset, a clock forwarding technique is commonly used. In this technique (sometimes referred to as a source synchronous technique), timing signals are sent in parallel with data signals. This contrasts with the method in which the destination device samples the incoming data using an internal clock that is asynchronous to the incoming data (i.e., its rising and falling edges do not align in time with the data). In the clock-forwarding scheme, the clock and data are fully synchronized, which permits more efficient data extraction by the destination device.
Clock forwarding transmission schemes work by sampling the incoming data at the receiving device using the corresponding forwarded clock signal. The receiving device commonly employs a latch or series of latches (flip-flops) to sample the data. The latches are triggered using the forwarded clock such that the data is pulled into the receiving device at the appropriate rising or falling edge of the forwarded clock signal.
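The sampling behavior described above can be illustrated with a minimal Python sketch. It is an idealized, discrete-time model and not circuitry from the patent: the function name `sample_on_rising_edges` and the list-based waveforms are assumptions made for illustration, and real latches are also subject to setup and hold constraints that this model ignores.

```python
def sample_on_rising_edges(clock, data):
    """Return the data values captured at each rising edge of the forwarded clock.

    clock: sequence of 0/1 levels, one per time step (hypothetical waveform).
    data:  sequence of data values, one per time step, same length as clock.
    """
    if len(clock) != len(data):
        raise ValueError("clock and data must cover the same time steps")
    samples = []
    for i in range(1, len(clock)):
        # A rising edge is a 0 -> 1 transition between consecutive time steps.
        if clock[i - 1] == 0 and clock[i] == 1:
            samples.append(data[i])
    return samples


# Forwarded clock toggling every step; data changes between rising edges.
clock = [0, 1, 0, 1, 0, 1]
data = [5, 7, 7, 9, 9, 3]
print(sample_on_rising_edges(clock, data))  # prints [7, 9, 3]
```

Sampling on falling edges would only require testing for a 1 to 0 transition instead.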
The data sampling latches used in this transmission scheme require that the data be present at the input to the latch for a minimum amount of time before and after the latch is triggered by a forwarded clock edge. These minimums are referred to as the setup and hold time requirements of the latch. The setup and hold requirements, if met, guarantee that the data is sampled reliably. If either requirement is violated, the sampled signal becomes unstable and unreliable. In actual implementations of the clock-forwarding scheme, the forwarded clock must be delayed slightly to guarantee that the data arrives at the sampling latches before the corresponding clock edge arrives. This timing adjustment is referred to as clock tuning. Clock tuning is typically implemented by adding etch to the clock signal trace, which lengthens the trace and so delays the clock. If enough tuning etch is added, the setup and hold requirements of the sampling latches can be met and the data can be reliably extracted by the receiving device.
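The setup and hold check, and the clock delay that tuning etch must supply, can be sketched as follows. This is a simplified model under stated assumptions, not the patent's method: all function names are hypothetical, times are integer picoseconds, the data is assumed valid for a single contiguous window, and the 160 ps per inch propagation delay in the example is an illustrative value rather than a figure from the source.

```python
def timing_margins(data_arrival_ps, data_valid_ps, clock_edge_ps, setup_ps, hold_ps):
    """Return (setup_margin, hold_margin) in ps; a negative value is a violation.

    The data is assumed stable from data_arrival_ps until
    data_arrival_ps + data_valid_ps (assumption for this sketch).
    """
    setup_margin = (clock_edge_ps - data_arrival_ps) - setup_ps
    hold_margin = (data_arrival_ps + data_valid_ps - clock_edge_ps) - hold_ps
    return setup_margin, hold_margin


def required_clock_delay_ps(data_arrival_ps, clock_edge_ps, setup_ps):
    """Smallest extra clock delay that satisfies the setup requirement."""
    return max(0, data_arrival_ps + setup_ps - clock_edge_ps)


def tuning_etch_length_in(delay_ps, ps_per_inch):
    """Trace length (inches) that yields delay_ps at the given propagation delay."""
    return delay_ps / ps_per_inch


# Untuned clock: the edge arrives only 100 ps after the data, but setup needs 300 ps.
print(timing_margins(1000, 2000, 1100, 300, 200))   # prints (-200, 1700)

delay = required_clock_delay_ps(1000, 1100, 300)     # 200 ps of added clock delay
print(timing_margins(1000, 2000, 1100 + delay, 300, 200))  # prints (0, 1500)

# Assumed propagation delay of 160 ps per inch (illustrative, not from the source).
print(tuning_etch_length_in(delay, 160))             # prints 1.25
```

Delaying the clock trades hold margin for setup margin, which is why the amount of etch has to be matched to each data group rather than simply maximized.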
The process of tuning a forwarded clock is iterative and can be cumbersome. Theoretical values for the required length of tuning etch are determined beforehand based on the length of the data paths. Computer-aided design (CAD) designers can add etch to the forwarded clock traces, but tests must be run on actual hardware to determine whether more or less tuning etch is needed for a given data group. The designs are then altered in the CAD database and the process is repeated. The tuning process is therefore time consuming, tedious, and error-prone.
Modern core logic chipsets include a number of devices, each capable of transmitting data to and from a processor. For example, the Compaq 21264 Alpha processor has employed a core logic chipset that includes ASIC chips capable of transmitting 64-bit data bundles to four separate processors. Transmitting 64 bits of data in parallel can monopolize a large amount of real estate on a system board or motherboard. In many cases, the data bundles are separated into sub-bundles to allow for more efficient use of board space. In such cases, each sub-bundle is transmitted with its own forwarded clock to guarantee reliable data transmission.
One drawback to separating the data bundles into sub-bundles is that each forwarded clock must be tuned individually. Since the routing path for each sub-bundle will invariably be unique, the amount of tuning etch needed for each forwarded clock will be different. In a multi-processor system, this problem quickly grows into an enormous task. If a 64-bit data bundle is separated into 8 sub-bundles and the system has four processors, there are 32 separate forwarded clock traces that must be individually tuned. This number is effectively doubled when the forwarded clocks associated with data transmission in the opposite direction are also considered. This tuning consumes not only a large amount of board real estate but also time and money.
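The arithmetic in the example above can be checked with a short sketch. The function name and parameters are hypothetical; the only assumption is that each sub-bundle carries exactly one forwarded clock, as the text describes.

```python
def forwarded_clock_count(sub_bundles_per_bundle, processors, both_directions=False):
    """Number of forwarded clock traces that must be tuned individually.

    Assumes one forwarded clock per sub-bundle and one data bundle per processor.
    """
    count = sub_bundles_per_bundle * processors
    return 2 * count if both_directions else count


bundle_width_bits = 64
sub_bundles = 8
print(bundle_width_bits // sub_bundles)                       # prints 8 (bits per sub-bundle)
print(forwarded_clock_count(sub_bundles, processors=4))        # prints 32
print(forwarded_clock_count(sub_bundles, 4, both_directions=True))  # prints 64
```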
Another consideration in a clock-forwarding transmission scheme relates to skew problems. Since board area is needed to allow for etch tuning of the forwarded clocks, the most direct data path from source to destination is not always used. This results in skew between the data sub-bundles. That is, the data sub-bundles arrive at their destination at different times. This creates latency delays due to the additional time required for the receiving device to reconstruct the original data bundle from the sub-bundles.
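The effect of skew on latency can be made concrete with a small sketch. The names and the arrival times below are invented for illustration; the model simply assumes the receiving device cannot reconstruct the bundle until its last sub-bundle has arrived.

```python
def bundle_skew_ps(arrival_times_ps):
    """Skew between the earliest and latest sub-bundle arrivals, in ps."""
    return max(arrival_times_ps) - min(arrival_times_ps)


def bundle_ready_time_ps(arrival_times_ps):
    """Time at which the full bundle can be reconstructed (latest arrival)."""
    return max(arrival_times_ps)


# Hypothetical arrival times for 8 sub-bundles routed over unequal paths.
arrivals = [1000, 1040, 1010, 1250, 1020, 1005, 1100, 1030]
print(bundle_skew_ps(arrivals))        # prints 250
print(bundle_ready_time_ps(arrivals))  # prints 1250
```

In this model a single slow sub-bundle sets the latency of the whole bundle, regardless of how early the others arrive.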
It is desirable, therefore, to develop a data transmission scheme that reduces the quantity of tuning etch required to reliably sample data at a receiving device. The transmission scheme preferably offers reliable data transfer between devices while minimizing latency and skew and maximizing bandwidth. The transmission scheme may also indirectly improve the manufacturability of printed wiring boards and processor hardware by easing the requirements for parallel, equal-length data paths. Design times may also be advantageously reduced by eliminating much of the iterative process required in tuning forwarded clock paths.
BRIEF SUMMARY OF THE INVENTION
The problems noted above are solved in large part by a clock forwarding scheme for use in a system comprising a plurality of communications links, each link configured to transmit data packets from a transmitting device to a receiving device. Each communications link includes a conduction path for each data bit in the data packet and at least one
