Electrical computers and digital processing systems: multicomputer – Distributed data processing
Reexamination Certificate
1998-09-25
2001-05-22
Harrell, Robert B. (Department: 2152)
active
06237021
ABSTRACT:
FIELD OF THE INVENTION
The present invention relates generally to computer architectures, and more particularly to a method and apparatus for providing a sustained, peak performance computing architecture having concurrent memory controllers and parallel execution circuitry integrally coupled via a comprehensive pipelined control structure capable of facilitating single-cycle execution of an algorithm.
BACKGROUND OF THE INVENTION
Computer architecture generally refers to a system designer's and programmer's view of a computer, which includes parameters such as memory, instruction sets, programmable registers, interfacing signals, and other aspects relating to the internal operation of computers. The processing power driving today's computers includes large ASICs designed for mainframe computers, as well as microprocessor and microcontroller devices housed in desktop PCs.
Technological computer architecture advances have typically evolved from the recognition of computing shortcomings facing the technology of the day. Where a new architecture may have solved a problem, it often created a new one. For example, memory caching, instruction pipelining, and reduced instruction set computers have all emerged to relieve a computing bottleneck of one form or another. Advances in other technologies, such as networking and telecommunications, have also inspired changes in computer architectures, while new design, fabrication and manufacturing techniques have permitted architectural improvements. At times, computer architecture progress forges straight ahead, yet at times is diverted off course. Some of the problems facing even the most current technologies stem from the commercial need to provide comprehensive and complex computing systems capable of operating over a broad range of applications. However, this reality can have a detrimental effect on processing performance for more specific applications.
For example, the Complex Instruction Set Computer (CISC) for years dominated the architectural race. CISC architectures were driven by the prevailing view that a large instruction set was desirable. The rationale behind this view was that by adding new, more specialized instructions, program execution would be accelerated due to a reduced number of instruction fetches. While this was true, other factors were adversely affecting program execution performance, including the inherent complexity associated with CISC processors, which reduces the ability to speed up the Central Processing Unit (CPU). Furthermore, many programs executed by these CPUs are produced by compilation, which imposes a certain pattern on the utilization of the instruction set. Other factors also contributed to the realization that a better way of accomplishing greater processing speeds was needed.
Computer architecture then took a turn in an attempt to increase computing speed and performance, and Reduced Instruction Set Computers (RISC) were born. RISC processors are equipped with a restricted number of instructions and addressing modes, and the spared CPU logic is used for additional internal registers. While RISC processing certainly helped processing speeds, the technical limitations of memory held the technology back, as memory could not maintain the supply of data and instructions. Further, as speeds increased, it became more difficult to supply a full 32-bit word from memory in a single cycle, since RISC processors require more instructions to perform the same job than a CISC processor requires. Additionally, the fixed instruction format of RISC processors resulted in RISC code using more memory. These problems were addressed in part by the high-speed cache, and in some designs by multiple caches, such as an instruction cache and a data cache. Again, these solutions raised new issues, such as cache coherency.
However, there are applications that are so data-intensive that use of a CISC, or even a RISC for that matter, is extremely inefficient. For applications where very large volumes of data must be processed quickly, these general architectures simply have too much overhead. The use of programs and program memory, program counters, memory fetching, address decode, bus multiplexing, branch logic, and the like is advantageous in some applications, but inherently results in undesirable overhead for certain other computing needs. Consider, for example, a recent seismic processing task in the oil industry. The task involved taking 30 gigabytes of input data, subjecting it to over 240 teraoperations on a supercomputer, and producing approximately 194 megabytes of output data. This task took approximately 2 months of CPU time on a state-of-the-art 24-processor machine. The present invention would cut this time to approximately 12 days if implemented using FPGA technology, and to approximately 4 days using ASIC technology.
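The speedup figures quoted above can be checked with simple arithmetic. The sketch below is purely illustrative: it assumes "2 months" means roughly 60 days of CPU time, and derives the implied speedup factors and the baseline machine's sustained operation rate from the numbers in the passage.

```python
# Back-of-envelope check of the figures quoted in the passage.
# Assumption: "2 months" of CPU time is taken as roughly 60 days.
baseline_days = 60
fpga_days = 12
asic_days = 4

fpga_speedup = baseline_days / fpga_days   # implied FPGA speedup factor
asic_speedup = baseline_days / asic_days   # implied ASIC speedup factor

total_ops = 240e12                          # "over 240 teraoperations"
baseline_seconds = baseline_days * 24 * 3600
sustained_ops_per_s = total_ops / baseline_seconds  # baseline sustained rate

print(f"FPGA speedup ~ {fpga_speedup:.0f}x, ASIC speedup ~ {asic_speedup:.0f}x")
print(f"Implied sustained baseline rate ~ {sustained_ops_per_s / 1e6:.0f} Mops/s")
```

Under these assumptions the FPGA implementation corresponds to roughly a 5x speedup and the ASIC implementation to roughly 15x, while the baseline supercomputer sustained on the order of 46 million operations per second over the run.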
Furthermore, obtaining even these lengthy processing turn-around times requires state-of-the-art computing power operating at the highest available clock rates, which means high equipment costs. The present invention, on the other hand, can be used with lower-performance host computers and still provide a substantial overall increase in processing speed. The host computers used in connection with the present invention can be “commodity” components, resulting in lower host computer costs.
Therefore, it would be desirable to provide a processing architecture having cutting edge computing speeds for use in data-intensive applications. Accordingly, the present invention provides a computer architecture capable of sustaining peak performance by exploiting the parallelism in algorithms and eliminating the latencies involved in sequential machines. The present invention provides a solution to the aforementioned and other shortcomings of the prior art, and offers additional advantages and benefits over existing computer architecture technologies.
SUMMARY OF THE INVENTION
The present invention is directed to a system and method for providing a sustained, peak performance computing architecture.
In accordance with one embodiment of the invention, a hardware processing architecture for performing repeated algorithm iterations is provided, wherein each of the algorithm iterations is performed on a parallel set of algorithm input data. The architecture includes a memory arranged to store the algorithm input data in parallel, contiguous bit locations. A parallel execution module having a plurality of functional execution units is provided, wherein each of the functional execution units is configured to perform a preassigned function dictated by the algorithm on predetermined bits of each iterative parallel set of algorithm input data. A data flow module is coupled to the memory and to the parallel execution module, and is configured to replicate in hardware the control constructs and expression evaluations of the algorithm, and to distribute the input data to the plurality of functional execution units in accordance with the control constructs and expression evaluations of the algorithm.
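The embodiment above can be pictured as a memory of parallel input words, a set of functional execution units each bound to a preassigned bit field, and a data-flow stage that routes the fields to the units. The following is a minimal software model of that arrangement; the unit functions, the 16-bit word layout, and all names are hypothetical, chosen only to illustrate the structure, not taken from the patent.

```python
# Illustrative model: each functional execution unit is preassigned to a
# fixed bit field of the input word and applies one fixed function to it.
def make_unit(shift, mask, func):
    """Build a unit operating on bits (word >> shift) & mask."""
    return lambda word: func((word >> shift) & mask)

# Two hypothetical units: one squares the low byte, one negates the high byte.
units = [
    make_unit(0, 0xFF, lambda x: x * x),
    make_unit(8, 0xFF, lambda x: -x),
]

def data_flow_module(memory):
    """Distribute each parallel input word to every unit, one result set per word."""
    return [[unit(word) for unit in units] for word in memory]

memory = [0x0102, 0x0304]        # parallel sets of algorithm input data
results = data_flow_module(memory)
print(results)                    # [[4, -1], [16, -3]]
```

In the actual invention this routing and evaluation is realized in hardware rather than software, so every unit operates on its bit field concurrently within a clock cycle; the model only shows the data distribution pattern.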
In more specific embodiments, the data flow module comprises a pipeline structure facilitating algorithm outputs on each clock cycle. The pipeline structure synchronizes the arrival of input data at each of the functional execution units that would otherwise not receive the appropriate inputs at the correct time. The pipeline structure also includes an overlaying pipeline structure that pipelines each of the different functional execution units, the control structures, and the expression evaluation structures in the algorithm (i.e., all of the functional and control circuitry representing the algorithm) to facilitate an algorithm output on each clock cycle.
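The one-output-per-clock behavior described above can be sketched with a toy pipeline model: once the stages fill, a finished result emerges on every subsequent cycle. The three stage functions and all names below are hypothetical; only the fill-latency-then-one-result-per-cycle property mirrors the text.

```python
# Toy pipeline: one latch per stage, all stages advance on the same clock.
def run_pipeline(inputs, stages):
    latches = [None] * len(stages)   # one latch per pipeline stage
    outputs = []                     # (cycle, result) pairs
    stream = iter(inputs)
    for cycle in range(len(inputs) + len(stages)):
        # Result leaving the last stage this cycle, if any.
        if latches[-1] is not None:
            outputs.append((cycle, latches[-1]))
        # Shift back-to-front so each stage consumes last cycle's value.
        for i in range(len(stages) - 1, 0, -1):
            prev = latches[i - 1]
            latches[i] = stages[i](prev) if prev is not None else None
        nxt = next(stream, None)
        latches[0] = stages[0](nxt) if nxt is not None else None
    return outputs

stages = [lambda x: x + 1, lambda x: x * 2, lambda x: x - 3]
print(run_pipeline([1, 2, 3, 4], stages))
# → [(3, 1), (4, 3), (5, 5), (6, 7)]: after a 3-cycle fill, one result per cycle
```

After the initial fill latency equal to the pipeline depth, the model emits one completed computation per cycle, which is the sustained-throughput property the overlaid pipeline structure is designed to achieve in hardware.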
In accordance with another embodiment of the invention, a processing system for carrying out data-intensive computing applications is provided. The system includes at least one data server capable of outputting stored test data, and one or more host computing devices coupled to receive
Complex Data Technologies, Inc.
Harrell Robert B.
Schwegman Lundberg Woessner & Kluth P.A.