Electrical computers and digital processing systems: processing – Processing architecture – Long instruction word
Reexamination Certificate
2000-10-02
2002-09-03
Coleman, Eric (Department: 2183)
Electrical computers and digital processing systems: processing
Processing architecture
Long instruction word
C712S021000
Reexamination Certificate
active
06446191
ABSTRACT:
FIELD OF THE INVENTION
For any Single Instruction Multiple Data stream (SIMD) machine with a given number of parallel processing elements, there will exist algorithms which cannot make efficient use of the available parallel processing elements, or, in other words, the available computing resources. Multiple Instruction Multiple Data stream (MIMD) class machines execute some of these algorithms with more efficiency but require additional hardware to support a separate instruction stream on each processor and lose performance due to communication latency with tightly coupled program implementations. The present invention addresses a better machine organization for execution of these algorithms that reduces hardware cost and complexity while maintaining the best characteristics of both SIMD and MIMD machines and minimizing communication latency. The present invention provides a level of MIMD computational autonomy to SIMD indirect Very Long Instruction Word (iVLIW) processing elements while maintaining the single thread of control used in the SIMD machine organization. Consequently, the term Synchronous-MIMD (SMIMD) is used to describe the invention.
BACKGROUND OF THE INVENTION
There are two primary parallel programming models, the SIMD and the MIMD models. In the SIMD model, there is a single program thread which controls multiple processing elements (PEs) in a synchronous lock-step mode. Each PE executes the same instruction but on different data. This is in contrast to the MIMD model where multiple program threads of control exist and any inter-processor operations must contend with the latency that occurs when communicating between the multiple processors due to requirements to synchronize the independent program threads prior to communicating. The problem with SIMD is that not all algorithms can make efficient use of the available parallelism existing in the processor. The amount of parallelism inherent in different algorithms varies leading to difficulties in efficiently implementing a wide variety of algorithms on SIMD machines. The problem with MIMD machines is the latency of communications between multiple processors leading to difficulties in efficiently synchronizing processors to cooperate on the processing of an algorithm. Typically, MIMD machines also incur a greater cost of implementation as compared to SIMD machines since each MIMD PE must have its own instruction sequencing mechanism which can amount to a significant amount of hardware. MIMD machines also have an inherently greater complexity of programming control required to manage the independent parallel processing elements. Consequently, levels of programming complexity and communication latency occur in a variety of contexts when parallel processing elements are employed. It will be highly advantageous to efficiently address such problems as discussed in greater detail below.
SUMMARY OF THE INVENTION
The present invention is preferably used in conjunction with the ManArray architecture various aspects of which are described in greater detail in U.S. patent application Ser. No. 08/885,310 filed Jun. 30, 1997, now U.S. Pat. No. 6,023,753, U.S. Ser. No. 08/949,122 filed Oct. 10, 1997, now U.S. Pat. No. 6,167,502, U.S. Ser. No. 09/169,255 filed Oct. 9, 1998, now U.S. Pat. No. 6,343,356, U.S. Ser. No. 09/169,256 filed Oct. 9, 1998 now U.S. Pat. No. 6,167,501 and U.S. Ser. No. 09/169,072 filed Oct. 9, 1998, now U.S. Pat. No. 6,219,776, Provisional Application Serial No. 60/067,511 entitled “Method and Apparatus for Dynamically Modifying Instructions in a Very Long Instruction Word Processor” filed Dec. 4, 1997, Provisional Application Serial No. 60/068,021 entitled “Methods and Apparatus for Scalable Instruction Set Architecture” filed Dec. 18, 1997, Provisional Application Serial No. 60/071,248 entitled “Methods and Apparatus to Dynamically Expand the Instruction Pipeline of a Very Long Instruction Word Processor” filed Jan. 12, 1998, Provisional Application Serial No. 60/072,915 entitled “Methods and Apparatus to Support Conditional Execution in a VLIW-Based Array Processor with Subword Execution” filed Jan. 28, 1988, Provisional Application Serial No. 60/077,766 entitled “Register File Indexing Methods and Apparatus for Providing Indirect Control of Register in a VLIW Processor”; filed Mar. 12, 1998, Provisional Application Serial No. 60/092,130 entitled “Methods and Apparatus for Instruction Addressing in Indirect VLIW Processors” filed on Jul. 9, 1998, Provisional Application Serial No. 60/103,712 entitled “Efficient Complex Multiplication and Fast Fourier Transform (FFT) Implementation on the ManArray” filed on Oct. 9, 1998, and Provisional Application Serial No. 60/106867 entitled “Methods and Apparatus for Improved Motion Estimation for Video Encoding” filed on Nov. 3, 1998, respectively, all of which are assigned to the assignee of the present invention and incorporated herein in their entirety.
A ManArray processor suitable for use in conjunction with ManArray indirect Very Long Instruction Words (iVLIWs) in accordance with the present invention may be implemented as an array processor that has a Sequence Processor (SP) acting as an array controller for a scalable array of Processing Elements (PEs) to provide an indirect Very Long Instruction Word architecture. Indirect Very Long Instruction Words (iVLIWs) in accordance with the present invention may be composed in an iVLIW Instruction Memory (VIM) by the SIMD array controller Sequence Processor or SP. Preferably, VIM exists in each Processing Element or PE and contains a plurality of iVLIWs. After an iVLIW is composed in VIM, another SP instruction, designated XV for “execute iVLIW” in the preferred embodiment, concurrently executes the iVLIW at an identical VIM address in all PEs. If all PE VIMs contain the same instructions, SIMD operation occurs. A one-to-one mapping exists between the XV instruction and the single identical iVLIW that exists in each PE.
To increase the efficiency of certain algorithms running on the ManArray, it is possible to operate indirectly on VLIW instructions stored in a VLIW memory with the indirect execution initiated by an execute VLIW (XV) instruction and with different VLIW instructions stored in the multiple PEs at the same VLIW memory address. When the SP instruction causes this set of iVLIWs to execute concurrently across all PEs, Synchronous MIMD or SMIMD operation occurs. A one-to-many mapping exists between the XV instruction and the multiple different iVLIWs that exist in each PE. No specialized synchronization mechanism is necessary since the multiple different iVLIW executions are instigated synchronously by the single controlling point SP with the issuance of the XV instruction. Due to the use of a Receive Model to govern communication between PEs and a ManArray network, the communication latency characteristic common to MIMD operations is avoided as discussed further below. Additionally, since there is only one synchronous locus of execution, additional MIMD hardware for separate program flow in each PE is not required. In this way, the machine is organized to support SMIMD operations at a reduced hardware cost while mininizing communication latency.
A ManArray indirect VLIW or iVLIW is preferably loaded under program control, although the alternatives of direct memory access (DMA) loading of the iVLIWs and implementing a section of VIM address space with ROM containing fixed iVLIWs are not precluded. To maintain a certain level of dynamic program flexibility, a portion of VIM, if not all of the VIM, will typically be of the random access type of memory. To load the random access type of VIM, a delimiter instruction, LV for Load iVLIW, specifies that a certain number of instructions that follow the delimiter are to be loaded into the VIM rather than executed. For SIMD operation, each PE gets the same instructions for each VIM address. To set up for SMIMD operation it is necessary to load different instructions at the same VIM address in each PE.
In the presently preferred embodiment, this is achieved by a m
Drabenstott Thomas L.
Morris Grayson
Pechanek Gerald G.
Revilla Juan Guillermo
Strube David Carl
BOPS, Inc.
Coleman Eric
Priest & Goldstein PLLC
LandOfFree
Methods and apparatus for efficient synchronous MIMD... does not yet have a rating. At this time, there are no reviews or comments for this patent.
If you have personal experience with Methods and apparatus for efficient synchronous MIMD..., we encourage you to share that experience with our LandOfFree.com community. Your opinion is very important and Methods and apparatus for efficient synchronous MIMD... will most certainly appreciate the feedback.
Profile ID: LFUS-PAI-O-2884502