Electrical computers and digital processing systems: processing – Processing control – Branching
Reexamination Certificate
1999-02-05
2001-07-31
Kim, Kenneth S. (Department: 2783)
C712S245000, C712S246000, C709S241000, C717S152000
active
06269440
BACKGROUND OF THE INVENTION
1. Field of the Invention
The present invention relates generally to data processing, and more particularly to a method and apparatus for increasing the speed of processing data vectors in a digital signal processor or microprocessor without requiring vector registers or a large number of registers.
2. Description of the Related Art
A data vector is a series of data elements. The concept of vector processing has been incorporated into computing systems to provide high computational throughput for many applications by performing the same series of operations on each data element or pairs of data elements.
Typically, the vector processing loops required to perform the same series of operations on each data element or pairs of data elements dominate the amount of time required to process signal processing kernels. The time required to perform these vector processing loops has been decreased in a number of ways utilizing both hardware and software. For example, software techniques include unrolling the loops, using parallel issue including reordering the instructions, and software pipelining. In hardware, zero overhead looping, parallel execution (both superscalar and instruction indicated), post-address-modifying loads and stores, vector units and vector registers, and Very Long Instruction Word (VLIW) instructions that do several of the required operations in parallel have been implemented. Although these methods increase the speed of the vector processing, they either require extra code, make the required assembly code hard to read and understand, or require extra registers that are not used except for these vector operations.
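As an illustration of one of the software techniques named above, loop unrolling, the following is a minimal sketch only. The function names `scale_add` and `scale_add_unrolled`, the unroll factor of 4, and the multiply-add kernel are all assumptions chosen for the example and do not come from the patent. The sketch also shows the "extra code" cost mentioned above: the unrolled version needs a separate remainder loop.

```python
def scale_add(x, y, a):
    """Reference loop: processes one data element per iteration."""
    out = [0] * len(x)
    for i in range(len(x)):
        out[i] = a * x[i] + y[i]
    return out


def scale_add_unrolled(x, y, a):
    """Same computation, unrolled by a (hypothetical) factor of 4."""
    n = len(x)
    out = [0] * n
    i = 0
    # Main unrolled body: four independent element updates per iteration,
    # which gives a parallel-issue machine more work to overlap.
    while i + 4 <= n:
        out[i] = a * x[i] + y[i]
        out[i + 1] = a * x[i + 1] + y[i + 1]
        out[i + 2] = a * x[i + 2] + y[i + 2]
        out[i + 3] = a * x[i + 3] + y[i + 3]
        i += 4
    # Remainder loop for vector lengths that are not a multiple of 4.
    while i < n:
        out[i] = a * x[i] + y[i]
        i += 1
    return out
```

Both functions return the same result for any pair of equal-length inputs; only the loop structure differs.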
One approach to exploiting the kind of parallelism inherent in vector processing is through the use of dynamic scheduling. Several dynamic scheduling techniques are known in the art, including superscalar, scoreboarding, and reservation stations. Reservation stations, in particular, address the problem of executing multiple iterations of a loop without changing the source code. Reservation stations work by eliminating false dependencies between the instructions of different loop iterations. When the instructions of a particular iteration are executed by a sequential issue machine, dependencies between the instructions within the iteration may block issuing of instructions in the next iteration, even though there are sufficient hardware resources and no dependencies between the current iteration and the next. Reservation stations allow an instruction to be issued and buffered at a functional unit for later execution. This frees the issue pipeline to process additional instructions and begin the next iteration before the current one is finished. Reservation stations, however, require additional hardware, are extremely complex, and make the execution time of the loop non-deterministic.
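The notion of a false dependency between loop iterations can be sketched in software. The following is a simplified model, not a description of reservation station hardware. The `Instr` record, the register names, and the `rename` helper are hypothetical. The helper also assumes every register written in the loop body is written before it is read, so no value is carried from one iteration to the next.

```python
from typing import NamedTuple


class Instr(NamedTuple):
    dest: str      # register written by the instruction
    srcs: tuple    # registers read by the instruction


def dependencies(earlier, later):
    """Classify the dependencies of `later` on `earlier`."""
    kinds = set()
    if earlier.dest in later.srcs:
        kinds.add("true")    # read after write: a real data dependency
    if later.dest in earlier.srcs:
        kinds.add("anti")    # write after read: false, caused by register reuse
    if later.dest == earlier.dest:
        kinds.add("output")  # write after write: false, caused by register reuse
    return kinds


def rename(body, suffix):
    """Give each register written in the body a fresh name for one iteration.

    Assumes no loop-carried values (see the lead-in paragraph).
    """
    written = {ins.dest for ins in body}

    def fresh(reg):
        return reg + suffix if reg in written else reg

    return [Instr(fresh(i.dest), tuple(fresh(s) for s in i.srcs)) for i in body]


# One loop iteration: r1 = load(p); r2 = r1 * k
body = [Instr("r1", ("p",)), Instr("r2", ("r1", "k"))]
next_iteration = rename(body, "_b")
```

With shared register names, the first instruction of the next iteration has an anti dependency on the second instruction of the current one, because it overwrites `r1` while `r1` is still being read. After renaming, that dependency disappears and the two iterations can overlap.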
Thus, there exists a need for an apparatus and method that increase the speed of data vector processing by processing multiple data elements at the same time, that can be programmed in readable assembly language, and that do not require many extra registers.
SUMMARY OF THE INVENTION
The present invention overcomes the problems associated with the prior art and provides an apparatus and method that speeds the processing of data vectors using a zero overhead loop with parallel issue and post-address-modifying loads and stores that processes multiple data elements at the same time, and yet is programmed with readable assembly language and does not require a lot of extra registers.
In accordance with the present invention, the loop instructions are formed as producer-consumer instructions, i.e., the result of instruction M is used only by instruction M or M+1, and the results are stored into different registers. A compiler or assembler detects the producer-consumer loops, reassigns registers to meet the different result criteria, and encodes the zero overhead loop as a vector zero overhead (vdo) loop. Since the loop analysis is done in software, there is no additional hardware required to detect it. Also, since general purpose registers are used, there is no need for vector registers. Furthermore, since only register assignments and the zero overhead loop instruction are changed to a vector zero overhead loop instruction, the readability of the assembly code is maintained.
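The producer-consumer criterion stated above can be sketched as a simple check over a loop body. This is one possible reading of the criterion, not the detection algorithm of the patented compiler or assembler. The `Instr` record, the function name `is_producer_consumer`, and the register names are hypothetical. A use of a result by any instruction other than M or M+1, including an earlier instruction in the following iteration, is treated here as a violation.

```python
from typing import NamedTuple, Optional


class Instr(NamedTuple):
    dest: Optional[str]  # register written, or None (e.g. a store)
    srcs: tuple          # registers read


def is_producer_consumer(body):
    """Return True if the loop body meets both stated conditions.

    1. Results are stored into different registers.
    2. The result of instruction M is used only by instruction M or M+1.
    """
    dests = [ins.dest for ins in body if ins.dest is not None]
    if len(dests) != len(set(dests)):
        return False  # two instructions write the same register
    for m, producer in enumerate(body):
        if producer.dest is None:
            continue
        for n, consumer in enumerate(body):
            if producer.dest in consumer.srcs and n not in (m, m + 1):
                return False  # result consumed too far from its producer
    return True


# load -> multiply -> store: each result feeds only the next instruction.
good = [Instr("r1", ("p",)), Instr("r2", ("r1", "k")), Instr(None, ("r2", "q"))]
# r1 is produced by instruction 0 but consumed by instruction 2.
bad = [Instr("r1", ("p",)), Instr("r2", ("k",)), Instr(None, ("r1", "r2"))]
# Both instructions write r1, so the results are not in different registers.
reused = [Instr("r1", ("p",)), Instr("r1", ("r1",))]
```

In this sketch `is_producer_consumer(good)` holds, while `bad` and `reused` are rejected. A real tool would reassign registers to satisfy the first condition rather than reject the loop outright, as the paragraph above describes.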
These and other advantages and features of the invention will become apparent from the following detailed description of the invention which is provided with the accompanying drawings.
Fernando John S.
Lemmon Frank T.
Whalen Shaun P.
Agere Systems Guardian Corp.
Dickstein, Shapiro, Morin & Oshinsky, LLP
Kim Kenneth S.