Electrical computers and digital processing systems: processing – Processing architecture – Vector processor
Reexamination Certificate
1998-06-23
2001-10-23
Chan, Eddie (Department: 2783)
Electrical computers and digital processing systems: processing
Processing architecture
Vector processor
C712S006000, C712S009000, C712S017000, C712S019000, C712S022000, C709S241000, C709S241000
Reexamination Certificate
active
06308250
ABSTRACT:
FIELD OF THE INVENTION
This invention relates generally to the field of high-performance computers, and more particularly to a method and system for operating a computer having multiple processing units.
BACKGROUND
In order to accelerate the processing of data, many high-performance computing systems are controlled by a single-instruction stream yet operate on multiple-data streams (SIMD). One such computing system is a vector computer designed to process in parallel each element of a linear array of numbers. For example, a vector machine adds two vectors by adding in parallel each entry of the two vectors to produce a single vector of the same length. A single vector instruction is therefore equivalent to a programming loop iterating from 1 to the number of elements in the vector.
Wherever possible, a specialized compiler replaces programming loops with vector instructions. Whether a compiler can vectorize a loop often depends on whether each element in a result vector can be computed independently from the other elements. When the results of one iteration of the loop depend on results of previous iterations, a data hazard exists and the loop typically cannot be “vectorized”. The extent that a program can be vectorized, however, not only depends on the existence of data hazards but on the ability of the compiler to restructure the program in order to reduce or eliminate the data dependencies. Thus, there is tremendous variation the software routines that conventional compilers can vectorize.
One class of software that conventional compilers have difficulty vectorizing is inductive loops having conditional statements. Typical examples of this class of software includes vector compression and vector expansion. In vector compression, elements of a source vector are copied to a destination vector based on some condition such as a mask bit. Thus, only a subset of the elements of the source vector are copied to the destination vector and are stored in the lower order elements of the destination vector. Because the destination of a particular source element depends on how many elements need to be copied, a data hazard exists that prevents the compression from easily being vectorized. Similar problems exist for vector expansion.
For the reasons stated above, and for other reasons stated below which will become apparent to those skilled in the art upon reading and understanding the present specification, there is a need in the art for a method and system for operating a computer having multiple processing unit that easily vectorize inductive loops having conditional statements. There is also a need for an optimizing compiler which can generate machine instructions for such a computer system.
SUMMARY OF THE INVENTION
As explained in detail below, the present invention is directed to a method and system for operating a computer having multiple processing units. According to the invention, each processing unit is loaded with at least one data value of a set of data values and commanded to generate a mask bit for each loaded data value. The processing units communicate the generated mask bits to each other such that each processing unit receives the mask bits generated by the other processing units. Finally, the processing units process each loaded data value as a function of the loaded data value's respective generated mask bit and at least one of the mask bits received from another processing unit.
According to one feature of the invention, the processing units calculate each of a plurality of iota values by summing a series of the generated mask bits. In one embodiment, each iota value is calculated by summing all mask bits having an ordinal position lower than an ordinal position of the calculated iota value. In another embodiment, each iota value is calculated by summing the mask bits having an ordinal position lower than the ordinal position of the calculated iota value and the mask bit having an ordinal position equal to the ordinal position of the calculated iota value.
According to another aspect, the present invention is a computer for processing a set of data values. The computer includes a plurality of processing units that are loaded with at least one data value of the set of data values. Each processing unit generates a mask bit for each loaded data value. A plurality of data pipes communicatively couple each processing unit such that each processing unit receives the mask bits generated by the other processing units. The processing units calculate a plurality of iota values by summing a series of the generated mask bits.
According to yet another aspect, the invention is a method of compiling a software module for processing a set of data values using a computer having a plurality of communicatively coupled processing units. The method includes generating a first machine language instruction to command the processing units to generate a mask bit for each data value of the set of data values. A second machine language instruction is generated to command the processing units to calculate an iota value for each of the generated mask bits by summing a series of the generated mask bits. In one embodiment, the first machine language instruction commands each processing unit communicate the set of mask bits to the other processing units. In another embodiment, the second machine language instruction commands each processing unit to communicate the set of mask bits. The method further includes generating a third machine language instruction to command the processing units to process the set of data values as a function of the calculated iota values.
According to another aspect, the present invention is a compiler for a computer having a plurality of communicatively coupled processing units. The compiler generates a single machine language instruction to command the computer to calculate a vector of iota values from a mask vector having a plurality of mask bits, wherein each iota value of the iota vector is calculated by summing a series of the mask bits.
These and other features and advantages of the invention will become apparent from the following description of the preferred embodiments of the invention.
REFERENCES:
patent: 3699532 (1972-10-01), Schaffer et al.
patent: 4356547 (1982-10-01), Barcaroli et al.
patent: 4858115 (1989-08-01), Rusterholz et al.
patent: 4881168 (1989-11-01), Inagami et al.
patent: 5349682 (1994-09-01), Rosenberry
patent: 5555428 (1996-09-01), Radigan et al.
patent: 5678058 (1997-10-01), Sato
patent: 5903769 (1999-05-01), Arya
patent: 5946256 (1999-08-01), Okimura
Chan Eddie
Nguyen Dzung C.
Schwegman Lundberg Woessner & Kluth P.A.
Silicon Graphics Inc.
LandOfFree
Method and apparatus for processing a set of data values... does not yet have a rating. At this time, there are no reviews or comments for this patent.
If you have personal experience with Method and apparatus for processing a set of data values..., we encourage you to share that experience with our LandOfFree.com community. Your opinion is very important and Method and apparatus for processing a set of data values... will most certainly appreciate the feedback.
Profile ID: LFUS-PAI-O-2592985