Single precision array processor


Filed: 1998-06-19
Issued: 2002-08-06
Examiner: Mai, Tan V. (Department: 2124)
Classification: Electrical computers: arithmetic processing and calculating – Electrical digital calculating computer – Particular function performed
Class code: C708S523000
Type: Reexamination Certificate
Status: active
Patent number: 06430589

BACKGROUND OF THE INVENTION
The present invention relates to integrated circuits and in particular, to integrated circuits with circuitry for performing single precision floating point arithmetic.
Calculations involving single precision floating point arithmetic arise in many different applications. Often, these are computation intensive applications that benefit greatly from high performance calculation. The applications include, for example, video frame generation and digital signal processing (DSP) tasks.
Many different programs are used in frame generation (see, for example, references [19]-[23]). These programs are both complex and performance critical. They are written in high level procedural and object-oriented programming languages such as C, C++ and FORTRAN. Because programming in assembly/machine language is prohibitively expensive and difficult, usually only the most performance critical portions of these programs are written directly in assembly/machine language targeting the underlying rendering engine hardware. Floating point arithmetic is popular in these programs because of its wide dynamic range and ease of programming.
The need for performance improvements is large. For example, optimal video editing requires a frame to be generated every second, and real-time virtual reality needs up to 30 frames generated per second. Meeting these needs, and similar needs in other applications, requires a tremendous improvement over current technology. At a current generation time of 30 hours per frame, the speedups needed for these two industrial applications are 108,000× for video editing (30 hrs/frame × 3600 seconds/hr, for one frame per second) and 3,240,000× for virtual reality (30 × the video editing speedup).
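The speedup figures follow directly from the 30 hours per frame baseline used in the parenthetical calculations. The short Python sketch below reproduces that arithmetic; the function name `required_speedup` is a hypothetical helper, not something defined in the source.

```python
SECONDS_PER_HOUR = 3600


def required_speedup(hours_per_frame: float, target_frames_per_second: float) -> float:
    """Speedup needed to go from `hours_per_frame` to `target_frames_per_second`."""
    current_seconds_per_frame = hours_per_frame * SECONDS_PER_HOUR
    # Dividing by the target time per frame (1 / fps) equals multiplying by fps.
    return current_seconds_per_frame * target_frames_per_second


video_editing = required_speedup(30, 1)     # one frame per second
virtual_reality = required_speedup(30, 30)  # thirty frames per second
print(f"video editing:   {video_editing:,.0f}x")    # 108,000x
print(f"virtual reality: {virtual_reality:,.0f}x")  # 3,240,000x
```

Multiplying by the frame rate instead of dividing by 1/fps keeps the result an exact integer.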
A similar situation exists in high performance Digital Signal Processing. A typical DSP application processes images, often collected over time from 2-D and 3-D sensor arrays, to construct images of the interior of materials, including the human body and machine tools. These multidimensional signal processing applications construct images from banks of ultra-sound or magnetic imaging sensors, and they have performance requirements similar to those of frame generation. Their goal is to resolve features in a reconstruction/simulation of a 3-D or 4-D environment. (Note: 4-D here means a 3-D domain observed/simulated over time.) Feature resolution is a function of the input sensor resolution, the depth of FFT analysis that can be computationally afforded within a given period of time, and the control of round-off errors and of their accumulation through the processing of the data frames.
Fine feature resolution in minimum time leads to performing millions and often billions of arithmetic operations per generated pixel or output data point. The use of floating point arithmetic to provide dynamic range control and flexible rounding error control is quite common. Algorithmic flexibility is a priority, due to the continuing software evolution and the availability of many different applications. These differing applications often require very different software.
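Rounding errors matter because they accumulate over long chains of operations, and more quickly in single precision. The Python sketch below is only an illustration, not taken from the source: it emulates IEEE 754 single precision by rounding through the standard `struct` module's `"f"` format and compares a long running sum against the same sum in double precision. The helper names and the choice of repeatedly adding 0.1 are arbitrary assumptions.

```python
import struct


def to_float32(x: float) -> float:
    """Round a Python float (an IEEE 754 double) to the nearest single precision value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def running_sum(value: float, count: int, single_precision: bool) -> float:
    """Add `value` to an accumulator `count` times.

    With single_precision=True the addend and every partial sum are rounded to
    float32, mimicking a single precision accumulator.
    """
    step = to_float32(value) if single_precision else value
    total = 0.0
    for _ in range(count):
        total += step
        if single_precision:
            total = to_float32(total)
    return total


N = 100_000
exact = N / 10  # the mathematically intended total, 10000.0
err64 = abs(running_sum(0.1, N, single_precision=False) - exact)
err32 = abs(running_sum(0.1, N, single_precision=True) - exact)
print(f"double precision accumulated error: {err64:.3e}")
print(f"single precision accumulated error: {err32:.3e}")
```

The single precision error grows as the accumulator gets larger, because each addition is rounded to a coarser spacing of representable values.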
The application software development requirements are very consistent. In particular, most applications need numerous programs, mostly written in the procedural computer programming languages, C, C++ and FORTRAN (see references [11]-[18]). Use of machine level programming is restricted to the most performance critical portions of the programs.
Typical algorithms for performing the applications discussed above have many common features: a need for large amounts of memory per processing element, often in the range of 100 MB; a need for very large numbers of arithmetic calculations per output value (pixel, data point, etc.); a need for very large numbers of calculations based upon most, if not all, input values (pixel, data point, etc.); and relatively little communication overhead compared to computational capacity.
Many of these algorithms perform calculations on several relatively short vectors and on complex numbers. For example, some algorithms calculate complex valued functions such as X=(az+b)/(cz+d), where a, b, c, d and z are all complex floating point numbers. The algorithms define a0, b0, c0, d0, z0 and X0 as the real components and, correspondingly, a1, b1, c1, d1, z1 and X1 as the imaginary components. Before the operands enter a floating point division circuit, the calculation proceeds in two multiply-accumulate passes. In the first pass, the following are calculated:
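$$
\begin{aligned}
A_0 &= a_0 \cdot z_0 - a_1 \cdot z_1 + b_0 \\
A_1 &= a_0 \cdot z_1 + a_1 \cdot z_0 + b_1 \\
B_0 &= c_0 \cdot z_0 - c_1 \cdot z_1 + d_0 \\
B_1 &= c_0 \cdot z_1 + c_1 \cdot z_0 + d_1
\end{aligned}
$$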
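The first pass computes A = a·z + b and B = c·z + d, each component being a short chain of multiply-accumulate steps. The minimal Python sketch below models each multiply-accumulate as a plain function; the helper names `mac` and `first_pass`, and the use of (real, imaginary) tuples, are illustrative assumptions rather than the circuit's interface. Python floats are double precision, so this checks the algebra, not single precision behavior.

```python
def mac(products, addend):
    """Multiply-accumulate: addend + sum of sign * x * y over `products`."""
    total = addend
    for sign, x, y in products:
        total += sign * x * y
    return total


def first_pass(a, b, c, d, z):
    """Return (A0, A1, B0, B1) for A = a*z + b and B = c*z + d.

    Each argument is a (real, imaginary) pair of floats.
    """
    a0, a1 = a
    b0, b1 = b
    c0, c1 = c
    d0, d1 = d
    z0, z1 = z
    A0 = mac([(+1, a0, z0), (-1, a1, z1)], b0)
    A1 = mac([(+1, a0, z1), (+1, a1, z0)], b1)
    B0 = mac([(+1, c0, z0), (-1, c1, z1)], d0)
    B1 = mac([(+1, c0, z1), (+1, c1, z0)], d1)
    return A0, A1, B0, B1


A0, A1, B0, B1 = first_pass((1.0, 2.0), (3.0, -1.0), (0.5, 0.25), (2.0, 1.0), (-1.0, 3.0))
print(A0, A1, B0, B1)  # -4.0 0.0 0.75 2.25
```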
In the second pass, the results of B0 and B1 are fed back into multiplier-accumulators (discussed later) as shared operands to generate:
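$$
\begin{aligned}
C_0 &= A_0 \cdot B_0 + A_1 \cdot B_1 \\
C_1 &= A_1 \cdot B_0 - A_0 \cdot B_1 \\
D &= B_0 \cdot B_0 + B_1 \cdot B_1
\end{aligned}
$$
That is, C is A multiplied by the complex conjugate of B, and D = |B|², so that C/D = A/B = X.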
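For X = C/D to equal A/B, C must be A multiplied by the complex conjugate of B (C0 = A0·B0 + A1·B1, C1 = A1·B0 − A0·B1) and D must be |B|² = B0² + B1². A minimal Python sketch of this second pass follows; `second_pass` is a hypothetical helper name, and the comment about shared operands reflects the text's description rather than any specific hardware interface.

```python
def second_pass(A0, A1, B0, B1):
    """Return (C0, C1, D) where C = A * conj(B) and D = |B|**2.

    B0 and B1 act as the shared operands: every product below uses one of them.
    """
    C0 = A0 * B0 + A1 * B1
    C1 = A1 * B0 - A0 * B1
    D = B0 * B0 + B1 * B1
    return C0, C1, D


C0, C1, D = second_pass(1.0, 2.0, 3.0, 4.0)  # A = 1+2i, B = 3+4i
print(C0 / D, C1 / D)  # 0.44 0.08, i.e. X = A/B = 0.44 + 0.08i
```

Using the conjugate keeps the final step to two real divisions by the same real denominator D.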
Finally, the division operations are performed:
$$
X_0 = \frac{C_0}{D}, \qquad X_1 = \frac{C_1}{D}
$$
The circuitry disclosed herein optimizes the calculation of the A, B, C and D formulae above.
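Because the processor targets single precision, it can help to run the same computation with float32 rounding. The sketch below chains the first pass, the second pass (C = A·conj(B), D = |B|²) and the division, emulating IEEE 754 single precision by rounding every product, sum and quotient through the standard `struct` module's `"f"` format. It assumes separately rounded multiplies and adds (no fused multiply-add), which may differ from how the disclosed multiplier-accumulators round. The names `f32`, `mac32` and `mobius32` are hypothetical; X = (az+b)/(cz+d) is a Möbius, or linear fractional, transformation.

```python
import struct


def f32(x: float) -> float:
    """Round a Python float to the nearest IEEE 754 single precision value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def mac32(pairs, addend=0.0):
    """Multiply-accumulate in emulated single precision.

    Each product and each running sum is rounded to float32 separately
    (assumption: no fused multiply-add).
    """
    total = f32(addend)
    for x, y in pairs:
        total = f32(total + f32(x * y))
    return total


def mobius32(a: complex, b: complex, c: complex, d: complex, z: complex) -> complex:
    """Evaluate X = (a*z + b) / (c*z + d) via two MAC passes and a final division."""
    # Round the complex inputs to single precision components.
    a0, a1 = f32(a.real), f32(a.imag)
    b0, b1 = f32(b.real), f32(b.imag)
    c0, c1 = f32(c.real), f32(c.imag)
    d0, d1 = f32(d.real), f32(d.imag)
    z0, z1 = f32(z.real), f32(z.imag)

    # First pass: A = a*z + b and B = c*z + d.
    A0 = mac32([(a0, z0), (-a1, z1)], b0)
    A1 = mac32([(a0, z1), (a1, z0)], b1)
    B0 = mac32([(c0, z0), (-c1, z1)], d0)
    B1 = mac32([(c0, z1), (c1, z0)], d1)

    # Second pass: B0 and B1 are the shared operands; C = A*conj(B), D = |B|^2.
    C0 = mac32([(A0, B0), (A1, B1)])
    C1 = mac32([(A1, B0), (-A0, B1)])
    D = mac32([(B0, B0), (B1, B1)])

    # Division: X = C / D, one real quotient per component.
    return complex(f32(C0 / D), f32(C1 / D))


a, b, c, d, z = 0.3 + 0.7j, -1.1 + 0.4j, 2.5 - 0.6j, 0.9 + 1.3j, 1.7 - 0.2j
x32 = mobius32(a, b, c, d, z)
x64 = (a * z + b) / (c * z + d)  # double precision reference
print(x32, x64, abs(x32 - x64) / abs(x64))
```

Comparing `x32` with the double precision reference `x64` shows the effect of single precision rounding; for well-conditioned inputs such as these, the relative difference should be on the order of single precision round-off.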
PRIOR ART SUMMARY
Some of the major advances relevant to this invention relate to the development of high speed micro-processors and DSP engines. High speed micro-processors and DSP engines possess great intrinsic algorithmic flexibility and are therefore used in high performance dedicated frame rendering configurations such as the SUN network that generated Toy Story. See reference [1].
The advent of the Intel Pentium™ processors incorporated many of the performance techniques used in the RISC (Reduced Instruction Set Computing) community. “Appendix D: An Alternative to RISC: The INTEL 80x86” in reference [30] and “Appendix: A Superscalar 386” in reference [31] cover this well, and “Appendix C: Survey of RISC Architectures” in reference [30] provides a good overview. However, commercial micro-processor and DSP systems are severely limited by their massive overhead circuitry; in modern super-scalar computers, this overhead circuitry may actually be larger than the arithmetic units. See references [30] and [31] for a discussion of architectural performance/cost tradeoffs.
High performance memory is necessary but not sufficient to guarantee fast frame generation, because it does not generate the data; it simply stores it. Several special purpose components have been proposed that tightly couple data processing elements with high performance memory, often DRAM, on one integrated circuit. However, these efforts have all suffered limitations. The circuits discussed in [32] use fixed point arithmetic engines of very limited precision, and they are performance constrained both in floating point execution and in handling programs larger than a single processor's local memory.
Currently available special purpose components are not optimized to perform several categories of algorithms. These components include:
1. Image compression/decompression processors.
a. These circuits, while important, are very specialized and do not provide a general purpose solution to a variety of algorithms.
b. For example, such engines have tended to be very difficult to efficiently program in higher level procedural languages such as C, C++ and FORTRAN.
c. The requirement of programming them in assembly language implies that such units will not address the general purpose needs for multi-dimensional imaging and graphical frame generation without a large expenditure on software development. See references [24] and [25].
2. Processors optimized for graphics algorithms such as fractals, Z-buffers, Gouraud shading, etc.
a. These circuits do not permit optimizations for the wide cross-section of approach

Inventor: Jennings, III Earle W.
Corporate assignee: Hynix Semiconductor Inc.
