Parallel computing units having special registers storing...

Electrical computers and digital processing systems: processing – Processing architecture – Long instruction word

Reexamination Certificate

Details

active

06401190

ABSTRACT:

TECHNICAL FIELD
The present invention relates to a processor suitable for multimedia processing such as digital animation and three-dimensional graphics and, more particularly, to a processor for implementing processing of a high degree of parallelism with a small code size.
BACKGROUND ART
Recently, personal computers and workstations in particular have increasingly been made multimedia compatible. The capabilities mainly required by multimedia include motion picture compression and expansion, voice compression and expansion, three-dimensional graphics processing, and a variety of recognition processing. For voice processing and the like, a DSP (Digital Signal Processor) having performance of several tens of MOPS is conventionally used. However, handling of motion pictures and graphics requires a processor of considerably higher performance. For example, motion picture expansion requires performance of about 2 GOPS and its compression requires performance of about 50 GOPS. To satisfy these performance requirements, the performance of computing units must be enhanced. Computing unit performance can be enhanced by two approaches: increasing the operating frequency and parallel computing.
The former can be achieved comparatively simply but increases the difficulty of packaging design, resulting in increased cost. To implement the performance at a reasonable cost, the latter approach may also be necessary. However, the parallel computing approach presents two problems: whether applications are ready for parallelism, and the complexity of the control needed to use a plurality of computing units effectively. As for applications, a fairly high degree of parallelism is found as far as multimedia is concerned. For example, 8 computational operations are concurrently executable in motion picture compression.
Approaches for making good use of a plurality of computing units include superscalar architecture and VLIW (Very Long Instruction Word). The former is mainly used by general-purpose processors, and the scheduling for concurrently executing a plurality of computational operations is performed by the processors themselves. This approach is advantageous in object-code compatibility with an existing single-processing processor, but at the cost of extremely complicated hardware because the scheduling is performed dynamically by the processors. On the other hand, VLIW has a problem of compatibility with existing processors but is advantageous in its simplified hardware because no instruction decoder is required.
One of the keys to the hardware simplification of VLIW is its instruction format. This instruction format is composed of fields for directly controlling computing units, thereby greatly simplifying the control by hardware. A processor having such an instruction format is disclosed, for example, in Japanese Non-examined Patent Publication No. Sho 63-98733 “COMPUTER CIRCUIT CONTROL METHOD”. In this citation, a micro instruction for computation is provided with an operation field indicating that it is an instruction for computation and with a plurality of control bits for controlling a computing circuit, and each part of the computing circuit is directly controlled by each of these control bits. Thus, VLIW can implement parallel processing with comparatively simple hardware.
As described, superscalar architecture and VLIW provide effective means for enhancing processing parallelism to draw out performance. To fully draw out parallelism, the help of a compiler is indispensable. Specifically, a technique such as loop expansion (loop unrolling) is known. In this technique, a loop body in a program is duplicated (expanded) a plurality of times and the code in the expanded loop is scheduled. That is, increasing the number of instructions executed between loop return branches increases the possibility of executing a plurality of instructions concurrently.
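The loop expansion described above can be illustrated by a minimal sketch. The function names and the expansion factor of four are assumptions chosen for illustration and do not appear in the specification. The expanded version carries four independent additions between loop return branches, which a scheduler could issue concurrently, at the price of a visibly larger loop body.

```python
def sum_rolled(values):
    """Original loop: one addition per loop return branch."""
    total = 0
    for i in range(len(values)):
        total += values[i]
    return total


def sum_expanded_by_4(values):
    """Loop body duplicated four times (illustrative expansion factor)."""
    n = len(values)
    main = n - (n % 4)          # part of the range covered by the expanded loop
    t0 = t1 = t2 = t3 = 0       # independent accumulators, no mutual dependence
    for i in range(0, main, 4):
        t0 += values[i]
        t1 += values[i + 1]
        t2 += values[i + 2]
        t3 += values[i + 3]
    total = t0 + t1 + t2 + t3
    for i in range(main, n):    # leftover iterations when n is not a multiple of 4
        total += values[i]
    return total
```

Both functions return the same result for any list of numbers; the second simply contains roughly four times as much code in the loop body, which is the code-size cost at issue.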
The above-mentioned technique duplicates a loop body and therefore increases the code size. A larger code size requires a larger memory capacity in which to store the program, resulting in increased system cost. In processors sharing a cache memory, increased code size lowers the hit rate, thereby lowering system performance.
Increasing processor parallelism increases the number of computing units. This results in increased circuit scale, thereby increasing the number of development steps. In the computer market mainly dominated by personal computers, well-timed introduction of new products on the market is important in terms of business. To satisfy this requirement, it is important to reduce the number of development steps.
It is therefore an object of the present invention to provide a processor having an architecture for minimizing the code size while enhancing the processing parallelism for enhanced performance.
Another object of the present invention is to provide a processor capable of executing many computational operations by a small number of instruction codes.
Still another object of the present invention is to provide a VLIW processor based on static scheduling.
Yet another object of the present invention is to provide a VLIW processor compatible with various applications and enhanced in the operating ratios of the computing units.
A further object of the present invention is to provide a processor suitable for multimedia processing that is effective in reducing the amount of instruction code of a parallel processor that repeatedly executes computational operations of the same type, as in multimedia processing.
A still further object of the present invention is to provide a superscalar processor effective for reducing code size.
A yet further object of the present invention is to provide a processor architecture capable of enhancing processing parallelism while minimizing the number of development steps.
DISCLOSURE OF INVENTION
In order to solve the above-mentioned first problem, the present invention pays attention to the fact that, as far as multimedia processing is concerned, a plurality of computations of the same type are often executed concurrently, and provides, in the instruction format, mode information for controlling a plurality of computing devices with a single instruction.
For example, in order to execute a plurality of computations with a single instruction by a plurality of computing devices, in a VLIW processor in which one instruction is constituted by a plurality of fields for controlling the computing devices, mode information for controlling the plurality of computing devices is provided in one field. Further, an instruction expansion circuit for generating a plurality of fields from one field in one instruction is provided and the above-mentioned plurality of computing devices are constituted by arranging a plurality of computing devices having a same function.
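The role of the instruction expansion circuit in the VLIW case can be sketched as follows. This is a software model under stated assumptions, not the circuit of the invention: the number of devices (four), the one-bit `parallel` flag standing in for the mode information, and the convention that the k-th device uses registers offset by k are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional

NUM_DEVICES = 4  # assumed number of identical computing devices


@dataclass(frozen=True)
class Field:
    """One control field driving one computing device."""
    opcode: str
    dst: int
    src1: int
    src2: int


@dataclass(frozen=True)
class CompactField:
    """Single field stored in the instruction, plus mode information."""
    opcode: str
    dst: int
    src1: int
    src2: int
    parallel: bool  # hypothetical mode bit: True = drive all devices


def expand(compact: CompactField) -> List[Optional[Field]]:
    """Generate one field per device from one stored field (None = idle)."""
    if not compact.parallel:
        first = Field(compact.opcode, compact.dst, compact.src1, compact.src2)
        return [first] + [None] * (NUM_DEVICES - 1)
    # Assumed convention: device k operates on registers offset by k.
    return [
        Field(compact.opcode, compact.dst + k, compact.src1 + k, compact.src2 + k)
        for k in range(NUM_DEVICES)
    ]
```

For instance, `expand(CompactField("add", 8, 0, 4, True))` yields four `add` fields, so the program stores one field while four devices are kept busy.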
In a superscalar processor, mode information for simultaneously controlling a plurality of computing devices is provided in one instruction. In addition, an instruction expansion circuit for generating a plurality of instructions from one instruction is provided and a plurality of computing devices having a same function are arranged such that the plurality of generated instructions can be executed concurrently.
In a processor having three or more computing devices, specification information for specifying the computing devices that are to execute concurrently is provided, and the above-mentioned instruction expansion circuit is provided with a function for generating, according to this specification information, the required number of instruction fields in the case of the VLIW processor and instructions in the case of the superscalar processor.
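The specification information can be pictured, under assumptions, as a bit mask with one bit per computing device. The sketch below is hypothetical in every detail: the four-device configuration, the mask encoding, and the rule that the k-th selected device receives registers offset by k are illustrative choices, not taken from the specification.

```python
from typing import List, Optional, Tuple

NUM_DEVICES = 4  # assumed number of computing devices (three or more)

# (opcode, dst, src1, src2); None marks a device left idle
Slot = Optional[Tuple[str, int, int, int]]


def expand_with_mask(opcode: str, dst: int, src1: int, src2: int,
                     device_mask: int) -> List[Slot]:
    """Generate fields only for the devices selected by device_mask."""
    slots: List[Slot] = []
    lane = 0  # counts selected devices; used as the register offset (assumption)
    for device in range(NUM_DEVICES):
        if device_mask & (1 << device):
            slots.append((opcode, dst + lane, src1 + lane, src2 + lane))
            lane += 1
        else:
            slots.append(None)
    return slots
```

With a mask of `0b0101`, devices 0 and 2 receive a field and devices 1 and 3 stay idle. In a superscalar model, the non-idle entries taken in order would correspond to the generated instructions.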
In order to solve the above-mentioned second problem, the present invention provides a plurality of computing units constituted by a computing device for concurrently executing a plurality of computations of a same function, an integer computing device for mainly reading an operand to be supplied to this computing device from a memory, and a register file for storing an operand to be used by the above-mentioned two types of computing devices.
Namely, the present invention is a processor having a memo
