Programmable accelerator for a programmable processor system

Electrical computers: arithmetic processing and calculating – Electrical digital calculating computer – Particular function performed

Reexamination Certificate


Details

C708S708000, C708S709000

active

06397240

ABSTRACT:

FIELD OF THE INVENTION
The present invention relates to programmable processor systems, such as digital signal processor systems, and more particularly, to methods and apparatus for achieving the high processing rates that certain algorithms require and that are currently achieved only by dedicated hardware.
BACKGROUND OF THE INVENTION
Currently available digital signal processors are highly programmable, but they do not provide sufficient performance for many applications, since the digital signal processor is optimized for a data width of 16 bits or higher precision. Thus, to achieve the processing rates required for certain algorithms, which exceed the capabilities of commercially available digital signal processors by more than an order of magnitude, a number of digital signal processor systems, such as receivers in a wireless local area network (LAN) or a wideband CDMA network, have implemented such algorithms in dedicated application specific logic or in dedicated coprocessors. Specifically, algorithms requiring low precision and relatively high data rates, such as certain types of finite impulse response (FIR), correlation and Viterbi computations, have been implemented in such application specific integrated circuits (ASICs) or coprocessors.
For example, in a typical wireless LAN channel matched filter performing FIR computations, approximately 500 million multiply-add calculations (MACs) per second are required. Meanwhile, the required input and output precision for such FIR computations is only five bits and nine bits, respectively. Likewise, in a wireless LAN correlator, the incoming bit stream must be correlated with the original Barker code sequence, in a well-known manner. Such correlation computations require about 900 million MACs per second. Since the Barker code is only a one-bit sequence (with each value being either +1 or −1), the multipliers implement relatively simple operations. Finally, Viterbi decoders in wideband CDMA or IS-95 receivers have increasingly high bit rates and an increased constraint length of the convolutional code. Meanwhile, a branch metric in such a Viterbi decoder can be represented by fewer than eight bits (even for soft decision decoding) and no more than 32 branch metrics need to be stored for a complete update of the required 256 states.
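The point that a one-bit Barker code reduces each multiply to an add or a subtract can be illustrated with a minimal sketch. The 11-chip sequence below is the one commonly cited for wireless LAN spreading and is an assumption here, as are the names `BARKER_11` and `correlate`; the source does not list the chip values.

```python
# Assumed 11-chip Barker sequence (not given in the source text).
BARKER_11 = [+1, -1, +1, +1, -1, +1, +1, +1, -1, -1, -1]


def correlate(samples, code=BARKER_11):
    """Sliding correlation of `samples` against a +1/-1 code.

    Because every chip is +1 or -1, each "multiply" is just an
    add or a subtract of the sample.
    """
    n = len(code)
    out = []
    for start in range(len(samples) - n + 1):
        acc = 0
        for chip, sample in zip(code, samples[start:start + n]):
            acc += sample if chip > 0 else -sample
        out.append(acc)
    return out


# Usage: the peak appears where the received chips line up with the code.
received = [0, 0, 0] + BARKER_11 + [0, 0, 0]
peaks = correlate(received)
```

In this sketch the correlation peak equals the code length when the input is aligned with the code, which is the property a correlator looks for.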
While application specific integrated circuit (ASIC) and coprocessor implementations efficiently (with low power dissipation) perform such operations at the required data rates, they typically perform only a single function. In addition, since the design and verification of such application specific integrated circuits is often an expensive and time-consuming process, any modifications to an application specific integrated circuit implementation will require a significant amount of time and expense.
As is apparent from the above-described deficiencies of current techniques for achieving the processing rates required for certain digital signal processor algorithms, a need exists for a programmable, low-power accelerator that achieves the required processing rates for a number of different algorithms.
SUMMARY OF THE INVENTION
Generally, a programmable multi-mode accelerator is disclosed for use with a digital signal processor, microcontroller or microprocessor. The term “programmable processor” is used herein to collectively refer to a digital signal processor, a microcontroller or microprocessor. The programmable multi-mode accelerator allows a programmable processor to execute specific algorithms that require low-precision operations at an extremely high rate, such as certain types of finite impulse response, correlation and Viterbi computations. The disclosed programmable multi-mode accelerator replaces the ASIC implementations that have typically been used in digital signal processor systems and allows for a more programmable and more cost-effective solution. The accelerator extends the digital signal processor's performance into the required range for low-precision computations.
In one implementation, the accelerator begins executing its program after the main decode and dispatch unit of the programmable processor has issued a special start instruction. In such an implementation, the accelerator is coupled with the main data path of a programmable processor. The accelerator optionally has direct access to the register files of the programmable processor. In an illustrative implementation, the accelerator data path obtains its input values (source operands) directly from a set of registers in the programmable processor and writes results back into a second set of registers.
According to an aspect of the invention, the accelerator allows a plurality of algorithms, such as certain types of finite impulse response, correlation and Viterbi computations, to utilize the same adder cells, thereby saving silicon area. In particular, the present invention allows low-precision algorithms requiring primarily addition or multiply-add computations to be implemented using a programmable accelerator. Thus, although an illustrative finite impulse response computation requires sixteen 8-bit by 8-bit multipliers and an adder tree to add the 16 products, and an illustrative Viterbi computation requires eight 16-bit additions and compare-select operations, the present invention allows these computations to be performed using the same adder cells. Accordingly, the accelerator includes a multi-mode adder that can be programmatically reconfigured to perform the various operations discussed above.
The multi-mode adder is controlled by the instructions of the accelerator. In a first mode, referred to as the “single-add mode,” the adder operates as a 17-input 16-bit adder. In the single-add mode, the adder has 17 16-bit inputs that are all summed to form one 16-bit output. One input is a feedback path and the other 16 inputs come from a multiplexer and a multiplier bank. The single-add mode can be utilized to perform finite impulse response and correlation computations.
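As a rough behavioral sketch of the single-add mode described above, the function below sums 16 inputs plus an optional feedback value into one 16-bit result. The two's-complement wraparound on overflow and the names `to_signed16` and `single_add` are assumptions for illustration; the source does not state how overflow is handled.

```python
MASK16 = 0xFFFF


def to_signed16(value):
    """Wrap an integer to a signed 16-bit value (assumed two's complement)."""
    value &= MASK16
    return value - 0x10000 if value & 0x8000 else value


def single_add(inputs, feedback=0, use_feedback=False):
    """Sum 16 inputs, optionally plus the feedback input, to one 16-bit output."""
    if len(inputs) != 16:
        raise ValueError("single-add mode takes exactly 16 non-feedback inputs")
    total = sum(inputs)
    if use_feedback:
        total += feedback  # the 17th input: the feedback path
    return to_signed16(total)
```

For example, `single_add([1] * 16, feedback=100, use_feedback=True)` yields 116 under these assumptions.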
In the single-add mode, the illustrative accelerator can implement FIR filters with a delay line having delays of z^-1 or z^-2 and with up to 16 taps. In this implementation of the FIR filter, the throughput is one output sample per cycle. In addition, the accelerator can implement a finite impulse response filter with a z^-1 delay line and with between 17 and 32 taps. In this implementation of the FIR filter, the throughput is one output for each two cycles.
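One plausible reading of the two-cycle throughput for 17 to 32 taps is that the first cycle sums the first 16 products and the second cycle adds the remaining products to that partial sum through the feedback input. The source does not spell this out at this point, so the sketch below is an assumption, and `fir_output_two_pass` is a hypothetical name.

```python
def fir_output_two_pass(coeffs, window):
    """Compute one FIR output for 17..32 taps in two 16-product passes.

    coeffs[i] multiplies window[i]; window[0] is the newest sample.
    """
    if len(coeffs) != len(window) or not 17 <= len(coeffs) <= 32:
        raise ValueError("expected 17 to 32 taps with a matching window")
    # Pad both lists to 32 entries so each pass handles exactly 16 products.
    c = list(coeffs) + [0] * (32 - len(coeffs))
    w = list(window) + [0] * (32 - len(window))
    # Cycle 1: first 16 products, feedback input unused.
    partial = sum(ci * wi for ci, wi in zip(c[:16], w[:16]))
    # Cycle 2: remaining products, with the partial sum on the feedback input.
    return partial + sum(ci * wi for ci, wi in zip(c[16:], w[16:]))
```

The result matches a direct dot product of the coefficients and the sample window; only the split into two 16-product passes models the reduced throughput.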
In the single-add mode, the accelerator initially advances the registers in the delay chain by one, reads a new value from the main register file, and writes the value into the first register of the delay chain. In the next cycle, the eight accelerator registers are read and are applied to the inputs of the multipliers in the multiplier bank. In addition, the delay chain values are applied to the inputs of the multipliers in the multiplier bank, and the values are multiplied. Thereafter, the outputs of the multipliers in the multiplier bank are summed by the adder, with or without the feedback input. Finally, the output of the adder is written back to the main register file.
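The sequence of steps above can be sketched as a small behavioral model for a filter of up to 16 taps with a z^-1 delay line. The class name `SingleAddFir`, the flat coefficient list (standing in for the accelerator registers) and the absence of bit-width limits are simplifying assumptions, not details taken from the source.

```python
from collections import deque


class SingleAddFir:
    """Behavioral model of a single-add-mode FIR with up to 16 taps."""

    def __init__(self, coeffs):
        if len(coeffs) > 16:
            raise ValueError("at most 16 taps in this sketch")
        # Unused taps are zero so that 16 products are always formed.
        self.coeffs = list(coeffs) + [0] * (16 - len(coeffs))
        self.delay = deque([0] * 16, maxlen=16)

    def step(self, sample):
        # Advance the delay chain by one and write the new value into
        # its first register (the oldest value falls off the far end).
        self.delay.appendleft(sample)
        # Apply coefficients and delay-chain values to the multiplier bank.
        products = [c * x for c, x in zip(self.coeffs, self.delay)]
        # Sum the 16 products; the result would be written back
        # to the main register file.
        return sum(products)


# Usage: feeding an impulse reproduces the coefficients in order.
fir = SingleAddFir([1, 2, 3])
response = [fir.step(x) for x in [1, 0, 0, 0]]
```

Using a fixed-length `deque` keeps the shift of the delay chain to a single operation per sample, mirroring the one-output-per-cycle behavior described for filters of up to 16 taps.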
In a second mode, referred to as the “four state add-compare-select mode” (or “ACS mode”), the feedback path is inactive. The other 16 inputs are divided into eight groups of two inputs each. The two inputs of each group are summed to form eight intermediate 16-bit outputs. The eight intermediate 16-bit outputs are paired and a maximum or minimum from each pair is selected, based on the current operating mode, to produce four values. These four values are concatenated into two 32-bit values and sent back to the register file where results are stored. The ACS mode can be utilized to perform Viterbi computations.
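The ACS data flow can be sketched as follows. The pairing of adjacent inputs and adjacent sums, the signed comparison, and the packing order (first value in the upper 16 bits) are assumptions for illustration; the source states only the group counts and bit widths. The names `acs` and `to_signed16` are hypothetical.

```python
def to_signed16(value):
    """Wrap an integer to a signed 16-bit value (assumed two's complement)."""
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value


def acs(inputs, select_max=True):
    """Add-compare-select over 16 inputs; returns (four values, two 32-bit words)."""
    if len(inputs) != 16:
        raise ValueError("ACS mode takes exactly 16 inputs")
    # Add: eight groups of two inputs, each summed to a 16-bit intermediate.
    sums = [to_signed16(inputs[i] + inputs[i + 1]) for i in range(0, 16, 2)]
    # Compare-select: pair the eight sums and keep the max or min of each pair.
    pick = max if select_max else min
    selected = [pick(sums[i], sums[i + 1]) for i in range(0, 8, 2)]
    # Concatenate the four 16-bit values into two 32-bit words.
    words = [((selected[i] & 0xFFFF) << 16) | (selected[i + 1] & 0xFFFF)
             for i in (0, 2)]
    return selected, words
```

In a Viterbi decoder, each sum would correspond to a path metric plus a branch metric, and the selected value to the surviving path for one state.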
In the ACS mode, the accelerator initially reads two values from the accelerator registers and sign-extends them to an appropriate length. In addition, two of the registers from the main register file where inputs are stored are
