Low cost multiplier block with chain capability

Electrical computers: arithmetic processing and calculating – Electrical digital calculating computer – Particular function performed

Reexamination Certificate


Details

C708S620000, C708S625000

active

06484194

ABSTRACT:

BACKGROUND AND SUMMARY OF THE INVENTION
Background: Microprocessor Operation
Microprocessors are often required to manipulate binary data having a wide range of bit lengths, from a single logic bit to high-precision arithmetic operands that may be more than 128 bits in length.
Hardware arithmetic logic units (ALUs) within microprocessors are generally constructed and arranged to handle fixed operand lengths. As a result, high-precision arithmetic operations require multiple program steps and multiple microprocessor cycles. These data processing conditions lead to programs that are inefficient in terms of execution time because the microprocessor hardware and the supporting program instruction set are not optimized for operating on data having a wide range of operand lengths.
This inefficiency results in large part from repeated stores and loads to memory as well as software loop control overhead (compares, branches, etc.). For complex operations such as multiplication and operations involving extended-precision algorithms, the overhead is even more pronounced. In addition, the sign (negative or positive) and zero status of an arithmetic result must be handled separately for multi-word calculations, requiring even more processor time to complete the operation.
Background: Digital Signal Processors
Digital Signal Processors (DSPs) are high-speed microprocessors optimized to carry out large numbers of arithmetic operations in a short period of time. As the name implies, DSPs were developed to carry out real-time processing (e.g., filtering, compression, encryption) algorithms on digital signals. DSPs incorporate performance-optimized arithmetic structures not found in conventional microprocessors. Among the devices incorporated into DSPs to improve number-crunching performance, two of the most important are the hardware multiplier block and the barrel shifter.
Background: Multiplier Blocks
As with most structures in a DSP, hardware multiplier blocks are high-performance, speed-optimized structures. Unfortunately, DSP hardware multiplier blocks require large amounts of chip surface area for specialized signal-processing circuitry. In addition, an increase in the size of the multiplier block generally requires a concomitant increase in the size of a CPU's data buses and arithmetic logic units (ALUs). For example, the multiplication of two 16-bit numbers gives a 32-bit result. A multiplier block able to handle this 16-bit multiplication would generally require 32-bit data buses, ALUs, and accumulators to accommodate the result. This adds considerably to the complexity, and therefore the cost, of a microprocessor. It is also desirable, especially in signal processing applications, to have the capability to multiply numbers of more than one word in length. Unfortunately, the complexity and size of a multiplier block quickly grow unmanageable as the size of its operands increases.
Conventional microprocessors often do not incorporate a hardware multiplier block or barrel shifter, as the size and cost are prohibitive, especially in the cost-competitive consumer market. In a conventional lower-end processor, numbers are multiplied by repetitive additions in a multi-bit adder, a relatively slow process. A typical multiplication carried out in software requires a long series of shifts and adds, consuming a great deal of processor time. From a speed standpoint, it would be highly desirable to have the performance and functionality of a hardware multiplier block built into a low-cost microprocessor, but the complexity and space requirements of the related hardware have traditionally made such a design cost-prohibitive.
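The software shift-and-add approach mentioned above can be sketched as follows. This is an illustrative model only, not code from the patent: the function name `shift_add_multiply` and the 16-bit default width are assumptions chosen for the example.

```python
def shift_add_multiply(a: int, b: int, width: int = 16) -> int:
    """Multiply two unsigned `width`-bit integers using only shifts and adds.

    Hypothetical helper for illustration: one loop iteration per multiplier
    bit, which is why the software approach costs so many processor cycles.
    """
    mask = (1 << width) - 1
    a &= mask
    b &= mask
    product = 0
    for _ in range(width):
        if b & 1:          # lowest multiplier bit set: add the shifted multiplicand
            product += a
        a <<= 1            # multiplicand moves up one bit position
        b >>= 1            # expose the next multiplier bit
    return product


print(hex(shift_add_multiply(0x1234, 0x0010)))
```

Each 16-bit multiply takes sixteen test, add, and shift steps in this model, whereas a hardware multiplier block would produce the same 32-bit product in a single operation.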
Background: Barrel Shifters
A barrel shifter is another speed-optimized DSP structure extremely useful for large calculations. Barrel shifters are designed to shift a number several bit positions in a single operation. Although a barrel shifter is similar in function and structure to a multiplier block, it is conventionally designed in as a completely separate structure on the chip. In binary arithmetic, a shift left of one bit position equates to a multiply by two. A left shift of N bits is equivalent to a multiply by 2^N.
Conventional low-end processors do not incorporate barrel shifters, but carry out barrel shifting through a series of single position shifts, each shift comprising a single software operation. Barrel shifting is a highly desirable function in a processor, but the specialized circuitry required to carry out this function takes up too much space on a chip to be included in low-cost processors. The barrel-shifting function could be carried out by a multiplier block if the numbers were encoded properly, but if the encoding must be implemented in software, the overhead significantly reduces any performance gains from a hardware barrel shift.
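The equivalence between a left shift of N bits and a multiply by 2^N can be illustrated with a short sketch. The names `shift_by_single_steps` and `shift_by_multiply` and the 16-bit word size are assumptions for this example; the source does not specify how the shift count is encoded, so the one-hot factor `1 << n` here is only one plausible encoding.

```python
WORD_BITS = 16                      # assumed word size for this sketch
WORD_MASK = (1 << WORD_BITS) - 1


def shift_by_single_steps(value: int, n: int) -> int:
    """Left-shift by n positions one bit at a time, one operation per position."""
    value &= WORD_MASK
    for _ in range(n):
        value = (value << 1) & WORD_MASK
    return value


def shift_by_multiply(value: int, n: int) -> tuple[int, int]:
    """Left-shift by n positions (0..15) with a single multiply by 2**n.

    Returns (low_word, high_word): the low word is the shifted value, and the
    high word holds the bits that were shifted out of the 16-bit word.
    """
    factor = 1 << n                               # shift count encoded as a one-hot operand
    product = (value & WORD_MASK) * factor        # fits in 32 bits
    return product & WORD_MASK, product >> WORD_BITS


print(shift_by_single_steps(0x1234, 4), shift_by_multiply(0x1234, 4))
```

In this model the multiply reaches the same low word in one step that the single-position loop reaches in n steps, and it also preserves the shifted-out bits in the upper half of the product.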
Background: Chain Processing
It is known that prior microprocessors have included the capability of operating on chains, for example by repeating a given instruction a prescribed number of times. It is known that a repeat add with carry will execute a chain operation where the data memory address of the operands and the result are automatically incremented after each operation. It is also known that others have used fixed hardware multipliers to do extended precision multiplies by using a multiply-by-parts algorithm, a complex and relatively inefficient solution.
As mentioned, microprocessors typically must manipulate operands of differing, sometimes widely differing, lengths. Operand lengths can vary from a single logic bit to 512 bits or more. Arithmetic logic units (ALUs), on the other hand, have a fixed width. Where high-precision arithmetic is necessary, requiring operands longer than the ALU, the processor must execute the operation in multiple steps. The programs become inefficient in terms of execution time and programming code efficiency because the basic hardware and supporting instruction set are not optimized for operating on extended-precision data represented by a sequential chain of data words.
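A chain addition of the kind described above, where an add with carry is repeated across successive data words, can be modeled roughly as below. The function name `chain_add`, the list representation with the least significant word first, and the 16-bit word size are all assumptions for illustration; the list index stands in for the automatically incremented data memory address.

```python
WORD_BITS = 16                      # assumed ALU width for this sketch
WORD_MASK = (1 << WORD_BITS) - 1


def chain_add(a_words: list[int], b_words: list[int]) -> tuple[list[int], int]:
    """Add two equal-length chains of 16-bit words, least significant word first.

    Returns the result chain and the final carry out of the top word.
    """
    if len(a_words) != len(b_words):
        raise ValueError("operand chains must have the same length")
    result = []
    carry = 0
    for a, b in zip(a_words, b_words):       # one add-with-carry per word
        total = (a & WORD_MASK) + (b & WORD_MASK) + carry
        result.append(total & WORD_MASK)     # store this word of the result
        carry = total >> WORD_BITS           # carry into the next word
    return result, carry


print(chain_add([0xFFFF, 0x0001], [0x0001, 0x0000]))
```

The only state carried from one word to the next is the single carry bit, which is what makes addition comparatively easy to repeat across a chain.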
Where a number that is one word in length must be multiplied by a number greater than one word in length (chain multiplication), the process can be extremely cumbersome even in designs incorporating a multiplier block. Such a multiplication is carried out by a series of one-word sub-multiplications beginning with the least significant word. For each sub-multiplication, the code must instruct the processor to (1) carry out the single-word multiplication (either in hardware or software), (2) store the lower word of the result to memory, (3) move the higher word of the result to the correct operand register for the next sub-multiplication, (4) check to see if the operand chain is complete, and if not, (5) loop back to start the process over again. This process carries with it a heavy amount of instruction-decode and data-shifting overhead, slowing the multi-word multiplication process down considerably.
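The five-step loop described above can be sketched as follows. The names `chain_multiply` and `words_to_int` are hypothetical, and the sketch assumes that the high word carried from one sub-multiplication is added into the next partial product; the source says only that it is moved to the correct operand register.

```python
WORD_BITS = 16                      # assumed word size for this sketch
WORD_MASK = (1 << WORD_BITS) - 1


def chain_multiply(multiplier: int, chain: list[int]) -> list[int]:
    """Multiply a one-word value by a chain of words (least significant first)."""
    multiplier &= WORD_MASK
    result = []
    carried_high = 0
    for word in chain:                                   # steps (4) and (5): loop control
        product = multiplier * (word & WORD_MASK) + carried_high   # step (1)
        result.append(product & WORD_MASK)               # step (2): store the lower word
        carried_high = product >> WORD_BITS              # step (3): keep the higher word
    result.append(carried_high)                          # final high word tops the chain
    return result


def words_to_int(words: list[int]) -> int:
    """Reassemble a least-significant-first word chain into one integer."""
    return sum(w << (WORD_BITS * i) for i, w in enumerate(words))


print([hex(w) for w in chain_multiply(2, [0x8000, 0x0001])])
```

Even in this compact form, every iteration pays for a store, a register move, and loop control in addition to the multiply itself, which is the overhead the passage describes.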
The need remains in the art for an enhanced multiplier with specialized architecture to address the problem of operating on long, multiple word length data in an efficient, consistent and unified manner.
Multiplier Block With Chain Capability
This application discloses a multiplier block making use of a “chaining” device and integral barrel shift circuitry to increase arithmetic throughput while avoiding many of the liabilities of DSP logic. In order to avoid the inclusion of a double-width (32-bit) ALU and accumulator and yet still accommodate the double-precision (32-bit) result of the multiplier, the 32-bit product of the static multiplier block is split into two parts. The lower 16 bits pass directly to the ALU for immediate use (transfer, accumulate, etc.). The upper 16 bits are latched into a 16-bit register (known as Product High) for later use. The Product High register can be ported directly back to the multiplier block for continuous, multi-word (chain) multiplication. This optimization is totally consistent with the single data memory,
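As a rough behavioral model of the split product described in this paragraph, the sketch below latches the upper 16 bits in a register and returns the lower 16 bits for immediate use. The class name `MultiplierBlock`, the `chain` flag, and the assumption that the fed-back Product High value is added into the next product are all illustrative choices, not details confirmed by the source.

```python
WORD_BITS = 16
WORD_MASK = (1 << WORD_BITS) - 1


class MultiplierBlock:
    """Hypothetical model of a 16x16 multiplier with a Product High register."""

    def __init__(self) -> None:
        self.product_high = 0        # upper 16 bits of the last product

    def multiply(self, a: int, b: int, chain: bool = False) -> int:
        """Return the lower 16 bits; latch the upper 16 bits in product_high.

        With chain=True, the previous Product High value is fed back and added
        into this product (an assumption about how the feedback is combined).
        """
        feedback = self.product_high if chain else 0
        product = (a & WORD_MASK) * (b & WORD_MASK) + feedback
        self.product_high = product >> WORD_BITS
        return product & WORD_MASK


def chain_multiply_with_block(multiplier: int, chain: list[int]) -> list[int]:
    """Multiply one word by a word chain (least significant first) using the block."""
    block = MultiplierBlock()
    low_words = [block.multiply(multiplier, w, chain=(i > 0))
                 for i, w in enumerate(chain)]
    return low_words + [block.product_high]


print([hex(w) for w in chain_multiply_with_block(0xFFFF, [0xFFFF, 0xFFFF])])
```

In this model only 16-bit values ever leave the block, so the rest of the datapath can stay one word wide while the register carries the upper half between sub-multiplications.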
