Electrical computers: arithmetic processing and calculating – Electrical digital calculating computer – Particular function performed
Reexamination Certificate
2000-06-15
2003-07-22
Malzahn, David H. (Department: 2124)
Electrical computers: arithmetic processing and calculating
Electrical digital calculating computer
Particular function performed
Reexamination Certificate
active
06598061
ABSTRACT:
BACKGROUND OF THE INVENTION
1. Field of the Invention
The present invention relates to a system and method for performing modular multiplication, and in particular the modular multiplication a*b*2
−N
modulo n, where a, b and n are N-bit integers with n odd. It will be appreciated that the integers a, b and n may not necessarily require all N-bits to specify the integer, but it is clear that any such numbers can always be specified by N-bits by adding an appropriate number of zeros as the most significant bits of the N-bit number.
2. Description of the Prior Art
There are a number of situations where it is desirable to perform the modular multiplication a*b*2
−N
modulo n, this particular type of modular multiplication often being referred to as Montgomery multiplication.
One particular implementation which makes use of Montgomery multiplication is RSA encryption/decryption as used in smart cards or other secure devices. RSA requires the rapid modular multiplication of large integers, for example 512 or 1024 bits in length, in order to carry out modular exponentiations. The Montgomery multiplication technique provides an efficient way of producing the modular result by applying corrections to partial products as they are generated and accumulated.
In typical implementations of Montgomery multiplication, the multiplication has to be handled in word lengths of w bits, where w is typically 16 or 32. The Montgomery algorithm requires a (N+w+1) bit storage register “c” for holding and accumulating partial products as they are generated, and a temporary w-bit number m, which is a Montgomery correction factor.
In the following discussion of the Montgomery algorithm, the notation b[j] refers to the j-th w-bit word of b, i.e. b[
0
] is the least significant w-bit word of b, and b[s−1] is the most significant w-bit word of b, where s=N/w, i.e. the number of words in an N-bit number.
The Montgomery correction factor m makes use of a constant n
0
p that is derived from the least significant w-bit word of the modulus n[
0
]. This constant n
0
p obeys the following relation:
n
[
0
]*
n
0
p
=−1 mod 2
w
Given the above, the standard Montgomery algorithm can be represented by the following pseudo code:
c
:=0
for j=0 to s−1 do
c:=c+a*b[j] // calculate and accumulate the partial product
m:=(c[
0
]*n
0
p) mod 2
w
// calculate the montgomery correction factor
c:=c+n*m // apply the montgomery reduction
c:=c>>w // drop the bottom word of c which is always zero
endfor
result:=c // the final (N+1) bit result is in c
In practice, the a*b[j] and n*m multiplications in the above pseudo code are broken down into fixed size multiply-accumulate (MAC) operations, for example 32×16 bit MAC operations if w=16. This size of MAC is a good balance between maximising data throughput on the one hand and minimising die area and easing timing requirements on the other. However, it will be appreciated that for the Montgomery multiplication to be performed in an efficient manner, the multiplication block must still have access to sufficient fast storage to store the input N-bit integers a, b and n, and the partial products generated, for which N+w+1 bits are required. In a hardware implementation of the multiplication block, the fast storage would typically take the form of hardware registers, whilst in a software implementation the fast storage might take the form of a cache or SRAM.
An example of such a prior art approach is the stand alone RSA coprocessor from SIDSA referred to as the sRSAC2048A, which uses a large bank of 256 32-bit words of onboard memory to hold a maximum of four 2048-bit values.
It will be appreciated that for large N-bit integers, there is hence a requirement for a significant amount of fast storage, and the current trend is for these N-bit numbers to increase in size, thereby further increasing the storage requirements of a system for performing Montgomery multiplication.
SUMMARY OF THE INVENTION
Viewed from a first aspect the present invention provides a system for performing a modular multiplication a*b*2
−N
modulo n, where a, b and n are N-bit integers, comprising: a multiplier for multiplying a Y-bit number by a Z-bit number; partitioning logic for partitioning the integer a into a plurality of first sections, each first section being of a size which is a multiple of Y, and for partitioning the integer b into a plurality of second sections, each second section being of a size which is a multiple of Z; a multiplication unit for applying operations to control the multiplier to perform a sequence of multiplications to multiply one of said first sections by one of said second sections in order to generate a number of output operands for use in subsequent operations performed by the multiplication unit; a controller for sequentially inputting one of said first sections and one of said second sections into the multiplication unit along with predetermined ones of said output operands from preceding operations performed by the multiplication unit, until each first section has been multiplied by each second section.
In accordance with the present invention, a multiplier is provided for multiplying a Y-bit number by a Z-bit number, and hence for example may be arranged to perform a 32-bit by 16-bit multiplication. Then, in accordance with the present invention, the N-bit integers a and b are partitioned into a plurality of first and second sections, respectively, where each first section is of a size which is a multiple of Y, and each second section is of a size which is a multiple of Z. In preferred embodiments, the partitioning logic used to perform this partitioning is implemented in software.
Further, a multiplication unit is provided which is able to control the multiplier to perform a sequence of multiplications in order to multiply one of the first sections by one of the second sections, with a number of output operands then being generated for use in subsequent operations performed by the multiplication unit. A controller is provided to sequentially input one of the first sections and one of the second sections into the multiplication unit along with predetermined ones of the output operands from preceding operations performed by the multiplication unit, so as to enable each first section to be multiplied by each second section.
By this approach, a multiplication unit can be provided which is of a fixed size, irrespective of the size of the input integers, a b and n. The integers can then be partitioned into a plurality of sections, each section being of a size which can be handled by the multiplication unit, with a controller then being provided to manage the sequence in which the sections are operated upon by the multiplication unit. This approach alleviates the requirements for increasingly larger fast storage, the size of the fast storage being dependent not on the ultimate size of the N-bit integers, but rather on the predetermined size of the sections into which those integers are partitioned.
For example, in one embodiment, the sections into which the N-bit integers are partitioned may be 256 bits in length and accordingly the fast storage, for example hardware registers if the multiplication unit is implemented in hardware, need only to be able to store 256 bit sections of the integers a, b and n, rather than having to store the entire N-bit integers. The entire Montgomery multiplication on the N-bit integers is then performed by calling the multiplication unit several times in a chained fashion under the control of the controller. Hence, if the multiplicand a and multiplier b are 768-bit numbers, and the multiplication unit is arranged to handle 256-bit sections of these numbers, then the multiplication unit can be used to break down the multiplication into a 3×3 array of blocks which are 256×256 bits in size. Similarly, if the multiplican
Seal David James
Symes Dominic Hugo
Arm Limited
Malzahn David H.
Nixon & Vanderhye P.C.
LandOfFree
System and method for performing modular multiplication does not yet have a rating. At this time, there are no reviews or comments for this patent.
If you have personal experience with System and method for performing modular multiplication, we encourage you to share that experience with our LandOfFree.com community. Your opinion is very important and System and method for performing modular multiplication will most certainly appreciate the feedback.
Profile ID: LFUS-PAI-O-3107555