Method for performing a fast transform

Electrical computers: arithmetic processing and calculating – Electrical digital calculating computer – Particular function performed

Reexamination Certificate

Rate now

  [ 0.00 ] – not rated yet Voters 0   Comments 0

Details

C708S404000

Reexamination Certificate

active

06591284

ABSTRACT:

BACKGROUND OF THE INVENTION
1. Field of the Invention
This invention relates to digital signal processing. More particularly, this invention relates to memory, CPU, and power efficient performing of fast transforms.
2. Description of the Related Art
In literature many fast transforms (fast Fourier transform, fast cosine transformation, etc.) are known. As an example the fast Fourier transform is discussed below but the invention is not limited thereto.
Traditionally, as shown in
FIG. 7
, FFT stages are calculated sequentially on a programmable processor or in a time multiplexed hardware solution, usually followed by a sequential iteration over each block and the butterflies involved. Three nested loops are considered: stage—block—butterfly. The access graph shown in
FIG. 8
shows which addresses are used for calculation at each moment. The end points of each line represent which elements are used; the length of a line determines the current stage.
FIG. 7
is the traditional representation of an FFT.
FIG. 8
shows the access sequence in time. E.g. the content of addresses
0
and
16
are used first, followed by addresses
1
and
17
, etc. In the second stage, address
0
and
8
are used,
1
and
9
, etc. The presented scheduling of the butterflies is not optimal when power consumption of the hardware device on which the transform is executed is important.
SUMMARY OF THE INVENTION
The invention presents memory access orderings, also denoted schedules, for fast transforms which are optimal with respect to power consumption and bus loading of the hardware device on which said transforms are executed.
The invention presents a method for minimizing memory space and power consumption in a signal processor when transforming a first m-dimensional indexed array with N elements into a second m-dimensional indexed array with M elements, said second array being a transform of said first array, said method comprising the steps of executing a plurality of butterfly codes, also denoted calculation method steps, each butterfly code being characterized by the elements of the indexed arrays accessed by said butterfly code, said method is characterized in that at least part of said butterfly codes are assigned to be part of at least one group of butterfly codes such that butterfly codes within one group are executed sequentially, meaning substantially close after each other in time, and said grouping of butterfly codes enables the use of storage spaces being substantially smaller than then storage space needed to store an array with N elements. The storage spaces used are either registers or a distributed memory configuration or a combination thereof. Note that M is often equal to N.
In a first aspect of the invention data locality improvement schedules are presented. In a first embodiment a full data locality improvement schedule is disclosed while in a second embodiment a partial data locality improvement schedule is shown.
The full data locality improvement schedule can be described as a method for transforming a first m-dimensional array into a second m-dimensional array, wherein said second array being a transform of said first array. Said transform can be a Fourier transform, a cosine transform, a Karhunen-Loève transform, a Hartly transform but is not limited thereto. Said method comprising the steps of executing a plurality of codes, also denoted butterfly codes, each code being characterized by its array variables or array elements it accesses. One can state that said arrays are indexed, meaning that each element of said arrays can be referred to by its index number. Note that with code or butterfly code is meant a calculation method which reads elements of an array and produces new elements. More in particular is meant a method reading elements and producing elements via operations such as complex additions, subtractions, multiplications and/or divisions. The multiplication factor is a parameter, which can vary from execution to execution of said codes while the operations are the same within said codes. Said execution is characterized in that said codes are scheduled by grouping in pairs or sets of codes with a maximal distance between the array variables accessed by said codes. Said pairs are placed in a predetermined ordering. In a particular example said predetermined ordening is address bit-reversed. Note that with distance is meant the index difference of the elements of the array under consideration. Distance can also be denoted window. With maximal is meant having the largest index difference when considering all codes needed to describe the transformation under consideration. For each pair of said ordened codes codes which access at least one array variable being accessed also by one of the code of said pair are determined. Note that not necessarily all such codes are selected. Thus part of the codes which have a common access with one of the code of said pair are selected. The selected codes can be denoted as codes being assigned to such a pair. The selected codes are ordened in a binary tree with as top node one of said pair of codes according to their related maximal distance wherein higher distance of such code implies closer placement to the top node of said tree. For each pair of codes a binary tree can be constructed. The execution order of said ordered codes is determined by traversing said binary tree in a depth-first manner. Note that this is done for each pair. The improved data locality obtained by the above described scheduling is exploited because during said execution of said scheduled codes data is shared between at least part of said pairs of codes subsequently scheduled after one another. This data sharing is done via small memories, often denoted registers, register files, being capable of storing a few elements, possible a single element. Said small memories, also denoted foreground memory, are local to the datapath of the hardware device executing said application. Note that with subsequently scheduled is meant scheduled soon after each other in time. Some intermediate operations are possible depending on the amount of elements that said data sharing memories can store. During said execution of said scheduled butterfly codes at least part of the accesses of elements accessed by said scheduled butterfly codes are accesses to a storage space being capable of storing a few elements.
A first characteristic of the best mode realisation of the full data locality improvement ordering is an address bit-reversed selecting of the butterflies with largest index difference, also denoted top butterflies, is done. Suppose that the butterfly selected is characterized by the address of the element it accesses with the lowest address number. Suppose one write down the sequence 0,1,2, . . . in binary format with log
2
(N)−1 number of bits with N the size of the transform, thus 0000, 0001, 0010, . . . for a 32 Fourier transform as an example. This binary sequence is now transformed into another binary sequence by reversing the bit ordering, thus one obtains 0000, 1000, 0100, . . . or in normal format 0, 8, 4, . . . This sequence gives the lowest address number of the butterfly being selected as can be observed in
FIG. 4. A
second characteristic of the best mode realisation of the full data locality improvement ordering is the tree-depth, meaning the number of levels in said binary trees, each level being characterized by the index difference which is the same for the butterflies represented by the nodes of a single level. The tree-depth is given by the log
2
(N)−[log
2
(N/2−x)], wherein x means the lowest address number of the butterfly defining the top node of the tree, hence the wording top butterflies, and [ ] means rounding the number in between brackets up to the following larger integer. An example is given in the table below showing in the actual location of the top butterflies, their characterizing address (lowest address) and the depth of their related tree.
0
0
1
1
8
2
4
4
1
5
12
3
12
2
1
13
10
2
16
6
1
17
14
4
32
1
1

LandOfFree

Say what you really think

Search LandOfFree.com for the USA inventors and patents. Rate them and share your experience with other people.

Rating

Method for performing a fast transform does not yet have a rating. At this time, there are no reviews or comments for this patent.

If you have personal experience with Method for performing a fast transform, we encourage you to share that experience with our LandOfFree.com community. Your opinion is very important and Method for performing a fast transform will most certainly appreciate the feedback.

Rate now

     

Profile ID: LFUS-PAI-O-3087179

  Search
All data on this website is collected from public sources. Our data reflects the most accurate information available at the time of publication.