Electrical computers and digital processing systems: processing – Processing architecture – Microprocessor or multichip or multimodule processor having...
Reexamination Certificate
2001-03-12
2003-02-04
Pan, Daniel H. (Department: 2186)
C712S201000, C712S225000, C710S305000, C711S217000
active
06516402
ABSTRACT:
BACKGROUND OF THE INVENTION
1. Field of the Invention
The present invention generally relates to an information processing apparatus and, more particularly, to a flexible information processing apparatus capable of efficiently processing parallel accumulations and to an information processing apparatus capable of processing parallel accumulations of a variety of types.
2. Description of the Related Art
FIG. 13 is a block diagram showing a construction of an information processing apparatus according to the related art capable of processing parallel accumulations. Referring to FIG. 13, the information processing apparatus according to the related art comprises a memory 201 for storing data, a register A 202 for storing the data read from the memory 201, an accumulator 203 for accumulating the data stored in the register A 202, a register B 204 for storing results of accumulation performed by the accumulator 203 and a memory controller 205 for controlling an operation of reading from the memory 201.
A description will now be given of the operation according to the related art.
FIG. 14 shows an example of how data is stored in the memory 201. Referring to FIG. 14, data D0 is stored at address 100h, data D1 at address 101h, data D2 at address 102h, data D3 at address 103h, data D4 at address 104h, data D5 at address 105h, data Y2 at address 200h, data Y5 at address 201h, and data Y8 at address 202h.
FIGS. 15A-15E are timing charts showing how the operation of the information processing apparatus according to the related art is timed. FIGS. 15A-15E show that each step of the operation occurs at a rising edge of a clock. From the memory 201, data D0 at address 100h is stored in the register A 202 at T1, data D1 at address 101h is stored at T2 and data D2 at address 102h is stored at T3. The register B 204 is initialized to 0 at T1. At T2, data D0 in the register A 202 and the data in the register B 204 are accumulated by the accumulator 203 so that a result of accumulation D0+0 is stored in the register B 204.
Accumulation and storage in the register B 204 are repeated two additional times (see FIGS. 15C and 15D) so that data Y2, a final result of accumulation stored in the register B 204, is written at T5 to the memory 201 at address 200h shown in FIG. 14. At T10, data Y5 stored in the register B 204, a result of accumulation resulting from a subsequent cycle of accumulation involving three steps, is written to the memory 201 at address 201h shown in FIG. 14.
According to the related-art information processing apparatus as described above, a predetermined number of steps of reading of data from the memory 201 and a predetermined number of steps of accumulation in the accumulator 203 proceed in parallel. Thereby, the processing time is reduced. The initialization of the accumulator 203 and the writing of the result of accumulation to the memory 201, however, are processed separately. As a result, when an accumulation of three data items is repeated twice, for example, a total of 10 cycles T1 through T10 are required.
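The timing described above can be sketched as a small cycle-level model. This is only an illustrative sketch, not the patent's circuit: the function name `run_first_related_art`, the sample values for D0 through D5, and the reading of Y2 and Y5 as the sums D0+D1+D2 and D3+D4+D5 are assumptions made for the example.

```python
def run_first_related_art(memory, src, dst, group=3, repeats=2):
    """Model of the FIG. 13 apparatus; returns the number of clock cycles used.

    Assumed behavior: register B is initialized in the same cycle as the first
    read, accumulation lags each read by one cycle, and the write-back to
    memory takes a cycle of its own.
    """
    cycle = 0
    for r in range(repeats):
        base = src + r * group
        reg_a = None  # register A 202
        reg_b = None  # register B 204
        for step in range(group + 1):
            cycle += 1
            if step == 0:
                reg_b = 0                  # initialization (e.g. T1)
            else:
                reg_b = reg_b + reg_a      # accumulate the previously read item
            if step < group:
                reg_a = memory[base + step]  # read the next item into register A
        cycle += 1                         # separate write-back cycle (e.g. T5)
        memory[dst + r] = reg_b
    return cycle


# Hypothetical contents for addresses 100h..105h (values are made up).
memory = {0x100 + i: v for i, v in enumerate([3, 1, 4, 1, 5, 9])}
cycles = run_first_related_art(memory, src=0x100, dst=0x200)
```

With three items per accumulation and two repetitions, the model spends five cycles per repetition, which matches the ten cycles T1 through T10 stated in the text.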
FIG. 16 is a block diagram showing a construction of another related-art information processing apparatus with the parallel accumulation capability disclosed in Japanese Laid-Open Patent Application No. 10-214261. Referring to FIG. 16, the information processing apparatus comprises a source data memory 501, an automatic consecutive address generator 502 and a register A 505 for storing the source data. The automatic consecutive address generator 502 is used to store the source data from the source data memory 501 in the register A 505 using consecutive cycles. The apparatus further comprises a coefficient data memory 511, an automatic consecutive address generator 512 and a register C 506 for storing the coefficient data. The automatic consecutive address generator 512 is used to store the coefficient data from the coefficient data memory 511 in the register C 506 using consecutive cycles.
Referring also to FIG. 16, the apparatus further comprises a pipeline operation unit 507 producing a product of the source data stored in the register A 505 and the coefficient data stored in the register C 506. A register D 513 stores a result of operation performed by the pipeline operation unit 507. An accumulator 508 accumulates results of operation stored in the register D 513. An initializer 508 initializes a result of accumulation in the accumulator 508. A register B 509 stores the result of accumulation from the accumulator 508. The apparatus also includes a destination data memory 504 and an automatic consecutive address generator 503. The automatic consecutive address generator 503 is used to transfer the result of operation in the register B 509 to the destination data memory 504.
FIGS. 17A-17I are timing charts showing how the operation of the information processing apparatus according to the second related art described above is timed. FIGS. 17A-17I show that each step of the operation occurs at a rising edge of a clock. From the memory 501, data D0 is stored in the register A 505 at T1, data D1 is stored at T2 and data D2 is stored at T3. From the coefficient data memory 511, data C0 is stored in the register C 506 at T1, data C1 is stored at T2 and data C2 is stored at T3.
At T2, the pipeline operation unit 507 multiplies the data in the register A 505 by the data in the register C 506. A result of operation Z0, i.e. D0*C0, is stored in the register D 513. At T3, an initializing signal is at LOW so that the accumulator 508 produces an arithmetic sum of 0 and the data in the register D 513 so as to store a result of accumulation Y0, i.e. Z0+0, in the register B 509. Alternatively, when the initializing signal is at HIGH (at T4, for example) the accumulator 508 produces an arithmetic sum of the data in the register D 513 and the data in the register B 509 so as to store the result of accumulation Y1, i.e. Z1+Y0, in the register B 509. The step of accumulation is repeated three times. At T6, data Y2, a result of accumulation stored in the register B 509, is written to the destination data memory 504 at memory address 0h.
The process described above is repeated until, at T9, data Y3, a result of accumulation for a second cycle of accumulation, is written to the destination data memory 504 at memory address 1h. Thus, a repetition including two cycles of accumulation of three data items requires a total of 9 cycles T1 through T9. Excluding the pipeline operation, the first and second related-art apparatuses discussed are directed to a similar operation. A difference is that the second related-art apparatus provides an improvement in the processing efficiency by requiring only a total of 9 cycles.
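The pipelined timing of the second related-art apparatus can be sketched in the same way. The model below is an assumption-laden illustration rather than the disclosed hardware: the function name `run_second_related_art`, the sample source and coefficient values, and the rule that the initializing signal goes LOW on the first product of each group are choices made for the example. All registers are updated together at each simulated clock edge from their previous contents.

```python
def run_second_related_art(source, coeffs, group=3):
    """Model of the FIG. 16 pipeline; returns (destination data, cycles used)."""
    assert len(source) == len(coeffs) and len(source) % group == 0
    n = len(source)
    dest = []                # destination data memory 504, addresses 0h, 1h, ...
    reg_a = reg_c = None     # register A 505 as (index, value); register C 506
    reg_d = None             # register D 513 as (index, product)
    reg_b = None             # register B 509 as (index, running sum)
    cycle = 0
    while len(dest) < n // group:
        cycle += 1
        i = cycle - 1        # index of the item read at this clock edge
        # Next-state values, all computed from the current register contents.
        write = None
        if reg_b is not None and reg_b[0] % group == group - 1:
            write = reg_b[1]                   # a finished group sum is written out
        if reg_d is None:
            next_b = None
        else:
            k, z = reg_d
            init_low = (k % group == 0)        # assumed: LOW restarts the sum from 0
            next_b = (k, z + (0 if init_low else reg_b[1]))
        next_d = (reg_a[0], reg_a[1] * reg_c) if reg_a is not None else None
        next_a = (i, source[i]) if i < n else None
        next_c = coeffs[i] if i < n else None
        # Clock edge: commit everything at once.
        if write is not None:
            dest.append(write)
        reg_a, reg_c, reg_d, reg_b = next_a, next_c, next_d, next_b
    return dest, cycle


# Hypothetical source data D0..D5 and coefficients C0..C5 (values are made up).
results, cycles = run_second_related_art([3, 1, 4, 1, 5, 9], [2, 0, 1, 1, 1, 2])
```

In this model the first result is written at the sixth cycle and the second at the ninth, in line with the writes at T6 and T9 described in the text, because initialization and write-back overlap with the ongoing reads and multiplications.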
To summarize, in the information processing apparatus according to the second related art discussed, the reading of the source data from the source data memory 501, the reading of the coefficient data from the coefficient data memory 511, the operation in the pipeline operation unit 507 and the accumulation in the accumulator 508 proceed in parallel such that a predetermined number of each of these steps occur simultaneously. Additionally, the initialization of the result of accumulation performed by the accumulator 508, the series of accumulation and the writing of the result of operation to the destination memory 504 proceed in parallel such that a predetermined number of each of these steps occur simultaneously. Thereby, the processing time for successive accumulations is reduced.
A disadvantage with the information processing apparatus according to the first related art is that, for each cycle of accumulation, the initialization of the accumulator 203 and the transfer of the result of accumulation to the memory 201 are required. As a result, the overall processing time is relatively l
Kamemaru Toshihisa
Ogawa Yoshihiro
Suzuki Hirokazu