Electrical computers and digital processing systems: processing – Processing control – Arithmetic operation instruction processing
Reexamination Certificate
1998-09-28
2001-02-27
An, Meng-Ai T. (Department: 2783)
C712S007000, C712S002000
active
06195747
ABSTRACT:
BACKGROUND OF THE INVENTION
1. Field of the Invention
The present invention generally relates to data processing system architecture technology. More specifically, the present invention relates to a data switching device with a bandwidth management unit to reduce system data traffic between the processor and the system controller in a data processing system while performing vector-calculation operations, such as vector product operations, and the processing method employed by the data switching device.
2. Description of the Related Art
The primary value of data processing systems resides in their computing power. This computing power is useful in engineering, statistics, scientific research, and many other fields. For example, engineers use computing power to solve high-order polynomial equations, or to simulate the stress (force) distribution of an aircraft or a sailing vessel. Because most applications require a large number of computing steps, data processing systems need to quickly retrieve data to be processed and output the result of the operation. Therefore, the efficiency of data transfer is a critical factor in computing performance.
FIG. 1 (Prior Art) is a block diagram of a part of a typical data processing system, such as a computer system. FIG. 1 shows only the components of the data processing system that are required to perform a mathematical operation. As shown in FIG. 1, the data processing system comprises processor 10, system controller 20, main memory 30, peripheral device(s) 40 and cache memory 50. Co-processor 10a is an optional component, which is used to help processor 10 perform special mathematical operations, such as floating-point operations. The functions of these components are described as follows.
Processor 10 is the processing center of the data processing system, which receives instructions and sequentially executes them. In addition, processor 10 usually includes several embedded registers (not shown) that store the data to be processed and the operation result, and which serve to reduce the number of times it is necessary to communicate with external data sources. System bus 60 is connected between processor 10 and system controller 20.
System controller 20 is a bridge device for interfacing between processor 10 and other components in the data processing system, such as main memory 30 and peripheral devices 40. The main functions of system controller 20 are to manage the main memory (typically implemented by Dynamic Random Access Memories, or DRAM) and to interface between the system bus and a peripheral bus (such as a Peripheral Component Interconnect bus, or PCI). Briefly speaking, the memory management function of system controller 20 comprises transferring information, such as program code and data code, between processor 10 and main memory 30. In addition, system controller 20 controls peripheral devices 40, such as the input/output devices. For example, a multimedia system among peripheral devices 40 displays the result of the desired operation. The interface function of system controller 20 is irrelevant to the issue of the present invention and will not be further discussed.
Cache memory 50 and optional co-processor 10a, both of which are located in proximity to processor 10, provide processor 10 with additional assistance. Cache memory 50, typically implemented by Static Random Access Memories (SRAM), serves as a buffer space for temporarily storing the input/output data of processor 10. As described above, processor 10 includes only a limited number of embedded registers and therefore cannot pre-load all the program code that is ready to be executed. If processor 10 were required to load the program/data code instruction-by-instruction at the time of execution, it is clear that the computing speed of processor 10 would decrease. Using cache memory 50 as a buffer allows processor 10 to execute instructions without the interruptions resulting from accessing external program/data code.
Co-processor 10a, as described above, provides additional calculation functions that are not implemented by hardware in processor 10. For example, some co-processors provide processors with floating-point calculation functions, which otherwise would be fulfilled by software. Basically, co-processor 10a operates under the control of processor 10 (i.e., co-processor 10a receives operation code and data code related to the floating-point operation from processor 10) and cannot work independently. Today, many of the additional functions previously provided by co-processors have been merged into processors. Nevertheless, the modern multi-processor system is similar in architecture to a processor/co-processor system, although more complicated.
According to the above description, the process for performing a mathematical operation in the data processing system shown in FIG. 1 is briefly described as follows. In the following example, the operands (data ready to be processed) are stored in main memory 30. After receiving an instruction for adding operand X to operand Y, processor 10 issues a read request for reading the data X and Y to system controller 20 through system bus 60. System controller 20 reads out the data X and Y stored in main memory 30 in response to the read request received from processor 10 and sends the data back to processor 10 through system bus 60. After finishing the addition operation, processor 10 then issues a write request for writing the addition result to main memory 30. This write request is also transferred by system bus 60. Finally, system controller 20 receives the write request and writes the addition result to a destination location in main memory 30. The addition operation is completed.
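The sequence above can be illustrated with a small simulation. This is only a sketch under stated assumptions: the class names SystemBus and SystemController, the function add_operands, the string addresses "X", "Y" and "Z", and the operand values are all hypothetical and do not appear in the patent. The sketch simply records each transfer that crosses system bus 60 during one addition.

```python
from dataclasses import dataclass, field


@dataclass
class SystemBus:
    """Hypothetical model of system bus 60: logs every transfer it carries."""
    log: list = field(default_factory=list)

    def transfer(self, source, destination, payload):
        self.log.append((source, destination, payload))


class SystemController:
    """Hypothetical model of system controller 20 in front of main memory 30."""

    def __init__(self, bus, memory):
        self.bus = bus
        self.memory = memory  # a dict standing in for main memory 30

    def read(self, addresses):
        # Read the operands from main memory and return them over the bus.
        data = [self.memory[address] for address in addresses]
        self.bus.transfer("controller", "processor", ("data", data))
        return data

    def write(self, address, value):
        # Store the result at its destination location in main memory.
        self.memory[address] = value


def add_operands(bus, controller, addr_x, addr_y, addr_result):
    """Model processor 10 adding two operands that live in main memory."""
    # Read request carrying the addressing information of X and Y.
    bus.transfer("processor", "controller", ("read_request", [addr_x, addr_y]))
    x, y = controller.read([addr_x, addr_y])
    result = x + y
    # Write request carrying the result data and its destination address.
    bus.transfer("processor", "controller", ("write_request", addr_result, result))
    controller.write(addr_result, result)
    return result


# Assumed example values, chosen only for illustration.
main_memory = {"X": 3, "Y": 4, "Z": None}
bus = SystemBus()
controller = SystemController(bus, main_memory)
total = add_operands(bus, controller, "X", "Y", "Z")
```

In this model a single addition already places three transfers on the bus: the read request, the returned operand data, and the write request.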
It is evident that system bus 60 is quite busy. In the above calculation, processor 10 issues, through system bus 60, the read request containing the addressing information of operands X and Y, and the write request containing the result data and the addressing information of the result data. In fact, the data traffic of system bus 60 is heavier than that of other buses. As described above, system controller 20 is electrically coupled to, and transfers data between, processor 10, main memory 30, PCI bus and graphic subsystem 40. Therefore, data from various sources that is ready to be processed is transferred to processor 10 through system bus 60, thereby increasing the data traffic on system bus 60. One could describe system bus 60 as a bottleneck in the system performance. Many methods have been proposed to solve this problem. For example, the data processing system can use the Direct Memory Access (DMA) technique to bypass the graphic data required in the display system, and add a controller to directly control the operation of the peripheral devices. However, information associated with mathematical operations must pass through system bus 60 (in order to be executed by processor 10) and cannot be rerouted to other components. Mathematical operations requiring a lot of data, such as vector or matrix operations, have an especially great impact on the traffic load of system bus 60.
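To give a rough sense of how this load scales, the following sketch counts the data words that would cross system bus 60 under a deliberately simple model. The model is an assumption, not something stated in the patent: every operand element is fetched exactly once, one result word is written back, and request and addressing traffic is ignored. The function names are hypothetical.

```python
def scalar_add_bus_words():
    """Data words on the bus for one scalar addition (assumed model)."""
    operand_words = 2  # X and Y travel to the processor
    result_words = 1   # the sum travels back toward main memory
    return operand_words + result_words


def vector_product_bus_words(n):
    """Data words on the bus for a product of two n-element vectors
    that yields a single result word (assumed model)."""
    operand_words = 2 * n  # every element of both vectors is fetched once
    result_words = 1       # one result word is written back
    return operand_words + result_words


# Traffic relative to a scalar addition for a few assumed vector sizes.
ratios = {n: vector_product_bus_words(n) / scalar_add_bus_words()
          for n in (3, 100, 1000)}
```

Under these assumptions the operand traffic grows linearly with the vector size, which is consistent with the observation that vector and matrix operations weigh heavily on the system bus.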
FIG. 2 (Prior Art) is a data flow diagram showing the flow of data between processor 10, system controller 20 and main memory 30 during a vector multiplication operation. In FIG. 2, the data (request or control signals) sequence is denoted by symbols 1a through 1k. FIG. 2 only depicts the components relevant to this calculation process, i.e. processor 10, system controller 20 and main memory 30.
The operation illustrated in FIG. 2 is a calculation of the inner product of vector X and vector Y (that is, X·Y), wherein X=(x_1, x_2, . . . , x_n), Y=(y_1, y_2, . . . , y_n) and n represents the dimensions of vectors X and Y. As shown in FIG. 2, a vector-calculation instruction 1a, which indicates the operation of X·Y, is first sent to processor 10. After accept
An Meng-Ai T.
Chang Jung-won
Mentor Arc Inc.
Townsend and Townsend / and Crew LLP