Electrical computers and digital processing systems: processing – Processing control – Arithmetic operation instruction processing
Reexamination Certificate
1998-09-28
2001-02-27
An, Meng-Ai T. (Department: 2783)
C712S007000, C712S002000
active
06195747
ABSTRACT:
BACKGROUND OF THE INVENTION
1. Field of the Invention
The present invention generally relates to data processing system architecture technology. More specifically, the present invention relates to a data switching device with a bandwidth management unit to reduce system data traffic between the processor and the system controller in a data processing system while performing vector-calculation operations, such as vector product operations, and the processing method employed by the data switching device.
2. Description of the Related Art
The primary value of data processing systems lies in their computing power, which is useful in engineering, statistics, scientific research and many other fields. For example, engineers use computing power to solve high-order polynomial equations or to simulate the stress (force) distribution in an aircraft or a sailing vessel. Because most such applications require a large number of computing steps, a data processing system must quickly retrieve the data to be processed and output the results. The efficiency of data transfer is therefore a critical factor in computing performance.
FIG. 1 (Prior Art) is a block diagram of part of a typical data processing system, such as a computer system. FIG. 1 shows only the components of the data processing system that are required to perform a mathematical operation. As shown in FIG. 1, the data processing system comprises processor 10, system controller 20, main memory 30, peripheral device(s) 40 and cache memory 50. Co-processor 10a is an optional component that helps processor 10 perform special mathematical operations, such as floating-point operations. The functions of these components are described as follows.
Processor 10 is the processing center of the data processing system. It receives instructions and executes them sequentially. In addition, processor 10 usually includes several embedded registers (not shown). These registers store the data to be processed and the operation results, which reduces how often the processor must communicate with external data sources. System bus 60 is connected between processor 10 and system controller 20.
System controller 20 is a bridge device that interfaces processor 10 with the other components of the data processing system, such as main memory 30 and peripheral devices 40. The main functions of system controller 20 are to manage the main memory (typically implemented with Dynamic Random Access Memory, or DRAM) and to interface between the system bus and a peripheral bus (such as a Peripheral Component Interconnect, or PCI, bus). Briefly, the memory management function of system controller 20 comprises transferring information, such as program code and data, between processor 10 and main memory 30. In addition, system controller 20 controls peripheral devices 40, such as input/output devices. For example, a multimedia system among peripheral devices 40 displays the result of the desired operation. The interface function of system controller 20 is not relevant to the present invention and will not be discussed further.
Cache memory 50 and optional co-processor 10a are both located close to processor 10 and provide it with additional assistance. Cache memory 50 is typically implemented with Static Random Access Memory (SRAM). It serves as a buffer for temporarily storing the input/output data of processor 10. As described above, processor 10 includes only a limited number of embedded registers and therefore cannot pre-load all the program code that is ready to be executed. If processor 10 had to load program code and data instruction by instruction at execution time, its computing speed would clearly decrease. Using cache memory 50 as a buffer allows processor 10 to execute instructions without the interruptions caused by accessing external program code and data.
Co-processor 10a, as described above, provides additional calculation functions that are not implemented in hardware in processor 10. For example, some co-processors give processors floating-point calculation functions that would otherwise have to be performed in software. Co-processor 10a operates under the control of processor 10 (i.e., co-processor 10a receives the operation code and data for the floating-point operation from processor 10) and cannot work independently. Today, many of the functions previously provided by co-processors have been merged into processors. Nevertheless, a modern multi-processor system is architecturally similar to a processor/co-processor system, although more complicated.
Based on the above description, the process for performing a mathematical operation in the data processing system of FIG. 1 is briefly described as follows. In this example, the operands (data to be processed) are stored in main memory 30. After receiving an instruction to add operand X to operand Y, processor 10 issues a read request for data X and Y to system controller 20 through system bus 60. In response, system controller 20 reads data X and Y from main memory 30 and sends them back to processor 10 through system bus 60. After finishing the addition, processor 10 issues a write request to write the result to main memory 30. This write request is also transferred over system bus 60. Finally, system controller 20 receives the write request and writes the result to a destination location in main memory 30, completing the addition operation.
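The steps above can be traced in a small model that logs every transfer crossing system bus 60. This is an illustrative sketch, not the patent's implementation. The class `SystemBus`, the function `add_via_system_bus` and the memory addresses are hypothetical. The model assumes that a single read request covers both operands, as the text describes it. Real buses may need separate transactions per operand, and transfers between system controller 20 and main memory 30 are not counted because they do not use system bus 60.

```python
from dataclasses import dataclass, field


@dataclass
class SystemBus:
    """Hypothetical model of system bus 60: records each transfer as (source, destination, payload)."""
    log: list = field(default_factory=list)

    def transfer(self, source, destination, payload):
        self.log.append((source, destination, payload))


def add_via_system_bus(main_memory, addr_x, addr_y, addr_result, bus):
    # 1. Processor 10 sends a read request for X and Y to system controller 20.
    bus.transfer("processor 10", "system controller 20", ("read", addr_x, addr_y))
    # 2. System controller 20 reads main memory 30 (not over system bus 60)
    #    and returns both operands to processor 10 over system bus 60.
    x, y = main_memory[addr_x], main_memory[addr_y]
    bus.transfer("system controller 20", "processor 10", ("data", x, y))
    # 3. Processor 10 performs the addition internally.
    result = x + y
    # 4. Processor 10 sends a write request carrying the destination address and result.
    bus.transfer("processor 10", "system controller 20", ("write", addr_result, result))
    # 5. System controller 20 stores the result in main memory 30.
    main_memory[addr_result] = result
    return result


if __name__ == "__main__":
    memory = {0x100: 3, 0x104: 4}  # hypothetical addresses of X and Y
    bus = SystemBus()
    print(add_via_system_bus(memory, 0x100, 0x104, 0x108, bus))  # 7
    for entry in bus.log:
        print(entry)
```

Even a single scalar addition produces three crossings of system bus 60 in this model: one request, one data return and one write.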
System bus 60 is clearly very busy. In the above calculation, processor 10 sends two requests through system bus 60: the read request containing the addressing information of operands X and Y, and the write request containing the result data and its addressing information. In fact, the data traffic on system bus 60 is heavier than on the other buses. As described above, system controller 20 is electrically coupled to, and transfers data between, processor 10, main memory 30, the PCI bus and graphic subsystem 40. Therefore, data to be processed from all of these sources reaches processor 10 through system bus 60, further increasing its traffic. System bus 60 can thus be described as a bottleneck in system performance.

Many methods have been proposed to solve this problem. For example, the data processing system can use the Direct Memory Access (DMA) technique to bypass processor 10 when transferring the graphic data required by the display system. It can also add a controller to directly control the operation of the peripheral devices. However, information associated with mathematical operations must pass through system bus 60 (in order to be processed by processor 10) and cannot be rerouted to other components. Mathematical operations involving large amounts of data, such as vector or matrix operations, have an especially great impact on the traffic load of system bus 60.
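Why vector operations weigh so heavily on system bus 60 can be seen by counting data words for an inner product $X \cdot Y = \sum_{i=1}^{n} x_i y_i$ computed the prior-art way. The following is a rough sketch under loudly stated assumptions, and the function name `inner_product_with_traffic` is hypothetical. It assumes that every element travels individually from main memory 30 to processor 10 and that no element is already in cache memory 50. Only data words are counted, not the request and address phases. Under these assumptions an $n$-dimensional inner product moves $2n + 1$ words, compared with 3 words for the scalar addition above.

```python
def inner_product_with_traffic(x, y):
    """Compute X . Y as processor 10 would, counting data words on system bus 60.

    Assumptions (illustrative only): one word per element, no cache hits,
    request/address phases not counted.
    """
    if len(x) != len(y):
        raise ValueError("vectors must have the same dimension n")
    words_on_bus = 0
    acc = 0
    for xi, yi in zip(x, y):
        words_on_bus += 2   # x_i and y_i read from main memory 30 via system controller 20
        acc += xi * yi      # multiply-accumulate inside processor 10
    words_on_bus += 1       # scalar result written back to main memory 30
    return acc, words_on_bus


if __name__ == "__main__":
    print(inner_product_with_traffic([1, 2, 3], [4, 5, 6]))  # (32, 7)
    for n in (1, 10, 1000):
        _, words = inner_product_with_traffic([1.0] * n, [1.0] * n)
        print(f"n={n}: {words} words on system bus 60")  # 2n + 1
```

Traffic grows linearly with the vector dimension, while only a single scalar result is produced. That imbalance is what makes vector operations costly for system bus 60.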
FIG. 2 (Prior Art) is a data flow diagram showing the flow of data between processor 10, system controller 20 and main memory 30 during a vector multiplication operation. In FIG. 2, the sequence of data (requests or control signals) is denoted by symbols 1a through 1k. FIG. 2 depicts only the components relevant to this calculation, i.e., processor 10, system controller 20 and main memory 30.
The operation illustrated in FIG. 2 is the calculation of the inner product of vector X and vector Y (that is, $X \cdot Y$), where $X = (x_1, x_2, \ldots, x_n)$, $Y = (y_1, y_2, \ldots, y_n)$ and $n$ is the dimension of vectors X and Y. As shown in FIG. 2, a vector-calculation instruction 1a, which indicates the operation $X \cdot Y$, is first sent to processor 10. After accept
An Meng-Ai T.
Chang Jung-won
Mentor Arc Inc.
Townsend and Townsend / and Crew LLP