Electrical computers and digital processing systems: multicomput – Computer-to-computer data routing – Least weight routing
Reexamination Certificate
1999-07-01
2003-11-18
Voeltz, Emanuel Todd (Department: 2121)
active
06651082
ABSTRACT:
FIELD OF THE INVENTION
This invention relates to dynamic change of the load-balance in a multiprocessor system, particularly to dynamic change of the load balance between a host processor and a graphics adapter in a computer.
BACKGROUND OF THE INVENTION
In polygon-based three-dimensional graphics, such as OpenGL and Direct3D, the main factors that determine overall performance are as follows:
(1) API: the speed at which graphics commands are issued from an application via the API;
(2) Geometry processing: the speed of geometry processing such as triangulation, coordinate transformation, and lighting calculation;
(3) Setup processing: the speed of gradient calculation for color values, Z coordinate values, and texture coordinate values along the faces and edges of a triangle; and
(4) Raster processing: the speed of generating pixels by interpolating color values, Z coordinate values, and texture coordinate values, and of reading and writing them to a frame buffer.
The first listed factor, the API, does not present a problem since, even if a method is used whereby the API is called for each vertex (which is the worst case), it only takes a few tens of clocks per vertex.
Raster processing corresponds to how many pixels can be drawn per second (the pixel rate). The pixel rate is independent of the polygon rate (discussed below); the required rate is determined by the screen size (the number of pixels composing the screen, for instance 640×480 or 1024×768), the frame rate (the number of frames displayed per second, which differs from the CRT refresh rate and is generally around 12-60 frames/second), and the average overlap on the screen (normally about three times). For recently developed graphics adapters, raster processing presents almost no difficulty up to a screen size such as SXGA (1280×1024 pixels).
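The relationship described above can be sketched as a simple product. This is only an illustration: the function name `required_pixel_rate` is hypothetical, and the frame rates of 30 and 60 frames/second are assumed values picked from the 12-60 range given in the text.

```python
def required_pixel_rate(width, height, frames_per_second, overlap):
    """Pixels per second the rasterizer must sustain.

    overlap is the average number of times each screen pixel is
    drawn per frame (the text suggests about three).
    """
    return width * height * frames_per_second * overlap


# Assumed example settings; the frame rates are illustrative only.
vga_rate = required_pixel_rate(640, 480, 30, 3)
sxga_rate = required_pixel_rate(1280, 1024, 60, 3)

print(vga_rate)   # 27648000
print(sxga_rate)  # 235929600
```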
The performance of geometry and setup processing, (2) and (3), corresponds directly to the number of polygons that can be processed per second (the aforementioned polygon rate). As setup processing is often considered a part of geometry processing, it is treated as geometry processing here. Geometry processing requires a large amount of floating-point arithmetic: processing a single vertex takes a few hundred to a few thousand clocks. Therefore, the throughput of a host processor alone is often insufficient. For instance, processing 10M vertexes per second, where 1,000 clocks are required per vertex, calls for a processor that runs at 10G clocks/second. Thus, in many cases a functional unit dedicated to geometry processing is placed on the graphics adapter. Also, the work load varies greatly depending on processing conditions, such as the number and types of light sources.
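The arithmetic in the example above can be checked with a short sketch. The function names are hypothetical, and the 500 MHz host clock in the second call is an assumed figure used only to show the inverse calculation.

```python
def required_clock_rate(vertices_per_second, clocks_per_vertex):
    """Clocks per second needed to sustain the given vertex rate."""
    return vertices_per_second * clocks_per_vertex


def achievable_vertex_rate(clocks_per_second, clocks_per_vertex):
    """Vertices per second a processor can sustain (integer division)."""
    return clocks_per_second // clocks_per_vertex


# Figures from the text: 10M vertexes/second at 1,000 clocks each.
print(required_clock_rate(10_000_000, 1_000))       # 10000000000

# Assumed 500 MHz host processor, for illustration only.
print(achievable_vertex_rate(500_000_000, 1_000))   # 500000
```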
Meanwhile, a host processor stores a sequence of graphics commands in a main storage device. This sequence of graphics commands is called a command queue. A graphics adapter obtains the contents of the command queue by DMA, processes them, and displays the result on a display device. Because it is accessed by DMA transfer, the command queue must physically exist in the main storage device or on the graphics adapter. Thus, the size of the command queue is limited. If the command queue becomes full or empty in the course of processing, the host processor or the graphics adapter stops, and overall performance deteriorates. If the command queue is full, the host processor can no longer write to it and cannot continue processing until space becomes available. Likewise, if the command queue is empty, the graphics adapter cannot perform processing.
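The full and empty conditions can be illustrated with a minimal bounded queue. This is a software sketch only, not the DMA mechanism itself; the class name `CommandQueue` and its methods are hypothetical, and commands are represented as plain strings.

```python
from collections import deque


class CommandQueue:
    """Fixed-capacity FIFO shared by a producer (host) and a consumer (adapter)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self._items = deque()

    def is_full(self):
        return len(self._items) >= self.capacity

    def is_empty(self):
        return len(self._items) == 0

    def try_write(self, command):
        """Host side: returns False when the queue is full (host would stall)."""
        if self.is_full():
            return False
        self._items.append(command)
        return True

    def try_read(self):
        """Adapter side: returns None when the queue is empty (adapter would idle)."""
        if self.is_empty():
            return None
        return self._items.popleft()


queue = CommandQueue(capacity=2)
print(queue.try_read())            # None: adapter has nothing to process
print(queue.try_write("draw A"))   # True
print(queue.try_write("draw B"))   # True
print(queue.try_write("draw C"))   # False: host must wait for space
print(queue.try_read())            # draw A
print(queue.try_write("draw C"))   # True: space is available again
```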
While a command queue does not become full or empty if the processing speed of the host processor and that of the graphics adapter are equal, it has not been possible to make both processing speeds equal for the following reasons:
(a) it is difficult to estimate the throughput of a host processor available for graphics processing, since the type, operating frequency, and number of host processors vary, and the load placed on a host processor by uses other than graphics processing is difficult to estimate and changes dynamically;
(b) as in the case of the above-mentioned geometry processing, the work load of a graphics command on a host processor is difficult to estimate since it changes dynamically depending on the current state or data (for instance, the number of vertexes increases or decreases by clipping); and
(c) the work load of a graphics command on a graphics adapter is difficult to estimate since it changes dynamically depending on the current state or data.
Assuming that the throughput and work load of a host processor are $P_h$ and $L_h$, respectively, and the throughput and work load of a graphics adapter are $P_a$ and $L_a$, respectively, processing can go on without a command queue becoming empty or full if $L_h/P_h = L_a/P_a$ holds. However, $L_h$, $P_h$, $L_a$ and $P_a$ are all inestimable and the system's performance could not always be fully exploited.
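To make the balance condition concrete, the following sketch assumes a unit interpretation that the text does not state: work load is measured in clocks per command and throughput in clocks per second, so that work load divided by throughput is the time spent per command. The function name `queue_fill_rate` and the numeric values are hypothetical.

```python
def queue_fill_rate(l_h, p_h, l_a, p_a):
    """Net commands per second entering the command queue.

    Assumed units: l_* in clocks per command, p_* in clocks per second.
    The host produces p_h / l_h commands per second and the adapter
    consumes p_a / l_a commands per second.
    Positive result: the queue tends to fill. Negative: it tends to empty.
    Zero: l_h / p_h equals l_a / p_a and the queue length stays steady.
    """
    return p_h / l_h - p_a / l_a


# Host is faster than the adapter, so the queue fills.
print(queue_fill_rate(l_h=1000, p_h=1_000_000_000, l_a=500, p_a=400_000_000))

# Balanced case: both sides spend the same time per command.
print(queue_fill_rate(l_h=1000, p_h=1_000_000_000, l_a=400, p_a=400_000_000))
```

Under this interpretation, the sign of the result indicates which side would eventually stall; in practice none of the four inputs is known in advance, which is the difficulty the text describes.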
Japanese Published Unexamined Patent Application No. Hei 2-275581 discloses a technology that, provided the time needed for each function is known in advance, improves the processing speed of the entire system by changing the load on a plurality of processors every time a user switches on or off the functions he or she uses. However, the load assignment cannot be changed appropriately when the time needed to perform a function depends on the data to be processed. Moreover, a host processor often runs in a multitasking OS environment, where the computational ability assigned to graphics changes from moment to moment, so the prerequisite of that application (i.e., knowing the necessary time) does not hold. In addition, that application requires a table of partial charges covering every on/off combination of functions, which is impractical since the number of functions to be switched on or off is enormous in an actual environment.
Thus, an object of the present invention is to provide a computer system wherein $L_h/P_h \cong L_a/P_a$ in an environment where $L_h$, $P_h$, $L_a$ and $P_a$ are all unpredictable.
Another object is to enable the entire system's performance to be best exploited by bringing it close to $L_h/P_h = L_a/P_a$.
A further object is to allow adaptation to improved throughput of a future host processor thereby extending product life.
Still another object is, even when a command queue becomes full, to keep a host processor from stopping so that the entire system's performance does not deteriorate.
SUMMARY OF THE INVENTION
The foregoing and other objects are realized by the present invention, which dynamically changes the partial charge, or assignment of processes, of each group in a sequence of processes from a first stage to an n-th stage in a computer having a plurality of processors, wherein said plurality of processors are grouped into at least two groups. The invention includes the steps of: detecting a change in a characteristic value in a queue for transferring a processing result between the groups; and changing the partial charge of each group based on the increase or decrease of the characteristic value. The characteristic value of data stored in a queue represents a value related to work load, and the queue seldom becomes full or empty if the load balance is changed by referring to this characteristic value. For instance, the characteristic value can be the amount of information stored in a queue, the size (length) of a queue, or, in the case of graphics processing, the number of vertex data stored in a queue.
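The two steps (detecting a change in the characteristic value, then changing the partial charge) might be sketched as follows. This is an illustration under stated assumptions, not the patented implementation: the class name `BoundaryController`, the use of a fixed threshold on the change, the choice to shift the boundary back by one stage on a decrease, and the restriction of the boundary to stages 1 through n-1 are all assumptions made for the example.

```python
class BoundaryController:
    """Tracks which stages (1..boundary) the first group performs.

    The second group performs stages boundary+1..n_stages.
    """

    def __init__(self, n_stages, boundary, threshold):
        if not 1 <= boundary < n_stages:
            raise ValueError("boundary must satisfy 1 <= boundary < n_stages")
        if threshold <= 0:
            raise ValueError("threshold must be positive")
        self.n_stages = n_stages
        self.boundary = boundary
        self.threshold = threshold
        self._reference = None  # characteristic value at the last adjustment

    def update(self, characteristic_value):
        """Feed one sample (e.g. queue length); return the current boundary."""
        if self._reference is None:
            self._reference = characteristic_value
            return self.boundary
        change = characteristic_value - self._reference
        if abs(change) < self.threshold:
            return self.boundary  # change too small, keep the assignment
        self._reference = characteristic_value
        if change > 0:
            # Queue is growing: the first group takes on the next stage.
            self.boundary = min(self.boundary + 1, self.n_stages - 1)
        else:
            # Assumed symmetric case: the first group hands a stage back.
            self.boundary = max(self.boundary - 1, 1)
        return self.boundary


controller = BoundaryController(n_stages=4, boundary=2, threshold=8)
for sample in [10, 14, 20, 30, 20, 5]:
    print(sample, controller.update(sample))
```

Keeping the reference value fixed until the threshold is crossed lets slow, steady growth in the queue accumulate into an adjustment; resetting it on every sample would be an equally plausible design.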
The aforementioned changing step may also comprise steps of determining if the characteristic value has increased by a predetermined threshold value or more and setting the charge of a group which performs processes up to an i-th stage (1≦i<n), where the i-th stage is a boundary between partial charges of the groups, to processes up to a stage following the i-th stage. A process of a stage following the i-th stage means a p
Kawase Kei
Moriyama Takao
Nakamura Fusashi
Cameron Douglas W.
Dougherty Anne Vachon
International Business Machines Corporation
Todd Voeltz Emanuel