System for internode deadlock avoidance in parallel database...

Electrical computers and digital processing systems: multicomput – Computer-to-computer protocol implementing – Computer-to-computer data transfer regulating

Reexamination Certificate

Rate now

  [ 0.00 ] – not rated yet Voters 0   Comments 0

Details

C709S230000, C709S231000, C709S232000, C709S233000, C709S234000, C712S028000

Reexamination Certificate

active

06230206

ABSTRACT:

FIELD OF THE INVENTION
This invention relates to the processing of data in a parallel processing system and more particularly to methods and means of avoiding deadlocks in processing information caused by the overloading of one or more pathways between nodes involved in transferring portions of the information required for processing the information. The invention is particularly applicable to the processing of queries in a parallel database system in order to avoid deadlocking caused by the blockage of portions of the queries or data sought to be transferred between nodes of the system.
BACKGROUND OF THE INVENTION
While parallel processing offers significant advantages in the high speed processing of large volumes of information by routing and processing information on nodes of the system simultaneously, the processing can be deadlocked if some of the information is prevented from reaching its intended node or nodes preventing it from being applied with other information on the system or in proper sequence. This can happen if one or more of the internodal communication pathways that link the nodes of the parallel processing system are operating at capacity limiting the flow of information through them.
More particularly, during the processing of SQL statements by a shared-nothing, MPP (Massively Parallel Processor) database system, data must be moved between the computers (or nodes) that make up the database manager. The limited buffering available to communication lines connecting nodes combined with data skew between the nodes resulting from delayed information transmission between the nodes can result in a deadlock when the order of data records being processed must be preserved.
Previously these deadlocks were handled by changing the execution plan for the SQL statement. This requires developing techniques to recognize that a given plan may deadlock and then changing the plan to avoid the deadlock. In this approach the data processing plan is optimized by an SQL optimizer to generate the best performing plan, from the optimizer's viewpoint without taking into account the possibility of deadlocking, then changing the plan if necessary to avoid deadlocking.
As queries become more complex and SQL optimizers become more sophisticated, the detection of potential deadlocks becomes very difficult and the modified execution plan may not perform as efficiently as possible.
One reason behind the deadlocking problem is the limited in-memory buffering available to an internodal communication line. Theoretically this could be handled by increasing the number of communication buffers available, so that the database system can handle more extreme situations of data skew. Ultimately, by increasing the buffer size to a large enough size no deadlocks would occur. However, reality dictates limitations to buffer size.
While the problem of internodal deadlocking has been of concern since the advent of parallel processing systems it does not appear that a broadly applicable satisfactory solution to this problem has been found until the present invention was developed.
SUMMARY OF THE INVENTION
At present a database system usually can only devote a small amount of memory to communication buffers so it was important to find another solution if possible. It was found that if properly applied as described below, that other storage means such as disk storage could be used to expand the buffering capacity by providing a temporary overflow buffer. Databases, employ temporary tables for the handling of transient data. It has been found that these temporary tables can be employed to act as temporary overflow buffers for the communication buffer space used by internodal communication lines.
In the normal operation of internodal transfers when data is sent from one node of the database system to another node, the data being sent is normally placed in an in-memory communication buffer by the database management system and the buffer is (logically) sent to the other node, ie. made available to it.
While the manner in which buffers are established and used in different parallel processing database systems may differ it is believed to be useful to discuss one example. One particular manner in which buffers can be used to transfer information between nodes follows. A sending node requiring a buffer may ask the communication manager of the system which controls the communication to and from the sending node to assign a buffer for its use. The data from the sending node is then loaded into the buffer. The communication manager is instructed where the data is to be sent (ie. To which recipient node). The communication manager assumes control of the buffer and sends the data which was contained within it over a physical communication media to the recipient node. The communication manager at the recipient will receive the data, transferring it to a buffer for subsequent use by the recipient node. It should be reiterated that there are a number of ways in which internodal transmission of information can occur in conjunction with the use of buffers. It appears that this invention is not dependent on the particular implementation used by the communication system in use.
Returning to a description of the invention herein; if the communication line is congested, ie. loaded to capacity, an in-memory communication buffer will not be available. In this situation the data is written to a temporary table. Later, when the communication line is free, the data can be read from the temporary table and sent to the other node as originally intended.
As a parallel data processing system is designed to handle a large flow of data it is possible that, when an internodal communication line becomes free, there will be both data in the temporary table and new data that is ready to be sent. As is frequently the case, there is likely to be an ordering to the data, so that it must be received or processed in a particular sequence, for example, when then the oldest data in the temporary table requires sending before the new data it is useful to add the new data to the end of the table to preserve the data sequence if required. Then, once the oldest data is drawn from the temporary table the new data can be accessed.
This approach has been found to resolve the problem of inter-nodal deadlocks since the communication buffer is effectively increased to the size of the (disk) space allocated for temporary tables. In the case of extreme data skew (where there is insufficient disk space for the overflow buffers), the database system will likely report an out of disk space message rather than becoming deadlocked. This is preferable to a deadlock situation as the system administrator or operator of the data processing system will be notified by the database management system of the out of disk space condition, whereas the occurrence of a deadlock situation may not be readily apparent.
It is preferable to improve on the above described implementation of the invention to prevent large amounts of data from being written to temporary tables each time data is sent between nodes as follows:
In the situation where a sending node has more than one recipient node to which data is being sent the sending node should preferably use (ie. write to) the temporary table overflow buffer if one of the nodes is waiting for information to be sent. For example if nodes A and B are to be sent data by node S, but the communication line to A is congested data does not need to be written to the temporary file unless node B is waiting for information. If neither of the nodes needs data the sender S can wait until the communication line becomes available or another node becomes ready for data.
In yet another implementation a timer can be employed to delay writing data to the temporary table. This has proven to be useful as a communication line may be congested because the intended recipient node is performing work rather than reading from the communication line. An appropriate brief delay gives the recipient node a chance to read data from the line, allowing the sendin

LandOfFree

Say what you really think

Search LandOfFree.com for the USA inventors and patents. Rate them and share your experience with other people.

Rating

System for internode deadlock avoidance in parallel database... does not yet have a rating. At this time, there are no reviews or comments for this patent.

If you have personal experience with System for internode deadlock avoidance in parallel database..., we encourage you to share that experience with our LandOfFree.com community. Your opinion is very important and System for internode deadlock avoidance in parallel database... will most certainly appreciate the feedback.

Rate now

     

Profile ID: LFUS-PAI-O-2469791

  Search
All data on this website is collected from public sources. Our data reflects the most accurate information available at the time of publication.