System and method for transferring partitioned data sets...

Electrical computers and digital processing systems: multicomput – Computer-to-computer protocol implementing – Computer-to-computer data transfer regulating

Reexamination Certificate


C707S793000


active

06691166

ABSTRACT:

BACKGROUND OF THE INVENTION
The present invention relates, generally, to database retrieval systems and, more particularly, to the partitioning of large data sets for transfer over multi-threaded strands in a parallel load fashion.
Databases represent an important data processing area in many applications. They are often used to store large amounts of information and then efficiently retrieve selected portions of it. Many present-day systems use a client/server database approach in which application programs running in clients access a central database management system (DBMS) located in a database server. To access the information in the database efficiently, the clients form queries that request information with selected characteristics. The queries are transmitted to the DBMS server, which retrieves the information that meets the characteristics specified in the query. The results (commonly called a “result set”) are returned to the clients.
The ability to form queries allows an individual user to request desired information from the database. At other times, however, it is desirable to transfer large blocks of data from the database to a given client, or from a client to a given server. Such transfers of large blocks of data between a client and a server typically can be performed over several types of connection systems. One type is a simple telephone-line hookup using a modem, in which the server dials up the client, queries the client for the block of data, and then has the data transferred to the server over the modem connection. This system transfers data serially and, given the limited transfer rates currently achievable in telephonic data communications, can be quite slow.
A second type of transfer can be performed over a private network, also known as an intranet, in which the client and server are networked together to allow nearly instantaneous communication among the networked members. A communication node connects the client to the server within the network to form a thread. This system is typically much faster than a telephone modem line because of the high transfer rates obtainable within a network. Even so, this “single-threaded,” or serial, approach to transferring large blocks of data to a requesting server can still be time consuming during the actual transfer because of the serial nature of the transfer.
Another system that provides a database, and access to that database, to a plurality of users is a wide area network (WAN), which is also within an intranet system. A WAN typically includes multiple communication nodes that allow multiple connections from one source to another. Rarely, however, do applications use multiple connections to communicate between a client and a server within a WAN. Because the application does not use the full capacity of the network, the transfer rate remains slow. There is no known method or system that allows a large data set, such as a large portion of the database or even the entire database, to be transmitted over the plurality of nodes in parallel.
Therefore, there is a need to transfer large data sets across a communication system that has multiple nodes by using those multiple nodes together. What is needed, then, is a system or method that enables data to be transferred over multiple nodes in parallel.
SUMMARY OF THE INVENTION
According to the present invention, a method and system are provided for breaking a data set into a plurality of partitions for transfer over a plurality of communication nodes in parallel. The system comprises a plurality of communication nodes coupled to first and second computer storage locations. Each communication node serves as a communication thread. A data transfer controller is coupled to the first computer storage location and to the communication nodes, and it selects a number of communication threads to serve as data transfer links between the first and second computer storage locations. A data partitioner, also coupled to the first computer storage location, responds to a data set transfer request by partitioning the data set into transferable data set partitions. Once the partitions are ready, the data transfer controller transfers them over the communication threads in parallel.
In a first embodiment, the data set is partitioned into a number of data set partitions that is equal to the number of selected communication threads. This allows a one-to-one correspondence of data set partitions to the communication threads.
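The one-to-one split can be sketched as follows. This is a minimal Python illustration, assuming the data set is addressed by row index; the function name `partition_rows` is hypothetical and not taken from the patent. It also follows the rule, stated for the data partitioner, of dividing rows as evenly as possible without splitting any row between partitions.

```python
from typing import List, Tuple


def partition_rows(total_rows: int, num_partitions: int) -> List[Tuple[int, int]]:
    """Split row indices [0, total_rows) into num_partitions contiguous
    half-open ranges whose sizes differ by at most one row.

    Hypothetical helper: each row lands in exactly one range, so no row
    is fragmented across partitions.
    """
    if num_partitions < 1:
        raise ValueError("num_partitions must be at least 1")
    base, extra = divmod(total_rows, num_partitions)
    ranges = []
    start = 0
    for i in range(num_partitions):
        # The first `extra` partitions each take one additional row.
        size = base + (1 if i < extra else 0)
        ranges.append((start, start + size))
        start += size
    return ranges
```

With three selected threads and a 10-row data set, `partition_rows(10, 3)` returns `[(0, 4), (4, 7), (7, 10)]`: one range per thread, with the first thread carrying the single leftover row.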
In a second embodiment, the data set is partitioned into more data set partitions than there are selected communication threads. The partitions are processed not only in parallel over the multiple threads but also serially, with the faster communication threads handling the additional partitions.
An additional feature of the distributive computer environment is that the data transfer controller can allow access to a transferred data set partition before the remaining data set partitions have been completely transferred and reassembled into the original data set. A user can therefore access the data within any completely transferred partition before the original data set is actually assembled. During partitioning, the data partitioner defines each partition based on the number of data rows in the data set, and it divides the data rows among the partitions as evenly as possible without fragmenting any data row across partitions.
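Early access to finished partitions can be illustrated with Python's `concurrent.futures.as_completed`, which yields each future as soon as it finishes. This is a sketch under stated assumptions: `transfer_partition` is a hypothetical stand-in that copies rows instead of sending them over a network, and `on_partition_ready` is a hypothetical callback through which a user reaches a partition before the full data set is reassembled.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Callable, Dict, List, Sequence, Tuple

Row = tuple  # one data row; rows are never split across partitions


def transfer_partition(index: int, rows: Sequence[Row]) -> Tuple[int, List[Row]]:
    # Hypothetical stand-in for sending one partition over one thread.
    return index, list(rows)


def transfer_with_early_access(
    partitions: Sequence[Sequence[Row]],
    num_threads: int,
    on_partition_ready: Callable[[int, List[Row]], None],
) -> List[Row]:
    received: Dict[int, List[Row]] = {}
    with ThreadPoolExecutor(max_workers=num_threads) as pool:
        futures = [pool.submit(transfer_partition, i, p) for i, p in enumerate(partitions)]
        for fut in as_completed(futures):
            index, rows = fut.result()
            received[index] = rows
            # This partition is usable now, before the others have finished.
            on_partition_ready(index, rows)
    # Reassemble in original partition order once every partition has arrived.
    return [row for i in range(len(partitions)) for row in received[i]]
```

The callback may see partitions in any order, since it follows completion order; only the final reassembly restores the original row order.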
The method is used within a distributive computer environment that comprises a first computer storage location, a second computer storage location, and a plurality of communication nodes coupling the first and second computer storage locations. The method transfers a data set between the first and second computer storage locations as follows. First, a request is made for a data set, or a portion of a data set, to be transferred between the first and second computer storage locations. Once the data set is determined, the process partitions it into transferable data set partitions. The process then forms a plurality of concurrent communication threads selected from the plurality of communication nodes. Once the partitions have been formed and the communication threads established, the process transfers the data set partitions in parallel over the communication threads between the first and second computer storage locations. When the communication threads are formed, the number of communication threads may be set equal to the number of data set partitions; alternatively, the number of data set partitions may exceed the number of communication threads available.
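The method steps above map onto a thread pool roughly as follows. This Python sketch assumes an in-memory list of row tuples for the first storage location and a dictionary for the second; `transfer_data_set` and its inner `send` are hypothetical names, and a real system would replace the list copy with a network transfer over each communication node.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List, Sequence


def transfer_data_set(source: Sequence[tuple], num_threads: int, num_partitions: int) -> List[tuple]:
    # Partition the requested data set into contiguous, evenly sized row ranges.
    size, extra = divmod(len(source), num_partitions)
    bounds, start = [], 0
    for i in range(num_partitions):
        end = start + size + (1 if i < extra else 0)
        bounds.append((start, end))
        start = end

    # Stands in for the second computer storage location (hypothetical).
    destination: Dict[int, List[tuple]] = {}

    def send(i: int) -> None:
        lo, hi = bounds[i]
        destination[i] = list(source[lo:hi])  # hypothetical transfer over one thread

    # Form a pool of concurrent threads and transfer the partitions in parallel.
    with ThreadPoolExecutor(max_workers=num_threads) as pool:
        list(pool.map(send, range(num_partitions)))  # waits for all; re-raises errors

    # Reassemble the original data set in partition order.
    return [row for i in range(num_partitions) for row in destination[i]]
```

Passing a `num_partitions` equal to `num_threads` gives the one-to-one case; a larger `num_partitions` gives the case where partitions outnumber threads, and the pool hands each freed worker thread the next pending partition.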
Where the number of data set partitions exceeds the number of communication threads, one partition is transferred over each communication thread in a parallel transfer. As each partition transfer completes and its thread becomes available, the next data set partition is transferred over the now-available communication thread. Performing both parallel and serial transfers of the data set partitions lets the method optimize the transfer rate of the partitioned data set for the number of communication threads available and the differing transfer rates of each thread.
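The hand-off described here resembles a shared work queue from which each idle thread takes the next pending partition. The sketch below makes that explicit with Python's `queue` and `threading` modules; `schedule_partitions` is a hypothetical name, and `thread_delays` is an assumed per-thread transfer time used only to simulate threads of different speeds.

```python
import queue
import threading
import time
from typing import Dict, List


def schedule_partitions(num_partitions: int, thread_delays: List[float]) -> Dict[str, List[int]]:
    """Record which simulated thread transferred which partition.

    Each idle thread takes the next pending partition as soon as it
    finishes its current one, so faster threads tend to carry more.
    """
    pending: "queue.Queue[int]" = queue.Queue()
    for p in range(num_partitions):
        pending.put(p)
    log: Dict[str, List[int]] = {}
    lock = threading.Lock()

    def worker(name: str, delay: float) -> None:
        while True:
            try:
                p = pending.get_nowait()
            except queue.Empty:
                return  # no partitions left; this thread is finished
            time.sleep(delay)  # simulated transfer of partition p
            with lock:
                log.setdefault(name, []).append(p)

    threads = [
        threading.Thread(target=worker, args=(f"thread-{i}", d))
        for i, d in enumerate(thread_delays)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return log
```

A call such as `schedule_partitions(8, [0.01, 0.03])` will usually assign more partitions to `thread-0`, the faster thread, but the exact split depends on thread timing.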
The method also allows a user or system to access completely transferred data set partitions before all the data set partitions have been transferred and reassembled into the original data set. The data set comprises data rows having a total range so that a partition column to mark each data set partition is establish
