Scalable, distributed, asynchronous data collection mechanism

Data processing: database and file management or data structures – Database design – Data structure types

Reexamination Certificate


Details

C709S241000, C709S201000, C709S232000


active

06374254

ABSTRACT:

BACKGROUND OF THE INVENTION
1. Technical Field
The present invention generally relates to collection of data from nodes in distributed networks and in particular to asynchronous collection of large blocks of data from distributed network nodes. Still more particularly, the present invention relates to a scalable, distributed data collection mechanism which efficiently supports large numbers of data collection endpoints and large return collection data sizes with optimized bandwidth utilization.
2. Description of the Related Art
Distributed applications which operate across a plurality of systems frequently require collection of data from the member systems. A distributed inventory management application, for example, must periodically collect inventory data for compilation from constituent systems tracking local inventory in order to accurately serve inventory requests.
Large deployments of distributed applications may include very large numbers of systems (e.g., more than 10,000) generating data. Even if the amount of data collected from each system is relatively small, this may result in large return data flows. For instance, if each system within a 20,000 node distributed application generates only 50 KB of data for collection, the total data size is still approximately 1,000 MB.
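The figure quoted above can be checked with a short calculation. The sketch below is illustrative only; the function name is hypothetical, and it assumes decimal units (1 MB = 1,000 KB), which is what makes the result come out at roughly 1,000 MB.

```python
def total_return_mb(nodes: int, kb_per_node: float) -> float:
    """Total return data in decimal megabytes (assumes 1 MB = 1,000 KB)."""
    return nodes * kb_per_node / 1000


# 20,000 nodes, 50 KB each, as in the example from the text.
print(total_return_mb(20_000, 50))
```

With binary units (1 MB = 1,024 KB) the same inputs give a slightly smaller figure, which is consistent with the word "approximately" in the text.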
Current synchronous approaches to data collection in distributed applications typically follow a “scan” methodology illustrated in FIG. 5. In this approach, a centralized data collector (or “scan initiator”) 502 initiates the data collection by transmitting a set of instructions to each node or member system 504a-504n through one or more intermediate systems 506, which are typically little more than a relay providing communications between the central data collector 502 and the member systems 504a-504n. The central data collector 502 must determine hardware and software configuration information for the member systems 504a-504n, request the desired data from the member systems 504a-504n, and receive return data via the intermediate system(s) 506. The data received from the member systems 504a-504n is then collated and converted, if necessary, and forwarded to a relational interface module (RIM) 508, which serves as an interface for a relational database management system (RDBMS).
In addition to not being readily scalable, this approach generates substantial serial bottlenecks on both the scan and return sides. Even with batching, the number of member systems which may be concurrently scanned must be limited to approximately 100 in order to limit memory usage. The approach also limits exploitable parallelism. Where a five-minute scan is required, 20,000 nodes could all be scanned in just five minutes if the scans could be performed fully in parallel. In batches of 100, however, the five-minute scans would require 1,000 minutes to complete. The combination of the return data flow bottleneck and the loss of scan parallelism creates a very large latency, which is highly visible to the user(s) of the member systems.
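The latency comparison above follows from simple arithmetic. The sketch below reproduces it, assuming that batches run strictly one after another and that every scan takes the same time; the function names are hypothetical and are not part of the patent.

```python
import math


def batched_scan_minutes(nodes: int, batch_size: int, scan_minutes: float) -> float:
    """Elapsed time when nodes are scanned in sequential batches."""
    batches = math.ceil(nodes / batch_size)
    return batches * scan_minutes


def parallel_scan_minutes(scan_minutes: float) -> float:
    """Elapsed time when every node scans at the same time."""
    return scan_minutes


# 20,000 nodes, batches of 100, five-minute scans, as in the text.
print(batched_scan_minutes(20_000, 100, 5))   # 200 batches of 5 minutes each
print(parallel_scan_minutes(5))
```

Under these assumptions the batched approach takes 200 times longer than a fully parallel scan, which is the gap the text describes.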
Current approaches to data collection in distributed applications also employ Common Object Request Broker Architecture (CORBA) method parameters for returning results to the scan initiator 502. This is inefficient for larger data sizes, which are likely to be required in data collection for certain information types such as inventory or retail customer point-of-sale data.
Still another problem with the existing approach to data collection is that nodes from which data must be collected may be mobile systems or systems which may be shut down by the user. As a result, certain nodes may not be accessible to the scan initiator 502 when data collection is initiated.
It would be desirable, therefore, to provide a scalable, efficient data collection mechanism for a distributed environment having a large number of nodes and transferring large blocks of data. It would further be advantageous for the system to accommodate data collection from nodes which may be periodically or intermittently inaccessible to the collection point.
SUMMARY OF THE INVENTION
It is therefore one object of the present invention to provide improved collection of data from nodes in distributed networks.
It is another object of the present invention to provide asynchronous collection of large blocks of data from distributed network nodes.
It is yet another object of the present invention to provide a scalable, distributed data collection mechanism which efficiently supports large numbers of data collection endpoints and large return collection data sizes with optimized network bandwidth utilization.
The foregoing objects are achieved as is now described. The “scan” phase of a distributed data collection process is decoupled from upload of the return collection data, with the “scan” consisting merely of an infrequent profile push to configure autonomous scanners at the data collection endpoints. Distributed data collection is initiated by endpoints within the distributed network, which autonomously perform a scan and transmit a Collection Table of Contents (CTOC) data structure to a nearest available collector, then await a ready message from the collector. When ready to receive the return collection data, the collector signals the endpoint, which transfers the data collection in small packets to the collector. The collector stores the received data collection in persistent storage, then initiates collection to a higher collector or recipient in substantially the same manner as the endpoint. A routing manager controls the routing of data from endpoints through one or more collectors to the recipient. Scans for the data collection may thus be performed fully parallel, and upload of the collection data proceeds by direct channel under the control of the collectors. Bandwidth utilization for the data collection may thus be optimized for network loading by blackout periods and cooperation of the collectors with other distributed applications. The resulting distributed data collection mechanism is scalable, with large numbers of endpoints and large return collection data sizes being efficiently supported.
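The handshake described above (the endpoint announces a collection with a CTOC, the collector signals when it is ready, the data arrives in small packets, and the collector then forwards upstream in the same manner) can be sketched as a small in-process simulation. This is a minimal illustration under stated assumptions, not the patented implementation: the class names, the CTOC fields, the packet size, and the dictionary standing in for persistent storage are all hypothetical, and the routing manager, blackout periods, and real network transport are omitted.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class CTOC:
    """Hypothetical Collection Table of Contents: announces a pending upload."""
    source_id: str
    size_bytes: int


class Endpoint:
    """A node that has finished its local scan and holds data to upload."""

    def __init__(self, node_id: str, data: bytes, packet_size: int = 4):
        self.node_id = node_id
        self.data = data
        self.packet_size = packet_size

    def make_ctoc(self) -> CTOC:
        return CTOC(self.node_id, len(self.data))

    def packets(self):
        # Called once the collector signals that it is ready.
        for i in range(0, len(self.data), self.packet_size):
            yield self.data[i:i + self.packet_size]


class Collector:
    """Queues CTOCs, pulls data when ready, stores it, then forwards upstream."""

    def __init__(self, name: str, upstream: "Collector | None" = None):
        self.name = name
        self.upstream = upstream
        self.queue = deque()   # pending (CTOC, sender) pairs
        self.store = {}        # stands in for persistent storage

    def receive_ctoc(self, ctoc: CTOC, sender: Endpoint) -> None:
        self.queue.append((ctoc, sender))

    def process_queue(self) -> None:
        while self.queue:
            ctoc, sender = self.queue.popleft()
            # Iterating the sender's packets plays the role of the ready signal.
            received = b"".join(sender.packets())
            if len(received) != ctoc.size_bytes:
                raise ValueError(f"size mismatch for {ctoc.source_id}")
            self.store[ctoc.source_id] = received
            if self.upstream is not None:
                # The collector now acts like an endpoint toward its upstream.
                relay = Endpoint(ctoc.source_id, received, sender.packet_size)
                self.upstream.receive_ctoc(relay.make_ctoc(), relay)


recipient = Collector("recipient")
gateway = Collector("gateway", upstream=recipient)
for node_id, payload in [("node-a", b"inventory-a"), ("node-b", b"inventory-b")]:
    endpoint = Endpoint(node_id, payload)
    gateway.receive_ctoc(endpoint.make_ctoc(), endpoint)
gateway.process_queue()
recipient.process_queue()
print(sorted(recipient.store))
```

In this sketch the collector decides when each transfer happens, so scans at the endpoints never wait on the upload path. That is the decoupling the summary describes.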
The above as well as additional objectives, features, and advantages of the present invention will become apparent in the following detailed written description.


REFERENCES:
patent: 4232295 (1980-11-01), McConnell
patent: 5455948 (1995-10-01), Poole et al.
patent: 5778350 (1998-07-01), Adams et al.
patent: 5943621 (1999-08-01), Ho et al.
patent: 6195628 (2001-02-01), Blaauw et al.
patent: 6282175 (2001-08-01), Steele et al.
