Data collector for use in a scalable, distributed,...

Data processing: database and file management or data structures – Database design – Data structure types

Reexamination Certificate


Details


active

06701324

ABSTRACT:

BACKGROUND OF THE INVENTION
1. Technical Field
The present invention generally relates to collectors for collection of data from nodes in distributed networks and in particular to collectors providing asynchronous collection of large blocks of data from distributed network nodes. Still more particularly, the present invention relates to collectors implementing a scalable, distributed data collection mechanism.
2. Description of the Related Art
Distributed applications which operate across a plurality of systems frequently require collection of data from the member systems. A distributed inventory management application, for example, must periodically collect inventory data for compilation from constituent systems tracking local inventory in order to accurately serve inventory requests.
Large deployments of distributed applications may include very large numbers of systems (e.g., more than 10,000) generating data. Even if the amount of data collected from each system is relatively small, this may result in large return data flows. For instance, if each system within a 20,000 node distributed application generates only 50 KB of data for collection, the total data size is still approximately 1,000 MB.
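The aggregate figure above follows from a single multiplication. As a minimal sketch, assuming decimal units (1 MB = 1,000 KB) and using an illustrative function name that does not appear in the patent:

```python
def total_return_data_mb(node_count: int, kb_per_node: int) -> float:
    """Aggregate return data in MB, assuming 1 MB = 1,000 KB."""
    return node_count * kb_per_node / 1000


# 20,000 nodes, 50 KB each, as in the example above.
print(total_return_data_mb(20_000, 50))  # 1000.0
```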
Current synchronous approaches to data collection in distributed applications typically follow a “scan” methodology illustrated in FIG. 6. In this approach, a centralized data collector (or “scan initiator”) 602 initiates the data collection by transmitting a set of instructions to each node or member system 604a-604n through one or more intermediate systems 606, which are typically little more than a relay providing communications between the central data collector 602 and the member systems 604a-604n. The central data collector 602 must determine hardware and software configuration information for the member systems 604a-604n, request the desired data from the member systems 604a-604n, and receive return data via the intermediate system(s) 606. The data received from the member systems 604a-604n is then collated and converted, if necessary, and forwarded to a relational interface module (RIM) 608, which serves as an interface for a relational database management system (RDBMS).
In addition to not being readily scalable, this approach generates substantial serial bottlenecks on both the scan and return sides. Even with batching, the number of member systems which may be concurrently scanned must be limited to approximately 100 in order to limit memory usage. The approach also limits exploitable parallelism. Where a five-minute scan is required, 20,000 nodes could all be scanned in just five minutes if the scans could be performed fully in parallel. In batches of 100, however, the five-minute scans would require 200 successive batches, or 1,000 minutes, to complete. The combination of the return data flow bottleneck and the loss of scan parallelism creates a very large latency, which is highly visible to the user(s) of the member systems.
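The latency figures in this paragraph can be reproduced with a short calculation. The sketch below assumes that batches run strictly one after another and that every scan in a batch takes the full scan time; the function name and parameters are illustrative, not taken from the patent:

```python
import math


def scan_duration_minutes(node_count: int, batch_size: int,
                          minutes_per_scan: int) -> int:
    """Total wall-clock minutes when nodes are scanned in sequential batches."""
    batches = math.ceil(node_count / batch_size)
    return batches * minutes_per_scan


# Fully parallel: one batch containing all 20,000 nodes.
print(scan_duration_minutes(20_000, 20_000, 5))  # 5
# Batches of 100: 200 sequential batches of five minutes each.
print(scan_duration_minutes(20_000, 100, 5))     # 1000
```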
Current approaches to data collection in distributed applications also employ Common Object Request Broker Architecture (CORBA) method parameters for returning results to the scan initiator 602. This is inefficient for larger data sizes, which are likely to be required in data collection for certain information types such as inventory or retail customer point-of-sale data.
Still another problem with the existing approach to data collection is that nodes from which data must be collected may be mobile systems or systems which may be shut down by the user. As a result, certain nodes may not be accessible to the scan initiator 602 when data collection is initiated.
It would be desirable, therefore, to provide a collector which may be utilized to implement a scalable, efficient data collection mechanism for a distributed environment. It would further be advantageous for the collectors to provide priority based queuing for collection requests, data rate matching to available bandwidth, and collection transfer control cooperating with other distributed applications for optimization of bandwidth utilization.
SUMMARY OF THE INVENTION
It is therefore one object of the present invention to provide collectors for collection of data from nodes in distributed networks.
It is another object of the present invention to provide collectors providing asynchronous collection of large blocks of data from distributed network nodes.
It is yet another object of the present invention to provide collectors implementing a scalable, distributed data collection mechanism.
The foregoing objects are achieved as is now described. A collector for distributed data collection includes input and output queues employed for priority based queuing and dispatch of data received from endpoints and downstream collector nodes. Collection Table of Contents (CTOC) data structures for collection data are received by the collector from the endpoints or downstream collectors and are placed in the input queue, then sorted by the priority within the CTOC. Within a given priority level, collection of the data is scheduled based on the activation time window within the CTOC, which specifies the period during which the endpoint or downstream collector node will be available to service data transfer requests. The collected data, in the form of data packs and constituent data segments, is stored in persistent storage (depot). A CTOC is then transmitted to the next upstream collector node. Network bandwidth utilization is managed by adjusting the activation time window specified within a CTOC and the route employed between source and recipient.
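The queuing behavior described above can be illustrated with a small sketch. It is not the patented implementation: the field names, the use of numeric timestamps for the activation time window, and the convention that a lower integer means a higher priority are all assumptions made for illustration. The sketch orders CTOCs by priority, then by the start of the activation window, and dispatches only a CTOC whose window is currently open.

```python
import heapq
from dataclasses import dataclass
from typing import Optional


@dataclass
class CTOC:
    """Hypothetical Collection Table of Contents record (fields are assumed)."""
    priority: int        # assumption: lower value = higher priority
    window_start: float  # activation window opens (arbitrary time units)
    window_end: float    # activation window closes
    source: str          # endpoint or downstream collector identifier


class InputQueue:
    """Orders CTOCs by priority, then by activation window start."""

    def __init__(self) -> None:
        self._heap = []
        self._seq = 0  # tie-breaker so CTOC objects are never compared

    def put(self, ctoc: CTOC) -> None:
        entry = (ctoc.priority, ctoc.window_start, self._seq, ctoc)
        heapq.heappush(self._heap, entry)
        self._seq += 1

    def next_ready(self, now: float) -> Optional[CTOC]:
        """Remove and return the best CTOC whose window is open at `now`.

        CTOCs whose window is not open are left in the queue; this sketch
        does not expire windows that have already closed.
        """
        deferred = []
        ready = None
        while self._heap:
            entry = heapq.heappop(self._heap)
            ctoc = entry[3]
            if ctoc.window_start <= now <= ctoc.window_end:
                ready = ctoc
                break
            deferred.append(entry)
        for entry in deferred:
            heapq.heappush(self._heap, entry)
        return ready


queue = InputQueue()
queue.put(CTOC(priority=2, window_start=0, window_end=100, source="endpoint-a"))
queue.put(CTOC(priority=1, window_start=50, window_end=150, source="endpoint-b"))
queue.put(CTOC(priority=1, window_start=10, window_end=60, source="collector-c"))
first = queue.next_ready(now=20)
print(first.source)  # collector-c
```

At time 20, `endpoint-b` has the same priority as `collector-c` but its window has not yet opened, so it stays queued while the lower-priority `endpoint-a` becomes the next dispatchable entry.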


