Locating and sampling of data in parallel processing systems
Patent
Filed: 1997-07-15
Issued: 2000-04-11
Coleman, Eric
Electrical computers and digital processing systems: processing
Processing architecture
Distributed processing system
712/21, G06F 12/00
active
060498617
ABSTRACT:
A method is disclosed for reproducible sampling of data items of a dataset which is shared across a plurality of nodes of a parallel data processing system.
In data mining of large databases, segmentation of the database is often necessary, either to obtain a summary of the database or before an operation such as link analysis. A sample of data records is taken to create an initial segmentation model. The records in this sample, and the initial model created from them, can be critical to the results of the data mining process, and the initial model may not be reproducible unless the same sample of data records can be drawn again. Reproducible sampling is enabled without polling all nodes to locate particular records. Parametric control information, consisting of a small number of control parameters, is generated to describe the particular partitioning of the dataset. This information enables the location of any data record to be computed. It may be distributed to each node, so that every node can compute the location of data records itself. The invention is also applicable to other sampling methods.
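The abstract does not say which control parameters are used. As a minimal sketch, assuming a hypothetical block-cyclic partitioning described by just three parameters (record count, node count, block size), the Python below shows how such parametric control information lets any node compute a record's location arithmetically instead of polling, and how a seeded generator makes the sample repeatable. The names `ControlInfo`, `locate`, `reproducible_sample` and `local_share` are illustrative only, not taken from the patent.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class ControlInfo:
    """Hypothetical parametric description of a block-cyclic partitioning."""
    n_records: int   # total records in the shared dataset
    n_nodes: int     # number of nodes holding partitions
    block_size: int  # records per block; blocks are dealt to nodes in turn


def locate(info: ControlInfo, record: int) -> tuple[int, int]:
    """Return (node, local_offset) for a global record index, with no polling."""
    if not 0 <= record < info.n_records:
        raise IndexError(record)
    block, within = divmod(record, info.block_size)
    node = block % info.n_nodes
    # Each earlier block on this node contributes block_size local records.
    local_offset = (block // info.n_nodes) * info.block_size + within
    return node, local_offset


def reproducible_sample(info: ControlInfo, k: int, seed: int) -> list[int]:
    """Draw k distinct global record indices; the same seed gives the same sample."""
    rng = random.Random(seed)
    return sorted(rng.sample(range(info.n_records), k))


def local_share(info: ControlInfo, sample: list[int], node: int) -> list[int]:
    """Local offsets of the sampled records that the given node holds."""
    return [off for n, off in (locate(info, r) for r in sample) if n == node]


info = ControlInfo(n_records=1_000, n_nodes=4, block_size=10)
sample = reproducible_sample(info, k=20, seed=42)
assert sample == reproducible_sample(info, k=20, seed=42)  # repeatable draw
print(locate(info, 57))  # prints (1, 17)
print(local_share(info, sample, node=1))  # offsets node 1 reads locally
```

If the three parameters and the seed are distributed to every node, each node can call `local_share` for itself and read only its own sampled records, which mirrors the abstract's point that the control information lets each node compute record locations independently.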
Bird Colin Leonard
Wallis Graham Derek
Coleman Eric
Farrell Timothy M.
International Business Machines Corporation
Kappos David J.