Collective data mining from distributed, vertically...

Data processing: database and file management or data structures – Database design – Data structure types

Reexamination Certificate


: 2002-11-01
: 2004-03-16
: Robinson, Greta (Department: 2177)
: Data processing: database and file management or data structures
: Database design
: Data structure types

: C707S793000
: Reexamination Certificate
: active
: 06708163
: ABSTRACT:

FIELD OF THE INVENTION
This invention relates in general to a network of databases and, in particular, to collective data mining from a distributed, vertically partitioned feature space.
BACKGROUND OF THE INVENTION
Distributed data mining (DDM) is a fast-growing area that deals with finding data patterns in an environment where both data and computation are distributed. Although most data analysis systems today require centralized storage of data, the increasing merger of computation with communication is likely to demand data mining environments that can exploit the full benefit of distributed computation. For example, consider the following cases.
1. Example I: Imagine an epidemiologist studying the spread of hepatitis C in the U.S. She wants to detect any underlying relation between the emergence of hepatitis C in the U.S. and weather patterns. She has access to a large hepatitis C database at the Centers for Disease Control (CDC) and an environmental database at the EPA. However, the two databases are at different locations, and analyzing both with conventional data mining software would require combining them at a single location, which is quite impractical.
2. Example II: Two major financial organizations want to cooperate to prevent fraudulent intrusion into their computing systems. They need to share data patterns relevant to fraudulent intrusion, but they do not want to share the underlying data because it is sensitive. Combining the databases is therefore not feasible, and existing data mining systems cannot handle this situation.
3. Example III: A defense organization is monitoring a situation with several sensor systems that collect data. Fast analysis of incoming data and quick response are imperative. Collecting all the data at a central location and analyzing it there consumes time, and this approach does not scale to state-of-the-art systems with a large number of sensors.
4. Example IV: A drug manufacturing company is studying the risk factors of breast cancer. It has a mammogram image database and several databases containing patient tissue analysis results, food habits, age, and other particulars. The company wants to find out whether there is any correlation between the breast cancer markers in the mammogram images and the tissue features, age, or food habits.
5. Example V: A major multinational corporation wants to analyze its customer transaction records to develop a successful business strategy quickly. It has thousands of establishments throughout the world, and collecting all the data in a centralized data warehouse, followed by analysis using existing commercial data mining software, takes the data warehouse team about a month.
SUMMARY OF THE INVENTION
DDM offers an alternative approach to the analysis of distributed data that requires minimal data communication. Typically, DDM algorithms perform local data analysis at each site and then generate a global data model by combining the results of the local analyses. Unfortunately, naive approaches to local analysis may be ambiguous and incorrect, producing an incorrect global model. This problem becomes critical in the general case, where different sites observe different sets of features. Developing a well-grounded methodology to address this general case is therefore important. This paper offers a viable approach to the analysis of distributed, heterogeneous databases with distinct feature spaces using the so-called collective data mining (CDM) technology.
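The failure of naive local analysis on a vertically partitioned feature space can be illustrated with a small sketch. This is not the CDM method itself, which is not detailed here; it is a toy example under stated assumptions. The data set, the site names, and the helpers `fit_slope` and `mse` are hypothetical, and both sites are assumed to observe the same target value. The target depends only on the product of two features, each held at a different site, so a model fitted locally at either site finds no relation, and adding the two local models yields a poor global model.

```python
from itertools import product

# Hypothetical toy data: all combinations of two features in {-1, +1}.
# The target is the cross-term x1 * x2, which no single site can see.
rows = [(x1, x2, x1 * x2) for x1, x2 in product((-1, 1), repeat=2)]


def fit_slope(xs, ys):
    """Least-squares slope for the model y ~ w * x (no intercept)."""
    return sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)


def mse(preds, ys):
    """Mean squared error between predictions and targets."""
    return sum((p - y) ** 2 for p, y in zip(preds, ys)) / len(ys)


ys = [y for _, _, y in rows]
site_a = [x1 for x1, _, _ in rows]  # site A observes only x1 (assumption)
site_b = [x2 for _, x2, _ in rows]  # site B observes only x2 (assumption)

# Naive DDM: fit a model at each site independently, then add them up.
w_a = fit_slope(site_a, ys)
w_b = fit_slope(site_b, ys)
naive_preds = [w_a * a + w_b * b for a, b in zip(site_a, site_b)]
naive_mse = mse(naive_preds, ys)

# Reference: a model that can use the cross-term, which requires
# information from both sites at once (computed centrally here
# purely for comparison).
cross = [a * b for a, b in zip(site_a, site_b)]
w_ab = fit_slope(cross, ys)
joint_preds = [w_ab * c for c in cross]
joint_mse = mse(joint_preds, ys)

print("local slopes:", w_a, w_b)
print("naive global MSE:", naive_mse)
print("joint model MSE:", joint_mse)
```

In this toy case both local slopes are zero, so the combined local models predict zero everywhere, while the model that includes the cross-term fits the data exactly. Any sound method for this setting has to account for such cross-site terms without shipping all the raw data to one place.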
Section 2 describes the DDM problem considered here and some of the problems of naive data analysis algorithms in a DDM environment. Section 3 presents the foundation of CDM, followed by a discussion of the construction of orthonormal representations from incomplete domains and the relation of such representations to mean square error. Sections 4 and 5 present the development of CDM versions of two popular data analysis techniques, decision tree learning and regression. Section 6 presents an overview of a CDM-based experimental system called BODHI, which is currently under development. Section 7 summarizes the CDM work presented here, including the BODHI system, and discusses future research directions.

REFERENCES:
patent: 5321612 (1994-06-01), Stewart
patent: 5692029 (1997-11-01), Husseiny et al.
patent: 5911872 (1999-06-01), Lewis et al.
patent: 5970482 (1999-10-01), Pham et al.
patent: 6132381 (2000-10-01), Forbes et al.
patent: 6523026 (2003-02-01), Gillis

Affiliated with

Hershberger Daryl E.

Inventor

Johnson Erik L.

Inventor

Kargupta Hillol

Inventor

Park Byung-Hoon

Inventor

Also associated with

Christensen O'Connor Johnson & Kindness PLLC

Law Firm

Pannala S R

Examiner

Robinson Greta

Examiner
