System and method for quantifying an extent to which a data...

Data processing: database and file management or data structures – Database design – Data structure types

Details

U.S. Classification: 707/793 (C707S793000)
Type: Reexamination Certificate
Status: active
Patent number: 06684208

ABSTRACT:

BACKGROUND OF THE INVENTION
1. Field of the Invention
The present invention relates generally to the field of data mining, and more particularly to a system and method for estimating the point of diminishing returns for additional information in data mining processing applications.
2. Description of the Related Art
In data mining (DM) applications, most tools present performance results but do not indicate how close those results are to the ultimate, theoretical upper bound of performance attainable from a given set of possible features. This lack of indication prompts many users to continue their quest for better performance.
In P. Caraca-Valente and J. L. Morant, "Knowledge-based systems validation: when to stop running test cases," Int. J. Human-Computer Studies, Vol. 51, pp. 757-781, 1999, the authors explore the issue of when to stop running test cases in knowledge engineering. Their focus is on determining the optimal number of data partitions (test cases) needed to evaluate a knowledge-based system (KBS) at a specified confidence level. That is, instead of attempting to assess the optimality of a DM algorithm, they derive a theoretical foundation for how many independent tests should be run regardless of the DM algorithm selected.
In F. Shin and D. Kil, "Classification Cramer-Rao Bounds on Stock Price Prediction," Journal of Forecasting, Vol. 17, Nos. 5-6, pp. 389-399, 1998, the authors discuss this idea in the context of stock-price prediction. However, their approach fails to consider the following important aspects of establishing the level of confidence associated with quantifying how much useful information a DM algorithm captures:
1. There is no explicit mechanism for finding the inflection point in rank-order curves, which display DM performance as a function of input feature dimension (see the sketch after this list).
2. The use of a single area-of-overlap threshold makes the algorithm sensitive to the value of that threshold.
3. In certain situations, the direct computation of multi-dimensional (MD) class-conditional feature probability density functions (PDFs) is preferable to using marginal PDFs.
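For illustration, here is a minimal sketch of one way to locate such an inflection point in a rank-order curve. The chord-distance heuristic and the curve values below are assumptions for demonstration, not the method claimed by the patent.

```python
import numpy as np

def knee_point(perf):
    """Index of the knee: the point farthest above the chord joining the
    curve's endpoints, a common heuristic for diminishing returns."""
    perf = np.asarray(perf, dtype=float)
    x = np.arange(len(perf))
    chord = perf[0] + (perf[-1] - perf[0]) * x / (len(perf) - 1)
    return int(np.argmax(perf - chord))

# Illustrative rank-order curve: accuracy vs. number of top-ranked features.
curve = [0.55, 0.70, 0.80, 0.86, 0.88, 0.89, 0.895, 0.897]
print("diminishing returns set in around feature", knee_point(curve) + 1)
```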
Gaborski et al. (U.S. Pat. No. 5,479,523) discuss an approach to reducing the feature dimension in digit/character recognition. Their system is directed more toward identifying an optimal feature subset using a combination of correlation and discrimination weights than toward quantifying the level of optimality from the perspective of a DM algorithm's output.
Kohavi and John, "Wrappers for feature subset selection," Artificial Intelligence, Vol. 97, Nos. 1-2, pp. 273-324, 1997, also explore feature-subset selection, using a learning algorithm as a wrapper around the feature-subset selection problem, while Liu and Setiono, "A probabilistic approach to feature selection: A filter solution," in Proc. Thirteenth Int'l Conf. on Machine Learning, San Francisco, Calif.: Morgan Kaufmann, pp. 319-327, 1996, discuss a probabilistic approach to feature-subset selection. Dash et al., "Consistency Based Feature Selection," Proc. 4th Pacific-Asia KDD, Kyoto, Japan, April 2000, also explore feature-subset selection, using an inconsistency measure that calculates how capable each feature is of separating different output classes.
Hilderman and Hamilton, "Applying Objective Interestingness Measures in Data Mining Systems," Proc. KDD 2000, Lyon, France, September 2000, describe a two-step process for ranking the level of "interestingness" of discovered patterns based on the chi-square test and objective measures of interestingness in association. Boussouf, "A Hybrid Approach to Feature Selection," Proc. KDD'98, Nantes, France, September 1998, discusses a hybrid approach to feature-subset selection based on redundancy and relevancy. All of these approaches focus on feature optimization, not on quantifying the optimality of a back-end DM classification algorithm by considering the symbiotic relationship between the selected feature subset and the actual DM algorithms employed for final-stage classification.
On the other hand, some researchers explore the issue of combining (fusing) multiple classification algorithms. In Tsymbal and Puuronen, "Bagging and Boosting with Dynamic Integration of Classifiers," Proc. KDD 2000, Lyon, France, September 2000, the authors discuss bagging and boosting with dynamic integration, instead of simple voting, to combine the outputs of multiple classifiers. Todorovski and Dzeroski, "Combining Multiple Models with Meta Decision Trees," Proc. KDD 2000, Lyon, France, September 2000, also explore combining multiple models using an algorithm called stacking, or meta learning (with a decision tree in this case). Their algorithm stacks another classifier on the outputs of multiple first-stage classifiers to learn how each first-stage classifier views the input features. The second-stage meta classifier learns the idiosyncrasies of how the first-stage classifiers make their calls in order to further improve classification performance, as in the sketch below.
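As a concrete illustration of the general stacking idea (not the specific meta decision trees of Todorovski and Dzeroski), the following sketch trains a second-stage decision tree on the cross-validated outputs of two first-stage classifiers; the data and model choices are placeholders.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import StackingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data; any labeled feature set would do.
X, y = make_classification(n_samples=500, n_features=10, random_state=0)

stack = StackingClassifier(
    estimators=[("rf", RandomForestClassifier(random_state=0)),
                ("nb", GaussianNB())],
    # The meta learner sees the first-stage predictions (cross-validated),
    # learning each base classifier's idiosyncrasies, as described above.
    final_estimator=DecisionTreeClassifier(max_depth=3),
)
print("stacked training accuracy:", stack.fit(X, y).score(X, y))
```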
Ricci and Aha, "Error-correcting output codes for local learners," in Proc. European Conference on Machine Learning, Nedellec, C. and C. Rouveirol (eds.), Berlin, Germany: Springer-Verlag, pp. 280-291, 1998, advocate extending the concept of forward error correction to model combining to improve robustness. Gamberger and Lavrac, "Confirmation Rule Sets," Proc. KDD 2000, Lyon, France, September 2000, introduce the concept of confirmation rule sets, which are based on two intuitive principles: consensus, and deferment of a decision, if uncertain, until more evidence is received (a.k.a. sequential decision-making). Tumer and Ghosh, "Error Correlation and Error Reduction in Ensemble Classifiers," Connection Science, Vol. 8, Nos. 3-4, pp. 385-404, 1996, describe ranking the outputs of the first-stage classifiers so that the second-stage classifier (i.e., the multiple-model combiner) can be optimized for the selected subset of first-stage outputs. They advocate combining highly correlated classifier outputs. None of these papers addresses the central need to assess the extent to which the selected DM algorithm captures the useful information embedded in raw data. Moreover, the lack of such quantification is the motivating force behind the development of such complex, tedious, and somewhat ad hoc techniques for combining the outputs of multiple classifiers.
Chen et al. (U.S. Pat. No. 5,727,199) also explore a two-step approach to feature identification and the derivation of classification rules, which is tangentially similar to U.S. Pat. No. 5,479,523. Again, this patent focuses on identifying the optimal feature subset.
Thus, there is a need for a convenient means of estimating the extent to which a DM algorithm captures useful information in raw feature data, such that a user knows when optimal performance has been obtained.
SUMMARY OF THE INVENTION
In general, the present invention is a system and method for estimating the point of diminishing returns for additional information in data mining processing applications. The present invention provides a convenient method of estimating the extent to which a data mining algorithm captures useful information in raw feature data.
First, the input data is processed using a forward transform. A region of overlap Y in the forward-transformed data is identified and quantified. The region of overlap Y is then processed with a reverse transform to create an overlap region Z in the original feature space. The degree of overlap in region Z is quantified and compared to the level of overlap in region Y; this comparison quantifies the extent to which a data mining algorithm captures useful information in the input data. A minimal sketch of this flow appears below.
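In the following sketch, PCA stands in for the forward/reverse transform pair and histogram intersection of class-conditional PDFs stands in for the overlap measure; the patent does not prescribe these particular choices, so treat this as one possible instantiation under stated assumptions.

```python
import numpy as np
from sklearn.decomposition import PCA

def hist_overlap(a, b, bins=20):
    """Overlap of two 1-D class samples: integral of the pointwise minimum
    of the two (histogram-estimated) class-conditional PDFs."""
    lo, hi = min(a.min(), b.min()), max(a.max(), b.max())
    pa, _ = np.histogram(a, bins=bins, range=(lo, hi), density=True)
    pb, _ = np.histogram(b, bins=bins, range=(lo, hi), density=True)
    width = (hi - lo) / bins
    return float(np.minimum(pa, pb).sum() * width)

rng = np.random.default_rng(0)
X0 = rng.normal(0.0, 1.0, size=(500, 5))   # class 0, raw features
X1 = rng.normal(1.0, 1.0, size=(500, 5))   # class 1, raw features

# Forward transform into the DM algorithm's working space.
pca = PCA(n_components=2).fit(np.vstack([X0, X1]))
Y0, Y1 = pca.transform(X0), pca.transform(X1)

# Quantify the overlap region Y (averaged over transformed components).
overlap_Y = np.mean([hist_overlap(Y0[:, j], Y1[:, j]) for j in range(2)])

# Reverse transform back to the original feature space to obtain region Z.
Z0, Z1 = pca.inverse_transform(Y0), pca.inverse_transform(Y1)
overlap_Z = np.mean([hist_overlap(Z0[:, j], Z1[:, j]) for j in range(5)])

# Comparing the two overlap levels indicates how much of the separability
# present in the raw features the transform (DM front end) retains.
print(f"overlap Y={overlap_Y:.3f}  overlap Z={overlap_Z:.3f}")
```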
In one embodiment of the present invention, if the dimension N of the feature set is above a threshold value, the input features are orthogonalized and converted to N one-dimensional vectors, and the level of overlap in region Z is computed as a combination of the probability density functions (PDFs) of the N one-dimensional vectors. This reduces the complexity of the required PDF estimation, as illustrated in the sketch below.
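Here is a sketch of that high-dimension shortcut, with assumed components: PCA for the orthogonalization, scipy's gaussian_kde for the per-dimension PDF estimates, and a simple product as the combination rule. None of these specific choices comes from the patent text.

```python
import numpy as np
from scipy.stats import gaussian_kde
from sklearn.decomposition import PCA

def marginal_pdf_product(train, query):
    """Approximate the N-dimensional class-conditional PDF at the query
    points as the product of N one-dimensional marginal KDEs."""
    kdes = [gaussian_kde(train[:, j]) for j in range(train.shape[1])]
    return np.prod([k(query[:, j]) for j, k in enumerate(kdes)], axis=0)

rng = np.random.default_rng(1)
X = rng.normal(size=(400, 8))   # placeholder high-dimensional feature set

# Orthogonalize first, so treating the dimensions independently (and hence
# the marginal-product approximation) is better justified.
Xo = PCA(whiten=True).fit_transform(X)
print(marginal_pdf_product(Xo, Xo[:5]))
```

Estimating N one-dimensional PDFs in this way sidesteps the sample-size explosion of estimating a single N-dimensional PDF directly, which is the complexity reduction the embodiment describes.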
