Method and apparatus to model the variables of a data set

Data processing: artificial intelligence – Machine learning – Genetic algorithm and genetic programming system

Reexamination Certificate

Details

C706S016000, C706S045000, C706S046000, C706S059000

active

06480832

ABSTRACT:

BACKGROUND OF THE INVENTION
The present invention relates to a method and apparatus for modeling the variables of a data set by means of a probabilistic network including data nodes and causal links.
Probabilistic networks are graphical models of cause and effect relationships between variables in a data set. Such networks are referred to in the literature as Bayesian networks, belief networks, causal networks and knowledge maps. In this specification, the term probabilistic networks will be used generically to refer to all such networks and maps. The graphical model in each case includes data nodes, which represent the variables, and causal links or arcs, which represent the dependencies between the data nodes. A given set of nodes and arcs defines a network structure.
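A network structure of this kind can be held in a very small data type. The following is a minimal sketch, not the representation used by the invention: the class name `NetworkStructure`, its methods, and the example variable names are all hypothetical. The acyclicity check reflects the usual requirement that a Bayesian network be a directed acyclic graph.

```python
from dataclasses import dataclass, field


@dataclass
class NetworkStructure:
    """A set of data nodes plus directed causal links (arcs). Hypothetical type."""
    nodes: list[str]
    arcs: set[tuple[str, str]] = field(default_factory=set)  # (cause, effect)

    def add_arc(self, cause: str, effect: str) -> None:
        if cause not in self.nodes or effect not in self.nodes:
            raise ValueError("both ends of an arc must be data nodes")
        self.arcs.add((cause, effect))

    def parents(self, node: str) -> list[str]:
        """Nodes with an arc pointing into `node`."""
        return sorted(c for c, e in self.arcs if e == node)

    def is_acyclic(self) -> bool:
        """Kahn's algorithm: repeatedly remove nodes that have no incoming arcs."""
        indegree = {n: 0 for n in self.nodes}
        for _, effect in self.arcs:
            indegree[effect] += 1
        ready = [n for n, d in indegree.items() if d == 0]
        removed = 0
        while ready:
            current = ready.pop()
            removed += 1
            for cause, effect in self.arcs:
                if cause == current:
                    indegree[effect] -= 1
                    if indegree[effect] == 0:
                        ready.append(effect)
        return removed == len(self.nodes)


# Illustrative variables only; they do not come from the specification.
net = NetworkStructure(nodes=["rain", "sprinkler", "wet_grass"])
net.add_arc("rain", "wet_grass")
net.add_arc("sprinkler", "wet_grass")
print(net.parents("wet_grass"), net.is_acyclic())
```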
Once a network structure has been found that accurately models a set of data, the model summarizes knowledge about possible causal relationships between the variables in the data set. Such a model reduces knowledge about relationships between variables in a large set of data to a concise and comprehensible form, which is the primary goal of data mining.
One of the difficulties with modeling a set of data using a probabilistic network is finding the most likely network structure for a given input data set. This is because the search space of possible network structures grows exponentially with the number of data nodes in the network structure. Exhaustively evaluating all possible networks to measure how well they fit the input data set has been regarded as impractical, even for modest-sized network structures.
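To give a feel for the size of this search space, the sketch below counts the directed acyclic graphs that can be drawn on n labeled nodes, using Robinson's recurrence. It assumes that candidate structures are restricted to acyclic graphs, which the passage above does not state explicitly; the function name `count_dags` is hypothetical.

```python
from functools import lru_cache
from math import comb


@lru_cache(maxsize=None)
def count_dags(n: int) -> int:
    """Number of labeled directed acyclic graphs on n nodes (Robinson's recurrence)."""
    if n == 0:
        return 1
    return sum(
        (-1) ** (k + 1) * comb(n, k) * 2 ** (k * (n - k)) * count_dags(n - k)
        for k in range(1, n + 1)
    )


for n in range(1, 6):
    print(n, count_dags(n))
# 1 1
# 2 3
# 3 25
# 4 543
# 5 29281
```

By ten nodes the count already exceeds 10^18, which is why scoring every candidate structure is not practical.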
The present invention has the aim of more efficiently generating a representation of a probabilistic network which models the variables of an input data set.
SUMMARY OF THE INVENTION
According to the present invention, there is now provided a method of modeling the variables in an input data set by means of a probabilistic network including data nodes and causal links, the method comprising the steps of:
registering the input data set,
generating a population of genomes each individually modeling the input data set by means of chromosome data to represent the data nodes in a probabilistic network and the causal links between the data nodes,
performing a crossover operation between the chromosome data of parent genomes in the population to generate offspring genomes,
performing an addition operation to add the offspring genomes to the said population,
performing a scoring operation on genomes in the said population to derive scores representing the correspondence between the genomes and the input data set,
performing a selecting operation to select genomes from the population according to the scores,
repeating the crossover, scoring, addition and selecting operations for a plurality of generations of the genomes,
and selecting, as an output model, a genome from the last generation.
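The steps listed above can be sketched as a short program. This is an illustration under stated assumptions, not the claimed implementation: the summary does not fix the chromosome encoding, the scoring metric, or the selection rule, so the sketch assumes binary variables, one chromosome bit per node pair (i, j) with i < j meaning an arc from i to j (which keeps every genome acyclic), a BIC-style score, single-point crossover, and truncation selection. All function names are hypothetical, and at least three variables are assumed so that a crossover point exists.

```python
import math
import random
from collections import Counter
from itertools import combinations


def parents_of(genome, n_vars):
    """Decode chromosome bits into a parent list per node (assumed encoding)."""
    parents = [[] for _ in range(n_vars)]
    for bit, (i, j) in zip(genome, combinations(range(n_vars), 2)):
        if bit:
            parents[j].append(i)  # bit set: causal link i -> j
    return parents


def score(genome, data, n_vars):
    """Scoring operation: BIC-style fit to binary data (an assumed metric)."""
    total = 0.0
    for node, pa in enumerate(parents_of(genome, n_vars)):
        joint = Counter((tuple(row[p] for p in pa), row[node]) for row in data)
        marginal = Counter(tuple(row[p] for p in pa) for row in data)
        # Maximized log-likelihood of this node given its parents
        total += sum(c * math.log(c / marginal[cfg]) for (cfg, _), c in joint.items())
        # Penalty: one free parameter per parent configuration
        total -= 0.5 * math.log(len(data)) * 2 ** len(pa)
    return total


def crossover(a, b, rng):
    """Crossover operation: single-point exchange of chromosome data."""
    cut = rng.randrange(1, len(a))
    return a[:cut] + b[cut:], b[:cut] + a[cut:]


def evolve(data, n_vars, pop_size=20, generations=30, seed=0):
    rng = random.Random(seed)
    n_bits = n_vars * (n_vars - 1) // 2
    # Generate the initial population of genomes
    population = [tuple(rng.randint(0, 1) for _ in range(n_bits))
                  for _ in range(pop_size)]
    for _ in range(generations):
        offspring = []
        for _ in range(pop_size // 2):
            first, second = rng.sample(population, 2)
            offspring.extend(crossover(first, second, rng))
        population.extend(offspring)                      # addition operation
        ranked = sorted(population,                       # scoring operation
                        key=lambda g: score(g, data, n_vars), reverse=True)
        population = ranked[:pop_size]                    # selecting operation
    return population[0]  # output model: best genome of the last generation


# Toy input data set: variable 1 copies variable 0, variable 2 is unrelated.
data = [(i % 2, i % 2, (i // 2) % 2) for i in range(200)]
best = evolve(data, n_vars=3)
print(best, parents_of(best, 3))
```

Restricting arcs to a fixed node ordering is a simplification chosen here to avoid a repair step after crossover; other encodings would need an explicit acyclicity check on each offspring.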
Further according to the present invention there is provided apparatus for modeling the variables in an input data set by means of a probabilistic network including data nodes and causal links, the apparatus comprising:
data register means to register the input data set,
generating means for generating a population of genomes each individually modeling the input data set by means of chromosome data to represent the data nodes in a probabilistic network and the causal links between the data nodes,
crossover means for performing a crossover operation between the chromosome data of parent genomes in the population to generate offspring genomes,
adding means to perform an addition operation to add the offspring genomes to the said population,
scoring means for performing a scoring operation on genomes in the said population to derive scores representing the correspondence between the genomes and the input data set,
selecting means for performing a selecting operation to select genomes from the population according to the scores,
control means to control the crossover, scoring, addition and selecting means to repeat their operations for a plurality of generations of the genomes,
and output means to select, as an output model, a genome from the last generation.


REFERENCES:
patent: 5214746 (1993-05-01), Fogel et al.
patent: 5245696 (1993-09-01), Stork et al.
patent: 6088690 (2000-07-01), Gounares et al.
patent: 6128607 (2000-10-01), Nordin et al.
patent: 6154736 (2000-11-01), Chickering et al.
patent: 6157921 (2000-12-01), Barnhill
patent: 6192354 (2001-02-01), Bigus et al.
patent: 9011568 (1990-10-01), None
Proc. 6th Conference on Artificial Intelligence in Medicine Europe, AIME 97, pp 261-272, Springer-Verlag 1997, Larranaga P et al., “Learning Bayesian networks by genetic algorithms: a case study in the prediction of survival in malignant skin melanoma”.
IEEE Transactions on Systems, Man and Cybernetics, Part A (Systems & Humans), v26, n4, pp 487-493, Jul. 1996, Larranaga P et al., “Learning Bayesian network structures by searching for the best ordering with genetic algorithms”.
Larranaga, P.; Poza, M.; Yurramendi, Y.; Murga, R.H.; Kuijpers, C.M.H., Structure learning of Bayesian networks by genetic algorithms: a performance analysis of control parameters, Pattern Analysis and Machine Intelligence, IEEE Transactions on, vol.: 1, Sep. 1996.*
Takara, T.; Higa, K.; Nagayama, I., Isolated word recognition using the HMM structure selected by the genetic algorithm, Acoustics, Speech, and Signal Processing, 1997. ICASSP-97., 1997 IEEE International Conference on, vol.: 2, Apr. 21-24, 1997, Page(s).*
Chau, C.W.; Kwong, S.; Diu, C.K.; Fahrner, W.R., Optimization of HMM by a genetic algorithm, Acoustics, Speech, and Signal Processing, 1997. ICASSP-97., 1997 IEEE International Conference on, vol.: 3, Apr. 21-24, 1997, pp.: 1727-1730 vol. 3.*
Christoph F. Eick and Daw Jong; Learning Bayesian classification rules through genetic algorithms; Proceedings of the second international conference on Information and knowledge management, Nov. 1-5, 1993, pp. 305-313.*
Fang Sun; Guangrui Hu, Speech recognition based on genetic algorithm for training HMM, Electronics Letters vol.: 34 16, Aug. 6, 1998, pp.: 1563-1564.
