Title: Methods and apparatus for building a support vector machine...
Class: Data processing: artificial intelligence – Machine learning
Type: Reexamination Certificate
Filed: 1998-04-06
Issued: 2001-12-04
Examiner: Powell, Mark R. (Department: 2122)
US Classes: C706S014000, C706S020000, C706S062000
Status: active
Patent Number: 06327581
COPYRIGHT NOTICE
A portion of the disclosure of this patent document contains material which is subject to copyright protection. The copyright owner has no objection to the facsimile reproduction by anyone of the patent document or the patent disclosure, as it appears in the Patent and Trademark Office file or records, but otherwise reserves all copyright rights whatsoever.
§1. BACKGROUND OF THE INVENTION
§1.1 Field of the Invention
The present invention concerns classifying objects, and more specifically, classifying objects using support vector machines. In particular, the present invention concerns methods and apparatus for building (or training) such support vector machines.
§1.2 Related Art
§1.2.1 THE NEED FOR CLASSIFICATION/RECOGNITION
To increase their utility and intelligence, machines, such as computers, are called upon to classify (or recognize) objects to an ever increasing extent. For example, computers may use optical character recognition to classify handwritten or scanned numbers and letters. Computers may use pattern recognition to classify an image, such as a face, a fingerprint, or a fighter plane. Computers may also use text classification techniques, for example, to organize documents or textual information into a hierarchy of predetermined classes.
§1.2.2 KNOWN CLASSIFICATION/RECOGNITION METHODS
In this section, some known classification and/or recognition methods are introduced. Further, acknowledged or suspected limitations of these classification and/or recognition methods are introduced. First, rule-based classification and/or recognition is discussed in §1.2.2.1 below. Then, classification and/or recognition systems which use both learning elements and performance elements are discussed in §1.2.2.2.
§1.2.2.1 RULE BASED CLASSIFICATION AND RECOGNITION
In some instances, objects must be classified with absolute certainty, based on certain accepted logic. A rule-based system may be used to effect such types of classification. Basically, rule-based systems use production rules of the form:
IF condition, THEN fact.
For example, a plant may be classified as an oak tree if certain conditions or rules are met. However, in many instances, rule-based systems become unwieldy, particularly in instances where the number of measured or input values (for features or characteristics) becomes large, the logic for combining conditions or rules becomes complex, and/or the number of possible classes becomes large. Since humans create the rules and logic, the likelihood of a human error being introduced increases dramatically as the complexity of rule-based classifiers increases. Thus, over the last decade or so, other types of classifiers have been used increasingly. Although these classifiers do not use static, predefined logic, as rule-based classifiers do, they have outperformed rule-based classifiers in many applications. Such classifiers are discussed in §1.2.2.2 below and typically include a learning element and a performance element. Such classifiers may include neural networks, Bayesian networks, and support vector machines. Although each of these classifiers is known, each is briefly introduced below for the reader's convenience.
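To make the production-rule form concrete, the following minimal Python sketch implements an IF-condition-THEN-fact classifier. The feature names, values, and rules are hypothetical, invented purely for illustration; they are not drawn from the patent.

    # A minimal sketch of a rule-based classifier built from
    # IF-condition-THEN-fact production rules. Features and rules are
    # hypothetical, for illustration only.

    def classify_plant(features):
        # IF leaves are lobed AND the plant bears acorns, THEN it is an oak tree.
        if features.get("leaf_shape") == "lobed" and features.get("has_acorns"):
            return "oak tree"
        # IF leaves are needles, THEN it is a conifer.
        if features.get("leaf_shape") == "needle":
            return "conifer"
        # No rule fired: the rule base cannot classify this object.
        return "unknown"

    print(classify_plant({"leaf_shape": "lobed", "has_acorns": True}))  # oak tree

As the surrounding text observes, every new feature, class, or interaction among conditions must be encoded by hand, which is why such rule bases become unwieldy as they grow.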
§1.2.2.2 CLASSIFIERS HAVING LEARNING AND PERFORMANCE ELEMENTS
As just mentioned at the end of the previous section, classifiers having learning and performance elements outperform rule-based classifiers in many applications. To reiterate, these classifiers may include neural networks (known, but introduced in §1.2.2.2.1 below for the reader's convenience), Bayesian networks (known, but introduced in §1.2.2.2.2 below for the reader's convenience), and support vector machines (discussed in §1.2.2.2.3 below).
§1.2.2.2.1 NEURAL NETWORKS
A neural network is basically a multilayered, hierarchical arrangement of identical processing elements, also referred to as neurons. Each neuron can have one or more inputs but only one output. Each neuron input is weighted by a coefficient. The output of a neuron is typically a function of the sum of its weighted inputs and a bias value. This function, also referred to as an activation function, is typically a sigmoid function. That is, the activation function may be S-shaped, monotonically increasing, and asymptotically approaching fixed values (e.g., +1, 0, or −1) as its input approaches positive or negative infinity. The sigmoid function and the individual neural weight and bias values determine the response or “excitability” of the neuron to input signals.
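The neuron computation just described fits in a few lines. The following Python sketch (the weights, bias, and inputs are arbitrary illustrative values) computes a sigmoid neuron's output as the activation of the sum of its weighted inputs plus a bias.

    import math

    def sigmoid(z):
        # S-shaped activation: monotonically increasing, approaching 1 as
        # z -> +infinity and 0 as z -> -infinity.
        return 1.0 / (1.0 + math.exp(-z))

    def neuron_output(inputs, weights, bias):
        # Each input is weighted by a coefficient; the output is the
        # activation function applied to the weighted sum plus the bias.
        z = sum(w * x for w, x in zip(weights, inputs)) + bias
        return sigmoid(z)

    # Arbitrary example values (not from the patent).
    print(neuron_output([0.5, -1.2, 3.0], [0.8, 0.1, -0.4], 0.2))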
In the hierarchical arrangement of neurons, the output of a neuron in one layer may be distributed as an input to one or more neurons in a next layer. A typical neural network may include an input layer and two (2) distinct neuron layers; namely, an intermediate neuron layer and an output neuron layer. Note that the nodes of the input layer are not neurons. Rather, the nodes of the input layer have only one input and basically provide the input, unprocessed, to the inputs of the next layer. If, for example, the neural network were to be used for recognizing a numerical digit character in a 20 by 15 pixel array, the input layer could have 300 nodes (i.e., one for each pixel of the input) and the output layer could have 10 neurons (i.e., one for each of the ten digits).
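To make the layer sizes concrete, here is a minimal NumPy sketch of a forward pass through the digit-recognition architecture described above: 300 input nodes (one per pixel of the 20 by 15 array) and 10 output neurons (one per digit). The 50-neuron hidden layer is an arbitrary assumption; as noted later in this section, the hidden-layer size cannot easily be determined a priori.

    import numpy as np

    rng = np.random.default_rng(0)

    n_input, n_hidden, n_output = 300, 50, 10  # hidden size is an assumption

    # Weights and biases initialized to random Gaussian values, as described
    # in the next paragraph.
    W1 = rng.normal(0.0, 0.1, size=(n_hidden, n_input))
    b1 = rng.normal(0.0, 0.1, size=n_hidden)
    W2 = rng.normal(0.0, 0.1, size=(n_output, n_hidden))
    b2 = rng.normal(0.0, 0.1, size=n_output)

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    def forward(pixels):
        # pixels: the flattened 20x15 array, passed on unprocessed by the
        # input nodes; each neuron layer applies weights, bias, and sigmoid.
        hidden = sigmoid(W1 @ pixels + b1)
        return sigmoid(W2 @ hidden + b2)  # one activation per digit class

    print(forward(rng.random(n_input)).shape)  # (10,)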
The use of neural networks generally involves two (2) successive steps. First, the neural network is initialized and trained on known inputs having known output values (or classifications). Once the neural network is trained, it can then be used to classify unknown inputs. The neural network may be initialized by setting the weights and biases of the neurons to random values, typically generated from a Gaussian distribution. The neural network is then trained using a succession of inputs having known outputs (or classes). As the training inputs are fed to the neural network, the values of the neural weights and biases are adjusted (e.g., in accordance with the known back-propagation technique) such that the output of the neural network for each individual training pattern approaches or matches the known output. Basically, a gradient descent in weight space is used to minimize the output error. In this way, learning using successive training inputs converges towards a locally optimal solution for the weights and biases. That is, the weights and biases are adjusted to minimize an error.
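The gradient-descent weight adjustment described above can be sketched most simply for a single sigmoid neuron; full back-propagation applies the same chain-rule computation layer by layer. The toy training set (the logical OR function), learning rate, and epoch count below are arbitrary illustrative choices.

    import math, random

    random.seed(0)

    def sigmoid(z):
        return 1.0 / (1.0 + math.exp(-z))

    # Toy training inputs with known outputs (classes): logical OR.
    data = [([0.0, 0.0], 0.0), ([0.0, 1.0], 1.0),
            ([1.0, 0.0], 1.0), ([1.0, 1.0], 1.0)]

    w = [random.gauss(0.0, 0.1) for _ in range(2)]  # random Gaussian init
    b = random.gauss(0.0, 0.1)
    lr = 0.5  # small step size, per the discussion of convergence below

    for epoch in range(1000):
        for x, target in data:
            y = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
            # Gradient of the squared error (y - target)^2 / 2, via the
            # chain rule through the sigmoid:
            delta = (y - target) * y * (1.0 - y)
            w = [wi - lr * delta * xi for wi, xi in zip(w, x)]
            b -= lr * delta

    # After training, the outputs approach the known targets 0, 1, 1, 1.
    print([round(sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b), 2)
           for x, _ in data])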
In practice, the system is not trained to the point where it converges to an optimal solution. Otherwise, the system would be “over trained” such that it would be too specialized to the training data and might not be good at classifying inputs which differ, in some way, from those in the training set. Thus, at various times during its training, the system is tested on a set of validation data. Training is halted when the system's performance on the validation set no longer improves.
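A sketch of this early-stopping procedure is given below. The train_epoch and evaluate callables are hypothetical placeholders for the training and validation steps just described, and the patience parameter (how many non-improving validation checks to tolerate) is an assumption.

    def train_with_early_stopping(model, train_data, validation_data,
                                  train_epoch, evaluate, patience=5):
        # Halt training when performance on the validation set stops
        # improving, to avoid over-training (over-specialization to the
        # training data).
        best_score = float("-inf")
        stale_checks = 0
        while stale_checks < patience:
            train_epoch(model, train_data)
            score = evaluate(model, validation_data)
            if score > best_score:
                best_score, stale_checks = score, 0
            else:
                stale_checks += 1
        return model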
Once training is complete, the neural network can be used to classify unknown inputs in accordance with the weights and biases determined during training. If the neural network can classify the unknown input with confidence, one of the outputs of the neurons in the output layer will be much higher than the others.
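Classification then reduces to selecting the output neuron with the highest activation; one might judge confidence by the gap between the top two outputs. The margin threshold below is an invented illustration of "much higher than the others".

    def classify(outputs, margin=0.3):
        # outputs: one activation per class from the output layer.
        ranked = sorted(range(len(outputs)),
                        key=lambda i: outputs[i], reverse=True)
        best, runner_up = ranked[0], ranked[1]
        confident = outputs[best] - outputs[runner_up] >= margin
        return best, confident

    # Illustrative activations for the ten digit classes: class 1 wins.
    print(classify([0.05, 0.91, 0.12, 0.08, 0.02,
                    0.10, 0.07, 0.03, 0.20, 0.04]))  # (1, True)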
To ensure that the weight and bias terms do not diverge, the algorithm uses small steps. Consequently, convergence is slow. Also, the number of neurons in the hidden layer cannot easily be determined a priori. Consequently, multiple time-consuming experiments are often run to determine the optimal number of hidden neurons.
§1.2.2.2.2 BAYESIAN NETWORKS
Having introduced neural networks above, Bayesian networks are now briefly introduced. Basically, Bayesian networks use hypotheses as intermediaries between data (e.g., input feature vectors) and predictions (e.g., classifications). The probability of each hypothesis, given the data (“P(hypo|data)”), may be estimated. A prediction is made from the hypotheses using posterior probabilities of the hypotheses to weight the individual predictions of each of the hypotheses. The probability of a prediction X, given the data, may thus be estimated as the sum, over the hypotheses, of P(X|hypo)·P(hypo|data).
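A minimal sketch of this posterior-weighted prediction, assuming the per-hypothesis prediction probabilities P(X|hypo) and posteriors P(hypo|data) have already been estimated (the numeric values are invented):

    # Each hypothesis contributes its prediction P(X|hypo), weighted by its
    # posterior probability P(hypo|data). Values are illustrative only.
    hypotheses = [
        {"p_x_given_hypo": 0.9, "p_hypo_given_data": 0.5},
        {"p_x_given_hypo": 0.4, "p_hypo_given_data": 0.3},
        {"p_x_given_hypo": 0.1, "p_hypo_given_data": 0.2},
    ]

    # P(X|data) = sum over hypotheses of P(X|hypo) * P(hypo|data)
    p_x_given_data = sum(h["p_x_given_hypo"] * h["p_hypo_given_data"]
                         for h in hypotheses)
    print(p_x_given_data)  # 0.59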
Assignee: Microsoft Corporation
Examiner: Powell, Mark R.
Attorneys/Agents: John C. Pokotylo; Jeffrey Allen Rossi; Straub & Pokotylo