Method and system for training an artificial neural network

Data processing: artificial intelligence – Neural network


Details

U.S. Classification: 706/12; 706/14
Type: Reexamination Certificate
Status: active
Patent number: 06269351


TECHNICAL FIELD OF THE INVENTION
The present invention relates generally to neural network training systems and their operation. More particularly, the present invention relates to an improved neural network training method and system having the ability to change its learning rate in response to training performance, to automatically select a representative training dataset, to reinitialize the neural network to achieve a preset error goal, and to automatically optimize the neural network size for a given training dataset.
BACKGROUND OF THE INVENTION
Artificial neural networks (“ANNs”) are well known in the prior art. The role of an ANN is to perform a non-parametric, nonlinear, multi-variate mapping from one set of variables to another. ANN 10 of FIG. 1 illustrates such a mapping by operating on input vector 12 to produce output vector 14. To perform this mapping, a training algorithm is applied to deduce the input/output relationship(s) from example data. Such ANN training algorithms are also well known in the prior art.
Prior to training, an ANN is initialized by randomly assigning values to free parameters known as weights. The training algorithm takes an unorganized ANN and a set of training input and output vectors and, through an iterative process, adjusts the values of the weights. Ideally, by the end of the training process, presentation of a vector of inputs from the training data to the ANN results in activations (outputs) at the output layer that exactly match the proper training data outputs.
The basic unit that makes up an artificial neural network is variously known as an artificial neuron, neurode, or simply a node. As depicted in FIG. 2, each ANN node 16 has a number of variable inputs 15, a constant unity input 19 (also known as a bias or a bias input), and an output 17. The variable inputs correspond to outputs of previous nodes in the ANN. Each input to a node, including the bias, is multiplied by a weight associated with that particular input of that particular node. All of the weighted inputs are summed. The summed value is provided to a nonlinear univariate function known as a transfer or squashing function. The purpose of a squashing function is two-fold: to limit (threshold) the magnitude of the activation (output) achievable by each node, and to introduce a source of non-linearity into the ANN. The most commonly applied transfer functions for continuous mappings include the hyperbolic tangent function and the sigmoid function, which is given below:
\[
\phi(x) = \frac{1}{1 + e^{-ax}}. \qquad \text{Equation (1)}
\]
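For concreteness, the two transfer functions named above can be written out directly. The following is a minimal Python sketch; NumPy and the function names are conveniences for illustration, not part of the patent, while the gain parameter `a` follows Equation (1):

```python
import numpy as np

def sigmoid(x, a=1.0):
    """Equation (1): squashes any real input into the range (0, 1)."""
    return 1.0 / (1.0 + np.exp(-a * x))

def tanh(x):
    """Hyperbolic tangent: squashes any real input into (-1, 1)."""
    return np.tanh(x)

print(sigmoid(0.0))  # 0.5 -- the midpoint of the sigmoid
print(tanh(100.0))   # ~1.0 -- large activations are thresholded
```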
As expressed in Equation 2 below, \(x_j^k\), the output of node number j belonging to a layer k, is simply the transfer function \(\phi\) evaluated at the sum of the weighted inputs.
\[
x_j^k = \phi\!\left(\sum_{i=0}^{n} w_{ij}\, x_i^{k-1}\right). \qquad \text{Equation (2)}
\]
In Equation (2), \(x_i^{k-1}\) is the activation of node number i in the previous layer, \(w_{ij}\) represents the weight between node j in the k-th layer and node i in the previous layer, and \(\phi\) represents the transfer function.
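As an illustration of Equation (2), the computation performed by a single node can be sketched in a few lines of Python. Treating index 0 of the weight vector as the bias weight is an assumption consistent with the sum running from i = 0; the function name and use of NumPy are likewise illustrative:

```python
import numpy as np

def node_activation(weights, prev_activations, a=1.0):
    """Equation (2): the output x_j^k of node j in layer k.

    `weights` holds w_0j .. w_nj, where index 0 multiplies the constant
    unity (bias) input; `prev_activations` holds x_1^{k-1} .. x_n^{k-1}.
    """
    inputs = np.concatenate(([1.0], prev_activations))  # x_0^{k-1} = 1 (bias)
    s = np.dot(weights, inputs)                         # sum of weighted inputs
    return 1.0 / (1.0 + np.exp(-a * s))                 # sigmoid of Equation (1)
```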
A basic feedforward artificial neural network incorporates a number of nodes organized into layers. Most feedforward ANNs contain three or more such layers. An example ANN is illustrated in FIG. 3. ANN 10 consists of an input layer 18 communicatively connected to one or more hidden layers 20. Hidden layers 20 can be communicatively connected to one another, to input layer 18, and to output layer 22. All layers are composed of one or more nodes 16. Information flows from left to right, from each layer to the next adjacent layer.
The nodes in the input layer are assigned activation values corresponding to the input variables. The activation of each of these nodes is supplied as a weighted input to the next layer. In networks involving three or more layers, the interior layers are known as hidden layers. After one or more hidden layers, the final layer, known as the output layer, is reached. The activations of the nodes of the output layer correspond to the output variables of the mapping.
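The layer-to-layer flow just described amounts to a forward pass through the network. A minimal sketch, assuming one weight matrix per non-input layer with the bias weight stored in the first column (conventions chosen here for illustration, not taken from the patent):

```python
import numpy as np

def forward_pass(layer_weights, input_vector):
    """Propagate input-layer activations through the hidden layers
    to the output layer, applying Equation (2) at every node."""
    activations = np.asarray(input_vector, dtype=float)
    for W in layer_weights:                 # one matrix per non-input layer
        inputs = np.concatenate(([1.0], activations))      # prepend bias input
        activations = 1.0 / (1.0 + np.exp(-(W @ inputs)))  # weighted sums + sigmoid
    return activations                      # activations of the output layer
```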
The interior layers of the network are known as hidden layers to distinguish them from the input and output layers whose activations have an easily interpretable relationship with something meaningful. The hidden layers perform an internal feature detection role, and thus harbor an internal representation of the relationships between the inputs and outputs of the mapping, but are usually not of use to and are generally “hidden” from the attention of the user.
As previously mentioned, transfer functions help the mapping by thresholding the activations of nodes. This is desirable as it forces the ANN to form distributed relationships and does not allow one or a few nodes to achieve very large activations with any particular input/output patterns. These requirements and restrictions upon the behavior of the ANN help to ensure proper generalization and stability during training and render the ANN more noise-tolerant. However, a consideration raised by transfer functions is that they generally cause ANN outputs to be limited to the range [0,1] or [−1,1]. This necessitates a transformation between the ranges of the output variables and the range of the transfer function. In practice, a network is trained with example inputs and outputs linearly scaled to the appropriate range, just within the tails of the transfer function. When the network is deployed, the inputs are again scaled, but the outputs of the network are usually “de-scaled” by applying the inverse of the scaling function. The de-scaling provides real-world units and values to the otherwise unit-less fractional values generated by the ANN.
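The scaling and de-scaling step might look as follows. This is a minimal sketch assuming simple linear min-max scaling; the margin values 0.1 and 0.9 (just within the tails of a sigmoid) and the function names are illustrative assumptions:

```python
import numpy as np

def make_scaler(data, lo=0.1, hi=0.9):
    """Build a pair of linear scale/de-scale functions mapping the
    observed range of `data` into [lo, hi] and back."""
    dmin, dmax = float(np.min(data)), float(np.max(data))
    span = dmax - dmin

    def scale(x):
        return lo + (hi - lo) * (np.asarray(x) - dmin) / span

    def descale(y):
        # inverse of scale(): recovers real-world units and values
        return dmin + span * (np.asarray(y) - lo) / (hi - lo)

    return scale, descale
```

Training inputs and outputs would be passed through `scale`; at deployment, the network's raw outputs would be passed through `descale` to recover real-world values.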
When a network is generated or initialized, the weights are randomly set to values near zero. At the start of the ANN training process, as would be expected, the untrained ANN does not perform the desired mapping very well. A training algorithm incorporating some optimization technique must be applied to change the weights to provide an accurate mapping. The training is done in an iterative manner as prescribed by the training algorithm. The optimization techniques fall into one of two categories: stochastic or deterministic.
Stochastic techniques, such as simulated annealing and genetic algorithms, generally avoid learning instabilities but only slowly locate a near-global optimum (actually a minimum in the error surface) for the weights. Deterministic methods, such as gradient descent, find a minimum very quickly but are susceptible to local minima. Whichever category of optimization is applied, sufficient data representative of the mapping to be performed must be selected and supplied to the training algorithm.
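As a concrete instance of the deterministic category, a bare gradient descent loop repeatedly steps the weights down the error surface. The learning rate `eta`, the placeholder gradient function, and the quadratic example surface are assumptions for illustration only, not the patent's method:

```python
import numpy as np

def gradient_descent(weights, error_gradient, eta=0.1, iterations=1000):
    """Iteratively move the weights a small step (scaled by the
    learning rate eta) against the error gradient."""
    w = np.asarray(weights, dtype=float)
    for _ in range(iterations):
        w = w - eta * error_gradient(w)
    return w

# Example: minimize E(w) = ||w||^2, whose gradient is 2w.
# The descent converges toward the minimum at w = 0.
w_final = gradient_descent(np.array([1.0, -2.0]), lambda w: 2.0 * w)
```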
Training data selection is generally a nontrivial task. An ANN is only as representative of the functional mapping as the data used to train it. Any features or characteristics of the mapping not included (or hinted at) within the training data will not be represented in the ANN. Selection of a good representative sample requires analysis of historical data and trial and error. A sufficient number of points must be selected from each area in the data representing or revealing new or different behavior of the mapping. This selection is generally accomplished with some form of stratified random sampling, i.e., randomly selecting a certain number of points from each region of interest.
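Stratified random sampling of this kind can be sketched as follows. How the regions of interest are identified is application-specific, so here they are assumed to be pre-grouped; the names are illustrative:

```python
import random

def stratified_sample(regions, n_per_region, seed=None):
    """Randomly select up to `n_per_region` points from each region.

    `regions` maps a region label to the list of data points that
    represent or reveal that region's behavior.
    """
    rng = random.Random(seed)
    sample = []
    for points in regions.values():
        sample.extend(rng.sample(points, min(n_per_region, len(points))))
    return sample
```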
Most training algorithms for feedforward networks incorporate one form or another of a gradient descent technique and collectively are known as back-propagation training. The term back-propagation describes the manner in which the error gradient calculation propagates through the ANN. The expression for the prediction error \(\delta_j\) at some node j in the output layer is simply the difference between the ANN output and the training data output:

\[
\delta_j^{\text{output}} = x_j^{\text{desired}} - x_j^{\text{output}}. \qquad \text{Equation (4)}
\]
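Equation (4) in code form, vectorized over all output nodes for convenience (an assumption of this sketch, not the patent's formulation):

```python
import numpy as np

def output_layer_deltas(desired, actual):
    """Equation (4): delta_j = x_j^desired - x_j^output,
    evaluated for every node j of the output layer at once."""
    return np.asarray(desired, dtype=float) - np.asarray(actual, dtype=float)
```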
The expression for the error at some node i in a previous (to the output) layer may be expressed in terms of the errors at
