Data processing: artificial intelligence – Neural network – Learning task
Type: Reexamination Certificate
Filed: 1998-09-23
Issued: 2001-10-30
Examiner: Davis, George B. (Department: 2122)
U.S. Classes: C706S016000, C706S025000
Status: active
Patent Number: 06311172
ABSTRACT:
BACKGROUND OF THE INVENTION
Removing from a neural network those weights that carry only a small information content with respect to the training data to be approximated considerably improves the generalization characteristic of the reduced network. Furthermore, fewer training data items are required to train the reduced neural network, and both the rate of learning and the rate of classification in the test phase are increased.
The removal of weights from a neural network is called pruning.
Various pruning methods are known. For example, a first prior art document, A. Zell, Simulation Neuronaler Netze (Simulation of Neural Networks), Addison-Wesley, 1st Edition, 1994, ISBN 3-89319-554-8, pp. 319-328, discloses the so-called optimal brain damage (OBD) method. In this method, the second derivatives of the error function with respect to the individual weights of the neural network are used to select the weights that should be removed. The method has the disadvantage that it operates only under the precondition that the training phase has converged, that is, that the error function minimized during the training phase has reached a local or global minimum. The primary disadvantage of this precondition is that, in general, only considerably overtrained neural networks can be investigated for weights to be removed.
The optimal brain surgeon (OBS) method, likewise described in the first prior art document, is subject to the same precondition of convergence in the training phase, and thus to the same disadvantages.
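For illustration only, the saliency computation that the OBD method builds on can be sketched as follows. This is a minimal sketch assuming a diagonal Hessian approximation; the names (obd_saliencies, hessian_diag) are chosen here for exposition and are not taken from the prior art documents:

    import numpy as np

    def obd_saliencies(weights, hessian_diag):
        # Second-order OBD estimate of the increase in the error when
        # weight w_i is set to zero, assuming training has converged
        # (gradient terms vanish) and a diagonal Hessian:
        #   s_i = 0.5 * H_ii * w_i**2
        return 0.5 * hessian_diag * weights ** 2

    # Weights with the smallest saliency carry the least information
    # and are the first candidates for removal.
    w = np.array([0.8, -0.05, 1.2, 0.01])
    h = np.array([2.0, 1.5, 0.3, 4.0])
    prune_order = np.argsort(obd_saliencies(w, h))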
Furthermore, a method is known in which the training phase is stopped before the error function reaches a minimum. This procedure is called early stopping and is described, for example, in a second prior art document, W. Finnoff et al., "Improving Model Selection by Nonconvergent Methods," Neural Networks, Vol. 6, 1993, pp. 771-783. Although the OBD method is also proposed there for assessing which weights are suitable for removal, this applies only to the situation where the error function is at a minimum (page 775, penultimate paragraph).
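Early stopping itself can be sketched as follows; again a minimal illustration, in which the training and validation interfaces (train_epoch, validation_error) are hypothetical names, not taken from the cited document:

    import copy

    def train_with_early_stopping(net, train_epoch, validation_error,
                                  patience=5, max_epochs=1000):
        # Stop before the training error function reaches a minimum:
        # abort as soon as the error on held-out validation data has
        # failed to improve for `patience` consecutive epochs.
        best_err, best_net, waited = float("inf"), copy.deepcopy(net), 0
        for _ in range(max_epochs):
            train_epoch(net)               # one pass over the training data
            err = validation_error(net)    # error on the validation set
            if err < best_err:
                best_err, best_net, waited = err, copy.deepcopy(net), 0
            else:
                waited += 1
                if waited >= patience:
                    break
        return best_net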
Pruning methods which use an assessment variable describing the extent to which the value of the error function varies when a weight w_i is removed from the neural network are disclosed in third and fourth prior art documents: R. Reed, "Pruning Algorithms - A Survey," IEEE Transactions on Neural Networks, Vol. 4, No. 5, September 1993, pp. 740-747; and E. D. Karnin, "A Simple Procedure for Pruning Back-Propagation Trained Neural Networks," IEEE Transactions on Neural Networks, Vol. 1, No. 2, June 1990, pp. 239-242.
SUMMARY OF THE INVENTION
The invention is based on the problem of using a computer to determine weights that are suitable for removal from a neural network.
In general terms the present invention is a method for determining weights that are suitable for removal in a neural network, using a computer.
The training phase of the neural network is stopped before an error function, which is to be minimized in the training phase, reaches a minimum.
A first variable is defined for at least one weight of the neural network. The first variable is used to describe an assessment of the at least one weight in terms of its removal from the neural network, on the assumption that the error function is at a minimum.
A second variable is defined for the weight. The second variable is used to describe the extent to which the value of the error function varies when the weight varies.
A criterion variable for the weight is determined from at least the first variable and the second variable. The criterion variable is used to describe the extent to which the value of the error function varies if the weight is removed from the neural network.
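One plausible way to make these three variables concrete, assuming a second-order Taylor expansion of the error function E around the current, non-converged weights with a diagonal Hessian approximation (the symbols g_i and H_{ii} are introduced here for exposition and are not quoted from the patent text), is

    \Delta E_i \approx \underbrace{\tfrac{1}{2} H_{ii} w_i^2}_{\text{first variable}} \; \underbrace{-\, g_i w_i}_{\text{second variable}},
    \qquad g_i = \frac{\partial E}{\partial w_i}, \quad H_{ii} = \frac{\partial^2 E}{\partial w_i^2},

since removing the weight corresponds to the step \Delta w_i = -w_i and, because the training phase was stopped early, the gradient g_i need not vanish.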
The weight is classified as suitable for removal if the criterion variable indicates that removing the weight would vary the value of the error function by less than a first limit that can be predetermined.
Starting from a first variable, which is determined using either the known optimal brain damage (OBD) method or the optimal brain surgeon (OBS) method, a second variable is determined for each investigated weight; this second variable describes how the error function would vary if the weight were varied. The second variable can thus be regarded as a correction term that compensates for the fact that the training phase was terminated before the error function reached a minimum. From the first variable and the second variable, a criterion variable is formed which is used to decide, for each weight, whether that weight is suitable for removal from the neural network.
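Under the same assumptions as in the expansion above (second-order Taylor approximation, diagonal Hessian), this combination might be sketched as follows; all names are hypothetical:

    import numpy as np

    def removal_candidates(weights, grad, hessian_diag, limit):
        # First variable: OBD-style term, valid if the error function
        # were at a minimum.
        first = 0.5 * hessian_diag * weights ** 2
        # Second variable: gradient correction for stopping the
        # training phase before a minimum is reached.
        second = -grad * weights
        # Criterion variable: estimated change of the error function
        # if the weight is removed (set to zero).
        criterion = first + second
        # A weight is suitable for removal if removing it varies the
        # error function by less than the predetermined limit.
        return np.abs(criterion) < limit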
Forming the second variable in the manner described above considerably improves the criterion on which the decision whether a weight is suitable for removal from the neural network is based. As a result, exactly those weights are classified as suitable for removal that have the smallest information content in terms of the training data, and that can therefore be removed without significant required information being lost. This speeds up the training phase considerably, substantially improves the generalization characteristic of the neural network in the test phase, and makes classification in the test phase considerably faster, without any major loss of information.
A further advantage of the method according to the invention is that it may be used in conjunction with early stopping, which is not possible with the optimal brain damage (OBD) and optimal brain surgeon (OBS) methods. The method according to the invention thus makes it possible to combine two advantageous methods for reducing the degrees of freedom in a neural network.
One development of the method makes it possible to reintroduce weights once they have been removed if, as training continues, the information content of a removed weight turns out to be greater than that of weights that have not yet been removed. This capability to reintroduce weights that have already been removed considerably improves the flexibility of the method, and means that removals which turn out to be unfavorable after a certain time can be reversed again. These characteristics lead to the finally formed neural network having a structure considerably closer to optimal than was possible using known methods.
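Purely as an illustration of this development, the reintroduction test might look as follows; the comparison rule and all names are assumptions, not quoted from the patent:

    def reintroduce_weights(criterion, pruned):
        # Reintroduce a removed weight if, as training continues, its
        # estimated information content (here |criterion|) exceeds that
        # of at least one weight still present in the network.
        kept = [i for i in range(len(criterion)) if i not in pruned]
        if not kept:
            return pruned
        weakest_kept = min(abs(criterion[i]) for i in kept)
        revived = {i for i in pruned if abs(criterion[i]) > weakest_kept}
        return pruned - revived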
REFERENCES:
patent: 5559929 (1996-09-01), Wasserman
patent: 5636326 (1997-06-01), Stork et al.
patent: 5819226 (1998-10-01), Gopinathan et al.
Stalin et al., "Vectorized Backpropagation and Automatic Pruning for MLP Network Optimization," IEEE ICNN, Mar.-Apr. 1993.*
Hu et al., "Structural Simplification of a Feed-Forward Multilayer Perceptron Artificial Neural Network," IEEE ICASSP, Apr. 1991.*
Ledoux et al., "Two Original Weight Pruning Methods Based on Statistical Tests and Rounding Techniques," IEE Proceedings: Vision, Image and Signal Processing, Aug. 1994.*
F. Hergert et al., "A Comparison of Weight Elimination Methods for Reducing Complexity in Neural Networks," Proceedings of the International Joint Conference on Neural Networks, Baltimore, Jun. 7-11, 1992, Vol. 3, Institute of Electrical & Electronics Engineers, XP000340469, pp. 980-987.
A. S. Weigend et al., "Clearning: Cleaning and Learning of Data Structures," Proceedings of the Third International Conference on Neural Networks in the Capital Markets / 3rd International Conference on Neural Networks in Financial Engineering, London, Oct. 1996, XP000675785, pp. 511-522.
B. Hassibi et al., "Optimal Brain Surgeon and General Network Pruning," Proceedings of the International Conference on Neural Networks, San Francisco, Mar. 28-Apr. 1, 1993, XP000366793, pp. 293-299.
M. Hagiwara, Systems & Computers in Japan, Vol. 23, No. 8, Jan. 1, 1992, XP000329647.
Inventors: Neuneier, Ralph; Tresp, Volker; Zimmermann, Hans-Georg
Examiner: Davis, George B.
Attorney: Schiff Hardin & Waite
Assignee: Siemens Aktiengesellschaft