Data processing: artificial intelligence – Neural network – Learning task
Reexamination Certificate
2002-07-18
2003-12-16
Davis, George B. (Department: 2121)
Data processing: artificial intelligence
Neural network
Learning task
C706S016000, C706S039000
Reexamination Certificate
active
06665651
ABSTRACT:
BACKGROUND OF THE INVENTION
In general, the present invention relates to techniques for training neural networks employed in control systems for improved controller performance. More particularly, the invention relates to a new feedback control system and associated method employing reinforcement learning with robust constraints for on-line training of at least one feedback controller connected in parallel with a novel reinforcement learning agent (sometimes referred to, herein, as the “RL agent”). Unlike any prior attempt to apply reinforcement learning techniques to on-line control problems, the invention utilizes robust constraints along with reinforcement learning components, allowing for on-line training thereof, to augment the output of a feedback controller in operation, allowing for continual improvement in operation toward optimal performance while effectively avoiding system instability. The system of the invention carries out at least one sequence of a stability phase followed by a learning phase. The stability phase includes the determination of a multi-dimensional boundary of values, or stability range, within which learning can take place while maintaining system stability. The learning phase comprises generating a plurality of updated weight values in connection with the on-line training; if and when one of the updated weight values reaches the boundary, a next sequence is carried out comprising determining a next multi-dimensional boundary of values followed by a next learning phase. A multitude of sequences may take place during on-line training, each sequence marked by the calculation of a new boundary of values within which RL agent training, by way of an updating of neural network parameter values, is permitted to take place.
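The alternating stability/learning sequence described above can be outlined in code. The following Python sketch is purely illustrative and is not taken from the patent: the function names, the fixed-margin box standing in for the robust stability analysis, and the clipping step are assumptions; a real implementation would derive the boundary from a robust stability analysis of the closed loop and supply an actual RL weight-update rule.

```python
import numpy as np

def compute_stability_bounds(weights, margin=0.1):
    """Hypothetical stability phase: return a per-weight interval (a
    multi-dimensional box) inside which learning may proceed.  Here the
    box is simply a fixed margin around the current weights; the patent's
    stability phase would compute it from a robust stability analysis."""
    return weights - margin, weights + margin

def learning_phase(weights, lower, upper, rl_update, max_steps=10_000):
    """Learning phase: apply RL weight updates until any weight reaches
    the boundary of the current stability box."""
    for _ in range(max_steps):
        weights = weights + rl_update(weights)
        if np.any(weights <= lower) or np.any(weights >= upper):
            # A weight reached the boundary: stop and request a new box.
            return np.clip(weights, lower, upper), True
    return weights, False

def train_online(weights, rl_update, num_sequences=5):
    """Alternate stability and learning phases: each sequence computes a
    new boundary, then permits learning only within it."""
    for _ in range(num_sequences):
        lower, upper = compute_stability_bounds(weights)
        weights, hit_boundary = learning_phase(weights, lower, upper, rl_update)
        if not hit_boundary:
            break  # learning settled without reaching the boundary
    return weights
```

In this sketch, `rl_update` is whatever gradient-style update the RL agent produces at each step; the point is only that updates are confined to the current stability box and that reaching its edge triggers computation of a new box.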
Use of conventional reinforcement learning alone (whether or not comprising a neural network) to optimize performance of a controller nearly guarantees system instability at some point, dictating that off-line training of sufficient duration must be done, initially, with either simulated or real data sets. Furthermore, while the use of robust control theory, without more, provides a very high level of confidence in system stability, this level of stability is gained at a cost: system control is much less aggressive. Such conservative operation of a feedback control system will rarely reach optimal performance.
Two key research trends led to the early development of reinforcement learning (RL): trial-and-error learning from the psychology disciplines and traditional “dynamic programming” methods from mathematics. RL began as a means for approximating the latter. Conventional RL networks interact with an environment by observing states, s, and selecting actions, a. After each moment of interaction (observing s and choosing an a), the network receives a feedback signal, or reinforcement signal, R, from the environment. This is much like the trial-and-error approach from animal learning and psychology. The goal of reinforcement learning is to devise a control algorithm, often referred to as a policy, that selects optimal actions for each observed state. Here, according to the instant invention, optimal actions include those which produce the highest reinforcements not only for the immediate action, but also for future states and actions not yet selected, the goal being improved overall performance. It is important to note that reinforcement learning is not limited to neural networks; the function and goal(s) of RL can be carried out by any function approximator, such as a polynomial or a lookup table, rather than a neural network.
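As a concrete illustration of the interaction loop just described (observe state s, select action a, receive reinforcement R, improve the policy), the following is a minimal tabular Q-learning sketch in Python. The toy environment, learning rate, discount factor, and exploration scheme are illustrative assumptions and not part of the patent; in a control setting the `step` function would be the plant's response.

```python
import numpy as np

rng = np.random.default_rng(0)

n_states, n_actions = 10, 4
Q = np.zeros((n_states, n_actions))   # tabular function approximator
alpha, gamma, epsilon = 0.1, 0.95, 0.1

def step(s, a):
    """Placeholder environment: returns the next state and the
    reinforcement signal R for taking action a in state s."""
    s_next = int(rng.integers(n_states))
    R = -abs(s_next - n_states // 2)   # illustrative reinforcement
    return s_next, R

s = int(rng.integers(n_states))
for _ in range(1000):
    # Select an action: mostly greedy, occasionally exploratory.
    if rng.random() < epsilon:
        a = int(rng.integers(n_actions))
    else:
        a = int(np.argmax(Q[s]))
    s_next, R = step(s, a)
    # Update toward immediate reinforcement plus discounted future value,
    # so the policy accounts for future states and actions not yet selected.
    Q[s, a] += alpha * (R + gamma * np.max(Q[s_next]) - Q[s, a])
    s = s_next
```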
In earlier work of the applicants, Anderson, C. W., et al., “Synthesis of Reinforcement Learning, Neural Networks, and PI Control Applied to a Simulated Heating Coil,” Journal of Artificial Intelligence in Engineering, Vol. 11, No. 4, pp. 423-431 (1997), and Anderson, C. W., et al., “Reinforcement Learning, Neural Networks and PI Control Applied to a Heating Coil,” Solving Engineering Problems with Neural Networks: Proceedings of the International Conference on Engineering Applications of Neural Networks (EANN-96), ed. by Bulsari, A. B., et al., Systems Engineering Association, Turku, Finland, pp. 135-142 (1996), experimentation was performed on the system as configured in FIG. 8 of the 1997 reference above. In this prior work, applicants trained the reinforcement learning agent off-line for many repetitions, called trials, of a selected number of time-step interactions between a simulated heating coil and the combination of a reinforcement learning tool and the PI controller, to gather data set(s) for augmenting (by direct addition, at C) the output of the PI controller during periods of actual use to control the heating coil. In this 1997 prior work, applicants defined and applied a simple Q-learning type algorithm to implement the reinforcement learning.
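The augmentation arrangement described in that prior work, in which the RL agent's output is added directly to the PI controller's output at a summing junction, can be sketched as follows. The `PIController` class and the `rl_agent.action(...)` interface are hypothetical stand-ins, not the applicants' implementation.

```python
class PIController:
    """A simple discrete-time PI controller."""
    def __init__(self, kp, ki, dt):
        self.kp, self.ki, self.dt = kp, ki, dt
        self.integral = 0.0

    def output(self, error):
        self.integral += error * self.dt
        return self.kp * error + self.ki * self.integral

def control_step(pi, rl_agent, setpoint, measurement):
    """One time step of the augmented loop: the RL agent's action is
    added directly to the PI output, as at the summing junction C in the
    cited work.  `rl_agent.action(...)` is a hypothetical interface."""
    error = setpoint - measurement
    u_pi = pi.output(error)
    u_rl = rl_agent.action(setpoint, measurement)
    return u_pi + u_rl
```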
In their continued pursuit of analyzing and characterizing on-line training of a neural network connected to a feedback controller, it was not until later that the applicants identified and applied the unique two-phase technique of the instant invention, allowing for successful on-the-fly, real-time training of a reinforcement learning agent in connection with a feedback controller while ensuring stability of the system during the period of training. Conventionally, reinforcement learning had been applied to find solutions to control problems by learning good approximations to the optimal value function, J*, given by the solution to the Bellman optimality equation, which can take the form identified as Eqn. (1) in Singh, S., et al., “Reinforcement Learning for Dynamic Channel Allocation in Cellular Telephone Systems” (undated). And as mentioned earlier, when conventional RL is placed within a feedback control framework, it must be trained off-line in a manner that exposes the system to a wide variety of commands and disturbance signals, in order to become ‘experienced’. This takes a great deal of time and extra expense.
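For reference, one common statement of the Bellman optimality equation for the optimal value function J* is shown below; the exact notation of Eqn. (1) in the cited Singh et al. paper may differ.

```latex
J^{*}(s) \;=\; \max_{a}\Big[\, R(s,a) \;+\; \gamma \sum_{s'} P(s' \mid s, a)\, J^{*}(s') \,\Big]
```

Here R(s,a) is the immediate reinforcement for taking action a in state s, P(s'|s,a) is the probability of transitioning to state s', and γ is a discount factor weighting future reinforcements, consistent with the goal stated above of maximizing not only immediate but also future reinforcements.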
The conventional techniques used to train neural networks off-line can become quite costly: not only are resources spent in connection with off-line training time, but additional resources are spent when employing feedback controllers operating under conservative, less-aggressive control parameters. For instance, U.S. Pat. No. 5,448,681, issued Sep. 5, 1995 to E. E. R. Khan, refers to what it identifies as a conventional reinforcement learning based system shown in Khan's FIG. 1. A closer look at Khan '681 reveals that no suggestion of stability is made. Khan does not attempt to control an interconnected controller on-line with its reinforcement learning subsystem (FIG. 1). Further, Khan simply does not recognize or suggest any need for a stability analysis. Here, the conventional Khan system has to learn everything from scratch, off-line.
While there have been other earlier attempts at applying conventional notions of reinforcement learning to particular control problems, until applicants devised the instant invention, the stability of a feedback control system into which conventional reinforcement learning was incorporated for on-line learning simply could not be guaranteed. Rather, one could expect that this type of conventional feedback control system, training itself on-the-fly, would pass through a state of instability in moving toward optimal system performance (see FIG. 4 hereof, particularly the path of weight trajectory 44 without application of constraints according to the invention). While academic study of conventional systems is interesting to note, in practice these systems are not so interesting to an operator: the system will crash before reaching an optimal state. Whereas a control system employing the robust constraints of the two-phase technique of the instant invention will not, as one will better appreciate by tracing the lower weight trajectory 46 plotted in FIG. 4, representing that of a system operating according to the instant invention.
SUMMARY OF THE INVENTION
Anderson Charles
Hittle Douglas C.
Kretchmar Matthew
Young Peter M.
Colorado State University Research Foundation
Davis George B.
Macheledt Bales LLP