Patent
1994-08-01
1997-03-04
Hafiz, Tariq R.
395/20, 395/21, G06E 1/00, G06E 3/00, G06F 15/18
active
056088434
ABSTRACT:
A new algorithm for reinforcement learning, advantage updating, is proposed. Advantage updating is a direct learning technique; it does not require a model to be given or learned. It is incremental, requiring only a constant amount of calculation per time step, independent of the number of possible actions, possible outcomes from a given action, or number of states. Analysis and simulation indicate that advantage updating is applicable to reinforcement learning systems working in continuous time (or discrete time with small time steps) for which Q-learning is not applicable. Simulation results are presented indicating that for a simple linear quadratic regulator (LQR) problem with no noise and large time steps, advantage updating learns slightly faster than Q-learning. When there is noise or small time steps, advantage updating learns more quickly than Q-learning by a factor of more than 100,000. Convergence properties and implementation issues are discussed. New convergence results are presented for R-learning and algorithms based upon change in value. It is proved that the learning rule for advantage updating converges to the optimal policy with probability one.
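The abstract does not state the update rules themselves. The following is a minimal tabular sketch, assuming the form of the learning and normalizing updates usually attributed to the cited Baird technical report (WL-TR-93-1146). The function names, the table layout, and the default values of `gamma`, `alpha`, `beta`, and `omega` are illustrative assumptions, not taken from this record.

```python
def advantage_update(A, V, x, u, reward, x_next, dt,
                     gamma=0.9, alpha=0.5, beta=0.5):
    """Apply one tabular learning step in place (assumed form of the rule).

    A[x][u] holds advantage estimates and V[x] holds value estimates.
    `reward` is the total reinforcement received over a step of length dt.
    """
    a_max_old = max(A[x])
    # Target: best advantage in x plus the change in value per unit time.
    target = a_max_old + (reward + gamma ** dt * V[x_next] - V[x]) / dt
    A[x][u] = (1 - alpha) * A[x][u] + alpha * target
    # Shift V(x) by the resulting change in the maximum advantage.
    a_max_new = max(A[x])
    V[x] += beta * (a_max_new - a_max_old) / alpha


def normalize(A, x, u, omega=0.5):
    """Nudge A[x][u] so the maximum advantage in state x drifts toward zero."""
    A[x][u] -= omega * max(A[x])


# Hypothetical two-state, two-action problem with all estimates starting at 0.
A = [[0.0, 0.0], [0.0, 0.0]]
V = [0.0, 0.0]
advantage_update(A, V, x=0, u=1, reward=1.0, x_next=1, dt=0.5)
normalize(A, x=0, u=1)
```

Dividing the temporal-difference term by `dt` keeps the gap between good and bad actions from shrinking as the time step shrinks. This is the property the abstract relies on when it contrasts advantage updating with Q-learning for small time steps.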
REFERENCES:
patent: 5250886 (1993-10-01), Yasuhara et al.
patent: 5257343 (1993-10-01), Kyuma et al.
Leemon C. Baird III, "Advantage Updating", WL-TR-93-1146, Nov. 1993, pp. 1-41, Avionics Directorate, Wright Laboratory, AFMC, WPAFB.
Yen, "Hybrid learning control in flexible space structures with reconfiguration capability"; Proceedings of the 1994 IEEE International symposium on intelligent control, pp. 321-326, 16-18 Aug. 1994.
Baird, "Function minimization for dynamic programming using connectionist networks"; 1992 IEEE International conference on systems, man and cybernetics, pp. 19-24 vol. 1, 18-21 Oct. 1992.
Baird, "Reinforcement learning in continuous time: advantage updating"; 1994 IEEE International conference on neural networks, pp. 2448-2453 vol. 4, 27 Jun.-2 Jul. 1994.
Franz, Bernard E.
Hafiz, Tariq R.
Hollins, Gerald B.
Kundert, Thomas L.
The United States of America as represented by the Secretary of
TITLE:
Learning controller with advantage updating algorithm