Learning controller with advantage updating algorithm

Patent

Filed: 1994-08-01
Issued: 1997-03-04
Examiner: Hafiz, Tariq R.

Classes: 395 20, 395 21, G06E 100, G06E 300, G06F 1518
Type: Patent
Status: active
Number: 056088434
ABSTRACT:
A new algorithm for reinforcement learning, advantage updating, is proposed. Advantage updating is a direct learning technique; it does not require a model to be given or learned. It is incremental, requiring only a constant amount of calculation per time step, independent of the number of possible actions, possible outcomes from a given action, or number of states. Analysis and simulation indicate that advantage updating is applicable to reinforcement learning systems working in continuous time (or discrete time with small time steps) for which Q-learning is not applicable. Simulation results are presented indicating that for a simple linear quadratic regulator (LQR) problem with no noise and large time steps, advantage updating learns slightly faster than Q-learning. When there is noise or small time steps, advantage updating learns more quickly than Q-learning by a factor of more than 100,000. Convergence properties and implementation issues are discussed. New convergence results are presented for R-learning and algorithms based upon change in value. It is proved that the learning rule for advantage updating converges to the optimal policy with probability one.
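
The abstract's claim that Q-learning is not applicable with small time steps can be illustrated numerically. The Python sketch below is not taken from the patent. It assumes a scalar, noise-free LQR problem (dynamics x' = x + u*dt, reward rate -(x^2 + u^2), discount gamma**dt) and the advantage definition A(x,u) = (Q(x,u) - max over u' of Q(x,u')) / dt, as we understand it from the Baird report cited in the references. All function names and parameter values are hypothetical.

```python
def cost_coefficient(dt, gamma=0.9):
    """Value iteration for the assumed scalar LQR problem.

    The optimal value is V(x) = -p * x**2; this returns p.
    The iteration count is a heuristic that grows as dt shrinks.
    """
    g = gamma ** dt
    p = 0.0
    for _ in range(int(40 / dt) + 100):
        # One Bellman backup, minimised over u in closed form.
        p = dt + g * p / (1.0 + g * p * dt)
    return p


def q_value(x, u, dt, p, gamma=0.9):
    """Optimal Q-value: one step of reward plus discounted next-state value."""
    g = gamma ** dt
    return -(x * x + u * u) * dt - g * p * (x + u * dt) ** 2


def greedy_action(x, dt, p, gamma=0.9):
    """Action that maximises q_value for the given state."""
    g = gamma ** dt
    return -g * p * x / (1.0 + g * p * dt)


def compare(dt, x=1.0, delta=1.0, gamma=0.9):
    """Return (Q gap, advantage) for an action offset by delta from the best one."""
    p = cost_coefficient(dt, gamma)
    u_best = greedy_action(x, dt, p, gamma)
    q_best = q_value(x, u_best, dt, p, gamma)
    q_other = q_value(x, u_best + delta, dt, p, gamma)
    q_gap = q_best - q_other
    # Assumed definition of the advantage: Q difference rescaled by 1/dt.
    advantage = (q_other - q_best) / dt
    return q_gap, advantage


if __name__ == "__main__":
    for dt in (1.0, 0.01, 0.0001):
        gap, adv = compare(dt)
        print(f"dt={dt:g}  Q gap={gap:.6g}  advantage={adv:.6g}")
```

Under these assumptions, the gap between the Q-values of the best action and a perturbed action shrinks roughly in proportion to dt, so small errors in a learned Q-function can hide which action is better. The rescaled advantage stays close to -delta**2 as dt shrinks, which is one way to read the abstract's motivation for storing advantages instead of Q-values.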

REFERENCES:
patent: 5250886 (1993-10-01), Yasuhara et al.
patent: 5257343 (1993-10-01), Kyuma et al.
Leemon C. Baird III, WL-TR-93-1146 Advantage Updating, Nov. 1993, pp. 1-41 Avionic Directorate, Wright Laboratory AFMC, WPAFB.
Yen, "Hybrid learning control in flexible space structures with reconfiguration capability"; Proceedings of the 1994 IEEE International symposium on intelligent control, pp. 321-326, 16-18 Aug. 1994.
Baird, "Function minimization for dynamic programming using connectionist networks"; 1992 IEEE International conference on systems, man and cybernetics, pp. 19-24 vol. 1, 18-21 Oct. 1992.
Baird, "Reinforcement learning in continuous time: advantage updating"; 1994 IEEE International conference on neural networks, pp. 2448-2453 vol. 4, 27 Jun.-2 Jul. 1994.

Affiliated with

Baird, III Leemon C. (Inventor)

Also associated with

Franz Bernard E. (Representative)
Hafiz Tariq R. (Examiner)
Hollins Gerald B. (Representative)
Kundert Thomas L. (Representative)
The United States of America as represented by the Secretary of (Corporate Assignee)
