Method and apparatus for reward-based learning of improved...

Data processing: artificial intelligence – Machine learning

Reexamination Certificate

Details

active

08001063

ABSTRACT:
In one embodiment, the present invention is a method for reward-based learning of improved systems management policies. One embodiment of the inventive method involves obtaining a decision-making entity and a reward mechanism. The decision-making entity manages a plurality of application environments supported by a data processing system, where each application environment operates on data input to the data processing system. The reward mechanism generates numerical measures of value responsive to actions performed in states of the application environments. The decision-making entity and the reward mechanism are applied to the application environments, and results achieved through this application are processed in accordance with reward-based learning to derive a policy. The reward mechanism and the policy are then applied to the application environments, and the results of this application are processed in accordance with reward-based learning to derive a new policy.
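The two-stage loop in the abstract (apply a decision-making entity and a reward mechanism, learn a policy from the results, then apply that policy and learn again) can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration and does not come from the patent: the toy "demand level" states, the "server count" actions, the reward formula, the random-walk dynamics, and the tabular SARSA-style batch update standing in for "reward-based learning". All function names are hypothetical.

```python
import random

N_STATES = 3   # hypothetical demand levels: low, medium, high
N_ACTIONS = 3  # hypothetical numbers of servers to allocate


def reward_mechanism(state, action):
    """Toy numerical measure of value for an action taken in a state."""
    return -abs(state - action)  # best when allocation matches demand


def step(state, rng):
    """Toy dynamics (assumed): demand follows a bounded random walk."""
    return min(N_STATES - 1, max(0, state + rng.choice([-1, 0, 1])))


def choose(policy, state, rng, explore):
    """Follow the policy, with occasional random exploratory actions."""
    if rng.random() < explore:
        return rng.randrange(N_ACTIONS)
    return policy[state]


def collect(policy, rng, n_steps=2000, explore=0.2):
    """Apply a policy to the environment and log (s, a, r, s', a') tuples."""
    log = []
    state = rng.randrange(N_STATES)
    action = choose(policy, state, rng, explore)
    for _ in range(n_steps):
        r = reward_mechanism(state, action)
        next_state = step(state, rng)
        next_action = choose(policy, next_state, rng, explore)
        log.append((state, action, r, next_state, next_action))
        state, action = next_state, next_action
    return log


def learn_policy(log, gamma=0.5, alpha=0.05, sweeps=20):
    """Batch SARSA-style value estimation over the log, then act greedily."""
    q = [[0.0] * N_ACTIONS for _ in range(N_STATES)]
    for _ in range(sweeps):
        for s, a, r, s2, a2 in log:
            q[s][a] += alpha * (r + gamma * q[s2][a2] - q[s][a])
    return [max(range(N_ACTIONS), key=lambda a: q[s][a])
            for s in range(N_STATES)]


def improve(initial_policy, rounds=2, seed=0):
    """Round 1 learns from the initial entity; later rounds from the learned policy."""
    rng = random.Random(seed)
    policy = list(initial_policy)
    history = [policy]
    for _ in range(rounds):
        log = collect(policy, rng)
        policy = learn_policy(log)
        history.append(policy)
    return history


if __name__ == "__main__":
    # Start from a naive entity that always allocates zero servers.
    for i, p in enumerate(improve([0, 0, 0])):
        print(f"policy after round {i}: {p}")
```

In this toy setup the initial decision-making entity is a fixed lookup table; the first call to `learn_policy` derives a policy from data gathered under that entity, and the second derives a new policy from data gathered under the learned one, mirroring the sequence the abstract describes.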

