Data processing: artificial intelligence – Machine learning
Reexamination Certificate
2011-08-16
2011-08-16
Gaffin, Jeffrey A (Department: 2129)
active
08001063
ABSTRACT:
In one embodiment, the present invention is a method for reward-based learning of improved systems management policies. One embodiment of the inventive method involves obtaining a decision-making entity and a reward mechanism. The decision-making entity manages a plurality of application environments supported by a data processing system, where each application environment operates on data input to the data processing system. The reward mechanism generates numerical measures of value responsive to actions performed in states of the application environments. The decision-making entity and the reward mechanism are applied to the application environments, and results achieved through this application are processed in accordance with reward-based learning to derive a policy. The reward mechanism and the policy are then applied to the application environments, and the results of this application are processed in accordance with reward-based learning to derive a new policy.
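The two-stage loop described in the abstract can be illustrated with a minimal sketch. It is not the patented implementation. The toy environment (three load levels, three server allocations), the reward function, the exploration rate, and the use of batch Q-learning as the reward-based learning step are all assumptions made for illustration, and every name below is hypothetical.

```python
import random

STATES = [0, 1, 2]    # hypothetical load levels of one application environment
ACTIONS = [0, 1, 2]   # hypothetical number of servers to allocate


def reward(state, action):
    """Toy reward mechanism: numerical value of an action taken in a state."""
    return -abs(state - action)


def initial_policy(state, rng):
    """Stand-in for the initial decision-making entity: picks at random."""
    return rng.choice(ACTIONS)


def collect(policy, rng, steps=2000, explore=0.2):
    """Apply a policy to the environment and log the observed results."""
    log = []
    state = rng.choice(STATES)
    for _ in range(steps):
        # Assumed exploration so that every state-action pair is observed.
        if rng.random() < explore:
            action = rng.choice(ACTIONS)
        else:
            action = policy(state, rng)
        next_state = rng.choice(STATES)  # toy assumption: load ignores the action
        log.append((state, action, reward(state, action), next_state))
        state = next_state
    return log


def learn_policy(log, gamma=0.5, alpha=0.1, sweeps=20):
    """Batch Q-learning over logged results; returns a greedy policy."""
    q = {(s, a): 0.0 for s in STATES for a in ACTIONS}
    for _ in range(sweeps):
        for s, a, r, s2 in log:
            target = r + gamma * max(q[(s2, b)] for b in ACTIONS)
            q[(s, a)] += alpha * (target - q[(s, a)])

    def policy(state, rng=None):
        return max(ACTIONS, key=lambda a: q[(state, a)])

    return policy


def improve(rounds=2, seed=0):
    """Round 1 learns from the initial entity; later rounds learn from the last policy."""
    rng = random.Random(seed)
    policy = initial_policy
    for _ in range(rounds):
        log = collect(policy, rng)
        policy = learn_policy(log)
    return policy
```

Calling `improve(rounds=2)` mirrors the abstract's sequence: the first round derives a policy from the initial decision-making entity's results, and the second round applies that policy and derives a new one from its results.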
REFERENCES:
patent: 6230200 (2001-05-01), Forecast et al.
patent: 6581048 (2003-06-01), Werbos
patent: 2002/0178103 (2002-11-01), Dan et al.
patent: 2003/0149685 (2003-08-01), Trossman et al.
patent: 2004/0230459 (2004-11-01), Dordick et al.
patent: 2005/0071825 (2005-03-01), Nagaraj et al.
patent: 2005/0141554 (2005-06-01), Hammarlund et al.
patent: 2005/0165925 (2005-07-01), Dan et al.
patent: 2006/0179383 (2006-08-01), Blass et al.
patent: 2006/0224535 (2006-10-01), Chickering et al.
patent: 2007/0006278 (2007-01-01), Ioan Avram et al.
‘Temporal Difference Learning and TD-Gammon’: Tesauro, 1995, Communications of the ACM, vol. 38, No. 3, pp. 58-68.
‘Elements of Artificial Neural Networks’: Mehrotra, 1997, MIT press.
‘Cooperative Multi-Agent Learning: The state of the art’: Panait, George Mason University, Computer Science, Technical Report-200301; 2003.
Obitko M. Genetic Algorithms, 2003, [retrieved on Jul. 12, 2010]. Retrieved from Internet: <http://labe.felk.cvut.cz/~obitko/ga/>.
RL-Based Online Resource Allocation in Multi-Workload Computing Systems. Presented by Gerald Tesauro at the AAAI Fall Symposium on Real-Life Reinforcement Learning, Washington DC, Oct. 22, 2004.
G. Tesauro and R. Das and W. E. Walsh and J. O. Kephart, Utility-function-driven Resource Allocation in Autonomic Systems, Proceedings of ICAC.
G. Tesauro, Online Resource Allocation Using Decompositional Reinforcement Learning, Proceedings of AAAI-05.
R. Das, G. Tesauro, and W. E. Walsh, Model-Based and Model-Free Approaches to Autonomic Resource Allocation, IBM Tech Report 2005.
David Vengerov and Nikolai Iakovlev, A Reinforcement Learning Framework for Dynamic Resource Allocation: First Results, Proceedings of ICAC 2005.
“Routing Routing Protocol Dynamic Routing Unicast Routing Protocol . . . ”: Sasase, 2004, www.sasase.ics.keio.ac.jp/jugyo/2004/print/Routing%/20Protocol—p.pdf.
Das Rajarshi
Jong Nicholas K.
Kephart Jeffrey O.
Tesauro Gerald James
Coughlan Peter
Gaffin Jeffrey A
International Business Machines Corporation