Classification: Data processing: artificial intelligence – Adaptive system
Type: Reexamination Certificate
Filed: 1999-09-23
Issued: 2003-03-11
Examiner: Follansbee, John A. (Department: 2121)
U.S. Classes: C706S021000, C706S023000
Status: active
Patent Number: 06532454
ABSTRACT:
BACKGROUND OF THE INVENTION
1. Field of the Invention
The present invention is directed to a method of providing stable control of a system (e.g., an electrical power grid, a factory, and a financial prediction system) using a neural network-based design, and more particularly, to a neural network-based control system and method using critics.
2. Discussion of the Background
The field of intelligent control is an expansive area that has attempted to solve many complex problems. Significant research has previously been performed in the areas of classical control theory [1-6] and biologically-inspired intelligent control [7-15]. As an outgrowth of that and other research, researchers have attempted to solve the Generalized Moving Target (GMT) problem [16,17], which is defined as follows: for a function E(v, w), find two vectors v and w such that the following two conditions are met: (1) E is minimized with respect to v for w fixed; (2) w = v.
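Purely as an illustration (the objective E, its closed-form minimizer, and the alternating scheme below are assumptions, not taken from the patent), the two GMT conditions suggest a simple fixed-point iteration: minimize E over v with w held fixed, then move w onto v, and repeat until the two vectors agree:

```python
import numpy as np

def E(v, w):
    # Toy quadratic objective chosen only for illustration.
    return np.sum((v - 0.5 * w - 1.0) ** 2)

def argmin_v(w):
    # Closed-form minimizer of E over v with w held fixed.
    return 0.5 * w + 1.0

def solve_gmt(dim=3, iters=50):
    # Alternate the two GMT conditions until they agree:
    # (1) minimize E with respect to v for fixed w; (2) set w = v.
    w = np.zeros(dim)
    for _ in range(iters):
        v = argmin_v(w)
        w = v
    return v, w

v, w = solve_gmt()
```

For this toy E the iteration is a contraction, so v and w converge to a common fixed point satisfying both GMT conditions; for a general E, convergence is not guaranteed.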
Other research includes two additional classes of control design. The first class is “adaptive control,” in the tradition of Narendra, which includes both linear (multiple-input multiple-output, MIMO) designs [1] and nonlinear or neural extensions [18,19]. The second class is learning-based Approximate Dynamic Programming (ADP) [7-15], which has sometimes been presented as a form of “reinforcement learning” [20-22], sometimes called “adaptive critics” [23] and sometimes called “neuro-dynamic programming” [24].
Previous forms of adaptive control discussed by Narendra had difficulty in ensuring stability even in the linear-quadratic case. Those designs appear very similar to a particular ADP design, HDP+BAC [8,22,25], which was developed back in 1971 and reported in Werbos' Ph.D. proposal [26]. In fact, when HDP+BAC is applied to the linear-quadratic case, it reduces to a standard indirect adaptive control (IAC) design, except that there is a provision for adapting one extra matrix which, in principle, should permit much stronger stability guarantees. Roughly speaking, ordinary IAC designs are based on the minimization of tracking error, which is sometimes just:
∑_i (x_i^(ref)(t+1) − x_i(t+1))² = e(t+1)^T e(t+1),  (1)
where the vector x^(ref)(t) represents the desired state of the plant or external environment at time t, where x(t) represents the actual state, and e(t) represents the gap or error between the two. In the same situation, HDP+BAC reduces to the same overall design, except that:
Ĵ(t+1) = e(t+1)^T C e(t+1),  (2)
is minimized, where C is a matrix of “Critic weights” to be adapted by HDP. In the nonlinear case, equation (2) is replaced by an artificial neural network (ANN), in order to allow Ĵ to approximate any nonlinear function. (In HDP and GDHP, it is possible to supplement the training of the Action network, to include some stochastic search (particularly in the offline mode) to help keep the system out of local minima [46]. Barto, Thrun and others have discussed other approaches to adding noise to the Action network or controller [8,21].)
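The adaptation of the critic weights C can be illustrated with a generic HDP-style temporal-difference sketch (the utility U, discount γ, learning rate, and gradient rule here are illustrative assumptions, not the patent's specific procedure):

```python
import numpy as np

def j_hat(C, e):
    # Quadratic critic of equation (2): J_hat = e^T C e.
    return e @ C @ e

def hdp_critic_step(C, e_t, e_next, gamma=0.9, lr=0.01):
    # One HDP-style update: move J_hat(t) toward the temporal-difference
    # target J*(t) = U(t) + gamma * J_hat(t+1), holding the target constant.
    U = e_t @ e_t                          # assumed utility: squared error
    target = U + gamma * j_hat(C, e_next)
    err = j_hat(C, e_t) - target
    grad = np.outer(e_t, e_t)              # dJ_hat/dC for the quadratic form
    return C - lr * err * grad

C = np.eye(2)
e_t, e_next = np.array([1.0, -0.5]), np.array([0.8, -0.3])
C = hdp_critic_step(C, e_t, e_next)
```

In the nonlinear case the quadratic form e^T C e would be replaced by an ANN and the outer-product gradient by backpropagation through that network.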
In theory, if the critic weights C or the Critic network converged to the right values (the values which satisfy the Hamilton-Jacobi-Bellman equation [4,5,24,28]), then the function Ĵ would serve as a Liapunov function guaranteed to stabilize the overall system, if the system is controllable. However, none of the established methods for adapting a Critic possess quadratic unconditional stability, even in the linear deterministic case. The variations of these methods based on the Galerkin approach to solving differential equations do possess unconditional stability in that case, but they converge to the wrong weights almost always in the linear stochastic case.
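The Lyapunov role of a converged critic can be illustrated for a stable linear closed loop: the C satisfying the discrete Lyapunov equation makes Ĵ(e) = e^T C e strictly decrease along every trajectory. The closed-loop matrix and weighting below are illustrative assumptions:

```python
import numpy as np

# Illustration (not the patent's method): for a stable closed-loop error
# dynamic e(t+1) = A_cl e(t), the weights C solving the discrete Lyapunov
# equation A_cl^T C A_cl - C = -Q make J_hat(e) = e^T C e strictly decrease.

A_cl = np.array([[0.5, 0.1],
                 [0.0, 0.8]])   # assumed stable closed-loop matrix
Q = np.eye(2)

# Solve the Lyapunov equation by vectorization: with row-major vec,
# vec(A^T C A) = (A^T kron A^T) vec(C).
M = np.kron(A_cl.T, A_cl.T) - np.eye(4)
C = np.linalg.solve(M, -Q.ravel()).reshape(2, 2)

e = np.array([1.0, -2.0])
for _ in range(5):
    e_next = A_cl @ e
    assert e_next @ C @ e_next < e @ C @ e   # J_hat decreases each step
    e = e_next
```

The decrease per step is exactly e^T Q e, which is positive for every nonzero error, so Ĵ is a valid Lyapunov function for this closed loop.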
The notation herein generally tracks the sources upon which it draws. However, two deliberate choices have been made herein. Herein, “C”, “b” and “W” are used for the estimated values of certain sets of parameters or weights, and “C*” and “b*” are used for their true values. By contrast, for the functions J and λ, the usual three-fold convention is used: “J” and “λ” for the true values, “J*” and “λ*” for target values used in adaptation, and “Ĵ” and “λ̂” for the estimated values. The time derivative of the state vector is written “∂_t x” in linear adaptive control, rather than the usual x-dot; the notation “∂_t” has long been part of the standard notation in physics.
In principle, the ideal feedback control system should try to optimize some mix of stability and performance, in the face of three general types of uncertainty:
(1) High-bandwidth random disturbances, which are usually represented as a stochastic process, based on random noise [4,5,26,29], but are sometimes represented as bounded disturbances of unknown structure [1];
(2) Drifting values (and occasional abrupt changes) in familiar process parameters such as friction, viscosity and mass, often due to the aging of a plant or changes in general environmental conditions;
(3) Uncertainty about the fundamental structure of the plant or environment (sometimes due to catastrophic events, like a wing being shot off of an airplane), and shifts of parameters in ways that could not be anticipated or simulated even as possibilities at the time when the controller is developed.
This three-fold distinction is difficult to formalize, in mathematical terms, but it has great practical importance. Roughly speaking, the ability to respond to the first type of disturbance or uncertainty may be called “stochastic feedback control.” The ability to respond to the second type may be called “adaptation,” and the ability to respond to the third type is called “learning.” The practical tradeoffs here are discussed at some length in the introductory review in [7], and in other papers cited in [7].
“Adaptive control” [1-3] has often been viewed as a tool for addressing the second type of uncertainty—uncertainty about drifting plant parameters. The most classical designs in adaptive control are intended to control plants x(t) governed by the equations:
∂_t x = A x + B u,  (3)
or:

x(t+1) = A x(t) + B u(t),  (4)
where u is a vector of controls, where A and B are unknown matrices representing the parameters of the plant, and where ∂_t represents differentiation with respect to time. In the simplest case, the state vector x is directly observable. The key idea is to develop control designs which estimate the matrices A and B, explicitly or implicitly, as the process unrolls in real time, and converge “on the fly” to a good control strategy:
u(t) = K(t) x(t),  (5)
despite the ignorance about A and B. In the general case, it is assumed that x(t) is not directly observable. Instead, a vector v governed by:

v(t) = H x(t)  (6)

is observed. Roughly speaking, this requires the use of a more complex control strategy like:
u(t) = K_{1,1} v(t−1) + K_{1,2} v(t−2) + … + K_{1,k} v(t−k) + K_{2,1} u(t−1) + … + K_{2,k} u(t−k),  (7)
for some integer k. (See [1, p.411].) There exists a huge body of stability theorems for adaptive control, both in the original linear versions [1-3] and in a variety of nonlinear and neural extensions (e.g., [18,19]).
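A minimal sketch of such an indirect adaptive scheme, for a scalar discrete-time plant of the form of equation (4) with directly observable state, can look as follows (the toy plant values, probing noise, and deadbeat-style gain rule are illustrative assumptions, not the patent's design):

```python
import numpy as np

rng = np.random.default_rng(0)
A_true, B_true = np.array([[0.9]]), np.array([[0.5]])  # unknown to controller

X, U, X1 = [], [], []       # logged transitions x(t), u(t), x(t+1)
x = np.array([1.0])
K = np.zeros((1, 1))        # feedback gain of equation (5)

for t in range(200):
    u = K @ x + 0.1 * rng.standard_normal(1)   # control plus probing noise
    x1 = A_true @ x + B_true @ u               # plant step, equation (4)
    X.append(x.copy()); U.append(u.copy()); X1.append(x1.copy())
    # Least-squares estimate of [A B] from all transitions so far.
    Z = np.hstack([np.array(X), np.array(U)])  # regressors [x(t) u(t)]
    theta, *_ = np.linalg.lstsq(Z, np.array(X1), rcond=None)
    A_hat, B_hat = theta[:1].T, theta[1:].T
    if t >= 2 and abs(B_hat[0, 0]) > 1e-6:
        K = -A_hat / B_hat                     # deadbeat-style gain update
    x = x1
```

Because the toy plant is exactly linear and noise enters only through the known control, the least-squares estimates become exact once the regressors span the parameter space, and the loop converges “on the fly” despite the initial ignorance about A and B.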
In practice, however, all of these stability theorems require very strong assumptions about the plant or environment to be controlled. Ordinary adaptive control and neural adaptive control have often exhibited problems with stability and with slow response to transient disturbances, particularly in real-world plants.
Hirl Joseph P.
Oblon & Spivak, McClelland, Maier & Neustadt P.C.
Title: Stable adaptive control using critic designs