Electrical computers and digital processing systems: support – Digital data processing system initialization or configuration
Reexamination Certificate
1999-03-11
2002-10-29
Lee, Thomas (Department: 2185)
C706S001000, C706S005000, C706S015000, C706S047000
active
06473851
ABSTRACT:
BACKGROUND OF THE INVENTION
1. Field of Invention
The present invention generally relates to policy-based controllers and policy-based process servers.
2. Background—Discussion of Prior Art
This section puts the invention into its proper context. We provide a cursory background and define required terminology. Readers unfamiliar with stochastic control, reinforcement learning, or optimal process control may find the next several subsections helpful in defining the fundamental underlying technologies. Readers very familiar with these topics should at least skim these sections to review general terminology.
A. Scope of Applicability and Main Concepts
This invention is closely related to the technologies of Stochastic Control and Reinforcement Learning. Control systems technology is rather well-developed and has numerous sub-areas. Because of this, the reader may be accustomed to different terminology for the concepts used here. The terminology we use is in line with the definitions employed in [Kaelbling Littman and Moore 1996] and [Sutton and Barto 1998], which provide background survey information, tutorial treatment, precise definitions of the technical concepts discussed here, as well as a clear explanation of the prior art.
Any concepts that are not standard fare in these references are defined here in order to provide a self-contained description. We try to introduce a bare minimum of technical jargon. Crucial technical definitions are formalized using mathematical notation in the sections titled “Formal Definition of Prior Art” and “Formal Definition of the Mixture of Policies Framework.”
1. Separation of Policy and Execution
In the technical jargon of control theory, the mapping of a stimulus to a set of action tendencies is referred to as a “policy.” Given a set of candidate actions and a stimulus, a policy is a function that recommends one or more actions in response to the given stimulus. Stochastic Control pertains to the technology of using a stochastic policy to control action selection processes.
FIGS. 1A and 1B illustrate examples of policies. An action selection module then uses a policy to guide its selection of the action or actions from the permissible set of candidate actions. Some control mechanisms specified in the prior art do not separate policy from execution, but here we do. The essential concepts remain the same whether the execution mechanism is inextricably intertwined with the policy data structure or separated, as is the case here. The policy “recommends” actions; the action selection module “executes” one or more actions according to this recommendation. This execution mechanism can be straightforward, such as the greedy method of always selecting the highest-ranked action. Or it can be more involved, such as when additional checks are made to determine whether an action will conflict with other ongoing actions before it is triggered. (See the tutorial references [Kaelbling Littman and Moore 1996] and [Sutton and Barto 1998] for more discussion of how to convert policy information into action selection procedures.)
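As a minimal, purely illustrative sketch of this separation, the following code shows a policy implemented as a function that returns weighted action recommendations and a separate action selection module that executes the recommendation greedily. All names here (example_policy, GreedySelector, the sample stimuli and actions) are hypothetical and are not taken from the patent; the sketch only demonstrates the policy/execution split described above.

```python
from typing import Callable, Dict, Hashable

# A policy maps a stimulus to a weighting over candidate actions.
Policy = Callable[[Hashable], Dict[str, float]]

def example_policy(stimulus: Hashable) -> Dict[str, float]:
    """Recommend actions for a stimulus as an action-name -> weight mapping."""
    if stimulus == "overheating":
        return {"open_vent": 0.7, "reduce_power": 0.3}
    return {"do_nothing": 1.0}

class GreedySelector:
    """Action selection module: executes the highest-ranked recommended action."""

    def __init__(self, policy: Policy):
        self.policy = policy

    def select(self, stimulus: Hashable) -> str:
        recommendation = self.policy(stimulus)
        # Greedy execution: pick the action with the largest weight.
        return max(recommendation, key=recommendation.get)

selector = GreedySelector(example_policy)
print(selector.select("overheating"))  # -> "open_vent"
```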
2. Controllers can Trigger “Actions” as well as “Procedures”
Although we speak about “actions” and “action selection,” the controllers described in this document can also regulate procedures. Therefore, an “action selection module” as defined here can control (a) instantaneous actions and (b) ballistic (non-interruptible and non-modifiable) action sequences, and can also regulate (c) ongoing physical processes and (d) branching procedures.
Actions controlled or initiated by a policy can be
1. Momentary or instantaneous: e.g., flash a light bulb, flip a switch.
2. Continuous: e.g., gradually increment the temperature of a furnace over time.
3. Procedural: e.g., initiate a multiple-step and possibly branching computer program.
Furthermore, actions can be
1. Discrete: e.g., a database containing a finite set of actions indexed by an integer record pointer. An example of this is a web-based ad server for the purpose of displaying a particular ad targeted at a website visitor.
2. Continuous: e.g., a possibly multidimensional control signal indexed by a point within a Euclidean vector space, such as an electronic control system. An example of this is an electronic vacuum pressure regulator inside an automobile.
We refer to actions for simplicity but without loss of generality because an action can mean triggering a procedure, parameterizing the initial state of a procedure, or modifying state information used by an ongoing procedure.
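The following sketch contrasts the two representations above: a discrete action database indexed by an integer record pointer (the ad-server example) and a continuous action given as a point in a vector space (the pressure-regulator example). The names and numeric values are hypothetical choices made only for illustration.

```python
import random

# Discrete: a finite action database indexed by an integer record pointer,
# e.g. a hypothetical web-based ad server choosing which ad to display.
ad_database = {0: "ad_sports", 1: "ad_travel", 2: "ad_finance"}

def show_ad(action_index):
    print("Displaying", ad_database[action_index])

# Continuous: an action is a point in a Euclidean vector space, e.g. a
# setpoint sent to a hypothetical electronic vacuum pressure regulator.
def set_regulator(pressure_kpa, ramp_rate_kpa_per_s):
    print("Target pressure", pressure_kpa, "kPa at", ramp_rate_kpa_per_s, "kPa/s")

show_ad(random.randrange(len(ad_database)))  # trigger one discrete action
set_regulator(35.0, 0.5)                     # trigger one continuous action
```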
3. Compatible with Reinforcement Learning Technologies
Although this invention does not provide new technology for learning per se, all the policy and control mechanisms described here are compatible with the general framework of reinforcement learning theory. As is apparent from the prior art, the general approach used here (i.e., encapsulation and modularization of the data structures and mechanisms involved in formulating policy and executing policy) reduces the computational burden of obtaining policy information. A wide variety of well-developed computational, statistical, and electronic technologies can be applied to obtain a policy. Methods for obtaining or refining policy include (a) explicit programming, (b) direct computation, (c) evolutionary design, (d) evolutionary programming, (e) computerized discovery over historical data stores, (f) computerized statistical inference over historical data stores, (g) computerized real-time direct search, and (h) real-time reinforcement learning. See [Kaelbling Littman and Moore 1996] and [Sutton and Barto 1998] for a review and additional references.
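As a concrete and purely illustrative instance of the last route, real-time reinforcement learning, the sketch below uses textbook tabular Q-learning in the style of [Sutton and Barto 1998] to refine value estimates from which a policy can be derived. The state and action names and the parameter values are assumptions made for this example; the patent does not prescribe this or any particular learning method.

```python
from collections import defaultdict
import random

q = defaultdict(float)            # Q[(state, action)] -> estimated value
actions = ["a0", "a1", "a2"]
alpha, gamma, epsilon = 0.1, 0.9, 0.1

def choose_action(state):
    """Epsilon-greedy action selection from the current value estimates."""
    if random.random() < epsilon:
        return random.choice(actions)
    return max(actions, key=lambda a: q[(state, a)])

def update(state, action, reward, next_state):
    """One Q-learning step: move Q toward reward + discounted best next value."""
    best_next = max(q[(next_state, a)] for a in actions)
    q[(state, action)] += alpha * (reward + gamma * best_next - q[(state, action)])

update("s0", "a1", reward=1.0, next_state="s1")
print(choose_action("s0"))
```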
A policy can be
1. Probabilistic: actions are weighted by a probability distribution over the action database. In this case the action selection module picks one action at random, drawn according to this distribution. See for instance FIG. 1A.
2. Deterministic: only a single action is recommended. See for instance FIG. 1B.
The field of Reinforcement Learning provides technologies for systematically learning, discovering, or evolving policies suitable for stochastic control. Reinforcement learning theory is a fairly mature technology. The field of Fuzzy Control modifies this functionality to allow the following:
3. Fuzzy Membership Assignment: a distribution (possibly non-probabilistic) is applied over the actions in the action database. See FIG. 1C.
Given a fuzzy policy the action selection module simultaneously applies one or more of the actions. Therefore, fuzzy control as defined here allows multiple actions to be triggered in parallel. Moreover the action selection mechanism may also utilize the weighting specified by the distribution to initialize parameters of each action. See for instance FIG. 1C.
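The following is a minimal sketch of how an action selection module might execute the three kinds of policy just described: probabilistic, deterministic, and fuzzy membership. The action names, weights, and the 0.3 firing threshold are hypothetical values chosen for illustration, not values specified by the invention.

```python
import random

# A recommendation over the action database; the weights need not sum to one.
recommendation = {"open_vent": 0.7, "reduce_power": 0.5, "do_nothing": 0.1}

# 1. Probabilistic policy: normalize the weights into a distribution and
#    sample exactly one action from it.
total = sum(recommendation.values())
actions = list(recommendation)
probs = [recommendation[a] / total for a in actions]
sampled_action = random.choices(actions, weights=probs, k=1)[0]

# 2. Deterministic policy: exactly one recommended action (here, the maximizer).
single_action = max(recommendation, key=recommendation.get)

# 3. Fuzzy membership assignment: every action whose membership exceeds a
#    threshold is triggered in parallel, and its weight can be reused to
#    parameterize the action (e.g., how far to open the vent).
triggered = {a: w for a, w in recommendation.items() if w >= 0.3}

print(sampled_action, single_action, triggered)
```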
The definition of Fuzzy Policy we use here may be inconsistent with definitions used in prior art, and is not included in the tutorial treatment explained in [Kaelbling Littman and Moore 1996] and [Sutton and Barto 1998], which concentrate exclusively on stochastic control. However, Fuzzy Policy as defined here is related to “fuzzy sets” in that they both specify “degree of membership” rather than “probability.” Fuzzy Policy as defined here also allows more than one action to be selected in parallel by the action selection mechanism, whereas a stochastic policy expects only a single action to be selected at one moment in time.
4. A Policy is a Multi-valued “Recommendation,” a Value Function is a “Ranking”
Closely related to the notion of “policy” is the “value function.” Rather than a probabilistic distribution over the action database, a value function assigns a numerical weight to each action. A policy formulation mechanism then converts this value function into a policy. What we define as a “fuzzy policy” suffices for representing value functions. Therefore, we can manipulate value functions by treating them as Fuzzy Policies.
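A brief sketch of the relationship just described, under assumed example values: a value function assigning a numerical weight to each action can be manipulated directly as a fuzzy policy, or passed through one possible policy formulation mechanism (a softmax, used here only as an illustration) to yield a probabilistic policy.

```python
import math

# A value function: a numerical weight ("ranking") for each action.
value_function = {"a0": 2.0, "a1": 0.5, "a2": -1.0}

# Treated directly as a fuzzy policy, the weights already form a
# (possibly non-probabilistic) distribution over the action database.
fuzzy_policy = dict(value_function)

# One possible policy formulation mechanism: a softmax that converts the
# value function into a probabilistic policy.
exp_values = {a: math.exp(v) for a, v in value_function.items()}
normalizer = sum(exp_values.values())
probabilistic_policy = {a: e / normalizer for a, e in exp_values.items()}

print(probabilistic_policy)
```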
Technology for co