MURI Research: Project Seven

Integrated Approach to Intelligent Systems
Research Project Seven



Research in Adaptation and Learning

Learning must take place at all levels of representation in intelligent systems, since we cannot assume that the environment and correct system structure are known at the outset. This includes learning the environment model from sensory inputs, learning value information from normative inputs, and direct learning of control laws in supervised and unsupervised settings. In addition, the hierarchical / multi-agent structure of the system must itself be adaptable. Finally, the problem of verification for learning systems must be addressed. All of our research will take place in the context of intelligent agents with noisy sensors in complex, partially observable environments. We expect to uncover and deepen the ties between AI and adaptive control as a result.

Learning environment models
Our principal representation tool for environment models will be PNs and DPNs (discussed in Project 6). In recent work, we have derived new algorithms capable of learning PNs and DPNs from observations even in the presence of hidden variables. We propose to extend these algorithms and develop the theoretical basis and technology needed for large-scale applications.

Our basic result [48] shows that PN learning can be implemented as a local update process using information obtained directly from the PN's inference algorithm. More specifically, as each observation is processed by the standard PN inference algorithms, the results of processing can be used to compute a gradient for the likelihood of the data with respect to variations in the parameters defining the PN. Thus, a simple local update process allows the PN to adapt itself optimally to the environment. This result is very general and relies only on the fact that joint probabilities in the PN are all multilinear in the parameters defining the conditional distributions.
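
As an illustration only, the following minimal sketch shows how such a gradient can be assembled from inference results. It assumes a toy two-node network (a hidden binary variable H with one observed child X), and all names (`marginal`, `gradient_from_inference`, `finite_difference`) are hypothetical, not taken from [48]. Each observation contributes the posterior P(H=h | X=x) divided by the corresponding table entry, and a finite-difference check confirms that this matches the derivative of the log-likelihood when the entries are treated as free parameters.

```python
import math

def marginal(prior, cpt, x):
    """P(X=x) = sum over h of P(H=h) * P(X=x | H=h); H is never observed."""
    return sum(prior[h] * cpt[h][x] for h in range(len(prior)))

def log_likelihood(prior, cpt, data):
    """Log-likelihood of a list of observed X values."""
    return sum(math.log(marginal(prior, cpt, x)) for x in data)

def gradient_from_inference(prior, cpt, data):
    """d log P(data) / d cpt[h][x], built only from per-observation posteriors."""
    grad = [[0.0] * len(row) for row in cpt]
    for x in data:
        px = marginal(prior, cpt, x)
        for h in range(len(prior)):
            posterior = prior[h] * cpt[h][x] / px  # P(H=h | X=x): an inference result
            grad[h][x] += posterior / cpt[h][x]
    return grad

def finite_difference(prior, cpt, data, h, x, eps=1e-6):
    """Central-difference estimate of the same derivative, used as a check."""
    up = [row[:] for row in cpt]
    down = [row[:] for row in cpt]
    up[h][x] += eps
    down[h][x] -= eps
    return (log_likelihood(prior, up, data) - log_likelihood(prior, down, data)) / (2 * eps)

# Assumed toy numbers: prior over H, table P(X | H), and seven observations of X.
prior = [0.6, 0.4]
cpt = [[0.7, 0.3], [0.2, 0.8]]
data = [0, 0, 1, 0, 1, 1, 0]
grad = gradient_from_inference(prior, cpt, data)
```

The gradient here ignores the constraint that each row of the table sums to one. An actual update step would have to restore that constraint, for example by projecting or renormalizing after each step.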

More general network parameterizations can be handled simply by using the chain rule for the gradient. Application to networks with intensionally defined conditional distributions is straightforward. These include noisy-OR and sigmoid networks as well as hybrid networks containing discrete and continuous variables (see Section 6).
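
A minimal sketch of the chain-rule step for one such parameterization, the noisy-OR, is given below. It assumes a binary child whose probability of being on is 1 minus the product of (1 - q[i]) over its active parents. The function names and the `grad_w` layout (parent configuration mapped to the pair of derivatives with respect to P(X=0 | parents) and P(X=1 | parents)) are hypothetical choices for illustration.

```python
def noisy_or(q, parents_on):
    """P(X=1 | parents) = 1 - product over active parents i of (1 - q[i])."""
    off = 1.0
    for qi, on in zip(q, parents_on):
        if on:
            off *= 1.0 - qi
    return 1.0 - off

def noisy_or_grad(q, parents_on):
    """d P(X=1 | parents) / d q[j] for every parent j."""
    grads = []
    for j, on_j in enumerate(parents_on):
        if not on_j:
            grads.append(0.0)  # an inactive parent has no influence
            continue
        rest = 1.0
        for i, (qi, on) in enumerate(zip(q, parents_on)):
            if on and i != j:
                rest *= 1.0 - qi
        grads.append(rest)
    return grads

def chain_rule(grad_w, q):
    """Turn gradients w.r.t. table entries into gradients w.r.t. q.

    grad_w maps a parent configuration (tuple of 0/1) to (dL/dw0, dL/dw1),
    where w1 = P(X=1 | parents) and w0 = 1 - w1.
    """
    total = [0.0] * len(q)
    for parents_on, (g0, g1) in grad_w.items():
        for j, d in enumerate(noisy_or_grad(q, parents_on)):
            total[j] += (g1 - g0) * d  # dw1/dq = d, dw0/dq = -d
    return total
```

The table-entry gradients in `grad_w` would come from the inference-based computation; only the last function is specific to the chain rule.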

We can also solve the DPN learning problem, using a parameterization that reflects the stationarity constraint: the generalized parameters describing the sensor and state evolution models are simply replicated across time steps. The gradient for the generalized parameters turns out to be just the sum of the gradients corresponding to the different instances of the parameter. Thus, learning DPNs is again a simple local update process. Section 6 points out that DPNs can be exponentially more compact than Hidden Markov Models as representations of complex processes. In consequence, it may be much easier to learn such DPNs. We have shown this to be the case with artificially generated data. Below, we describe potential real-world applications.
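
The sum-of-gradients property can be checked numerically on a small example. The sketch below assumes the simplest possible temporal model, a two-state hidden Markov chain with one transition table replicated across time steps, rather than a general DPN. All function names are hypothetical, and derivatives are taken by central differences purely for verification.

```python
def sequence_likelihood(prior, trans_per_step, emit, obs):
    """Forward algorithm; trans_per_step[t] links time step t to t+1."""
    n = len(prior)
    alpha = [prior[s] * emit[s][obs[0]] for s in range(n)]
    for t in range(1, len(obs)):
        trans = trans_per_step[t - 1]
        alpha = [sum(alpha[r] * trans[r][s] for r in range(n)) * emit[s][obs[t]]
                 for s in range(n)]
    return sum(alpha)

def d_copy(prior, trans, emit, obs, t, i, j, eps=1e-6):
    """Derivative w.r.t. entry (i, j) of the copy used at step t only."""
    def perturbed(delta):
        copies = [[row[:] for row in trans] for _ in range(len(obs) - 1)]
        copies[t][i][j] += delta
        return sequence_likelihood(prior, copies, emit, obs)
    return (perturbed(eps) - perturbed(-eps)) / (2 * eps)

def d_tied(prior, trans, emit, obs, i, j, eps=1e-6):
    """Derivative w.r.t. the shared entry (i, j): every copy moves together."""
    def perturbed(delta):
        m = [row[:] for row in trans]
        m[i][j] += delta
        return sequence_likelihood(prior, [m] * (len(obs) - 1), emit, obs)
    return (perturbed(eps) - perturbed(-eps)) / (2 * eps)

# Assumed toy model and observation sequence.
prior = [0.5, 0.5]
trans = [[0.9, 0.1], [0.2, 0.8]]
emit = [[0.8, 0.2], [0.3, 0.7]]
obs = [0, 1, 1, 0]

summed = sum(d_copy(prior, trans, emit, obs, t, 0, 1) for t in range(len(obs) - 1))
tied = d_tied(prior, trans, emit, obs, 0, 1)
```

With these numbers `summed` and `tied` agree up to finite-difference error, which is the replicated-parameter property described above.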

These results provide one form of unification between the fields of probabilistic networks and neural networks (both feedforward and recurrent). When such a unification occurs, benefits flow in both directions. The principal engineering advantages of multilayer perceptrons---noise tolerance, local distributed learning, continuous function space---are also realizable in PNs. PNs may have additional advantages in terms of comprehensibility, provision of prior knowledge, compositionality, and local semantics. The vast wealth of theory and experience concerning optimization methods for neural networks can be used for PN learning. Finally, we have already developed methods for directly integrating high-level PNs/DPNs with low-level sensing and control systems [42,43]; methods for PN learning will enable vision-based systems to learn to recognize high-level behaviors.

These results also open up a broad array of research tasks and applications that build on the inference and representation tasks described in Section 6. In particular, we will investigate the use of efficient stochastic algorithms for learning DPNs. We will utilize the functional representation of conditional probability tables to learn hybrid PNs. We will investigate methods for encoding and using several types of prior knowledge in the learning process, including non-uniform priors and qualitative monotonicity and synergy constraints. We will investigate methods for automatic creation and modification of model structures, including hierarchical and first-order probabilistic models. The latter will allow for a unification of the field of Bayesian learning with the rapidly growing field of inductive logic programming (ILP). ILP has developed techniques for constructing fairly large and complex logical systems automatically from data using sophisticated "inverse proof" methods. First-order learning will allow the creation of much more general models that can be applied in a large variety of specific situations, as described in Section 6. We will undertake a theoretical analysis of sample and complexity bounds for PN and DPN learning. We will address the use of automated experimentation---actively probing the environment---in the context of PN learning. Finally, we will apply these techniques in a variety of domains, including speech recognition and learning human driver models from videotapes [42].

Learning-based verification
Along with the potential advantages of learning systems there is a potential drawback: how can one guarantee the properties of the system that results from learning? We believe that it is possible to have the best of both worlds by combining standard verification techniques with the formal techniques derived in computational learning theory. The process has two steps. First, learning theory techniques are used to bound the error on learned environment models or control laws as a function of the amount of data observed. Second, a formal analysis of decision quality and system failure rates when the environment model is only approximately correct then yields guarantees on the performance of the learned control system. Thus, we can guarantee the performance of learning-based systems designed to operate in unknown environments. We have taken some steps in this direction, deriving the first known learning theory results for PN learning [49]. As with reinforcement learning, we expect this to be an area of active cooperation between CS, AI, and control theory.
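
To give a feel for the first step, the sketch below uses the standard Hoeffding inequality to relate the number of observations to the error in a single estimated probability. This is a generic textbook bound chosen for illustration, not the PN-specific result of [49], and the function names are hypothetical.

```python
import math

def samples_needed(epsilon, delta):
    """Observations sufficient for |estimate - truth| <= epsilon
    with probability at least 1 - delta (two-sided Hoeffding bound)."""
    return math.ceil(math.log(2.0 / delta) / (2.0 * epsilon ** 2))

def error_bound(n, delta):
    """Error guaranteed with probability at least 1 - delta after n observations."""
    return math.sqrt(math.log(2.0 / delta) / (2.0 * n))

n = samples_needed(0.05, 0.05)   # 738 observations for 5% error at 95% confidence
eps = error_bound(n, 0.05)       # no larger than 0.05
```

A bound of this kind covers only the first step. Turning it into a performance guarantee still requires the second step, an analysis of how decision quality degrades when the model is off by at most the stated error.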

Reinforcement learning
Reinforcement learning (RL) deals with systems that learn from rewards---short-term payoff information from the environment. The standard model maps directly onto asynchronous adaptive dynamic programming and stochastic optimal control. It is the area of AI that most closely and actively connects to control theory. Its applications range from games (the world's leading backgammon player, TD-Gammon, is a program trained using reinforcement learning) through robotic control to battle tactics. To date, RL has concentrated on fully observable environments. As discussed in Section 6, partially observable environments, which constitute the vast majority of real applications, require optimal intelligent agents to make decisions on the basis of the current belief state. Solving POMDPs (Partially Observable Markov Decision Processes) is PSPACE-hard, and exact algorithms can solve only tiny problems (on the order of 10 states). We propose an approach based on three sources of power to allow scaling up: (1) use reinforcement learning methods to solve the POMDP approximately, focusing on states arising in the agent's actual experience; (2) use a function approximator to represent the utility information, allowing generalization over input states; (3) use a DPN to provide an exponentially reduced representation of the belief state for input to the utility function approximator. We have shown that this general approach is feasible and that the combination of (1) and (2) leads to a system capable of solving problems at least an order of magnitude larger than previous algorithms [50]. We are currently incorporating (3) and expect to have preliminary results soon. When combined with DPN learning to learn the environment model, we have a potentially very general and powerful learning system for situated agents. We will apply these ideas to the problem of learning fully autonomous vehicle control in the IVHS testbed.
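
The interplay of belief state and function approximator can be sketched as follows. The example assumes a flat belief vector over two states (not the factored DPN representation proposed in (3)), a single action, a linear utility approximator, and a TD(0) update. These are illustrative choices, not the system reported in [50], and the function names are hypothetical.

```python
def belief_update(belief, trans, obs_model, observation):
    """Bayes filter: predict through the transition model, then weight by
    the likelihood of the observation and renormalize.

    trans[s][s2] = P(s2 | s) for the action taken;
    obs_model[s2][o] = P(o | s2).
    """
    n = len(belief)
    predicted = [sum(belief[s] * trans[s][s2] for s in range(n)) for s2 in range(n)]
    weighted = [predicted[s2] * obs_model[s2][observation] for s2 in range(n)]
    total = sum(weighted)
    return [w / total for w in weighted]

def td_update(weights, belief, reward, next_belief, alpha=0.1, gamma=0.9):
    """TD(0) step for a linear utility estimate U(b) = weights . b."""
    value = sum(w * b for w, b in zip(weights, belief))
    next_value = sum(w * b for w, b in zip(weights, next_belief))
    delta = reward + gamma * next_value - value  # temporal-difference error
    return [w + alpha * delta * b for w, b in zip(weights, belief)]

# Assumed toy model: two states, two possible observations.
trans = [[0.9, 0.1], [0.2, 0.8]]
obs_model = [[0.8, 0.2], [0.3, 0.7]]
belief = [0.5, 0.5]

next_belief = belief_update(belief, trans, obs_model, observation=0)
weights = td_update([0.0, 0.0], belief, reward=1.0, next_belief=next_belief)
```

Because the update is applied only to beliefs the agent actually visits, effort is concentrated on the part of belief space that arises in experience, as in point (1).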

In addition to RL for POMDPs, we will investigate the problem of building hierarchical RL agents. The technical basis proposed by [51] suggests that learning and reasoning can occur at multiple levels of abstraction, with concomitant exponential reductions in complexity, without sacrificing the rigorous Markovian framework. We will pursue this line of work with a view to developing a complete theory of hierarchical abstraction for stochastic optimal control systems.

Nonmonotonic Systems and Learning
Classical logic has developed over the twentieth century into an immense complex of powerful tools and theorems. Under conditions of certainty about the environment and its behavior, systems based on classical logic can generate verifiably correct behavior under all circumstances, often in combinatorially huge state spaces. When information is lacking, however, classical logic provides no solutions.

The AI subfield of nonmonotonic logic attempts to remedy this problem. Nonmonotonic systems allow conclusions to be drawn from the absence of information, and to be retracted if later observations contradict them. For example, it is normal to conclude that an enemy soldier is armed; yet this conclusion can be retracted on closer inspection. In this way, nonmonotonic systems go beyond the capabilities of classical logic.
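
The retraction behavior can be shown with a toy default-rule evaluator. This is a deliberately simplified sketch: it makes a single pass with no chaining between rules, and the rule format and fact names are hypothetical. It does not implement any particular nonmonotonic logic.

```python
def conclusions(facts, defaults):
    """Apply default rules of the form (precondition, conclusion, blockers).

    A default fires when its precondition is a known fact and none of its
    blockers has been observed.
    """
    derived = set(facts)
    for precondition, conclusion, blockers in defaults:
        if precondition in facts and not (blockers & facts):
            derived.add(conclusion)
    return derived

# "An enemy soldier is normally armed, unless seen to be unarmed."
defaults = [("enemy_soldier", "armed", {"seen_unarmed"})]

before = conclusions({"enemy_soldier"}, defaults)
after = conclusions({"enemy_soldier", "seen_unarmed"}, defaults)
```

Here `before` contains `"armed"` while `after` does not: adding an observation removed a conclusion, which cannot happen in classical (monotonic) logic.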

We propose to investigate ways in which nonmonotonic logic can be extended and applied to problems of intelligent control. This includes the task of learning appropriate "rules of thumb" from observations. Appropriateness depends on (1) the degree of approximation to truth, (2) the costs of jumping to false conclusions, and (3) the benefits in terms of rapid decision making even in conditions of massive uncertainty.





September 1996