Learning in Planning and Problem Solving

During the 1970's, Carnegie Mellon was the major center for research on production-system architectures, a framework that represented knowledge in terms of condition-action rules. One argument for production systems was that they should support learning better than frameworks that represented knowledge in a less modular manner. However, this remained only an intuition until, in the late 1970's, Anzai and Simon developed an adaptive production system that improved its ability to solve problems with experience.

As a result of their work, I became excited about learning in problem solving, as did a number of other researchers. We realized that the key was mapping individual condition-action rules onto problem-space operators, which let one decompose the overall task into separate tasks, each of which involved learning the heuristic conditions for applying an operator. Also, one could generate training cases for each subtask by finding solution paths through search, then labeling actions along the solution path as positive instances of the applied operator and actions one step off the path as negative instances.

During the early 1980's, I developed SAGE, a system that took this approach to learning heuristics for state-space search. The system started with the legal conditions on each operator, which it represented as a production rule, and carried out search to find a solution path. SAGE used to resulting solution to generate positive and negative training cases, which it passed to a discrimination learning method that constructed more specific rules with heuristic conditions. I tested this approach on a variety of puzzles and other problem-solving tasks and also adapted it to automatically construct models of student's subtraction strategies.

After moving to UCI, I developed an interest in means-ends analysis, which seemed a better model of human planning and problem solving than state-space search. Together with Randy Jones, I developed Eureka, a system that took a similar approach to learning in the means-ends framework. The main differences were that the system could select an operator even if its legal preconitions were not met (since means-ends analysis then creates a subproblem to meet those preconditions) and that it learned through a form of analogical reasoning over stored solutions rather than through discrimination. We showed that Eureka modeled some qualitative features of human problem solving, including some aspects of scientific insight.

As part of the larger Icarus project, John Allen and I developed a similar system, Daedalus, that also acquired search-control knowledge for means-ends planning. This system differed from Eureka in that it used an extension of Cobweb to acquire a hierarchy of probabilistic concepts to summarize its planning knowledge. Each node in this hierarchy included literals describing the current state, the desired state, and the operator selected. Daedalus sorted each new problem or subproblem through this hierarchy, then selected the most probable operator stored at the terminating node. We tested the system on a number of planning domains and also showed that it matched some qualitative aspects of human behavior.

One debate that raged among researchers in the area of learning for problem solving revolved around how to best represent and use the acquired knowledge. Some approaches focused on learning separate search-control knowledge for each operator, as in SAGE, Eureka, and Daedalus. Others stored fixed macro-operators that characterized the sequential aspects of solution paths, and still others stored complete solutions but adapted them to the current problem. Our experience with Daedalus led us to develop a successor system, Talus, that supported a common representation for all three approaches. Experiments with the system suggested that each approach was superior to others (in terms of learning rates) on some problem types but inferior on others, again showing that debates are better settled with data than with rhetoric.

Although research on learning for problem solving has decreased in recent years, the growing body of work on reinforcement learning addresses many of the same issues, even though researchers in this area typically ignore the older work on problem solving. As I have recently delved into reinforcement learning myself, I hope to correct this oversight, at least in my own papers on the topic.

Publications on Daedalus

Langley, P., & Allen, J. A. (1993). A unified framework for planning and learning. In S. Minton (Ed.), Machine learning methods for planning. San Mateo, CA: Morgan Kaufmann.
Langley, P., & Allen, J. A. (1991). Learning, memory, and search in planning. Proceedings of the Thirteenth Conference of the Cognitive Science Society (pp. 364-369). Chicago: Lawrence Erlbaum.
Allen, J. A., & Langley, P. (1990). Integrating memory and search in planning. Proceedings of the Workshop on Innovative Approaches to Planning, Scheduling, and Control (pp. 301-312). San Diego, CA: Morgan Kaufmann.
Allen, J. A., & Langley, P. (1989). Using concept hierarchies to organize plan knowledge. Proceedings of the Sixth International Workshop on Machine Learning (pp. 229-231). Ithaca, NY: Morgan Kaufmann.

Publications on Eureka

Jones, R. M., & Langley, P. (2005). A constrained architecture for learning and problem solving. Computational Intelligence, 21, 480-502.
Jones, R., & Langley, P. (1995). Retrieval and learning in analogical problem solving. Proceedings of the Seventeenth Conference of the Cognitive Science Society (pp. 466-471). Pittsburgh: Lawrence Erlbaum.
Jones, R., & Langley, P. (1988). A theory of scientific problem solving. Proceedings of the Tenth Conference of the Cognitive Science Society (pp. 244-250). Montreal, Quebec: Lawrence Erlbaum.

Publications on SAGE

Langley, P. (1985). Learning to search: From weak methods to domain-specific heuristics. Cognitive Science, 9, 217-260.
Langley, P. (1983). Learning search strategies through discrimination. International Journal of Man-Machine Studies, 18, 513-541.
Langley, P. (1983). Learning effective search heuristics. Proceedings of the Eighth International Joint Conference on Artificial Intelligence (pp. 419-421). Karlsruhe, West Germany: Morgan Kaufmann.
Langley, P. (1982). Strategy acquisition governed by experimentation. Proceedings of the European Conference on Artificial Intelligence (pp. 171-176). Orsay, France.

Generic Publications on Learning in Problem Solving

Allen, J. A., Langley, P., & Matwin, S. (1992). Knowledge and regularity in planning. Proceedings of the AAAI Spring Symposium on Computational Considerations in Supporting Incremental Modification and Reuse (pp. 7-12). Stanford, CA.
Sleeman, D., Langley, P., & Mitchell, T. (Spring, 1982). Learning from solution paths: An approach to the credit assignment problem. AI Magazine, 3, 48-52.
Langley, P.\ (1980). Finding common paths as a learning mechanism. Proceedings of the Third Biennial Conference of the Canadian Society for Computational Studies of Intelligence (pp. 12-18). Victoria, BC.

For more information, send electronic mail to langley@isle.org