As a result of their work, I became excited about learning in problem solving, as did a number of other researchers. We realized that the key was mapping individual condition-action rules onto problem-space operators, which let one decompose the overall task into separate tasks, each of which involved learning the heuristic conditions for applying an operator. Also, one could generate training cases for each subtask by finding solution paths through search, then labeling actions along the solution path as positive instances of the applied operator and actions one step off the path as negative instances.
During the early 1980's, I developed SAGE, a system that took this approach to learning heuristics for state-space search. The system started with the legal conditions on each operator, which it represented as a production rule, and carried out search to find a solution path. SAGE used to resulting solution to generate positive and negative training cases, which it passed to a discrimination learning method that constructed more specific rules with heuristic conditions. I tested this approach on a variety of puzzles and other problem-solving tasks and also adapted it to automatically construct models of student's subtraction strategies.
After moving to UCI, I developed an interest in means-ends analysis, which seemed a better model of human planning and problem solving than state-space search. Together with Randy Jones, I developed Eureka, a system that took a similar approach to learning in the means-ends framework. The main differences were that the system could select an operator even if its legal preconitions were not met (since means-ends analysis then creates a subproblem to meet those preconditions) and that it learned through a form of analogical reasoning over stored solutions rather than through discrimination. We showed that Eureka modeled some qualitative features of human problem solving, including some aspects of scientific insight.
As part of the larger Icarus project, John Allen and I developed a similar system, Daedalus, that also acquired search-control knowledge for means-ends planning. This system differed from Eureka in that it used an extension of Cobweb to acquire a hierarchy of probabilistic concepts to summarize its planning knowledge. Each node in this hierarchy included literals describing the current state, the desired state, and the operator selected. Daedalus sorted each new problem or subproblem through this hierarchy, then selected the most probable operator stored at the terminating node. We tested the system on a number of planning domains and also showed that it matched some qualitative aspects of human behavior.
One debate that raged among researchers in the area of learning for problem solving revolved around how to best represent and use the acquired knowledge. Some approaches focused on learning separate search-control knowledge for each operator, as in SAGE, Eureka, and Daedalus. Others stored fixed macro-operators that characterized the sequential aspects of solution paths, and still others stored complete solutions but adapted them to the current problem. Our experience with Daedalus led us to develop a successor system, Talus, that supported a common representation for all three approaches. Experiments with the system suggested that each approach was superior to others (in terms of learning rates) on some problem types but inferior on others, again showing that debates are better settled with data than with rhetoric.
Although research on learning for problem solving has decreased in recent years, the growing body of work on reinforcement learning addresses many of the same issues, even though researchers in this area typically ignore the older work on problem solving. As I have recently delved into reinforcement learning myself, I hope to correct this oversight, at least in my own papers on the topic.
For more information, send electronic mail to langley@isle.org
© 1997 Institute for the Study of Learning and Expertise. All rights reserved. |