Feature Selection and Case-Based Learning
One problem with case-based approaches to learning is their
sensitivity to irrelevant attributes. During 1993, while we were
visiting Siemens Corporate Research in Princeton, Stephanie Sage
and I carried out scaling experiments that revealed the extent of
this problem: the number of training cases needed by nearest
neighbor to reach a given level of accuracy grows exponentially
with the number of irrelevant features.
These results motivated us to explore methods for removing
irrelevant attributes, and the technique we developed involved
starting with all features and greedily removing those whose
elimination did not reduce nearest neighbor's accuracy on the
training data.
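The sketch below illustrates this style of greedy backward elimination.
It is only a minimal illustration, assuming numeric features and a
leave-one-out estimate of one-nearest-neighbor accuracy; the function
names are invented for the example rather than taken from our
implementation.

import numpy as np

def loo_accuracy_1nn(X, y):
    """Leave-one-out accuracy of 1-nearest neighbor on the training data."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    n = len(y)
    if X.shape[1] == 0:
        # With no features left, fall back to predicting the majority class.
        _, counts = np.unique(y, return_counts=True)
        return counts.max() / n
    correct = 0
    for i in range(n):
        dists = np.sum((X - X[i]) ** 2, axis=1)
        dists[i] = np.inf                      # exclude the query case itself
        correct += int(y[dists.argmin()] == y[i])
    return correct / n

def backward_elimination(X, y):
    """Greedily drop features whose removal does not hurt 1-NN accuracy."""
    X = np.asarray(X, dtype=float)
    selected = list(range(X.shape[1]))
    best = loo_accuracy_1nn(X[:, selected], y)
    improved = True
    while improved and selected:
        improved = False
        for f in list(selected):
            trial = [g for g in selected if g != f]
            acc = loo_accuracy_1nn(X[:, trial], y)
            if acc >= best:                    # removal did not reduce accuracy
                best, selected, improved = acc, trial, True
                break
    return selected, best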
Upon moving to Stanford in 1994, I became aware of similar
work by George John, Ronny Kohavi, and Karl Pfleger on feature
selection for decision-tree induction, as well as related efforts
by Andrew Moore and colleagues on nearest neighbor. These
general approaches are now known as wrapper methods, and although
they were new to machine learning at the time, similar ideas had
existed in pattern recognition for some time.
We named our particular system Oblivion, since it could be viewed
as constructing "oblivious" decision trees in which the same
test occurs across an entire level of the tree. The terminal
nodes correspond to abstract cases with associated classes.
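Because every node on a level applies the same test, an oblivious tree
over a fixed, ordered subset of features collapses into a table keyed
by tuples of feature values. The minimal sketch below illustrates this
view; it is an assumption-laden illustration rather than the Oblivion
system itself, and the names build_oblivious_tree and classify are
invented for the example.

from collections import Counter, defaultdict

def build_oblivious_tree(X, y, features):
    """Map each combination of values on `features` to its majority class."""
    buckets = defaultdict(Counter)
    for row, label in zip(X, y):
        key = tuple(row[f] for f in features)  # one path through the tree
        buckets[key][label] += 1
    # Each key acts as an abstract case labeled with its majority class.
    return {key: counts.most_common(1)[0][0] for key, counts in buckets.items()}

def classify(tree, row, features, default=None):
    """Retrieve the abstract case matching `row`, or fall back if unseen."""
    return tree.get(tuple(row[f] for f in features), default)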
As usual, experiments showed improvements in some domains but
not in others, apparently because many of the UCI data sets
contain correlated features but few irrelevant ones. More
recent work with the ConDet algorithm, which represents its
output as a condensed table, has links to Jeff Schlimmer's
work on inducing determinations and has implications for mining
understandable structures from data.
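As a rough illustration of the condensed-table idea, the short sketch
below prints the mapping produced above as one row per distinguishable
combination of feature values; the layout and the function name are
assumptions for the example, not ConDet's actual output format.

def print_condensed_table(tree, feature_names):
    """Print one row per distinguishable feature-value combination."""
    print("  ".join(list(feature_names) + ["class"]))
    for key, label in sorted(tree.items(), key=lambda kv: tuple(map(str, kv[0]))):
        print("  ".join(str(v) for v in key) + "  " + str(label))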
Related Publications
-
Blum, A., & Langley, P. (1997).
Selection of relevant features and examples in machine learning.
Artificial Intelligence, 97, 245-271.
-
Langley, P., & Sage, S. (1997).
Scaling to domains with irrelevant features.
In R. Greiner et al. (Eds.), Computational learning theory and
natural learning systems (Vol. 4). Cambridge, MA: MIT Press.
-
Kohavi, R., Langley, P., & Yun, Y. (1997).
The utility of feature weighting in nearest-neighbor algorithms.
Proceedings of the Ninth European Conference on Machine Learning.
Prague: Springer-Verlag.
-
Langley, P. (1996).
Induction of condensed determinations.
Proceedings of the Second International Conference on Knowledge
Discovery and Data Mining (pp. 327-330). Portland: AAAI Press.
-
Langley, P., & Sage, S. (1994).
Oblivious decision trees and abstract cases.
Working Notes of the AAAI-94 Workshop on Case-Based Reasoning
(pp. 113-117). Seattle, WA: AAAI Press.
-
Langley, P., & Sage, S. (1994).
Pruning irrelevant features from oblivious decision trees.
Proceedings of the AAAI Fall Symposium on Relevance (pp. 145-148).
New Orleans: AAAI Press.
-
Langley, P. (1994).
Selection of relevant features in machine learning.
Proceedings of the AAAI Fall Symposium on Relevance. New Orleans:
AAAI Press.
For more information, send electronic mail to
langley@isle.org