Feature Selection and Case-Based Learning
One problem with case-based approaches to learning is their
sensitivity to irrelevant attributes. During 1993, while we were
visiting Siemens Corporate Research in Princeton, Stephanie Sage
and I carried out scaling experiments that revealed the extent of
this problem: the number of training cases needed by nearest
neighbor to reach a given level of accuracy grows exponentially
with the number of irrelevant features.
These results motivated us to explore methods for removing
irrelevant attributes, and the technique we developed involved
starting with all features and greedily removing those whose
elimination did not reduce nearest neighbor's accuracy on the
training data.
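The sketch below illustrates this style of greedy backward elimination.
It is only a minimal illustration, assuming numeric features and a
leave-one-out estimate of one-nearest-neighbor accuracy; the function
names are invented for the example rather than taken from our
implementation.

import numpy as np

def loo_accuracy_1nn(X, y):
    """Leave-one-out accuracy of 1-nearest neighbor on the training data."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    n = len(y)
    if X.shape[1] == 0:
        # With no features left, fall back to predicting the majority class.
        _, counts = np.unique(y, return_counts=True)
        return counts.max() / n
    correct = 0
    for i in range(n):
        dists = np.sum((X - X[i]) ** 2, axis=1)
        dists[i] = np.inf                      # exclude the query case itself
        correct += int(y[dists.argmin()] == y[i])
    return correct / n

def backward_elimination(X, y):
    """Greedily drop features whose removal does not hurt 1-NN accuracy."""
    X = np.asarray(X, dtype=float)
    selected = list(range(X.shape[1]))
    best = loo_accuracy_1nn(X[:, selected], y)
    improved = True
    while improved and selected:
        improved = False
        for f in list(selected):
            trial = [g for g in selected if g != f]
            acc = loo_accuracy_1nn(X[:, trial], y)
            if acc >= best:                    # removal did not reduce accuracy
                best, selected, improved = acc, trial, True
                break
    return selected, best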
Upon moving to Stanford in 1994, I became aware of similar
work by George John, Ronny Kohavi, and Karl Pfleger on feature
selection for decision-tree induction, as well as related efforts
by Andrew Moore and colleagues on nearest neighbor. These
general approaches are now known as wrapper methods, and although
they were new to machine learning at the time, similar ideas had
existed in pattern recognition for some time.
We named our particular system Oblivion, since it could be viewed
as constructing "oblivious" decision trees in which the same
test occurs across an entire level of the tree. The terminal
nodes correspond to abstract cases with associated classes.
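Because every node on a level applies the same test, an oblivious tree
over a fixed, ordered subset of features collapses into a table keyed
by tuples of feature values. The minimal sketch below illustrates this
view; it is an assumption-laden illustration rather than the Oblivion
system itself, and the names build_oblivious_tree and classify are
invented for the example.

from collections import Counter, defaultdict

def build_oblivious_tree(X, y, features):
    """Map each combination of values on `features` to its majority class."""
    buckets = defaultdict(Counter)
    for row, label in zip(X, y):
        key = tuple(row[f] for f in features)  # one path through the tree
        buckets[key][label] += 1
    # Each key acts as an abstract case labeled with its majority class.
    return {key: counts.most_common(1)[0][0] for key, counts in buckets.items()}

def classify(tree, row, features, default=None):
    """Retrieve the abstract case matching `row`, or fall back if unseen."""
    return tree.get(tuple(row[f] for f in features), default)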
As usual, experiments showed improvements in some domains but
not in others, apparently because many of the UCI data sets
contain correlated features but few irrelevant ones. More
recent work with the ConDet algorithm, which represents its
output as a condensed table, has links to Jeff Schlimmer's
work on inducing determinations and has implications for mining
understandable structures from data.
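As a rough illustration of the condensed-table idea, the short sketch
below prints the mapping produced above as one row per distinguishable
combination of feature values; the layout and the function name are
assumptions for the example, not ConDet's actual output format.

def print_condensed_table(tree, feature_names):
    """Print one row per distinguishable feature-value combination."""
    print("  ".join(list(feature_names) + ["class"]))
    for key, label in sorted(tree.items(), key=lambda kv: tuple(map(str, kv[0]))):
        print("  ".join(str(v) for v in key) + "  " + str(label))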
Related Publications
-
Blum, A., & Langley, P. (1997).
Selection of relevant features and examples in machine learning.
Artificial Intelligence, 97, 245-271.
-
Langley, P., & Sage, S. (1997).
Scaling to domains with irrelevant features.
In R. Greiner et al. (Eds.), Computational learning theory and
natural learning systems (Vol. 4). Cambridge, MA: MIT Press.
-
Kohavi, R., Langley, P., & Yun, Y. (1997).
The utility of feature weighting in nearest-neighbor algorithms.
Proceedings of the Ninth European Conference on Machine Learning.
Prague: Springer-Verlag.
-
Langley, P. (1996).
Induction of condensed determinations.
Proceedings of the Second International Conference on Knowledge
Discovery and Data Mining (pp. 327-330). Portland: AAAI Press.
-
Langley, P., & Sage, S. (1994).
Oblivious decision trees and abstract cases.
Working Notes of the AAAI-94 Workshop on Case-Based Reasoning
(pp. 113-117). Seattle, WA: AAAI Press.
-
Langley, P., & Sage, S. (1994).
Pruning irrelevant features from oblivious decision trees.
Proceedings of the AAAI Fall Symposium on Relevance (pp. 145-148).
New Orleans: AAAI Press.
-
Langley, P. (1994).
Selection of relevant features in machine learning.
Proceedings of the AAAI Fall Symposium on Relevance. New Orleans:
AAAI Press.
For more information, send electronic mail to
langley@isle.org