Computational Scientific Discovery
I became fascinated with the nature of scientific discovery as an
undergraduate at TCU, and the interest has remained to this day.
My dissertation work at CMU focused on Bacon, an AI system that
rediscovered numeric laws from the history of physics. Herbert Simon
served as my advisor and contributed many ideas to the effort. Gary
Bradshaw and I extended the system to handle additional laws,
including ones from the history of chemistry. After Jan Zytkow
joined our group, we developed new systems (Stahl and Dalton)
that dealt with the discovery of qualitative laws and structural
models. This CMU work forms the basis of my early publications on
scientific discovery, culminating in our 1987 book (see below).
After moving to UCI, I continued this line of research with graduate
students there. Donald Rose and I refined the Stahl work on chemical
discovery and devised a hill-climbing system, Revolver, that handled
aspects of particle physics. Randy Jones and I developed Eureka,
a computational model of scientific insight that relied on analogical
reasoning combined with spreading-activation retrieval. And with
Bernd Nordhausen, I constructed IDS, a system that integrated our
previous work on taxonomy formation, discovery of qualitative laws,
and finding numeric relations. In addition, Jeff Shrager and I
organized a symposium on scientific discovery and edited a book
reporting recent work in the area.
My activities in scientific discovery slowed down during my time
at NASA Ames, Siemens, and Stanford, but for funding reasons rather
than for lack of interest. After this hiatus, a collaboration with
Sakir Kocabas led to some new results in particle physics and
astrophysics, followed by joint work with Jeff Shrager, Kazumi Saito,
Mark Schwabacher, Chris Potter, Andrew Pohorille, and others on
approaches to computational discovery in biology and in Earth
science.
Over the past decade, most of my discovery research has focused
on a new framework, inductive process modeling, that combines
background knowledge in the form of generic processes with
time-series data to construct explanatory models stated as
sets of differential equations. The basic approach carries
out an exhaustive search through a space of model structures,
followed by gradient descent through the parameter space for
each candidate structure. Later work extended the framework
to use constraints among processes to guide search through the
structure space and even to induce constraints to discriminate
between successful and unsuccessful structures. This effort
involved collaborations with Will Bridewell, Ljupco Todorovski,
Saso Dzeroski, Kevin Arrigo, Stuart Borrett, and many others.
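The two-stage search described above can be illustrated with a minimal Python sketch. It is a toy under stated assumptions, not the implementation used in the systems listed here: a single variable, a hypothetical library of two generic processes ("decay" and "inflow"), Euler simulation, plain gradient descent with a finite-difference gradient and backtracking line search, and an assumed small per-process penalty that breaks ties between equally accurate structures.

```python
from itertools import combinations
from typing import Callable, Dict, List, Sequence, Tuple

# Hypothetical generic processes for one variable x. Each maps the current
# value of x and one parameter to a contribution to dx/dt.
GENERIC_PROCESSES: Dict[str, Callable[[float, float], float]] = {
    "decay": lambda x, p: -p * x,  # first-order loss
    "inflow": lambda x, p: p,      # constant input
}

H = 0.1      # assumed Euler step size
STEPS = 80   # assumed number of simulation steps


def simulate(structure: Sequence[str], params: Sequence[float], x0: float) -> List[float]:
    """Euler-integrate dx/dt = sum of the chosen processes' contributions."""
    xs = [x0]
    for _ in range(STEPS):
        x = xs[-1]
        dx = sum(GENERIC_PROCESSES[name](x, p) for name, p in zip(structure, params))
        xs.append(x + H * dx)
    return xs


def sse(structure, params, data) -> float:
    """Sum of squared errors between a simulated and an observed trajectory."""
    total = 0.0
    for sim, obs in zip(simulate(structure, params, data[0]), data):
        r = sim - obs
        total += r * r  # r * r overflows to inf instead of raising, unlike r ** 2
    return total


def fit_parameters(structure, data, iters: int = 200, eps: float = 1e-6) -> Tuple[List[float], float]:
    """Gradient descent with a finite-difference gradient and backtracking."""
    params = [0.0] * len(structure)
    loss = sse(structure, params, data)
    for _ in range(iters):
        grad = []
        for i in range(len(params)):
            up, down = params[:], params[:]
            up[i] += eps
            down[i] -= eps
            grad.append((sse(structure, up, data) - sse(structure, down, data)) / (2 * eps))
        g2 = sum(g * g for g in grad)
        step = 1.0
        for _ in range(60):
            trial = [p - step * g for p, g in zip(params, grad)]
            trial_loss = sse(structure, trial, data)
            if trial_loss <= loss - 1e-4 * step * g2:  # Armijo condition; false for nan/inf
                params, loss = trial, trial_loss
                break
            step *= 0.5
        else:
            break  # no step reduced the error, so stop
    return params, loss


def induce_model(data, penalty: float = 0.01):
    """Exhaustively fit every nonempty set of processes; keep the best score."""
    names = sorted(GENERIC_PROCESSES)
    best = None
    for size in range(1, len(names) + 1):
        for structure in combinations(names, size):
            params, loss = fit_parameters(structure, data)
            score = loss + penalty * len(structure)  # assumed complexity penalty
            if best is None or score < best[0]:
                best = (score, structure, dict(zip(structure, params)))
    return best[1], best[2]


# Synthetic observations from a pure decay process with rate 0.5.
data = simulate(("decay",), [0.5], x0=2.0)
structure, params = induce_model(data)
print(structure, params)
```

Even in this toy, the cost is dominated by simulation: every gradient estimate and every line-search trial re-integrates the model, and the whole procedure repeats for each candidate structure.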
Most recently, Adam Arvay and I have developed and implemented
a new approach to inducing process models that associates a
rate expression with each process P and that assumes each of
P's derivatives is proportional to this rate. Together, these
assumptions let us estimate parameters not with gradient descent
search, which requires repeated simulations and can become trapped
in local optima, but with multiple linear regression. The resulting
systems are both far more robust than their predecessors and
run nearly a million times faster even on simple tasks. Our
latest efforts have extended this approach to support adaptation
of models to new settings and to find more complex equations
through a form of variable selection.
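The rate-based idea can be illustrated with a small Python sketch, offered under explicit assumptions rather than as the published system. The process names, rate expressions, and predator-prey structure are hypothetical. Derivatives are estimated by forward differences on synthetic data produced with Euler's method, which makes them nearly exact; real data would need smoothing first. Each variable's coefficients come from ordinary least squares (no intercept) solved through the normal equations.

```python
from typing import Callable, Dict, List

State = Dict[str, float]

# Hypothetical process library: each process has a rate expression
# computed from the current state.
RATES: Dict[str, Callable[[State], float]] = {
    "prey_growth": lambda s: s["x"],
    "predation": lambda s: s["x"] * s["y"],
    "predator_death": lambda s: s["y"],
}

# Hypothetical model structure: which processes affect each variable.
# Each affected derivative is assumed proportional to the process rate.
STRUCTURE: Dict[str, List[str]] = {
    "x": ["prey_growth", "predation"],
    "y": ["predation", "predator_death"],
}


def simulate(deriv: Callable[[State], State], init: State, h: float, steps: int) -> List[State]:
    """Euler integration; returns steps + 1 states."""
    traj = [dict(init)]
    for _ in range(steps):
        s = traj[-1]
        d = deriv(s)
        traj.append({v: s[v] + h * d[v] for v in s})
    return traj


def solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    w = [0.0] * n
    for r in reversed(range(n)):
        w[r] = (m[r][n] - sum(m[r][c] * w[c] for c in range(r + 1, n))) / m[r][r]
    return w


def least_squares(cols: List[List[float]], y: List[float]) -> List[float]:
    """Multiple linear regression without intercept via the normal equations."""
    k = len(cols)
    xtx = [[sum(p * q for p, q in zip(cols[i], cols[j])) for j in range(k)] for i in range(k)]
    xty = [sum(p * t for p, t in zip(cols[i], y)) for i in range(k)]
    return solve(xtx, xty)


def fit_rate_model(traj: List[State], h: float) -> Dict[str, Dict[str, float]]:
    """Regress each variable's estimated derivative on its processes' rates."""
    states = traj[:-1]  # forward differences pair each state with its successor
    model = {}
    for var, procs in STRUCTURE.items():
        y = [(traj[i + 1][var] - traj[i][var]) / h for i in range(len(states))]
        cols = [[RATES[p](s) for s in states] for p in procs]
        model[var] = dict(zip(procs, least_squares(cols, y)))
    return model


def true_deriv(s: State) -> State:
    # Synthetic "ground truth" used only to generate data.
    return {"x": 1.0 * s["x"] - 0.5 * s["x"] * s["y"],
            "y": 0.2 * s["x"] * s["y"] - 0.6 * s["y"]}


traj = simulate(true_deriv, {"x": 2.0, "y": 1.0}, h=0.01, steps=500)
model = fit_rate_model(traj, h=0.01)
print(model)
```

Because every coefficient enters the derivatives linearly once the rates are fixed, a single regression per variable replaces the repeated simulations that gradient descent requires, and there are no local optima to escape.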
Papers on Induction of Rate-Based Process Models
-
Langley, P., & Arvay, A. (2019).
Scientific discovery, process models, and the social sciences.
In M. Addis, P. C. R. Lane, P. D. Sozou, & F. Gobet (Eds.),
Scientific discovery in the social sciences. Heidelberg: Springer.
-
Langley, P. (2019).
Scientific discovery, causal explanation, and process model induction.
Mind & Society, 18, 43-56.
-
Langley, P., & Arvay, A. (2017).
Flexible model induction through heuristic process discovery.
Proceedings of the Thirty-First AAAI Conference on Artificial
Intelligence (pp. 4415-4421). San Francisco: AAAI Press.
-
Arvay, A., & Langley, P. (2016).
Selective induction of rate-based process models.
Proceedings of the Fourth Annual Conference on Cognitive Systems.
Evanston, IL.
-
Arvay, A., & Langley, P. (2016).
Heuristic adaptation of scientific process models.
Advances in Cognitive Systems, 4, 207-226.
-
Arvay, A., & Langley, P. (2015).
Heuristic adaptation of rate-based process models.
Proceedings of the Third Annual Conference on Cognitive Systems.
Atlanta, GA.
-
Langley, P., & Arvay, A. (2015).
Heuristic induction of rate-based process models.
Proceedings of the Twenty-Ninth AAAI Conference on Artificial
Intelligence (pp. 537-543). Austin, TX: AAAI Press.
Earlier Papers on Inductive Process Modeling
-
Todorovski, L., Bridewell, W., & Langley, P. (2012).
Discovering constraints for inductive process modeling.
Proceedings of the Twenty-Sixth AAAI Conference on Artificial
Intelligence (pp. 256-262). Toronto: AAAI Press.
-
Park, C., Bridewell, W., & Langley, P. (2010).
Integrated systems for inducing spatio-temporal process models.
Proceedings of the Twenty-Fourth AAAI Conference on Artificial
Intelligence (pp. 1555-1560). Atlanta: AAAI Press.
-
Bridewell, W., & Langley, P. (2010).
Two kinds of knowledge in scientific discovery.
Topics in Cognitive Science, 2, 36-52.
-
Bridewell, W., Borrett, S. R., & Langley, P. (2009).
Supporting the construction of dynamic scientific models.
In A. Markman & K. L. Wood (Eds.), Tools for innovation.
New York: Oxford University Press.
-
Langley, P., & Bridewell, W. (2008).
Processes and constraints in explanatory scientific discovery.
Proceedings of the Thirtieth Annual Meeting of the Cognitive
Science Society. Washington, D.C.
-
Bridewell, W., Langley, P., Todorovski, L., & Dzeroski, S. (2008).
Inductive process modeling.
Machine Learning, 71, 1-32.
-
Borrett, S. R., Bridewell, W., Langley, P., & Arrigo, K. R. (2007).
A method for representing and developing process models.
Ecological Complexity, 4, 1-12.
-
Bridewell, W., Langley, P., Racunas, S., & Borrett, S. R. (2006).
Learning process models with missing data.
Proceedings of the Seventeenth European Conference on Machine
Learning (pp. 557-565). Berlin: Springer.
-
Bridewell, W., Sanchez, J. N., Langley, P., & Billman, D. (2006).
An interactive environment for the modeling and discovery of
scientific knowledge. International Journal of Human-Computer
Studies, 64, 1099-1114.
-
Langley, P., Shiran, O., Shrager, J., Todorovski, L., & Pohorille, A.
(2006).
Constructing explanatory process models from biological data and
knowledge.
AI in Medicine, 37, 191-201.
-
Asgharbeygi, N., Bay, S., Langley, P., & Arrigo, K. (2006).
Inductive revision of quantitative process models.
Ecological Modelling, 194, 70-79.
-
Bridewell, W., Bani Asadi, N., Langley, P., & Todorovski, L. (2005).
Reducing overfitting in process model induction.
Proceedings of the Twenty-Second International Conference on
Machine Learning (pp. 81-88). Bonn, Germany.
-
Todorovski, L., Bridewell, W., Shiran, O., & Langley, P. (2005).
Inducing hierarchical process models in dynamic domains.
Proceedings of the Twentieth National Conference on Artificial
Intelligence (pp. 892-897). Pittsburgh, PA: AAAI Press.
-
Asgharbeygi, N., Bay, S., Langley, P., & Arrigo, K. (2004).
Computational revision of ecological process models.
Proceedings of the Fourth International Workshop on Environmental
Applications of Machine Learning (pp. 13-14). Bled, Slovenia.
-
Langley, P., Shrager, J., Asgharbeygi, N., Bay, S., & Pohorille, A.
(2004).
Inducing explanatory process models from biological time series.
Proceedings of the Ninth Workshop on Intelligent Data Analysis and
Data Mining (pp. 85-90). Stanford, CA.
-
George, D., Saito, K., Langley, P., Bay, S., & Arrigo, K. (2003).
Discovering ecosystem models from time-series data.
Proceedings of the Sixth International Conference on Discovery
Science (pp. 141-152). Sapporo, Japan: Springer.
-
Sanchez, J. N., & Langley, P. (2003).
An interactive environment for scientific model construction.
Proceedings of the Second International Conference on Knowledge
Capture (pp. 138-145). Sanibel Island, FL: ACM Press.
-
Langley, P., George, D., Bay, S., & Saito, K. (2003).
Robust induction of process models from time-series data.
Proceedings of the Twentieth International Conference on
Machine Learning (pp. 432-439).
-
Langley, P., Sanchez, J., Todorovski, L., & Dzeroski, S. (2002).
Inducing process models from continuous data. Proceedings of
the Nineteenth International Conference on Machine Learning
(pp. 347-354). Sydney: Morgan Kaufmann.
Papers on Equation Discovery
-
Schwabacher, M., & Langley, P. (2007).
Discovering communicable scientific knowledge from spatio-temporal data.
In S. Dzeroski & L. Todorovski (Eds.), Computational discovery of
communicable scientific knowledge. Berlin: Springer.
-
Saito, K., & Langley, P. (2007).
Quantitative revision of scientific models.
In S. Dzeroski & L. Todorovski (Eds.), Computational
discovery of communicable scientific knowledge. Berlin: Springer.
-
Todorovski, L., Dzeroski, S., Langley, P., & Potter, C. (2003).
Using equation discovery to revise an Earth ecosystem model of carbon
net production. Ecological Modelling, 170, 141-154.
-
Bay, S. D., Shapiro, D. G., & Langley, P. (2002).
Revising engineering models: Combining computational discovery with
knowledge. Proceedings of the Thirteenth European Conference
on Machine Learning (pp. 10-22). Helsinki, Finland.
-
Saito, K., Langley, P., Grenager, T., Potter, C., Torregrosa, A., &
Klooster, S. A. (2001).
Computational revision of quantitative scientific models.
Proceedings of the Fourth International Conference on Discovery
Science (pp. 336-349). Washington, D.C.: Springer.
-
Schwabacher, M., & Langley, P. (2001).
Discovering communicable scientific knowledge from spatio-temporal data.
Proceedings of the Eighteenth International Conference on Machine
Learning (pp. 489-496). Williamstown, MA: Morgan Kaufmann.
-
Nordhausen, B., & Langley, P. (1990).
A robust approach to numeric discovery.
Proceedings of the Seventh International Conference on Machine
Learning (pp. 411-418). Austin, TX: Morgan Kaufmann.
-
Langley, P., & Zytkow, J. M. (1989).
Data-driven approaches to empirical discovery.
Artificial Intelligence, 40, 283-312.
-
Langley, P., Bradshaw, G. L., & Simon, H. A. (1987).
Heuristics for empirical discovery.
In L. Bolc (Ed.), Computational models of learning.
Berlin: Springer-Verlag.
-
Langley, P., Bradshaw, G. L., & Simon, H. A. (1983).
Rediscovering chemistry with the Bacon system.
In R. S. Michalski, J. G. Carbonell, & T. M. Mitchell (Eds.),
Machine learning: An artificial intelligence approach.
San Mateo, CA: Morgan Kaufmann.
-
Langley, P., Bradshaw, G. L., & Simon, H. A. (1982).
Data-driven and expectation-driven discovery of empirical laws.
Proceedings of the Fourth Biennial Conference of the Canadian Society
for Computational Studies of Intelligence (pp. 137-143). Saskatoon,
Saskatchewan.
-
Langley, P., Bradshaw, G. L., & Simon, H. A. (1981).
Bacon.5: The discovery of conservation laws.
Proceedings of the Seventh International Joint Conference on
Artificial Intelligence (pp. 121-126). Vancouver, British Columbia:
Morgan Kaufmann.
-
Simon, H. A., Langley, P., & Bradshaw, G. L. (1981).
Scientific discovery as problem solving.
Synthese, 47, 1-27.
-
Langley, P. (1981).
Data-driven discovery of physical laws.
Cognitive Science, 5, 31-54.
-
Bradshaw, G. L., Langley, P., & Simon, H. A. (1980).
Bacon.4: The discovery of intrinsic properties.
Proceedings of the Third Biennial Conference of the Canadian Society
for Computational Studies of Intelligence (pp. 19-25). Victoria,
British Columbia.
-
Langley, P. (1979).
Rediscovering physics with Bacon.3.
Proceedings of the Sixth International Joint Conference on Artificial
Intelligence (pp. 505-507). Tokyo, Japan: Morgan Kaufmann.
-
Langley, P. (1979).
A production system model for the induction of mathematical functions.
Behavioral Science, 24, 121-139.
-
Langley, P. (1978).
Bacon.1: A general discovery system.
Proceedings of the Second Biennial Conference of the Canadian Society
for Computational Studies of Intelligence (pp. 173-180). Toronto,
Ontario.
-
Langley, P. (1977).
Bacon: A production system that discovers empirical laws.
Proceedings of the Fifth International Joint Conference on Artificial
Intelligence (p. 344). Cambridge, MA: Morgan Kaufmann.
Papers on Qualitative Discovery
-
Bay, S. D., Shrager, J., Pohorille, A., & Langley, P. (2003).
Revising regulatory networks: From expression data to linear causal
models. Journal of Biomedical Informatics, 35, 289-297.
-
Chrisman, L., Langley, P., & Bay, S. (2003).
Incorporating biological knowledge into evaluation of causal
regulatory hypotheses.
Proceedings of the Pacific Symposium on Biocomputing (pp. 128-139).
Lihue, Hawaii.
-
Saito, K., Bay, S., & Langley, P. (2002).
Revising qualitative models of gene regulation.
Proceedings of the Fifth International Conference on Discovery
Science (pp. 59-70).
-
Shrager, J., Langley, P., & Pohorille, A. (2002).
Guiding revision of regulatory models with expression data.
Proceedings of the Pacific Symposium on Biocomputing
(pp. 486-497). Lihue, Hawaii.
-
Kocabas, S., & Langley, P. (2000).
Computer generation of process explanations in nuclear astrophysics.
International Journal of Human-Computer Studies, 53,
377-392.
-
Kocabas, S., & Langley, P. (1998).
Generating process explanations in nuclear astrophysics.
Proceedings of the ECAI-98 Workshop on Machine Discovery
(pp. 4-9). Brighton, UK.
-
Kocabas, S., & Langley, P. (1995).
Integration of research tasks for modeling discoveries in
particle physics.
Proceedings of the AAAI Spring Symposium on Systematic Methods
of Scientific Discovery (pp. 87-92). Stanford, CA: AAAI Press.
-
Rose, D. (1988).
Using domain knowledge to aid scientific theory revision.
Proceedings of the Fifth International Workshop on Machine Learning
(pp. 272-277). Ithaca, NY: Morgan Kaufmann.
-
Rose, D., & Langley, P. (1988).
A hill-climbing approach to machine discovery.
Proceedings of the Fifth International Conference on Machine Learning
(pp. 367-373). Ann Arbor, MI: Morgan Kaufmann.
-
Langley, P., & Jones, R. (1988).
A computational model of scientific insight.
In R. Sternberg (Ed.), The nature of creativity.
Cambridge University Press.
-
Jones, R., & Langley, P. (1988).
A theory of scientific problem solving.
Proceedings of the Tenth Conference of the Cognitive Science Society
(pp. 244-250). Montreal, Quebec: Lawrence Erlbaum.
-
Rose, D., & Langley, P. (1987).
Belief revision and induction.
Proceedings of the Ninth Conference of the Cognitive Science Society
(pp. 748-752). Seattle, WA: Lawrence Erlbaum.
-
Zytkow, J. M., Langley, P., & Simon, H. A. (1987).
Computer system of discovery Stahl.
Studia Filozoficzne or Zagadnienia Naukoznawstwa, 23, 518-536.
-
Rose, D., & Langley, P. (1986).
Chemical discovery as belief revision.
Machine Learning, 1, 423-451.
-
Zytkow, J. M., & Simon, H. A. (1986).
A theory of historical discovery: The construction of componential models.
Machine Learning, 1, 107-137.
-
Rose, D., & Langley, P. (1986).
Stahlp: Belief revision in scientific discovery.
Proceedings of the Fifth National Conference of the American
Association for Artificial Intelligence (pp. 528-532).
Philadelphia, PA: Morgan Kaufmann.
-
Jones, R. (1986).
Generating predictions to aid the scientific discovery process.
Proceedings of the Fifth National Conference of the American
Association for Artificial Intelligence (pp. 513-517).
Philadelphia, PA: Morgan Kaufmann.
-
Nordhausen, B. (1986).
Conceptual clustering using relational information.
Proceedings of the Fifth National Conference of the American
Association for Artificial Intelligence (pp. 508-512).
Philadelphia, PA: Morgan Kaufmann.
-
Langley, P., Simon, H. A., Zytkow, J. M., & Fisher, D. H. (1985).
Discovering qualitative empirical laws
(Technical Report No. 85-18).
Irvine: University of California, Department of Information &
Computer Science.
-
Zytkow, J., Langley, P., & Simon, H. A. (1984).
A model of early chemical reasoning.
Proceedings of the Sixth Conference of the Cognitive Science Society
(pp. 378-381). Boulder, CO: Lawrence Erlbaum.
-
Langley, P., Bradshaw, G. L., Zytkow, J., & Simon, H. A. (1983).
Three facets of scientific discovery.
Proceedings of the Eighth International Joint Conference on Artificial
Intelligence (pp. 465-468). Karlsruhe, West Germany: Morgan Kaufmann.
Papers on Integrated Discovery
-
Langley, P. (in press).
Integrated systems for computational scientific discovery.
Proceedings of the Thirty-Eighth AAAI Conference on Artificial
Intelligence. Vancouver, BC: AAAI Press.
-
Kocabas, S., & Langley, P. (2001).
An integrated framework for extended discovery in particle physics.
Proceedings of the Fourth International Conference on Discovery
Science (pp. 182-195). Washington, D.C.: Springer.
-
Nordhausen, B., & Langley, P. (1993).
An integrated framework for empirical discovery.
Machine Learning, 12, 17-47.
-
Nordhausen, B., & Langley, P. (1990).
An integrated approach to empirical discovery.
In J. Shrager & P. Langley (Eds.),
Computational models of scientific discovery and theory formation.
San Mateo, CA: Morgan Kaufmann.
-
Nordhausen, B., & Langley, P. (1987).
Towards an integrated discovery system.
Proceedings of the Tenth International Joint Conference on Artificial
Intelligence (pp. 198-200). Milan, Italy: Morgan Kaufmann.
-
Langley, P., & Nordhausen, B. (1986).
A framework for empirical discovery.
Proceedings of the International Meeting on Advances in Learning.
Les Arcs, France.
Generic Publications on Scientific Discovery
-
Langley, P. (2021).
Agents of exploration and discovery.
AI Magazine, 42, 72-82.
-
Dzeroski, S., Langley, P., & Todorovski, L. (2007).
Computational discovery of scientific knowledge.
In S. Dzeroski & L. Todorovski (Eds.), Computational discovery of
communicable scientific knowledge. Berlin: Springer.
-
Schwabacher, M., & Langley, P. (2007).
Discovering communicable scientific knowledge from spatio-temporal data.
In S. Dzeroski & L. Todorovski (Eds.), Computational discovery of
communicable scientific knowledge. Berlin: Springer.
-
Langley, P. (2002).
Lessons for the computational discovery of scientific knowledge.
Proceedings of the First International Workshop on Data Mining Lessons
Learned (pp. 9-12). Sydney.
-
Langley, P., Shrager, J., & Saito, K. (2002).
Computational discovery of communicable scientific knowledge.
In L. Magnani, N. J. Nersessian, & C. Pizzi (Eds.),
Logical and Computational Aspects of Model-Based Reasoning.
Dordrecht: Kluwer Academic.
-
Dzeroski, S., & Langley, P. (2001).
Computational discovery of communicable knowledge: Symposium report.
Proceedings of the Fourth International Conference on Discovery
Science (pp. 45-49). Washington, D.C.: Springer.
-
Langley, P., Magnani, L., Cheng, P. C.-H., Gordon, A., Kocabas, S., &
Sleeman, D. H. (2001).
Computational models of historical scientific discoveries.
Proceedings of the Twenty-Third Annual Conference of the Cognitive
Science Society (p. 3). Edinburgh: Lawrence Erlbaum.
-
Langley, P. (2000).
The computational support of scientific discovery.
International Journal of Human-Computer Studies, 53,
1149-1164.
-
Langley, P. (1999).
The computer-aided discovery of scientific knowledge.
Proceedings of the First International Conference on Discovery
Science. Fukuoka, Japan: Springer.
-
Langley, P. (1995).
Stages in the process of scientific discovery.
Proceedings of the AAAI Spring Symposium on Systematic Methods
for Scientific Discovery (p. 93). Stanford, CA: AAAI Press.
-
Shrager, J., & Langley, P. (Eds.) (1990). Computational models of
scientific discovery and theory formation. San Mateo, CA: Morgan
Kaufmann.
-
Shrager, J., & Langley, P. (1990).
Computational approaches to scientific discovery.
In J. Shrager & P. Langley (Eds.),
Computational models of scientific discovery and theory formation.
San Mateo, CA: Morgan Kaufmann.
-
Langley, P., Simon, H. A., Bradshaw, G. L., & Zytkow, J. M. (1987).
Scientific discovery: Computational explorations of the creative
processes. Cambridge, MA: MIT Press.
-
Langley, P., Zytkow, J., Simon, H. A., & Bradshaw, G. L. (1986).
The search for regularity: Four aspects of scientific discovery.
In R. S. Michalski, J. G. Carbonell, & T. M. Mitchell (Eds.),
Machine learning: An artificial intelligence approach (Vol. 2).
San Mateo, CA: Morgan Kaufmann.
-
Bradshaw, G. L., Langley, P., & Simon, H. A. (1983).
Studying scientific discovery by computer simulation.
Science, 222, 971-975.
For more information, send electronic mail to
langley@isle.org