Computational Induction of Scientific Process Models
This project aims to develop a framework that unifies two separate but
central themes in information technology: the computational simulation
of models that explain important phenomena, and the computational
induction of knowledge from regularities observed in data. Unlike most previous
work in machine learning and data mining, the approach emphasizes
methods that generate knowledge in established scientific formalisms,
incorporate domain knowledge where possible, focus on causal and
explanatory models, address induction from observational time-series
data, and are embedded in a simulation environment that scientists
can use for model development.
Our approach revolves around a new class of models, which consist of
interacting quantitative processes, and around the problem of inducing
such models from time-series data. Computational challenges that we will
address include reducing overfitting and variance, inducing conditions
on processes, handling large, heterogeneous data sets with missing
values, and scaling to complex models. We will incorporate the resulting
algorithms in a trainable simulation environment that lets users
construct models manually or induce them from data, then simulate
their behavior. Experimental evaluation will involve both Earth
Science observations from the Ross Sea and synthetic data.
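To make the idea of a process model and its induction more concrete, here is a minimal Python sketch built on assumptions of our own. The process names and equation forms, the rule that each variable's derivative is the sum of the contributions of the active processes, Euler integration, and an exhaustive grid search over rate values are all hypothetical choices for illustration; they are not the representation or the algorithms used in the project's software. The sketch scores every small subset of candidate processes against an observed time series and keeps the structure whose simulated trajectory fits best.

```python
import itertools

# Hypothetical library of candidate processes for a two-variable
# predator-prey system. Each process takes the current state and one rate
# parameter k and returns its additive contribution to the time derivative
# of one or more variables. These names and equations are illustrative only.
CANDIDATES = {
    "prey_growth":   lambda s, k: {"prey": k * s["prey"]},
    "predation":     lambda s, k: {"prey": -k * s["prey"] * s["pred"],
                                   "pred": k * s["prey"] * s["pred"]},
    "predator_loss": lambda s, k: {"pred": -k * s["pred"]},
    "prey_loss":     lambda s, k: {"prey": -k * s["prey"]},
}

# Hypothetical grid of rate values tried for every process
# (a crude stand-in for real parameter estimation).
GRID = [0.1, 0.2, 0.3, 0.4, 0.5]


def simulate(model, params, init, steps, dt):
    """Euler-integrate a model (a tuple of process names) and return the
    state at every time step, including the initial one."""
    state = dict(init)
    trace = [dict(state)]
    for _ in range(steps):
        # Assumed combination rule: sum the effects of all active processes.
        deriv = {var: 0.0 for var in state}
        for name in model:
            for var, effect in CANDIDATES[name](state, params[name]).items():
                deriv[var] += effect
        state = {var: state[var] + dt * deriv[var] for var in state}
        trace.append(dict(state))
    return trace


def sse(trace, data):
    """Sum of squared errors between simulated and observed states."""
    total = 0.0
    for sim, obs in zip(trace, data):
        for var in obs:
            diff = sim[var] - obs[var]
            total += diff * diff  # multiplication avoids OverflowError from **
    return total


def induce(data, init, steps, dt, max_size=3):
    """Score every subset of up to max_size candidate processes under every
    grid combination of rate values; return the best model, its parameters,
    and its error. Ties go to the smaller model."""
    best = None
    names = list(CANDIDATES)
    for size in range(1, max_size + 1):
        for model in itertools.combinations(names, size):
            for values in itertools.product(GRID, repeat=size):
                params = dict(zip(model, values))
                err = sse(simulate(model, params, init, steps, dt), data)
                if best is None or (err, size) < best[0]:
                    best = ((err, size), model, params)
    return best[1], best[2], best[0][0]


# Usage: generate a synthetic trajectory from a known model, then try to
# recover that model from the trajectory alone.
TRUE_MODEL = ("prey_growth", "predation", "predator_loss")
TRUE_PARAMS = {"prey_growth": 0.5, "predation": 0.2, "predator_loss": 0.3}
INIT = {"prey": 2.0, "pred": 1.0}

data = simulate(TRUE_MODEL, TRUE_PARAMS, INIT, steps=50, dt=0.1)
model, params, err = induce(data, INIT, steps=50, dt=0.1)
print(model, params, err)
```

On this noise-free synthetic trajectory the search recovers the three generating processes and their rates with zero error, because the grid happens to contain the true values. With real field data, such as the Ross Sea observations, exact recovery should not be expected, and a raw squared-error score like this one can favor needlessly complex models, which is one reason overfitting appears among the challenges above.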
The trainable simulation environment will let Earth scientists search
the space of candidate models systematically, which should produce more
accurate models in much less time than manual construction. Moreover,
the novel computational methods
should aid model construction in other fields like systems biology and
engineering. Both the environment and sample models will be used in
courses and made available in future versions of this Web site.
Our early work on this effort was funded by NTT Communication Science
Laboratories, Nippon Telegraph and Telephone Corporation, whereas
current support comes through Grant IIS-0326059 from the National
Science Foundation. Researchers currently involved in the effort
include Kevin Arrigo, Will Bridewell, Christine Desmarais, Pat Langley,
Chunki Park, and Bernard Widrow.
Past contributors to the project include Nima Asgharbeygi,
Narges Bani Asadi, Tahir Azim, Dorrit Billman, Stuart Borrett,
Matthew D. Bravo, Jed Crosby, Yi Ding, Matthew Janes, Danny Korenblum,
Fabian Lischka, Stephen Racunas, Nikhil Raghavan, Tamar Shinar,
Oren Shiran, and Jeff Shrager.
In addition, the ISLE/Stanford team collaborates with Saso Dzeroski
and Ljupco Todorovski in the Department of Intelligent Systems at the
Jozef Stefan Institute in Ljubljana, Slovenia.
Inductive Process Modeling Software
Find out more about the
Prometheus modeling environment and download an initial version.
Related Publications
-
Bridewell, W., Borrett, S. R., & Langley, P. (2009).
Supporting innovative construction of explanatory scientific models.
In A. B. Markman & K. L. Wood (Eds.), Tools for Innovation.
Oxford: Oxford University Press.
-
Bridewell, W., & Langley, P. (2009).
Two kinds of knowledge in scientific discovery.
Topics in Cognitive Science, 1, early access.
-
Langley, P., & Bridewell, W. (2008).
Processes and constraints in explanatory scientific discovery.
Proceedings of the Thirtieth Annual Meeting of the Cognitive
Science Society. Washington, D.C.
-
Bridewell, W., Langley, P., Todorovski, L., & Dzeroski, S. (2008).
Inductive process modeling.
Machine Learning, 71, 1-32.
-
Bridewell, W., Borrett, S., & Todorovski, L. (2007).
Extracting constraints for process modeling.
Proceedings of the Fourth International Conference on Knowledge
Capture (pp. 87-94). Whistler, BC.
-
Bridewell, W., & Todorovski, L. (2007).
Learning declarative bias.
Proceedings of the Seventeenth International Conference on Inductive
Logic Programming. Corvallis, OR.
-
Borrett, S. R., Bridewell, W., Langley, P., & Arrigo, K. R. (2007).
A method for representing and developing process models.
Ecological Complexity, 4, 1-12.
-
Bridewell, W., Sanchez, J. N., Langley, P., & Billman, D. (2006).
An interactive environment for the modeling and discovery of
scientific knowledge. International Journal of Human-Computer
Studies, 64, 1099-1114.
-
Bridewell, W., Langley, P., Racunas, S., & Borrett, S. R. (2006).
Learning process models with missing data. Proceedings of the
Seventeenth European Conference on Machine Learning (pp. 557-565).
Berlin: Springer.
-
Langley, P., Shiran, O., Shrager, J., Todorovski, L., & Pohorille, A.
(2006).
Constructing explanatory process models from biological data and
knowledge.
Artificial Intelligence in Medicine, 37, 191-201.
-
Asgharbeygi, N., Bay, S., Langley, P., & Arrigo, K. (2006).
Inductive revision of quantitative process models.
Ecological Modelling, 194, 70-79.
-
Bridewell, W., Bani Asadi, N., Langley, P., & Todorovski, L. (2005).
Reducing overfitting in process model induction.
Proceedings of the Twenty-Second International Conference on
Machine Learning (pp. 81-88). Bonn, Germany.
-
Todorovski, L., Bridewell, W., Shiran, O., & Langley, P. (2005).
Inducing hierarchical process models in dynamic domains.
Proceedings of the Twentieth National Conference on Artificial
Intelligence (pp. 892-897). Pittsburgh, PA: AAAI Press.
-
Asgharbeygi, N., Bay, S., Langley, P., & Arrigo, K. (2004).
Computational revision of ecological process models.
Proceedings of the Fourth International Workshop on Environmental
Applications of Machine Learning (pp. 13-14). Bled, Slovenia.
-
Langley, P., Shrager, J., Asgharbeygi, N., Bay, S., & Pohorille, A.
(2004).
Inducing explanatory process models from biological time series.
Proceedings of the Ninth Workshop on Intelligent Data Analysis and
Data Mining (pp. 85-90). Stanford, CA.
-
Lavrac, N., Motoda, H., Fawcett, T., Holte, R., Langley, P., &
Adriaans, P. (2004).
Lessons learned from data mining applications and collaborative
problem solving.
Machine Learning, 57, 13-34.
-
George, D., Saito, K., Langley, P., Bay, S., & Arrigo, K. (2003).
Discovering ecosystem models from time-series data.
Proceedings of the Sixth International Conference on Discovery
Science (pp. 141-152). Sapporo, Japan: Springer.
-
Sanchez, J. N., & Langley, P. (2003).
An interactive environment for scientific model construction.
Proceedings of the Second International Conference on Knowledge
Capture (pp. 138-145). Sanibel Island, FL: ACM Press.
-
Langley, P., George, D., Bay, S., & Saito, K. (2003).
Robust induction of process models from time-series data.
Proceedings of the Twentieth International Conference on
Machine Learning (pp. 432-439).
-
Langley, P., Sanchez, J., Todorovski, L., & Dzeroski, S. (2002).
Inducing process models from continuous data. Proceedings of
the Nineteenth International Conference on Machine Learning
(pp. 347-354). Sydney: Morgan Kaufmann.
For more information, send electronic mail to
langley@isle.org