Gaussian process emulation for exploring complex infectious disease models
Anna M. Langmüller, Kiran A. Chandrasekher, Benjamin C. Haller, Samuel E. Champer, Courtney C. Murdock, Philipp W. Messer
Abstract
Epidemiological models that aim for a high degree of biological realism by simulating every individual in a population are unavoidably complex, with many free parameters, which makes systematic explorations of their dynamics computationally challenging. In this study, we demonstrate how Gaussian Process emulation can overcome this challenge. To simulate disease dynamics, we developed an abstract individual-based model that is loosely inspired by dengue, incorporating some key features shaping dengue epidemics such as social structure, human movement, and seasonality.
Introduction
Simulation models that describe individual organisms — often referred to as individual-based or agent-based models — have become well-established research tools across numerous scientific disciplines [1]. In the field of epidemiology, such models have provided valuable insights into the dynamics of pathogen and disease spread and have facilitated rigorous evaluation of planned intervention strategies, making them an integral part of modern epidemiological research [2–6].
Methods
Individual-based model
We implemented an individual-based model (IBM) in C++ that simulates and tracks disease transmission and includes several parameters related to infection probability, human movement, and social structure, three key features shaping dengue epidemics [38–40]. In epidemiology, the terms “individual-based” and “agent-based” are used largely interchangeably to describe models that simulate individual entities and their interactions [1]; following this convention, we use “individual-based model” as a general term for such modeling approaches.
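To illustrate the kind of model described here, the following is a minimal toy sketch in Python. It is not the authors' C++ implementation. The function `simulate_outbreak`, all parameter names and default values, the household-based social structure, and the direct co-location transmission rule are hypothetical simplifications chosen only to show how infection probability, movement, and social structure can enter an individual-based simulation.

```python
import numpy as np


def simulate_outbreak(n_households=100, household_size=4, n_days=120,
                      p_infect=0.05, p_move=0.2, infectious_days=5, seed=0):
    """Toy SIR-like individual-based model (illustrative assumption only).

    Returns an array with the number of new infections on each day.
    """
    rng = np.random.default_rng(seed)
    n = n_households * household_size
    # Social structure: every individual belongs to one household.
    home = np.repeat(np.arange(n_households), household_size)
    days_left = np.zeros(n, dtype=int)      # remaining infectious days
    susceptible = np.ones(n, dtype=bool)

    # Seed the outbreak with a single infectious individual.
    susceptible[0] = False
    days_left[0] = infectious_days

    incidence = []
    for _ in range(n_days):
        # Movement: with probability p_move an individual spends the day
        # in a randomly chosen household, otherwise it stays at home.
        moves = rng.random(n) < p_move
        location = np.where(moves, rng.integers(n_households, size=n), home)

        infectious = days_left > 0
        # Number of infectious individuals present at each location.
        load = np.bincount(location[infectious], minlength=n_households)
        # Probability of being infected by at least one co-located
        # infectious individual.
        p_inf = 1.0 - (1.0 - p_infect) ** load[location]
        new = susceptible & (rng.random(n) < p_inf)

        # Advance infection clocks, then start the clock for new cases.
        days_left[infectious] -= 1
        days_left[new] = infectious_days
        susceptible[new] = False
        incidence.append(int(new.sum()))
    return np.array(incidence)


incidence = simulate_outbreak(p_infect=0.05, p_move=0.2, seed=1)
print("peak daily incidence:", incidence.max())
print("total infections:", incidence.sum())
```

Because every individual is tracked explicitly and each run is stochastic, summarizing the behavior of such a model across its parameter space requires many replicate simulations per parameter combination.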
Results
Gaussian process performance
To enable a more efficient exploration of the output space of our IBM, we trained GP surrogate models on input-output data pairs from the IBM. Specifically, we trained three independent GPs to predict outbreak probability, maximum incidence (imax), and outbreak duration.
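The idea of fitting one independent GP per output can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' training pipeline: `toy_simulator` is a hypothetical stand-in for IBM summary statistics, `GPEmulator` is a bare-bones exact GP written with NumPy, and the kernel hyperparameters (length scale, noise) are fixed by hand rather than optimized as they would be in practice.

```python
import numpy as np


def rbf_kernel(a, b, length_scale=0.3):
    """Squared-exponential covariance (unit variance) between rows of a and b."""
    sq_dist = ((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-0.5 * sq_dist / length_scale**2)


class GPEmulator:
    """Exact GP regression with fixed, hand-picked hyperparameters."""

    def __init__(self, noise=1e-2):
        self.noise = noise

    def fit(self, X, y):
        self.X = X
        # Standardize the target so a unit-variance prior is reasonable.
        self.y_mean, self.y_std = y.mean(), y.std()
        z = (y - self.y_mean) / self.y_std
        K = rbf_kernel(X, X) + self.noise * np.eye(len(X))
        self.L = np.linalg.cholesky(K)
        self.alpha = np.linalg.solve(self.L.T, np.linalg.solve(self.L, z))
        return self

    def predict(self, X_new):
        K_s = rbf_kernel(X_new, self.X)
        mean = K_s @ self.alpha
        v = np.linalg.solve(self.L, K_s.T)
        # Posterior variance of the latent function (prior variance is 1).
        var = np.clip(1.0 - (v**2).sum(axis=0), 0.0, None)
        return mean * self.y_std + self.y_mean, np.sqrt(var) * self.y_std


def toy_simulator(X, rng):
    """Hypothetical smooth stand-in for IBM outputs; NOT the authors' model."""
    infection, movement, clustering = X.T
    p_outbreak = 1.0 / (1.0 + np.exp(-8.0 * (infection + 0.5 * movement - 0.7)))
    i_max = 500.0 * infection * (1.0 + movement) * (1.0 - 0.3 * clustering)
    duration = 30.0 + 120.0 * p_outbreak * (1.0 - 0.5 * infection)
    Y = np.column_stack([p_outbreak, i_max, duration])
    return Y + rng.normal(scale=[0.02, 5.0, 2.0], size=Y.shape)


rng = np.random.default_rng(0)
X_train = rng.uniform(size=(200, 3))   # three inputs scaled to [0, 1]
Y_train = toy_simulator(X_train, rng)
X_test = rng.uniform(size=(50, 3))
Y_test = toy_simulator(X_test, rng)

# One independent emulator per output metric.
names = ["outbreak_probability", "i_max", "outbreak_duration"]
emulators = {
    name: GPEmulator().fit(X_train, Y_train[:, j]) for j, name in enumerate(names)
}

for j, name in enumerate(names):
    mean, std = emulators[name].predict(X_test)
    rmse = np.sqrt(np.mean((mean - Y_test[:, j]) ** 2))
    print(f"{name}: RMSE = {rmse:.3f}, mean predictive SD = {std.mean():.3f}")
```

Once fitted, each emulator returns a prediction and an uncertainty estimate for new parameter combinations at a small fraction of the cost of running the simulation itself. A production setup would typically rely on an established GP library and learn the hyperparameters from the data.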
Discussion
In this paper, we demonstrated the potential of statistical emulation for studying the dynamics of epidemiological IBMs. Specifically, we implemented an abstract individual-based disease transmission model, loosely inspired by dengue, in C++ and trained Gaussian Process (GP) emulators to approximate three key outbreak metrics: outbreak probability, maximum incidence (imax), and outbreak duration.
Acknowledgments
We thank all members of the Messer and Murdock labs for helpful discussions. Special thanks to Beliz Erdogmus for her contributions during the early phases of the project; Isabel Kim, Mitchell Lokey, and Meera Chotai for technical support; Amir Siraj for support with the municipality-specific environmental data; and Oliver Brady for providing supplemental shape files.
Citation: Langmüller AM, Chandrasekher KA, Haller BC, Champer SE, Murdock CC, Messer PW (2025) Gaussian process emulation for exploring complex infectious disease models. PLoS Comput Biol 21(12): e1013849. https://doi.org/10.1371/journal.pcbi.1013849
Editor: Jennifer A. Flegg, The University of Melbourne Faculty of Science, AUSTRALIA
Received: June 11, 2025; Accepted: December 18, 2025; Published: December 29, 2025
Copyright: © 2025 Langmüller et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Data Availability: The source code for the individual-based disease transmission model implemented in C++ is available on GitHub at https://github.com/AnnaMariaL/DengueSim. Simulated data, pre-trained Gaussian process models, Jupyter notebooks demonstrating their use, and all data and code required to reproduce the results and figures presented in this study are available on GitHub at https://github.com/AnnaMariaL/DengueSim-GP.
Funding: This project and AML have received funding from the European Union’s Horizon 2020 (https://research-and-innovation.ec.europa.eu/funding/funding-opportunities/funding-programmes-and-open-calls/horizon-2020_en) research and innovation program under the Marie Sklodowska-Curie grant agreement No. 101025586. PWM was supported by the National Institutes of Health (https://grants.nih.gov/funding/activity-codes/r35) under award R35GM152242. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
Competing interests: The authors have declared that no competing interests exist.





