A novel expectation-maximization approach to infer general diploid selection from time-series genetic data
Adam G. Fine, Matthias Steinrücken
Abstract
Detecting and quantifying the strength of selection is a major objective in population genetics. Since selection acts over multiple generations, many approaches have been developed to detect and quantify selection using genetic data sampled at multiple points in time. Such time-series genetic data are commonly analyzed using Hidden Markov Models, but in most cases only under the assumption of additive selection. However, many examples of genetic variation exhibiting non-additive mechanisms exist, making it critical to develop methods that can characterize selection in more general scenarios.
Introduction
Genetic variation that confers a fitness advantage to an organism over its peers tends to increase in frequency in the population over time until eventual fixation, if it is not lost to genetic drift. This stochastic process of selection ultimately forms the basis of adaptation. Detecting evidence of selection and quantifying its strength is thus a fundamental problem in evolutionary biology, with applications ranging from finding mutations critical to early hominid evolution [1] to predicting tumor growth [2]. In population genetics, many methods to detect signatures of past selective events in contemporary population genomic data have thus been developed [3–5].
Materials and methods
Parameterizing general diploid selection
Throughout this article, we consider selection acting at a given biallelic locus in a diploid population of constant size Ne. The dynamics of the population allele frequency at this locus can be described by the discrete Wright–Fisher model. We denote the two alleles at the locus by A and a, and by Yt and 1−Yt the random population-level frequencies of the A and a alleles in generation t, respectively.
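As an illustration of these dynamics, the following minimal sketch simulates a trajectory of Yt under a discrete Wright–Fisher model with diploid selection. The fitness parameterization used here (relative fitnesses 1, 1 + s1, and 1 + s2 for genotypes aa, Aa, and AA) is an assumption made for the example and is not taken from the text above. The function names and parameter values are likewise hypothetical.

```python
import numpy as np


def next_expected_freq(y, s1, s2):
    """Expected frequency of allele A after one generation of selection.

    Assumed relative fitnesses (an illustrative choice, not from the text):
    aa -> 1, Aa -> 1 + s1, AA -> 1 + s2.
    """
    w_AA, w_Aa, w_aa = 1.0 + s2, 1.0 + s1, 1.0
    # Mean fitness under Hardy-Weinberg genotype proportions.
    w_bar = w_AA * y**2 + w_Aa * 2.0 * y * (1.0 - y) + w_aa * (1.0 - y) ** 2
    # AA carriers contribute only A alleles; Aa carriers contribute half.
    return (w_AA * y**2 + w_Aa * y * (1.0 - y)) / w_bar


def simulate_wright_fisher(y0, s1, s2, n_e, generations, rng):
    """Simulate Y_0, ..., Y_T in a diploid population of constant size n_e."""
    freqs = np.empty(generations + 1)
    freqs[0] = y0
    for t in range(generations):
        p = next_expected_freq(freqs[t], s1, s2)
        # Genetic drift: binomial resampling of the 2 * n_e allele copies.
        freqs[t + 1] = rng.binomial(2 * n_e, p) / (2 * n_e)
    return freqs


rng = np.random.default_rng(0)
traj = simulate_wright_fisher(
    y0=0.1, s1=0.01, s2=0.02, n_e=10_000, generations=200, rng=rng
)
print(traj[:5])
```

Setting s1 = s2 = 0 recovers the neutral model, in which the expected frequency in the next generation equals the current one and only drift changes Yt.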
Results
Simulation study
To assess the performance of our EM-HMM algorithm in estimating selection coefficients and inferring the correct mode of selection across a variety of selection regimes, we simulated datasets under several combinations of parameters and report the accuracy of the inference.
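To make the notion of a simulated time-series dataset concrete, the sketch below draws allele counts at a few sampling times from a given population frequency trajectory. It assumes binomial sampling of allele copies at each time point, which is a common observation model but is not specified in the text above. The placeholder trajectory, sampling times, and sample sizes are hypothetical values chosen only for the example.

```python
import numpy as np


def sample_time_series(trajectory, sample_times, sample_sizes, rng):
    """Draw observed counts of allele A at the given sampling times.

    Assumes each sampled allele copy is A with probability equal to the
    population frequency in that generation (binomial sampling).
    """
    trajectory = np.asarray(trajectory, dtype=float)
    sample_times = np.asarray(sample_times, dtype=int)
    sample_sizes = np.asarray(sample_sizes, dtype=int)
    # One binomial draw per sampling time, vectorized over time points.
    return rng.binomial(sample_sizes, trajectory[sample_times])


rng = np.random.default_rng(1)
# Placeholder trajectory over generations 0..100; in practice this would
# come from a population-genetic simulation.
trajectory = np.linspace(0.1, 0.6, 101)
sample_times = np.array([0, 25, 50, 75, 100])
sample_sizes = np.array([20, 20, 20, 20, 20])  # sampled allele copies

counts = sample_time_series(trajectory, sample_times, sample_sizes, rng)
print(counts)
```

The resulting triples of sampling time, sample size, and observed count are the kind of input a time-series inference method would consume.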
Discussion
In this work, we presented a novel method to compute maximum likelihood estimates (MLEs) for general diploid selection coefficients from time-series genetic data. To this end, we extended the framework in [16] for the additive case and derived an EM-HMM algorithm to estimate the parameters of diploid selection. We also showed that the diploid EM-HMM framework can be constrained to bespoke one-parameter models of selection via the method of Lagrange multipliers.
Acknowledgments
We want to thank Xinyi Li, Xiaoheng Cheng, Constanza de la Fuente, and Maanasa Raghavan for helpful comments on the method and the data analysis. Moreover, we thank Jeremy Berg, Maryn Carlson and Rowan Hart for comments on the manuscript. In addition, we thank the members of the Raghavan, Berg, and Novembre labs for valuable feedback throughout the project.
Citation: Fine AG, Steinrücken M (2025) A novel expectation-maximization approach to infer general diploid selection from time-series genetic data. PLoS Genet 21(7): e1011769. https://doi.org/10.1371/journal.pgen.1011769
Editor: Parul Johri, Arizona State University - Tempe Campus: Arizona State University, UNITED STATES OF AMERICA
Received: December 13, 2024; Accepted: June 11, 2025; Published: July 22, 2025
Copyright: © 2025 Fine, Steinrücken. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Data Availability: All code to generate the results is available at https://github.com/steinrue/EMSel. The ancient DNA data for the analysis was downloaded from the Allen Ancient DNA Resource (AADR) at https://doi.org/10.7910/DVN/FFIDCW (Version 7.0).
Funding: AGF was supported by the Department of Education, Graduate Assistance in Areas of National Need (GAANN), Grant #P200A210054. AGF and MS were supported by the National Institute of General Medical Sciences (NIGMS) of the National Institutes of Health under award R01GM146051 to MS. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
Competing interests: The authors have declared that no competing interests exist.