A Quantitative Genetic Model of Background Selection in Humans
Vince Buffalo, Andrew D. Kern
Abstract
Across the human genome, there are large-scale fluctuations in genetic diversity caused by the indirect effects of selection. This “linked selection signal” reflects the impact of selection according to the physical placement of functional regions and recombination rates along chromosomes. Previous work has shown that purifying selection acting against the steady influx of new deleterious mutations at functional portions of the genome shapes patterns of genomic variation. To date, statistical efforts to estimate purifying selection parameters from linked selection models have relied on classic Background Selection theory, which is only applicable when new mutations are so deleterious that they cannot fix in the population. Here, we develop a statistical method based on a quantitative genetics view of linked selection that models how polygenic additive fitness variance distributed along the genome increases the rate of stochastic allele frequency change. By jointly predicting the equilibrium fitness variance and substitution rate due to both strongly and weakly deleterious mutations, we estimate the distribution of fitness effects (DFE) and mutation rate across three geographically distinct human samples.
Introduction
The continual influx of new mutations into populations is the ultimate source of all adaptations, but the vast majority of mutations either do not affect fitness or are deleterious. Natural selection works to eliminate these deleterious mutations from the population; thus we expect them to appear at low frequencies within populations [1] and to be less likely to fix between lineages. Conserved genomic regions reflect the product of hundreds of millions of years of evolutionary optimization; thus the overwhelming majority of segregating variation in these regions will have deleterious fitness effects. Consequently, a good predictor of whether a new mutation will reduce fitness is whether it occurs in a region of the genome that has been conserved over phylogenetic timescales [2, 3]. Moreover, segregating rare variation in these regions is responsible for a significant proportion of the genetic contribution to phenotypic variation and disease in humans [4–7].
Methods
Solving the B’ equations for each segment
Our software bgspy [84] first calculates the equilibrium additive genetic fitness variation and deleterious substitution rate for each user-specified segment in the genome. These equilibria are calculated across grids of the DFE-weighted mutation rate m_i and the selection coefficient s_j by numerically solving the system of B’ equations at each grid point.
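As a minimal sketch of the grid strategy only, and not of bgspy's actual solver, the Python below tabulates a per-segment quantity over every (m_i, s_j) pair. It uses the closed-form reduction from classic BGS theory as a stand-in, since that formula can be written in one line, whereas the B’ system has to be solved numerically at each grid point. The function names, grid bounds, segment length, and recombination fraction are all illustrative assumptions.

```python
import math


def log_grid(lo, hi, n):
    """Return n log10-spaced values from lo to hi (inclusive up to rounding)."""
    step = (math.log10(hi) - math.log10(lo)) / (n - 1)
    return [10 ** (math.log10(lo) + k * step) for k in range(n)]


def classic_bgs_B(m, s, seg_len, rec_frac):
    """Classic BGS reduction in diversity caused by one segment.

    Stand-in for the B' system (assumption): the standard closed-form
    approximation exp(-U*s / (s + r*(1 - s))**2), with the whole segment
    treated as a single point at recombination fraction r from the focal site.

    m        -- deleterious mutation rate per basepair (hypothetical units)
    s        -- selection coefficient against heterozygotes
    seg_len  -- segment length in basepairs
    rec_frac -- recombination fraction between focal site and segment
    """
    U = m * seg_len  # total deleterious mutation rate of the segment
    return math.exp(-U * s / (s + rec_frac * (1.0 - s)) ** 2)


def tabulate(m_grid, s_grid, seg_len, rec_frac):
    """Evaluate the stand-in at every (m_i, s_j); rows index m, columns index s."""
    return [
        [classic_bgs_B(m, s, seg_len, rec_frac) for s in s_grid]
        for m in m_grid
    ]


# Illustrative grids; the bounds and sizes are assumptions, not bgspy defaults.
m_grid = log_grid(1e-11, 1e-7, 5)
s_grid = log_grid(1e-5, 1e-1, 5)
table = tabulate(m_grid, s_grid, seg_len=1000, rec_frac=1e-4)

for m, row in zip(m_grid, table):
    print(f"m={m:.1e}", " ".join(f"{b:.6f}" for b in row))
```

Precomputing a table like this once per segment means that later likelihood evaluations can look up or interpolate values for any candidate mutation rate and DFE rather than re-solving the equations each time.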
Results
We provide two main classes of results. First, we show simulation results that demonstrate the accuracy of the SC16 model over the BGS approximation across the parameter space, as well as validations of the composite likelihood strategy we use to fit the SC16 model. Second, we provide fits of our method to human genome data, where we compare models fit using different annotations and present the estimated DFEs and predictions of the deleterious substitution rate.
Discussion
New mutations at functionally important regions of the genome are a major source of fitness variation in natural populations, as the vast majority of such mutations are deleterious. Purifying selection, working to remove these deleterious variants, perturbs genealogies at linked sites, creating large-scale patterns in genomic diversity. While this has been recognized for decades [8, 17], the availability of genomic data allows for methods to estimate the degree to which purifying selection shapes genomic variation and at what scale.
Acknowledgments
We would like to thank Doc Edge, Ben Good, Taylor Kessinger, Graham McVicker, Priya Moorjani, David Murphy, Rasmus Nielsen, Guy Sella, Joshua Schraiber, and Peter Sudmant for helpful discussions, and Martin Kircher for providing modified CADD tracks. We thank Brian Charlesworth, Graham Coop, Matt Hahn, Nate Pope, and Enrique Santiago for comments on the manuscript.
Citation: Buffalo V, Kern AD (2024) A quantitative genetic model of background selection in humans. PLoS Genet 20(3): e1011144. https://doi.org/10.1371/journal.pgen.1011144
Editor: Bret Payseur, University of Wisconsin–Madison, UNITED STATES
Received: September 17, 2023; Accepted: January 19, 2024; Published: March 20, 2024
Copyright: © 2024 Buffalo, Kern. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Data Availability: All code from bgspy and our Jupyter Lab (Kluyver et al. n.d.) notebooks for analysis is available on GitHub (https://github.com/vsbuffalo/bprime). The main model fits are available as Python Pickle objects in the Dryad data repository (https://doi.org/10.5061/dryad.qnk98sfnv).
Funding: This research was supported by National Institutes of Health awards R35GM148253 and R01HG010774 to ADK. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
Competing interests: The authors have declared that no competing interests exist.