Assessing variant effect predictors and disease mechanisms in intrinsically disordered proteins
Mohamed Fawzy, Joseph A. Marsh
Abstract
Intrinsically disordered regions (IDRs) are central to diverse cellular processes but present unique challenges for interpreting genetic variants implicated in human disease. Unlike structured protein domains, IDRs lack stable three-dimensional conformations and are often involved in regulation through transient interactions and post-translational modifications. These features can affect both the distribution of pathogenic variants and the performance of computational tools used to predict their effects.
Introduction
Intrinsically disordered regions (IDRs) of proteins lack stable secondary or tertiary structure under physiological conditions, instead adopting flexible conformations that can shift in response to binding partners or cellular cues [1–3]. This structural plasticity allows IDRs to act as molecular hubs in diverse cellular processes such as transcriptional regulation, signal transduction, and protein–protein interactions [4].
Materials and methods
Structural classification of human residues
Every residue in the human proteome, considering the primary UniProt isoform of each protein-coding gene, was classified as ordered, intermediate or disordered. To do this, we used the predicted local distance difference test (pLDDT) score from AlphaFold2 (AF2) [16]. pLDDT correlates inversely with the flexibility of protein structures: AF2 assigns low pLDDT scores to regions that are highly flexible and lack a fixed 3D structure, such as IDRs and linkers between ordered regions [17,18].
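As an illustration of this kind of per-residue classification, the following minimal sketch reads pLDDT values from a PDB-format AF2 model, where pLDDT is stored in the B-factor column, and assigns each residue to one of the three classes. The cut-offs (50 and 70) and the function names are assumptions chosen for illustration only; this section does not state the thresholds used, and the sketch is not the pipeline code.

```python
from typing import Dict, Tuple

# Hypothetical cut-offs for illustration only; the text above does not
# specify the thresholds used to separate the three classes.
DISORDERED_MAX = 50.0  # pLDDT below this value -> "disordered"
ORDERED_MIN = 70.0     # pLDDT at or above this value -> "ordered"

ResidueKey = Tuple[str, int]  # (chain identifier, residue number)


def plddt_per_residue(pdb_text: str) -> Dict[ResidueKey, float]:
    """Collect one pLDDT value per residue from the CA atoms of a model.

    AF2 model files store pLDDT in the B-factor field, which occupies
    columns 61-66 of a PDB-format ATOM record.
    """
    scores: Dict[ResidueKey, float] = {}
    for line in pdb_text.splitlines():
        # Atom name is in columns 13-16; one CA atom per residue suffices.
        if line.startswith("ATOM") and line[12:16].strip() == "CA":
            chain = line[21]           # column 22
            resnum = int(line[22:26])  # columns 23-26
            scores[(chain, resnum)] = float(line[60:66])
    return scores


def classify(plddt: float) -> str:
    """Map a pLDDT score to a class using the assumed cut-offs."""
    if plddt < DISORDERED_MAX:
        return "disordered"
    if plddt < ORDERED_MIN:
        return "intermediate"
    return "ordered"


def classify_model(pdb_text: str) -> Dict[ResidueKey, str]:
    """Classify every residue found in a PDB-format AF2 model."""
    return {key: classify(score)
            for key, score in plddt_per_residue(pdb_text).items()}
```

Passing the text of a model file to `classify_model` returns a dictionary keyed by chain and residue number, which can then be joined to variant positions. Changing the two constants is enough to explore how sensitive the class boundaries are to the chosen cut-offs.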
Results and discussion
Pathogenic missense variants are depleted in disordered regions
To define intrinsically disordered regions, we used AlphaFold2 (AF2) pLDDT scores, which correlate inversely with structural order [16–18]. AF2 pLDDT is recognised as a robust predictor of disorder [17]. While some studies suggest that pLDDT may approach or even exceed the performance of traditional disorder prediction tools in certain contexts [19,20], benchmarking efforts such as CAID-2 [21] have shown that no single method consistently outperforms others across all benchmarks. Thus, we use pLDDT here primarily due to its consistent proteome-wide availability and demonstrated utility for defining long disordered regions.
Acknowledgments
We thank Benjamin Livesey and Mihaly Badonyi for helpful comments on the manuscript. This work has made use of the resources provided by the Edinburgh Compute and Data Facility (ECDF) (http://www.ecdf.ed.ac.uk/).
Citation: Fawzy M, Marsh JA (2025) Assessing variant effect predictors and disease mechanisms in intrinsically disordered proteins. PLoS Comput Biol 21(8): e1013400. https://doi.org/10.1371/journal.pcbi.1013400
Editor: Nir Ben-Tal, Tel Aviv University, ISRAEL
Received: April 1, 2025; Accepted: August 6, 2025; Published: August 19, 2025
Copyright: © 2025 Fawzy, Marsh. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Data Availability: Associated datasets are available at https://doi.org/10.6084/m9.figshare.c.7747895.v1 and the pipeline code is shared at https://github.com/drsamibioinfo/VEPS_IN_DISORDER/.
Funding: This project was supported by funding to JAM from the European Research Council (ERC) under the European Union's Horizon 2020 research and innovation programme (grant agreement No. 101001169) and by the Medical Research Council (MRC) Human Genetics Unit core grant (MC_UU_00035/9). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
Competing interests: The authors have declared that no competing interests exist.