
Evolutionary turnover of key amino acids explains conservation of function without conservation of sequence in transcriptional activation domains

Claire J. LeBlanc, Jordan Stefani, Melvin Soriano, Angelica W. Y. Lam, Marissa A. Zintel, Sanjana R. Kotha, Emily P. Chase, Giovani Pimentel-Solorio, Aditya Vunnum, Gean Hu, Katherine L. Flug, Aaron Fultineer, Niklas Hummel, Max V. Staller

Abstract

In folded protein domains, protein function is frequently more conserved than amino acid sequence because highly diverged sequences can fold into equivalent 3D structures with identical function. During evolution, intrinsically disordered protein regions (IDRs) often experience rapid amino acid sequence divergence, but because they do not fold into stable 3D structures, it remains largely unknown when and how function is conserved. 

Introduction

The evolution of eukaryotic transcription factors (TFs) contains a paradox: TF protein sequences diverge quickly but maintain function over long evolutionary distances. For example, the master regulator of eye development in mice, Pax6, induces ectopic eyes in flies, and fly Pax6 (eyeless) creates ectopic eye structures in frogs and mice [1–3]. While the DNA-binding domains (DBDs) are 96% identical, eye induction requires the intrinsically disordered regions (IDRs), which are only 35.5% identical.
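Percent identity figures such as 96% and 35.5% depend on how identical positions are counted against an alignment. The minimal sketch below assumes two sequences that have already been aligned (gaps written as "-") and divides identical residues by the number of columns in which at least one sequence has a residue. This denominator is one common convention and is an assumption here; the convention behind the Pax6 numbers is not stated in this section, and other tools divide by the shorter ungapped length instead.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity between two pre-aligned sequences.

    Assumed convention: identical residues divided by the number of
    alignment columns where at least one sequence has a residue
    (gap-gap columns are ignored).
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    # Drop columns that are gaps in both sequences.
    columns = [(a, b) for a, b in zip(aligned_a, aligned_b)
               if not (a == gap and b == gap)]
    if not columns:
        raise ValueError("alignment has no residue columns")
    # A column counts as identical only if both residues match and are not gaps.
    identical = sum(1 for a, b in columns if a == b and a != gap)
    return 100.0 * identical / len(columns)


# Toy, hypothetical aligned fragments: 4 identities over 6 columns.
print(round(percent_identity("MKT-AW", "MKTLAF"), 1))  # 66.7
```

Because the result shifts with the gap convention, reported identities are most comparable when the same alignment method and denominator are used for every domain being compared.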

Materials and methods

Identification of homologous sequences

We computationally screened for homologs of S. cerevisiae Gcn4. We started with a hand-collected set of forty-nine homologs, forty-eight of which contained the WxxLF motif [15, 45]. To find new homologs, we used two criteria: the bZIP DNA-binding domain (IPR004827) and the regular expression Wx[SPA]LF for the WxxLF motif. These criteria distinguished Gcn4 homologs from other leucine zipper (bZIP) DNA-binding domain TFs.
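The two screening criteria can be combined as a simple filter. In this sketch, "x" in Wx[SPA]LF is read as "any residue", which in Python regular-expression syntax is ".". The InterPro annotations are assumed to be precomputed and supplied as a dictionary of protein ID to InterPro accessions; the function name, input format, and toy sequences are hypothetical and are not taken from the authors' scripts.

```python
import re

# 'x' in Wx[SPA]LF denotes any residue; in Python regex syntax that is '.'.
WXXLF = re.compile(r"W.[SPA]LF")
BZIP_INTERPRO = "IPR004827"  # bZIP DNA-binding domain accession cited in the text


def candidate_gcn4_homologs(sequences, interpro_hits):
    """Return IDs that pass both criteria (hypothetical helper).

    sequences: dict mapping protein ID -> uppercase amino acid sequence
    interpro_hits: dict mapping protein ID -> set of InterPro accessions,
        assumed to be precomputed by a separate annotation step
    """
    keep = []
    for pid, seq in sequences.items():
        has_bzip = BZIP_INTERPRO in interpro_hits.get(pid, set())
        has_motif = WXXLF.search(seq) is not None
        if has_bzip and has_motif:
            keep.append(pid)
    return keep


# Toy data: p2 fails the motif (G at the [SPA] position), p3 lacks the bZIP annotation.
seqs = {"p1": "MSEWTSLFDN", "p2": "MSEWTGLFDN", "p3": "MPAWQALFKK"}
hits = {"p1": {"IPR004827"}, "p2": {"IPR004827"}, "p3": set()}
print(candidate_gcn4_homologs(seqs, hits))  # ['p1']
```

Requiring both criteria is what separates candidates from bZIP proteins in general: the domain annotation alone would also admit other leucine zipper TFs, while the motif alone could match unrelated proteins.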

Results

Identification of Gcn4 homologs

As a model system to study TF evolution, we used Gcn4, a yeast stress response TF. Gcn4 contains an intrinsically disordered region followed by a C-terminal DBD. We identified diverse homologs, quantified sequence divergence, experimentally mapped activation domains with a tiling strategy, and looked for conservation of activation domain function without conservation of sequence.
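A tiling strategy splits each protein into overlapping fragments so that every region is tested in several contexts. The sketch below generates fixed-length tiles at a fixed step and adds one final tile anchored at the C-terminus if regular stepping would miss the last residues. The tile length and step are illustrative placeholders, not the values used in this study, and the handling of the C-terminal end is an assumption.

```python
def tile_sequence(seq: str, length: int = 40, step: int = 10):
    """Split a protein sequence into overlapping fixed-length tiles.

    Returns a list of (start, tile) pairs with 0-based starts. The default
    length and step are hypothetical placeholders.
    """
    if length <= 0 or step <= 0:
        raise ValueError("length and step must be positive")
    # A sequence no longer than one tile becomes a single tile.
    if len(seq) <= length:
        return [(0, seq)]
    starts = list(range(0, len(seq) - length + 1, step))
    # Add a C-terminal tile if the last regular tile stops short of the end.
    if starts[-1] + length < len(seq):
        starts.append(len(seq) - length)
    return [(s, seq[s:s + length]) for s in starts]


print(tile_sequence("ABCDEFGHIJ", length=4, step=4))
# [(0, 'ABCD'), (4, 'EFGH'), (6, 'GHIJ')]
```

A step smaller than the tile length gives each residue coverage in multiple tiles, which helps locate activation domain boundaries from overlapping measurements.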

Discussion

By functionally screening protein fragments from a family of homologous TFs, we demonstrate how conservation of activation function in TF IDRs without conservation of sequence arises from neutral drift, stabilizing selection, evolutionary turnover of complete activation domains, and evolutionary turnover of key acidic and F residues within activation domains. In some cases, gain and loss of upstream activation domains appear to result from turnover of individual residues.

Acknowledgments

We would like to thank Nick Ingolia, Zeba Wunderlich, Rachel Brem, Alex Holehouse, Andrew Murray, Shahar Sukenik, Michael Botchen, and Ashley Wolf for helpful comments on the manuscript.

Citation: LeBlanc CJ, Stefani J, Soriano M, Lam AWY, Zintel MA, Kotha SR, et al. (2026) Evolutionary turnover of key amino acids explains conservation of function without conservation of sequence in transcriptional activation domains. PLoS Genet 22(3): e1012069. https://doi.org/10.1371/journal.pgen.1012069

Editor: Michael J. Guertin, UConn Health Center: UConn Health, UNITED STATES OF AMERICA

Received: November 14, 2025; Accepted: February 24, 2026; Published: March 16, 2026

Copyright: © 2026 LeBlanc et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Data Availability: All raw sequencing data have been deposited at NIH SRA Accession #PRJNA1186961: http://www.ncbi.nlm.nih.gov/bioproject/1186961. All analysis scripts are deposited on GitHub and Zenodo (10.5281/zenodo.14201918): https://github.com/staller-lab/Gcn4-evolution and https://github.com/staller-lab/labtools/tree/main/src/labtools/adtools. All processed data are attached in supplemental tables (S6, S11, and S12 Tables). Processed sequencing read counts are in S13 Table.

Funding: CJL was funded by T32HG4725, MAZ was funded by T32GM148378, and AF was funded by T32GM146614. AL was funded by UC Berkeley URAP. MS and SRK were funded by UC Berkeley SEED Scholars Program. SRK was also funded by UC Berkeley SURF. GH was funded by UC Berkeley Rose Hill. GPS was funded by the UC Berkeley BSP scholars, the McNair Scholars, and UC Berkeley SURF programs. This work was supported by the Burroughs Wellcome Fund PDEP, Simons Foundation grant 1018719, NSF grant 2112057, and NIH grant R35GM150813 awarded to MVS. MVS is a Biohub, San Francisco Investigator. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

Competing interests: The authors have declared that no competing interests exist.