Probabilistic classification of gene-by-treatment interactions on molecular count phenotypes
Yuriko Harigaya, Nana Matoba, Brandon D Le, Jordan M Valone, Jason L Stein, Michael I Love, William Valdar
Abstract
Genetic variation can modulate response to treatment (G×T) or environmental stimuli (G×E), both of which can be highly consequential in biomedicine. An effective approach to identifying G×T signals and gaining insight into molecular mechanisms is mapping quantitative trait loci (QTL) of molecular count phenotypes, such as gene expression and chromatin accessibility, under multiple treatment conditions, which is termed response molecular QTL mapping. Although standard approaches evaluate the interaction between genetics and treatment conditions, they do not distinguish between meaningful interpretations such as whether a genetic effect is observed only in the treated condition or whether a genetic effect is observed always but accentuated in the treated condition.
Introduction
Gene-by-treatment (G×T) interactions describe associations between genotype and phenotype that are modulated by treatment or, equivalently, phenotypic responses to treatment that are modulated by genotype. Understanding G×T interactions is crucial for interpreting disease-associated genetic variants and, eventually, for clinical decision-making. These phenomena may also be called gene-by-environment (G×E) interactions in a broader context [2].
Materials and methods
Datasets and preprocessing
Primary human neural progenitor cell (hNPC) genotypes, RNA-seq, and ATAC-seq data were obtained from a previous study [11] and were based on the GRCh38 genome assembly. The genotype data were coded as {0,1,2} to represent the number of minor (alternative) alleles and contained 78 and 72 donors for the RNA- and ATAC-seq data, respectively. Among the previously generated data with multiple treatments, we focused on the vehicle and CHIR treatments as the control and treated conditions, respectively. The dataset contained one sample per combination of donor and treatment condition. That is, every donor was represented in both the control and treated conditions, and there was no missingness.
Results
A Bayesian model selection (BMS) framework for classifying G×T interactions with molecular count phenotypes
Overview of the framework.
We focus on the effects of a treatment on the association between a genotype and a molecular phenotype, such as gene expression and chromatin accessibility, measured by RNA sequencing (RNA-seq) [21] and the assay for transposase-accessible chromatin using sequencing (ATAC-seq) [22], respectively. In what follows, we refer to the unit being measured for a given molecular phenotype as a feature: for gene expression, the feature is a gene; for chromatin accessibility, the feature is an accessible chromatin region, often called a candidate cis-regulatory element (cCRE).
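As a rough illustration of the model-selection step named in this section's title, the following sketch applies Bayes' rule to turn log marginal likelihoods of candidate models into posterior model probabilities. It is a generic calculation, not the implementation used in this work: the function name, the four category labels, the uniform prior, and the numerical values are all hypothetical and chosen only for demonstration.

```python
import math

def posterior_model_probs(log_marginal_liks, prior_probs):
    """Posterior probability of each candidate model via Bayes' rule.

    log_marginal_liks: dict mapping model name -> log marginal likelihood
    prior_probs: dict mapping model name -> prior model probability
    """
    log_joint = {
        m: log_marginal_liks[m] + math.log(prior_probs[m])
        for m in log_marginal_liks
    }
    # Subtract the maximum before exponentiating so that very negative
    # log values do not underflow to zero.
    peak = max(log_joint.values())
    weights = {m: math.exp(v - peak) for m, v in log_joint.items()}
    total = sum(weights.values())
    return {m: w / total for m, w in weights.items()}

# Hypothetical categories and made-up log marginal likelihoods
# (assumptions for illustration, not values from the study).
log_ml = {
    "no_effect": -1050.0,
    "genotype_only": -1042.0,
    "treated_only": -1040.0,
    "both_accentuated": -1041.0,
}
prior = {m: 1.0 / len(log_ml) for m in log_ml}
post = posterior_model_probs(log_ml, prior)
best = max(post, key=post.get)
```

With a uniform prior, the ranking of the posterior probabilities follows the ranking of the marginal likelihoods, so `best` is the category with the largest log marginal likelihood; a non-uniform prior would shift the probabilities accordingly.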
Discussion
We have developed a method to help interpret and prioritize response molecular QTLs (i.e., pre-selected feature-SNP pairs with significant G×T interactions). The method uses BMS to assign posterior probability to different types of G×T interactions. Within this framework, we compared three different modeling approaches, log-NL, log-LM, and log-RINT. The first approach, log-NL, assumes that molecular signals are additive with respect to allelic counts and explicitly models the nonlinear relationship between the genotype and the molecular counts after the log transformation. The log-NL approach is justified based on previous studies as well as our analysis examining the adequacy of the model using experimental data [17–19], whereas the other approaches are commonly used.
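The nonlinearity that motivates the log-NL approach can be seen in a small numerical sketch. If each additional minor allele adds a fixed amount to the expected count, the steps between genotypes 0, 1, and 2 are equal on the count scale but unequal after the log transformation. The function names and the baseline and per-allele values below are hypothetical, and the code is not the parameterization used in the log-NL model itself.

```python
import math

def log_mean_additive(baseline, per_allele, genotype):
    """Log of the expected count when each minor allele adds
    `per_allele` to `baseline` on the count scale."""
    return math.log(baseline + per_allele * genotype)

def log_scale_steps(baseline, per_allele):
    """Changes in log expected count for genotype 0 -> 1 and 1 -> 2."""
    logs = [log_mean_additive(baseline, per_allele, g) for g in (0, 1, 2)]
    return logs[1] - logs[0], logs[2] - logs[1]

# Hypothetical values: a baseline of 100 counts, plus 50 counts per allele.
step01, step12 = log_scale_steps(baseline=100.0, per_allele=50.0)

# On the count scale both steps are +50; on the log scale they differ,
# so a straight line in genotype cannot describe the log expected count.
print(f"log-scale step 0->1: {step01:.3f}, step 1->2: {step12:.3f}")
```

Here the first step equals log(150/100) and the second equals log(200/150), so a model that is linear in genotype after the log transformation would not match an additive effect on the count scale exactly.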
Acknowledgments
We thank Samir Kelada and Yun Li of the University of North Carolina at Chapel Hill for helpful discussion and suggestions on this work.
Citation: Harigaya Y, Matoba N, Le BD, Valone JM, Stein JL, Love MI, et al. (2025) Probabilistic classification of gene-by-treatment interactions on molecular count phenotypes. PLoS Genet 21(4): e1011561. https://doi.org/10.1371/journal.pgen.1011561
Editor: Jingjing Yang, Emory University School of Medicine, UNITED STATES OF AMERICA
Received: August 8, 2024; Accepted: December 31, 2024; Published: April 9, 2025
Copyright: © 2025 Harigaya et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Data Availability: All code to generate the results in this paper can be found at the GitHub repository: https://github.com/yharigaya/classifygxt-paper and doi: 10.5281/zenodo.14210108. The experimental data used in this study have been published in Matoba et al. (2024) (doi: 10.1038/s41593-024-01773-6), with code used for that paper available on BitBucket: https://bitbucket.org/steinlabunc/wnt-rqtls/. The data can be accessed via the Database of Genotypes and Phenotypes (dbGaP) at https://www.ncbi.nlm.nih.gov/gap/ with the accession number phs003642.v1.p1. Numerical data that underlie graphs or summary statistics can be found at https://zenodo.org/records/14829509.
Funding: YH, NM, BDL, JMV, JSL, and MIL were supported by NIH National Institute of Mental Health R01 MH118349. YH and WV were supported by NIH National Institute of General Medical Sciences R35 GM127000. Additional support to NM, BDL, JMV, and JSL was from NIH National Institute of Mental Health R01 MH120125 and R01 MH121433. JMV and BDL were also supported in part by NIH T32 training grants (T32GM135123 and T32GM067553, respectively). The URL for the NIH funders is https://www.nih.gov/institutes-nih/list-institutes-centers. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
Competing interests: The authors have declared that no competing interests exist.