Enhancing nonlinear transcriptome- and proteome-wide association studies via trait imputation with applications to Alzheimer’s disease

Ruoyu He, Jingchen Ren, Mykhaylo M. Malakhov, Wei Pan

Abstract

Genome-wide association studies (GWAS) performed on large cohort and biobank datasets have identified many genetic loci associated with Alzheimer’s disease (AD). However, the younger demographic of biobank participants relative to the typical age of late-onset AD has resulted in an insufficient number of AD cases, limiting the statistical power of GWAS and any downstream analyses. To mitigate this limitation, several trait imputation methods have been proposed to impute the expected future AD status of individuals who may not have yet developed the disease.

Introduction

Alzheimer’s disease (AD), a complex polygenic neurodegenerative disorder and the most prevalent form of dementia, has captured significant attention within the genetics research community. Genetic mutations are recognized as a predominant factor in AD pathogenesis, with heritability estimates for late-onset AD ranging from 60% to 80% [1]. The advent of large biobanks and the development of genome-wide association studies (GWAS) have accelerated the identification of single-nucleotide polymorphisms (SNPs) and genetic loci linked to AD. In the past decade alone, the number of known GWAS loci significantly associated with AD has increased from 20 to over 90 [2–7]. Although these findings have greatly advanced our understanding of the genetic architecture of AD, the demographic makeup of most large biobanks means that statistical power for studying neurodegenerative conditions may plateau despite ever-larger sample sizes.

Materials and methods

2.1. Overview of TWAS/PWAS

We begin by outlining the TWAS/PWAS framework. For conciseness, we will use TWAS as an example, though the same methods are also applicable to proteomics and other types of molecular traits. The causal model for TWAS is illustrated in Fig 1a. Formally, denote $Y \in \mathbb{R}^{n}$ to be the outcome trait with $n$ samples, $X \in \mathbb{R}^{n}$ to be a gene’s or protein’s expression levels, and $Z \in \mathbb{R}^{n \times m}$ to be the genotype matrix with $m$ genetic variants. The TWAS model is as follows,
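As a point of reference, the conventional linear two-stage form of a TWAS can be sketched in a few lines. The sketch below assumes the textbook model $X = Zw + \varepsilon_X$, $Y = \beta X + \varepsilon_Y$, with $Y$ the outcome trait, $X$ the expression level, and $Z$ the genotype matrix; these symbols, the function name `two_stage_twas`, and all simulated values are illustrative assumptions, and the linear form is only a baseline, not the nonlinear approach that is the focus of this paper. Stage 1 estimates SNP weights in a reference panel with measured expression; stage 2 regresses the trait on genetically predicted expression in a separate GWAS sample.

```python
import numpy as np


def two_stage_twas(Z_ref, X_ref, Z_gwas, Y_gwas):
    """Linear two-stage TWAS sketch (hypothetical helper, not the authors' code).

    Stage 1: least-squares regression of expression on genotypes (reference panel).
    Stage 2: simple regression of the trait on predicted expression (GWAS sample).
    Returns the stage-2 effect estimate, its standard error, and the z-score.
    """
    # Stage 1: estimate SNP weights from centered genotypes and expression.
    Zc = Z_ref - Z_ref.mean(axis=0)
    Xc = X_ref - X_ref.mean()
    w_hat, *_ = np.linalg.lstsq(Zc, Xc, rcond=None)

    # Stage 2: predict expression in the GWAS sample and test its association.
    X_hat = (Z_gwas - Z_gwas.mean(axis=0)) @ w_hat
    Yc = Y_gwas - Y_gwas.mean()
    sxx = X_hat @ X_hat
    beta_hat = (X_hat @ Yc) / sxx
    resid = Yc - beta_hat * X_hat
    n = Yc.shape[0]
    se = np.sqrt((resid @ resid) / (n - 2) / sxx)
    return beta_hat, se, beta_hat / se


# Simulated data under the assumed linear model (all values are illustrative).
rng = np.random.default_rng(0)
m, n_ref, n_gwas = 10, 2000, 5000
w_true = np.full(m, 0.3)   # assumed SNP-to-expression weights
beta_true = 0.5            # assumed expression-to-trait effect

Z_ref = rng.binomial(2, 0.3, size=(n_ref, m)).astype(float)
X_ref = Z_ref @ w_true + rng.normal(size=n_ref)

Z_gwas = rng.binomial(2, 0.3, size=(n_gwas, m)).astype(float)
X_gwas = Z_gwas @ w_true + rng.normal(size=n_gwas)  # unobserved in practice
Y_gwas = beta_true * X_gwas + rng.normal(size=n_gwas)

beta_hat, se, z = two_stage_twas(Z_ref, X_ref, Z_gwas, Y_gwas)
print(f"beta_hat={beta_hat:.3f}, se={se:.3f}, z={z:.2f}")
```

Note that the stage-2 standard error here ignores the uncertainty in the estimated stage-1 weights, which is a common simplification in this kind of sketch.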

Results

In the following sections, we first evaluate the performance of each AD imputation method on GWAS tasks to justify our selection of imputation methods for TWAS/PWAS. We then detail our TWAS analysis workflow for HDL cholesterol and the results we obtained. Note that the target application of our study is AD, for which the number of UKB individuals with available diagnoses is very low.

Discussion

The results of this study underscore the potential of proxy phenotypes and LS-imputation to address the challenges posed by the small number of AD cases in large biobank data, which otherwise limits the discovery of genetic signals related to AD. By leveraging the proxy AD and LS-imputation methods, we observed consistent and reliable GWAS results across different datasets. Specifically, the comparison of GWAS summary statistics derived from imputed AD traits using either proxy AD or LS-imputation with those from other popular methods highlighted the robustness of these two approaches in capturing the genetic underpinnings of AD.

Acknowledgments

Access to the GTEx data was approved for dbGaP Project #26511, and the data were obtained from dbGaP accession number phs000424.v8.p2 on 10/13/2021. Access to the UK Biobank (UKB) data was approved through UKB Application #35107. Data from the Alzheimer’s Disease Sequencing Project (ADSP) were prepared, archived, and distributed by the National Institute on Aging Alzheimer’s Disease Data Storage Site (NIAGADS) at the University of Pennsylvania. The authors also acknowledge the Minnesota Supercomputing Institute (MSI) at the University of Minnesota for providing high-performance computing resources that contributed to the research results reported within this paper. 

Citation: He R, Ren J, Malakhov MM, Pan W (2025) Enhancing nonlinear transcriptome- and proteome-wide association studies via trait imputation with applications to Alzheimer’s disease. PLoS Genet 21(4): e1011659. https://doi.org/10.1371/journal.pgen.1011659

Editor: Hae Kyung Im, University of Chicago, UNITED STATES OF AMERICA

Received: August 21, 2024; Accepted: March 18, 2025; Published: April 10, 2025

Copyright: © 2025 He et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Data Availability: EADB GWAS summary statistics data are publicly available from the European Bioinformatics Institute (EBI) GWAS Catalog (https://www.ebi.ac.uk/gwas/) under accession no. GCST90027158. IGAP GWAS summary statistics data are publicly available from the EBI GWAS Catalog under accession no. GCST007511. Individual-level data from the UK Biobank (https://www.ukbiobank.ac.uk/), the Genotype-Tissue Expression (https://gtexportal.org/home/) Project, and the Alzheimer’s Disease Sequencing Project (https://adsp.niagads.org/) are available by application through their respective data access processes. The code used for the analyses presented in this paper is available at https://github.com/RuoyuHe/LS-imputation_TWAS. The code for DeLIVR is available at https://github.com/RuoyuHe/DeLIVR. The code for LS-imputation is available at https://github.com/ren328/LSimputing. The code for PRS-CS is available at https://github.com/getian107/PRScs. LDpred2 is supplied with the R package “bigsnpr” (https://github.com/privefl/bigsnpr).

Funding: This work was supported by the National Institutes of Health (NIH) under grants U01 AG073079 (RH, JR, WP), RF1 AG067924 (MM, WP) and R01 HL116720 (MM, WP). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

Competing interests: The authors have declared that no competing interests exist.