Stabilized marker gene identification and functional annotation from single-cell transcriptomic data
Sandesh Acharya, Pathum Kossinna, Qingrun Zhang, Jiami Guo
Abstract
With the rapid emergence of single-cell transcriptomics datasets, identifying reproducible marker genes and functionally annotating cell types or states is becoming increasingly important. Conventional methods that rely on differential gene expression (DEG) analysis lack both consistency across datasets and functional annotation of the selected markers. Here, we present scSCOPE, an R-based platform that uses stabilized LASSO (Least Absolute Shrinkage and Selection Operator) feature selection, bootstrapped co-expression networks, and pathway enrichment to identify reproducible and functionally relevant marker genes and associated pathways in scRNAseq datasets.
Introduction
Single-cell RNA sequencing (scRNAseq) enables high-throughput transcriptome profiling of millions of cells at a time, and has transformed our understanding of cell heterogeneity, physiological state, and function in varied tissues across development and disease [1–7]. Cell type identification in scRNAseq requires clustering cells into distinct cell types based on their gene expression profiles, followed by identifying the marker genes associated with each cell type [8].
Materials and methods
scSCOPE framework
The SCOPE framework was designed to identify candidate genes and pathways that separate normal and diseased tissues in bulk RNA-seq datasets. We have further developed the SCOPE framework so that it can be applied directly to single-cell datasets to identify cell-type-specific marker genes and pathways.
Results
The framework of scSCOPE
The input for scSCOPE is a clustered scRNAseq dataset with its expression matrix (Fig 1A). Using the cluster assignments, scSCOPE first runs a bootstrapped logistic LASSO (see Methods) to identify “core genes” that robustly separate two groups of cells across multiple iterations [30] (Fig 1B). These “core genes” then undergo bootstrapped co-expression network analysis (see Methods) to identify their stably co-expressed genes (“secondary genes”) [28] (Fig 1C).
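The core-gene step described above can be illustrated with a minimal sketch. It is not the scSCOPE implementation (scSCOPE is an R package) and covers only the bootstrapped logistic LASSO step, not the co-expression or pathway steps. It resamples cells with replacement, fits an L1-penalized logistic regression on each resample by proximal gradient descent, and keeps the genes whose coefficient is non-zero in most resamples. The function names, the toy data, and every parameter value (`lam`, `step`, `n_iter`, `n_boot`, `min_freq`) are assumptions chosen for illustration.

```python
import math
import random


def soft_threshold(value, penalty):
    """Proximal operator of the L1 norm: shrink `value` toward zero."""
    if value > penalty:
        return value - penalty
    if value < -penalty:
        return value + penalty
    return 0.0


def fit_logistic_lasso(X, y, lam=0.1, step=0.1, n_iter=200):
    """L1-penalized logistic regression via proximal gradient descent.

    The intercept is left unpenalized. Returns one weight per gene.
    """
    n, p = len(X), len(X[0])
    w, b = [0.0] * p, 0.0
    for _ in range(n_iter):
        grad_w, grad_b = [0.0] * p, 0.0
        for xi, yi in zip(X, y):
            z = b + sum(wj * xj for wj, xj in zip(w, xi))
            z = max(-30.0, min(30.0, z))  # keep exp() numerically safe
            err = 1.0 / (1.0 + math.exp(-z)) - yi
            grad_b += err / n
            for j in range(p):
                grad_w[j] += err * xi[j] / n
        b -= step * grad_b
        w = [soft_threshold(wj - step * gj, step * lam)
             for wj, gj in zip(w, grad_w)]
    return w


def stable_core_genes(X, y, genes, n_boot=20, min_freq=0.8, lam=0.1, seed=0):
    """Return genes selected in at least `min_freq` of bootstrap fits."""
    rng = random.Random(seed)
    n = len(X)
    counts = {g: 0 for g in genes}
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]  # resample cells
        w = fit_logistic_lasso([X[i] for i in idx], [y[i] for i in idx], lam=lam)
        for g, wj in zip(genes, w):
            if wj != 0.0:
                counts[g] += 1
    freq = {g: c / n_boot for g, c in counts.items()}
    return [g for g in genes if freq[g] >= min_freq], freq


# Toy data (hypothetical): 40 cells in two clusters, 4 genes.
# Only G1 differs between clusters; G2-G4 are pure noise.
data_rng = random.Random(1)
genes = ["G1", "G2", "G3", "G4"]
labels = [0] * 20 + [1] * 20
expr = [[data_rng.gauss(2.0 * lab, 0.5)] + [data_rng.gauss(0.0, 1.0) for _ in range(3)]
        for lab in labels]

core, freq = stable_core_genes(expr, labels, genes)
print("core genes:", core)
print("selection frequency:", freq)
```

Selecting on the frequency across resamples, rather than on a single fit, is what makes the resulting gene list less sensitive to which cells happen to be sampled. In practice an established solver (for example a coordinate-descent LASSO implementation) would replace the hand-written fitting loop.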
Discussion
Here we present scSCOPE, an optimised toolbox for single-cell RNA-seq-based cell-type identification and functional annotation. To the best of our knowledge, scSCOPE is currently the only computational tool that combines gene co-expression, pathway enrichment, and differential expression to identify marker genes in single-cell transcriptomics data.
Acknowledgments
We acknowledge support from the HBI International Graduate Recruitment in Neuroscience Scholarship (S.A.) and the ACHRI Graduate Scholarship (S.A.). We would also like to thank Mehr Malhotra for proofreading our article.
Citation: Acharya S, Kossinna P, Zhang Q, Guo J (2025) Stabilized marker gene identification and functional annotation from single-cell transcriptomic data. PLoS Comput Biol 21(10): e1013574. https://doi.org/10.1371/journal.pcbi.1013574
Editor: Xiuwei Zhang, Georgia Institute of Technology College of Computing, UNITED STATES OF AMERICA
Received: December 5, 2024; Accepted: September 30, 2025; Published: October 17, 2025
Copyright: © 2025 Acharya et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Data Availability: Publicly available scRNA-seq datasets for methods benchmarking: Mouse Immune (GSE109999, GSE168158), Human PBMC (GSE132044), Tabula Muris (https://figshare.com/projects/Tabula_Muris_Transcriptomic_characterization_of_20_organs_and_tissues_from_Mus_musculus_at_single_cell_resolution/27733).
Code Availability: scSCOPE is freely available to all users from our GitHub repository (https://github.com/QingrunZhangLab/scSCOPE). Pathway and gene network plots can be generated by users following the detailed instructions available on our GitHub repository. Additionally, users can explore these plots interactively through web interfaces accessible at Gene Network (https://sant7.shinyapps.io/geneNetwork/) and Pathway Network (https://sant7.shinyapps.io/pathwayNetwork/).
Funding: This research was supported by The New York Stem Cell Foundation (NYSCF-R-N163 to J.G.) and NSERC Discovery Grant (RGPIN-2025-05344 to Q.Z.). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
Competing interests: The authors have declared that no competing interests exist.