Supervised learning of high-confidence phenotypic subpopulations from single-cell data (Journal Article)


Authors: Ren, T.; Chen, C.; Danilov, A. V.; Liu, S.; Guan, X.; Du, S.; Wu, X.; Sherman, M. H.; Spellman, P. T.; Coussens, L. M.; Adey, A. C.; Mills, G. B.; Wu, L. Y.; Xia, Z.
Article Title: Supervised learning of high-confidence phenotypic subpopulations from single-cell data
Abstract: Accurately identifying phenotype-relevant cell subsets from heterogeneous cell populations is crucial for delineating the underlying mechanisms driving biological or clinical phenotypes. Here, by deploying a Learning with Rejection strategy, we developed a novel supervised learning framework called PENCIL to identify subpopulations associated with categorical or continuous phenotypes from single-cell data. By embedding a feature selection function into this flexible framework, for the first time, we were able to simultaneously select informative features and identify cell subpopulations, enabling accurate identification of phenotypic subpopulations otherwise missed by methods incapable of concurrent gene selection. Furthermore, the regression mode of PENCIL presents a novel ability for supervised phenotypic trajectory learning of subpopulations from single-cell data. We conducted comprehensive simulations to evaluate PENCIL’s versatility in simultaneous gene selection, subpopulation identification and phenotypic trajectory prediction. PENCIL is fast and scalable, analysing one million cells within 1 h. Using the classification mode, PENCIL detected T-cell subpopulations associated with melanoma immunotherapy outcomes. Moreover, when applied to single-cell RNA sequencing data from a patient with mantle cell lymphoma sampled at multiple timepoints during drug treatment, the regression mode of PENCIL revealed a transcriptional treatment response trajectory. Collectively, our work introduces a scalable and flexible infrastructure to accurately identify phenotype-associated subpopulations from single-cell data. © 2023, The Author(s), under exclusive licence to Springer Nature Limited.
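The abstract names a "Learning with Rejection" strategy without describing it. As a rough illustration only, the sketch below computes the generic learning-with-rejection loss from the machine-learning literature: a predictor h(x) whose sign gives the class, a rejector r(x) that accepts a sample when r(x) > 0, and a fixed cost c paid for every rejected sample. This is an assumption for illustration, not PENCIL's training objective, which this record does not describe; the function `rejection_loss` and its parameters are hypothetical names.

```python
import numpy as np


def rejection_loss(scores, reject_scores, labels, cost):
    """Empirical 0-1 learning-with-rejection loss (generic sketch, not PENCIL's loss).

    scores:        predictor outputs h(x); the sign is the predicted class
    reject_scores: rejector outputs r(x); a sample is accepted when r(x) > 0
    labels:        true labels in {-1, +1}
    cost:          penalty c paid for each rejected sample
    """
    accepted = reject_scores > 0
    misclassified = scores * labels <= 0
    # Accepted samples cost 1 if misclassified and 0 otherwise;
    # rejected samples always cost `cost`, regardless of correctness.
    per_sample = np.where(accepted, misclassified.astype(float), cost)
    return per_sample.mean()


# Hypothetical toy data: two confident correct predictions, two wrong low-margin ones.
h = np.array([2.0, -1.0, 0.3, -0.2])
y = np.array([1, -1, -1, 1])
r_all = np.ones(4)                        # accept every sample
r_sel = np.array([1.0, 1.0, -0.5, -0.5])  # reject the two low-margin samples

print(rejection_loss(h, r_all, y, cost=0.3))  # 0.5
print(rejection_loss(h, r_sel, y, cost=0.3))  # 0.15
```

Rejecting the two low-margin samples at cost 0.3 lowers the mean loss from 0.5 to 0.15. By loose analogy, the samples a rejector keeps play the role of the "high-confidence" cells that the title refers to, while rejected cells are left unassigned rather than forced into a phenotype.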
Keywords: cell proliferation; genes; oncology; cell culture; drug therapy; t-cells; trajectories; gene selection; single cells; learning frameworks; cell populations; feature selection; supervised learning; embeddings; cell data; cell subpopulations; features selection; high confidence; regression mode
Journal Title: Nature Machine Intelligence
Volume: 5
Issue: 5
ISSN: 2522-5839
Publisher: Nature Publishing Group
Date Published: 2023-05-01
Start Page: 528
End Page: 541
Language: English
DOI: 10.1038/s42256-023-00656-y
Provider: Scopus
Notes: Article -- Erratum issued, see DOI: 10.1038/s42256-023-00681-x -- Source: Scopus