Pan-cancer identification of clinically relevant genomic subtypes using outcome-weighted integrative clustering (Journal Article)


Authors: Arora, A.; Olshen, A. B.; Seshan, V. E.; Shen, R.
Article Title: Pan-cancer identification of clinically relevant genomic subtypes using outcome-weighted integrative clustering
Abstract: Background: Comprehensive molecular profiling has revealed somatic variations in cancer at genomic, epigenomic, transcriptomic, and proteomic levels. The accumulating data has shown clearly that molecular phenotypes of cancer are complex and influenced by a multitude of factors. Conventional unsupervised clustering applied to a large patient population is inevitably driven by the dominant variation from major factors such as cell-of-origin or histology. Translation of these data into clinical relevance requires more effective extraction of information directly associated with patient outcome. Methods: Drawing from ideas in supervised text classification, we developed survClust, an outcome-weighted clustering algorithm for integrative molecular stratification focusing on patient survival. survClust was performed on 18 cancer types across multiple data modalities including somatic mutation, DNA copy number, DNA methylation, and mRNA, miRNA, and protein expression from the Cancer Genome Atlas study to identify novel prognostic subtypes. Results: Our analysis identified the prognostic role of high tumor mutation burden with concurrently high CD8 T cell immune marker expression and the aggressive clinical behavior associated with CDKN2A deletion across cancer types. Visualization of somatic alterations, at a genome-wide scale (total mutation burden, mutational signature, fraction genome altered) and at the individual gene level, using circomap further revealed indolent versus aggressive subgroups in a pan-cancer setting. Conclusions: Our analysis has revealed prognostic molecular subtypes not previously identified by unsupervised clustering. The algorithm and tools we developed have direct utility toward patient stratification based on tumor genomics to inform clinical decision-making. The survClust software tool is available at https://github.com/arorarshi/survClust. © 2020, The Author(s).
Keywords: adult; cancer survival; middle aged; major clinical study; somatic mutation; gene deletion; clinical feature; histopathology; glioma; cd8+ t lymphocyte; microrna; tumor volume; cohort analysis; mutational analysis; dna methylation; renal cell carcinoma; simulation; survival time; messenger rna; clinical decision making; computer simulation; cyclin dependent kinase inhibitor 2a; gene dosage; point mutation; statistical model; data analysis software; tumor microenvironment; isocitrate dehydrogenase 1; clinical outcome; idh1 gene; idh2 gene; cdkn2a gene; patient survival; cancer prognosis; isocitrate dehydrogenase 2; human; male; female; priority journal; article; supervised learning; oncogenomics; malignant neoplasm; supervised machine learning; integrative clustering; mrna expression level; protein expression level; cross validation; prognostic molecular stratification; clustering algorithm; survclust algorithm
Journal Title: Genome Medicine
Volume: 12
ISSN: 1756-994X
Publisher: BioMed Central Ltd
Date Published: 2020-12-03
Start Page: 110
Language: English
DOI: 10.1186/s13073-020-00804-8
PUBMED: 33272320
PROVIDER: scopus
PMCID: PMC7716509
Notes: Article -- Export Date: 4 January 2021 -- Source: Scopus
MSK Authors
  1. Venkatraman Ennapadam Seshan
  2. Ronglai Shen
  3. Arshi Arora