Sparse integrative clustering of multiple omics data sets (Journal Article)


Authors: Shen, R.; Wang, S.; Mo, Q.
Article Title: Sparse integrative clustering of multiple omics data sets
Abstract: High-resolution microarrays and second-generation sequencing platforms are powerful tools to investigate genome-wide alterations in DNA copy number, methylation and gene expression associated with a disease. An integrated genomic profiling approach measures multiple omics data types simultaneously in the same set of biological samples. Such an approach renders an integrated data resolution that would not be available with any single data type. In this study, we use penalized latent variable regression methods for joint modeling of multiple omics data types to identify common latent variables that can be used to cluster patient samples into biologically and clinically relevant disease subtypes. We consider lasso [J. Roy. Statist. Soc. Ser. B 58 (1996) 267-288], elastic net [J. R. Stat. Soc. Ser. B Stat. Methodol. 67 (2005) 301-320] and fused lasso [J. R. Stat. Soc. Ser. B Stat. Methodol. 67 (2005) 91-108] methods to induce sparsity in the coefficient vectors, revealing important genomic features that have significant contributions to the latent variables. An iterative ridge regression is used to compute the sparse coefficient vectors. In model selection, a uniform design [Monographs on Statistics and Applied Probability (1994) Chapman & Hall] is used to seek "experimental" points that are scattered uniformly across the search domain for efficient sampling of tuning parameter combinations. We compared our method to sparse singular value decomposition (SVD) and penalized Gaussian mixture model (GMM) using both real and simulated data sets. The proposed method is applied to integrate genomic, epigenomic and transcriptomic data for subtype analysis in breast and lung cancer data sets. © Institute of Mathematical Statistics.
Keywords: penalized regression; latent variable approach; sparse integrative clustering
Journal Title: Annals of Applied Statistics
Volume: 7
Issue: 1
ISSN: 1932-6157
Publisher: Institute of Mathematical Statistics  
Date Published: 2013-01-01
Start Page: 269
End Page: 294
Language: English
PROVIDER: scopus
DOI: 10.1214/12-AOAS578
PMCID: PMC3935438
PUBMED: 24587839
Notes: Export Date: 1 May 2013; Source: Scopus
MSK Authors
  1. Ronglai Shen