Unifying cancer and normal RNA sequencing data from different sources (Journal Article)


Authors: Wang, Q.; Armenia, J.; Zhang, C.; Penson, A. V.; Reznik, Ed; Zhang, L.; Minet, T.; Ochoa, A.; Gross, B. E.; Iacobuzio-Donahue, C. A.; Betel, D.; Taylor, B. S.; Gao, J.; Schultz, N.
Article Title: Unifying cancer and normal RNA sequencing data from different sources
Abstract: Driven by the recent advances of next generation sequencing (NGS) technologies and an urgent need to decode complex human diseases, a multitude of large-scale studies were conducted recently that have resulted in an unprecedented volume of whole transcriptome sequencing (RNA-seq) data, such as the Genotype Tissue Expression project (GTEx) and The Cancer Genome Atlas (TCGA). While these data offer new opportunities to identify the mechanisms underlying disease, the comparison of data from different sources remains challenging, due to differences in sample and data processing. Here, we developed a pipeline that processes and unifies RNA-seq data from different studies, which includes uniform realignment, gene expression quantification, and batch effect removal. We find that uniform alignment and quantification is not sufficient when combining RNA-seq data from different sources and that the removal of other batch effects is essential to facilitate data comparison. We have processed data from GTEx and TCGA and successfully corrected for study-specific biases, enabling comparative analysis between TCGA and GTEx. The normalized datasets are available for download on figshare. © 2018 The Author(s).
Journal Title: Scientific Data
Volume: 5
ISSN: 2052-4463
Publisher: Nature Publishing Group
Date Published: 2018-04-17
Start Page: 180061
Language: English
DOI: 10.1038/sdata.2018.61
PROVIDER: scopus
PMCID: PMC5903355
PUBMED: 29664468
Notes: Article -- Export Date: 1 May 2018 -- Source: Scopus
MSK Authors
  1. Jianjiong Gao
  2. Barry Stephen Taylor
  3. Nikolaus D Schultz
  4. Benjamin E Gross
  5. Eduard Reznik
  6. Alexander Vincent Penson
  7. Qingguo Wang
  8. Joshua Armenia
  9. Angelica Ochoa
  10. Liguo Zhang