ProteomeGenerator: A framework for comprehensive proteomics based on de novo transcriptome assembly and high-accuracy peptide mass spectral matching
Journal Article


Authors: Cifani, P.; Dhabaria, A.; Chen, Z.; Yoshimi, A.; Kawaler, E.; Abdel-Wahab, O.; Poirier, J. T.; Kentsis, A.
Article Title: ProteomeGenerator: A framework for comprehensive proteomics based on de novo transcriptome assembly and high-accuracy peptide mass spectral matching
Abstract: Modern mass spectrometry now permits genome-scale and quantitative measurements of biological proteomes. However, analysis of specific specimens is currently hindered by the incomplete representation of biological variability of protein sequences in canonical reference proteomes and the technical demands for their construction. Here, we report ProteomeGenerator, a framework for de novo and reference-assisted proteogenomic database construction and analysis based on sample-specific transcriptome sequencing and high-accuracy mass spectrometry proteomics. This enables the assembly of proteomes encoded by actively transcribed genes, including sample-specific protein isoforms resulting from non-canonical mRNA transcription, splicing, or editing. To improve the accuracy of protein isoform identification in non-canonical proteomes, ProteomeGenerator relies on statistical target-decoy database matching calibrated using sample-specific controls. Its current implementation includes automatic integration with MaxQuant mass spectrometry proteomics algorithms. We applied this method for the proteogenomic analysis of splicing factor SRSF2 mutant leukemia cells, demonstrating high-confidence identification of non-canonical protein isoforms arising from alternative transcriptional start sites, intron retention, and cryptic exon splicing as well as improved accuracy of genome-scale proteome discovery. Additionally, we report proteogenomic performance metrics for current state-of-the-art implementations of SEQUEST HT, MaxQuant, Byonic, and PEAKS mass spectral analysis algorithms. Finally, ProteomeGenerator is implemented as a Snakemake workflow within a Singularity container for one-step installation in diverse computing environments, thereby enabling open, scalable, and facile discovery of sample-specific, non-canonical, and neomorphic biological proteomes. © 2018 American Chemical Society.
Keywords: transcriptomics; proteogenomics; de novo database construction; peptide fractionation; peptide-spectral matching; protein isoform analysis; scoring function
Journal Title: Journal of Proteome Research
Volume: 17
Issue: 11
ISSN: 1535-3893
Publisher: American Chemical Society
Date Published: 2018-11-02
Start Page: 3681
End Page: 3692
Language: English
DOI: 10.1021/acs.jproteome.8b00295
PUBMED: 30295032
PROVIDER: scopus
PMCID: PMC6727203
Notes: Article -- Export Date: 3 December 2018 -- Source: Scopus
MSK Authors
  1. John Thomas Poirier
  2. Alex Kentsis
  3. Paolo Cifani
  4. Akihide Yoshimi
  5. Zining Chen