Information assessment on predicting protein-protein interactions Journal Article

Authors:	Lin, N.; Wu, B.; Jansen, R.; Gerstein, M.; Zhao, H.
Article Title:	Information assessment on predicting protein-protein interactions
Abstract:	Background: Identifying protein-protein interactions is fundamental for understanding the molecular machinery of the cell. Proteome-wide studies of protein-protein interactions are of significant value, but high-throughput experimental technologies suffer from high rates of both false positive and false negative predictions. In addition to high-throughput experimental data, many diverse types of genomic data can help predict protein-protein interactions, such as mRNA expression, localization, essentiality, and functional annotation. Evaluating the information contributed by different types of evidence helps to establish more parsimonious models with comparable or better prediction accuracy, and to obtain biological insight into the relationships between protein-protein interactions and other genomic information. Results: Our assessment is based on the genomic features used in a Bayesian network approach to predict protein-protein interactions genome-wide in yeast. In the special case where no information is missing for any of the features, our analysis shows that the functional classification contributes more information than expression correlations or essentiality. We also show that in this case alternative models, such as logistic regression and random forest, may be more effective than Bayesian networks for predicting interactions. Conclusions: In the restricted problem posed by the complete-information subset, we identified the MIPS and Gene Ontology (GO) functional similarity datasets as the dominant information contributors for predicting protein-protein interactions under the framework proposed by Jansen et al. Random forests based on the MIPS and GO information alone can give highly accurate classifications. In this particular subset of complete information, adding other genomic data does little to improve predictions. We also found that the data discretizations used in the Bayesian methods decreased classification performance. © 2004 Lin et al; licensee BioMed Central Ltd.
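The abstract ranks genomic features by their "information contribution" to predicting interactions. One standard way to quantify how much a discretized feature X says about a binary interaction label Y is the mutual information I(X;Y) = sum over x,y of p(x,y) log2[p(x,y) / (p(x) p(y))]. The sketch below computes it by simple counting; it is an illustration under that assumption, not necessarily the exact measure or data used by Lin et al., and the function name and toy data are hypothetical.

```python
from collections import Counter
from math import log2


def mutual_information(feature, label):
    """Mutual information I(X;Y) in bits between two discrete sequences.

    Hypothetical helper for illustration: probabilities are plug-in
    estimates from raw counts, with no small-sample bias correction.
    """
    if len(feature) != len(label):
        raise ValueError("feature and label must have the same length")
    n = len(feature)
    joint = Counter(zip(feature, label))  # counts of (x, y) pairs
    px = Counter(feature)                 # marginal counts of x
    py = Counter(label)                   # marginal counts of y
    mi = 0.0
    for (x, y), c in joint.items():
        # p(x,y) / (p(x) p(y)) simplifies to c * n / (count_x * count_y)
        mi += (c / n) * log2(c * n / (px[x] * py[y]))
    return mi


if __name__ == "__main__":
    # Toy, made-up labels: 1 = interacting pair, 0 = non-interacting pair.
    interacts = [1, 1, 1, 1, 0, 0, 0, 0]
    # A feature that tracks the label perfectly carries 1 bit here...
    same_function = ["high"] * 4 + ["low"] * 4
    # ...while a feature independent of the label carries 0 bits.
    noise = ["a", "b"] * 4
    print(round(mutual_information(same_function, interacts), 3))  # 1.0
    print(round(mutual_information(noise, interacts), 3))  # 0.0
```

Comparing such scores across features (for example, a functional-similarity feature versus an expression-correlation feature) gives one rough ranking of their individual contributions; it does not capture redundancy between features, which would need conditional measures.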
Keywords:	protein expression; protein function; proteins; accuracy; genes; bayes theorem; classification; protein protein interaction; prediction; correlation analysis; forecasting; genomics; intermethod comparison; yeast; randomization; logistic regression analysis; decision trees; machinery; ontogeny; fungal genetics; protein-protein interactions; functional annotation; logistic regressions; article; classification performance; complete information; false positive and false negatives; functional similarity; missing information; bayesian networks
Journal Title:	BMC Bioinformatics
Volume:	5
ISSN:	1471-2105
Publisher:	BioMed Central Ltd
Date Published:	2004-10-18
Start Page:	154
Language:	English
DOI:	10.1186/1471-2105-5-154
PROVIDER:	scopus
PMCID:	PMC529436
PUBMED:	15491499
DOI/URL:	http://www.scopus.com/inward/record.url?eid=2-s2.0-13244265581&partnerID=40&md5=9cfdc0c079740daeee8be1e7145a3c8d
Notes:	BMC Bioinform. -- Cited By (since 1996):90 -- Export Date: 16 June 2014 -- CODEN: BBMIC -- Source: Scopus

MSK Authors

Jansen

Related MSK Work

Uncovering The Molecular Machinery Of The Human Spindle: An Integration Of Wet And Dry Systems Biology

PLoS ONE 2012

Analyzing Protein Function On A Genomic Scale: The Importance Of Gold Standard Positives And Negatives For Network Prediction

Current Opinion in Microbiology 2004

Sequence Co-Evolution Gives 3D Contacts And Structures Of Protein Complexes

eLife 2014

PATIKAmad: Putting Microarray Data Into Pathway Context

Proteomics 2008

Detection Of Functional Modules From Protein Interaction Networks

Proteins: Structure, Function and Bioinformatics 2004