Analyzing protein function on a genomic scale: The importance of gold-standard positives and negatives for network prediction
Journal Article


Authors: Jansen, R.; Gerstein, M.
Article Title: Analyzing protein function on a genomic scale: The importance of gold-standard positives and negatives for network prediction
Abstract: The concept of 'protein function' is rather 'fuzzy' because it is often based on whimsical terms or contradictory nomenclature. This currently presents a challenge for functional genomics because precise definitions are essential for most computational approaches. Addressing this challenge, the notion of networks between biological entities (including molecular and genetic interaction networks as well as transcriptional regulatory relationships) potentially provides a unifying language suitable for the systematic description of protein function. Predicting the edges in protein networks requires reference sets of examples with known outcome (that is, 'gold standards'). Such reference sets should ideally include positive examples - as is now widely appreciated - but also, equally importantly, negative ones. Moreover, it is necessary to consider the expected relative occurrence of positives and negatives because this affects the misclassification rates of experiments and computational predictions. For instance, a reason why genome-wide, experimental protein-protein interaction networks have high inaccuracies is that the prior probability of finding interactions (positives) rather than non-interacting protein pairs (negatives) in unbiased screens is very small. These problems can be addressed by constructing well-defined sets of non-interacting proteins from subcellular localization data, which allows computing the probability of interactions based on evidence from multiple datasets.
Keywords: protein expression; gene sequence; review; nonhuman; laboratory diagnosis; sensitivity and specificity; genetic analysis; protein function; protein localization; proteins; protein analysis; proteome; genes; gene expression profiling; computational biology; protein protein interaction; gene function; algorithms; proteomics; prediction; gene expression regulation; genome analysis; dna; algorithm; messenger rna; saccharomyces cerevisiae; cellular distribution; genomics; dna sequence; molecular interaction; protein structure; x ray crystallography; protein interaction mapping; rna splicing; receiver operating characteristic; protein dna interaction; saccharomyces; microbial genetics
Journal Title: Current Opinion in Microbiology
Volume: 7
Issue: 5
ISSN: 1369-5274
Publisher: Elsevier Inc.
Date Published: 2004-10-01
Start Page: 535
End Page: 545
Language: English
DOI: 10.1016/j.mib.2004.08.012
PROVIDER: scopus
PUBMED: 15451510
Notes: Curr. Opin. Microbiol. -- Cited By (since 1996): 101 -- Export Date: 16 June 2014 -- CODEN: COMIF -- Source: Scopus
MSK Authors
  1. Ronald Jansen