Evaluation of annotation strategies using an entire genome sequence Journal Article


Authors: Iliopoulos, I.; Tsoka, S.; Andrade, M. A.; Enright, A. J.; Carroll, M.; Poullet, P.; Promponas, V.; Liakopoulos, T.; Palaios, G.; Pasquier, C.; Hamodrakas, S.; Tamames, J.; Yagnik, A. T.; Tramontano, A.; Devos, D.; Blaschke, C.; Valencia, A.; Brett, D.; Martin, D.; Leroy, C.; Rigoutsos, I.; Sander, C.; Ouzounis, C. A.
Article Title: Evaluation of annotation strategies using an entire genome sequence
Abstract: Motivation: Genome-wide functional annotation either by manual or automatic means has raised considerable concerns regarding the accuracy of assignments and the reproducibility of methodologies. In addition, a performance evaluation of automated systems that attempt to tackle sequence analyses rapidly and reproducibly is generally missing. In order to quantify the accuracy and reproducibility of function assignments on a genome-wide scale, we have re-annotated the entire genome sequence of Chlamydia trachomatis (serovar D), in a collaborative manner. Results: We have encoded all annotations in a structured format to allow further comparison and data exchange and have used a scale that records the different levels of potential annotation errors according to their propensity to propagate in the database due to transitive function assignments. We conclude that genome annotation may entail a considerable amount of errors, ranging from simple typographical errors to complex sequence analysis problems. The most surprising result of this comparative study is that automatic systems might perform as well as the teams of experts annotating genome sequences.
Keywords: controlled study; gene sequence; sequence analysis; nonhuman; methodology; sensitivity and specificity; genetic analysis; reproducibility; accuracy; reproducibility of results; animal cell; gene expression profiling; gene function; automation; information processing; databases, protein; animalia; bacterial proteins; amino acid sequence; molecular sequence data; data analysis; intermethod comparison; rating scale; genome; expert system; genetic code; analytical error; databases, genetic; documentation; information storage and retrieval; genome, bacterial; database management systems; chlamydia trachomatis; serotype; priority journal; article; sequence database
Journal Title: Bioinformatics
Volume: 19
Issue: 6
ISSN: 1367-4803
Publisher: Oxford University Press
Date Published: 2003-04-12
Start Page: 717
End Page: 726
Language: English
DOI: 10.1093/bioinformatics/btg077
PUBMED: 12691983
PROVIDER: scopus
Notes: Export Date: 12 September 2014 -- Source: Scopus
MSK Authors
  1. Chris Sander