Learning regulatory programs that accurately predict differential expression with MEDUSA (Conference Paper)

Authors:	Kundaje, A.; Lianoglou, S.; Li, X.; Quigley, D.; Arias, M.; Wiggins, C. H.; Zhang, L.; Leslie, C.
Title:	Learning regulatory programs that accurately predict differential expression with MEDUSA
Conference Title:	Workshop on Dialogue on Reverse Engineering Assessment and Methods (DREAM)
Abstract:	Inferring gene regulatory networks from high-throughput genomic data is one of the central problems in computational biology. In this paper, we describe a predictive modeling approach for studying regulatory networks, based on a machine learning algorithm called MEDUSA. MEDUSA integrates promoter sequence, mRNA expression, and transcription factor occupancy data to learn gene regulatory programs that predict the differential expression of target genes. Instead of using clustering or correlation of expression profiles to infer regulatory relationships, MEDUSA determines condition-specific regulators and discovers regulatory motifs that mediate the regulation of target genes. In this way, MEDUSA meaningfully models biological mechanisms of transcriptional regulation. MEDUSA solves the problem of predicting the differential (up/down) expression of target genes by using boosting, a technique from statistical learning, which helps to avoid overfitting as the algorithm searches through the high-dimensional space of potential regulators and sequence motifs. Experimental results demonstrate that MEDUSA achieves high prediction accuracy on held-out experiments (test data), that is, data not seen in training. We also present context-specific analysis of MEDUSA regulatory programs for DNA damage and hypoxia, demonstrating that MEDUSA identifies key regulators and motifs in these processes. A central challenge in the field is the difficulty of validating reverse-engineered networks in the absence of a gold standard. Our approach of learning regulatory programs provides at least a partial solution for the problem: MEDUSA's prediction accuracy on held-out data gives a concrete and statistically sound way to validate how well the algorithm performs. With MEDUSA, statistical validation becomes a prerequisite for hypothesis generation and network building rather than a secondary consideration. © 2007 New York Academy of Sciences.
Keywords:	signal transduction; gene cluster; promoter region; conference paper; genetic analysis; accuracy; proteome; dna damage; gene targeting; gene expression; biology; gene expression profiling; computational biology; models, biological; genetic transcription; algorithms; hypoxia; gene expression regulation; correlation analysis; statistical analysis; statistical significance; genetic engineering; algorithm; messenger rna; artificial intelligence; gene control; computer simulation; software; biomedical engineering; systems biology; learning; gene regulation; regulatory networks; machine learning; boosting; regulatory program; yeast stress response; motif element discrimination using sequence agglomeration
Journal Title:	Annals of the New York Academy of Sciences
Volume:	1115
Conference Dates:	2006 Sept 7-8
Conference Location:	New York, NY
ISSN:	0077-8923
Publisher:	John Wiley & Sons
Date Published:	2007-12-01
Start Page:	178
End Page:	202
Language:	English
DOI:	10.1196/annals.1407.020
PUBMED:	17934055
PROVIDER:	scopus
DOI/URL:	http://www.scopus.com/inward/record.url?eid=2-s2.0-36249031526&partnerID=40&md5=fa987ce7812ff96953fbb861d9a6a2de
Notes:	Proceedings Paper -- Chapter in "Reverse Engineering Biological Networks: Opportunities and Challenges in Computational Methods for Pathway Inference" (ISBN: 978-1-573-31689-7) -- Workshop on Dialogue on Reverse Engineering Assessment and Methods - SEP 07-08, 2006 - Bronx, NY - "Cited By (since 1996): 2" - "Export Date: 17 November 2011" - "CODEN: ANYAA" - "Source: Scopus"

MSK Authors

195 Leslie

Related MSK Work

A Predictive Model Of The Oxygen And Heme Regulatory Network In Yeast
PLoS Computational Biology 2008

Learning "Graph Mer" Motifs That Predict Gene Expression Trajectories In Development
PLoS Computational Biology 2010

Chromatin Interaction Aware Gene Regulatory Modeling With Graph Attention Networks
Genome Research 2022

Affinity Regression Predicts The Recognition Code Of Nucleic Acid Binding Proteins
Nature Biotechnology 2015

Deep Learning And Domain Specific Knowledge To Segment The Liver From Synthetic Dual Energy CT Iodine Scans
Diagnostics 2022