Binary state pattern clustering: A digital paradigm for class and biomarker discovery in gene microarray studies of cancer Journal Article


Authors: Beattie, B. J.; Robinson, P. N.
Article Title: Binary state pattern clustering: A digital paradigm for class and biomarker discovery in gene microarray studies of cancer
Abstract: Class and biomarker discovery continue to be among the preeminent goals in gene microarray studies of cancer. We have developed a new data mining technique, which we call Binary State Pattern Clustering (BSPC), that is specifically adapted for these purposes with cancer and other categorical datasets. BSPC is capable of uncovering statistically significant sample subclasses and associated marker genes in a completely unsupervised manner. This is accomplished through the application of a digital paradigm, where the expression level of each potential marker gene is treated as being representative of its discrete functional state. Multiple genes that divide samples into states along the same boundaries form a kind of gene-cluster that has an associated sample-cluster. BSPC is an extremely fast deterministic algorithm that scales well to large datasets. Here we describe results of its application to three publicly available oligonucleotide microarray datasets. Using an α-level of 0.05, clusters reproducing many of the known sample classifications were identified along with associated biomarkers. In addition, a number of simulations were conducted using shuffled versions of each of the original datasets, noise-added datasets, and completely artificial datasets. The robustness of BSPC was compared to that of three other publicly available clustering methods: ISIS, CTWC and SAMBA. The simulations demonstrate BSPC's substantially greater noise tolerance and confirm the accuracy of our calculations of statistical significance. © Mary Ann Liebert, Inc.
Keywords: gene cluster; sensitivity and specificity; biological marker; accuracy; animals; cluster analysis; gene expression profiling; tumor markers, biological; genetic association; algorithms; simulation; statistical significance; algorithm; oligonucleotide array sequence analysis; pattern recognition, automated; nucleotide sequence; dna microarray; marker gene; biomarker discovery; genes, neoplasm; clustering; noise; biclustering; class discovery; gene microarray; binary state pattern clustering
Journal Title: Journal of Computational Biology
Volume: 13
Issue: 5
ISSN: 1066-5277
Publisher: Mary Ann Liebert, Inc.
Date Published: 2006-06-01
Start Page: 1114
End Page: 1130
Language: English
DOI: 10.1089/cmb.2006.13.1114
PUBMED: 16796554
PROVIDER: scopus
Notes: Cited By (since 1996): 4; Export Date: 4 June 2012; CODEN: JCOBE; Source: Scopus
MSK Authors
  1. Bradley Beattie