Generating simulated SNP array and sequencing data to assess genomic segmentation algorithms
Document Type: Journal Article


Authors: Zucker, M. R.; Coombes, K. R.
Article Title: Generating simulated SNP array and sequencing data to assess genomic segmentation algorithms
Abstract: We developed a tool, implemented in an R package called true and accurate clone generator (TACG), to simulate ‘ground truth’ and realistic SNP array and single nucleotide variant (SNV) data. We present TACG and use it to assess several different approaches to segmentation of copy number data from SNP arrays, with a particular interest in detecting copy number variations (CNVs) in cancer samples. We demonstrate that DNAcopy, an algorithm using circular binary segmentation, generally performs best, which is in agreement with previous research. We determine the conditions under which it and other methods break down. In particular, we assess how characteristics like clonal heterogeneity, presence of nested CNVs, and the type of aberration affect algorithm accuracy. The simulations we generated proved to be useful in determining not just the comparative overall accuracy of different algorithms, but also in determining how their efficacy is affected by the biological characteristics of samples from which the data was generated. Copyright © 2020 Inderscience Enterprises Ltd.
Keywords: algorithms; genomics; segmentation; snp arrays; circular binary segmentation; simulations; copy number variation; hidden markov models; cancer; whole exome sequencing
Journal Title: International Journal of Computational Biology and Drug Design
Volume: 13
Issue: 5-6
ISSN: 1756-0756
Publisher: Inderscience Publishers  
Date Published: 2020-01-01
Start Page: 438
End Page: 453
Language: English
DOI: 10.1504/ijcbdd.2020.113822
PROVIDER: scopus
Notes: Conference Paper -- Export Date: 3 May 2021 -- Source: Scopus
MSK Authors
  1. Mark Raymond Zucker