Accurate estimation of molecular counts from amplicon sequence data with unique molecular identifiers Journal Article


Authors: Peng, X.; Dorman, K. S.
Article Title: Accurate estimation of molecular counts from amplicon sequence data with unique molecular identifiers
Abstract: Motivation: Amplicon sequencing is widely applied to explore heterogeneity and rare variants in genetic populations. Resolving true biological variants and quantifying their abundance is crucial for downstream analyses, but measured abundances are distorted by stochasticity and bias in amplification, plus errors during polymerase chain reaction (PCR) and sequencing. One solution attaches unique molecular identifiers (UMIs) to sample sequences before amplification. Counting UMIs instead of sequences provides unbiased estimates of abundance. While modern methods improve over naive counting by UMI identity, most do not account for UMI reuse or collision, and they do not adequately model PCR and sequencing errors in the UMIs and sample sequences. Results: We introduce Deduplication and Abundance estimation with UMIs (DAUMI), a probabilistic framework to detect true biological amplicon sequences and accurately estimate their deduplicated abundance. DAUMI recognizes UMI collision, even on highly similar sequences, and detects and corrects most PCR and sequencing errors in the UMI and sampled sequences. DAUMI performs better on simulated and real data compared to other UMI-aware clustering methods.
Keywords: hiv-1; quantification; sampling depth
Journal Title: Bioinformatics
Volume: 39
Issue: 1
ISSN: 1367-4803
Publisher: Oxford University Press  
Date Published: 2023-01-01
Start Page: btad002
Language: English
ACCESSION: WOS:000940926100080
DOI: 10.1093/bioinformatics/btad002
PROVIDER: wos
PMCID: PMC9891248
PUBMED: 36610988
Notes: Article -- Source: Wos
MSK Authors
  1. Xiao Peng