Authors: | Di Gioacchino, A.; Lecce, I.; Greenbaum, B. D.; Monasson, R.; Cocco, S. |
Article Title: | Deciphering the Code of Viral-Host Adaptation Through Maximum-Entropy Nucleotide Bias Models |
Abstract: | How viruses evolve largely depends on their hosts. To quantitatively characterize this dependence, we introduce Maximum Entropy Nucleotide Bias (MENB) models, learned from the single-, di-, and tri-nucleotide usage of viral sequences that infect a given host. We first use MENB to classify the viral family and the host of a virus from its genome, among four families of ssRNA viruses and three hosts. We show that both the viral family and the host leave a fingerprint in nucleotide motif usage that MENB models decode. Benchmarking our approach against state-of-the-art methods based on deep neural networks shows that MENB is rapid, interpretable, and robust. Our approach predicts, with good accuracy, both the viral family and the host from a whole genomic sequence or a portion of it. MENB models also display promising out-of-sample generalization ability on viral sequences from new host taxa or new viral families. Within the limitations imposed by the three-host setting, our approach can also identify intermediate hosts for well-known pathogenic strains of Influenza A subtypes and Human Coronavirus, as well as recombinations and reassortments in specific genomic regions. Finally, MENB models can be used to track adaptation to a new host, to shed light on the more relevant selective pressures that acted on motif usage during this process, and to design new sequences with altered nucleotide usage at fixed amino-acid content. © 2025 The Author(s). Published by Oxford University Press on behalf of Society for Molecular Biology and Evolution. |
Keywords: | controlled study; genetics; nonhuman; accuracy; biological model; classification; evolution, molecular; genetic recombination; molecular evolution; probability; cpg island; benchmarking; models, genetic; amino acid; 3' untranslated region; virus; virus genome; virus strain; zoonosis; swine; genetic heterogeneity; orthomyxoviridae; nucleotide; open reading frame; pandemic; virus cell interaction; viruses; genome, viral; partition coefficient; entropy; influenza a virus; human coronavirus nl63; humans; human; article; nucleotide motif; middle east respiratory syndrome coronavirus; picornaviridae; genetic reassortment; deep neural network; severe acute respiratory syndrome coronavirus 2; coronaviridae; human coronavirus oc43; avian; flaviviridae; maximum entropy models; nucleotide usage; viral host adaptations; host adaptation; influenza a virus (h1n1); influenza a virus (h2n2); influenza a virus (h3n2); influenza a virus (h5n1); intermediate host; maximum entropy model; single-stranded rna virus; virus deep learning host prediction |
Journal Title: | Molecular Biology and Evolution |
Volume: | 42 |
Issue: | 6 |
ISSN: | 0737-4038 |
Publisher: | Oxford University Press on behalf of Society for Molecular Biology and Evolution |
Date Published: | 2025-01-01 |
Start Page: | msaf127 |
Language: | English |
DOI: | 10.1093/molbev/msaf127 |
PUBMED: | 40458044 |
PROVIDER: | scopus |
DOI/URL: | |
Notes: | Article -- Source: Scopus |