SPICE, a dataset of drug-like molecules and peptides for training machine learning potentials Review


Authors: Eastman, P.; Behara, P. K.; Dotson, D. L.; Galvelis, R.; Herr, J. E.; Horton, J. T.; Mao, Y.; Chodera, J. D.; Pritchard, B. P.; Wang, Y.; De Fabritiis, G.; Markland, T. E.
Review Title: SPICE, a dataset of drug-like molecules and peptides for training machine learning potentials
Abstract: Machine learning potentials are an important tool for molecular simulation, but their development is held back by a shortage of high quality datasets to train them on. We describe the SPICE dataset, a new quantum chemistry dataset for training potentials relevant to simulating drug-like small molecules interacting with proteins. It contains over 1.1 million conformations for a diverse set of small molecules, dimers, dipeptides, and solvated amino acids. It includes 15 elements, charged and uncharged molecules, and a wide range of covalent and non-covalent interactions. It provides both forces and energies calculated at the ωB97M-D3(BJ)/def2-TZVPPD level of theory, along with other useful quantities such as multipole moments and bond orders. We train a set of machine learning potentials on it and demonstrate that they can achieve chemical accuracy across a broad region of chemical space. It can serve as a valuable resource for the creation of transferable, ready to use potential functions for use in molecular simulations. © 2022, The Author(s).
Keywords: proteins; protein; peptide; chemistry; peptides; computer simulation; conformation; molecular conformation; machine learning
Journal Title: Scientific Data
Volume: 10
ISSN: 2052-4463
Publisher: Nature Publishing Group
Date Published: 2023-01-04
Start Page: 11
Language: English
DOI: 10.1038/s41597-022-01882-6
PUBMED: 36599873
PROVIDER: scopus
PMCID: PMC9813265
Notes: Data Paper -- Export Date: 1 February 2023 -- Source: Scopus
MSK Authors
  1. John Damon Chodera
  2. Yuanqing Wang