Large language model trained on clinical oncology data predicts cancer progression
Journal Article


Authors: Zhu, M.; Lin, H.; Jiang, J.; Jinia, A. J.; Jee, J.; Pichotta, K.; Waters, M.; Rose, D.; Schultz, N.; Chalise, S.; Valleru, L.; Morin, O.; Moran, J.; Deasy, J. O.; Pilai, S.; Nichols, C.; Riely, G.; Braunstein, L. Z.; Li, A.
Article Title: Large language model trained on clinical oncology data predicts cancer progression
Abstract: Subspecialty knowledge barriers have limited the adoption of large language models (LLMs) in oncology. We introduce Woollie, an open-source, oncology-specific LLM trained on real-world data from Memorial Sloan Kettering Cancer Center (MSK) across lung, breast, prostate, pancreatic, and colorectal cancers, with external validation using University of California, San Francisco (UCSF) data. Woollie surpasses ChatGPT in medical benchmarks and excels in eight non-medical benchmarks. Analyzing 39,319 radiology impression notes from 4002 patients, it achieved an overall area under the receiver operating characteristic curve (AUROC) of 0.97 for cancer progression prediction on MSK data, including a notable 0.98 AUROC for pancreatic cancer. On UCSF data, it achieved an overall AUROC of 0.88, excelling in lung cancer detection with an AUROC of 0.95. As the first oncology-specific LLM validated across institutions, Woollie demonstrates high accuracy and consistency across cancer types, underscoring its potential to enhance cancer progression analysis. © The Author(s) 2025.
Keywords: major clinical study; cancer growth; cancer patient; pancreas cancer; cancer diagnosis; diagnostic accuracy; colorectal cancer; breast cancer; lung cancer; oncology; prostate cancer; health care; kettering; biological organs; diseases; california; cancer progression; san francisco; clinical oncology; receiver operating characteristic curves; real-world; human; article; university of california; malignant neoplasm; open-source; language model; large language model; chatgpt; knowledge barriers
Journal Title: npj Digital Medicine
Volume: 8
ISSN: 2398-6352
Publisher: Nature Publishing Group
Date Published: 2025-07-02
Start Page: 397
Language: English
DOI: 10.1038/s41746-025-01780-2
PROVIDER: scopus
PMCID: PMC12223279
PUBMED: 40604229
Notes: The MSK Cancer Center Support Grant (P30 CA008748) is acknowledged in the PubMed record and PDF. Corresponding MSK authors are Lior Z. Braunstein and Anyi Li -- Shirin Pillai's last name is misspelled in the publication -- Source: Scopus
MSK Authors
  1. Gregory J Riely
    604 Riely
  2. Joseph Owen Deasy
    527 Deasy
  3. Nikolaus D Schultz
    491 Schultz
  4. Jue Jiang
    79 Jiang
  5. Menglei Zhu
    37 Zhu
  6. Justin Jee
    57 Jee
  7. Anyi Li
    19 Li
  8. Jean Marie Moran
    52 Moran
  9. Chelsea Lynn Nichols
    16 Nichols
  10. Doori Rose
    8 Rose
  11. Shirin Ajay Pillai
    6 Pillai
  12. Michele Waters
    13 Waters
  13. Abbas Johar Jinia
    3 Jinia