Patterns of metastatic disease in patients with cancer derived from natural language processing of structured CT radiology reports over a 10-year period
Journal Article


Authors: Do, R. K. G.; Lupton, K.; Causa Andrieu, P. I.; Luthra, A.; Taya, M.; Batch, K.; Nguyen, H.; Rahurkar, P.; Gazit, L.; Nicholas, K.; Fong, C. J.; Gangai, N.; Schultz, N.; Zulkernine, F.; Sevilimedu, V.; Juluru, K.; Simpson, A.; Hricak, H.
Article Title: Patterns of metastatic disease in patients with cancer derived from natural language processing of structured CT radiology reports over a 10-year period
Abstract: Background: Patterns of metastasis in cancer are increasingly relevant to prognostication and treatment planning but have historically been documented by means of autopsy series. Purpose: To show the feasibility of using natural language processing (NLP) to gather accurate data from radiology reports for assessing spatial and temporal patterns of metastatic spread in a large patient cohort. Materials and Methods: In this retrospective longitudinal study, consecutive patients who underwent CT from July 2009 to April 2019 and whose CT reports followed a departmental structured template were included. Three radiologists manually curated a sample of 2219 reports for the presence or absence of metastases across 13 organs; these manually curated reports were used to develop three NLP models with an 80%-20% split for training and test sets. A separate random sample of 448 manually curated reports was used for validation. Model performance was measured by accuracy, precision, and recall for each organ. The best-performing NLP model was used to generate a final database of metastatic disease across all patients. For each cancer type, statistical descriptive reports were provided by analyzing the frequencies of metastatic disease at the report and patient levels. Results: In 91 665 patients (mean age ± standard deviation, 61 years ± 15; 46 939 women), 387 359 reports were labeled. The best-performing NLP model achieved accuracies from 90% to 99% across all organs. Metastases were most frequently reported in abdominopelvic (23.6% of all reports) and thoracic (17.6%) nodes, followed by lungs (14.7%), liver (13.7%), and bones (9.9%). Metastatic disease tropism is distinct among common cancers, with the most common first site being bones in prostate and breast cancers and liver among pancreatic and colorectal cancers. Conclusion: Natural language processing may be applied to cancer patients' CT reports to generate a large database of metastatic phenotypes. Such a database could be combined with genomic studies and used to explore prognostic imaging phenotypes with relevance to treatment planning. © RSNA, 2021
Keywords: adult; middle aged; major clinical study; bone metastasis; pancreas cancer; colorectal cancer; metastasis; computer assisted tomography; breast cancer; cohort analysis; retrospective study; prostate cancer; liver metastasis; lung metastasis; feasibility study; longitudinal study; tropism; natural language processing; cancer prognosis; measurement accuracy; human; male; female; article
Journal Title: Radiology
Volume: 301
Issue: 1
ISSN: 0033-8419
Publisher: Radiological Society of North America, Inc.  
Date Published: 2021-10-01
Start Page: 115
End Page: 122
Language: English
DOI: 10.1148/radiol.2021210043
PUBMED: 34342503
PROVIDER: scopus
PMCID: PMC8474969
Notes: Article -- Export Date: 2 November 2021 -- Source: Scopus
MSK Authors
  1. Kinh Gian Do
  2. Hedvig Hricak
  3. Nikolaus D Schultz
  4. Krishna Juluru
  5. Natalie Gangai
  6. Lior Gazit
  7. Christopher Joseph Fong
  8. Anisha Luthra
  9. Michio David Taya
  10. Huy Anh Nguyen