Generalizability of lesion detection and segmentation when ScaleNAS is trained on a large multi-organ dataset and validated in the liver Journal Article


Authors: Ma, J.; Yang, H.; Chou, Y.; Yoon, J.; Allison, T.; Komandur, R.; McDunn, J.; Tasneem, A.; Do, R. K.; Schwartz, L. H.; Zhao, B.
Article Title: Generalizability of lesion detection and segmentation when ScaleNAS is trained on a large multi-organ dataset and validated in the liver
Abstract: Background: Tumor assessment through imaging is crucial for diagnosing and treating cancer. Lesions in the liver, a common site for metastatic disease, are particularly challenging to accurately detect and segment. This labor-intensive task is subject to individual variation, which drives interest in automation using artificial intelligence (AI). Purpose: Evaluate AI for lesion detection and lesion segmentation using CT in the context of human performance on the same task. Use internal testing to determine how an AI-developed model (ScaleNAS) trained on lesions in multiple organs performs when tested specifically on liver lesions in a dataset integrating real-world and clinical trial data. Use external testing to evaluate whether ScaleNAS's performance generalizes to publicly available colorectal liver metastases (CRLM) from The Cancer Imaging Archive (TCIA). Methods: The CUPA study dataset included patients whose CT scan of chest, abdomen, or pelvis at Columbia University between 2010–2020 indicated solid tumors (CUIMC, n = 5011) and from two clinical trials in metastatic colorectal cancer, PRIME (n = 1183) and Amgen (n = 463). Inclusion required ≥1 measurable lesion; exclusion criteria eliminated 1566 patients. Data were divided at the patient level into training (n = 3996), validation (n = 570), and testing (n = 1529) sets. To create the reference standard for training and validation, each case was annotated by one of six radiologists, randomly assigned, who marked the CUPA lesions without access to any previous annotations. For internal testing we refined the CUPA test set to contain only patients who had liver lesions (n = 525) and formed an enhanced reference standard through expert consensus reviewing prior annotations. For external testing, TCIA-CRLM (n = 197) formed the test set. The reference standard for TCIA-CRLM was formed by consensus review of the original annotation and contours by two new radiologists. Metrics for lesion detection were sensitivity and false positives. Lesion segmentation was assessed with median Dice coefficient, under-segmentation ratio (USR), and over-segmentation ratio (OSR). Subgroup analysis examined the influence of lesion size ≥ 10 mm (measurable by RECIST 1.1) versus all lesions (important for early identification of disease progression). Results: ScaleNAS trained on all lesions achieved sensitivity of 71.4% and Dice of 70.2% for liver lesions in the CUPA internal test set (3,495 lesions) and sensitivity of 68.2% and Dice of 64.2% in the TCIA-CRLM external test set (638 lesions). Human radiologists had mean sensitivity of 53.5% and Dice of 73.9% in CUPA and sensitivity of 84.1% and Dice of 88.4% in TCIA-CRLM. Performance improved for ScaleNAS and radiologists in the subgroup of lesions that excluded sub-centimeter lesions. Conclusions: Our study presents the first evaluation of ScaleNAS in medical imaging, demonstrating its liver lesion detection and segmentation performance across diverse datasets. Using consensus reference standards from multiple radiologists, we addressed inter-observer variability and contributed to consistency in lesion annotation. While ScaleNAS does not surpass radiologists in performance, it offers fast and reliable results with potential utility in providing initial contours for radiologists. Future work will extend this model to lung and lymph node lesions, ultimately aiming to enhance clinical applications by generalizing detection and segmentation across tissue types. © 2025 American Association of Physicists in Medicine.
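Illustration (not part of the published abstract): the two headline metrics reported above, detection sensitivity and the Dice coefficient, can be sketched in a few lines. This is a minimal sketch using the standard textbook definitions, not the authors' evaluation code. The function names, the representation of a mask as a set of voxel coordinates, and the toy numbers are all assumptions made for illustration; the abstract does not state how detected lesions were matched to reference lesions, so the sketch takes the matched counts as given.

```python
def dice_coefficient(reference, prediction):
    """Dice overlap between two voxel sets: 2|A ∩ B| / (|A| + |B|)."""
    reference, prediction = set(reference), set(prediction)
    total = len(reference) + len(prediction)
    if total == 0:
        # Both masks empty: treated here as perfect agreement (a convention,
        # assumed for this sketch).
        return 1.0
    return 2 * len(reference & prediction) / total


def sensitivity(true_positives, false_negatives):
    """Fraction of reference lesions that were detected: TP / (TP + FN)."""
    total = true_positives + false_negatives
    if total == 0:
        raise ValueError("sensitivity is undefined without reference lesions")
    return true_positives / total


# Hypothetical toy data: masks as sets of (z, y, x) voxel coordinates.
reference_mask = {(0, 0, 0), (0, 0, 1), (0, 1, 0), (0, 1, 1)}
predicted_mask = {(0, 0, 1), (0, 1, 1), (0, 2, 1)}

dice = dice_coefficient(reference_mask, predicted_mask)   # 2 * 2 / (4 + 3)
sens = sensitivity(true_positives=5, false_negatives=2)   # 5 / (5 + 2)
```

A per-lesion Dice computed this way would then be summarized by its median across lesions, as the abstract reports. The under- and over-segmentation ratios are left out because the abstract does not define them.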
Keywords: controlled study; major clinical study; clinical trial; solid tumor; validation process; liver neoplasms; sensitivity and specificity; consensus; computer assisted tomography; tomography, x-ray computed; pathology; diagnostic imaging; oncology; automation; information processing; colorectal neoplasms; liver metastasis; liver; radiologist; radiology; computerized tomography; artificial intelligence; colorectal tumor; liver tumor; diagnosis; benchmarking; reliability; image processing, computer-assisted; image processing; false positive result; arthroplasty; diseases; performance; cancer imaging; lesion segmentation; image segmentation; metastatic colorectal cancer; spiral computer assisted tomography; diagnostic test accuracy study; procedures; response evaluation criteria in solid tumors; reference standard; humans; human; article; liver lesions; x-ray computed tomography; deep learning; lesion detection; lesion volume; datasets as topic; lesion segmentations; x ray absorption; test sets
Journal Title: Medical Physics
Volume: 52
Issue: 2
ISSN: 0094-2405
Publisher: American Association of Physicists in Medicine
Date Published: 2025-02-01
Start Page: 1005
End Page: 1018
Language: English
DOI: 10.1002/mp.17504
PUBMED: 39576046
PROVIDER: scopus
Notes: Source: Scopus
MSK Authors
  1. Lawrence H Schwartz
  2. Binsheng Zhao
  3. Kinh Gian Do
  4. Hao Yang
  5. Jingchen Ma