CIRDataset: A large-scale dataset for clinically-interpretable lung nodule radiomics and malignancy prediction (Conference Paper)


Authors: Choi, W.; Dahiya, N.; Nadeem, S.
Title: CIRDataset: A large-scale dataset for clinically-interpretable lung nodule radiomics and malignancy prediction
Conference Title: 25th International Conference of the Medical Image Computing and Computer Assisted Intervention (MICCAI 2022)
Abstract: Spiculations/lobulations, sharp/curved spikes on the surface of lung nodules, are good predictors of lung cancer malignancy and hence, are routinely assessed and reported by radiologists as part of the standardized Lung-RADS clinical scoring criteria. Given the 3D geometry of the nodule and 2D slice-by-slice assessment by radiologists, manual spiculation/lobulation annotation is a tedious task and thus no public datasets exist to date for probing the importance of these clinically-reported features in the SOTA malignancy prediction algorithms. As part of this paper, we release a large-scale Clinically-Interpretable Radiomics Dataset, CIRDataset, containing 956 radiologist QA/QC’ed spiculation/lobulation annotations on segmented lung nodules from two public datasets, LIDC-IDRI (N = 883) and LUNGx (N = 73). We also present an end-to-end deep learning model based on multi-class Voxel2Mesh extension to segment nodules (while preserving spikes), classify spikes (sharp/spiculation and curved/lobulation), and perform malignancy prediction. Previous methods have performed malignancy prediction for LIDC and LUNGx datasets but without robust attribution to any clinically reported/actionable features (due to known hyperparameter sensitivity issues with general attribution schemes). With the release of this comprehensively-annotated CIRDataset and end-to-end deep learning baseline, we hope that malignancy prediction methods can validate their explanations, benchmark against our baseline, and provide clinically-actionable insights. Dataset, code, pretrained models, and docker containers are available at https://github.com/nadeemlab/CIR. © 2022, The Author(s), under exclusive license to Springer Nature Switzerland AG.
Keywords: lung cancer; medical imaging; forecasting; medical computing; biological organs; lung nodule; learning systems; deep learning; large dataset; public dataset; scoring criteria; end to end; spiculation; malignancy prediction; 3d geometry; large-scale datasets; lobulations
Journal Title: Lecture Notes in Computer Science
Volume: 13435
Conference Dates: 2022 Sep 18-22
Conference Location: Singapore
ISSN: 0302-9743
Publisher: Springer  
Date Published: 2022-01-01
Start Page: 13
End Page: 22
Language: English
DOI: 10.1007/978-3-031-16443-9_2
PROVIDER: scopus
Notes: Conference Paper, located in MICCAI 2022 Proceedings, Part V (ISBN: 978-3-031-16442-2) -- Export Date: 1 November 2022 -- Source: Scopus
MSK Authors
  1. Saad Nadeem