Abstract: |
Automatic identification of metastatic sites in cancer patients from electronic health records is a challenging yet crucial task with significant implications for diagnosis and treatment. In this study, we propose a method to detect metastases in unstructured radiology reports using only their impression sections. Our models are built on pre-trained large language models adapted with parameter-efficient fine-tuning. We compare model performance on unstructured reports with performance on reports that follow institution-level templates. By incorporating patients' historical data and its timeline into the model, we bridge the gap between structured and unstructured reports. Our experiments use data collected at Memorial Sloan Kettering Cancer Center (MSKCC) and annotated for the presence of metastases in three organs: liver, lung, and adrenal glands. Our results suggest that access to previous reports significantly improves model performance, with an average F1-score improvement of 7.7 points across all datasets. Additionally, incorporating temporal information improves metastasis detection accuracy by 0.4 and 1.1 points on the liver and adrenal gland data, respectively. Our method shows promise for labeling radiology reports efficiently and at scale, and it could be deployed on low-cost hardware. © 2024 IEEE. |
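To make the idea of "incorporating historical data and its timeline" more concrete, here is a minimal, hypothetical sketch of how a current impression section might be combined with a patient's earlier impressions and their timing before being passed to a fine-tuned model. The `Report` type, the `build_input` function, the bracketed tags, and the `max_prior` cutoff are illustrative assumptions, not the input format used in the study.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import date


@dataclass
class Report:
    """Hypothetical container: exam date plus the impression section only."""
    exam_date: date
    impression: str


def build_input(current: Report, history: list[Report], max_prior: int = 3) -> str:
    """Assumed input format: up to `max_prior` most recent earlier impressions,
    in chronological order, each tagged with its offset in days from the
    current exam, followed by the current impression."""
    # Keep only reports strictly before the current exam, newest first.
    earlier = sorted(
        (r for r in history if r.exam_date < current.exam_date),
        key=lambda r: r.exam_date,
        reverse=True,
    )[:max_prior]
    parts = []
    # Emit the selected priors oldest to newest so the timeline reads forward.
    for r in reversed(earlier):
        days = (current.exam_date - r.exam_date).days
        parts.append(f"[PRIOR {days} days before] {r.impression}")
    parts.append(f"[CURRENT] {current.impression}")
    return "\n".join(parts)


if __name__ == "__main__":
    cur = Report(date(2021, 3, 1), "New hepatic lesion.")
    hist = [Report(date(2021, 1, 1), "No liver lesions."),
            Report(date(2020, 6, 1), "Stable.")]
    print(build_input(cur, hist))
```

Encoding the interval as a day count is only one possible way to expose temporal information; absolute dates or bucketed intervals would be equally plausible choices under this sketch.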