Abstract: |
Purpose: Integrating auto-contouring into radiotherapy workflows is shifting the role of radiation oncologists from manual delineation to reviewing and correcting automatically generated contours. However, we postulate that this process is hindered by substantial inter-evaluator variability in assessing the dosimetric impact of contour variations. This study investigates how radiation oncologists and medical physicists evaluate the impact of glioblastoma target volume (TV) variations on the dose to organs at risk (OARs), focusing on inter-evaluator variability and decision-making patterns. Methods: A qualitative survey was conducted with four radiation oncologists and three medical physicists. Participants classified 54 glioblastoma TV contour variations across 14 patients, each involving up to four changes, as “better,” “no change,” or “worse” according to their expected impact on the dose to OARs. Ground truth labels were derived from standardized treatment plans. Inter-evaluator variability was analyzed using Cohen's Kappa. Results: Substantial variability was observed, with Cohen's Kappa values ranging from weak to moderate agreement (0.33–0.74). Evaluators frequently overestimated the negative impact of contour variations, misclassifying 46% of “no change” variations as “worse.” No evaluator judged any contour variation to result in a “better” dose to OARs, although this was the case for 4 variations. Conclusion: Significant variability in estimating the dosimetric impact of contour variations underscores the critical need for standardized guidelines to reduce inconsistencies and to allow automatically generated contours to be assessed on clinically meaningful factors. The tendency to overestimate the negative impact of contour variations may lead to inefficiencies and unnecessary contour corrections in clinical practice. © 2025 The Author(s)
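For readers unfamiliar with the agreement statistic, the sketch below illustrates how unweighted Cohen's Kappa and a "no change classified as worse" rate could be computed over the three categories “better,” “no change,” and “worse.” It is a minimal illustration, not the study's analysis code: the function names and the six labels in the example are hypothetical, and whether Kappa was computed between evaluator pairs or against the ground truth labels is not specified here. Kappa corrects the observed agreement p_o for the agreement p_e expected by chance from each rater's label frequencies, via (p_o - p_e) / (1 - p_e).

```python
from collections import Counter

# The three labels used in the survey.
CATEGORIES = ("better", "no change", "worse")


def cohens_kappa(rater_a, rater_b, categories=CATEGORIES):
    """Unweighted Cohen's kappa for two raters labelling the same items.

    Hypothetical helper for illustration only.
    """
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("raters must label the same, non-empty set of items")
    n = len(rater_a)
    # Observed agreement: fraction of items on which both raters chose the same label.
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: product of each rater's marginal label frequencies, summed.
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    p_e = sum((freq_a[c] / n) * (freq_b[c] / n) for c in categories)
    if p_e == 1.0:
        # Both raters used one identical label for every item; kappa is undefined.
        return float("nan")
    return (p_o - p_e) / (1 - p_e)


def rate_labelled_as(truth, predicted, true_label, predicted_label):
    """Fraction of items with ground truth `true_label` that were labelled `predicted_label`.

    Hypothetical helper, e.g. the share of "no change" variations rated "worse".
    """
    matching = [p for t, p in zip(truth, predicted) if t == true_label]
    if not matching:
        return float("nan")
    return sum(p == predicted_label for p in matching) / len(matching)


if __name__ == "__main__":
    # Made-up labels for six contour variations (NOT data from the study).
    ground_truth = ["worse", "no change", "worse", "no change", "better", "worse"]
    evaluator = ["worse", "worse", "worse", "no change", "no change", "worse"]
    print("kappa:", round(cohens_kappa(evaluator, ground_truth), 3))
    print("no change -> worse:", rate_labelled_as(ground_truth, evaluator, "no change", "worse"))
```

In this made-up example the evaluator agrees with the reference on four of six items, yet Kappa is only about 0.4 because much of that agreement is expected by chance given how often both label “worse.”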