Abstract: |
Self-supervised learning (SSL) is an approach for pretraining deep networks on unlabeled datasets using pretext tasks that treat the images themselves as 'ground truth'. The choice of pretext task has been shown to affect accuracy across task categories, e.g., segmentation vs. classification. However, the versatility of SSL features for downstream tasks involving different imaging modalities has not been studied. We benchmarked the impact of SSL pretext tasks, including contrastive predictive coding, token self-distillation, and generative masked image modeling (MIM), using a 3D vision transformer pretrained on 10K 3D CTs (1.89M images) from various disease sites. SSL pretraining was assessed for (a) multi-organ segmentation under data-limited fine-tuning, (b) feature reuse, and (c) organ localization with multi-head attention. Analysis showed that pretext tasks combining MIM and token self-distillation balanced local and global attention distances and produced higher segmentation accuracy in few-shot and data-limited settings for both MRI and CT. Feature reuse was influenced by the similarity between the pretraining and fine-tuning modalities. © 2025 IEEE.