TTT-Vnet: A 3D vision test-time training model for medical image analysis Conference Paper


Authors: Pan, S.; Su, V.; Lo, S.; Hu, M.; Li, Y.; Chang, C. W.; Wang, T.; Qiu, R.; Yang, X.
Title: TTT-Vnet: A 3D vision test-time training model for medical image analysis
Conference Title: Medical Imaging 2025: Image Processing
Abstract: We propose a novel V-shaped three-dimensional vision test-time training (TTT) network, TTT-Vnet, to enhance deep-learning-based medical image analysis. Convolutional neural networks (CNNs) and vision Transformers (ViTs) have recently excelled in segmentation, synthesis, and registration tasks. However, neither CNNs nor ViTs can effectively capture local and global features simultaneously. Moreover, they tend to overfit because medical image datasets are limited. Inspired by TTT's generalization capabilities, our approach combines CNN and TTT to capture both local and global features. TTT-Vnet, featuring an encoder-decoder architecture with a shifted-window (Swin) mechanism and TTT modules in the deep layers, captures highly compressed, generalizable global features, enhancing learning for various medical tasks. Unlike conventional CNN/ViT models, TTT layers continue to learn and adapt to each new image sample at test time, improving understanding of specific medical image samples and demonstrating better generalization. We implement the proposed TTT-Vnet model in different deep-learning frameworks to demonstrate its potential for computed tomography (CT)-based organ segmentation, T1-weighted magnetic resonance imaging (MRI) synthesis, and 4DCT image registration. We also perform an ablation study using CNN- and ViT-based models to benchmark the proposed TTT-Vnet. The results indicate that TTT-Vnet shows superior performance compared to conventional deep-learning networks. © 2025 SPIE.
Keywords: neuroimaging; registration; image enhancement; segmentation; CT; synthesis; MRI; computed tomography; image segmentation; transillumination; convolution neural network; optical flows; image correlation; medical image analysis; global feature; photointerpretation; vision test-time training model; test time; training model; vision tests
Journal Title: Progress in Biomedical Optics and Imaging - Proceedings of SPIE
Volume: 13406
Conference Dates: 2025 Feb 17-20
Conference Location: San Diego, CA
ISSN: 1605-7422
Publisher: SPIE  
Date Published: 2025-01-01
Start Page: 134062Y
Language: English
DOI: 10.1117/12.3047499
PROVIDER: scopus
Notes: Conference paper -- ISBN: 9781510685901 -- Source: Scopus
MSK Authors
  1. Tonghe Wang