Synapse - TTT-Vnet: A 3D vision test-time training model for medical image analysis

TTT-Vnet: A 3D vision test-time training model for medical image analysis Conference Paper

Authors:	Pan, S.; Su, V.; Lo, S.; Hu, M.; Li, Y.; Chang, C. W.; Wang, T.; Qiu, R.; Yang, X.
Title:	TTT-Vnet: A 3D vision test-time training model for medical image analysis
Conference Title:	Medical Imaging 2025: Image Processing
Abstract:	We propose a novel V-shaped three-dimensional vision test-time training (TTT) Vnet to enhance deep-learning-based medical image analysis. Recent advancements have seen convolutional neural networks (CNNs) and vision Transformers (ViTs) excel in segmentation, synthesis, and registration tasks. However, CNNs and ViTs cannot effectively capture local and global features simultaneously. Moreover, they tend to overfit because medical image datasets are limited. Inspired by TTT's generalization capabilities, our approach synergizes CNN and TTT to capture both local and global features. The TTT-Vnet, featuring an encoder-decoder architecture with a shifted-window (Swin) mechanism and TTT modules in the deep layers, captures highly compressed, generalizable global features, enhancing learning for various medical tasks. Unlike conventional CNN/ViT models, TTT layers continue to learn and adapt to each new image sample at test time, improving understanding of specific medical image samples and demonstrating better generalization. We implement TTT-Vnet in different deep-learning frameworks to demonstrate its potential for computed tomography (CT)-based organ segmentation, T1-weighted magnetic resonance imaging (MRI) synthesis, and 4DCT image registration. We also perform an ablation study using CNN- and ViT-based models to benchmark TTT-Vnet. The results indicate that TTT-Vnet outperforms conventional deep-learning networks. © 2025 SPIE.
Keywords:	neuroimaging; registration; image enhancement; segmentation; CT; synthesis; MRI; computed tomography; image segmentation; transillumination; convolution neural network; optical flows; image correlation; medical image analysis; global feature; photointerpretation; vision test-time training model; test time; training model; vision tests
Journal Title:	Progress in Biomedical Optics and Imaging - Proceedings of SPIE
Volume:	13406
Conference Dates:	2025 Feb 17-20
Conference Location:	San Diego, CA
ISSN:	1605-7422
Publisher:	SPIE
Date Published:	2025-01-01
Start Page:	134062Y
Language:	English
DOI:	10.1117/12.3047499
PROVIDER:	scopus
DOI/URL:	https://www.scopus.com/inward/record.uri?eid=2-s2.0-105004577551&doi=10.1117%2f12.3047499&partnerID=40&md5=53244a234f5c732e1a38b8a89d1eb204
Notes:	Conference paper -- ISBN: 9781510685901 -- Source: Scopus

MSK Authors

52 Wang

Related MSK Work

Self-Supervised Learning Improves Robustness of Deep Learning Lung Tumor Segmentation Models to CT Imaging Differences
Medical Physics 2025

A VGG Attention Vision Transformer Network for Benign and Malignant Classification of Breast Ultrasound Images
Medical Physics 2022

Tumor Co-Segmentation in PET/CT Using Multi-Modality Fully Convolutional Neural Network
Physics in Medicine and Biology 2019

A Study on the Performance of U-Net Modifications in Retroperitoneal Tumor Segmentation
Progress in Biomedical Optics and Imaging - Proceedings of SPIE 2025

Multi-Organ CT Segmentation Using Shifted-Window Multilayer Perceptron Mixer
Progress in Biomedical Optics and Imaging - Proceedings of SPIE 2023