arxiv:2601.10880

Medical SAM3: A Foundation Model for Universal Prompt-Driven Medical Image Segmentation

Published on Jan 15 · Submitted by ChongCongJiang on Jan 20

Abstract

AI-generated summary

Medical SAM3 adapts the SAM3 foundation model through comprehensive fine-tuning on diverse medical imaging datasets to achieve robust prompt-driven segmentation across various modalities and anatomical structures.

Promptable segmentation foundation models such as SAM3 have demonstrated strong generalization capabilities through interactive and concept-based prompting. However, their direct applicability to medical image segmentation remains limited by severe domain shifts, the absence of privileged spatial prompts, and the need to reason over complex anatomical and volumetric structures. Here we present Medical SAM3, a foundation model for universal prompt-driven medical image segmentation, obtained by fully fine-tuning SAM3 on large-scale, heterogeneous 2D and 3D medical imaging datasets with paired segmentation masks and text prompts. Through a systematic analysis of vanilla SAM3, we observe that its performance degrades substantially on medical data, with its apparent competitiveness largely relying on strong geometric priors such as ground-truth-derived bounding boxes. These findings motivate full model adaptation beyond prompt engineering alone. By fine-tuning SAM3's model parameters on 33 datasets spanning 10 medical imaging modalities, Medical SAM3 acquires robust domain-specific representations while preserving prompt-driven flexibility. Extensive experiments across organs, imaging modalities, and dimensionalities demonstrate consistent and significant performance gains, particularly in challenging scenarios characterized by semantic ambiguity, complex morphology, and long-range 3D context. Our results establish Medical SAM3 as a universal, text-guided segmentation foundation model for medical imaging and highlight the importance of holistic model adaptation for achieving robust prompt-driven segmentation under severe domain shift. Code and model will be made available at https://github.com/AIM-Research-Lab/Medical-SAM3.
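
The abstract's core recipe is full-parameter fine-tuning on pairs of segmentation masks and free-text prompts, rather than freezing the backbone and relying on prompt engineering or ground-truth boxes. The sketch below illustrates that training paradigm only: the tiny image/text encoders, fusion, and mask head are hypothetical stand-ins (the real SAM3 architecture and the released Medical SAM3 code are not reproduced here), and the loss is a generic Dice + BCE combination commonly used for segmentation.

```python
# Illustrative sketch of text-prompted, full-parameter fine-tuning.
# All modules below are toy stand-ins, NOT the SAM3 / Medical SAM3 architecture.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ToyPromptableSegmenter(nn.Module):
    """Stand-in for a promptable segmenter: (image, text prompt) -> mask logits."""
    def __init__(self, vocab_size=1000, dim=32):
        super().__init__()
        self.image_encoder = nn.Sequential(
            nn.Conv2d(1, dim, 3, padding=1), nn.ReLU(),
            nn.Conv2d(dim, dim, 3, padding=1), nn.ReLU(),
        )
        self.text_encoder = nn.EmbeddingBag(vocab_size, dim)  # bag-of-tokens prompt encoder
        self.mask_head = nn.Conv2d(dim, 1, 1)

    def forward(self, image, prompt_tokens):
        feats = self.image_encoder(image)          # (B, C, H, W)
        text = self.text_encoder(prompt_tokens)    # (B, C)
        fused = feats * text[:, :, None, None]     # naive text-image fusion
        return self.mask_head(fused)               # (B, 1, H, W) mask logits

def soft_dice_loss(logits, target, eps=1e-6):
    prob = torch.sigmoid(logits)
    inter = (prob * target).sum(dim=(1, 2, 3))
    union = prob.sum(dim=(1, 2, 3)) + target.sum(dim=(1, 2, 3))
    return 1.0 - (2.0 * inter + eps) / (union + eps)

model = ToyPromptableSegmenter()
# Full fine-tuning: every parameter is trainable, nothing is frozen.
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)

# Dummy batch: grayscale slices, token ids for a prompt such as "polyp", binary masks.
images = torch.randn(2, 1, 64, 64)
prompts = torch.randint(0, 1000, (2, 4))
masks = (torch.rand(2, 1, 64, 64) > 0.5).float()

optimizer.zero_grad()
logits = model(images, prompts)
loss = soft_dice_loss(logits, masks).mean() + F.binary_cross_entropy_with_logits(logits, masks)
loss.backward()
optimizer.step()
```

In the paper's setting, the analogous loop would run over the 33-dataset, 10-modality corpus with all SAM3 weights unfrozen; the stand-ins above only make the shape of the objective concrete.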

Community

Paper author · Paper submitter

๐Ÿฅ Medical SAM3: Bridging the Gap in Text-Guided Medical Image Segmentation

Existing "segment anything" foundation models often struggle on medical imaging, particularly when no spatial prompts (bounding boxes) are provided. Medical SAM3 aims to address this by enhancing the model's semantic understanding through full-parameter fine-tuning.

💡 Key Contributions:

  • 🗨️ Reduced Reliance on Spatial Cues: The model is trained to segment from text prompts alone (e.g., "Polyp", "Tumor"), aiming for a more automated workflow.
  • 📈 Improved Generalization: Experiments on 7 unseen external datasets show a large zero-shot improvement, with the Dice score rising from 11.9% to 73.9% (the metric is sketched at the end of this post).
  • 🩻 Diverse Training Data: Trained on a corpus of 33 datasets spanning 10 imaging modalities to capture a wide range of medical semantics.

We hope this work contributes to the development of more robust, prompt-driven medical AI assistants.
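
The Dice numbers above refer to the standard Dice similarity coefficient between predicted and ground-truth masks. The snippet below is a minimal, generic NumPy implementation for reference only; the paper's exact evaluation protocol (per-case averaging, handling of empty masks, 3D aggregation) is not described on this page.

```python
# Generic Dice similarity coefficient for binary masks (reference only).
import numpy as np

def dice_score(pred: np.ndarray, target: np.ndarray, eps: float = 1e-6) -> float:
    """Dice = 2|P ∩ G| / (|P| + |G|)."""
    pred = pred.astype(bool)
    target = target.astype(bool)
    intersection = np.logical_and(pred, target).sum()
    return float((2.0 * intersection + eps) / (pred.sum() + target.sum() + eps))

# Toy example: two 4x4 masks overlapping on 2 of 3 foreground pixels each.
pred = np.zeros((4, 4), dtype=np.uint8)
target = np.zeros((4, 4), dtype=np.uint8)
pred[1, 1:4] = 1     # predicted foreground
target[1, 0:3] = 1   # ground-truth foreground
print(f"Dice: {dice_score(pred, target):.3f}")  # ≈ 0.667
```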
