Segment Anything and MedSAM: foundation models for medical segmentation

Meta AI's Segment Anything (SAM, 2023) and its medical adaptation MedSAM (University of Toronto, Nature Communications 2024). Prompting, zero-shot generalisation, and the new SAM 2 for video/volumetric data.

Tags: Digital Health, R&D, Open Source, AI, SAM, MedSAM, Segment Anything, Meta, Foundation Model, Imaging, Toronto

The foundation model paradigm for segmentation

The deep learning → U-Net → nnU-Net sequence has put trainable segmentation within reach of any group with a dataset of a few hundred images. The next step, the foundation model, is a model pre-trained at enormous scale that adapts to new tasks through prompting rather than fine-tuning: the user provides a hint (a click, a box, text) and the model produces the requested segmentation.

The reference work is Segment Anything Model (SAM) — published by Meta AI in April 2023 with the paper “Segment Anything” by Alexander Kirillov, Eric Mintun, Nikhila Ravi et al. Medical adaptation arrived quickly with MedSAM by Jun Ma and collaborators (University of Toronto / Vector Institute), published in Nature Communications in January 2024.

SAM (Segment Anything)

SAM is a segmentation foundation model with the following technical characteristics:

  • Architecture: large image encoder ViT (Vision Transformer) + prompt encoder + lightweight mask decoder
  • Training dataset: SA-1B (Segment Anything 1 Billion) — 1.1 billion masks over 11 million natural images, collected by Meta with a semi-automatic approach (AI-assisted annotation cycle with humans-in-the-loop)
  • Prompting: the user provides one or more prompts to specify what to segment:
    • Points (foreground/background)
    • Bounding box
    • Coarse mask
    • Text (limited support)
  • Zero-shot generalisation: SAM can segment object classes never seen in training
  • Apache 2.0 licence; model weights (ViT-B 375MB, ViT-L 1.2GB, ViT-H 2.5GB) are public

The architecture is designed for split encoder/decoder inference: the image is processed once by the heavy encoder, and each subsequent prompt interaction is handled quickly by the lightweight decoder. This makes SAM usable with real-time interactivity, a crucial point for human annotation applications.
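
The amortised-encoder pattern can be sketched as follows. `HeavyEncoder` and `LightDecoder` are hypothetical stand-ins for SAM's ViT image encoder and mask decoder (reduced here to call counters), not the real SAM API; the point is only the call pattern: one expensive pass per image, one cheap pass per prompt.

```python
# Sketch of SAM's amortised inference pattern: the expensive image encoder
# runs once per image; every prompt reuses the cached embedding and only
# pays for the lightweight decoder. Stand-in classes, not the real SAM API.

class HeavyEncoder:
    """Stand-in for the ViT image encoder (the expensive part)."""
    def __init__(self):
        self.calls = 0

    def encode(self, image):
        self.calls += 1
        return {"embedding_of": image}  # placeholder embedding


class LightDecoder:
    """Stand-in for the prompt encoder + lightweight mask decoder."""
    def __init__(self):
        self.calls = 0

    def decode(self, embedding, prompt):
        self.calls += 1
        return {"mask_for": prompt, "from": embedding["embedding_of"]}


class InteractiveSession:
    """Encode once, then answer many prompts cheaply."""
    def __init__(self, encoder, decoder, image):
        self.decoder = decoder
        self.embedding = encoder.encode(image)  # paid once per image

    def segment(self, prompt):
        return self.decoder.decode(self.embedding, prompt)


encoder, decoder = HeavyEncoder(), LightDecoder()
session = InteractiveSession(encoder, decoder, image="slice_042")
for prompt in [("point", 120, 85), ("box", 40, 40, 200, 180), ("point", 60, 60)]:
    session.segment(prompt)

print(encoder.calls, decoder.calls)  # one heavy pass, three cheap passes
```

This is why interactive editing feels instantaneous after the first second or so of encoding: adding or moving a click never re-triggers the heavy pass.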

SAM’s limit on medical data

Despite its generality, SAM has suboptimal performance on many medical tasks:

  • Training on natural images (photos of objects, landscapes, people) does not cover the variability of medical modalities (CT, MR, US, endoscopy, histology) well
  • Medical structures of interest (tumours, lesions) often have ill-defined borders or poor contrast, cases where SAM tends to produce overly generic masks
  • Some specific medical visual patterns (homogeneous soft tissue, scan artefacts) are absent from SAM's prior

As shown in multiple independent evaluations on standard medical datasets, pure SAM produces mediocre or unreliable results on tumour lesions, complex anatomical structures, and histological microstructures.

MedSAM

MedSAM solves the problem with a pragmatic approach: fine-tuning SAM on medical data at scale. Jun Ma's team collected:

  • Over 1.5 million medical image-mask pairs
  • From public datasets spanning multiple modalities (CT, MR, US, endoscopy, dermatology, X-ray, histology, OCT)
  • Coverage of bones, organs, lesions, pathological structures

The model was fine-tuned from public SAM weights, with prompting primarily based on bounding box (more stable than point clicks for medical applications). The result, documented in Nature Communications January 2024, is a model that:

  • Preserves SAM’s prompt-based generality
  • Surpasses original SAM on medical tasks by a wide margin in accuracy
  • Is comparable to or better than task-specific models on many benchmarks
  • Is distributed under Apache 2.0 with public weights

The MedSAM paper also contributes effective prompting strategies for medicine, an efficient fine-tuning recipe starting from public SAM weights, and a systematic cross-modality evaluation.
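
Because SAM-family models consume images resized to a fixed 1024×1024 input grid, a bounding-box prompt drawn on the original image must be rescaled into that grid before inference. The helper below is a hypothetical sketch of that coordinate mapping (the function name and exact convention are assumptions, not MedSAM's actual code).

```python
# Hypothetical sketch of MedSAM-style box preprocessing: the model ingests
# images resized to a fixed square grid, so a box drawn in original-image
# pixel coordinates must be rescaled into that grid.

MODEL_SIZE = 1024  # side length of the model's input grid (SAM-family default)

def scale_box(box, orig_w, orig_h, model_size=MODEL_SIZE):
    """Map (x_min, y_min, x_max, y_max) from original-image pixel
    coordinates into the model's fixed input grid."""
    x0, y0, x1, y1 = box
    sx = model_size / orig_w
    sy = model_size / orig_h
    return (x0 * sx, y0 * sy, x1 * sx, y1 * sy)

# A box drawn on a 512x512 CT slice lands on the 1024 grid doubled:
print(scale_box((100, 120, 300, 340), orig_w=512, orig_h=512))
# (200.0, 240.0, 600.0, 680.0)
```

Getting this mapping wrong (or forgetting the inverse mapping for the returned mask) is a common integration bug when wiring box prompts from a viewer into the model.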

Variants and optimisations

From 2023 to 2024 a family of derivatives and specialisations emerged:

  • MedSAM-Lite — compact version optimised for fast inference on modest GPUs
  • SAMed (Wu et al. 2023) — LoRA-based (Low-Rank Adaptation) fine-tuning
  • AutoSAM — automatic prompt generation from images
  • SegVol (2024) — extension to 3D volumes with 3D prompting
  • Medical-SAM-Adapter — modular adaptation for specific modalities
  • nnSAM — combination with nnU-Net in hybrid pipelines
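
The LoRA idea behind SAMed can be illustrated with a minimal sketch: the frozen pretrained weight W is augmented by a trainable low-rank update B·A, so only r·(d_in + d_out) parameters are trained instead of d_in·d_out. Pure-Python matrices for illustration only; real LoRA is applied inside the transformer's attention projections.

```python
# Minimal LoRA (Low-Rank Adaptation) sketch: the effective weight is
# W + B @ A, where W stays frozen and only the small factors B and A
# are trained. Pure-Python lists stand in for real tensors.

def matmul(A, B):
    """Naive matrix product for lists of lists."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]

def madd(X, Y):
    """Elementwise matrix sum."""
    return [[x + y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]

# Frozen pretrained weight, 3x3 (stands in for a SAM attention projection).
W = [[1, 0, 0],
     [0, 1, 0],
     [0, 0, 1]]

# Rank-1 trainable factors: B is 3x1 and A is 1x3, so the update spans a
# full 3x3 matrix while carrying only 6 trainable numbers instead of 9.
B = [[1], [0], [2]]
A = [[0, 3, 0]]

W_adapted = madd(W, matmul(B, A))  # effective weight W + B @ A
print(W_adapted)
```

At SAM scale the saving is dramatic: a rank-4 update on a 1280×1280 projection trains ~10k parameters instead of ~1.6M, which is what makes single-GPU medical fine-tuning practical.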

SAM 2

On 30 July 2024 Meta released SAM 2, a major extension of the original model:

  • Native video support — object prompts in single frames propagate to subsequent frames
  • Improved performance on static images
  • Apache 2.0 licence preserved
  • Medical relevance: SAM 2 naturally applies to temporal data (dynamic ultrasound, fluoroscopy, endoscopic video, cine cardiac MR) and to 3D volumetric data treated as spatial “video”
  • The community is working on MedSAM-2 — SAM 2 fine-tuned on medical data, expected in the months following SAM 2 release
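
The volume-as-video idea can be sketched as a propagation loop: the user prompts a single slice, and each predicted mask becomes the prompt for the next slice. `predict` below is a hypothetical stand-in for the real model (masks reduced to integers so the hand-off is visible), not the SAM 2 API.

```python
# Conceptual sketch of SAM 2-style propagation on a 3D volume treated as
# a "video" of axial slices: prompt one slice, then carry the predicted
# mask forward as the prompt for the next. Stand-in model, not the real API.

def predict(slice_index, prompt_mask):
    """Stand-in segmenter: pretend the structure drifts one unit per slice."""
    return {"slice": slice_index, "mask": prompt_mask["mask"] + 1}

def propagate(num_slices, seed_slice, seed_mask):
    """Segment seed_slice from a user prompt, then propagate through the rest."""
    masks = {seed_slice: {"slice": seed_slice, "mask": seed_mask}}
    current = masks[seed_slice]
    for z in range(seed_slice + 1, num_slices):
        current = predict(z, current)   # previous mask becomes the prompt
        masks[z] = current
    return masks

masks = propagate(num_slices=5, seed_slice=1, seed_mask=10)
print(sorted(masks))  # all slices from the seed onward, from one prompt
```

One user prompt on one slice thus yields a segmentation of the whole remaining stack, which is exactly why SAM 2 is attractive for CT/MR volumes and temporal ultrasound.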

Integration with clinical workflows

SAM/MedSAM integration into open source imaging tools is rapidly developing:

  • MONAI Label added SAM/MedSAM backbone support for DeepGrow and DeepEdit — a user clicks, the SAM-based model produces initial segmentation
  • 3D Slicer — experimental extensions enable SAM prompting on MR/CT volumes
  • OHIF + Cornerstone3D — integration via MONAI Label server
  • QuPath — SAM plug-in for digital pathology (cell, gland, tissue prompting)
  • napari (Python environment for microscopy) — SAM plugins for biological annotation
  • ITK-SNAP, other segmentation editors — SAM-augmented variants

The recurring pattern: the clinician/researcher provides minimal prompts (a box or 2-3 clicks) instead of drawing manually; the model produces the rest; the clinician corrects if needed.
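
That prompt → predict → review loop can be sketched as follows. `model` and `looks_right` are hypothetical stand-ins (a real deployment would call MedSAM through MONAI Label or a Slicer/QuPath plug-in, with the clinician doing the visual check).

```python
# Sketch of the human-in-the-loop pattern described above: minimal prompt,
# model prediction, clinician review, optional corrective prompt.
# All components are stand-ins for illustration.

def model(prompt):
    """Stand-in for a MedSAM-style predictor."""
    return f"mask({prompt})"

def looks_right(mask):
    """Stand-in for the clinician's visual check: pretend box prompts
    succeed while sparse clicks need refining."""
    return "box" in mask

def annotate(initial_prompt, refine_prompt):
    mask = model(initial_prompt)
    if looks_right(mask):
        return mask, 1            # accepted on the first try
    mask = model(refine_prompt)   # clinician adds a corrective prompt
    return mask, 2

mask, attempts = annotate("clicks(2)", "box(40,40,200,180)")
print(mask, attempts)
```

The key property is that every iteration costs the clinician seconds, not the minutes of manual contouring it replaces.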

Applicative prospects

Emerging patterns:

Dataset annotation

The most immediate use case is accelerating annotated dataset creation: an annotator who previously spent 30 minutes per volume may now spend 5-10, a roughly three- to six-fold productivity gain.

Interactive clinical segmentation

In radiology/radiotherapy workflows where manual segmentation is part of the clinical process (RT planning, specific measurements), SAM/MedSAM reduce time and variability.

Conversational user interfaces

Combined with an LLM, the clinician can interact in natural language: "segment the right lung tumour" is translated by the LLM into SAM prompts, and MedSAM performs the segmentation.
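
The translation step can be sketched with a toy stand-in for the LLM: a keyword lookup mapping a clinical request to a structured box prompt. `ATLAS_BOXES` and its coordinates are pure assumptions for illustration, not real anatomy.

```python
# Hypothetical sketch of the language -> prompt translation step: an LLM
# (replaced here by a trivial keyword lookup) turns a clinical request
# into a structured prompt a SAM-family model can consume.

# Hypothetical atlas of rough bounding boxes per structure
# (assumed coordinates, for illustration only).
ATLAS_BOXES = {
    ("lung", "right"): (50, 80, 230, 400),
    ("lung", "left"):  (280, 80, 460, 400),
}

def request_to_prompt(text):
    """Toy stand-in for the LLM: map keywords to a box prompt."""
    words = text.lower().split()
    side = "right" if "right" in words else "left"
    if "lung" in words:
        return {"type": "box", "coords": ATLAS_BOXES[("lung", side)]}
    raise ValueError("structure not recognised")

prompt = request_to_prompt("segment the right lung tumour")
print(prompt)
```

A real system would replace the lookup with an LLM call grounded in the study's metadata, but the interface contract (text in, structured prompt out) is the same.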

Zero-shot on rare structures

For structures or pathologies with insufficient training data, SAM/MedSAM enable useful segmentations without dedicated training.

Combined pipelines

SAM/MedSAM for rapid bounding box generation + nnU-Net/TotalSegmentator for refinement. Combination of generality and specialisation.
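
A two-stage pipeline of this kind can be sketched as a pair of stand-in stages: a general model proposes a rough region, and a specialised model refines it within that region. Both functions are hypothetical placeholders, not real SAM or nnU-Net calls.

```python
# Sketch of the generality + specialisation pipeline: a SAM-style model
# proposes a rough region, a task-specific model refines inside it.
# Both stages are stand-ins for illustration.

def sam_propose(image):
    """Stage 1 stand-in: cheap, general region proposal (a bounding box)."""
    return {"box": (40, 40, 200, 180), "source": "sam"}

def specialist_refine(image, proposal):
    """Stage 2 stand-in: nnU-Net/TotalSegmentator-style refinement
    restricted to the proposed region."""
    x0, y0, x1, y1 = proposal["box"]
    # Pretend the specialist tightens the box by a 10-pixel margin.
    return {"box": (x0 + 10, y0 + 10, x1 - 10, y1 - 10),
            "source": "specialist"}

proposal = sam_propose("ct_slice")
final = specialist_refine("ct_slice", proposal)
print(final)
```

The design rationale: the general model removes the need to localise the structure by hand, while the specialist contributes the modality-specific accuracy the general model lacks.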

2024 limits

  • Not perfectly accurate on hard structures — SAM/MedSAM remains a productivity tool, not a clinical evaluation substitute
  • Dependence on prompt quality — an ambiguous prompt leads to ambiguous results
  • SAM encoder computational cost — while post-encoding inference is fast, initial encoding needs a powerful GPU
  • Limited cross-modality generalisation in smaller versions — “lite” versions may have degraded performance on off-training modalities
  • Regulation — as with all open source AI tools, use in certified clinical products requires usual qualification (IEC 62304, MDR, CE marking)

SAM and EU AI Act

An emerging 2024 theme is the EU AI Act (Regulation (EU) 2024/1689, published July 2024, in force 1 August 2024). The Regulation classifies AI in medical devices as high-risk (Art. 6(1) in combination with Annex I, which lists the MDR among the covered harmonisation legislation). A SAM/MedSAM-based product in a clinical context is therefore doubly regulated: MDR + AI Act. Combined obligations include AI-specific risk management, transparency, documentation, human oversight, and post-market monitoring.

The Apache 2.0 licence of SAM/MedSAM does not hinder use in certified products, but the final producer remains responsible for full qualification.

In the Italian context

As of 2024, Italian medical research groups are beginning to experiment with SAM/MedSAM:

  • Politecnico di Milano, Turin, Bologna, Verona — in rapid annotation pipelines for research projects
  • IRCCS — acceleration of radiomics projects on large cohorts
  • Healthcare organisations — some experimental radiotherapy implementations

Clinical production adoption will take time for regulatory qualification; but impact on dataset preparation and research is already significant.

Outlook

Expected directions in the coming months/years:

  • MedSAM-2 based on SAM 2 with native volumetric support
  • Clinical multimodal models — SAM + biomedical LLM combination for language+image interaction
  • Modality specialisations — MedSAM-MR, MedSAM-US, MedSAM-Path with single-modality fine-tuning
  • Local on-premise fine-tuning — platforms enabling a hospital to specialise MedSAM on its own data
  • Rigorous clinical evaluation — prospective in-department studies measuring impact on clinical time, accuracy, professional satisfaction
  • EHDS/secondary-use integration — MedSAM use to generate segmentations on shared healthcare datasets in HDAB secure environments

SAM and MedSAM in 2024 represent the new paradigm of medical AI: not models pre-trained on specific tasks, but general models prompt-adapted to specific cases. A conceptual shift as relevant as the move from classical methods to deep learning in 2012-2015.


References: Alexander Kirillov, Eric Mintun, Nikhila Ravi et al. (Meta AI), “Segment Anything”, April 2023; Jun Ma et al. (University of Toronto / Vector Institute), “Segment anything in medical images”, Nature Communications, January 2024; SAM 2 (Meta AI, July 2024); Regulation (EU) 2024/1689 (EU AI Act). Models distributed under Apache 2.0; integrations with MONAI Label, 3D Slicer, QuPath, napari.
