Segmenting is different from classifying
The first medical deep learning applications — documented in the 2013-2015 literature — focused mainly on classification: given an image (a mammogram, a retina, a histology slide), return a categorical prediction (benign/malignant, retinopathy/normal, tumour subtype). The models used were typically AlexNet, GoogLeNet or VGG, pre-trained on ImageNet and fine-tuned via transfer learning.
Another task — central in medical imaging and harder — is segmentation: given an image, produce a pixel-wise map assigning each pixel a class (tumour/non-tumour, organ/background, tissue type). Segmentation underlies many clinical measurements: tumour volumetry, radiotherapy planning, lesion analysis, anatomical structure quantification.
CNN architectures for classification do not adapt directly to segmentation: aggressive pooling discards exactly the fine spatial information that pixel-level output needs. Early attempts used patch-based approaches — sliding a window over the image and classifying its central pixel — which were slow and produced boundary discontinuities.
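The sliding-window idea just described can be sketched as follows — a minimal illustration (function and parameter names are hypothetical, not from any paper's code), which makes the inefficiency obvious: one classifier forward pass per output pixel.

```python
import numpy as np

def patch_predictions(image, classify, patch=33):
    # Slide a window over the image; classify the central pixel of each patch.
    # O(H*W) forward passes -- the cost that motivated fully convolutional nets.
    h, w = image.shape
    r = patch // 2
    padded = np.pad(image, r, mode="reflect")  # handle border pixels
    out = np.empty((h, w), dtype=int)
    for i in range(h):
        for j in range(w):
            out[i, j] = classify(padded[i:i + patch, j:j + patch])
    return out
```

With a toy thresholding "classifier", `patch_predictions(img, lambda p: int(p.mean() > 0.5), patch=3)` returns a full label map of the same shape as `img`.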
In May 2015, a group at the University of Freiburg led by Thomas Brox published an elegant solution: U-Net.
The architecture
U-Net — described in the paper “U-Net: Convolutional Networks for Biomedical Image Segmentation” by Olaf Ronneberger, Philipp Fischer and Thomas Brox, presented at MICCAI 2015 (arXiv preprint, May 2015) — is a fully convolutional network with a characteristic U shape.
The structure:
- Contracting path (left of the U) — classic chain conv → ReLU → conv → ReLU → max pool progressively reducing spatial resolution and increasing channels. Extracts hierarchical features of increasing context
- Expanding path (right of the U) — symmetric chain upconv → conv → ReLU → conv → ReLU rebuilding spatial resolution while reducing channels. Produces the segmentation map
- Skip connections — feature maps of the contracting path are concatenated with the corresponding feature maps in the expanding path. They carry high-resolution information directly to output layers, preserving fine spatial detail
The result is an architecture that simultaneously sees the global context (through the contracting path) and preserves local detail (through the skip connections). Empirically, it excels on medical segmentation, where both aspects are critical.
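A minimal sketch of this contracting/expanding structure in PyTorch, assuming padded (“same”) convolutions so feature maps align for concatenation — the original paper instead uses unpadded convolutions and crops the skip features before concatenating, a detail omitted here for brevity; names and channel counts are illustrative:

```python
import torch
import torch.nn as nn

def double_conv(in_ch, out_ch):
    # conv -> ReLU -> conv -> ReLU block used on both paths
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, 3, padding=1), nn.ReLU(inplace=True),
        nn.Conv2d(out_ch, out_ch, 3, padding=1), nn.ReLU(inplace=True),
    )

class TinyUNet(nn.Module):
    def __init__(self, in_ch=1, n_classes=2, base=16):
        super().__init__()
        # Contracting path: resolution halves, channels double
        self.enc1 = double_conv(in_ch, base)
        self.enc2 = double_conv(base, base * 2)
        self.bottleneck = double_conv(base * 2, base * 4)
        self.pool = nn.MaxPool2d(2)
        # Expanding path: resolution doubles, channels halve
        self.up2 = nn.ConvTranspose2d(base * 4, base * 2, 2, stride=2)
        self.dec2 = double_conv(base * 4, base * 2)
        self.up1 = nn.ConvTranspose2d(base * 2, base, 2, stride=2)
        self.dec1 = double_conv(base * 2, base)
        self.head = nn.Conv2d(base, n_classes, 1)  # per-pixel class scores

    def forward(self, x):
        e1 = self.enc1(x)                   # full resolution
        e2 = self.enc2(self.pool(e1))       # 1/2 resolution
        b = self.bottleneck(self.pool(e2))  # 1/4 resolution
        # Skip connections: concatenate encoder features with upsampled ones
        d2 = self.dec2(torch.cat([self.up2(b), e2], dim=1))
        d1 = self.dec1(torch.cat([self.up1(d2), e1], dim=1))
        return self.head(d1)
```

Feeding a `(batch, 1, 64, 64)` tensor produces a `(batch, n_classes, 64, 64)` score map — one prediction per pixel, in a single forward pass.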
Original results
The original paper documents two experiments:
- ISBI 2012 EM segmentation challenge — neuron segmentation in electron microscopy. U-Net wins
- ISBI 2015 cell tracking challenge — cell tracking in time-lapse light microscopy. U-Net wins both categories (DIC-HeLa, PhC-U373)
Beyond the winning results, the paper notes that U-Net works with very little training data: on the cell tracking challenge, its performance is achieved with datasets of a few dozen annotated images, thanks to heavy elastic data augmentation (elastic deformations of the training images).
For the medical world, the combination of few annotations (common reality) and superior performance is revolutionary.
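Elastic augmentation of this kind can be sketched with SciPy — a dense random displacement field smoothed by a Gaussian, in the spirit of the paper (which deforms a coarse grid with bicubic interpolation); the `alpha` and `sigma` values here are illustrative assumptions:

```python
import numpy as np
from scipy.ndimage import gaussian_filter, map_coordinates

def elastic_deform(image, alpha=34.0, sigma=4.0, rng=None):
    # Random per-pixel displacements, smoothed by a Gaussian (sigma)
    # and scaled by alpha -- larger alpha means stronger deformation.
    rng = np.random.default_rng() if rng is None else rng
    dx = gaussian_filter(rng.uniform(-1, 1, image.shape), sigma) * alpha
    dy = gaussian_filter(rng.uniform(-1, 1, image.shape), sigma) * alpha
    y, x = np.meshgrid(
        np.arange(image.shape[0]), np.arange(image.shape[1]), indexing="ij"
    )
    coords = np.array([y + dy, x + dx])
    # Note: the same coords must also be applied to the label mask
    # so image and annotation stay aligned.
    return map_coordinates(image, coords, order=1, mode="reflect")
```

Each call yields a plausibly deformed variant of the input, which is what lets a few dozen annotated images stand in for a much larger training set.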
The original implementation
The original publication is accompanied by public Caffe code (University of Freiburg gitlab), with trained models, training scripts, documentation. Immediate code availability — under GNU GPL — is crucial: every research group can reproduce results and adapt the architecture to their case.
2015-2017 diffusion
In the 18 months after publication, U-Net spreads rapidly:
- Cross-framework portability — the architecture’s simplicity enables quick re-implementations in TensorFlow (released November 2015), Keras and PyTorch (released January 2017); by 2017 GitHub hosts dozens of public U-Net implementations
- Adoption in MICCAI challenges — between 2015 and 2017, U-Net or variants win or rank top in many segmentation challenges (brain tumours BRATS, multiple sclerosis lesions, lung, prostate, colon, eye, histology)
- Citations — the U-Net paper is among the most cited in recent medical imaging (over 2000 citations by 2017)
Extensions: 3D U-Net
Many medical images are three-dimensional — CT and MR scans are volumes, not single images. The original U-Net is 2D; applied slice-by-slice it ignores inter-slice context. A natural extension is moving to 3D convolutions.
Özgün Çiçek et al. (2016), same Freiburg group, publish 3D U-Net: “3D U-Net: Learning Dense Volumetric Segmentation from Sparse Annotation”, MICCAI 2016. The network uses 3D convolutions throughout the contracting/expanding path, treating the volume as 3D input. A key contribution: the ability to train with sparse annotations (single slices annotated per volume) rather than complete annotations — dramatically cutting clinical annotation costs.
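Training on sparse annotations amounts to a loss with per-voxel weights, set to zero on unlabeled voxels so they contribute no gradient. A minimal NumPy sketch of such a masked loss (the function name and array shapes are illustrative assumptions, not the paper's code):

```python
import numpy as np

def masked_cross_entropy(log_probs, labels, labeled_mask):
    # log_probs: (C, *spatial) log-softmax output of the network
    # labels:    integer class map, same spatial shape
    # labeled_mask: 1 where a voxel is annotated, 0 elsewhere
    #               (weight 0 -> unlabeled voxels produce no gradient)
    picked = np.take_along_axis(log_probs, labels[None], axis=0)[0]
    n = labeled_mask.sum()
    return -(picked * labeled_mask).sum() / max(n, 1)
```

Averaging only over labeled voxels means a volume where just a handful of slices are annotated still yields a well-scaled training signal.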
V-Net
Another volumetric extension is V-Net — “V-Net: Fully Convolutional Neural Networks for Volumetric Medical Image Segmentation” by Fausto Milletari, Nassir Navab, Seyed-Ahmad Ahmadi, presented at 3DV 2016. V-Net introduces:
- Residual connections in blocks (inspired by He et al.’s 2015 ResNet)
- Dice loss function — instead of cross-entropy, uses the Dice coefficient directly as loss. Very effective for tasks with strong class imbalance (typical in medicine: tumour often 1-5% of voxels)
V-Net is demonstrated on MR prostate segmentation, a typically hard task. It becomes a parallel reference to 3D U-Net for volumetric tasks.
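A soft (differentiable) Dice loss in the spirit of V-Net can be sketched as follows — a NumPy illustration with an arbitrary smoothing epsilon, not Milletari et al.'s exact formulation (which squares the terms in the denominator):

```python
import numpy as np

def soft_dice_loss(pred, target, eps=1e-6):
    # pred: predicted foreground probabilities in [0, 1]
    # target: binary ground-truth mask, same shape
    # Overlap-based: insensitive to how many background voxels there are,
    # which is why it handles strong class imbalance well.
    intersection = np.sum(pred * target)
    return 1.0 - (2.0 * intersection + eps) / (np.sum(pred) + np.sum(target) + eps)
```

A perfect prediction gives a loss of 0; predicting all background on a small tumour gives a loss near 1 — unlike cross-entropy, which a degenerate all-background prediction can keep deceptively low when the tumour is 1-5% of voxels.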
Later variants
Between 2016 and early 2017 numerous architectural variants emerge:
- Attention U-Net — Oktay et al., gating attention to focus on relevant regions
- Dense U-Net — DenseNet blocks in place of standard convolutional blocks
- Residual U-Net — residual connection integration
- U-Net with pre-trained encoder — replacing the contracting path with an ImageNet-pretrained encoder (VGG, ResNet), for transfer learning
Each variant offers different trade-offs among accuracy, parameter count, GPU memory requirements.
U-Net as baseline
In 2016-2017 imaging challenges U-Net has become the default baseline. Published methods typically combine:
- U-Net architecture (2D or 3D) as starting point
- Task-specific modifications — loss function, augmentation, pre/post-processing
- Ensemble of U-Net networks trained on different folds
The practical advantage of U-Net as baseline is reproducibility: comparing methodological components is simpler with a common base.
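The cross-fold ensembling step can be as simple as averaging per-class probability maps from the fold models and taking the argmax — a hypothetical sketch:

```python
import numpy as np

def ensemble_predict(prob_maps):
    # prob_maps: list of per-fold softmax outputs, each of shape (C, H, W)
    mean = np.mean(prob_maps, axis=0)   # average probabilities across folds
    return np.argmax(mean, axis=0)      # final class per pixel
```

Averaging probabilities (rather than hard labels) keeps each model's uncertainty in play and typically smooths out fold-specific errors.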
Clinical applications
By 2017 U-Net has generated applications in virtually every medical imaging domain:
- Oncology — segmentation of brain tumours (BRATS), lung, liver, breast
- Cardiology — segmentation of cardiac chambers (left and right ventricles) and myocardium
- Neurology — multiple sclerosis lesions, microbleeds, cortical atrophy
- Urology — prostate, kidney
- Histopathology — nuclei, glands, vessels
- Ophthalmology — retinal vessels, optic disc
- Dermatology — skin lesions
- Radiotherapy — organs at risk for planning
- Surgery — pre-operative planning, patient-specific 3D models
Comparison with pre-deep learning methods is typically in U-Net’s favour by 5-15 Dice points, with moderate-size datasets (~100-1000 annotated images).
2017 limits
Some recognised limits:
- GPU memory — 3D U-Net on full high-resolution volumes requires GPUs with large memory (16-24 GB); the alternative is patch-based training, at the cost of global context
- Cross-site generalisation — a network trained on one hospital’s data often degrades on another’s (different scanner characteristics, protocols, populations)
- Clinical annotation — still requires radiologist-quality manual segmentations
- Class imbalance — the tumour is a small fraction of the volume; specific loss functions are required (Dice loss, focal loss)
- Interpretability — why does the network predict a given segmentation? An open topic, important for clinical adoption
Open source ecosystem
As of 2017, the open source ecosystem around U-Net includes:
- Original Freiburg implementation (Caffe, GPL) — accessible and maintained
- Dozens of implementations in PyTorch/TensorFlow/Keras on GitHub
- Public datasets — BRATS (brain tumours), LUNA (lung nodules), LiTS (liver lesions), ACDC (cardiac), SegTHOR, and many more
- Annual MICCAI challenges — comparison benchmarks on standard datasets
- Medical dataset publications — more and more papers release code and pre-trained models
The reproducible pattern (dataset + code + model) has become almost a condition of publication in the more rigorous medical imaging venues.
What U-Net means for the clinic
U-Net’s impact — and more generally that of the 2015-2017 deep learning generation — on clinical practice is still taking shape. The technology is mature for research; routine adoption in clinical departments requires:
- PACS/RIS integration — pipelines that bring U-Net to radiology workstations
- Medical device certification — FDA 510(k) clearance and CE marking, processes expected to become more structured in the coming years
- Multi-site clinical validation — demonstration of performance on different populations
- Workflow integration — AI must speed up the clinician, not replace them; interaction design is crucial
In 2017 some commercial vendors (Siemens, Philips, GE) start embedding deep learning components in certified products; specialised startups (Arterys, Zebra Medical, Enlitic, Aidence, Viz.ai) emerge with specific offerings.
Outlook
In the coming years we can expect:
- Larger, more accurate architectures — continuous architectural improvements
- Multi-task foundation models — single networks doing multiple tasks simultaneously (segmentation + classification + detection)
- Self-supervised pre-training on large medical datasets — reduced dependence on annotations
- Production-ready frameworks — expected release of medical-specific libraries (an interesting announced project is the NVIDIA+KCL consortium for a PyTorch-based medical imaging framework; provisional naming under discussion)
- Regulatory progress — FDA is beginning to articulate a framework for AI as a Medical Device
U-Net has become, in less than two years, the reference architecture for biomedical segmentation — a paradigmatic example of how sharing a paper + open source code can rapidly transform an entire applied research field.
References: Olaf Ronneberger, Philipp Fischer, Thomas Brox, “U-Net: Convolutional Networks for Biomedical Image Segmentation”, MICCAI 2015 (arXiv:1505.04597), University of Freiburg. Özgün Çiçek et al., “3D U-Net: Learning Dense Volumetric Segmentation from Sparse Annotation”, MICCAI 2016. Fausto Milletari, Nassir Navab, Seyed-Ahmad Ahmadi, “V-Net: Fully Convolutional Neural Networks for Volumetric Medical Image Segmentation”, 3DV 2016. Original code: lmb.informatik.uni-freiburg.de.