Federated Learning in healthcare: Flower, FATE, OpenFL and other open source frameworks

Federated learning applied to multi-centre healthcare projects: open source frameworks (Cambridge's Flower, WeBank's FATE, Intel's OpenFL, OpenMined's PySyft, TensorFlow Federated), differential privacy and the first clinical research consortia.

Tags: Digital Health, R&D, Open Source, AI, Federated Learning, Flower, FATE, OpenFL, PySyft, Privacy, GDPR

The problem: distributed, non-shareable healthcare data

Training medical AI models requires large volumes of data. But healthcare data are distributed across centres (hospitals, labs, departments) and hard to centralise for technical, organisational and regulatory reasons (GDPR art. 9, HIPAA, national care data rules). The paradox: medical AI needs multi-centre data; health systems cannot share them.

One answer to this paradox is Federated Learning (FL) — term coined by H. Brendan McMahan et al. (Google, 2016-2017) in the paper “Communication-Efficient Learning of Deep Networks from Decentralized Data” (FedAvg, 2017). The basic idea: instead of bringing data to the model, bring the model to the data.

How it works

The typical workflow:

  1. A coordinating server distributes an initial model to N clients (each a healthcare centre with local data)
  2. Each client runs local training for a number of epochs on its own dataset
  3. Each client sends updated weights (not data) to the server
  4. The server aggregates the weights (weighted average, typically FedAvg) to produce a new global model
  5. The process repeats for R rounds until convergence
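Step 4 above can be sketched in a few lines of plain Python. This is an illustrative simplification of the FedAvg rule (weights averaged in proportion to each client's sample count), not any framework's actual API:

```python
# Minimal FedAvg aggregation sketch (illustrative, not production code).
# Each client's update is a flat list of weights; the server averages them
# weighted by the number of local training samples.

def fedavg(client_updates):
    """client_updates: list of (weights, num_samples) pairs."""
    total = sum(n for _, n in client_updates)
    dim = len(client_updates[0][0])
    global_weights = [0.0] * dim
    for weights, n in client_updates:
        for i, w in enumerate(weights):
            global_weights[i] += w * (n / total)
    return global_weights

# Two hospitals: one contributed 300 training samples, the other 100,
# so the first hospital's weights count three times as much.
updates = [([1.0, 2.0], 300), ([5.0, 6.0], 100)]
print(fedavg(updates))  # [2.0, 3.0]
```

Real frameworks operate on model-sized tensors rather than short lists, but the weighted average is exactly this.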

Individual data never leave the centre. The server sees only weight aggregates — which can be further protected with Secure Aggregation (cryptography) and Differential Privacy (added noise).
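The intuition behind Secure Aggregation can be shown with a toy additive-masking scheme. This is a conceptual sketch only: real protocols (e.g. Bonawitz et al., 2017) add cryptographic key agreement and dropout recovery on top.

```python
import random

# Toy additive-masking secure aggregation: each pair of clients shares a
# random mask; one client adds it to its update, the other subtracts it.
# Each masked update looks random to the server, but the masks cancel
# when the server sums across clients.

def pairwise_masks(num_clients, dim, seed=0):
    rng = random.Random(seed)
    masks = [[0.0] * dim for _ in range(num_clients)]
    for i in range(num_clients):
        for j in range(i + 1, num_clients):
            m = [rng.uniform(-1, 1) for _ in range(dim)]
            for k in range(dim):
                masks[i][k] += m[k]   # client i adds the shared mask
                masks[j][k] -= m[k]   # client j subtracts it
    return masks

updates = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
masks = pairwise_masks(len(updates), 2)
masked = [[u + m for u, m in zip(upd, msk)] for upd, msk in zip(updates, masks)]

# The server sums the masked updates; the masks cancel pairwise.
total = [sum(c[k] for c in masked) for k in range(2)]
print(total)  # ~[9.0, 12.0], up to float rounding
```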

The seminal medical paper

Rieke, Hancox, Li et al. (2020), published in npj Digital Medicine as “The future of digital health with federated learning”, is the reference paper establishing FL as a credible paradigm for multi-centre clinical AI. Authors — Nicola Rieke (NVIDIA), Jonny Hancox (NVIDIA), Wenqi Li (NVIDIA) and collaborators from KCL, Penn, UCLA — articulate use cases and challenges.

Typical applications:

  • Brain tumour segmentation trained on MR volumes from 20 hospitals without sharing them
  • COVID-19 X-ray triage with models trained on multi-national datasets
  • Bone fracture detection trained across hundreds of hospitals
  • Clinical analytics on electronic records from multiple health systems

Open source frameworks

The FL framework ecosystem as of September 2021:

Flower (flwr.org)

Developed by Daniel J. Beutel, Taner Topal, Akhil Mathur, Nicholas Lane and others at the University of Cambridge, with external collaborators. Released as open source in 2020 (arXiv preprint “Flower: A Friendly Federated Learning Research Framework”, Beutel et al. 2020). Apache 2.0 licence. Features:

  • Framework-agnostic — supports PyTorch, TensorFlow, JAX, scikit-learn
  • Pluggable strategy — FedAvg by default, easily swapped with FedProx, FedOpt, custom
  • Client languages: Python, Android (Kotlin), iOS (Swift)
  • Scalability from simulations to hundreds of real clients
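The "pluggable strategy" idea can be illustrated in plain Python: the server loop stays fixed while the aggregation rule is swapped. The class and function names below are hypothetical and only mirror the concept, not Flower's actual API:

```python
# Toy illustration of a pluggable aggregation strategy. Swapping FedAvg for
# a coordinate-wise median (a simple Byzantine-robust rule) changes only the
# strategy object, not the server logic.

class FedAvgStrategy:
    def aggregate(self, updates):
        # updates: list of (weights, num_samples) pairs
        total = sum(n for _, n in updates)
        dim = len(updates[0][0])
        return [sum(w[i] * n / total for w, n in updates) for i in range(dim)]

class MedianStrategy:
    """Coordinate-wise median: ignores sample counts, resists outliers."""
    def aggregate(self, updates):
        dim = len(updates[0][0])
        agg = []
        for i in range(dim):
            vals = sorted(w[i] for w, _ in updates)
            mid = len(vals) // 2
            agg.append(vals[mid] if len(vals) % 2 else (vals[mid - 1] + vals[mid]) / 2)
        return agg

def run_round(strategy, updates):
    # A real server would also distribute the new model and collect fits.
    return strategy.aggregate(updates)

updates = [([1.0], 100), ([2.0], 100), ([100.0], 100)]  # one outlier client
print(run_round(FedAvgStrategy(), updates))   # mean pulled toward the outlier
print(run_round(MedianStrategy(), updates))   # [2.0]
```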

FATE (fate.fedai.org)

Developed by WeBank (China) from 2019. One of the first production-grade FL frameworks. Apache 2.0 licence. Features:

  • Oriented to industrial pipelines (FinTech, healthcare)
  • Supports vertical FL (same patient’s data distributed across institutions by type — genomic at one centre, clinical at another) beyond classic horizontal FL
  • Integrated secure computation
  • Complex to install but feature-rich
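The horizontal/vertical distinction is easiest to see with toy records. This is purely conceptual; FATE's actual data model is far richer:

```python
# Horizontal FL: each centre holds *different patients* with the *same*
# feature schema.
hospital_a = {"p1": {"age": 61, "hb": 13.2}, "p2": {"age": 47, "hb": 14.1}}
hospital_b = {"p3": {"age": 55, "hb": 12.8}}

# Vertical FL: centres hold *the same patients* but *different* feature
# types -- e.g. genomic markers at one centre, clinical data at another.
genomic_centre = {"p1": {"brca1": 0}, "p2": {"brca1": 1}}
clinical_centre = {"p1": {"age": 61}, "p2": {"age": 47}}

# Vertical FL first aligns on shared patient IDs; in practice this uses
# private set intersection so neither side reveals its full ID list.
shared_ids = sorted(set(genomic_centre) & set(clinical_centre))
print(shared_ids)  # ['p1', 'p2']
```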

PySyft (openmined.org)

Developed by the OpenMined community, active from 2017. Apache 2.0 licence. Features:

  • Privacy focus: differential privacy, secure multi-party computation (SMPC), homomorphic encryption
  • Remote tensors — programming pattern that makes distributed tensors feel local
  • PySyft 0.x mature, v1.0+ in development

TensorFlow Federated (TFF)

Developed by Google from 2019. Apache 2.0 licence. Part of the TensorFlow ecosystem. Limitation: tied to the TF2 stack, and less popular in medicine, where PyTorch predominates.

OpenFL (openfl.io)

Developed by Intel and released in March 2021. Apache 2.0 licence. Designed for enterprise contexts, with SGX support. Used in the Federated Tumor Segmentation (FeTS) project at the University of Pennsylvania — one of the first large-scale FL evaluations on BraTS data.

NVIDIA Clara Train FL

Part of the Clara stack (commercial, but with open source components). Heavily used in healthcare for deployments on NVIDIA hardware.

Early large-scale medical cases

As of 2021, noteworthy healthcare FL projects:

  • EXAM (EMR CXR AI Model) — NVIDIA-coordinated project with 20 hospitals across 5 continents, February-April 2021, predicting COVID-19 severity and mortality from chest X-rays plus electronic health record data. Published in Nature Medicine in 2021, it showed that FL produces more generalisable models than single-centre training
  • Federated Tumor Segmentation (FeTS) Challenge 2021 — first federated brain tumour segmentation challenge, using OpenFL infrastructure, coordinated by University of Pennsylvania (Spyridon Bakas)
  • MELLODDY — EU IMI consortium, federated drug discovery, 2019-2022, 10 pharmaceutical companies without sharing proprietary data
  • ACR AI-LAB — American College of Radiology FL platform for diagnostic imaging

Benefits and challenges

Healthcare benefits

  • GDPR/HIPAA compliance: data do not leave the centre
  • Access to larger data: models trained on 10-100x more data than any single centre could access
  • Generalisation: models that work across different populations
  • Lower dataset-specific bias: multi-site diversity improves robustness

Technical challenges

  • Non-IID data: data across hospitals are not independent and identically distributed (different scanners, protocols, populations); FedAvg convergence can degrade
  • Federated optimisers: FedProx, FedOpt, SCAFFOLD, FedNova address the non-IID issue
  • Communication overhead: transferring large model weights between clients and server costs bandwidth
  • Privacy attacks: gradient inversion can reconstruct training data; DP and SecAgg are countermeasures with accuracy trade-offs
  • Byzantine security: malicious clients can poison training
  • Client heterogeneity: clients with different computational resources (large vs. small hospital)
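The standard countermeasure to gradient inversion combines update clipping with Gaussian noise, in the spirit of DP-SGD (Abadi et al., 2016). The sketch below is illustrative only: a real deployment needs a privacy accountant to track the cumulative (epsilon, delta) budget across rounds, and the parameter names here are assumptions, not any framework's API.

```python
import math
import random

# Illustrative DP-style protection of a client update: clip its L2 norm to
# clip_norm, then add Gaussian noise scaled to the clipping bound. Clipping
# bounds any single client's influence; noise hides residual information.

def clip_and_noise(update, clip_norm=1.0, noise_multiplier=1.1, rng=None):
    rng = rng or random.Random(0)
    norm = math.sqrt(sum(w * w for w in update))
    scale = min(1.0, clip_norm / norm) if norm > 0 else 1.0
    clipped = [w * scale for w in update]
    sigma = noise_multiplier * clip_norm
    return [w + rng.gauss(0.0, sigma) for w in clipped]

update = [3.0, 4.0]                 # L2 norm = 5.0, well above clip_norm
protected = clip_and_noise(update)  # clipped to norm 1.0, then noised
print(protected)
```

The accuracy trade-off mentioned above lives in these two knobs: a smaller `clip_norm` or larger `noise_multiplier` gives stronger privacy but a noisier global model.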

Organisational challenges

  • Multi-party governance: who owns the aggregate model, who decides the strategy, who publishes
  • Preliminary data harmonisation: data must be homogeneous in format (FHIR, standardised DICOM, common concept ontologies)
  • Coordination: starting an FL project requires legal, ethical, technical alignment across dozens of partners
  • Certification: when a model trained via FL across N hospitals becomes a medical device, regulatory responsibility is distributed across the partners

GDPR and FL

The FL-GDPR relationship is nuanced. FL is aligned with GDPR principles (minimisation, privacy by design) but is not a technology that automatically eliminates obligations:

  • Centres’ local data remain subject to GDPR
  • Aggregated weights may still contain residual information about the training data (as some known attacks show); transferring them requires assessment
  • DPIA (Data Protection Impact Assessment) is always needed
  • Controller-to-controller agreements (DPA) are needed for the cooperative model

The European Data Protection Board and the Italian Garante are producing FL guidelines — not yet definitive as of 2021.

In the Italian and European context

As of 2021 Italian participation in FL projects is still experimental:

  • Some IRCCS participate in EU IMI consortia with FL components
  • Italian universities and polytechnics run research on FL protocols
  • EU projects — Horizon 2020 and Horizon Europe include calls on federated health data analytics
  • The European Health Data Space (EHDS), under discussion at EU level, hints at FL as a possible compliant mechanism for secondary use of health data

Outlook

Expected directions in the coming years:

  • MONAI Federated — the MONAI consortium is discussing FL integration in a dedicated sub-project
  • Regulators embracing FL — FDA already has preliminary guidance on FL use for AI/ML models; EU MDR will need to articulate the topic
  • Aggregation protocol standardisation — towards interoperability between FL frameworks
  • Cross-device vs. cross-silo — healthcare is typically cross-silo (few clients, lots of data each); many recent developments focus here
  • Fairness and bias auditing in FL — how to ensure FL models do not carry single-site biases
  • Synthetic data sharing as complement — generating synthetic data similar to real as alternative/complement to FL

As of 2021, FL in healthcare is in the early-adoption phase of production use: the first real deployments are producing measurable results and the open source tools are mature, but systemic adoption will still require years of organisational, regulatory and cultural infrastructure-building. The trajectory is solid, and the topic will be central to the European healthcare data debate in the coming years.


References: McMahan et al., “Communication-Efficient Learning of Deep Networks from Decentralized Data” (2017); Rieke et al., “The future of digital health with federated learning”, npj Digital Medicine (2020); Beutel et al., “Flower: A Friendly Federated Learning Research Framework” (2020); Flower (flower.dev); FATE (fate.fedai.org), WeBank; PySyft (OpenMined); TensorFlow Federated (Google); OpenFL (Intel, 2021); EXAM study, Nature Medicine (2021); MELLODDY (IMI).
