OMOP Common Data Model: OHDSI, EHDEN and European real-world evidence

The OMOP observational data model, the OHDSI ecosystem (ATLAS, HADES, Broadsea) and the IMI EHDEN project bringing the model to European healthcare databases. Differences from FHIR and the role in secondary use.


A data model for observational research

Exchange standards — HL7 v2, CDA R2, FHIR — carry clinical data between systems in real time or near real time. For observational analysis at population scale — epidemiological studies, real-world drug safety and efficacy assessments, outcome research — something else is needed: a structured data model, oriented to analysis, where clinical databases heterogeneous in origin and format are harmonised against shared rules.

This is the role of the OMOP Common Data Model (OMOP: Observational Medical Outcomes Partnership). Born in 2008 within the Foundation for the National Institutes of Health (FNIH) as a partnership between the FDA, the pharmaceutical industry and academia, the CDM moved in 2014 to the international OHDSI community (Observational Health Data Sciences and Informatics), which has maintained and evolved it since. As of 2019 the reference version is OMOP CDM v5.3, with over one hundred databases worldwide declared conformant: hospital studies, disease registries, insurance claims and general practice databases.

CDM structure

OMOP CDM is a set of relational tables with well-defined population rules. The main event tables:

  • PERSON — the individual with demographic attributes (year/month/day of birth, sex, race, ethnicity) and pseudonymised identifier
  • OBSERVATION_PERIOD — intervals during which a person was observable in the system (equivalent to insurance enrollment, or presence in an HIS catchment)
  • VISIT_OCCURRENCE — admissions, outpatient visits, ED accesses
  • CONDITION_OCCURRENCE — diagnoses and problems, coded with standard concept_ids (typically SNOMED CT)
  • DRUG_EXPOSURE — drug exposures (prescriptions, administrations), RxNorm standard
  • PROCEDURE_OCCURRENCE — procedures performed, SNOMED CT + HCPCS/CPT4 standard
  • MEASUREMENT — numeric laboratory, vital signs, anthropometric values, LOINC standard
  • OBSERVATION — other non-numeric clinical observations (family history, habits, qualitative results)
  • DEATH — date and cause of death
  • DEVICE_EXPOSURE, NOTE, SPECIMEN, FACT_RELATIONSHIP

Health system and health economics tables: LOCATION, CARE_SITE, PROVIDER (health system); PAYER_PLAN_PERIOD, COST (health economics).

Alongside these tables sit the Standardized Vocabularies (CONCEPT, CONCEPT_RELATIONSHIP, CONCEPT_ANCESTOR), distributed through the Athena portal, which harmonise SNOMED CT, LOINC, RxNorm, ICD, ATC and dozens of other vocabularies into a single concept_id space.
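To make the role of CONCEPT_ANCESTOR concrete, here is a minimal sketch using Python's built-in sqlite3 and a drastically reduced, hypothetical subset of the CDM tables. The concept_ids are placeholders, not real Athena identifiers, and the real DDL has many more columns. The query counts persons with any diagnosis at or below a chosen ancestor concept, which is how one query can capture a whole family of codes.

```python
import sqlite3

# Hypothetical, heavily reduced subset of OMOP CDM tables (real DDL has more columns).
DDL = """
CREATE TABLE concept (
    concept_id INTEGER PRIMARY KEY,
    concept_name TEXT,
    domain_id TEXT,
    vocabulary_id TEXT,
    standard_concept TEXT
);
CREATE TABLE concept_ancestor (
    ancestor_concept_id INTEGER,
    descendant_concept_id INTEGER,
    min_levels_of_separation INTEGER,
    max_levels_of_separation INTEGER
);
CREATE TABLE condition_occurrence (
    condition_occurrence_id INTEGER PRIMARY KEY,
    person_id INTEGER,
    condition_concept_id INTEGER,
    condition_start_date TEXT
);
"""


def build_toy_cdm() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.executescript(DDL)
    # Placeholder concept_ids, NOT real Athena identifiers.
    conn.executemany(
        "INSERT INTO concept VALUES (?, ?, 'Condition', 'SNOMED', 'S')",
        [(100, "Diabetes mellitus"), (101, "Type 2 diabetes mellitus"),
         (102, "Type 1 diabetes mellitus"), (200, "Essential hypertension")],
    )
    # Every concept is its own ancestor at level 0; 101 and 102 sit under 100.
    conn.executemany(
        "INSERT INTO concept_ancestor VALUES (?, ?, ?, ?)",
        [(100, 100, 0, 0), (101, 101, 0, 0), (102, 102, 0, 0), (200, 200, 0, 0),
         (100, 101, 1, 1), (100, 102, 1, 1)],
    )
    conn.executemany(
        "INSERT INTO condition_occurrence VALUES (?, ?, ?, ?)",
        [(1, 1, 101, "2018-03-01"), (2, 2, 102, "2018-05-10"),
         (3, 3, 200, "2018-06-15"), (4, 1, 101, "2019-01-20")],
    )
    return conn


def persons_with_condition(conn: sqlite3.Connection, ancestor_id: int) -> int:
    """Count distinct persons with any condition at or below ancestor_id."""
    sql = """
        SELECT COUNT(DISTINCT co.person_id)
        FROM condition_occurrence AS co
        JOIN concept_ancestor AS ca
          ON co.condition_concept_id = ca.descendant_concept_id
        WHERE ca.ancestor_concept_id = ?
    """
    return conn.execute(sql, (ancestor_id,)).fetchone()[0]


conn = build_toy_cdm()
print(persons_with_condition(conn, 100))  # persons 1 and 2 -> 2
print(persons_with_condition(conn, 200))  # person 3 -> 1
```

Because the hierarchy lives in vocabulary data rather than in the query, the same SQL runs unchanged on any conformant CDM, whatever local coding the source used.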

ETL: from local data to CDM

Bringing a clinical database into OMOP requires a structured ETL process:

  • ETL specification — document describing sources, transformations, assumptions. Typically produced in close cooperation between the centre’s data managers and clinicians
  • Terminology mapping — from local codes (free-text diagnoses, ward codes, internal lab codes) to standard concept_ids; the most time-intensive part, supported by Usagi
  • Data conversion — SQL/Python scripts populating the OMOP tables from the sources. OHDSI ETL scripts are typically open-sourced by the centres that produce them
  • Quality checks — referential integrity, consistent statistical distributions, valid concept_ids
  • Quality reports — run with Achilles, which produces hundreds of ready-to-inspect indicators
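The mapping, conversion and quality-check steps can be sketched in a few lines of Python. This is a hypothetical illustration, not an OHDSI tool: SOURCE_TO_STANDARD stands in for reviewed Usagi output, the concept_ids are placeholders, and sending unmapped codes to an Observation bucket is an assumed ETL-specification choice. It does follow two OMOP conventions: the domain of the standard concept decides the destination table, and unmapped records keep concept_id 0 ("No matching concept") with the original code preserved as a source value.

```python
from dataclasses import dataclass
from datetime import date

# Hypothetical reviewed mappings: local code -> (standard concept_id, domain_id).
SOURCE_TO_STANDARD = {
    "DIAB2": (101, "Condition"),
    "HBA1C": (300, "Measurement"),
    "METF500": (400, "Drug"),
}
# Stand-in for a lookup against the CONCEPT table; 0 means "No matching concept".
VALID_CONCEPT_IDS = {0, 101, 300, 400}
UNMAPPED_DOMAIN = "Observation"  # assumption: decided in the ETL specification


@dataclass
class SourceRecord:
    patient_id: int
    local_code: str
    event_date: date


def route(records):
    """Convert source records into per-domain, OMOP-like rows."""
    tables = {"Condition": [], "Measurement": [], "Drug": [], "Observation": []}
    for rec in records:
        concept_id, domain = SOURCE_TO_STANDARD.get(rec.local_code, (0, UNMAPPED_DOMAIN))
        tables[domain].append({
            "person_id": rec.patient_id,
            "concept_id": concept_id,
            "source_value": rec.local_code,  # keep the local code for traceability
            "date": rec.event_date.isoformat(),
        })
    return tables


def quality_report(tables):
    """Simple checks: row count, share of mapped rows, invalid concept_ids."""
    rows = [row for domain_rows in tables.values() for row in domain_rows]
    invalid = sum(1 for r in rows if r["concept_id"] not in VALID_CONCEPT_IDS)
    mapped = sum(1 for r in rows if r["concept_id"] != 0)
    return {
        "rows": len(rows),
        "mapped_share": mapped / len(rows) if rows else 0.0,
        "invalid_concept_ids": invalid,
    }


records = [
    SourceRecord(1, "DIAB2", date(2018, 3, 1)),
    SourceRecord(1, "HBA1C", date(2018, 3, 1)),
    SourceRecord(2, "METF500", date(2018, 4, 2)),
    SourceRecord(2, "XYZ99", date(2018, 4, 2)),  # no mapping available
]
tables = route(records)
report = quality_report(tables)
print(report)  # {'rows': 4, 'mapped_share': 0.75, 'invalid_concept_ids': 0}
```

Keeping the source value next to concept_id 0 is what lets a later mapping round, or a reader of the ETL documentation, see exactly what was not harmonised.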

An ETL for a medium-sized database is a matter of several person-months. The documentation of the specific ETL is itself a valuable output: it lets whoever queries the dataset understand what is there and what is missing.

The tooling ecosystem

OHDSI distributes a complete open source stack for analysis over OMOP:

  • ATLAS — web application for cohort definition, population characterisation, execution of standard analyses. A graphical interface that generates SQL executable against the CDM, usable without programming
  • WebAPI — the REST backend of ATLAS, exposing services on an OMOP CDR
  • HADES (Health Analytics Data-to-Evidence Suite), a set of R packages for advanced analysis: CohortMethod for treatment comparisons with propensity matching, PatientLevelPrediction for predictive models, SelfControlledCaseSeries, SelfControlledCohort, IncidenceRate, Eunomia (teaching dataset)
  • Achilles — R package for data quality assessment on a CDM, producing a browsable report with hundreds of metrics
  • Broadsea — Docker stack bundling ATLAS, WebAPI, HADES, PostgreSQL and related tools in a ready-to-use setup for development and teaching
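ATLAS hides cohort logic behind a graphical editor. As a simplified, hypothetical illustration of the kind of rule it compiles into SQL (not ATLAS's actual implementation), the function below takes each person's earliest event from a concept set, requires a minimum amount of prior observation within an OBSERVATION_PERIOD, and ends the cohort at the end of that period. The 365-day default, the tuple-based inputs and the function name are assumptions.

```python
from datetime import date


def build_cohort(condition_events, observation_periods, concept_set, min_prior_days=365):
    """
    Hypothetical cohort logic.
    condition_events: list of (person_id, concept_id, start_date)
    observation_periods: dict person_id -> list of (period_start, period_end)
    concept_set: set of concept_ids (in practice expanded via CONCEPT_ANCESTOR)
    Returns a list of (person_id, cohort_start_date, cohort_end_date).
    """
    # 1. Earliest qualifying event per person becomes the index date.
    first_event = {}
    for person_id, concept_id, start in condition_events:
        if concept_id in concept_set:
            if person_id not in first_event or start < first_event[person_id]:
                first_event[person_id] = start

    # 2. Keep persons whose index date falls in an observation period with
    #    enough prior observation; exit at the end of that period.
    cohort = []
    for person_id, index_date in sorted(first_event.items()):
        for period_start, period_end in observation_periods.get(person_id, []):
            if period_start <= index_date <= period_end and \
                    (index_date - period_start).days >= min_prior_days:
                cohort.append((person_id, index_date, period_end))
                break
    return cohort


events = [
    (1, 101, date(2018, 3, 1)),
    (1, 101, date(2019, 1, 20)),
    (2, 102, date(2018, 5, 10)),
    (3, 200, date(2018, 6, 15)),
]
periods = {
    1: [(date(2016, 1, 1), date(2019, 12, 31))],
    2: [(date(2018, 1, 1), date(2019, 12, 31))],  # only 129 days before index
    3: [(date(2015, 1, 1), date(2019, 12, 31))],
}
print(build_cohort(events, periods, {101, 102}))
```

Person 2 drops out for lack of prior observation and person 3 never qualifies, which shows why OBSERVATION_PERIOD matters: without it, the absence of a record cannot be told apart from the absence of data.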

OHDSI studies are typically reproducible and federated: a researcher writes a protocol and R scripts; each partner centre runs them on its own CDM; aggregate statistics — never individual data — are collected and combined. Patient-level data never leave the origin centre.
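The federated pattern can be sketched as two functions: one that would run inside each data partner and returns only counts, and one that the coordinating centre runs on those counts. The names, the suppression threshold of 5 and the numbers are assumptions for illustration; actual study packages define their own rules.

```python
from dataclasses import dataclass
from typing import List, Optional

MIN_CELL_COUNT = 5  # assumed small-cell threshold; set per study protocol


@dataclass
class SiteAggregate:
    site: str
    cases: Optional[int]  # None when masked because the count is too small
    person_years: float


def site_summary(site: str, case_count: int, person_years: float) -> SiteAggregate:
    """Runs at the data partner: only aggregates leave, small counts are masked."""
    masked = case_count if case_count >= MIN_CELL_COUNT else None
    return SiteAggregate(site, masked, person_years)


def pooled_rate_per_1000(aggregates: List[SiteAggregate]) -> float:
    """Runs at the coordinator: pools incidence over unmasked sites only."""
    usable = [a for a in aggregates if a.cases is not None]
    total_py = sum(a.person_years for a in usable)
    if total_py == 0:
        raise ValueError("no unmasked site contributed person-time")
    total_cases = sum(a.cases for a in usable)
    return 1000 * total_cases / total_py


shared = [
    site_summary("A", 30, 10_000.0),
    site_summary("B", 3, 2_000.0),   # masked: below the threshold
    site_summary("C", 20, 5_000.0),
]
print(round(pooled_rate_per_1000(shared), 2))  # 50 cases / 15,000 person-years -> 3.33
```

Dropping masked sites keeps small counts private but can bias the pooled figure, so the threshold and the handling of masked cells belong in the protocol, decided before any site runs the code.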

Federated studies

The federated design is the core value of OHDSI: it makes it possible to answer observational questions across hundreds of millions of records without centralising data. Multi-centre studies published in recent years have involved dozens of databases with over 500 million patients in total, producing evidence that no single dataset could support:

  • Post-marketing drug safety assessments
  • Analyses of prescribing patterns across very heterogeneous populations
  • Comparative effectiveness studies among alternative treatment strategies
  • Rare-disease clinical patterns on aggregated populations
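For comparative effectiveness, each database typically returns an effect estimate with its standard error rather than raw counts. The sketch below combines per-database log hazard ratios with a plain fixed-effect, inverse-variance meta-analysis. This is a textbook technique swapped in for brevity; OHDSI's own evidence-synthesis tooling offers more elaborate methods (for example random-effects models). The inputs are made-up numbers.

```python
import math
from typing import List, Tuple

Z_95 = 1.959963984540054  # two-sided 95% normal quantile


def fixed_effect_meta(estimates: List[Tuple[float, float]]) -> Tuple[float, float, float]:
    """
    estimates: list of (log hazard ratio, standard error), one per database.
    Returns (pooled HR, lower 95% CI, upper 95% CI) via inverse-variance weights.
    """
    weights = [1.0 / se ** 2 for _, se in estimates]
    total_w = sum(weights)
    pooled_log = sum(w * b for w, (b, _) in zip(weights, estimates)) / total_w
    pooled_se = math.sqrt(1.0 / total_w)
    return (
        math.exp(pooled_log),
        math.exp(pooled_log - Z_95 * pooled_se),
        math.exp(pooled_log + Z_95 * pooled_se),
    )


# Made-up per-database results: (log HR, SE).
per_database = [(math.log(0.85), 0.10), (math.log(0.78), 0.15), (math.log(0.92), 0.08)]
hr, low, high = fixed_effect_meta(per_database)
print(f"pooled HR {hr:.2f} (95% CI {low:.2f} to {high:.2f})")
```

Only these three numbers per database cross institutional boundaries, which is what makes the approach compatible with keeping patient-level data at the source.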

The federated model fits well with the GDPR data-minimisation principle and with European healthcare legislation, since patient-level data remain local and only aggregates leave the centre.

EHDEN: Europe on OMOP

In November 2018, the Innovative Medicines Initiative (IMI), a public-private partnership between the European Commission and EFPIA (European Federation of Pharmaceutical Industries and Associations), launched the EHDEN project (European Health Data & Evidence Network): a five-year programme with a budget of around EUR 34 million and the stated goal of bringing 100+ European healthcare databases onto OMOP CDM, building a federated network for real-world evidence generation.

Technical coordination is entrusted to Erasmus MC (Rotterdam) with Peter Rijnbeek; industrial coordination to Janssen Pharmaceutica. Project partners include universities, research centres and hospitals across several European countries.

Programme components:

  • EHDEN Academy — free online training platform on OMOP CDM, ETL, Standardized Vocabularies, OHDSI tools
  • SME certification — a qualification for service companies able to support healthcare centres in the ETL process towards OMOP, with an exam process based on the Academy
  • Calls for Data Partners — competitive calls funding healthcare centres for conversion of their databases to OMOP CDM, accompanied by certified SMEs
  • Data portal — federated catalogue of partner databases, with standardised descriptors

One year after launch, EHDEN has completed the first waves of data-partner calls, with around forty European centres already in onboarding. The trajectory — 100+ databases by 2024 — is credible and rides on a growing demand for real-world evidence from EMA and national agencies.

OMOP vs other Common Data Models

OMOP is not the only CDM in use internationally:

  • Sentinel CDM — model of the FDA used in the Sentinel System for post-market drug safety surveillance. Conceptually similar to OMOP, structurally distinct
  • PCORnet CDM — model of the Patient-Centered Outcomes Research Network in the US, oriented to comparative research
  • i2b2 (Informatics for Integrating Biology and the Bedside), originally developed at Partners HealthCare in Boston: a star schema with a central observation_fact table, highly flexible and widely adopted in European academic settings
  • OpenEHR — not strictly a CDM but a native persistence model; some projects map openEHR to OMOP for analytics

Interoperability between CDMs is an active research topic. The OMOP community provides documented mappers to/from Sentinel and PCORnet; the i2b2 ecosystem includes extensions that emit OMOP from i2b2 observations.

OMOP vs FHIR

The distinction between OMOP and FHIR is often misunderstood, since both deal with structured healthcare data. But they address orthogonal needs:

  • FHIR is an exchange standard — how to carry an Observation from system A to system B via REST in an interoperable way. It is oriented to transactions, consultations, application integrations
  • OMOP is an analytical data model — how to structure millions of heterogeneous observational records into relational tables for longitudinal statistical analyses

In practice, FHIR and OMOP integrate: an OMOP CDR can expose its resources via FHIR API for consultation; a FHIR stream can be ETL’d into OMOP for analysis. Tools like FHIR-to-OMOP and OMOP-on-FHIR (Georgia Tech) implement this translation.
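A minimal sketch of the FHIR-to-OMOP direction: one FHIR Observation (a serum glucose result coded with LOINC 2345-7) becomes one MEASUREMENT row. The lookup tables and their concept_ids are placeholders that a real ETL would resolve against the Standardized Vocabularies, only a subset of MEASUREMENT columns is filled (measurement_type_concept_id, for instance, is left out), and the function is hypothetical, not part of the tools named above.

```python
# Placeholder lookups; a real ETL resolves these through the OMOP vocabularies.
LOINC_TO_CONCEPT = {"2345-7": 1001}
UCUM_TO_CONCEPT = {"mg/dL": 2001}
PATIENT_TO_PERSON = {"Patient/example-1": 1}  # pseudonymised person_id mapping

LOINC_SYSTEM = "http://loinc.org"


def observation_to_measurement(obs: dict, measurement_id: int) -> dict:
    """Map a FHIR Observation with a LOINC code and valueQuantity to a MEASUREMENT row."""
    coding = next(c for c in obs["code"]["coding"] if c.get("system") == LOINC_SYSTEM)
    qty = obs["valueQuantity"]
    when = obs["effectiveDateTime"]
    return {
        "measurement_id": measurement_id,
        "person_id": PATIENT_TO_PERSON[obs["subject"]["reference"]],
        "measurement_concept_id": LOINC_TO_CONCEPT.get(coding["code"], 0),  # 0 = unmapped
        "measurement_date": when[:10],
        "measurement_datetime": when,
        "value_as_number": qty["value"],
        "unit_concept_id": UCUM_TO_CONCEPT.get(qty.get("code", ""), 0),
        "unit_source_value": qty.get("unit"),
        "measurement_source_value": coding["code"],
    }


observation = {
    "resourceType": "Observation",
    "status": "final",
    "code": {"coding": [{"system": LOINC_SYSTEM, "code": "2345-7",
                         "display": "Glucose [Mass/volume] in Serum or Plasma"}]},
    "subject": {"reference": "Patient/example-1"},
    "effectiveDateTime": "2019-03-14T08:30:00+01:00",
    "valueQuantity": {"value": 95, "unit": "mg/dL",
                      "system": "http://unitsofmeasure.org", "code": "mg/dL"},
}
row = observation_to_measurement(observation, measurement_id=1)
print(row)
```

The contrast is visible in miniature: FHIR carries one self-describing resource per exchange, while OMOP flattens it into a row that can be joined and counted alongside millions of others.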

In Italy

As of 2019, OMOP CDM adoption in Italy is limited to specific cases: oncology research centres, drug utilisation studies and AIFA projects on prescription databases. No Italian hospital or FSE (Fascicolo Sanitario Elettronico, the regional electronic health record) system has an OMOP CDM in production yet. EHDEN calls are attracting interest, and some Italian centres are evaluating onboarding.

The trajectory — driven by EMA, by upcoming regulatory changes on secondary use in Europe, by EHDEN expansion — suggests OMOP will occupy an increasingly important space in transforming Italian healthcare data into clinical and regulatory evidence.


References: OHDSI — OMOP Common Data Model v5.3 (ohdsi.org). OHDSI Tools: ATLAS, WebAPI, HADES, Achilles, Broadsea. OMOP Standardized Vocabularies (Athena). IMI EHDEN — European Health Data & Evidence Network (ehden.eu), launched November 2018. Erasmus MC (coordinator). EHDEN Academy.
