The bridge between terminologies
The major clinical terminologies (SNOMED CT, LOINC, RxNorm, ICD, ATC, MedDRA, HCPCS, NDC, CVX) have independent governance, licences, update cycles and semantics. Anyone who wants to analyse healthcare data across sources (a hospital on ICD-10-CM, a research database on SNOMED CT, a pharmaceutical registry on ATC, a US clinical system on RxNorm) faces the problem of harmonising heterogeneous terminologies into a common view.
The OHDSI community (Observational Health Data Sciences and Informatics), an international collaborative initiative, has maintained the OMOP Common Data Model since 2014. The model originated in 2008 within the Observational Medical Outcomes Partnership, a public-private partnership chaired by the FDA. OHDSI has built a pragmatic answer to the harmonisation problem: the OMOP Standardized Vocabularies.
A single Concept table
The heart of the Standardized Vocabularies is a single relational table: CONCEPT, which in the current release contains over five million concepts from dozens of terminologies. Each row represents a concept with:
- concept_id: global integer identifier within the whole OHDSI universe
- concept_name: preferred name in English
- domain_id: domain (Condition, Drug, Procedure, Measurement, Observation, Device, Visit)
- vocabulary_id: source terminology (SNOMED, LOINC, RxNorm, ICD10CM, ICD9CM, ATC, HCPCS, MedDRA, CVX, …)
- concept_class_id: class within the vocabulary (e.g. Clinical Finding for SNOMED CT)
- concept_code: original code from the source terminology (SCTID, LOINC code, ICD code, …)
- standard_concept: flag indicating whether the concept is Standard (S), Classification (C) or non-standard (NULL)
- valid_start_date / valid_end_date: temporal validity of the concept
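As a rough illustration of the columns listed above, the following sketch builds a toy CONCEPT table in an in-memory SQLite database. The SQL column types, the two sample rows and their concept_id values (1 and 2) are assumptions made for the example and are not taken from an Athena release; only the column names come from the list above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Toy CONCEPT table restricted to the columns described in the text.
# Column types are an assumption; a real CDM deployment follows the OHDSI DDL.
conn.execute("""
    CREATE TABLE concept (
        concept_id       INTEGER PRIMARY KEY,
        concept_name     TEXT NOT NULL,
        domain_id        TEXT NOT NULL,
        vocabulary_id    TEXT NOT NULL,
        concept_class_id TEXT NOT NULL,
        concept_code     TEXT NOT NULL,
        standard_concept TEXT CHECK (standard_concept IN ('S', 'C')),  -- NULL = non-standard
        valid_start_date TEXT NOT NULL,
        valid_end_date   TEXT NOT NULL
    )
""")

# Illustrative rows: concept_id values are hypothetical, concept_code values are real codes.
conn.executemany(
    "INSERT INTO concept VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?)",
    [
        (1, "Type 2 diabetes mellitus", "Condition", "SNOMED",
         "Clinical Finding", "44054006", "S", "1970-01-01", "2099-12-31"),
        (2, "Type 2 diabetes mellitus without complications", "Condition", "ICD10CM",
         "4-char billing code", "E11.9", None, "1970-01-01", "2099-12-31"),
    ],
)

# Only the SNOMED row is flagged as Standard; the ICD-10-CM row is a source concept.
standard_rows = conn.execute(
    "SELECT concept_id, vocabulary_id, concept_code FROM concept "
    "WHERE domain_id = 'Condition' AND standard_concept = 'S'"
).fetchall()
print(standard_rows)
```

The same meaning appears twice, once per terminology, and the standard_concept flag is what tells an analyst which row to query.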
Two complementary tables complete the model:
- CONCEPT_RELATIONSHIP — relationships between concepts within the same terminology or across terminologies (Maps to, Is a, Has RxNorm, Has ATC)
- CONCEPT_ANCESTOR — transitive closure of IS-A hierarchical relationships, for queries over subclasses
The precomputed transitive closure is a central optimisation: querying “all diabetics” means selecting every record whose concept_id is a descendant of a root concept, which takes a plain JOIN on CONCEPT_ANCESTOR and no recursive traversal of the hierarchy.
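A minimal sketch of such a descendant query, assuming a toy hierarchy in SQLite. The column names follow the OMOP CDM (ancestor_concept_id, descendant_concept_id, condition_concept_id), while the concept_id values and patients are hypothetical and chosen only for readability.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE concept_ancestor (
        ancestor_concept_id      INTEGER NOT NULL,
        descendant_concept_id    INTEGER NOT NULL,
        min_levels_of_separation INTEGER NOT NULL,
        max_levels_of_separation INTEGER NOT NULL
    );
    CREATE TABLE condition_occurrence (
        person_id            INTEGER NOT NULL,
        condition_concept_id INTEGER NOT NULL
    );
""")

# Hypothetical ids: 100 = Diabetes mellitus, 101 = Type 2, 102 = Type 1, 200 = Essential hypertension.
# Each concept is also recorded as its own ancestor at distance 0.
conn.executemany(
    "INSERT INTO concept_ancestor VALUES (?, ?, ?, ?)",
    [
        (100, 100, 0, 0), (100, 101, 1, 1), (100, 102, 1, 1),
        (101, 101, 0, 0), (102, 102, 0, 0), (200, 200, 0, 0),
    ],
)
conn.executemany(
    "INSERT INTO condition_occurrence VALUES (?, ?)",
    [(1, 101), (2, 102), (3, 200), (4, 100)],
)

DIABETES_ROOT = 100  # hypothetical root concept_id

# One JOIN replaces a recursive walk of the IS-A hierarchy.
diabetic_persons = [
    row[0]
    for row in conn.execute(
        """
        SELECT DISTINCT co.person_id
        FROM condition_occurrence AS co
        JOIN concept_ancestor AS ca
          ON ca.descendant_concept_id = co.condition_concept_id
        WHERE ca.ancestor_concept_id = ?
        ORDER BY co.person_id
        """,
        (DIABETES_ROOT,),
    )
]
print(diabetic_persons)  # [1, 2, 4]
```

Patient 4, coded directly on the root concept, is included because of the self-referencing row at distance 0.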
Source and standard
The operational mechanism of the OMOP vocabulary revolves around a fundamental distinction:
- A source concept is a code as it appears in the origin system: for example an Italian ICD-9-CM code, a LOINC code from a local laboratory, an internal code from a LIS
- A standard concept is the canonical concept chosen by OHDSI to represent the same meaning within the OMOP ecosystem, typically drawn from a “target” terminology for each domain
The standard choices per domain, visible in OHDSI’s design rules, follow a pragmatic logic:
- Condition — SNOMED CT as standard, with mappings from ICD-9-CM, ICD-10-CM, ICD-10, Read, MedDRA
- Drug — RxNorm as standard, with mappings from NDC, ATC, brand names
- Measurement — LOINC as standard
- Procedure — SNOMED CT + HCPCS/CPT4 + ICD-10-PCS
- Device — SNOMED CT
The practical outcome: every source code covered by the vocabularies is linked in CONCEPT_RELATIONSHIP, through a Maps to relationship, to the corresponding standard concept. A lab receiving data from heterogeneous systems records the original codes as source concepts, and the ETL follows Maps to in order to resolve them to the standard concepts on which analyses run.
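The Maps to lookup can be sketched as a two-step join from a source code to its standard concept. This is only an illustration under stated assumptions: the helper name to_standard, the concept_id values and the three sample mappings are hypothetical, while the table and column names (concept_id_1, concept_id_2, relationship_id) are those of the OMOP CDM.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE concept (
        concept_id       INTEGER PRIMARY KEY,
        vocabulary_id    TEXT NOT NULL,
        concept_code     TEXT NOT NULL,
        standard_concept TEXT
    );
    CREATE TABLE concept_relationship (
        concept_id_1    INTEGER NOT NULL,
        concept_id_2    INTEGER NOT NULL,
        relationship_id TEXT NOT NULL
    );
""")

# Hypothetical concept_ids; the codes themselves are real.
conn.executemany(
    "INSERT INTO concept VALUES (?, ?, ?, ?)",
    [
        (1, "SNOMED", "44054006", "S"),
        (2, "ICD10CM", "E11.9", None),
        (3, "ICD9CM", "250.00", None),
    ],
)
# Illustrative mappings: both ICD codes point at the SNOMED concept,
# and the standard concept maps to itself.
conn.executemany(
    "INSERT INTO concept_relationship VALUES (?, ?, ?)",
    [(1, 1, "Maps to"), (2, 1, "Maps to"), (3, 1, "Maps to")],
)


def to_standard(conn, vocabulary_id, concept_code):
    """Return the standard concept_ids a source code maps to (possibly none or several)."""
    sql = """
        SELECT std.concept_id
        FROM concept AS src
        JOIN concept_relationship AS cr
          ON cr.concept_id_1 = src.concept_id
         AND cr.relationship_id = 'Maps to'
        JOIN concept AS std
          ON std.concept_id = cr.concept_id_2
         AND std.standard_concept = 'S'
        WHERE src.vocabulary_id = ? AND src.concept_code = ?
        ORDER BY std.concept_id
    """
    return [row[0] for row in conn.execute(sql, (vocabulary_id, concept_code))]


print(to_standard(conn, "ICD10CM", "E11.9"))   # [1]
print(to_standard(conn, "ICD9CM", "250.00"))   # [1]
print(to_standard(conn, "ICD10CM", "Z99.9"))   # [] (code not present in the toy data)
```

Returning a list keeps the sketch honest about the fact that a source code may have no mapping, or more than one.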
Athena
The vocabulary is distributed through Athena — athena.ohdsi.org — OHDSI’s open portal. Whoever implements an OMOP CDR downloads from Athena the CSVs of the tables (CONCEPT, CONCEPT_RELATIONSHIP, CONCEPT_ANCESTOR, CONCEPT_SYNONYM, VOCABULARY, DOMAIN, CONCEPT_CLASS, RELATIONSHIP, DRUG_STRENGTH) and imports them into the database.
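The import step might look like the sketch below, which loads one vocabulary file into SQLite. It assumes the downloaded files are tab-delimited with a header row and unescaped quote characters, which should be checked against the actual download; the function name load_vocabulary_file and the two sample rows are hypothetical, and the untyped columns are a simplification of the official DDL.

```python
import csv
import io
import sqlite3

# Table names taken from the list of files in the text.
ALLOWED_TABLES = {
    "concept", "concept_relationship", "concept_ancestor", "concept_synonym",
    "vocabulary", "domain", "concept_class", "relationship", "drug_strength",
}


def load_vocabulary_file(conn, table, handle):
    """Load one tab-delimited vocabulary file into `table`; return the row count."""
    if table not in ALLOWED_TABLES:
        raise ValueError(f"unexpected vocabulary table: {table}")
    # QUOTE_NONE: treat quote characters inside names as ordinary text (assumption).
    reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    header = next(reader)
    columns = ", ".join(f'"{name}"' for name in header)
    placeholders = ", ".join("?" for _ in header)
    # Untyped columns keep the sketch short; every value is stored as text.
    conn.execute(f'CREATE TABLE IF NOT EXISTS "{table}" ({columns})')
    # Empty fields (e.g. a non-standard concept's flag) become NULL.
    rows = [[value if value != "" else None for value in row] for row in reader]
    conn.executemany(f'INSERT INTO "{table}" VALUES ({placeholders})', rows)
    return len(rows)


# Stand-in for an opened file; ids are hypothetical, dates use an assumed YYYYMMDD layout.
sample = io.StringIO(
    "concept_id\tconcept_name\tdomain_id\tvocabulary_id\tconcept_class_id\t"
    "standard_concept\tconcept_code\tvalid_start_date\tvalid_end_date\n"
    "1\tType 2 diabetes mellitus\tCondition\tSNOMED\tClinical Finding\t"
    "S\t44054006\t19700101\t20991231\n"
    "2\tType 2 diabetes mellitus without complications\tCondition\tICD10CM\t"
    "4-char billing code\t\tE11.9\t19700101\t20991231\n"
)

conn = sqlite3.connect(":memory:")
loaded = load_vocabulary_file(conn, "concept", sample)
non_standard = conn.execute(
    "SELECT concept_code FROM concept WHERE standard_concept IS NULL"
).fetchall()
print(loaded, non_standard)
```

With a real download, the handle would come from something like `open(path, encoding="utf-8", newline="")`, and a production import would use typed columns and bulk-loading facilities of the target database.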
Athena is also a web browser: the user can search “hypertension”, “metformin”, “serum sodium” and see the standard reference concept, the mappings from various terminologies, and the hierarchy of descendants.
Licence restrictions are reflected in the download: SNOMED CT is distributed only to authenticated users from SNOMED International member countries (or holding a valid individual licence), CPT-4 requires an AMA licence, ICD-10-CM and RxNorm are freely distributed. Import into a production CDR must account for these constraints. For Italy, which as of 2017 is not a national member of SNOMED International, access to SNOMED CT inside OMOP must be assessed case by case.
Usagi: mapping local codes
Any real clinical record contains local codes — typed diagnoses, department codes, internal test codes — not present in the Standardized Vocabularies. Mapping local codes to OMOP concepts is one of the recurring tasks of whoever loads data into an OMOP CDR.
The dedicated tool is Usagi — an open source Java application distributed by OHDSI — which:
- Imports a CSV of local codes with descriptions
- For each one proposes candidate matches in the Standardized Vocabularies, ranked by a text similarity metric
- Allows the mapper (ideally a clinician or a medically trained data manager) to accept, modify or reject the match
- Produces a mapping file importable as source concept in the OMOP database
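The candidate-ranking step can be illustrated with a deliberately simple stand-in. The sketch below scores candidates by Jaccard overlap of word tokens; this is not Usagi's own similarity metric, only an assumption-level substitute that shows the shape of the workflow. The function names, concept_id values and candidate list are hypothetical.

```python
import re


def tokens(text):
    """Lower-case alphanumeric word tokens of a description."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(a, b):
    """Jaccard similarity between the token sets of two descriptions."""
    ta, tb = tokens(a), tokens(b)
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


def propose(local_description, candidates, top_n=3):
    """Rank (concept_id, concept_name) candidates for one local code description."""
    scored = [
        (jaccard(local_description, name), concept_id, name)
        for concept_id, name in candidates
    ]
    # Highest score first; ties broken by concept_id for a stable order.
    scored.sort(key=lambda item: (-item[0], item[1]))
    return scored[:top_n]


# Hypothetical candidate pool standing in for the Standardized Vocabularies.
candidates = [
    (1, "Type 2 diabetes mellitus"),
    (2, "Type 1 diabetes mellitus"),
    (3, "Essential hypertension"),
]

ranking = propose("diabetes mellitus type 2", candidates)
for score, concept_id, name in ranking:
    print(f"{score:.2f}  {concept_id}  {name}")
```

The ranking is only a proposal: the accept, modify or reject decision stays with the human mapper, as described in the steps above.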
Careful mapping is the most time-consuming part of an OMOP CDR implementation, but it is also the asset that makes the data analysable with the shared queries of the OHDSI community.
Relationship with FHIR
The Standardized Vocabularies model and the FHIR terminology resource model (CodeSystem, ValueSet, ConceptMap) address different needs:
- FHIR exposes each CodeSystem in its native format, with its canonical identifier, and leaves to the consumer the job of combining multiple CodeSystems. A ValueSet can include codes from several CodeSystems, but their semantic integration is left to whoever builds the ValueSet
- OMOP prescribes an explicit choice of standard terminology per domain and precomputes cross-terminology relationships. The price is rigid structuring; the benefit is immediacy of cross-source analysis
The two models are not alternatives, and in practice they coexist: FHIR is the exchange format, OMOP is the observational data model. An increasingly common workflow is: acquisition via FHIR API, ETL transformation to OMOP CDM, analysis with OHDSI Tools.
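One step of such an ETL can be sketched as follows: a FHIR Condition resource, represented as a plain dictionary, is turned into a row shaped like OMOP's CONDITION_OCCURRENCE. The code system URIs are the canonical FHIR identifiers; the SOURCE_TO_STANDARD dictionary is a hypothetical stand-in for a Maps to lookup against the vocabulary tables, its concept_id values are invented, and the person_ref field is a placeholder for the person_id resolution a real ETL would perform.

```python
# Canonical FHIR code system URIs mapped to OMOP vocabulary_id values.
FHIR_SYSTEM_TO_VOCABULARY = {
    "http://snomed.info/sct": "SNOMED",
    "http://loinc.org": "LOINC",
    "http://www.nlm.nih.gov/research/umls/rxnorm": "RxNorm",
    "http://hl7.org/fhir/sid/icd-10-cm": "ICD10CM",
}

# Hypothetical result of a "Maps to" lookup: (vocabulary_id, concept_code) -> standard concept_id.
SOURCE_TO_STANDARD = {
    ("ICD10CM", "E11.9"): 1,
    ("SNOMED", "44054006"): 1,
}


def condition_to_omop(resource):
    """Map a FHIR Condition (dict) to a CONDITION_OCCURRENCE-like dict."""
    codings = resource.get("code", {}).get("coding", [])
    person_ref = resource.get("subject", {}).get("reference")
    for coding in codings:
        vocabulary_id = FHIR_SYSTEM_TO_VOCABULARY.get(coding.get("system"))
        key = (vocabulary_id, coding.get("code"))
        if key in SOURCE_TO_STANDARD:
            return {
                "person_ref": person_ref,
                "condition_concept_id": SOURCE_TO_STANDARD[key],
                "condition_source_value": coding["code"],
            }
    # No coding could be mapped: keep the source value and use concept_id 0,
    # the OMOP convention for "no matching concept".
    source_value = codings[0].get("code") if codings else None
    return {
        "person_ref": person_ref,
        "condition_concept_id": 0,
        "condition_source_value": source_value,
    }


fhir_condition = {
    "resourceType": "Condition",
    "subject": {"reference": "Patient/example"},
    "code": {
        "coding": [
            {"system": "http://hl7.org/fhir/sid/icd-10-cm", "code": "E11.9"}
        ]
    },
}

row = condition_to_omop(fhir_condition)
print(row)
```

Keeping the original code in condition_source_value preserves the link back to the exchanged FHIR data even when no standard concept is found.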
In the Italian landscape
OMOP CDM adoption in Italy as of 2017 is still limited to research environments and a few structured projects on administrative data. Mapping the Italian national terminologies — Italian ICD-9-CM, ATC, AIC — to the Standardized Vocabularies is still patchy, performed in specific contexts (disease registries, pharmacovigilance observational studies). The main constraint remains access to SNOMED CT: without Italian membership of SNOMED International, the Condition component of OMOP can only be partially exploited.
The growing demand for real-world evidence from EMA, AIFA and clinical research networks is nevertheless pushing the topic forward. Dedicated European programmes are expected to launch in the coming months; the most relevant, still being defined, is an IMI (Innovative Medicines Initiative) programme to build a European network of clinical databases mapped to OMOP.
References: OHDSI — OMOP Common Data Model v5.x. OMOP Standardized Vocabularies, distributed via Athena (athena.ohdsi.org). Usagi — OHDSI Code Mapping Tool. SNOMED International, Regenstrief Institute (LOINC), US National Library of Medicine (RxNorm).