Context: why Metal
Until 2014, Apple platforms used OpenGL/OpenGL ES for rendering and OpenCL for GPU compute. OpenGL's design dates from the early 1990s (OpenCL from 2008), with high-overhead driver models, implicit state management and largely serial submission paths. On the mobile GPUs of the time, the CPU cost of each draw call was significant relative to available resources.
Apple announced Metal at WWDC 2014 (2 June 2014) as a low-overhead graphics API for iOS 8 on A7/A8 devices. The stated technical goals: cut CPU submission cost and make state and resource management explicit. The same explicit model was later adopted by Direct3D 12 (2015) and Vulkan (2016).
Metal architecture
Metal exposes a programming model built around explicit objects:
- MTLDevice — the GPU
- MTLCommandQueue / MTLCommandBuffer — command queue and batch
- MTLRenderCommandEncoder / MTLComputeCommandEncoder / MTLBlitCommandEncoder — encoders for the three workload types
- MTLRenderPipelineState / MTLComputePipelineState — pre-compiled, immutable pipelines
- MTLBuffer / MTLTexture / MTLHeap — resources with explicit storage modes (shared, managed, private, memoryless)
Submission is explicit: the app builds command buffers, commits them to the queue and manages synchronisation via fences, events and shared events. The driver is not left guessing at state.
Metal Shading Language (MSL)
MSL is the shader language, derived from C++14 with GPU extensions. It is the single language for vertex, fragment, tile, compute and mesh shaders (mesh shaders were introduced in Metal 3). The Metal compiler (metal), based on LLVM, emits AIR (Apple Intermediate Representation) binaries that are packed into a metallib. At runtime, the driver compiles AIR to machine code for the target GPU.
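```metal
#include <metal_stdlib>
using namespace metal;

// SAXPY: y = a * x + y, one element per thread.
// No bounds check: the dispatched grid must match the buffer length exactly.
kernel void saxpy(
    device const float* x [[buffer(0)]],
    device float*       y [[buffer(1)]],
    constant float&     a [[buffer(2)]],
    uint                i [[thread_position_in_grid]])
{
    y[i] = a * x[i] + y[i];
}
```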
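To make the role of `thread_position_in_grid` concrete, here is a minimal CPU reference in plain C++ that could be used to check the GPU result. It is a sketch under stated assumptions: the function names are hypothetical, the grid is emulated serially, and the bounds guard is an addition for the case where the grid is padded to whole threadgroups (the kernel above has no such guard).

```cpp
#include <cstddef>
#include <vector>

// Number of threadgroups needed to cover n elements (ceiling division).
// Relevant when whole threadgroups are dispatched rather than an exact thread count.
inline std::size_t threadgroups_for(std::size_t n, std::size_t threads_per_group) {
    return (n + threads_per_group - 1) / threads_per_group;
}

// What one GPU thread does: the saxpy body for grid position i.
// The guard is an addition that makes padded grids safe.
inline void saxpy_thread(const float* x, float* y, float a,
                         std::size_t i, std::size_t n) {
    if (i >= n) return;
    y[i] = a * x[i] + y[i];
}

// CPU reference: walk a grid padded to whole threadgroups, one "thread" at a time.
// Assumes x and y have the same length.
inline void saxpy_reference(const std::vector<float>& x, std::vector<float>& y,
                            float a, std::size_t threads_per_group) {
    const std::size_t n = x.size();
    const std::size_t grid =
        threadgroups_for(n, threads_per_group) * threads_per_group;
    for (std::size_t i = 0; i < grid; ++i) {
        saxpy_thread(x.data(), y.data(), a, i, n);
    }
}
```

With 5 elements and 4 threads per group, the emulated grid has 8 positions; the last 3 fall outside the buffers and are skipped by the guard.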
Metal 2 (2017) and Metal 3 (2022)
- Metal 2: WWDC 2017 (macOS High Sierra, iOS 11). Adds Argument Buffers (binding large resource sets in a single buffer), Raster Order Groups, external GPU (eGPU) support and integration with the Metal Performance Shaders framework.
- Metal 3: WWDC, 6 June 2022 (macOS Ventura, iOS 16). Introduces MetalFX Upscaling (spatial and temporal, analogous to DLSS/FSR), mesh shaders, Fast Resource Loading (asynchronous loading directly from the filesystem into buffers and textures) and Offline Shader Compilation to cut first-use stutter.
Metal Performance Shaders and MPSGraph
Metal Performance Shaders (MPS) is a framework of compute kernels tuned for each Apple GPU family: convolutions, matrices, FFT, image filters, neural networks. On top of it, MPSGraph is a graph-based compute API aimed at machine learning — a high-level equivalent of XLA or of cuDNN + TensorRT.
Apple’s ML stack sits on MPS/MPSGraph:
- Core ML uses MPS as the GPU execution backend
- PyTorch has exposed the mps device since 2022 (Apple + Meta collaboration), mapping operations onto MPSGraph
- TensorFlow runs via the tensorflow-metal backend on MPS
- JAX on Apple Silicon via IREE/MPS (experimental)
Compute and ML on Apple Silicon
With the move to Apple Silicon (M1 in November 2020, then M2, M3, M4, plus Pro/Max/Ultra variants), memory is unified: CPU, GPU and Neural Engine share the same DRAM pool without explicit copies. In Metal, resources with shared storage mode are accessible to both CPU and GPU in that pool; private resources live in the same physical memory but are accessible only to the GPU, which lets the driver choose GPU-optimised layouts. There is no separate VRAM to copy into. The dedicated Neural Engine (a separate NPU) drives a split in ML workloads: generic kernels on the GPU via MPS, supported kernels on the Neural Engine via Core ML.
For reduced-precision LLMs (INT4/INT8, GGUF), llama.cpp has had a Metal backend since 2023: on an M2/M3 Max with 64-128 GB of unified memory, local inference of 30-70B quantised models is practical — a relevant use case for AI labs and workstations without a discrete GPU.
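The sizing claim above follows from simple arithmetic on weight storage, sketched here in C++. This is a rough estimate, not a measurement: it counts weights only, and the `fits` helper with its 0.75 usable-memory fraction is a hypothetical rule of thumb introduced for illustration.

```cpp
// Approximate size of the weights alone, in gigabytes (1 GB = 1e9 bytes):
// (params_billion * 1e9 params) * (bits / 8 bytes) / 1e9.
// Ignores KV cache, activations and per-block quantisation metadata.
inline double weight_gigabytes(double params_billion, double bits_per_weight) {
    return params_billion * bits_per_weight / 8.0;
}

// Hypothetical helper: do the weights fit while leaving part of unified memory
// free for the OS, KV cache and other processes? The 0.75 default is an
// assumption, not an Apple or llama.cpp figure.
inline bool fits(double params_billion, double bits_per_weight,
                 double unified_memory_gb, double usable_fraction = 0.75) {
    return weight_gigabytes(params_billion, bits_per_weight)
           <= unified_memory_gb * usable_fraction;
}
```

Under these assumptions a 70B model at 4 bits per weight needs about 35 GB for weights and fits in 64 GB, while the same model at 16 bits needs about 140 GB and does not fit even in 128 GB. Block-quantised formats also store scale metadata, so the effective bits per weight is somewhat above the nominal figure.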
Comparison with Vulkan and Direct3D 12
Metal shares with Vulkan and Direct3D 12 the low-overhead and explicit state philosophy. Practical differences:
- Metal is platform-locked (Apple only). Vulkan is cross-vendor on Linux, Android, Windows; D3D12 is Windows/Xbox.
- MSL is closer to standard C++ than HLSL or GLSL. SPIR-V, by contrast, is a binary interchange IR rather than a source language.
- Metal synchronisation is generally considered more ergonomic than Vulkan's: by default Metal tracks hazards automatically for most resources, whereas Vulkan requires explicit barriers.
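Several translation layers run in userspace: MoltenVK (KhronosGroup) maps Vulkan onto Metal; DXVK maps Direct3D 9 to 11 onto Vulkan and can therefore be layered on MoltenVK; Apple's D3DMetal (part of the Game Porting Toolkit) maps Direct3D, including D3D12, onto Metal. Together they make it possible to run many Windows games, and Vulkan applications generally, on macOS.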
The noze context
noze evaluates Apple Silicon as a local inference platform for mid-sized models and as a development environment: unified memory makes M2/M3 Max/Ultra workstations suitable for prototyping LLM pipelines without a discrete GPU. For healthcare production workloads, however, the reference stack remains CUDA/NVIDIA: Apple offers no server-class hardware, porting custom CUDA kernels to Metal requires rewriting, and no Apple GPUs exist in data-centre configurations with InfiniBand or NVLink-equivalent fabrics.
References: Metal announced at WWDC on 2 June 2014 (iOS 8). Metal 2 announced at WWDC on 5 June 2017 (macOS High Sierra, iOS 11). Metal 3 announced at WWDC on 6 June 2022 (macOS Ventura, iOS 16). Apple M1 announced 10 November 2020. PyTorch mps backend since PyTorch 1.12 (June 2022). Sources: Apple Developer Documentation (Metal, Metal Shading Language, Metal Performance Shaders), WWDC keynotes.