NVIDIA GB10: local AI arrives on the desktop with 128 GB unified memory

The NVIDIA GB10 Grace Blackwell Superchip powers compact desktop workstations (NVIDIA DGX Spark, Lenovo ThinkStation PGX) with 128 GB of unified memory and 1 petaFLOP of FP4 compute, enabling on-premise fine-tuning and inference of models up to 200 billion parameters. noze adopts these machines as the reference platform for its on-premise AI products.


The GB10 Grace Blackwell chip

The NVIDIA GB10 Grace Blackwell Superchip combines, in a single package, a 20-core Arm CPU (Grace) and a Blackwell GPU with 128 GB of coherent unified LPDDR5X memory, shared between CPU and GPU via NVLink-C2C. Nominal performance is 1 petaFLOP in FP4, enabling inference and fine-tuning of models up to 200 billion parameters with appropriate quantisation.
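
A quick back-of-the-envelope check makes the 200-billion-parameter figure plausible. The sketch below assumes 4-bit weights (0.5 bytes per parameter) plus a rough 20% overhead for KV cache, activations and runtime buffers; real footprints vary with quantisation scheme and context length.

    # Memory estimate for local inference with 4-bit quantised weights.
    # The 20% overhead factor is an assumption, not a measured figure.
    def footprint_gb(params_billion: float, bits: float = 4, overhead: float = 0.20) -> float:
        weights_gb = params_billion * bits / 8  # 1e9 params * bytes/param / 1e9 bytes/GB
        return weights_gb * (1 + overhead)

    for size in (70, 120, 200):
        print(f"{size}B params at 4-bit: ~{footprint_gb(size):.0f} GB")
    # 200B at 4-bit comes to ~120 GB, inside GB10's 128 GB of unified
    # memory; the same weights in FP16 (~400 GB) would not fit.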

It is the hardware NVIDIA designed to bring a data-center-class AI stack into a desktop form factor: the same software (CUDA, cuDNN, TensorRT, NeMo, NIM, Triton) as large DGX systems, but in a compact chassis, with power draw and footprint compatible with a normal office.

An architectural pattern first introduced by Apple Silicon

The pattern (Arm CPU + GPU + coherent unified DRAM on a single module, with no hierarchical split between system memory and VRAM) was first brought to the consumer/workstation segment by Apple Silicon with the M1 chip (10 November 2020) and continued with the M2, M3 and M4 families and their Pro/Max/Ultra variants. On M2/M3 Max/Ultra, configurations with up to 128–192 GB of unified memory have made Apple workstations practical platforms for local inference of mid-sized LLMs (quantised models in the 30–70B parameter range) via llama.cpp's Metal backend and the PyTorch mps device.
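
For illustration, the same PyTorch code can target either platform; a minimal device-selection sketch (assuming a recent PyTorch build with the relevant backend installed):

    # Device-agnostic PyTorch: runs on CUDA GPUs (e.g. GB10's Blackwell GPU),
    # on Apple Silicon via mps, or falls back to CPU.
    import torch

    if torch.cuda.is_available():
        device = torch.device("cuda")
    elif torch.backends.mps.is_available():
        device = torch.device("mps")
    else:
        device = torch.device("cpu")

    x = torch.randn(8, 4096, device=device)
    w = torch.randn(4096, 4096, device=device)
    print(device, (x @ w).shape)  # the matmul executes on the selected backend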

GB10 reuses the same architectural scheme with an explicitly AI-first slant: CUDA stack (not Metal), native libraries for training and inference, high FP4/FP8 density, binary compatibility with higher-class DGX systems. Where the M-series is a general-purpose workstation with unified memory, GB10 is a miniaturised DGX — same coherent-memory idea, different silicon optimisation.

The workstations available today

Two GB10-based machines are already on the market:

  • NVIDIA DGX Spark — NVIDIA’s official “personal AI supercomputer”, shipping with NVIDIA DGX OS and the full AI stack preinstalled
  • Lenovo ThinkStation PGX — compact workstation (Mac mini-sized form factor), 128 GB unified memory, up to 4 TB NVMe storage, NVIDIA DGX OS

Other OEMs (Asus, Dell, HP, Supermicro) have announced variants based on the same Superchip. List prices are around €/$3,000–4,000, substantially below those of server-class Blackwell workstations.

Why it matters for local AI

GB10 fills an important operational niche for those who want AI and LLMs on-premise:

  • Privacy and data sovereignty — local inference, zero data to the cloud. Critical for healthcare, finance, public administration, defence, professional firms
  • Cost/benefit ratio — a €3-4k machine runs 70B-200B parameter models, versus many thousands of euros per month for equivalent cloud GPUs
  • Latency — local LLMs with sub-50 ms latency and no network hop, enabling real-time conversational assistants, RAG and agentic flows
  • Compactness — about 1 litre of volume and ~150 W consumption: fits under a desk, no racks or HVAC needed
  • Full NVIDIA stack — CUDA, NeMo, NIM, Triton Inference Server, TensorRT-LLM — the same tools as DGX, scalable to data-center workloads when needed
  • Open source stack compatibility — Ollama, vLLM, llama.cpp, MLC-LLM, SGLang, text-generation-inference, plus open models (Llama, Mistral, Qwen, DeepSeek, Gemma, BioMistral, Granite); a minimal client sketch follows this list
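
To make the privacy and open-stack points concrete, here is a minimal client sketch. The endpoint, port and model name are assumptions (an Ollama server on its default port; vLLM would expose http://localhost:8000/v1 instead):

    # Chat with a locally served model via the OpenAI-compatible API
    # exposed by Ollama or vLLM; no data leaves the machine.
    from openai import OpenAI

    client = OpenAI(
        base_url="http://localhost:11434/v1",  # assumed local Ollama endpoint
        api_key="unused",                      # local servers ignore the key
    )

    reply = client.chat.completions.create(
        model="llama3.1:70b",  # hypothetical locally pulled quantised model
        messages=[{"role": "user", "content": "Summarise GDPR art. 9 in one sentence."}],
    )
    print(reply.choices[0].message.content)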

noze’s support

noze adopts GB10 as the reference hardware platform for its on-premise products integrating AI, and provides direct support to clients:

  • Admina — AI governance framework: on GB10 it runs LLM-as-judge models locally for output evaluation and prompt classification, without sending data externally (a minimal sketch of the pattern follows this list)
  • AIHealth — AI-assisted medical diagnostics: on-premise fine-tuning and inference of open source medical models (BioMistral, Meditron) with data that never leaves the hospital
  • CyberScan — LLM-augmented security analytics: alert triage and event correlation via local LLM
  • Custom R&D solutions — prototypes, pilots and PoCs for clients evaluating on-premise AI before investing in larger infrastructure
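
As an illustration of the LLM-as-judge pattern mentioned for Admina, a minimal sketch follows. It is not Admina's actual implementation: the rubric, endpoint and model name are assumptions.

    # LLM-as-judge sketch: score an answer for factual accuracy with a
    # local model. Rubric, endpoint and model name are illustrative only.
    from openai import OpenAI

    client = OpenAI(base_url="http://localhost:11434/v1", api_key="unused")

    def judge(question: str, answer: str) -> str:
        prompt = (
            "You are an evaluator. Rate the ANSWER to the QUESTION for "
            "factual accuracy on a 1-5 scale, then give one short reason.\n"
            f"QUESTION: {question}\nANSWER: {answer}"
        )
        result = client.chat.completions.create(
            model="llama3.1:70b",  # any local instruction-tuned model
            messages=[{"role": "user", "content": prompt}],
        )
        return result.choices[0].message.content

    print(judge("What is the capital of Italy?", "Rome"))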

noze’s support includes configuration, hardening (DGX OS, network policies, backup, telemetry), integration with the open source stack, model sizing against client workloads, and ongoing operational assistance.

In the Italian context

In a market that demands EU AI Act-compliant AI and data kept within the EU, the GB10 represents a concrete option for:

  • Innovative SMEs that want custom LLMs or internal agents without relying on extra-EU clouds
  • Hospitals and clinics processing sensitive health data (GDPR art. 9)
  • Law, notary and accounting firms with confidential documentation
  • Public bodies and local administrations with sovereignty requirements
  • University labs and R&D centres with limited budgets for prototyping

Those starting with a single GB10 workstation can later scale to multi-node clusters (two DGX Sparks linked via ConnectX-7 can handle 405B-parameter models) or integrate with DGX and GB200/GB300 NVL72 infrastructure as workloads grow.


References: NVIDIA GB10 Grace Blackwell Superchip (announced at CES 2025); NVIDIA DGX Spark; Lenovo ThinkStation PGX (Lenovo announcement 2025, availability Q4 2025 / Q1 2026); 128 GB LPDDR5X unified memory; 1 petaFLOP FP4; inference/fine-tuning of models up to 200B parameters; NVIDIA DGX OS, CUDA, NeMo, NIM; open source stack compatibility: Ollama, vLLM, llama.cpp, MLC-LLM, SGLang.
