NVIDIA: evolution of GPU architectures from Tesla to Blackwell

Technical history of NVIDIA GPU architectures: GeForce 256 (1999), Tesla and CUDA (2006), Fermi, Kepler, Maxwell, Pascal, Volta with Tensor Cores (2017), Turing, Ampere (A100), Ada Lovelace, Hopper (H100) and Blackwell (B200, GB200 NVL72) announced at GTC 2024.

Tags: R&D, AI, Hardware, NVIDIA, GPU, CUDA, Tesla, Volta, Ampere, Hopper, Blackwell, H100, B200

Origins: from RIVA to GeForce 256

NVIDIA Corporation was incorporated on 5 April 1993 by Jensen Huang, Chris Malachowsky and Curtis Priem. Its first high-volume product was the RIVA 128 (1997), followed by the TNT family. On 31 August 1999, NVIDIA announced the GeForce 256, marketed as the first GPU on the strength of its on-chip transform & lighting (T&L) and triangle-setup engines. GeForce 3 (2001) brought the first programmable shaders (vertex and pixel shaders), the technical prerequisite for later general-purpose GPU computing.

Tesla (2006): unified GPU and CUDA

The Tesla architecture (G80 GPU, GeForce 8800 GTX, November 2006) unified vertex and pixel shaders into generic streaming processors organised into Streaming Multiprocessors (SMs). In parallel, NVIDIA released CUDA 1.0 (June 2007), a C-like programming model for GPU computing. Tesla marks the transition of the GPU from pure graphics accelerator to a programmable platform for HPC, numerical simulation and, increasingly, machine learning.
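
CUDA C itself is beyond the scope of this article, but the kernel / thread-block model it introduced can be sketched in a few lines of Python using Numba's CUDA dialect (Numba is an assumption for illustration here, not part of the article's stack; it requires a CUDA-capable GPU):

    # Minimal sketch of the CUDA execution model: a grid of thread blocks,
    # each block scheduled onto a Streaming Multiprocessor (SM).
    import numpy as np
    from numba import cuda

    @cuda.jit
    def vector_add(a, b, out):
        i = cuda.grid(1)          # global thread index across the whole grid
        if i < out.size:          # guard: the grid may be larger than the data
            out[i] = a[i] + b[i]

    n = 1_000_000
    a = np.random.rand(n).astype(np.float32)
    b = np.random.rand(n).astype(np.float32)

    d_a, d_b = cuda.to_device(a), cuda.to_device(b)   # host -> device copies
    d_out = cuda.device_array_like(d_a)

    threads_per_block = 256
    blocks_per_grid = (n + threads_per_block - 1) // threads_per_block
    vector_add[blocks_per_grid, threads_per_block](d_a, d_b, d_out)  # kernel launch

    assert np.allclose(d_out.copy_to_host(), a + b)

The same decomposition into blocks of threads is what CUDA C expresses with its <<<blocks, threads>>> launch syntax.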

Fermi, Kepler, Maxwell, Pascal

  • Fermi (2010, GF100) — 512 CUDA cores, unified L1/L2 cache, ECC memory, C++ support in CUDA
  • Kepler (2012, GK110) — up to 2880 cores, dynamic parallelism, Hyper-Q
  • Maxwell (2014) — redesigned for perf/watt
  • Pascal (2016, GP100) — HBM2, NVLink 1.0 (160 GB/s), native FP16. Tesla P100 was the first data-center GPU widely adopted for deep learning

Volta, Turing, Ampere

  • Volta (2017, GV100) — introduces Tensor Cores, units dedicated to matrix operations in FP16/FP32 mixed precision. NVLink 2.0. Tesla V100 powered the DGX-1 V100 systems and the early large-scale deep-learning training clusters.
  • Turing (2018) — adds RT Cores for hardware ray tracing and the first Tensor Cores on the consumer line (RTX 20).
  • Ampere (2020, GA100) — A100 with HBM2e, 3rd-gen Tensor Cores, BF16 and TF32 support, 2:4 structured sparsity for inference, and MIG (Multi-Instance GPU), which partitions an A100 into up to seven isolated instances (the precision modes are sketched just after this list). GeForce RTX 30 on the consumer side.
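
These precision modes map directly onto PyTorch's numerics switches. A minimal sketch, assuming PyTorch and a CUDA GPU (BF16 and TF32 need Ampere or newer; on Volta/Turing only the FP16 path reaches the Tensor Cores):

    import torch

    # Allow FP32 matmuls and convolutions to run on TF32 Tensor Cores (Ampere+).
    torch.backends.cuda.matmul.allow_tf32 = True
    torch.backends.cudnn.allow_tf32 = True

    x = torch.randn(1024, 1024, device="cuda")
    w = torch.randn(1024, 1024, device="cuda")

    # Mixed precision: ops inside the autocast region run in BF16 where safe,
    # while reductions and master weights stay in FP32.
    with torch.autocast(device_type="cuda", dtype=torch.bfloat16):
        y = x @ w

    print(y.dtype)  # torch.bfloat16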

Hopper (2022) and the Large Language Model era

The Hopper architecture (GH100, announced at GTC on 22 March 2022) was designed for large-scale model training:

  • H100 with 80 GB HBM3, ~3 TB/s bandwidth
  • Transformer Engine: 4th-gen Tensor Cores with FP8 support and dynamic precision management
  • NVLink 4.0 (900 GB/s per GPU), NVSwitch 3 for intra-node all-to-all interconnect
  • Hardware Confidential Computing

H100 has been the reference data-center training GPU throughout the 2023–2024 cycle, underpinning most frontier-scale LLM training runs (Meta's Llama 3, for instance, was trained on H100 clusters).
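
As an illustration of the FP8 path mentioned above, here is a minimal sketch using NVIDIA's open-source Transformer Engine library for PyTorch. The library choice is an assumption (the article does not name it), and running this requires a Hopper-class GPU:

    # Minimal FP8 sketch with NVIDIA Transformer Engine (pip install transformer-engine).
    # The recipe maintains per-tensor scaling factors so that FP8's narrow
    # dynamic range does not overflow during training.
    import torch
    import transformer_engine.pytorch as te
    from transformer_engine.common.recipe import DelayedScaling, Format

    fp8_recipe = DelayedScaling(fp8_format=Format.HYBRID)  # E4M3 forward, E5M2 for gradients

    layer = te.Linear(4096, 4096, bias=True).cuda()
    inp = torch.randn(16, 4096, device="cuda")

    with te.fp8_autocast(enabled=True, fp8_recipe=fp8_recipe):
        out = layer(inp)

    out.float().sum().backward()   # gradients flow back through the FP8 path
    print(out.dtype)               # activations are returned in higher precision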

Blackwell (GTC 2024)

At GTC on 18 March 2024, during Jensen Huang’s keynote, NVIDIA unveiled Blackwell:

  • B100 / B200 — two GPU dies connected via NV-HBI (10 TB/s die-to-die), presented to software as a single logical GPU; up to 192 GB of HBM3e at 8 TB/s of memory bandwidth
  • 2nd-gen Transformer Engine with FP4 precision support
  • GB200 Superchip: Grace CPU (ARM Neoverse) + two B200 on a single board, linked via NVLink-C2C
  • GB200 NVL72: rack-scale system with 36 Grace and 72 B200 connected by the NVLink Switch system, rated at ~1.4 exaFLOPS FP4

The stated target is training and inference of trillion-parameter models.

Ada Lovelace and the consumer line

Parallel to the data-center line, the consumer line continued with Ada Lovelace (2022, RTX 40, 4th-gen Tensor Cores, 3rd-gen RT Cores, DLSS 3 with frame generation). Ada and Hopper share TSMC's 4N process. Ada cards are commonly used in AI labs for fine-tuning and inference of mid-scale models (up to roughly 70 B parameters when quantised).
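
As an illustration of that workflow, here is a minimal sketch of 4-bit (NF4) quantised inference on a single RTX-class card using Hugging Face transformers with bitsandbytes. Both the library choice and the model id are assumptions for the example, not something the article specifies:

    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

    model_id = "meta-llama/Llama-2-7b-hf"  # illustrative; any causal LM from the Hub works the same way

    bnb_config = BitsAndBytesConfig(
        load_in_4bit=True,
        bnb_4bit_quant_type="nf4",
        bnb_4bit_compute_dtype=torch.bfloat16,  # BF16 compute needs Ampere/Ada or newer
    )

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(
        model_id,
        quantization_config=bnb_config,
        device_map="auto",   # place the quantised weights on the available GPU(s)
    )

    prompt = "The Volta architecture introduced"
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    output = model.generate(**inputs, max_new_tokens=32)
    print(tokenizer.decode(output[0], skip_special_tokens=True))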

The noze context

In R&D and digital health, noze uses NVIDIA GPUs for fine-tuning and inference of on-premise LLMs: RTX professional cards for development, H100 in cluster configurations for training workloads. CUDA compatibility preserves stack portability (PyTorch, llama.cpp, vLLM, TensorRT-LLM) from laptop to server GPU, reducing divergence between development and production environments — a baseline requirement for AIHealth pipelines and MDR / EU AI Act pathways.
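
One concrete way to keep a single code path from laptop GPU to server GPU is to branch on the CUDA compute capability reported by the driver. A minimal PyTorch sketch, with illustrative thresholds (capability per architecture, not exhaustive: 7.0 Volta, 7.5 Turing, 8.0/8.6 Ampere, 8.9 Ada Lovelace, 9.0 Hopper):

    import torch

    def pick_dtype() -> torch.dtype:
        """Choose a numerics dtype that the local GPU supports in hardware."""
        if not torch.cuda.is_available():
            return torch.float32            # CPU fallback keeps the code path identical
        cc = torch.cuda.get_device_capability()
        if cc >= (8, 0):
            return torch.bfloat16           # Ampere and newer: hardware BF16
        if cc >= (7, 0):
            return torch.float16            # Volta/Turing: FP16 Tensor Cores only
        return torch.float32

    name = torch.cuda.get_device_name(0) if torch.cuda.is_available() else "CPU"
    print(f"{name}: using {pick_dtype()}")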


References: NVIDIA Corporation, incorporated 5 April 1993 (founders: Jensen Huang, Chris Malachowsky, Curtis Priem). GeForce 256 announced 31 August 1999. Tesla architecture (G80) shipped with GeForce 8800 GTX, 8 November 2006. CUDA 1.0 released June 2007. Hopper GH100 announced at GTC, 22 March 2022. Blackwell unveiled at GTC, 18 March 2024 (Jensen Huang keynote). Primary sources: NVIDIA architecture whitepapers, GTC keynotes, developer.nvidia.com.
