LLM inference and energy efficiency: token-to-token latency (TTL) = 50 milliseconds (ms) real time, first token latency (FTL) = 5s, 32,768 input / 1,024 output, NVIDIA HGX™ H100 scaled over InfiniBand (IB) vs. GB200 NVL72; training 1.8T MoE, 4,096x HGX H100 scaled over IB vs. 456x GB200 NVL72 scaled over IB. Cluster size: 32,768.
A database join and aggregation workload with Snappy/Deflate compression, derived from the TPC-H Q4 query. Custom query implementations for x86, a single H100 GPU, and a single GPU from GB200 NVL72 vs. Intel Xeon 8480+.
Projected performance subject to change.
Real-Time LLM Inference
GB200 NVL72 introduces cutting-edge capabilities and a second-generation Transformer Engine that enables FP4 AI. Coupled with fifth-generation NVIDIA NVLink, it delivers 30X faster real-time LLM inference performance for trillion-parameter language models. This advancement is made possible by a new generation of Tensor Cores, which introduce new microscaling formats for high accuracy and greater throughput. Additionally, GB200 NVL72 uses NVLink and liquid cooling to create a single, massive 72-GPU rack that overcomes communication bottlenecks.
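To make the microscaling idea concrete, here is a minimal sketch of how an MX-style block format such as FP4 can work in principle: each small block of values shares one power-of-two scale, and each value is stored on a tiny E2M1 grid. This is an illustrative toy model, not NVIDIA's Transformer Engine implementation; the function names and block size are assumptions.

```python
import math

# Representable magnitudes of an FP4 E2M1 value (sign handled separately).
FP4_GRID = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]

def quantize_fp4_value(x):
    """Round one scaled value to the nearest representable FP4 (E2M1) number."""
    sign = -1.0 if x < 0 else 1.0
    mag = min(abs(x), FP4_GRID[-1])  # clamp to FP4's max magnitude
    return sign * min(FP4_GRID, key=lambda g: abs(g - mag))

def mx_quantize(block):
    """Quantize a block of floats: one shared power-of-two scale + FP4 codes."""
    amax = max(abs(v) for v in block) or 1.0
    # Pick a power-of-two scale so the largest value lands near FP4's max (6.0).
    scale = 2.0 ** math.floor(math.log2(amax / FP4_GRID[-1]))
    codes = [quantize_fp4_value(v / scale) for v in block]
    return scale, codes

def mx_dequantize(scale, codes):
    """Recover approximate floats from the shared scale and FP4 codes."""
    return [scale * c for c in codes]

block = [0.11, -0.42, 0.93, -1.7]
scale, codes = mx_quantize(block)
approx = mx_dequantize(scale, codes)  # each value within the block's step size
```

Sharing one scale per small block (rather than per whole tensor) is what lets such formats keep accuracy high while storing each value in only four bits.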
Massive-Scale Training
GB200 NVL72 includes a faster second-generation Transformer Engine featuring FP8 precision, enabling a remarkable 4X faster training for large language models at scale. This breakthrough is complemented by fifth-generation NVLink, which provides 1.8 terabytes per second (TB/s) of GPU-to-GPU interconnect, along with InfiniBand networking and NVIDIA Magnum IO™ software.
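The core trick behind FP8 training is amax-based scaling: scale each tensor so its largest magnitude fits FP8's narrow range, compute on the scaled values, then rescale the result. The toy sketch below models only this scaling arithmetic (it does not perform a real 8-bit cast), and the function names are assumptions, not Transformer Engine APIs.

```python
E4M3_MAX = 448.0  # largest finite value in the FP8 E4M3 format

def fp8_scale(tensor):
    """Per-tensor scale factor mapping the largest magnitude to E4M3's max."""
    amax = max(abs(v) for v in tensor) or 1.0
    return E4M3_MAX / amax

def scaled_dot(a, b):
    """Dot product computed on scaled operands, then descaled."""
    sa, sb = fp8_scale(a), fp8_scale(b)
    # In real hardware, the scaled values would be rounded to 8 bits here.
    acc = sum((x * sa) * (y * sb) for x, y in zip(a, b))
    return acc / (sa * sb)

a = [0.5, -1.25, 2.0]
b = [4.0, 0.25, -0.5]
result = scaled_dot(a, b)  # matches the unscaled dot product, 0.6875
```

Without scaling, many activation and gradient values would underflow or overflow FP8's range; the per-tensor scale keeps them representable while the math stays equivalent.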
Energy-Efficient Infrastructure
Liquid-cooled GB200 NVL72 racks reduce a data center’s carbon footprint and energy
consumption. Liquid cooling increases compute density, reduces the amount of floor space
used, and facilitates high-bandwidth, low-latency GPU communication with large NVLink
domain architectures. Compared to NVIDIA H100 air-cooled infrastructure, GB200 delivers
25X more performance at the same power while reducing water consumption.
Data Processing
Databases play critical roles in handling, processing, and analyzing large volumes of
data for enterprises. GB200 takes advantage of the high-bandwidth memory performance,
NVLink-C2C, and dedicated decompression engines in the NVIDIA Blackwell architecture to
speed up key database queries by 18X compared to CPUs and deliver 5X better TCO.
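For reference, the TPC-H Q4 workload the benchmark footnote describes has this general shape: a semi-join between orders and line items followed by a grouped count. The sketch below runs that query pattern on toy in-memory data with SQLite; the real benchmark uses compressed TPC-H tables and custom x86/GPU implementations, so this illustrates only the query structure.

```python
import sqlite3

# Toy tables standing in for TPC-H orders and lineitem.
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE orders (o_orderkey INT, o_orderdate TEXT, o_orderpriority TEXT);
CREATE TABLE lineitem (l_orderkey INT, l_commitdate TEXT, l_receiptdate TEXT);
INSERT INTO orders VALUES
  (1, '1995-01-10', '1-URGENT'),
  (2, '1995-02-03', '2-HIGH'),
  (3, '1995-03-15', '1-URGENT'),
  (4, '1994-11-01', '2-HIGH');
INSERT INTO lineitem VALUES
  (1, '1995-01-20', '1995-01-25'),
  (2, '1995-02-10', '1995-02-08'),
  (3, '1995-03-20', '1995-04-02');
""")

# Q4 pattern: count orders in a date window that have at least one
# line item received after its committed date, grouped by priority.
rows = con.execute("""
SELECT o_orderpriority, COUNT(*) AS order_count
FROM orders
WHERE o_orderdate >= '1995-01-01' AND o_orderdate < '1995-04-01'
  AND EXISTS (SELECT 1 FROM lineitem
              WHERE l_orderkey = o_orderkey
                AND l_commitdate < l_receiptdate)
GROUP BY o_orderpriority
ORDER BY o_orderpriority
""").fetchall()
# rows -> [('1-URGENT', 2)]
```

The EXISTS semi-join and GROUP BY aggregation are exactly the operations that benefit from high memory bandwidth and hardware decompression when the tables are large and stored compressed.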