12.5.1 : Keynote
Tuesday, Mar 18 6:00 PM - 8:00 PM CET : GTC 2025 Keynote [S72484]
- Jensen Huang : NVIDIA
- Slides
- GeForce RTX 5090 (Blackwell) : 30 percent smaller than the RTX 4090
- 1 rendered pixel -> 15 AI-generated pixels, which have to stay temporally stable
- Agentic AI -> Reasoning
- The only way to have more people at GTC is to grow San Jose
- Chain of thought, best-of-N, consistency checking
- The amount of computation needed for inference is 100x what we expected
- numpy -> cupy : drop-in GPU acceleration
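A minimal sketch of the numpy -> cupy idea (my example, not from the keynote): CuPy mirrors the NumPy API, so array code moves to the GPU largely by swapping the import. Assumes CuPy and a CUDA-capable NVIDIA GPU are available.

```python
import numpy as np
import cupy as cp  # assumes CuPy is installed and a CUDA GPU is present

# CPU version with NumPy
x_cpu = np.random.rand(1_000_000).astype(np.float32)
result_cpu = float(np.sqrt(x_cpu).sum())

# Same computation with CuPy: identical calls, arrays live in GPU memory
x_gpu = cp.random.rand(1_000_000, dtype=cp.float32)
result_gpu = float(cp.sqrt(x_gpu).sum())  # float() copies the scalar back to the host

print(result_cpu, result_gpu)
```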
- cuLitho : computational lithography (ASML, TSMC)
- cuOPT : mathematical optimisation (flights, workers, drivers, riders, plants, etc.)
- cuOPT : will be open source soon
- Parabricks : gene sequencing
- Earth-2 : weather simulation
- cuQuantum : quantum computing simulation
- cuEquivariance and cuTensor
- cuDSS, AmgX, cuFFT, cuSPARSE
- cuDF, cuML : GPU acceleration for pandas and Spark
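A minimal cuDF sketch (my example, assuming RAPIDS cuDF is installed and a GPU is present): the DataFrame API mirrors pandas, so the same groupby would run unchanged on the CPU with pandas.

```python
import cudf  # RAPIDS GPU DataFrame library

df = cudf.DataFrame({
    "store": ["a", "b", "a", "c", "b", "a"],
    "sales": [10.0, 3.5, 7.25, 1.0, 4.0, 2.5],
})

# GroupBy aggregation executes on the GPU
totals = df.groupby("store")["sales"].sum().sort_index()
print(totals.to_pandas())  # bring the small result back as a pandas Series
```

cuDF also ships a zero-code-change mode (`python -m cudf.pandas your_script.py`) that accelerates existing pandas scripts and falls back to the CPU for unsupported operations.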
- Warp : Python framework for physics simulation
- GM chose NVIDIA to build their autonomous vehicle fleet
- Automotive safety : NVIDIA Halos, 7 million lines of safety-assessed code
- Two Blackwell GPUs in one Blackwell package : HGX
- NVLink switches in the middle of the rack
- Liquid cooling => all the compute compressed into a single rack
- From 60,000 to 600,000 components => a 1 EFLOPS computer in one rack
- One rack has as many parts as 20 cars
- Inference at scale chart : x-axis : tokens per second per user, y-axis : total throughput in tokens per second
- ~400 tokens for a classic model vs ~8,000 for a reasoning model : 20x more tokens, 150x more compute
- Pipeline parallelism, tensor parallelism, expert parallelism
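A toy illustration of one of these, tensor parallelism, simulated on the CPU with NumPy (my sketch, not NVIDIA code): the layer's weight matrix is sharded column-wise across "devices" and the partial results are concatenated.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal((4, 8))       # a batch of activations
W = rng.standard_normal((8, 12))      # the full weight matrix of one layer

shards = np.split(W, 3, axis=1)       # 3 "GPUs", each holding 4 output columns
partials = [x @ w for w in shards]    # each device computes only its slice
y = np.concatenate(partials, axis=1)  # the "all-gather" along the output dimension

assert np.allclose(y, x @ W)          # identical to the single-device result
```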
- Prefill : read a PDF or a website, or watch a video, to ingest the information (build the context)
- Decode : read the KV cache, produce one token, then repeat the same for the next token
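A toy, single-head sketch of the prefill/decode split and the KV cache (pure NumPy, invented shapes and names; purely illustrative, nothing like a production inference stack):

```python
import numpy as np

D = 16                                    # hidden size
rng = np.random.default_rng(0)
Wq, Wk, Wv = (rng.standard_normal((D, D)) * 0.1 for _ in range(3))

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def prefill(prompt):
    """Process the whole prompt at once and fill the KV cache."""
    return prompt @ Wk, prompt @ Wv       # K cache, V cache: (prompt_len, D)

def decode_step(x, k_cache, v_cache):
    """Read the KV cache, emit one output vector, extend the cache."""
    k_cache = np.vstack([k_cache, x @ Wk])
    v_cache = np.vstack([v_cache, x @ Wv])
    attn = softmax((x @ Wq) @ k_cache.T / np.sqrt(D))  # attend over everything so far
    return attn @ v_cache, k_cache, v_cache

prompt = rng.standard_normal((100, D))    # "read the PDF": 100 prompt tokens
k_cache, v_cache = prefill(prompt)

x = prompt[-1]                            # start generating from the last prompt token
for _ in range(5):                        # decode 5 tokens, one at a time
    x, k_cache, v_cache = decode_step(x, k_cache, v_cache)
```

The point of the cache is that each decode step only computes K/V for the single new token; everything earlier is reused, which is why decode is dominated by memory traffic rather than compute.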
- NVIDIA Dynamo : the operating system of an AI factory : decides which data and which computation run on which GPU in which rack
- NVIDIA Dynamo : open source, with Perplexity as a partner
- Tokens per second per megawatt
- NVLink 8 and FP8 (Hopper), then FP4 quantization (Blackwell)
- 25x at iso power in one generation (Hopper -> Blackwell)
- In this context, Blackwell has 40x the potential of Hopper (for reasoning models)
- Blackwell is in full production
- Blackwell Ultra : upgrade in the second half of this year
- Vera Rubin : Vera is 2x Grace (the chassis is the same), NVLink 144 => connected to 144 GPUs; second half of 2026
- Rubin Ultra : second half of 2027
- Spectrum-X : SuperNICs to supercharge Ethernet
- Copper for short-range (in-rack) communication, photonics for large-scale communication across stadium-sized data centers
- Mach-Zehnder technology (transceivers + lasers)
- Transceivers from the GPU to the switch, then to the next switch
- A standard transceiver is 30 W and about $1,000 at high volume; 6 per GPU => 180 W and $6,000 per GPU (see the back-of-the-envelope arithmetic below)
- Every GPU would need 6 transceiver slots, all of that power spent just on communication
- 1M GPUs => 6M transceivers at 30 W each => 180 MW just for transceivers
- 6 MW is 10 Rubin Ultra racks
- World's first MRM (micro ring modulator) silicon photonics, built with TSMC (COUPE process)
- 3.5x lower consumption : from 30 W to ~8.57 W per transceiver
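A back-of-the-envelope check of the transceiver figures in the bullets above, using only the keynote's rounded numbers:

```python
TRANSCEIVER_W = 30        # watts per pluggable transceiver
TRANSCEIVER_USD = 1_000   # high-volume price per transceiver
PER_GPU = 6               # transceivers per GPU
GPUS = 1_000_000          # a 1M-GPU AI factory

watts_per_gpu = PER_GPU * TRANSCEIVER_W       # 180 W per GPU
dollars_per_gpu = PER_GPU * TRANSCEIVER_USD   # $6,000 per GPU
fleet_mw = GPUS * watts_per_gpu / 1e6         # 180 MW just for transceivers

# Co-packaged optics (MRM) cut per-transceiver power ~3.5x: 30 W -> ~8.57 W
fleet_mw_mrm = fleet_mw / 3.5                 # ~51 MW

print(watts_per_gpu, dollars_per_gpu, fleet_mw, round(fleet_mw_mrm, 1))
# -> 180 6000 180.0 51.4
```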
- 100% of NVIDIA developers will be AI-assisted by the end of this year
- Grace Blackwell personal computer : 20 PFLOPS, liquid cooled
- For the very first time your storage system will be GPU accelerated
- DGX Spark (formerly Project DIGITS) : 20 CPU cores, 128 GB memory, 1 PFLOPS
- DGX Station : Grace Blackwell workstation : 72 CPU cores, 20 PFLOPS
- GPU-accelerated storage
- NVIDIA Omniverse with Cosmos
- Verifiable physics rewards
- NVIDIA + Disney Research + Google DeepMind : Newton, an open-source physics engine
- GR00T N1 is open source