12.5.1 : Keynote
Tuesday, Mar 18 6:00 PM - 8:00 PM CET : GTC 2025 Keynote [S72484]
- Jensen Huang : NVIDIA
- Slides
- GeForce RTX 5090 (Blackwell) : 30 percent smaller than the RTX 4090
- 1 rendered pixel -> 15 AI-generated pixels, which have to stay temporally stable
- Agentic AI -> Reasoning
- The only way to have more people at GTC is to grow San Jose
- Chain of thought, best-of-N, consistency checking
- The amount of computation needed for inference is 100x what we expected
- numpy -> cupy : drop-in GPU acceleration
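A minimal sketch of the numpy -> cupy idea (my example, not from the keynote): CuPy mirrors the NumPy API, so array code moves to the GPU largely by swapping the import. Assumes CuPy and a CUDA-capable NVIDIA GPU are available.

```python
import numpy as np
import cupy as cp  # assumes CuPy is installed and a CUDA GPU is present

# CPU version with NumPy
x_cpu = np.random.rand(1_000_000).astype(np.float32)
result_cpu = float(np.sqrt(x_cpu).sum())

# Same computation with CuPy: identical calls, arrays live in GPU memory
x_gpu = cp.random.rand(1_000_000, dtype=cp.float32)
result_gpu = float(cp.sqrt(x_gpu).sum())  # float() copies the scalar back to the host

print(result_cpu, result_gpu)
```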
- cuLitho : computational lithography (ASML, TSMC)
- cuOPT : mathematical optimisation (flights, workers, drivers, riders, plants, etc.)
- cuOPT : will be open source soon
- Parabricks : gene sequencing
- Earth-2 : weather simulation
- cuQuantum : quantum computing simulation
- cuEquivariance and cuTensor
- cuDSS, AmgX, cuFFT, cuSPARSE
- cuDF, cuML : GPU acceleration for pandas and Spark
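A minimal cuDF sketch (my example, assuming RAPIDS cuDF is installed and a GPU is present): the DataFrame API mirrors pandas, so the same groupby would run unchanged on the CPU with pandas.

```python
import cudf  # RAPIDS GPU DataFrame library

df = cudf.DataFrame({
    "store": ["a", "b", "a", "c", "b", "a"],
    "sales": [10.0, 3.5, 7.25, 1.0, 4.0, 2.5],
})

# GroupBy aggregation executes on the GPU
totals = df.groupby("store")["sales"].sum().sort_index()
print(totals.to_pandas())  # bring the small result back as a pandas Series
```

cuDF also ships a zero-code-change mode (`python -m cudf.pandas your_script.py`) that accelerates existing pandas scripts and falls back to the CPU for unsupported operations.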
- Warp : Python framework for physics simulation
- GM chose NVIDIA to build their autonomous vehicle fleet
- Automotive safety : NVIDIA Halos, 7 million lines of safety-assessed code
- Two Blackwell GPUs in one Blackwell package : HGX
- NVLink switches in the middle of the rack
- Liquid cooling => all the compute compressed into a single rack
- From 60,000 to 600,000 components => a 1 EFLOPS computer in one rack
- One rack has as many parts as 20 cars
- Inference at scale chart : x-axis : tokens per second per user, y-axis : total throughput in tokens per second
- ~400 tokens for a classic model vs ~8,000 for a reasoning model : 20x more tokens, 150x more compute
- Pipeline parallelism, tensor parallelism, expert parallelism
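A toy illustration of one of these, tensor parallelism, simulated on the CPU with NumPy (my sketch, not NVIDIA code): the layer's weight matrix is sharded column-wise across "devices" and the partial results are concatenated.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal((4, 8))       # a batch of activations
W = rng.standard_normal((8, 12))      # the full weight matrix of one layer

shards = np.split(W, 3, axis=1)       # 3 "GPUs", each holding 4 output columns
partials = [x @ w for w in shards]    # each device computes only its slice
y = np.concatenate(partials, axis=1)  # the "all-gather" along the output dimension

assert np.allclose(y, x @ W)          # identical to the single-device result
```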
- Prefill : read a PDF or a website, or watch a video, to ingest the information (build the context)
- Decode : read the KV cache, produce one token, then repeat the same for the next token
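A toy, single-head sketch of the prefill/decode split and the KV cache (pure NumPy, invented shapes and names; purely illustrative, nothing like a production inference stack):

```python
import numpy as np

D = 16                                    # hidden size
rng = np.random.default_rng(0)
Wq, Wk, Wv = (rng.standard_normal((D, D)) * 0.1 for _ in range(3))

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def prefill(prompt):
    """Process the whole prompt at once and fill the KV cache."""
    return prompt @ Wk, prompt @ Wv       # K cache, V cache: (prompt_len, D)

def decode_step(x, k_cache, v_cache):
    """Read the KV cache, emit one output vector, extend the cache."""
    k_cache = np.vstack([k_cache, x @ Wk])
    v_cache = np.vstack([v_cache, x @ Wv])
    attn = softmax((x @ Wq) @ k_cache.T / np.sqrt(D))  # attend over everything so far
    return attn @ v_cache, k_cache, v_cache

prompt = rng.standard_normal((100, D))    # "read the PDF": 100 prompt tokens
k_cache, v_cache = prefill(prompt)

x = prompt[-1]                            # start generating from the last prompt token
for _ in range(5):                        # decode 5 tokens, one at a time
    x, k_cache, v_cache = decode_step(x, k_cache, v_cache)
```

The point of the cache is that each decode step only computes K/V for the single new token; everything earlier is reused, which is why decode is dominated by memory traffic rather than compute.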
- NVIDIA Dynamo : the operating system of an AI factory : decides which data and which computation run on which GPU in which rack
- NVIDIA Dynamo : open source, with Perplexity as a partner
- Tokens per second per megawatt
- NVLink 8 and FP8 (Hopper), then FP4 quantization (Blackwell)
- 25x at iso power in one generation (Hopper -> Blackwell)
- In this context, Blackwell has 40x the potential of Hopper (for reasoning models)
- Blackwell is in full production
- Blackwell Ultra : upgrade in the second half of this year
- Vera Rubin : Vera is 2x Grace (the chassis is the same), NVLink 144 => connected to 144 GPUs; second half of 2026
- Rubin Ultra : second half of 2027
- Spectrum-X : SuperNICs to supercharge Ethernet
- Copper for short-range (in-rack) communication, photonics for large-scale communication across stadium-sized data centers
- Mach-Zehnder technology (transceivers + lasers)
- Transceivers from the GPU to the switch, then to the next switch
- A standard transceiver is 30 W and about $1,000 at high volume; 6 per GPU => 180 W and $6,000 per GPU (see the back-of-the-envelope arithmetic below)
- Every GPU would need 6 transceiver slots, all of that power spent just on communication
- 1M GPUs => 6M transceivers at 30 W each => 180 MW just for transceivers
- 6 MW is 10 Rubin Ultra racks
- World's first MRM (micro ring modulator) silicon photonics, built with TSMC (COUPE process)
- 3.5x lower consumption : from 30 W to ~8.57 W per transceiver
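A back-of-the-envelope check of the transceiver figures in the bullets above, using only the keynote's rounded numbers:

```python
TRANSCEIVER_W = 30        # watts per pluggable transceiver
TRANSCEIVER_USD = 1_000   # high-volume price per transceiver
PER_GPU = 6               # transceivers per GPU
GPUS = 1_000_000          # a 1M-GPU AI factory

watts_per_gpu = PER_GPU * TRANSCEIVER_W       # 180 W per GPU
dollars_per_gpu = PER_GPU * TRANSCEIVER_USD   # $6,000 per GPU
fleet_mw = GPUS * watts_per_gpu / 1e6         # 180 MW just for transceivers

# Co-packaged optics (MRM) cut per-transceiver power ~3.5x: 30 W -> ~8.57 W
fleet_mw_mrm = fleet_mw / 3.5                 # ~51 MW

print(watts_per_gpu, dollars_per_gpu, fleet_mw, round(fleet_mw_mrm, 1))
# -> 180 6000 180.0 51.4
```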
- 100% of NVIDIA developers will be AI-assisted by the end of this year
- Grace Blackwell personal computer : 20 PFLOPS, liquid cooled
- For the very first time your storage system will be GPU accelerated
- DGX Spark (formerly Project DIGITS) : 20 CPU cores, 128 GB memory, 1 PFLOPS
- DGX Station : Grace Blackwell workstation : 72 CPU cores, 20 PFLOPS
- GPU-accelerated storage
- NVIDIA Omniverse with Cosmos
- Verifiable physics rewards
- NVIDIA + Disney Research + Google DeepMind : Newton, an open-source physics engine
- GR00T N1 is open source