12.5.1 : Keynote



Tuesday, Mar 18 6:00 PM - 8:00 PM CET : GTC 2025 Keynote [S72484]
  • Jensen Huang : NVIDIA
  • Slides
  • RTX Blackwell 5090
  • 1 pixel rendered -> 15 generated by AI, and the result has to stay temporally stable
  • The 5090 is 30 percent smaller than the 4090
  • Agentic AI -> Reasoning
  • The only way to have more people at GTC is to grow San Jose
  • Chain of thought, best of N, consistency checking
  • The amount of computation needed for inference is 100x compared to what we expected
  • numpy -> cupy (see the CuPy sketch after this list)
  • cuLitho : ASML, TSMC
  • cuOPT : mathematical optimisation (for flights, workers, drivers, riders, plants, etc.)
  • cuOPT : will be open source soon
  • Parabricks : gene sequencing
  • Earth-2 : weather simulation
  • cuQuantum : quantum computing simulation
  • cuEquivariance, and cuTensor
  • cuDSS, AMGX, cuFFT, cuSparse
  • cuDF, cuML (acceleration for Pandas and Spark; see the cuDF sketch after this list)
  • Warp for physics (see the Warp sketch after this list)
  • GM chose NVIDIA to build their autonomous vehicle fleet
  • Automotive safety : NVIDIA Halos, 7 million lines of safety-assessed code
  • Two Blackwell GPU dies in one Blackwell package : HGX
  • Switches in the middle of the rack
  • Liquid cooled => compresses all the computing into one single rack
  • From 60,000 components to 600,000 components => a 1 EFlops computer in one rack
  • One rack has as many parts as 20 cars
  • Inference at scale : x axis : tokens per second per user, y axis : factory throughput in tokens per second
  • ~400 tokens for a classic model, ~8,000 for a reasoning model : 20x more tokens, 150x more computing
  • Pipeline Parallelism, tensor Parallelism, expert Parallelism
  • Prefill : read a PDF or a web site, watch a video to get the information and learn from it
  • Decode : you read the KV cache and produce one token, then redo the same for the next token (see the KV cache sketch after this list)
  • NVIDIA Dynamo : the operating system of an AI factory : decides which data and which computation go on which GPU in which rack
  • NVIDIA Dynamo : open source, with Perplexity as a partner
  • Key metric : tokens per second per megawatt
  • NVLink 8 and FP8, then FP4 for quantization
  • 25x at iso power in one generation (Hopper -> Blackwell)
  • In this context Blackwell is 40x the potential of Hopper (for reasoning models)
  • Full Production of Blackwell
  • Blackwell Ultra : upgrade in the second half of this year
  • Vera Rubin : Vera is 2x Grace (the chassis is the same), NVLink 144 => 144 GPUs connected
  • Rubin Ultra : second half of the following year
  • Spectrum-X : SuperNIC and supercharged Ethernet
  • Copper for local communication and photonics for large-scale (stadium-sized) data center communication
  • Mach-Zehnder technology (transceivers + lasers)
  • Transceivers from the GPU to the switch, then to the next switch
  • Standard transceiver : 30 W and $1,000 each in high volume; 6 per GPU => 180 W and $6,000 per GPU
  • Every GPU would have 6 transceiver slots : ~180 W per GPU just for communication
  • 1M GPUs => 6M transceivers of 30 W => 180 MW just for transceivers (see the arithmetic sketch after this list)
  • 6 MW is 10 Rubin Ultra Racks
  • World's first MRM (micro ring resonator modulator) co-packaged optics (TSMC : COUPE process)
  • 3.5x less consumption : from 30 W to 8.57 W per transceiver
  • 100% of NVIDIA developers will be AI-assisted by the end of this year
  • Grace Blackwell personal computer : 20 PFlops, liquid cooled
  • For the very first time your storage system will be GPU accelerated
  • DGX Spark (Digits) : 20 CPU cores, 128 GB memory, 1 PFlops
  • DGX Station : Grace Blackwell workstation : 72 CPU cores, 20 PFlops
  • GPU-accelerated storage
  • NVIDIA Omniverse with Cosmos
  • Verifiable physics rewards
  • NVIDIA + Disney Research + Google DeepMind (Newton physics engine)
  • GR00T N1 is open source
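
The numpy -> cupy bullet above refers to CuPy's NumPy-compatible API, which lets existing array code run on the GPU with minimal changes. A minimal sketch, assuming CuPy is installed on a machine with an NVIDIA GPU; the array sizes and the NumPy fallback are illustrative, not from the keynote:

```python
# NumPy -> CuPy swap: CuPy mirrors most of the NumPy API, so the same
# array code can target the GPU. Illustrative sketch, not from the keynote.
import numpy as np

try:
    import cupy as cp   # GPU-backed arrays with a NumPy-compatible API
    xp = cp             # use the GPU when CuPy (and a GPU) is available
except ImportError:
    xp = np             # fall back to plain NumPy otherwise

# The code below is identical for both backends.
a = xp.random.rand(1024, 1024)
b = xp.random.rand(1024, 1024)
c = a @ b                   # matrix multiply on GPU (CuPy) or CPU (NumPy)
print(float(c.sum()))       # float() brings the scalar back to the host
```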
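The cuDF bullet refers to the RAPIDS pandas accelerator mode, which intercepts pandas calls and runs them on the GPU, falling back to CPU pandas for unsupported operations. A hedged sketch; the file name and column names are hypothetical:

```python
# cuDF pandas accelerator mode: enable it before importing pandas and the
# existing pandas code runs on the GPU where possible (CPU fallback otherwise).
# In a notebook the equivalent is `%load_ext cudf.pandas`.
import cudf.pandas
cudf.pandas.install()

import pandas as pd  # now transparently backed by cuDF

# Hypothetical CSV and columns, only to show that the pandas API is unchanged.
df = pd.read_csv("sales.csv")
summary = df.groupby("region")["revenue"].sum().sort_values(ascending=False)
print(summary.head())
```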
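The Warp bullet refers to NVIDIA Warp, a Python framework that JIT-compiles kernels for simulation and physics. A minimal sketch of a particle gravity-integration kernel; the particle count and timestep are illustrative:

```python
# Minimal NVIDIA Warp sketch: a kernel that integrates particles under gravity.
import warp as wp

wp.init()

@wp.kernel
def integrate(x: wp.array(dtype=wp.vec3), v: wp.array(dtype=wp.vec3), dt: float):
    tid = wp.tid()                         # one thread per particle
    v[tid] = v[tid] + wp.vec3(0.0, -9.8, 0.0) * dt
    x[tid] = x[tid] + v[tid] * dt

n = 1024
x = wp.zeros(n, dtype=wp.vec3)             # positions
v = wp.zeros(n, dtype=wp.vec3)             # velocities
wp.launch(integrate, dim=n, inputs=[x, v, 1.0 / 60.0])
wp.synchronize()
print(x.numpy()[0])                        # first particle after one step
```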
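The prefill / decode bullets describe how inference reuses a KV cache: prefill ingests the whole prompt once and fills the cache, then each decode step computes attention for a single new token against the cached keys and values. A toy NumPy sketch of that split, with a single attention head, random weights, and random embeddings standing in for real tokens (none of this is NVIDIA code):

```python
# Toy prefill/decode split: prefill fills the KV cache for the whole prompt,
# decode reuses it and appends one K/V row per generated token instead of
# recomputing attention over the full sequence each time.
import numpy as np

d = 16                                   # toy model width
rng = np.random.default_rng(0)
Wq, Wk, Wv = (rng.standard_normal((d, d)) * 0.1 for _ in range(3))

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def prefill(prompt_embeddings):
    """Ingest the whole prompt; return the KV cache and the last hidden state."""
    K = prompt_embeddings @ Wk           # (T, d)
    V = prompt_embeddings @ Wv           # (T, d)
    q = prompt_embeddings[-1] @ Wq       # query for the last prompt position
    out = softmax(q @ K.T / np.sqrt(d)) @ V
    return K, V, out

def decode_step(new_embedding, K, V):
    """Produce one token's hidden state; only the new K/V row is computed."""
    K = np.vstack([K, new_embedding @ Wk])
    V = np.vstack([V, new_embedding @ Wv])
    q = new_embedding @ Wq
    out = softmax(q @ K.T / np.sqrt(d)) @ V
    return K, V, out

prompt = rng.standard_normal((8, d))     # stand-in for 8 embedded prompt tokens
K, V, h = prefill(prompt)
for _ in range(4):                       # generate 4 tokens, one decode step each
    K, V, h = decode_step(h, K, V)       # toy loop: feed the last state back in
print(K.shape)                           # (12, 16): 8 prompt + 4 generated rows
```

Prefill and decode have very different compute and memory profiles, which is part of why a scheduler like Dynamo can place them on different GPUs.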
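A quick back-of-the-envelope check of the transceiver power figures quoted in the list above; the numbers are the ones from the notes, nothing new:

```python
# Transceiver power arithmetic from the notes: 30 W each, 6 per GPU, 1M GPUs.
watts_per_transceiver = 30
transceivers_per_gpu = 6
gpus = 1_000_000

watts_per_gpu = watts_per_transceiver * transceivers_per_gpu          # 180 W per GPU
total_mw = watts_per_transceiver * transceivers_per_gpu * gpus / 1e6  # 180 MW total
print(watts_per_gpu, "W per GPU just for transceivers")
print(total_mw, "MW for 1M GPUs")
```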