Hardware

12.6.2 : Hardware

Tuesday, Mar 17 11:00 PM - 11:40 PM CET : From Grace to Vera: NVIDIA’s next data center CPU [S81680]

Matthias Langer, Sr. AI and Developer Technology Engineer, NVIDIA
Lukas Alt, DevTech Engineer, NVIDIA
Vera : Next CPU for AI and HPC
Available second half of 2026
Some customers have already used Vera
Vera liquid-cooled rack => 256 Vera CPUs => 22528 CPU cores in a single rack
Arm v9.2
6x128b SVE2 vectorization
88 Olympus cores, 176 threads with spatial multithreading
2 MB L2 Cache per core, 162 MB L3 Cache
2nd-generation NVIDIA Scalable Coherency Fabric (SCF), 3.4 TB/s (raw hardware bandwidth)
RAM : LPDDR5x SOCAMM
Raw memory bandwidth up to 1.2 TB/s
Memory Capacity up to 1.5 TB
NV C2C => 2nd-generation NVLink-C2C, 1.8 TB/s
PCI Express : 88 lanes Gen6 with x16, x8, x4, x2 bifurcation
Expect uniform NUMA latency across cores
Olympus can decode 10 instructions per cycle (6 per cycle on Grace)
Spatial Multi Threading (SMT)
A single thread uses 6x128b; with 2 threads, each uses 3x128b
FP8 is supported on Vera, accumulate into fp16 or fp32
Memory System Resource Partitioning and Monitoring (MPAM)
Recommended compilers : GCC 15.2 and LLVM >= 22, with -mcpu=olympus (or -mcpu=grace on Grace)
Works with NVTX
Linux perf is also available, 6 PMU counters can be used at the same time
2.02x compared to Grace on an AV1 encoding benchmark
Vera is Arm compatible, so compiling for Arm should be enough to make your application run; however, it is better to compile and optimise for Vera
Vera should be available in several forms, as for Grace
The clock speeds are of the same order of magnitude as Grace (not the same but close)
Basically, if you want to run on both Grace and Vera, compile for Grace and it will run reasonably well on Vera; if you need a highly optimised binary, you can build two optimised code paths and have one chosen at run time
Up to 1050 W for a chip
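The figures quoted in these notes can be cross-checked with a few lines of arithmetic. The sketch below is only a sanity check: it assumes that "6x128b" means six 128-bit SVE2 vector units per core and that 1 TB/s = 1000 GB/s; the function names are illustrative and are not an NVIDIA API.

```python
# Figures quoted in the notes above.
CPUS_PER_RACK = 256
CORES_PER_CPU = 88
THREADS_PER_CORE = 2      # spatial multithreading
L2_PER_CORE_MB = 2
MEMORY_BW_TB_S = 1.2      # raw LPDDR5x bandwidth
VECTOR_UNITS = 6          # assumption: "6x128b" = six 128-bit SVE2 units
VECTOR_BITS = 128


def rack_cores() -> int:
    """CPU cores in one liquid-cooled rack."""
    return CPUS_PER_RACK * CORES_PER_CPU


def lanes_per_cycle(element_bits: int, active_threads: int = 1) -> int:
    """Vector lanes one thread can use per cycle.

    With two active threads the units are split evenly (3x128b each).
    """
    units = VECTOR_UNITS // active_threads
    return units * VECTOR_BITS // element_bits


def memory_bw_per_core_gb_s() -> float:
    """Raw memory bandwidth share per core if all 88 cores stream at once."""
    return MEMORY_BW_TB_S * 1000 / CORES_PER_CPU


if __name__ == "__main__":
    print("cores per rack:", rack_cores())
    print("threads per CPU:", CORES_PER_CPU * THREADS_PER_CORE)
    print("total L2 per CPU (MB):", CORES_PER_CPU * L2_PER_CORE_MB)
    print("fp64 lanes, 1 thread:", lanes_per_cycle(64))
    print("fp32 lanes, 2 threads (each):", lanes_per_cycle(32, active_threads=2))
    print("GB/s per core: %.1f" % memory_bw_per_core_gb_s())
```

The per-core bandwidth share is a rough upper bound on what a bandwidth-bound kernel can expect when every core is busy; it ignores cache effects and the difference between raw and achievable bandwidth.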

LIVE : Wednesday, Mar 18 11:00 PM - 11:40 PM CET : The New Compute Paradigm: Adapting Infrastructure for the Inference Era [S82430]

Sachin Katti, Head of Industrial Compute, OpenAI
The most depressing talk ever.
Many of these are thoughts
Innovators : what should be the way to do something
From stateless to stateful utilisation when models have the full context of your lives...
3x more compute consumption each year
More than 30 GW in 2030
The goal is to make the human the bottleneck; for now this is not the case
This will ensure the human does not switch context to other things while the agent is still working on the beginning of the task
No guarantee that agentic workloads will scale with the number of GPUs or CPUs : 40 percent utilisation is not acceptable for a GPU
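To give a sense of what "3x more compute consumption each year" implies, here is a minimal compound-growth sketch. The number of years is a free parameter chosen for illustration; the talk did not state a base year, so no absolute figure is derived here.

```python
def compute_growth(factor_per_year: float, years: int) -> float:
    """Cumulative growth after `years` of multiplying by `factor_per_year`."""
    return factor_per_year ** years


if __name__ == "__main__":
    # Illustrative horizons only; the base year is an assumption.
    for years in range(1, 5):
        print(f"after {years} year(s): x{compute_growth(3, years):g}")
```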

Tuesday, Mar 17 11:00 PM - 11:40 PM CET : Liquid Cooling: How to Achieve 100% Heat Capture on 500kW+ Racks (Presented by CoolIT Systems) [S82004]

Andrew Buckrell, Sr. Thermal Mechanical Engineer, CoolIT Systems
Exponential growth in the power of computing resources => CPU-GPU at 80-90 °C, 65 °C for peripheral components
20-30 L/s of liquid flow to cool the rack
2030 => 1 MW rack => getting 100% heat capture is crucial
Minimize the thickness of the TIM (thermal interface material, from the CPU to the heat sink)
Liquid metal TIM => based on gallium => highly reactive with aluminium (it destroys aluminium), mixing with copper degrades heat transfer
TIM thickness down to 50 µm
Try to avoid the ballooning effect, which deforms the TIM with temperature and reduces heat transfer
Liquid cooling at high velocity => erosion and corrosion effects on copper
Vapour chamber to cool RAM
Cooling with wet aluminium is a challenge
The future is 500 W/cm^2 dies and 15 kW chips
Heat transfer becomes better at higher temperatures
PG0 : Propylene glycol => no corrosion
Water : freezing is a problem
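The numbers in these notes can be related with two standard heat-transfer formulas: the coolant temperature rise dT = P / (rho * V * cp), and the conduction drop across a TIM layer dT = q * t / k. The sketch below is a rough estimate only. It assumes plain water properties (a propylene glycol mix has a lower heat capacity), and the TIM conductivity values are illustrative assumptions that were not given in the talk.

```python
WATER_DENSITY_KG_PER_L = 1.0    # assumption: plain water, approximate
WATER_CP_J_PER_KG_K = 4186.0    # assumption: plain water, approximate


def coolant_delta_t(power_w: float, flow_l_per_s: float) -> float:
    """Coolant temperature rise (K) when all of power_w is captured."""
    mass_flow_kg_s = flow_l_per_s * WATER_DENSITY_KG_PER_L
    return power_w / (mass_flow_kg_s * WATER_CP_J_PER_KG_K)


def tim_delta_t(flux_w_per_cm2: float, thickness_um: float,
                conductivity_w_per_m_k: float) -> float:
    """Temperature drop (K) across a TIM layer by 1D conduction."""
    flux_w_per_m2 = flux_w_per_cm2 * 1e4
    thickness_m = thickness_um * 1e-6
    return flux_w_per_m2 * thickness_m / conductivity_w_per_m_k


def die_area_cm2(chip_power_w: float, flux_w_per_cm2: float) -> float:
    """Die area implied by a chip power and a heat flux."""
    return chip_power_w / flux_w_per_cm2


if __name__ == "__main__":
    for flow in (20, 30):
        print(f"500 kW rack at {flow} L/s: dT = {coolant_delta_t(500e3, flow):.1f} K")
    # Conductivities below are hypothetical values chosen for comparison.
    for k in (5.0, 25.0):
        print(f"50 um TIM, k = {k} W/(m.K): dT = {tim_delta_t(500, 50, k):.0f} K")
    print(f"15 kW at 500 W/cm^2: {die_area_cm2(15e3, 500):.0f} cm^2 of die")
```

At a fixed heat flux the TIM temperature drop scales linearly with thickness and inversely with conductivity, which is consistent with the push toward thinner layers and liquid metal noted above.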