12.6.4 : Simulation

Tuesday, Mar 17 1:00 AM - 1:40 AM CET : Cutting-Edge Molecular Dynamics on the Latest Multi-Node NVLink Technology [S81542]
  • Alan Gray, Principal Developer Technology Engineer, NVIDIA
  • Mahesh Doijade, Sr. Compute Developer Technology Engineer, NVIDIA


Tuesday, Mar 17 5:00 PM - 5:40 PM CET : The State of the Art of Quantum Chemistry on GPUs: EXESS, Exascale, and Floating-Point Emulation on Blackwell [S81503]
  • Giuseppe M. J. Barca, Professor at MIPS and ANU, Co-Founder and Head of Research at QDX, Monash Institute of Pharmaceutical Sciences, Australian National University, and QDX Technologies


LIVE : Tuesday, Mar 17 11:00 PM - 11:40 PM CET : Accelerate Geospatial Workflows for Planetary Insight [S81732]
  • May Casterline, Director, Solutions Architecture, NVIDIA
  • Kiruthika Devaraj, VP of Spacecraft, Planet
  • Observing is looking at history: watching the Andromeda galaxy shows it as it was about 2M years ago
  • Bring the compute as close as possible to the sensor
  • 800 PB of data about the Earth collected over 80 years
  • 200 satellites orbiting the Earth: soon 1 m resolution
  • Query the data directly
  • Downlink raw data as fast as possible
  • Processing runs entirely on the GPU
  • A blob of data is processed in 2 s on a single GPU
  • The GPU was compared with a single CPU thread (a lazy baseline, but they don't care about CPU performance)
  • Next step: atmospheric compensation
  • Jetson can fly on Pelican satellites
  • Thermal management in space? Orin is not that compute-intensive. You have to dissipate the heat as quickly as possible and then radiate it
  • Downlink latency => edge compute on the satellites: a 50 MB image is compressed to 1 MB with a model, and the communication links will keep getting faster
  • How many satellites do you need? Pelican: 30 min revisit (~22 satellites)
  • Band registration? Most satellites have TDI (Time Delay Integration) sensors; the bands are slightly offset from each other while the satellite moves
  • Feed raw data directly into a model to get a rough description quickly. In theory they could train a model on raw data
  • The Pelican constellation has a 5-year lifetime
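The band-registration point can be illustrated with a generic sketch. Because each band is captured at a slightly different position while the satellite moves, the same ground feature appears shifted along-track from one band to the next. The hypothetical helper `estimate_shift` below is not Planet's pipeline (which was not described in the talk): it finds the integer along-track offset between two 1D band profiles by minimising the mean squared difference over their overlap. Production pipelines typically work in 2D and at sub-pixel precision.

```python
def estimate_shift(ref, moving, max_shift):
    """Estimate the integer shift s (|s| <= max_shift) such that
    moving[i + s] best matches ref[i].

    Hypothetical helper: scores each candidate shift by the mean squared
    difference over the overlapping samples and keeps the lowest score.
    """
    best_shift, best_score = 0, float("inf")
    for s in range(-max_shift, max_shift + 1):
        total, count = 0.0, 0
        for i, r in enumerate(ref):
            j = i + s
            if 0 <= j < len(moving):
                diff = r - moving[j]
                total += diff * diff
                count += 1
        if count == 0:
            continue  # no overlap for this shift
        score = total / count  # normalise so short overlaps stay comparable
        if score < best_score:
            best_shift, best_score = s, score
    return best_shift


# Example: the second band sees the same feature 2 samples later along-track.
red = [0, 0, 1, 5, 1, 0, 0, 0]
green = [0, 0, 0, 0, 1, 5, 1, 0]
print(estimate_shift(red, green, max_shift=3))  # prints 2
```

Mean squared difference is only one possible score; correlation-based or phase-correlation methods are common alternatives.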


Tuesday, Mar 17 11:00 PM - 11:40 PM CET : The Earth System at 1 km Resolution: Breaking Frontiers in Climate Science [S82185]
  • Daniel Klocke, Group Leader, Max Planck Institute for Meteorology (MPI-M)
  • The resolution has a big impact on the topography and on the accuracy of the results
  • Mont Blanc: 4810 m in reality => 4018 m at 1 km resolution, 1394 m in a traditional coarse-resolution simulation
  • At km scale we see high cloud structures and rain shafts
  • At km scale the modelling is simpler (fewer bugs, fewer assumptions, simpler models)
  • Get information at any scale (global or local)
  • Incorporate the water, energy and carbon cycles
  • A 1 km simulation is possible with an exascale computer
  • NICAM reached 220 m, but for atmosphere-only simulation
  • For now the best multi-physics simulations have a 200 km resolution
  • 1 million lines of Fortran and OpenACC code to take all interactions into account
  • They used Jupiter: 24000 GH200, 1 EFlops, #4 in the TOP500
  • Alps: 11000 GH200, 0.435 EFlops, #8 in the TOP500
  • Since the atmosphere moves faster than the ocean, it needs smaller time steps
  • Lots of small kernels => use of CUDA Graphs
  • Atmosphere simulation on Hopper GPUs
  • Ocean simulation on Grace CPUs
  • Pragmas represent about 50 percent of the whole Fortran code base
  • They will remove the pragmas to gain portability
  • About 10^12 degrees of freedom
  • 145.7 simulated days per day on Jupiter
  • Next: 150 m resolution => what are clouds doing in a warmer climate?
  • They did some system tuning to get good scaling on both Jupiter and Alps, even though their networks are different
  • They plan to handle different kinds of precipitation depending on the clouds, but for now their microphysics simulation is not complex enough
  • They will try cBottle (NVIDIA's model for simulating climate)
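Two of the numbers above can be made concrete. The time-step point follows from the Courant-Friedrichs-Lewy (CFL) condition for explicit schemes, dt <= C * dx / u_max: at a fixed 1 km grid spacing, faster flow forces a smaller step. The peak speeds below (100 m/s for the atmosphere, 2 m/s for the ocean) and the Courant number of 1 are illustrative assumptions, not values from the talk, and the real model's time stepping is more involved. The second helper converts the reported 145.7 simulated days per day into simulated years per day (SYPD), assuming a 365-day year. Both function names are hypothetical.

```python
def max_stable_dt(dx_m, u_max_m_per_s, courant=1.0):
    """Largest explicit time step (s) allowed by the CFL condition
    dt <= courant * dx / u_max, for grid spacing dx (m) and peak speed u_max (m/s)."""
    return courant * dx_m / u_max_m_per_s


def simulated_years_per_day(simulated_days_per_day, days_per_year=365.0):
    """Convert throughput from simulated days per wall-clock day to SYPD."""
    return simulated_days_per_day / days_per_year


dx = 1000.0  # 1 km grid spacing, as in the talk
# Illustrative peak speeds (assumptions): strong jet-stream winds vs. ocean currents.
print(max_stable_dt(dx, 100.0))  # atmosphere: 10.0 s
print(max_stable_dt(dx, 2.0))    # ocean: 500.0 s
print(round(simulated_years_per_day(145.7), 2))  # about 0.4 SYPD on Jupiter
```

With these assumed speeds, the atmospheric step is 50 times shorter than the oceanic one at the same 1 km spacing.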


Wednesday, Mar 18 12:00 AM - 12:40 AM CET : 2025 Gordon Bell Winner: Forecasting Tsunamis in Real Time With Digital Twins [S82161]
  • Stefan Henneking, Research Associate, The University of Texas at Austin
  • Omar Ghattas, Professor and Cockrell Endowed Chair in Engineering, The University of Texas at Austin


LIVE : Thursday, Mar 19 5:00 PM - 5:40 PM CET : Magic Attention: A Composable Framework for Exploring Warp-Specialized Attention on Blackwell [S82294]
  • Manish Gupta, Member of Technical Staff, Magic AI, Inc
  • SLIDES OK : GTC2026/S82294_1773959552971001Ivf7.pdf
  • FlashAttention algorithms
  • Composable components, instruction interleaving
  • Handles low-level, hardware-specific variations
  • Static and dynamic schedulers
  • Magic Attention on FA4 with a 2-queue schedule; 2-3 weeks of work with the 2-queue schedule
  • The peak performance might not be at the max package
  • TMEM (Tensor Memory) is limited
  • Precompute QK^T so the softmax has its input ready
  • Let's warp-specialize everything
  • 95% of the CTA performance (CTA = Cooperative Thread Array, i.e. a thread block)
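The notes mention FlashAttention-style kernels and precomputing QK so the softmax has its input ready. Those kernels are built around the online (streaming) softmax: scores are computed one K/V tile at a time, and the running max and running sum are rescaled as each tile arrives, so the full score row never has to be stored. Below is a minimal single-query sketch of that math in plain Python. It is not Magic Attention's implementation and says nothing about warp specialization, TMEM, or scheduling; the function names are hypothetical.

```python
import math


def naive_attention(q, keys, values):
    """Reference: softmax(q . K^T / sqrt(d)) @ V for a single query vector."""
    scale = 1.0 / math.sqrt(len(q))
    scores = [scale * sum(qi * ki for qi, ki in zip(q, k)) for k in keys]
    m = max(scores)
    weights = [math.exp(s - m) for s in scores]
    total = sum(weights)
    dim = len(values[0])
    return [sum(w * v[d] for w, v in zip(weights, values)) / total for d in range(dim)]


def tiled_attention(q, keys, values, block_size):
    """Same result, but visits K/V one block at a time with an online softmax."""
    scale = 1.0 / math.sqrt(len(q))
    dim = len(values[0])
    running_max = float("-inf")  # max score seen so far
    running_sum = 0.0            # sum of exp(score - running_max) so far
    acc = [0.0] * dim            # unnormalised weighted sum of values
    for start in range(0, len(keys), block_size):
        k_blk = keys[start:start + block_size]
        v_blk = values[start:start + block_size]
        # "QK" step for this block: these scores feed the softmax step.
        scores = [scale * sum(qi * ki for qi, ki in zip(q, k)) for k in k_blk]
        new_max = max(running_max, max(scores))
        # Rescale what was accumulated under the old max (exp(-inf) == 0.0 on the first block).
        correction = math.exp(running_max - new_max)
        running_sum *= correction
        acc = [a * correction for a in acc]
        for s, v in zip(scores, v_blk):
            p = math.exp(s - new_max)
            running_sum += p
            for d in range(dim):
                acc[d] += p * v[d]
        running_max = new_max
    # Normalise once at the end.
    return [a / running_sum for a in acc]


q = [0.5, -1.0]
keys = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [-1.0, 0.5], [0.3, -0.2]]
values = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0], [9.0, 10.0]]
print(naive_attention(q, keys, values))
print(tiled_attention(q, keys, values, block_size=2))  # matches up to rounding
```

In warp-specialized GPU kernels such as the recent FlashAttention versions, these stages (tile loads, QK^T, softmax, PV accumulation) are roughly assigned to different warps so they can overlap.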


LIVE : Thursday, Mar 19 7:00 PM - 8:30 PM CET : An Introduction to the Newton Physics Engine for Robotics [S81613]
  • NO access