Simulation

12.6.4 : Simulation

Tuesday, Mar 17 1:00 AM - 1:40 AM CET : Cutting-Edge Molecular Dynamics on the Latest Multi-Node NVLink Technology [S81542]

Alan Gray, Principal Developer Technology Engineer, NVIDIA
Mahesh Doijade, Sr. Compute Developer Technology Engineer, NVIDIA

Tuesday, Mar 17 5:00 PM - 5:40 PM CET : The State of the Art of Quantum Chemistry on GPUs: EXESS, Exascale, and Floating-Point Emulation on Blackwell [S81503]

Giuseppe M. J. Barca, Professor at MIPS and ANU, Co-Founder and Head of Research at QDX, Monash Institute of Pharmaceutical Sciences, Australian National University, and QDX Technologies

LIVE : Tuesday, Mar 17 11:00 PM - 11:40 PM CET : Accelerate Geospatial Workflows for Planetary Insight [S81732]

May Casterline, Director, Solutions Architecture, NVIDIA
Kiruthika Devaraj, VP of Spacecraft, Planet
Looking at history => watching the Andromeda galaxy is seeing 2M years into the past
Getting compute as close as possible to the sensor
800PB of data about Earth over 80 years
200 satellites around the Earth : soon 1m resolution
Query the data directly
Get raw data down as fast as possible
Entirely on the GPU
Blob of data => 2s on a single GPU
Comparing GPU to 1 CPU thread (=> lazy but they don't care about CPU)
Next step : atmospheric compensation
Jetson can fly on Pelican satellites
Thermal management in space ? Orin : not that compute intensive. You have to dissipate heat as quickly as possible and then radiate it
Downlink latency => Edge compute on the satellites => a 50MB image compressed to 1MB with a model, and the communication link will get faster and faster
How many satellites do you need ? Pelican 30 min revisit (~22 satellites)
Band registration ? Most satellites have TDI sensors, bands are slightly offset while the satellite moves
Use raw data directly in a model to get a rough description quickly. They could in theory train a model on raw data
Pelican constellation has a 5 year lifetime
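
The band registration note above can be illustrated with a minimal sketch. It assumes the offset between two bands is a pure integer shift along the track direction and estimates it by brute-force correlation on a synthetic 1-D profile. The function names, the data and the 3-line offset are all hypothetical; this is not Planet's pipeline.

```python
def estimate_shift(reference, band, max_shift):
    """Return the integer shift s such that band[i + s] best matches reference[i]."""
    n = len(reference)
    best_shift, best_score = 0, float("-inf")
    for s in range(-max_shift, max_shift + 1):
        score, count = 0.0, 0
        for i in range(n):
            j = i + s
            if 0 <= j < n:
                score += reference[i] * band[j]
                count += 1
        score /= count  # normalize by the size of the overlap
        if score > best_score:
            best_shift, best_score = s, score
    return best_shift


def register(band, shift, fill=0.0):
    """Shift band back by `shift` samples so it lines up with the reference."""
    n = len(band)
    return [band[i + shift] if 0 <= i + shift < n else fill for i in range(n)]


# Synthetic along-track profile: one bright feature on a dark background.
red = [0.0] * 20
red[8:11] = [1.0, 3.0, 1.0]

# Hypothetical second band that sees the same feature 3 lines later.
green = [0.0] * 20
green[11:14] = [1.0, 3.0, 1.0]

shift = estimate_shift(red, green, max_shift=5)
aligned = register(green, shift)
```

Real imagery would need 2-D, sub-pixel registration and mean-subtracted correlation; the sketch only shows the principle of searching for the offset that maximizes the overlap between bands.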

Tuesday, Mar 17 11:00 PM - 11:40 PM CET : The Earth System at 1 km Resolution: Breaking Frontiers in Climate Science [S82185]

Daniel Klocke, Group Leader, Max Planck Institute for Meteorology (MPI-M)
The resolution has a big impact on the topography and accuracy of the results
Mont Blanc: 4810m => 4018m at 1km, 1394m in traditional simulation
At km scale we see high cloud structures and rain shafts
At km scale the computing is simplified (fewer bugs, fewer assumptions, simpler models)
Getting information at scale (global or local)
Incorporate cycles of water, energy and carbon
1km simulation is possible with exascale computers
220m NICAM for atmospheric simulation only
For now the best multi-physics simulations have 200km resolution
1 million lines of Fortran and OpenACC code to take all interactions into account
They used Jupiter : 24000 GH200, 1 EFlops, #4 in the Top500
Alps : 11000 GH200, 0.435 EFlops, #8 in the Top500
Since the atmosphere moves faster, they need smaller time steps
A lot of small kernels => use of CUDA Graphs
Atmosphere simulation on Hopper GPU
Ocean simulation on Grace CPU
Pragmas in the Fortran code represent about 50 percent of the whole code base
They will remove the pragmas to gain portability
About 10^12 degrees of freedom
145.7 simulated days per day on Jupiter
Next => 150m resolution => what are clouds doing in a warmer climate ?
They did some system tuning to get nice scaling on both Jupiter and Alps even if their networks are different
They plan to deal with different kinds of precipitation depending on the clouds, but for now their microphysics simulation is not complex enough
They will try cBottle (Climate in a Bottle, model from NVIDIA to simulate climate)
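
The note on resolution and the Mont Blanc heights above can be illustrated with a minimal sketch: averaging terrain into coarser grid cells lowers the highest peak the model can see. The ridge below is synthetic (linear slopes, invented for illustration), so it does not reproduce the 4018m and 1394m figures from the talk; the function name and cell sizes are assumptions.

```python
def block_average(heights, cell):
    """Average consecutive samples into grid cells of `cell` samples each."""
    cells = []
    for start in range(0, len(heights), cell):
        chunk = heights[start:start + cell]
        cells.append(sum(chunk) / len(chunk))
    return cells


# Synthetic ridge sampled every 100 m: linear slopes up to a 4810 m summit.
# The shape is invented for illustration; it is not real Mont Blanc terrain.
half_width = 200  # samples on each side of the summit (20 km)
profile = [4810 - 20 * abs(i - half_width) for i in range(2 * half_width + 1)]

peak_fine = max(profile)                       # 100 m sampling
peak_1km = max(block_average(profile, 10))     # 10 samples  = 1 km cells
peak_10km = max(block_average(profile, 100))   # 100 samples = 10 km cells

print(peak_fine, peak_1km, peak_10km)
```

The coarser the cell, the more of the surrounding slopes are averaged into the summit cell, which is why the peak height keeps dropping as the grid spacing grows.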

Wednesday, Mar 18 12:00 AM - 12:40 AM CET : 2025 Gordon Bell Winner: Forecasting Tsunamis in Real Time With Digital Twins [S82161]

Stefan Henneking, Research Associate, The University of Texas at Austin
Omar Ghattas, Professor and Cockrell Endowed Chair in Engineering, The University of Texas at Austin

LIVE : Thursday, Mar 19 5:00 PM - 5:40 PM CET : Magic Attention: A Composable Framework for Exploring Warp-Specialized Attention on Blackwell [S82294]

Manish Gupta, Member of Technical Staff, Magic AI, Inc
SLIDES OK : GTC2026/S82294_1773959552971001Ivf7.pdf
Flash attention algorithms
Composable component, instruction interleaving
Handle low-level, hardware-specific variations
Static and dynamic schedulers
Magic Attention on FA4 with a 2-queue schedule, 2-3 weeks with the 2-queue schedule
The peak performance might not be at the max package
TMEM is limited
Precompute QK to have a softmax ready
Let's warp-specialize everything
95% of the CTA performance (CTA = Cooperative Thread Array == Thread Block)
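
As a reminder of what flash attention algorithms build on, here is a minimal pure-Python sketch of the online softmax recurrence: keys and values are processed tile by tile while a running maximum, a running sum and an accumulator are rescaled. It handles a single query row, omits the usual 1/sqrt(d) scaling, and says nothing about warp specialization or scheduling; the function names and the tile size are assumptions, not Magic Attention's code.

```python
import math


def attention_naive(q, keys, values):
    """Reference: softmax(q . K^T) V computed over all keys at once."""
    scores = [sum(a * b for a, b in zip(q, k)) for k in keys]
    top = max(scores)
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)
    dim = len(values[0])
    return [sum(w * v[d] for w, v in zip(weights, values)) / total
            for d in range(dim)]


def attention_tiled(q, keys, values, tile=2):
    """Same result, visiting keys/values one tile at a time (online softmax)."""
    dim = len(values[0])
    running_max = float("-inf")
    running_sum = 0.0
    acc = [0.0] * dim
    for start in range(0, len(keys), tile):
        k_tile = keys[start:start + tile]
        v_tile = values[start:start + tile]
        scores = [sum(a * b for a, b in zip(q, k)) for k in k_tile]
        new_max = max(running_max, max(scores))
        # Rescale what was accumulated under the old maximum
        # (exp(-inf) is 0.0, so the first tile starts from zero).
        scale = math.exp(running_max - new_max)
        running_sum *= scale
        acc = [a * scale for a in acc]
        for s, v in zip(scores, v_tile):
            w = math.exp(s - new_max)
            running_sum += w
            acc = [a + w * x for a, x in zip(acc, v)]
        running_max = new_max
    return [a / running_sum for a in acc]


q = [1.0, 0.5]
keys = [[0.2, 0.1], [1.5, -0.3], [0.0, 2.0], [-1.0, 0.4], [0.7, 0.7]]
values = [[1.0, 0.0], [0.0, 1.0], [2.0, 2.0], [-1.0, 3.0], [0.5, 0.5]]

out_naive = attention_naive(q, keys, values)
out_tiled = attention_tiled(q, keys, values, tile=2)
```

Because the full score row is never materialized, each tile can be processed as soon as its QK product is available, which is the property the scheduling work in the talk exploits.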

LIVE : Thursday, Mar 19 7:00 PM - 8:30 PM CET : An Introduction to the Newton Physics Engine for Robotics [S81613]

NO access