12.5.4 : Simulation

Wednesday, Mar 19 10:00 PM - 10:40 PM CET : GPU-Accelerated Mixed-Precision Spatial Statistics for Efficient Climate/Weather Modeling and Emulation [S72300]
  • Hatem Ltaief : Principal Research Scientist, KAUST
  • Qinglei Cao : Assistant Professor, Saint Louis University
  • Slides
  • Geostatistical and spatial simulation (not less than 100 km, and daily only, under an assumption of axial symmetry)
  • Matérn covariance function
  • Statistical Emulator (approximate complex climate simulation)
  • Cholesky Factorization, A=L L^t (most of the computing time)
  • Compute the determinant of the matrix (a dense matrix, but with a data-sparse structure)
  • They can use low rank approximation
  • On Summit, ALPS and LEONARDO
  • Tile scheduling of the Cholesky Factorization
  • Portability (Intel, ARM, NVIDIA, AMD, CPU, GPU), because they rely on the vendors' BLAS libraries
  • New steps since the presentation of 2022
  • Do not oversolve or overstore
  • cuTile and cuDTX presented by Stephen Jones will be integrated
  • Task-based runtime (they build on PaRSEC, which is 15 years old)
  • Many matrices arising in applications have blocks of relatively small norm, and these blocks can be stored in reduced precision
  • As always, data movement is really expensive
  • Speedup of 3.4 on LEONARDO (A100)
  • Speedup of 5.2 on ALPS (GH200), reaching 0.739 EFlop/s
  • No comparison with AI approaches (problems with overfitting, expensive to retrain)
  • However, they want to apply mixed-precision algorithms to AI models
  • FP64 emulation could be used to handle the remaining FP64 blocks
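
The notes above mention a Matérn covariance function, a Cholesky factorization A = L L^T, and the determinant of the covariance matrix. The following is a minimal sketch of how these pieces usually fit together in a Gaussian log-likelihood evaluation. It is not the speakers' code: the function names, the choice of smoothness 3/2 (which has a closed form), the range parameter, and the small nugget added for numerical stability are all assumptions made for illustration.

```python
import numpy as np


def matern32(points, sigma2=1.0, rho=0.2):
    """Matern covariance with smoothness 3/2 (closed form, assumed here)."""
    diff = points[:, None, :] - points[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    s = np.sqrt(3.0) * dist / rho
    return sigma2 * (1.0 + s) * np.exp(-s)


def cholesky_logdet(cov):
    """log det(cov) from the Cholesky factor: 2 * sum(log(diag(L)))."""
    chol = np.linalg.cholesky(cov)
    return 2.0 * np.log(np.diag(chol)).sum()


def gaussian_loglik(z, cov):
    """Zero-mean Gaussian log-likelihood using a single Cholesky factorization."""
    n = z.shape[0]
    chol = np.linalg.cholesky(cov)            # dominant cost, O(n^3)
    logdet = 2.0 * np.log(np.diag(chol)).sum()
    w = np.linalg.solve(chol, z)              # L w = z, so w.w = z^T cov^{-1} z
    return -0.5 * (n * np.log(2.0 * np.pi) + logdet + w @ w)


# Hypothetical data: 200 random locations in the unit square.
rng = np.random.default_rng(0)
points = rng.random((200, 2))
cov = matern32(points) + 1e-6 * np.eye(200)   # nugget: an assumption for stability
z = np.linalg.cholesky(cov) @ rng.standard_normal(200)
loglik = gaussian_loglik(z, cov)
print(loglik)
```

Taking the log-determinant from the diagonal of L avoids overflow or underflow of the determinant itself, and it reuses the factorization that the quadratic form needs anyway.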
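
The remark that blocks of relatively small norm can be held in reduced precision can be illustrated with a small tile-wise experiment. This is only a sketch under stated assumptions: the selection rule (demote a tile when its rounding error fits within an equal share of a global error budget), the function name demote_small_tiles, the tolerance, and the two-level FP64/FP32 choice are all illustrative, not the criterion used by the speakers.

```python
import numpy as np

U32 = 2.0 ** -24  # unit roundoff of IEEE single precision


def demote_small_tiles(a, tile, tol):
    """Store tiles in FP32 when doing so keeps ||A - A_approx||_F <= tol * ||A||_F.

    Illustrative rule (an assumption): a tile is demoted if
    U32 * ||tile||_F <= tol * ||A||_F / nt, where nt is the number of tile rows.
    """
    n = a.shape[0]
    assert n % tile == 0, "this sketch assumes the tile size divides n"
    nt = n // tile
    budget = tol * np.linalg.norm(a) / nt
    prec = np.full((nt, nt), "fp64")
    approx = a.copy()
    for i in range(nt):
        for j in range(nt):
            rows = slice(i * tile, (i + 1) * tile)
            cols = slice(j * tile, (j + 1) * tile)
            blk = a[rows, cols]
            if U32 * np.linalg.norm(blk) <= budget:
                prec[i, j] = "fp32"
                # Round to FP32; the assignment converts back to FP64 storage.
                approx[rows, cols] = blk.astype(np.float32)
    return prec, approx


# Hypothetical test matrix: Matern 3/2 covariance on ordered 1D locations,
# so tiles far from the diagonal have small norm.
x = np.linspace(0.0, 1.0, 256)
s = np.sqrt(3.0) * np.abs(x[:, None] - x[None, :]) / 0.05
a = (1.0 + s) * np.exp(-s)

prec, approx = demote_small_tiles(a, tile=32, tol=1e-8)
rel_err = np.linalg.norm(a - approx) / np.linalg.norm(a)
fraction_fp32 = np.mean(prec == "fp32")
print(prec)
print(rel_err, fraction_fp32)
```

With ordered locations the large-norm tiles cluster around the diagonal, so the off-diagonal tiles are the ones that move to lower precision. Storing them in fewer bytes also reduces data movement, which the notes single out as expensive.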