12.5.4 : Simulation
Wednesday, Mar 19 10:00 PM - 10:40 PM CET : GPU-Accelerated Mixed-Precision Spatial Statistics for Efficient Climate/Weather Modeling and Emulation [S72300]
- Hatem Ltaief : Principal Research Scientist, KAUST
- Qinglei Cao : Assistant Professor, Saint Louis University
- Slides
- Geostatistical and spatial simulation (not less than 100 km, and daily only, with an assumption of axial symmetry)
- Matern Covariance Function
- Statistical Emulator (approximate complex climate simulation)
- Cholesky factorization, A = L L^T (most of the computing time)
- Compute the determinant of the matrix (dense matrix but with a data-sparse structure)
- They can use low rank approximation
- On Summit, ALPS and LEONARDO
- Tile scheduling of the Cholesky Factorization
- Portability (Intel, ARM, NVIDIA, AMD, CPU, GPU) (because they rely on vendor BLAS libraries)
- New steps since the presentation of 2022
- Do not oversolve or overstore
- cuTile and cuDTX presented by Stephen Jones will be integrated
- Task-based runtime (built on PaRSEC, which is 15 years old)
- Many matrices arising in applications have blocks of relatively small norm, and these blocks can be stored in reduced precision
- As always data movement is really expensive
- Speedup of 3.4x on LEONARDO (A100)
- Speedup of 5.2x on ALPS (GH200), 0.739 EFlop/s
- No comparison with AI models (problems with overfitting, expensive to retrain)
- But they want to apply mixed-precision algorithms to AI models
- FP64 emulation could be used to handle the remaining FP64 blocks
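
To make the pipeline in these notes concrete, here is a minimal sketch in plain Python. It is not the speakers' code. It builds a Matern covariance matrix, factors it with a Cholesky factorization, reads the log-determinant off the diagonal of L, and rounds small-norm tiles to FP32. The assumptions are mine: smoothness is fixed at nu = 3/2 (which has a closed form), the locations are a small 6 x 6 grid, and the helper names (`matern32`, `demote_small_tiles`), the `nugget` and the `tol` threshold are all hypothetical. The talk's actual tile-precision criterion and PaRSEC-based implementation are not reproduced here.

```python
import math
import struct


def matern32(d, sigma2=1.0, ell=1.0):
    """Matern covariance for smoothness nu = 3/2 (closed form)."""
    s = math.sqrt(3.0) * d / ell
    return sigma2 * (1.0 + s) * math.exp(-s)


def covariance_matrix(points, sigma2=1.0, ell=1.0, nugget=1e-8):
    """Dense covariance matrix; the nugget is an assumed diagonal shift."""
    n = len(points)
    A = [[matern32(math.dist(points[i], points[j]), sigma2, ell)
          for j in range(n)] for i in range(n)]
    for i in range(n):
        A[i][i] += nugget
    return A


def cholesky(A):
    """Unblocked Cholesky: returns lower-triangular L with A = L L^T."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for j in range(n):
        diag = A[j][j] - sum(L[j][k] ** 2 for k in range(j))
        L[j][j] = math.sqrt(diag)
        for i in range(j + 1, n):
            off = A[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = off / L[j][j]
    return L


def log_det_from_cholesky(L):
    """log det(A) = 2 * sum(log L_ii), avoiding overflow of det(A)."""
    return 2.0 * sum(math.log(L[i][i]) for i in range(len(L)))


def to_fp32(x):
    """Round a Python float (FP64) to the nearest FP32 value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def demote_small_tiles(A, tile, tol):
    """Round tiles with Frobenius norm below tol * ||A||_F to FP32.

    Returns a copy of A and the number of demoted tiles. The threshold
    rule is an illustrative assumption, not the criterion from the talk.
    """
    n = len(A)
    total = math.sqrt(sum(x * x for row in A for x in row))
    B = [row[:] for row in A]
    demoted = 0
    for bi in range(0, n, tile):
        for bj in range(0, n, tile):
            rows = range(bi, min(bi + tile, n))
            cols = range(bj, min(bj + tile, n))
            norm = math.sqrt(sum(A[i][j] ** 2 for i in rows for j in cols))
            if norm < tol * total:
                for i in rows:
                    for j in cols:
                        B[i][j] = to_fp32(A[i][j])
                demoted += 1
    return B, demoted


# Small demo: 6 x 6 grid of locations, one tile per grid row.
points = [(float(x), float(y)) for y in range(6) for x in range(6)]
A = covariance_matrix(points)
A_mixed, n_demoted = demote_small_tiles(A, tile=6, tol=1e-2)
logdet_fp64 = log_det_from_cholesky(cholesky(A))
logdet_mixed = log_det_from_cholesky(cholesky(A_mixed))
print(n_demoted, abs(logdet_fp64 - logdet_mixed))
```

In this toy setting the demoted tiles are the ones coupling distant grid rows, where the covariance has decayed, so rounding them to FP32 barely changes the log-determinant. A production code would also run the tile kernels themselves in the lower precision, which this sketch does not attempt.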