On the performance of a highly-scalable Computational Fluid Dynamics code on AMD, ARM and Intel processor-based HPC systems
Computer Physics Communications 269 (2021), Article ID 108105
Abstract
No area of computing is hungrier for performance than High Performance Computing (HPC), the demands of which continue to be a major driver for processor performance, the adoption of accelerators, and advances in memory, storage, and networking technologies. The past decade has been marked by the dominance of Intel processors and the extensive adoption of GPUs as coprocessors, whilst more recent developments have seen the increased availability of a wider range of CPUs, including novel ARM-based chips. This paper analyses the performance and scalability of a state-of-the-art Computational Fluid Dynamics (CFD) code on two HPC cluster systems: Hawk, equipped with AMD EPYC-Rome (EPYC, 4096 cores) and Intel Skylake (SKL, 8000 cores) processors and an Infiniband EDR interconnect; and Isambard, equipped with ARM-based Marvell ThunderX2 (TX2, 8192 cores) processors and a Cray Aries interconnect. The code Hydro3D was analysed in three benchmark cases of increasing numerical complexity: lid-driven cavity flow using 4th-order central differences, the Taylor-Green vortex solved with a 5th-order WENO scheme, and a travelling solitary wave computed using the level-set method and WENO, with problem sizes designed to provide a large computation-to-communication ratio on single or multiple nodes. Our results show that the EPYC cluster delivers the best code performance for all the setups under consideration. In the first two benchmarks, the SKL cluster achieves shorter computing times than the TX2 system, whilst in the solitary wave simulations the TX2 cluster achieves good scalability and performance similar to the EPYC system, both improving on that obtained with the SKL cluster. These results suggest that while the Intel SKL cores deliver the best strong scalability, the associated cluster performance is lower than that of the EPYC system. The performance of the TX2 cluster is promising considering its recent addition to the HPC portfolio.
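For readers unfamiliar with the schemes named above, the following is a minimal illustration of the standard 4th-order central-difference stencil for a first derivative and the classical Taylor-Green vortex initial velocity field; these are textbook forms and not necessarily the exact formulation implemented in Hydro3D.

% Standard textbook forms; the discretisation and initial conditions used in Hydro3D may differ.
\[
  f'(x_i) \approx \frac{-f_{i+2} + 8 f_{i+1} - 8 f_{i-1} + f_{i-2}}{12\,\Delta x}
\]
\[
  u = U_0 \sin x \cos y \cos z, \qquad
  v = -U_0 \cos x \sin y \cos z, \qquad
  w = 0
\]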