Abstract
With an ever-growing compute advantage over CPUs, GPUs are often used in workloads with ample BLAS computation to improve performance. However, several factors, including data-to-compute ratio, amount of data re-use, and data structure shape, can all impact performance. Hence, using a GPU does not guarantee better BLAS performance. In this work, we introduce the GPU BLAS Offload Benchmark (GPU-BLOB), a novel and portable benchmark that measures CPU and GPU compute performance of different BLAS kernels and problem configurations. Using the GPU offload threshold (a BLAS kernel’s minimum dimensions for a certain configuration where using a GPU is guaranteed to yield improved performance), we evaluate the per-node performance of three in-production HPC systems. We show that the offload threshold for GEMM is highly dependent on problem shape and number of consecutive BLAS calls, and that, contrary to conventional wisdom, GEMV can benefit from GPU acceleration, especially on SoC-based systems.
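
The offload threshold defined in the abstract can be illustrated with a minimal sketch. This is not GPU-BLOB's implementation: the function name `offload_threshold`, the input layout, and the timings are hypothetical, and the sketch assumes that each GPU time already includes whatever data-transfer cost applies to the configuration. It reads "guaranteed" as "the GPU is faster at this size and at every larger measured size".

```python
from typing import Optional, Sequence


def offload_threshold(
    sizes: Sequence[int],
    cpu_seconds: Sequence[float],
    gpu_seconds: Sequence[float],
) -> Optional[int]:
    """Return the smallest size from which the GPU is faster at every
    measured size onwards, or None if the GPU loses at the largest size."""
    if not (len(sizes) == len(cpu_seconds) == len(gpu_seconds)):
        raise ValueError("all inputs must have the same length")
    threshold = None
    # Walk from the largest size down and stop at the first size where
    # the GPU is not strictly faster than the CPU.
    for size, cpu, gpu in sorted(zip(sizes, cpu_seconds, gpu_seconds), reverse=True):
        if gpu < cpu:
            threshold = size
        else:
            break
    return threshold


# Hypothetical timings (seconds) for square GEMM with M = N = K = size.
sizes = [64, 128, 256, 512, 1024]
cpu = [0.0001, 0.0006, 0.004, 0.030, 0.240]
gpu = [0.0010, 0.0004, 0.005, 0.010, 0.040]
print(offload_threshold(sizes, cpu, gpu))
```

In this made-up data the GPU also wins at size 128 but loses again at 256, so 128 does not count: a single GPU win at a small size is not a threshold under this reading, only a sustained one is.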
Original language | English |
---|---|
Title of host publication | 15th IEEE International Workshop on Performance Modeling, Benchmarking and Simulation of High Performance Computer Systems |
Publisher | Institute of Electrical and Electronics Engineers (IEEE) |
Pages | 1481-1495 |
Number of pages | 15 |
ISBN (Electronic) | 979-8-3503-5554-3 |
Publication status | E-pub ahead of print - 26 Nov 2024 |
Event | 15th IEEE International Workshop on Performance Modeling, Benchmarking and Simulation of High Performance Computer Systems, PMBS 2024, Atlanta, United States. Duration: 18 Nov 2024 → 18 Nov 2024. https://pmbs-workshop.github.io/ |
Conference
Conference | 15th IEEE International Workshop on Performance Modeling, Benchmarking and Simulation of High Performance Computer Systems, PMBS 2024 |
---|---|
Abbreviated title | PMBS 2024 |
Country/Territory | United States |
City | Atlanta |
Period | 18/11/24 → 18/11/24 |
Keywords
- BLAS
- High-performance computing
- Heterogeneous computing
- Performance
- CPU
- GPU