Abstract
We present a major update to the GPU-STREAM benchmark, first shown at SC’15. The original benchmark allowed comparison of achievable memory bandwidth, measured with the STREAM kernels, across OpenCL devices. GPU-STREAM v2.0 extends the benchmark along another dimension: the kernels are implemented in a wide range of popular, state-of-the-art parallel programming models. This allows an intuitive comparison of performance across a diverse set of programming models and devices, and lets us investigate whether the choice of model matters to performance and performance portability. In particular, we investigate 7 parallel programming models (OpenMP 4.x, OpenACC, Kokkos, RAJA, SYCL, CUDA and OpenCL) across 12 devices (6 GPUs from NVIDIA and AMD, Intel Xeon Phi (Knights Landing), 4 generations of Intel Xeon CPUs, and IBM Power 8).
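For readers unfamiliar with the STREAM kernels mentioned in the abstract, the following is a minimal serial sketch of the four classic operations (copy, scale, add, triad) and of the usual way a bandwidth figure is derived from them. It is illustrative only and is not the GPU-STREAM implementation: the function names, the `bandwidth_gb_per_s` helper, and the use of plain Python lists are assumptions made for this example.

```python
# Illustrative sketch of the four STREAM kernels (not the GPU-STREAM source).
# All names here are hypothetical and chosen for this example.

def copy(a, c):
    # c[i] = a[i]: reads one array, writes one array
    for i in range(len(a)):
        c[i] = a[i]

def mul(b, c, scalar):
    # b[i] = scalar * c[i]: reads one array, writes one array
    for i in range(len(c)):
        b[i] = scalar * c[i]

def add(a, b, c):
    # c[i] = a[i] + b[i]: reads two arrays, writes one array
    for i in range(len(a)):
        c[i] = a[i] + b[i]

def triad(a, b, c, scalar):
    # a[i] = b[i] + scalar * c[i]: reads two arrays, writes one array
    for i in range(len(a)):
        a[i] = b[i] + scalar * c[i]

def bandwidth_gb_per_s(arrays_touched, n, elem_bytes, seconds):
    # Bytes moved = arrays touched * elements per array * bytes per element.
    # Copy and mul touch 2 arrays; add and triad touch 3.
    return arrays_touched * n * elem_bytes / seconds / 1e9
```

In a real benchmark each kernel would be timed over several repetitions on arrays much larger than the last-level cache, and the best time would be passed to a helper such as `bandwidth_gb_per_s`; the parallel programming models compared in the paper differ in how the loop body is expressed and dispatched, not in the arithmetic.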
Original language | English |
---|---|
Number of pages | 2 |
Publication status | Published - 13 Nov 2016 |
Event | 2016 International Conference for High Performance Computing, Networking, Storage and Analysis - Salt Lake City, UT, United States. Duration: 13 Nov 2016 → 18 Nov 2016. http://sc16.supercomputing.org/ |
Conference
Conference | 2016 International Conference for High Performance Computing, Networking, Storage and Analysis |
---|---|
Abbreviated title | SC16 |
Country/Territory | United States |
City | Salt Lake City, UT |
Period | 13/11/16 → 18/11/16 |
Internet address | http://sc16.supercomputing.org/ |