We present a major update to the GPU-STREAM benchmark, first shown at SC’15. The original benchmark allowed achievable memory bandwidth to be compared across OpenCL devices using the STREAM kernels. GPU-STREAM v2.0 extends the benchmark along another dimension: the kernels are implemented in a wide range of popular, state-of-the-art parallel programming models. This allows an intuitive comparison of performance across a diverse set of programming models and devices, and lets us investigate whether the choice of model matters to performance and performance portability. In particular, we investigate 7 parallel programming models (OpenMP 4.x, OpenACC, Kokkos, RAJA, SYCL, CUDA and OpenCL) across 12 devices (6 GPUs from NVIDIA and AMD, Intel Xeon Phi (Knights Landing), 4 generations of Intel Xeon CPUs, and IBM Power 8).
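For readers unfamiliar with the STREAM kernels mentioned in the abstract, the following is a minimal sketch of the four standard operations (copy, mul, add, triad) and the usual way sustained bandwidth is derived from them. It is not the GPU-STREAM source code: the function names, the initial values, the scalar, and the 8-byte element size are assumptions chosen for illustration, and plain Python lists are used only to show the byte counting, not to measure real memory bandwidth.

```python
import time

# Number of arrays read plus written per element by each kernel,
# following the usual STREAM byte-counting convention.
ARRAYS_MOVED = {"copy": 2, "mul": 2, "add": 3, "triad": 3}


def run_kernel(name, a, b, c, scalar):
    """Apply one STREAM kernel in place (hypothetical helper)."""
    n = len(a)
    if name == "copy":      # c = a
        for i in range(n):
            c[i] = a[i]
    elif name == "mul":     # b = scalar * c
        for i in range(n):
            b[i] = scalar * c[i]
    elif name == "add":     # c = a + b
        for i in range(n):
            c[i] = a[i] + b[i]
    elif name == "triad":   # a = b + scalar * c
        for i in range(n):
            a[i] = b[i] + scalar * c[i]
    else:
        raise ValueError(f"unknown kernel: {name}")


def bandwidth_gb_per_s(name, n, seconds, bytes_per_element=8):
    """Bytes moved by the kernel divided by elapsed time, in GB/s."""
    return ARRAYS_MOVED[name] * n * bytes_per_element / seconds / 1e9


def measure(n=100_000, scalar=3.0):
    """Run the four kernels once, in order, and time each one."""
    # Illustrative initial values (an assumption, not taken from the paper).
    a, b, c = [1.0] * n, [2.0] * n, [0.0] * n
    results = {}
    for name in ARRAYS_MOVED:  # dicts keep insertion order
        start = time.perf_counter()
        run_kernel(name, a, b, c, scalar)
        elapsed = time.perf_counter() - start
        results[name] = bandwidth_gb_per_s(name, n, elapsed)
    return a, b, c, results
```

In a real port, the loop bodies are what each programming model (OpenMP, CUDA, SYCL and so on) expresses in its own way, while the byte counting and the timing stay the same, which is what makes the results comparable across models and devices.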
|Number of pages||2|
|Publication status||Published - 13 Nov 2016|
|Event||2016 International Conference for High Performance Computing, Networking, Storage and Analysis - Salt Lake City, UT, United States (13 Nov 2016 → 18 Nov 2016)|
|Conference||2016 International Conference for High Performance Computing, Networking, Storage and Analysis|
|City||Salt Lake City, UT|
|Period||13/11/16 → 18/11/16|