GPU-STREAM v2.0: Benchmarking the achievable memory bandwidth of many-core processors across diverse parallel programming models

Tom Deakin*, James Price, Matt Martineau, Simon McIntosh-Smith

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceedingConference Contribution (Conference Proceeding)

30 Citations (Scopus)
489 Downloads (Pure)


Many scientific codes consist of memory bandwidth bound kernels | the dominating factor of the runtime is the speed at which data can be loaded from memory into the Arithmetic Logic Units, before results are written back to memory. One major advantage of General Purpose Graphics Processing Units (GPGPUs) and other many-core devices such as the Intel Xeon Phi is their focus on providing increased memory bandwidth over traditional CPU architectures. However, as with CPUs, this peak memory bandwidth is usually unachievable in practice and so benchmarks are required to measure a practical upper bound on expected performance.

The choice of one programming model over another should ideally not limit the performance that can be achieved on a device. GPU-STREAM has been updated to incorporate a wide variety of the latest parallel programming models, all implementing the same parallel scheme. As such this tool can be used as a kind of Rosetta Stone which provides both a cross-platform and cross-programming model array of results of achievable memory bandwidth.

Publication series

NameLecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume9945 LNCS
ISSN (Print)0302-9743
ISSN (Electronic)1611-3349


ConferenceInternational Workshops on High Performance Computing, ISC High Performance 2016 and Workshop on 2nd International Workshop on Communication Architectures at Extreme Scale, ExaComm 2016, Workshop on Exascale Multi/Many Core Computing Systems, E-MuCoCoS 2016, HPC I/O in the Data Center, HPC-IODC 2016, Application Performance on Intel Xeon Phi – Being Prepared for KNL and Beyond, IXPUG 2016, International Workshop on OpenPOWER for HPC, IWOPH 2016, International Workshop on Performance Portable Programming Models for Accelerators, P^3MA 2016, Workshop on Virtualization in High-Performance Cloud Computing, VHPC 2016, Workshop on Performance and Scalability of Storage Systems, WOPSSS 2016

Fingerprint Dive into the research topics of 'GPU-STREAM v2.0: Benchmarking the achievable memory bandwidth of many-core processors across diverse parallel programming models'. Together they form a unique fingerprint.

Cite this