Evaluating attainable memory bandwidth of parallel programming models via BabelStream

Tom Deakin, James Price, Matt Martineau, Simon McIntosh-Smith

Research output: Contribution to journal > Special issue (Academic Journal) > peer-review

30 Citations (Scopus)
664 Downloads (Pure)

Abstract

Many scientific codes consist of memory bandwidth bound kernels: the dominating factor of the runtime is the speed at which data can be loaded from memory into the Arithmetic Logic Units, before results are written back to memory. One major advantage of many-core devices such as General Purpose Graphics Processing Units (GPGPUs) and the Intel Xeon Phi is their focus on providing increased memory bandwidth over traditional CPU architectures. However, as with CPUs, this peak memory bandwidth is usually unachievable in practice, and so benchmarks are required to measure a practical upper bound on expected performance. We augment the standard set of STREAM kernels with a dot product kernel to investigate the performance of simple reduction operations on large arrays. Such kernels are usually present in scientific codes and are still memory bandwidth bound. The choice of one programming model over another should ideally not limit the performance that can be achieved on a device. BabelStream (formerly GPU-STREAM) has been updated to incorporate a wide variety of the latest parallel programming models, all implementing the same parallel scheme. As such, this tool can be used as a kind of Rosetta Stone which provides an array of achievable memory bandwidth results across both platforms and programming models.
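
The kernels named in the abstract can be illustrated with a minimal sketch. The four standard STREAM kernels (copy, mul, add, triad) and the added dot product are written here as plain sequential Python loops. This is not BabelStream's implementation: the function names, the initial values, the `run` helper and the 8 bytes per element (a double-precision value in a C array) are assumptions made for illustration only. The byte accounting follows the usual STREAM convention of counting every array read or written once per element.

```python
import time

# Assumption: each element is a double-precision value (8 bytes), as it
# would be in a C array. Python lists do not store values this compactly.
BYTES_PER_ELEMENT = 8

# Arrays read or written by each kernel, per element (STREAM convention).
ARRAYS_TOUCHED = {"copy": 2, "mul": 2, "add": 3, "triad": 3, "dot": 2}


def copy(a, c):
    for i in range(len(a)):
        c[i] = a[i]


def mul(b, c, scalar):
    for i in range(len(b)):
        b[i] = scalar * c[i]


def add(a, b, c):
    for i in range(len(a)):
        c[i] = a[i] + b[i]


def triad(a, b, c, scalar):
    for i in range(len(a)):
        a[i] = b[i] + scalar * c[i]


def dot(a, b):
    # Simple reduction over two large arrays; it reads both and writes neither.
    total = 0.0
    for i in range(len(a)):
        total += a[i] * b[i]
    return total


def bandwidth_gb_per_s(kernel, n, bytes_per_element, seconds):
    # Bytes moved by the kernel divided by its runtime, in GB/s.
    return ARRAYS_TOUCHED[kernel] * n * bytes_per_element / seconds / 1e9


def run(n=1 << 20, scalar=0.5):
    # Illustrative initial values, chosen to be exactly representable.
    a = [1.0] * n
    b = [2.0] * n
    c = [0.0] * n
    timings = {}

    def timed(name, fn, *args):
        start = time.perf_counter()
        out = fn(*args)
        timings[name] = time.perf_counter() - start
        return out

    timed("copy", copy, a, c)
    timed("mul", mul, b, c, scalar)
    timed("add", add, a, b, c)
    timed("triad", triad, a, b, c, scalar)
    total = timed("dot", dot, a, b)
    return total, timings


if __name__ == "__main__":
    n = 1 << 20
    total, timings = run(n)
    for name, seconds in timings.items():
        rate = bandwidth_gb_per_s(name, n, BYTES_PER_ELEMENT, seconds)
        print(f"{name:6s} {rate:8.3f} GB/s")
    print("dot result:", total)
```

Interpreted loops like these are limited by interpreter overhead rather than by memory, so the printed figures say nothing about a device's attainable bandwidth. The sketch only shows what each kernel computes and how a bandwidth figure is derived from bytes moved and elapsed time.
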
Original language: English
Pages (from-to): 247-262
Number of pages: 16
Journal: International Journal of Computational Science and Engineering
Volume: 17
Issue number: 3
Early online date: 22 Oct 2018
Publication status: Published - Oct 2018

Keywords

  • Performance portability
  • Many-core
  • Parallel programming models
  • Memory bandwidth benchmark
