A Performance Analysis of Modern Parallel Programming Models Using a Compute-Bound Application

Andrei Poenaru*, Simon McIntosh-Smith*, Tom Lin

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding › Conference Contribution (Conference Proceeding)


Abstract

Performance portability is becoming more important as next-generation high-performance computing systems grow increasingly diverse and heterogeneous. Several new approaches to parallel programming, such as SYCL and Kokkos, have been developed in recent years to tackle this challenge. While several studies have evaluated these new programming models, they have tended to focus on memory-bandwidth-bound applications. In this paper we analyse the performance of the most promising modern parallel programming models on a diverse range of contemporary high-performance hardware, using a compute-bound molecular docking mini-app. We present a mini-app for BUDE, the Bristol University Docking Engine, an application routinely used for drug discovery. We implement the mini-app in different programming models targeting both CPUs and GPUs, including SYCL and Kokkos. We then analyse the performance of each implementation and compare it to highly optimised baselines set using established programming models such as OpenMP, OpenCL, and CUDA. Our study covers a wide variety of modern hardware platforms: CPUs based on x86 and Arm architectures, as well as GPUs. We found that, with emerging higher-level parallel programming frameworks such as SYCL, we could achieve performance comparable to that of the established models without hurting either portability or productivity. We identify a set of key challenges and pitfalls to take into account when adopting these emerging programming models, some of which are implementation-specific effects rather than fundamental design errors that would prevent further adoption. Finally, we discuss our findings in the wider context of performance-portable compute-bound workloads.
Original language: English
Title of host publication: High Performance Computing - 36th International Conference, ISC High Performance 2021, Proceedings
Subtitle of host publication: 36th International Conference, ISC High Performance 2021, Virtual Event, June 24 – July 2, 2021, Proceedings
Editors: Bradford L. Chamberlain, Ana-Lucia Varbanescu, Hatem Ltaief, Piotr Luszczek
Publisher: Springer
Pages: 332-350
Number of pages: 19
ISBN (Electronic): 978-3-030-78713-4
ISBN (Print): 978-3-030-78712-7
DOIs
Publication status: Published - 17 Jun 2021
Event: ISC High Performance 2021 - Frankfurt, Germany
Duration: 24 Jun 2021 – 2 Jul 2021
https://www.isc-hpc.com/

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 12728 LNCS
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: ISC High Performance 2021
Abbreviated title: ISC HPC
Country/Territory: Germany
City: Frankfurt
Period: 24/06/21 – 2/07/21
Internet address

Bibliographical note

Footnotes:
1 https://github.com/UoB-HPC/miniBUDE
2 https://github.com/UoB-HPC/performance-portability/tree/2021-benchmarking/benchmarking/2021/bude

Funding Information:
The authors would like to thank Si Hammond at Sandia National Laboratories for providing short-notice results for the A64FX platform. Thank you to James Price and Matt Martineau for their original contributions towards optimised OpenMP, OpenCL, and CUDA implementations of the BUDE kernel. This study would not have been possible without previous work by the developers of the Bristol University Docking Engine: Richard Sessions, Deborah Shoemark, and Amaurys Avila Ibarra. This work used the Isambard UK National Tier-2 HPC Service (https://gw4.ac.uk/isambard/) operated by GW4 and the UK Met Office, and funded by EPSRC (EP/T022078/1). Access to the Cray XC50 supercomputer Swan was kindly provided through the Cray Marketing Partner Network. Work in this study was carried out using the HPC Zoo, a research cluster run by the University of Bristol HPC Group (https://uob-hpc.github.io/zoo/).

Publisher Copyright:
© 2021, Springer Nature Switzerland AG.

Keywords

  • programming models
  • performance portability
  • performance analysis
  • compute-bound benchmark
