Evaluating OpenMP 4.0's Effectiveness as a Heterogeneous Parallel Programming Model

Matt Martineau, Simon McIntosh-Smith, Wayne Gaudin

Research output: Chapter in Book/Report/Conference proceeding; Conference Contribution (Conference Proceeding)

21 Citations (Scopus)
301 Downloads (Pure)

Abstract

Although the OpenMP 4.0 standard has been available since 2013, support for GPUs has been absent until very recently, with only a handful of experimental compilers available. In this work we evaluate the performance of Cray's new NVIDIA GPU-targeting implementation of OpenMP 4.0 using the mini-apps TeaLeaf, CloverLeaf and BUDE. We successfully port each of the applications using a simple and consistent design throughout, and achieve performance on an NVIDIA K20X that is comparable to Cray's OpenACC implementation in all cases. BUDE, a compute-bound code, required 2.2x the runtime of an equivalently optimised CUDA code, which we believe is caused by an inflated frequency of control flow operations and less efficient arithmetic optimisation. Impressively, TeaLeaf and CloverLeaf, both memory-bandwidth-bound codes, required only 1.3x the runtime of hand-optimised CUDA implementations. Overall, we find that OpenMP 4.0 is a highly usable open standard capable of performant heterogeneous execution, making it a promising option for scientific application developers.
Original language: English
Title of host publication: Proceedings - 2016 IEEE 30th International Parallel and Distributed Processing Symposium, IPDPS 2016
Publisher: Institute of Electrical and Electronics Engineers (IEEE)
Pages: 338-347
Number of pages: 10
ISBN (Electronic): 9781509021406
Publication status: Published - 4 Aug 2016
Event: 30th IEEE International Parallel and Distributed Processing Symposium Workshops, IPDPSW 2016 - Chicago, United States
Duration: 23 May 2016 to 27 May 2016

Conference

Conference: 30th IEEE International Parallel and Distributed Processing Symposium Workshops, IPDPSW 2016
Country: United States
City: Chicago
Period: 23/05/16 to 27/05/16

Keywords

  • Application programming interfaces
  • High performance computing
  • OpenMP
  • Parallel computing
  • Performance portability
