Abstract
Performance portability has rapidly become one of the key concerns for application developers targeting modern computer architectures. Although there are various programming models that can offer functional portability when moving application code between different devices, it remains an open research question as to whether it is possible to guarantee some degree of performance portability in these situations. Automatic performance tuning approaches have been shown to be effective tools for removing the burden of code optimization from the developer, but somewhat sidestep the issue of performance portability by enabling an environment where code is repeatedly optimized for each architecture individually.
In this work, we present an in-depth analysis of the performance portability of code that has been highly optimized for specific devices via auto-tuning. We perform this analysis across a wide range of modern, many-core architectures from multiple hardware vendors, examining performance portability both across different vendors and between devices from the same vendor. We then demonstrate how the auto-tuning process can be modified to bring performance portability into the equation, in order to automatically generate a single implementation that achieves high efficiency across many different devices.
Original language | English |
---|---|
Title of host publication | High Performance Computing - ISC High Performance 2017 International Workshops, DRBSD, ExaComm, HCPM, HPC-IODC, IWOPH, IXPUG, P^3MA, VHPC, Visualization at Scale, WOPSSS, Revised Selected Papers |
Publisher | Springer, Cham |
Pages | 538-556 |
Number of pages | 19 |
ISBN (Print) | 9783319676296 |
Publication status | Published - 20 Oct 2017 |
Event | 32nd International Conference on High Performance Computing, ISC High Performance 2017 - Frankfurt, Germany; Duration: 18 Jun 2017 → 22 Jun 2017 |
Publication series
Name | Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) |
---|---|
Volume | 10524 LNCS |
ISSN (Print) | 0302-9743 |
ISSN (Electronic) | 1611-3349 |
Conference
Conference | 32nd International Conference on High Performance Computing, ISC High Performance 2017 |
---|---|
Country/Territory | Germany |
City | Frankfurt |
Period | 18 Jun 2017 → 22 Jun 2017 |
Keywords
- performance portability
- auto-tuning
- GPGPU
- OpenCL