Abstract
Many scientific codes consist of memory bandwidth bound kernels: the dominant factor in their runtime is the speed at which data can be loaded from memory into the Arithmetic Logic Units, before results are written back to memory. One major advantage of General Purpose Graphics Processing Units (GPGPUs) and other many-core devices such as the Intel Xeon Phi is their focus on providing increased memory bandwidth over traditional CPU architectures. However, as with CPUs, this peak memory bandwidth is usually unachievable in practice, and so benchmarks are required to measure a practical upper bound on expected performance.
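As an illustration of why such kernels are bandwidth bound, the sketch below (not taken from GPU-STREAM itself; array sizes and timing details are assumptions) implements the classic STREAM triad in plain C++ and estimates the achieved bandwidth from the bytes moved and the elapsed time. Each element requires only one multiply-add but two loads and one store, so arithmetic throughput is never the limiting resource.

```cpp
#include <chrono>
#include <cstdio>
#include <vector>

int main() {
  const std::size_t N = 1 << 24;            // ~17M doubles per array (assumed size)
  const double scalar = 3.0;
  std::vector<double> a(N), b(N, 1.0), c(N, 2.0);

  auto start = std::chrono::high_resolution_clock::now();

  // STREAM-style triad: one fused multiply-add per element, but two loads
  // and one store, so runtime is governed by memory traffic.
  for (std::size_t i = 0; i < N; ++i)
    a[i] = b[i] + scalar * c[i];

  auto stop = std::chrono::high_resolution_clock::now();
  double seconds = std::chrono::duration<double>(stop - start).count();

  // Three arrays of 8-byte doubles cross the memory interface per sweep.
  double gbytes = 3.0 * sizeof(double) * N * 1e-9;
  std::printf("Triad bandwidth: %.1f GB/s\n", gbytes / seconds);
}
```

Comparing the measured figure against the device's theoretical peak gives the kind of practical upper bound the benchmark is designed to report.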
The choice of one programming model over another should ideally not limit the performance that can be achieved on a device. GPU-STREAM has been updated to incorporate a wide variety of the latest parallel programming models, all implementing the same parallel scheme. As such, this tool can be used as a kind of Rosetta Stone, providing a cross-platform and cross-programming-model set of achievable memory bandwidth results.
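To make the "same parallel scheme, different programming model" idea concrete, the hedged sketch below expresses the triad twice using OpenMP, which is one of the models the benchmark covers: once with host threading and once with target offload. Only the directives change; the loop body and data movement pattern are identical. The real GPU-STREAM implementations are structured as classes per model and may differ in detail.

```cpp
#include <cstddef>

// Same triad, expressed with OpenMP host threading and OpenMP target
// offload; only the directive changes, not the parallel scheme.
void triad_host(double *a, const double *b, const double *c,
                double scalar, std::size_t n) {
  #pragma omp parallel for
  for (std::size_t i = 0; i < n; ++i)
    a[i] = b[i] + scalar * c[i];
}

void triad_device(double *a, const double *b, const double *c,
                  double scalar, std::size_t n) {
  #pragma omp target teams distribute parallel for \
      map(to: b[0:n], c[0:n]) map(from: a[0:n])
  for (std::size_t i = 0; i < n; ++i)
    a[i] = b[i] + scalar * c[i];
}
```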
Publication series
Name | Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
--- | ---
Volume | 9945 LNCS
ISSN (Print) | 0302-9743
ISSN (Electronic) | 1611-3349
Conference
Conference | ISC High Performance 2016 International Workshops: 2nd International Workshop on Communication Architectures at Extreme Scale (ExaComm 2016); Workshop on Exascale Multi/Many Core Computing Systems (E-MuCoCoS 2016); HPC I/O in the Data Center (HPC-IODC 2016); Application Performance on Intel Xeon Phi – Being Prepared for KNL and Beyond (IXPUG 2016); International Workshop on OpenPOWER for HPC (IWOPH 2016); International Workshop on Performance Portable Programming Models for Accelerators (P^3MA 2016); Workshop on Virtualization in High-Performance Cloud Computing (VHPC 2016); Workshop on Performance and Scalability of Storage Systems (WOPSSS 2016)
--- | ---
Country/Territory | Germany
City | Frankfurt
Period | 19/06/16 → 23/06/16
Equipment
- HPC (High Performance Computing) and HTC (High Throughput Computing) Facilities
  Alam, S. R. (Manager), Eccleston, P. E. (Other), Williams, D. A. G. (Manager) & Atack, S. H. (Other)
  Facility/equipment: Facility