Abstract
FPGAs are often used in scientific fields to execute graph algorithms due to their energy efficiency, reconfigurability, and fine-grained parallelism. However, these algorithms face challenges in memory access patterns, scalability, and programmability. The SYCL 2020 implementation in the Intel oneAPI toolchain supports FPGA targets and offers SYCL 2020 features, such as modern C++ with single-source offloading, to improve programmability. This study analysed the Breadth-First Search algorithm on a Stratix 10 FPGA with the Intel oneAPI toolchain. The implementation was done in two phases. In the first phase, we applied the typical optimisations proposed in the official guidelines alongside an automatic cache to achieve proper pipelining and improve the performance of random memory accesses. However, limitations occurred with fine-grained parallelism, and the design was competitive with only some of the related work that utilised hardware-description languages or established high-level synthesis tools. In the second phase, we added bit-level representations of data in memory, banking in on-chip memory, and fine-grained control over parallel data streams. The second implementation was generally superior to or on par with all compared designs, outperforming other works in 10 out of 15 tested datasets, including various synthetic RMAT and real-world datasets, with a peak performance of 1021 MTEPS.
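To make the idea of a bit-level representation concrete, the following is a minimal host-side sketch in plain C++, not the FPGA design described in the paper. It runs a level-synchronous Breadth-First Search in which the visited set and the frontiers are stored as one bit per vertex, and it computes MTEPS as millions of traversed edges per second. The names `CsrGraph`, `Bitmap`, `bfs_bitmap`, and `mteps` are hypothetical, the CSR layout is an assumption, and the edge-counting convention (every inspected edge counts once) may differ from the one used in the study.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical CSR graph layout (an assumption, not taken from the paper).
struct CsrGraph {
    std::vector<std::uint32_t> row_offsets;  // size = number of vertices + 1
    std::vector<std::uint32_t> col_indices;  // size = number of edges
    std::uint32_t num_vertices() const {
        return static_cast<std::uint32_t>(row_offsets.size() - 1);
    }
};

// One bit per vertex, packed into 64-bit words.
class Bitmap {
public:
    explicit Bitmap(std::size_t bits) : words_((bits + 63) / 64, 0) {}
    bool test(std::uint32_t i) const {
        return ((words_[i >> 6] >> (i & 63)) & 1u) != 0;
    }
    void set(std::uint32_t i) {
        words_[i >> 6] |= (std::uint64_t{1} << (i & 63));
    }
private:
    std::vector<std::uint64_t> words_;
};

struct BfsResult {
    std::vector<std::int32_t> level;  // -1 marks an unreachable vertex
    std::uint64_t traversed_edges;    // edges inspected during the search
};

// Level-synchronous BFS using bitmaps for the visited set and frontiers.
BfsResult bfs_bitmap(const CsrGraph& g, std::uint32_t source) {
    const std::uint32_t n = g.num_vertices();
    assert(source < n);

    BfsResult result{std::vector<std::int32_t>(n, -1), 0};
    Bitmap visited(n);
    Bitmap frontier(n);
    visited.set(source);
    frontier.set(source);
    result.level[source] = 0;

    std::int32_t depth = 0;
    bool frontier_nonempty = true;
    while (frontier_nonempty) {
        frontier_nonempty = false;
        ++depth;
        Bitmap next(n);
        for (std::uint32_t v = 0; v < n; ++v) {
            if (!frontier.test(v)) continue;
            for (std::uint32_t e = g.row_offsets[v]; e < g.row_offsets[v + 1]; ++e) {
                ++result.traversed_edges;
                const std::uint32_t u = g.col_indices[e];
                if (!visited.test(u)) {
                    visited.set(u);
                    next.set(u);
                    result.level[u] = depth;
                    frontier_nonempty = true;
                }
            }
        }
        frontier = std::move(next);
    }
    return result;
}

// Millions of traversed edges per second.
double mteps(std::uint64_t traversed_edges, double seconds) {
    return static_cast<double>(traversed_edges) / seconds / 1e6;
}
```

A caller would build a `CsrGraph`, time a call to `bfs_bitmap`, and pass the returned edge count and the elapsed seconds to `mteps`. Packing vertex state into bits reduces the memory footprint per vertex to a single bit, which is the general motivation for such representations; how the paper maps them onto on-chip memory banks is not shown here.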
Original language | English |
---|---|
Title of host publication | Proceedings of International Workshop on OpenCL and SYCL, IWOCL 2024 |
Publisher | Association for Computing Machinery (ACM) |
Pages | 1-11 |
Number of pages | 11 |
ISBN (Electronic) | 9798400717901 |
DOIs | |
Publication status | Published - 8 Apr 2024 |
Publication series
Name | ACM International Conference Proceeding Series |
---|---|
Bibliographical note
Publisher Copyright: © 2024 Owner/Author.