Abstract
The Barnes-Hut approximation for N-body simula tions reduces the time complexity of the naive all-pairs approach from O(N 2 ) to O(N log N) by hierarchically aggregating nearby particles into single entities using a tree data structure. This inherently irregular algorithm poses substantial challenges for performance portable implementations on multi-core CPUs and GPUs. We introduce two portable fully-parallel Barnes-Hut implementation strategies that trade-off different levels of GPU support for performance: an unbalanced concurrent octree, and a balanced bounding volume hierarchy sorted by a Hilbert space filling curve. We implemente these algorithms in portable ISO C++ using parallel algorithms and concurrency primitives like atomics. The results demonstrate competitive performance on a range of CPUs and GPUs. Additionally, they highlight the effectiveness of the par execution policy for highly concurrent irregular algorithms, outperforming par_unseq on CPUs and GPUs with Independent Forward Progress.
| Original language | English |
|---|---|
| Title of host publication | 14th Workshop on Irregular Applications |
| Subtitle of host publication | Architectures and Algorithms co-located with IEEE/ACM Supercomputing |
| Publisher | Institute of Electrical and Electronics Engineers (IEEE) |
| Pages | 708-717 |
| ISBN (Print) | 979-8-3503-5554-3 |
| DOIs | |
| Publication status | E-pub ahead of print - 26 Nov 2024 |
| Event | Workshop on Irregular Applications: Architecture and Algorithms - Atlanta, United States Duration: 17 Nov 2024 → 17 Nov 2024 https://hpc.pnl.gov/IA3/ |
Publication series
| Name | SC24-W: Workshops of the International Conference for High Performance Computing, Networking, Storage and Analysis |
|---|---|
| Publisher | IEEE |
Conference
| Conference | Workshop on Irregular Applications |
|---|---|
| Abbreviated title | IA3 |
| Country/Territory | United States |
| City | Atlanta |
| Period | 17/11/24 → 17/11/24 |
| Internet address |