Recent trends in computational architecture design are yielding processors with deep and complex memory hierarchies, consisting of small-capacity caches and large-capacity main memory. CPU parallelism is also hierarchical: SIMD vector units are contained within multiple computational cores, which are grouped into one or more packages in a multi-socket system. Solving the deterministic discrete ordinates transport equation efficiently on these architectures requires extracting concurrent work and mapping it effectively to the processing elements, so as to achieve performance close to the maximum attainable. This challenge becomes more acute when an unstructured spatial domain is required, where the sweep dependency between neighbouring spatial cells/elements is not implicit as it is for a structured grid. In this paper we introduce the transport community to the UnSNAP mini-app, a port of the well-known SNAP proxy application. UnSNAP was developed to investigate the performance of arbitrarily high-order discontinuous Galerkin finite element unstructured deterministic transport codes on advanced architectures. We evaluate approaches to local matrix assembly and solution to assess their performance for different element orders, and discuss the trade-offs with respect to the performance and memory capacity limits of advanced architectures. The performance-limiting factors will be explored on many-core architectures, including CPUs from Intel, AMD and Marvell (Arm). We will also discuss performing unstructured sweeps on GPU devices, highlighting the associated challenges.
|Title of host publication|Proceedings of The International Conference on Mathematics and Computational Methods applied to Nuclear Science and Engineering|
|Subtitle of host publication|M&C2019|
|Publisher|American Nuclear Society|
|Publication status|Published - 25 Aug 2019|