Sequential computation is well understood but does not scale well with current technology. Within the next decade, systems will contain large numbers of processors with potentially thousands of processors per chip. Despite this, many computational problems exhibit little or no parallelism and many existing formulations are sequential. Therefore, it is essential that highly parallel architectures can support sequential computation by emulating large memories with collections of smaller ones, thus supporting efficient execution of sequential programs or sequential algorithms included as part of parallel programs. This paper presents a novel tiled parallel architecture which can scale to thousands of processors per-chip and can deliver this ability. Provision of an interconnect with scalable low latency communications is essential for this and the realistic construction of such a system with a high-degree switch and a Clos-based network is presented. Experimental evaluation shows that sequential programs can be executed with only a factor of 2 to 3 slowdown when compared to a conventional sequential machine and that the area is roughly only a factor of two larger. This seems an acceptable price to pay for an architecture that can switch between executing highly parallel programs and sequential programs with large memory requirements.
|Publication status||Published - 3 Oct 2012|