Abstract
In the modern world, embedded computing is becoming increasingly important. Processors are being used in a range of devices such as mobile phones, digital cameras, and personal digital assistants. In these applications, processors must be small, lightweight, cheap to manufacture, and consume little power, yet still deliver the performance necessary to cope with high-bandwidth multimedia applications. Designing and building processors capable of achieving these conflicting goals is a considerable technical challenge. Parallelism is an important tool in building high-performance processors. General-purpose processors typically exploit instruction-level parallelism (ILP), based upon superscalar or VLIW technology. Superscalar processors are widely used in general-purpose applications, but are too large, complex and power-hungry to be suitable for embedded processors. Instead, VLIW processors are normally used in embedded processing, due to their relative simplicity and high performance. However, they suffer from problems such as architectural sensitivity and poor tolerance to latency. It is increasingly recognised that ILP is becoming ever harder to exploit. An alternative is to support thread-level parallelism (TLP). This is generally easier than exploiting ILP, since different threads have virtually no dependencies between them, making it easy to concurrently execute instructions taken from different threads. Multi-threaded processors are the main technology for supporting TLP. Unfortunately, multi-threaded processors suffer from poor single-thread performance. To combat this, some recent processors support both ILP and TLP, but this makes them difficult and expensive to build, and unsuitable for embedded use. In this thesis, we discuss a technique called Uniform Heterogeneous Multi-threading (UHM). This technique uses the concept of threads to represent all types of parallelism, from conventional TLP through to ILP.
A thread is a sequential list of instructions, possibly containing loops and branches. Threads can be of any length, from millions of instructions down to a single instruction. By allowing such fine-grained threads to be created, ILP can be exploited by executing these threads concurrently. Such fine-grained threads must synchronise and communicate within one or a few processor cycles, and this requires a combination of special processor hardware and compiler techniques. The compiler must be able to statically schedule individual instructions into a fixed number of threads, which are then executed by a specially enhanced multi-threaded processor. The processor allows threads to synchronise and communicate data within a single cycle. The UHM processor is substantially simpler than a superscalar processor, and would be suitable for use as an embedded processor. Furthermore, the design of the UHM processor is scalable, and can be easily
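To make the idea of statically scheduling instructions into a fixed number of threads more concrete, the following is a minimal illustrative sketch only, not the scheduling algorithm used in the thesis. It uses a simple greedy list scheduler, and everything in it is an assumption made for illustration: the function name `schedule`, the instruction names, unit instruction latency, and the premise that a value produced by one thread in a given cycle can be read by any other thread in the next cycle.

```python
from collections import defaultdict


def schedule(instrs, deps, num_threads):
    """Greedy list scheduling of instructions onto a fixed number of threads.

    Hypothetical helper for illustration; not taken from the thesis.
    instrs:      instruction names in program order (dependencies come first)
    deps:        maps an instruction name to the set of names it depends on
    num_threads: fixed number of hardware threads, one instruction each per cycle

    Assumes unit latency and single-cycle communication between threads:
    a result produced in cycle c is usable by any thread in cycle c + 1.
    Returns a dict mapping each instruction to a (cycle, thread) pair.
    """
    slots = defaultdict(list)  # cycle -> instructions already placed in it
    placement = {}
    for name in instrs:
        # Earliest cycle in which all operands are available.
        cycle = max((placement[d][0] + 1 for d in deps.get(name, ())), default=0)
        # Skip cycles in which every thread is already occupied.
        while len(slots[cycle]) >= num_threads:
            cycle += 1
        thread = len(slots[cycle])  # next free thread in this cycle
        slots[cycle].append(name)
        placement[name] = (cycle, thread)
    return placement


# Hypothetical five-instruction fragment: c = a + b; d = a * 2; e = c + d
example_instrs = ["a", "b", "c", "d", "e"]
example_deps = {"c": {"a", "b"}, "d": {"a"}, "e": {"c", "d"}}

two_threads = schedule(example_instrs, example_deps, 2)
one_thread = schedule(example_instrs, example_deps, 1)
print(two_threads)
print(one_thread)
```

In this sketch, the five-instruction fragment completes in three cycles on two threads and in five cycles on one thread, which illustrates how fine-grained threads with fast synchronisation can expose ILP. A thread with no instruction in a given cycle is simply treated as waiting.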
Translated title of the contribution | The Uniform Heterogeneous Multi-threaded Processor Architecture |
---|---|
Original language | English |
Publication status | Published - 2002 |