Abstract
Real workloads on large-scale, multi-user high performance computing systems can be complex in multiple dimensions: the applications run on the system may be diverse, numbering in the hundreds or even thousands, and the application mix may vary over time. Constructing and then maintaining representative benchmark suites for such systems can be extremely challenging, and is further complicated when the systems run codes of a confidential nature.
In this paper we present a new method which can automatically characterise any workload on a large-scale, multi-user, multi-application system. Our approach uses microarchitecture-level performance metrics, such as the number of branch mispredictions or cache misses. These low-level metrics can be gathered using standard tools on live systems running production codes, with very little performance overhead and with no change to any of the codes being analysed. In the first step, our method uses these metrics to construct a statistical model of the real workload. In the second step, a set of ‘analogue’ benchmarks is also profiled using the same microarchitecture performance metrics. In the final step, an automated process constructs a benchmark workload from the set of simple analogue benchmarks. This ‘analogue workload’ closely approximates the real workload in terms of its statistical behaviour on the hardware and can be used for subsequent relative performance benchmarking.
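The final step can be pictured as a small optimisation problem: given the counter profiles of the analogue benchmarks, find a non-negative weighting of them whose combined profile best matches the measured workload. The sketch below illustrates this with a non-negative least-squares fit; the abstract does not specify the statistical model used, so the NNLS formulation and all metric values and benchmark names here are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch of the workload-construction step, assuming a non-negative
# least-squares (NNLS) fit; the metrics, benchmark names and all numbers
# below are hypothetical, chosen only to make the example runnable.
import numpy as np
from scipy.optimize import nnls

# Microarchitecture metric rates per analogue benchmark
# (rows: metrics, columns: benchmarks A, B, C).
metrics = ["branch-misses/1k insn", "L1d-misses/1k insn", "LLC-misses/1k insn"]
analogue_profiles = np.array([
    [4.1,  0.3,  9.8],   # branch mispredictions
    [12.7, 30.2, 2.1],   # L1 data-cache misses
    [0.9,  6.4,  0.2],   # last-level cache misses
])

# The same metrics measured on the live production workload.
workload_profile = np.array([5.0, 14.9, 1.6])

# Find benchmark weights w >= 0 minimising ||A @ w - b||_2, i.e. the
# analogue mix whose counter profile best matches the real workload.
weights, residual = nnls(analogue_profiles, workload_profile)

for name, w in zip("ABC", weights):
    print(f"benchmark {name}: weight {w:.3f}")
approx = analogue_profiles @ weights
for m, target, got in zip(metrics, workload_profile, approx):
    print(f"{m}: measured {target:.2f}, analogue mix {got:.2f}")
```

On a Linux system, the per-metric counter values themselves could be collected with a standard tool such as `perf stat -e branch-misses,cache-misses -a`, consistent with the abstract's claim that profiling requires no changes to the codes under analysis.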
| Original language | English |
| --- | --- |
| Title of host publication | ACM SIGMETRICS Performance Evaluation Review |
| Publisher | Association for Computing Machinery (ACM) |
| Pages | 17-18 |
| Number of pages | 2 |
| ISBN (Print) | 9781450311021 |
| DOIs | |
| Publication status | Published - Nov 2011 |
| Event | Performance Modeling, Benchmarking and Simulation of High Performance Computer Systems (PMBS11) - Seattle, United States. Duration: 12 Nov 2011 → 18 Nov 2011 |
Conference
| Conference | Performance Modeling, Benchmarking and Simulation of High Performance Computer Systems (PMBS11) |
| --- | --- |
| Country/Territory | United States |
| City | Seattle |
| Period | 12/11/11 → 18/11/11 |