Automatic performance tuning is becoming an increasingly valuable tool for achieving performance portability across a diverse range of processor architectures. Much of the existing work on auto-tuning techniques focuses solely on achieving the best possible performance, paying little attention to the time required by the tuning process itself. As developers target progressively larger sets of platforms, the tuning time needed to meet performance goals on each platform will be a crucial factor in the success of any auto-tuning technique. In this work, we describe a hybrid approach to auto-tuning that combines empirical sampling with a predictive performance model, with the goal of reducing the time needed to converge on the optimal (or near-optimal) configuration. Our approach is shown to provide a three-fold reduction in the tuning time required to reach performance within 10% of the global optimum.
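The hybrid strategy described above can be sketched in a few lines. The sketch below is illustrative only: the synthetic kernel, the single integer tuning parameter (a hypothetical tile size), and the quadratic performance model are stand-ins, not the implementation evaluated in this work. The idea is the general one the abstract names: empirically time a small sample of configurations, fit a cheap predictive model to those observations, and then spend the remaining tuning budget only on the configurations the model predicts to be fastest.

```python
import random

# Hypothetical stand-in for a timed benchmark run: runtime as a function of
# one integer tuning parameter (e.g., a tile size). Synthetic optimum at 37.
def measure_runtime(tile_size):
    return (tile_size - 37) ** 2 / 100.0 + 5.0

CANDIDATES = list(range(1, 129))  # assumed search space for the parameter

def hybrid_tune(n_samples=8, n_refine=4, seed=0):
    """Hybrid tuning sketch: sample a few configurations empirically,
    fit a quadratic performance model, then empirically verify only
    the configurations the model predicts to be fastest."""
    rng = random.Random(seed)
    sampled = rng.sample(CANDIDATES, n_samples)
    observed = {x: measure_runtime(x) for x in sampled}

    # Least-squares quadratic fit t(x) ~ c0 + c1*x + c2*x^2 via the
    # 3x3 normal equations, solved by Gauss-Jordan elimination.
    xs = list(observed)
    ys = [observed[x] for x in xs]
    A = [[sum(x ** (i + j) for x in xs) for j in range(3)] for i in range(3)]
    b = [sum(y * x ** i for x, y in zip(xs, ys)) for i in range(3)]
    for col in range(3):
        piv = max(range(col, 3), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        b[col], b[piv] = b[piv], b[col]
        for r in range(3):
            if r != col:
                f = A[r][col] / A[col][col]
                A[r] = [a - f * p for a, p in zip(A[r], A[col])]
                b[r] -= f * b[col]
    c0, c1, c2 = (b[i] / A[i][i] for i in range(3))
    predict = lambda x: c0 + c1 * x + c2 * x * x

    # Spend the remaining budget on the model's top predictions only.
    ranked = sorted(set(CANDIDATES) - set(sampled), key=predict)
    for x in ranked[:n_refine]:
        observed[x] = measure_runtime(x)

    best = min(observed, key=observed.get)
    return best, observed[best], len(observed)

best, runtime, evals = hybrid_tune()
print(best, runtime, evals)
```

With this budget the tuner performs only 12 empirical measurements out of 128 candidate configurations; the model steers the final measurements toward the predicted optimum rather than continuing to sample blindly, which is the source of the tuning-time reduction the abstract claims.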