Trend locally stationary wavelet processes

Most time series observed in practice exhibit first‐ as well as second‐order non‐stationarity. In this article we propose a novel framework for modelling series with simultaneous time‐varying first‐ and second‐order structure, removing the restrictive zero‐mean assumption of locally stationary wavelet processes and extending the applicability of the locally stationary wavelet model to include trend components. We develop an associated estimation theory for both first‐ and second‐order time series quantities and show that our estimators achieve good properties in isolation of each other by making appropriate assumptions on the series trend. We demonstrate the utility of the method by analysing the global mean sea temperature time series, highlighting the impact of the changing climate.


INTRODUCTION
In many contexts, it is common to encounter time series for which the mean and autocovariance of the series vary over time. Examples of time series that exhibit such non-stationary first-and second-order behaviour span a broad range of applications, including financial time series (Fryzlewicz, 2005), climate data (Beaulieu and Killick, 2018), and meteorology (Shen, 2015). Consequently, methodology that can incorporate both first-and second-order non-stationarity within the same modelling framework is appealing in numerous practical scenarios. Often, however, analysis is only performed on either first-or second-order properties, rather than both. In this article, we propose a two-stage wavelet-based framework to estimate the trend -the smooth, long-term behaviour -and the time-varying second-order structure of a time series.
Due to their many advantageous properties, wavelet methods have enjoyed popularity for many years in time series analysis. In particular, wavelets' localised nature can provide sparse location-scale signal decompositions; for a recent overview of wavelet techniques, see for example Vidakovic (2009) or Nason (2008). Recently, there has been much focus in the area of locally stationary wavelet (LSW) modelling, introduced in the seminal work of Nason et al. (2000). The LSW framework provides a modelling approach for time series whose second-order structure evolves slowly over time. The model has been successfully applied for a number of tasks in time series analysis, including forecasting , changepoint analysis (Killick et al., 2013), stationarity testing (Nason, 2013) and clustering/classification (Hargreaves et al., 2018;Wilson et al., 2019). This particular approach to modelling has shown to provide improved analytic insights in areas such as medicine (Sanderson et al., 2010;Park et al., 2014), biology (Hargreaves et al., 2019) and environmental science (Chapman et al., 2020). A comprehensive overview of second-order non-stationary time series modelling can be found in Dahlhaus (2012). where S(z) ∶= {S j (z)} j=−1, … ,−J . The raw wavelet periodogram is first smoothed and then corrected by A −1 J to produce an asymptotically unbiased, consistent estimator. Smoothing can be carried out using a number of techniques, for example, via a running mean as in Nason (2013) or using wavelet thresholding as in Nason et al. (2000).
The local autocovariance (LACV) function for an LSW process provides information about the covariance at a rescaled location z = k∕T ∈ (0, 1). The LACV, c(z, ), of an LSW process with EWS {S j (z)} is defined as c(z, ) = ∑ −1 j=−∞ S j (z)Ψ j ( ), ∈ Z, z ∈ (0, 1). The LACV can be thought of as a decomposition of the autocovariance of a process over scale and rescaled time. The process autocovariance arises as the asymptotic limit of the LACV. In practice, the LACV is estimated by plugging in the smoothed, corrected estimate for the EWS into the equation for the LACV, which results in a consistent estimator.

Modelling Locally Stationary Time Series with Trend
We introduce our proposed model, which relaxes the zero mean assumption inherited from Assumption 1 in Definition 2.1 above. We will show that it is possible to incorporate a non-stationary polynomial trend term within the context of a second-order non-stationary time series. Furthermore, we will show that this polynomial assumption can be relaxed, facilitating modelling of time series with more complex first-order structure within our paradigm. This allows for self-contained modelling of both first-and second-order non-stationarity of a time series. Hence, it is not necessary to remove the trend before estimating second-order structure, and vice versa, we can obtain an estimate of first-order structure in the presence of the non-stationary second-order behaviour. To this end, we define the polynomial trend locally stationary wavelet process as follows.
Definition 2.2. A polynomial trend LSW process {X t,T }, t = 0, … , T − 1, and T = 2 J is a doubly-indexed stochastic process with the following representation in the mean square sense: where the quantities in representation (2.5) possess the same properties as in Definition 2.1, and in addition the function is a polynomial of degree p, that is, Our model imposes the same assumptions on the LSW component as in Nason et al. (2000), allowing for locally stationary second-order structure, while also permitting non-stationary first-order behaviour by incorporating a polynomial mean function. Using the model defined in Equation (2.5), we build on the rigorous theory of Nason et al. (2000) to estimate the non-stationary first-and second-order structure of the time series. Note that the assumption that T = 2 J is a theoretical one; in practice, using the boundary handling method described in Section 4.4, we can analyse time series of any length.
3. SPECTRAL ESTIMATION THEORY 899 time series does not display a trend. We extend this work to allow for the inclusion of a trend by examining three separate scenarios: first, we examine the case where the trend is a polynomial of degree less than the order of the discrete wavelet used in the LSW process (2.5); second, the case where the trend is a higher degree than the discrete wavelet; and lastly, the case of a general Hölder continuous trend.
Using discrete wavelets to model the second-order non-stationarity allows the incorporation of smooth trends due to the ability of the wavelet to 'cancel out' this trend. This means we circumvent the problem of detrending the series first, which is challenging due to the second-order non-stationarity. In all three scenarios, we provide appropriate theoretical results to enable consistent estimation of both the EWS and local autocovariance of the time series. Given that trends are usually assumed to be slowly-varying, smooth functions, such as low-order polynomials (see, e.g. Priestley, 1983;Craigmile et al., 2005;Kallache et al., 2005), our model in (2.5) is applicable to a wide range of data scenarios. In practice, this can serve as a good approximation even when we depart from the given assumptions.

Spectral Estimation with Low-Order Polynomial Trends
First, we examine the case of a low-order polynomial trend. We show that we can use the same estimation procedure as in Nason et al. (2000) to obtain an asymptotically unbiased, consistent estimator of the EWS. This can then be used to obtain a consistent estimator of the LACV.
Recall that a function ∈ L 2 (R) is said to have m vanishing moments if it satisfies ⟨x l , (x)⟩ = ∫ x l (x)dx = 0, for l = 0, 1, … , m − 1. Denote by v( ) the number of vanishing moments of the discrete wavelet . If v( ) = m, then all wavelet coefficients of any polynomial of degree m − 1 or less will be zero. The commonly used extremal phase (EP) and least asymmetric (LA) Daubechies wavelet families have k vanishing moments for a filter of length 2k. For example, the Haar wavelet has one vanishing moment and can only annihilate constant behaviour, while the Daubechies EP wavelet with 10 vanishing moments can annihilate polynomial behaviour up to degree 9.
We henceforth refer to the discrete wavelet defined in the LSW process representation in Equation (2.5) as the generating wavelet. Since second-order estimation is performed using the generating wavelet coefficients, estimation will be unaffected by the trend, provided that p < v( ). Hence, we can obtain an unbiased estimate of the spectrum in the usual way.
Lemma 3.1. Suppose that p < v( ). Then, the expectation and variance of the non-boundary raw wavelet periodogram is given by Equations (2.2) and (2.3) respectively.
We can correct the raw wavelet periodogram estimator in the usual way with A −1 J as in Equation (2.4) to obtain an asymptotically unbiased estimator, just as in the original LSW model. Hence, as long as the trend of the time series can be represented by a low-order polynomial and we use a high enough order discrete wavelet, we can still apply the standard EWS estimation procedure without incurring any additional bias.

Remark 1.
In practice, only the non-boundary wavelet coefficients will be free from trend, while the boundary wavelet coefficients will still contain the trend. This is due to the way in which the non-decimated discrete wavelet transform (DWT) is computed at the boundary values of the time series, see, for example, section 4.11 of Percival and Walden (2006). In Section 4.4, we address this issue by modifying standard boundary handling methods for estimation at the boundaries of the time series.
As in the original LSW model, resulting spectral estimates are inconsistent and must be smoothed. To achieve consistency, we can simply follow the smoothing strategy taken in Nason et al. (2000): for each fixed scale j, the periodogram I j k,T (which is scaled 2 -distributed) is smoothed as a function of z = k∕T using, for example, DWT shrinkage or translation invariant (TI) denoising of Coifman and Donoho (1995). The following two results, which hold for Gaussian LSW processes (and in turn polynomial trend Gaussian LSW processes), describe how consistency is achieved through DWT shrinkage.
Work in von Sachs et al. (1997) explains how to smooth using an orthonormal second stage wavelet basis of L 2 ([0, 1]), denoted {̃r s }. Smoothing is achieved using nonlinear thresholding of the empirical wavelet coefficients, 900 MCGONIGLE, KILLICK, AND NUNES û rs , of I j (z) and then inverting using A −1 to give the estimateŜ j (z). This leads to an extension of theorem 4 and proposition 5 of Nason et al. (2000) -which are only applicable for Haar and Shannon wavelet zero-mean LSW processes -to trend LSW processes generated by any discrete wavelet in the Daubechies compactly supported family.
Proposition 3.2. Let̃be a discrete wavelet of bounded variation, with 2 r = o(T) for wavelet coefficients û rs . Suppose that S j (k∕T) ≤ D2 5j∕6 for some constant D. For a polynomial trend Gaussian LSW process generated by any Daubechies compactly supported discrete wavelet, with threshold given by 2 (j, r, s, T) = Var(û rs )log 2 (T), for each fixed j, Lastly, we can consistently estimate the LACV by using the EWS estimate.
Proposition 3.3. Suppose the assumptions of Proposition 3.2 hold, and defineĉ(z, ) by replacing S j (z) withŜ j (z) in the equation for the LACV, that is,ĉ Then, for each fixed ∈ Z,ĉ(z, ) satisfies Propositions 3.2 and 3.3 show that in the case of a low-order polynomial trend, we can still consistently estimate the evolutionary wavelet spectrum and local autocovariance on the non-boundary values of the wavelet periodogram. The assumption on the decay rate of the EWS, S j ≤ D2 5j∕6 , is necessary for the proof of Proposition 3.2, and hence Proposition 3.3 as well. This assumption is, for example, satisfied for the white noise process, where S j = 2 j . The assumption is required to control the mean squared error in estimating the EWS using the J available scales, as we do not have access to all (infinite) scales in practice; it is chosen to balance the mean squared error rate (up to the log factor) of (log 2 (T)T −2∕3 ) that arises from the wavelet thresholding procedure. This is a slightly weaker assumption than that of Fryzlewicz and Nason (2006) and Sanderson et al. (2010), who assume that S j ≤ D2 j . Note that no explicit assumption on S j is made in the original work of Nason et al. (2000). However, the modification to the original LSW model made in Fryzlewicz (2003), the accepted definition in the LSW literature, would necessitate the same assumption that we have here.
The extension of consistency results to all Daubechies compactly supported discrete wavelets is in itself an important contribution to the area of LSW analysis, and is due to a new result on properties of the autocorrelation wavelet inner product matrix A, given in Proposition 3.2 in the Supporting Information. This proof can be used to extend much of the current theory of LSW processes -which previously was only valid for Haar and Shannon wavelets -to all compactly supported Daubechies wavelets. This includes, for example, the work on bivariate LSW processes of Sanderson et al. (2010), and the extension of the LSW model to spectra of bounded total variation of Van Bellegem and von Sachs (2008).

Spectral Estimation with High-Order Polynomial Trends
Next, we consider the case of higher-order polynomial trends with respect to the generating wavelet, which allows for the modelling of less smooth trends. The key idea is to use a smoother discrete wavelet to analyse the series, which ensures that the trend is removed from the wavelet coefficients. By calculating the bias that this causes, we can correct the raw wavelet periodogram analogously to Equation (2.4) to obtain an unbiased estimate.
If the polynomial trend of the time series is a higher degree than the discrete wavelet generating the LSW process, then the standard non-decimated discrete wavelet transform will not zero out the trend, causing bias in our estimation. For example, if the Haar wavelet is the generating wavelet, and the time series exhibits a linear trend, then additional bias will be incurred in the computation of the raw wavelet periodogram, because the Haar wavelet is only capable of annihilating constants. If we instead use the Daubechies EP wavelet with two vanishing moments, the linear trend would be removed by the discrete wavelet. Then, the bias of the raw wavelet periodogram will be quantified in terms of the mismatch between the generating wavelet and analysing wavelet. The next result shows we can correct for this bias, and extends the work of Gott and Eckley (2013), which examined the effect of wavelet misspecification on EWS estimation.
Lemma 3.4. Suppose that the polynomial LSW process is generated by discrete wavelet 0 , and let 1 be a Daubechies compactly supported discrete wavelet with p < v( 1 ). Define the non-decimated discrete wavelet coefficients to be d j,k = ∑ t X t 1 j,k−t . Let the matrix C 1,0 be the inner product matrix of autocorrelation wavelets whose j, lth entry is given by where the superscript denotes the fact that the underlying discrete wavelet is different. Then, Provided that C 1,0 J is invertible, Lemma 3.4 implies that, for the vector of periodograms I(z) ∶= Hence, we do not have to restrict to assuming the trend has a lower order than the discrete wavelet that generates the LSW process. In this scenario, we can perform the following altered estimation of the EWS. First, we apply the non-decimated discrete wavelet transform using a higher order wavelet, removing the trend. Next, we form the periodogram and perform smoothing. Finally, we correct the estimate by premultiplying the appropriate inverse matrix (C 1,0 ) −1 . Correcting in this fashion can be beneficial even when the time series contains no trend. Smoother wavelets have a shorter support in the Fourier domain and therefore experience less spectral leakage, in which power can leak into nearby scales. For example, a Haar wavelet estimate will experience more leakage than using a smoother discrete wavelet like the EP10 wavelet. Therefore, the off-diagonal entries of the Haar A matrix exhibit slower decay than that of the EP10 A matrix. Hence, the off-diagonal entries of the corresponding C matrix will be smaller than those of the less smooth discrete wavelet.
If the LSW process is generated by a Haar wavelet and we use a Haar wavelet to analyse the series, then the spectral estimate may be prone to leakage and the accuracy of the estimate will suffer as a result. However, if the process is generated by an EP10 wavelet and we use the EP10 wavelet to analyse, the estimate will exhibit less leakage. Therefore, it can be beneficial, in the case where the generating wavelet is not very smooth, to use a smoother wavelet to analyse, and then correct the estimate with the corresponding C-inverse matrix, rather than using the generating wavelet to analyse and the usual A-inverse matrix. Using the smoother wavelet to analyse helps to reduce leakage, while still achieving an unbiased estimate by correcting in the appropriate way. This leads to a more accurate spectral estimate: in the Supporting Information, we provide numerical experiments to verify this observation.
Intuitively, there is strong evidence to suggest that the matrix C 1,0 is invertible. By Proposition 3.2 in the Supporting Information, we show that the inverse of the operator A is bounded for all Daubechies compactly supported discrete wavelets. The A-matrix is the Gram matrix of the set {Ψ j ( )} j≤−1 of linearly independent autocorrelation wavelets. In our case, the matrix C 1,0 is the cross-Gram matrix of two sets of autocorrelation wavelets {Ψ 0 j } j≤−1 and {Ψ 1 j } j≤−1 . Unfortunately, the properties of Gram matrices do not necessarily carry over to cross-Gram matrices. This means we cannot directly adapt the proof of the invertibility of A to the invertibility of C. Intuitively, forming the cross-Gram matrix uses a mix of two discrete wavelet families, and we might expect that the properties of the cross-Gram matrix are a 'mixture' or 'average' of the properties of the two Gram matrices associated to the two wavelet families.
For example, when J = 10, the numerically calculated condition number of the Haar A matrix is 860.47, while it is equal to 393.12 for the Daubechies extremal phase discrete wavelet with 10 vanishing moments. For the two C matrices associated with these two families, the condition numbers are 596.97 and 610.97. These numbers are close to the average of the two A matrix condition numbers. This suggests that the C matrices should also be invertible, with bounded inverse.

Spectral Estimation with General Smooth Trends
A polynomial trend may not be an appropriate assumption for a given time series. We investigate how we can modify the spectral estimation procedure when we place a Hölder continuous smoothness assumption on the trend. Often, wavelet methods for curve estimation (distinct from trend estimation where more general mean functions are considered) assume that the mean function lies in some Besov space (see e.g. von Sachs and MacGibbon, 2000;Neumann and von Sachs, 1995), of which Hölder continuous functions are a special case.
Intuitively, the discrete wavelet coefficients are calculated using difference operations, and so if the trend function is sufficiently smooth, we expect that these discrete wavelet coefficients will be reasonably small. When using these coefficients to estimate the EWS -in some of the finer scales at least -where we use less of the data to calculate the wavelet coefficients, and where local changes are small, we could expect the estimate to remain reasonably accurate.
Formally, consider the class of Hölder continuous functions with exponent , where 0 < ≤ 1, in place of the polynomial trend assumption in Definition 2.2. For example, (z) = z is Hölder continuous with exponent , and an exponent of = 1 corresponds to Lipschitz continuity. Again, this seems a reasonable assumption to make, given that trend functions are generally assumed to be smooth and slowly evolving -for example, sinusoids are Lipschitz continuous. As shown in Lemma 3.5, we accumulate bias in the raw wavelet periodogram proportional to the level of the discrete wavelet transform and the smoothness of the trend, which is governed by .
Lemma 3.5. For any Hölder continuous trend LSW process with exponent , the expectation of the raw wavelet periodogram is given by Thus we introduce extra bias due to the trend, which we must account for in our estimation procedure, that is, when smoothing and correcting. At coarser scales, the bias caused by the trend accumulates in the squared wavelet coefficients. Analogous behaviour occurs when performing Fourier-based spectral analysis. This is characterised by the phenomenon of a peak in the Fourier periodogram near the zero frequency, as discussed in Dalla et al. (2020). To avoid the accumulation of bias, one could apply differencing to the time series first to remove the trend. However, differencing can vastly alter the spectral structure of the time series. A standard LSW analysis performed on the differenced series may not be appropriate, and the differencing operation should be taken 903 into account in the estimation procedure. A thorough investigation of a difference-based approach can be found in McGonigle et al. (2021).
To smooth the wavelet periodogram, we propose to use a simple running mean, as in Nason (2013) and Sanderson et al. (2010), with bin width size 2n + 1. The smoothed wavelet periodogram is thus defined as (3.4) For simplicity, we use a running mean, but one could just as easily use wavelet thresholding, and achieve similarly consistent estimation. We must now correct the smoothed wavelet periodogram using the inverse matrix A −1 . When we do this, we alter the standard estimation procedure by only correcting across a certain number of scales -in relation to T -to control for the bias caused by the trend and the smoothing step. One can view this as akin to a tapered estimator, which we utilise for both EWS and LACV estimation. Based on the above observations, we can now state our results related to the consistency of the smoothed estimators of the EWS and local autocovariance in the presence of a Hölder continuous trend.
Theorem 3.6. Suppose that S j (k∕T) ≤ D2 j for some constant D, and is Hölder continuous with exponent , and let J 0 = log 2 (T) for ∈ (0, 1). Then, uniformly in k, the EWS estimatorŜ j (k∕T), defined bŷ is mean square consistent for each fixed scale j, provided that n −1 T → 0, nT −1 → 0 and T 4 ( −1)+ → 0 as T → ∞ and n → ∞. The mean squared error ofŜ j (k∕T) is given by Theorem 3.7. Suppose the assumptions of Theorem 3.6 hold. LetŜ j (k∕T) be the estimator of the EWS defined in Equation (3.5). Then, for each fixed ∈ Z, the estimator of the local autocovariance c(k∕T, ), defined bŷ is mean square consistent, provided that n −1 T → 0, nT −1 → 0 and T 4 ( −1)+ → 0 as T → ∞ and n → ∞. The mean squared error ofĉ(k∕T, ) is given by Note that these results hold for all Daubechies compactly supported discrete wavelets, just as in Propositions 3.2 and 3.3. In the same manner as Propositions 3.2 and 3.3, we require an assumption on the decay rate of the EWS, which is necessary to ensure consistent estimation. For this weaker assumption of general smooth trends, the running mean smoother requires the slightly stronger assumption that S j ≤ D2 j , as in Sanderson et al. (2010). The third error term, corresponding to the error caused by the trend, will be asymptotically dominated by the first term, irrespective of the bin width n, for = 1 and < 3∕4. Therefore, the trend term can be seen to have minimal impact on the spectral estimation, provided it is smooth enough. In the presence of a smooth trend, we can still recover a consistent estimator of the EWS and LACV. We simply alter the standard estimation procedure by only using the finest J 0 scales. Further, one could just as easily use a different analysing wavelet to the generating wavelet as described in Section 3.2, and use the appropriate (C 1,0 ) −1 matrix for bias correction.
Choice of contributing scales and bin width. When using the tapered estimator, we must choose a value of , the proportion of scales over which we correct. Higher values of ensure a decomposition over a larger number of scales. In practice, not all scales will be informative, and the number of informative scales will depend on both the data and choice of discrete wavelet, as noted in Sanderson et al. (2010). In our experiments, we have observed no significant bias incurred by using most scales, and propose using the proportion = 2∕3. This is motivated by results in the next section on trend estimation and is in alignment with Sanderson et al. (2010), who also utilise running mean smoothing for LSW spectral estimation. We recommend the choice = 2∕3 as a balance between boundary effects, bias in coarse scales occurring due to trend, and capturing the coarse-scale behaviour of the time series. If the practitioner is interested in very coarse-scale behaviour, then provided the trend is not too pronounced, a higher value of can be used along with a wavelet with shorter filter length. Different values of can also be compared visually to assess the fit.
The choice of the bin width parameter n in Equation (3.4) will also affect the quality of the estimate. We can choose to use level-dependent smoothing of the periodogram, and so a larger bin width is used in coarser scales. Choosing a bin width in this way will still ensure consistent estimation, and further is natural as coarser scales will experience stronger autocorrelation.
The results in this section establish that we can still consistently estimate the non-stationary second-order structure in the presence of a trend, provided that the trend function is smooth. In practice, using a smoother discrete wavelet to analyse the series is recommended, as it is able to better remove the trend of the series, while also helping to reduce leakage. Smoothing of the wavelet periodogram can be carried out using wavelet thresholding or a running mean smoother. We advocate the latter due to its simplicity and because only one parameter needs to be specified.
Finally, we note that many existing methods that utilise the LSW framework immediately extend to our new model. For example, the extension of the LSW model to spectra of bounded total variation in Van Bellegem and von Sachs (2008) is valid within our model. Furthermore, the test of second-order stationarity of Nason (2013) will still be applicable, provided the discrete wavelet used is smooth enough to remove the trend of the time series. Similarly, locally stationary wavelet models for multi-variate time series, such as Sanderson et al. (2010) and Park et al. (2014), will be valid for any trend scenario described above.

TREND ESTIMATION
Having described the necessary methodology for estimating the non-stationary second-order structure of the time series, we now address the question of estimating the trend of the series. If we assume that the trend function is smooth, a natural method for estimating it is to use wavelets, given that polynomial functions can be exactly represented using wavelet scaling functions (Vidakovic, 2009). Using a discrete wavelet-based approach enables us to obtain a simple intuitive estimator from which we can derive appropriate confidence intervals.

Trend Estimation Theory
We follow the same strategy as Craigmile et al. (2004) and Craigmile et al. (2005), who apply wavelet thresholding in the long memory time series setting. Our problem can be posed as the following. The observed data vector X = (X 0 , … , X T−1 ) T is given by where is the trend component and is the LSW process term. If we use a discrete wavelet that has a high enough order with respect to the trend, the discrete wavelet transform of the time series will decompose it into a trend and noise component. This is because the non-boundary wavelet coefficients of a polynomial are zero due to the vanishing moments property, and so since the increments in an LSW process are zero mean, these coefficients have expectation zero. The boundary wavelet coefficients and the scaling coefficients will contain a contribution due to the trend alone. Motivated by this observation, we can estimate the trend using a type of wavelet thresholding estimator. We perform the discrete wavelet transform on the time series, choosing the coarsest scale j 0 to which we analyse the series. Then, we set the non-boundary wavelet coefficients to zero. Then, we perform the inverse discrete wavelet transform to obtain the estimator of the trend. This estimator captures the long-term behaviour of the time series (trend), but will also contain short-scale variability at the boundaries due to the inclusion of boundary wavelet coefficients.
Formally, let us write the discrete wavelet transform of the time series in matrix form, from which we can derive the matrix form of the estimator. Given the time series is of length T = 2 J , the total number of (decimated) wavelet coefficients at each level j is given by N j = T2 j . Then, the B j = ⌈(2m − 2)(1 − 2 j )⌉ boundary coefficients are given by the outer coefficients, where m is the number of vanishing moments of the wavelet. If p is the order of the polynomial trend, then we can zero out this trend so long as p < m. The number of non-boundary coefficients is denoted by M j = N j − B j . Denote by diag(x) the diagonal matrix with x along the main diagonal, and let 0 n and 1 n denote a vector of n zeros or ones respectively. Let A = diag(1 B 1 , 0 M 1 , 1 B 2 , 0 M 2 , … , 1 B J , 0 M J , 1 N J ), and let I T denote the T × T identity matrix. Using Daubechies wavelets ensures the DWT is orthogonal, and so we can partition the data vector x into a trend component and noise component, as Analogously to Proposition 1 of Craigmile et al. (2004), the wavelet-based trend estimator is unbiased, and further we can show that the estimator is mean square consistent, in the case of both polynomial and Hölder continuous trend functions.
Proposition 4.1. For a polynomial trend of degree p, and using a discrete wavelet with m > p vanishing moments, E(̂) = .
Proposition 4.2. Let j 0 = log 2 T, for some ∈ (0, 1), be the coarsest scale of the decomposition using discrete wavelet . Suppose that p < v( ), and assume that S j (z) ≤ D2 j for all z ∈ (0, 1) and some constant D. Ignoring boundary effects, the trend estimator is consistent in the mean square sense, that is:

Proposition 4.3.
Suppose is Hölder continuous with exponent . Then, under the same conditions on and S j as Proposition 4.2, the following holds: Choice of the parameter can be viewed as akin to selecting the bandwidth in a kernel-based estimation procedure. Larger values of loosely correspond to a larger bandwidth, and ensure more wavelet coefficients are thresholded, giving a smoother fit. For a Lipschitz continuous trend, balancing the rates in Proposition 4.3 leads to taking = 2∕3. We use this for both trend and second-order estimation in practice for simplicity. Under the conditions of Proposition 4.2 and subject to some mild regularity conditions, the trend estimate is multi-variate Gaussian, with mean and covariance RCov( )R T ∶= RΣR T , where R = W T AW, following from asymptotic normality of the wavelet coefficients (Stevens, 2013, Theorem3.1.1). Full discussion of the appropriate assumptions can be found in Stevens (2013). LSW processes with random innovations from the exponential, gamma, inverse-Gaussian, and F-distribution families all satisfy these assumptions.

MCGONIGLE, KILLICK, AND NUNES trend estimate is given bŷ(
In practice we do not know the true variance. Instead, we have access to a consistent estimate of it on the non-boundary values of the trend estimate, because we have a consistent estimate of the local autocovariance of the time series. Thus, we can construct confidence intervals for the trend using this estimate in a rather straightforward manner. This is in contrast to nonlinear wavelet thresholding techniques, for which construction of confidence intervals is far from straightforward. In practice, we may need to regularise the covariance matrix to ensure positive definiteness: this can be achieved straightforwardly using the method of Rothman (2012), for example.

Simultaneous Confidence Intervals
We can also compute simultaneous 100(1 − a)% confidence intervals for the trend estimate. Denote the vector of standard errors of the trend estimate by v, that is, Recall that the critical value is calculated as the value that satisfies using the symmetry and continuity properties of the multi-variate normal distribution. Therefore, is calculated to satisfy Assuming that the true spectral structure is known (in practice we have an asymptotically unbiased, consistent estimate), then −̂∼  (0, RΣR T ).
Hence we calculate as the value such that the probability of this multi-variate normal random variable exceeding v is equal to a∕2. Thus, within our framework, we can straightforwardly calculate an estimate and confidence intervals for the trend. In turn this could be used for hypothesis testing of the presence of trend. Although we have assumed a low-order polynomial for the trend, the simulation results demonstrate that even if we depart from this assumption, the trend estimator is still able to perform well. This is due to the fact that a polynomial can serve as an accurate approximation to a more complicated trend function, and the ability of smooth wavelets to zero out the trend, as explained in Section 3.3.

Boundary Handling
Using the results from Section 3, we can obtain an unbiased, consistent spectral estimator for the non-boundary coefficients when the time series exhibits a smooth trend. However, this will not be true for the boundary coefficients as they will still contain the trend of the time series. This behaviour is not bias but simply a consequence of the way the boundary coefficients are handled when performing the non-decimated wavelet transform. The two most common methods for boundary handling are to reflect the data at the boundaries or to periodise it -see section  Nason (2008) or section 4.11 of Percival and Walden (2006) for more discussion. These two methods can be used effectively in a zero mean time series, but are less suitable when the time series displays a trend. We can adapt the reflective boundary handling to produce better estimates of the EWS in the coarser scales. We use reflection of the time series as this ensures that the spectral properties of the time series in the reflected part evolve slowly as we move from the end of the time series past into the reflected, extended part. This is more appealing than periodising the time series, as spectral properties at one boundary of the series may not be as similar to those at the other boundary, due to the non-stationarity of the time series. Starting with a time series of length T, we form a time series of length 4T, whereby the original data are centred within the extended time series. That is, we add 3T∕2 data points to the beginning, and end, of the original series.
We note that in most applications, the standard practice of boundary handling involves repeating the series half of the length in each direction, giving a series of length 2T. However, we have found in our empirical analysis that using a series of length 4T offers stronger practical performance. One of the reasons for this is that in the coarser scales of the wavelet transform, the wavelet filter is often already longer than the length of the time series. Hence, in these coarse scales, the wavelet coefficients are formed only using boundary handling, which causes issues due to the trend of the time series. Therefore, extending the time series to length 4T ensures that a larger number of wavelet coefficients are not affected by the trend. This enables more accurate estimation in the coarser scales for the evolutionary wavelet spectrum.
We illustrate the boundary handling procedure for the right-hand boundary of the data with Figure 1. An analogous procedure is used to add data to the beginning of the series, and so here we focus on an illustration of adding data to the end of the series. In (a), we see a polynomial trend of length T = 1024 (in the example, we use a deterministic series for illustrative purposes). We wish to extend the data at the right boundary in such a way that we continuously extend the trend to minimise the boundary effect caused when calculating the boundary non-decimated wavelet coefficients. To achieve this, we first (b) reflect the entire time series at the boundary, given by the dashed vertical line, motivated by the reasoning above, obtaining a series of length 2T = 2048. Next, we multiply the boundary series (data points 1025-2048) by −1, to ensure that the gradient of the reflected trend part is similar to that at the edge of the time series (c). Finally in (d), we must add some value, c, to this reflected part, to align the edge of the time series with the boundary part. The value c used to align the boundary part with the original series is calculated using a simple pre-estimate of the trend function, obtained via a polynomial regression. From this we obtain a pre-estimate for the value of the trend at the boundary. The value c is given by two times this value, as this reverses the effect caused by multiplying the boundary series by −1, realigning the series.
Having performed this step, we can repeat this again to add another T∕2 = 512 data points on the right side, giving a series of length 5T∕2 = 2560. We then repeat the whole procedure for the left side boundary, adding 3T∕2 = 1536 data points to the start of the series, giving the final series of length 4T = 4096. In practice, we can apply the procedure when performing trend estimation, to produce a more reliable estimate at the boundaries of the time series. In this case, we no longer have strictly valid confidence intervals at the boundary locations. However, we have found that performing the boundary handling step significantly improves estimation at the boundaries by reducing the variability of the estimate.
Finally, we observe that this procedure can be easily adapted to time series whose length is non-dyadic. A time series of general length n is extended to one of dyadic length d, where d = 4 × 2 ⌊log 2 n⌋ . Indeed, the data example used in Section 6 is non-dyadic.

SIMULATION STUDY
To illustrate the ability to incorporate a smooth trend within the LSW framework, we perform a simulation study. In the simulation, we define three different evolutionary wavelet spectra, shown in Figure 2 and explicitly defined in the Supporting Information. Spectrum S 1 , studied in Nason (2008), displays coarse-scale, slowly-evolving sinusoidal behaviour with a fine-scale burst in power at time point 800; spectrum S 2 is a concatenation of moving average processes and contains power moving from fine to coarser scales, and was examined in Nason et al. (2000); spectrum S 3 contains slowly-evolving power at fine scales. We simulate time series {X t } T−1 t=0 of length T = 2 10 = 1024 from LSW processes with those spectra, using Gaussian innovations. Depending on the simulation, different wavelets were used, to highlight the observations from Section 3. The simulations were performed in R (R Core Team (2019)) using the wavethresh package of Nason (2016). For each spectrum, 100 LSW processes with different trends added on were simulated.
We examine the effects of including a trend across three scenarios, corresponding to Sections 3.1, 3.2, and 3.3 of this article. In the first scenario, we add low-order trends to LSW processes simulated using a Daubechies EP wavelet with four vanishing moments, which corresponds to the setting of Section 3.1. In the second, contained within the Supporting Information, we add low-order trends to LSW processes simulated using Haar wavelets, which corresponds to Section 3.2. In the final set of simulations, we add non-polynomial trends to LSW processes simulated using the Daubechies EP4 wavelet, corresponding to Section 3.3.
To assess the spectral estimation performance, for each realisation, the un-smoothed estimate of the EWS was calculated, which was then used to obtain an averaged estimate for the EWS across the 100 realisations. In alignment with the discussion in Sections 3 and 4, we correct the EWS across the finest seven scales and use the boundary handling procedure, which ensures that the boundary effects of the trend are minimised. Furthermore, in the Supporting Information, we also provide numerical evidence to show that using a smoother wavelet for spectral estimation is beneficial.
We also assess the trend estimation performance by reporting the averaged mean squared error of the trend estimate across the 100 realisations. The trend estimate is calculated using the Daubechies LA4 wavelet, with Figure 2. Spectra used in the simulation study. Left: S 1 , sinusoid with 'burst'; centre: S 2 , concatenated moving average process; right: S 3 , slowly-evolving fine-scale power boundary handling applied. Our linear wavelet thresholding method is referred to as LSWT in the comparison tables. We compare our method to three other trend estimation methods. We first compare to the approach of von Sachs and MacGibbon (2000), which utilises a local MAD estimator in a wavelet thresholding procedure. No code is publicly available for this method, and so we have implemented the method utilising wavethresh and following the description of the computation of the threshold in section 2.5 of von Sachs and MacGibbon (2000). This method is referred to as VSMWT in the comparison tables. Secondly, we also compare to the spline-based method using the R function smooth.spline, where the smoothing parameter is chosen via cross validation. Lastly, we compare to a local polynomial regression estimator, calculated using the loess function in R. We use the default settings and a local quadratic polynomial for fitting the trend.

Low-Order Polynomial Trend, High-Order Generating Wavelet
We add various low-order polynomial trends to LSW processes generated by the three different spectra. In particular, we use three different trends 1 t , 2 t , and 3 t (where t = (t∕T)); polynomials of degree 1, 2, and 3 given by (5.1) A plot of example realisations from each trend and spectrum scenario is shown in Figure 3. We use the EP4 wavelet to analyse the series, the same wavelet that generates the LSW processes, which is a high enough order to remove the polynomial trends.
In each of the simulations, we compare the estimates of the EWS obtained when trends are added, to the estimate of the EWS obtained in the absence of any trend ('None' in Table I), obtained in the standard way using the ewspec command within wavethresh. The averaged estimates for each spectrum across the four trend scenarios are shown in the Supporting Information. Table I contains the mean squared error across the four trends and three spectra, which shows that the trend has negligible impact. In fact, for the third spectrum, the mean squared error is actually lower when a trend is added. From the simulation results, we see that there is no discernible difference in the estimation quality between the no trend case and the case when a trend is added.
In Table II,   each trend and spectrum combination. From the table, we see that our method performs well and gives a low mean squared error across the various scenarios. The method consistently outperforms the competing methods, with only one scenario in which it does not give the lowest mean squared error. In particular, we outperform the wavelet-based method of von Sachs and MacGibbon (2000), which is in part due to the lack of proper boundary handling. As we might expect, the local polynomial method also performs well. The spline-based method performs well for spectrum 3, which is likely due to the fact this process contains only weak autocorrelation.   The sinusoidal and logistic trends are Lipschitz continuous; however, the exponential trend is not, and further it is not Hölder continuous. Realisations from the simulated time series can be found in the Supporting Information. We use an EP10 wavelet to compute the spectral estimate, which is the smoothest available wavelet in this family within wavethresh and will therefore be the best at removing the effect of the trend. The Supporting Information contains plots of the averaged spectrum again showing that then trend has not biased the spectral estimate. In Table III we compare the mean squared error for the averaged spectral estimates, and find similar results to that in Section 5.1. The error when including a trend is higher for spectrum S 1 , while lower for the other two spectra. These results confirm the finding in Section 3.3, that including a smooth non-polynomial trend will cause little bias to the estimate of the EWS. They also serve to highlight that using a smoother wavelet can achieve a more accurate spectral estimate.
Finally, we assess the trend estimation performance. Example trend estimates for each of the three trend scenarios for spectrum S 3 are shown in the Supporting Information. In Table IV, we report the average mean squared error for the methods calculated over 100 realisations, with standard deviation given in brackets. Despite the fact that the trends are non-polynomial, the wavelet-based estimator still performs well. Our method is consistently  Figure 4. Global average monthly sea temperature data the best performing across all of the trend and spectrum scenarios. Overall, the simulation results show that the linear wavelet thresholding estimator can perform well in a variety of trend settings in the presence of locally stationary errors. In particular, we observe that the linear wavelet thresholding estimator outperforms the nonlinear thresholding method of von Sachs and MacGibbon (2000).

GLOBAL MEAN SEA TEMPERATURE DATA APPLICATION
This section highlights the ability to model time series with time-varying first-and second-order structure on the monthly global mean sea temperature dataset (GMST) version 4.6 from the Goddard Institute for Space Studies (GISS) Surface Temperature Analysis (GISTEMP, NASA, 2020). The data are available at http://data.giss.nasa. gov/gistemp, and are shown in Figure 4. The GMST time series exhibits periods of warming separated by a long pause from approximately the mid-1940s to the mid-1970s (Kellogg, 1993) and possibly a second, shorter one, since the late 1990s/early 2000s, although this is highly debated (Trenberth, 2015;Beaulieu and Killick, 2018). The series should be well-suited to our methodology, as it appears to contain both long-term trends and short-term variability that may be non-stationary.
In general, the series exhibits an upward trend; therefore, using a high-order wavelet should remove the trend from the wavelet coefficients and allow us to obtain an estimate for the spectrum. In addition, we might reasonably expect there to be strong dependency within the data: as with many environmental time series, this may be a result of certain physical phenomena. In Figure 5, we see the spectral estimate for the GMST time series. This is estimated using the Daubechies LA10 wavelet, while the raw wavelet periodogram is smoothed using a running mean smoother with bin width 120, corresponding to 10 years. Each level is scaled individually for clarity. Additionally, we have used the boundary handling procedure outlined to properly handle the endpoints of the series. Motivated by the discussion in Section 3.3, we only calculate the EWS estimate across the finest seven scales, which corresponds to 70% of the scales. The choice of the LA10 wavelet is motivated by the discussion in Section 1 of the Supporting Information that suggests the benefit of using smoother wavelets. A more systematic approach to wavelet choice could be devised by adapting the work of Cardinali and Nason (2017). This method uses a penalised likelihood-based approach to select, for a given wavelet, an appropriate set of wavelet packets to use in the analysis of an LSW packet process.
We see that the spectral behaviour is generally slowly evolving with peaks at various locations: for example, note the burst of power that occurs in scale −4 around 1910. We also see an appearance of power in scale −7 at approximately 1940 and scale −6 slightly later, corresponding to the proposed period of pause in warming.
Next, we investigate how the autocorrelation structure of the GMST time series evolves over time. A common assertion in the Climate literature is that the autocorrelation within the Global Mean Sea Temperature Data is constant across time. Here we use the estimate of the spectrum to examine the autocorrelation function at different periods in the series. In Figure 6, we see a plot of the autocorrelation function of the time series at six different time-points across 25-year intervals, starting at 1890 and ending at 2015. Commonly, the error structure of climate change data is modelled using a low-order AR process, for example, Beaulieu and Killick (2018) and Karl et al. (2000). In similar analyses on gridded datasets, Lenton et al. (2017) find an increase in variance and autocorrelation in global surface sea temperature data, while Boulton and Lenton (2015) find a 'reddening' -a gradual increase in autocorrelation -in North Pacific surface sea temperature data.
Our analysis suggests that overall, the low-order AR assumption may be appropriate at certain time points; however, the degree of autocorrelation changes over time, and the assumption of a stationary AR model may not be appropriate. From the plot, the non-stationary nature of the second-order structure can be seen. In 1890In , 1915In , 1965In , 1990, and 2015, the autocorrelation exhibits a similar decay behaviour, falling within the upper and lower confidence lines after approximately lag 60, corresponding to a time period of 5 years. However, at time 1940, given by the dark solid line, we see much stronger autocorrelation, exhibiting a shape that is not well represented by a low-order AR model. This increase in autocorrelation can be attributed to the appearance of power at the coarsest scale of the spectrum, as seen in Figure 5. Moreover, our analysis shows that in the period between 2005 and 2020 -which is debated to be another warming pause -the variance of the series increases by roughly 63% from 0.0153 to 0.0240. This may explain the appearance of a pause due to increased variability, as many methods assume the series to be second-order stationary. Note that the trend estimate does suggest the appearance of a pause in this time period; however, because this occurs at the boundary, the estimate should be used with caution. Lastly, in Figure 7, we see the estimate for the trend shown in the solid line, with the 95% pointwise confidence interval shown in the dashed lines. This was calculated with the Daubechies EP10 wavelet, using the methodology described in Section 4, using the boundary extension procedure. Note that the estimate near the endpoints is therefore dependent on this boundary handling and inference should be carefully considered. The EP10 wavelet was selected using advice from a domain expert, by comparing several choices and their resulting fitted trends. Overall, our estimate agrees with many previous analyses of the series: we see a long pause between approximately the 1940s and 1970s, as well as a sustained period of warming post-1970 to the early 2000s.

CONCLUDING REMARKS
In this article, we have addressed the commonly encountered problem of modelling time series that display firstand second-order non-stationarity. Our model extends the locally stationary wavelet model of Nason et al. (2000), used for modelling second-order non-stationarity, to include a non-stationary, smooth mean structure. Within our framework, we can model both smooth deterministic trends and slowly-evolving second-order behaviour, allowing for a flexible model that can be applicable to time series in many applications.
We have established results concerning the theoretical properties of the estimators of the first-and second-order quantities of interest, which are shown to have good consistency properties and strong practical performance. Furthermore, we have found using a smoother wavelet to be useful even in the case of a zero-mean time series. To ensure the applicability of our methodology, we propose a new method for handling the boundary of a time series when computing the discrete non-decimated wavelet transform, which incorporates data of any length and is suitable for data that display trend behaviour. In our framework, the results and methodology are suitable for polynomial and smooth trends; however, a polynomial or smooth function is a good approximation to many trends of practical interest. In practice, the techniques described here will work well when we depart from the assumptions of the model, as evidenced within the simulation study.
We demonstrated our methodology on a data example that shows that our model can be successfully applied to a variety of data sets that exhibit time-varying first-and second-order structure. In our analysis of the global mean sea temperature dataset, we find that the second-order structure of the series varies over time, while observing similar findings as previously described of a general increasing trend of global mean sea temperature. ACKNOWLEDGEMENT Euan T. McGonigle gratefully acknowledges financial support from EPSRC and Numerical Algorithms Group Ltd. via The Smith Institute i-CASE award No. EP/R511997/1.

DATA AVAILABILITY STATEMENT
The data that support the findings of this study are publicly available. The global mean sea temperature data set is available to download at http://data.giss.nasa.gov/gistemp.

SUPPORTING INFORMATION
Additional Supporting Information may be found online in the supporting information tab for this article.