
Applied Math

Time Series Foundations

Rigorous treatment of stationarity, the Wold decomposition, autocorrelation, unit roots, AR/MA/ARMA/ARIMA models, and spectral representation. The classical theory that every modern sequence model rests on.

Advanced · Tier 2 · Stable · Core spine · ~70 min

Why This Matters

Time series carry temporal structure that i.i.d. methods cannot exploit and cannot ignore. A sample of stock returns, server latencies, ECG voltages, or climate temperatures is a single realization of a stochastic process indexed by time, not a random sample from a distribution. The dependence between X_t and its past sets the statistical properties of every estimator built on top.

The classical theory developed by Yule, Wold, Box, Jenkins, and Hamilton answers four questions. When does a process have stable statistics over time (stationarity)? When can it be represented as an infinite linear combination of past shocks (Wold)? Which finite-parameter family approximates the linear structure (ARMA)? And what does the second-order structure look like in frequency space (spectral density)? Everything that follows — from Kalman filtering to PatchTST — assumes or relaxes one of these answers.

Stationarity

Two notions of stationarity appear in the literature. Most theory uses the weaker (covariance) form because it is what estimators actually need.

Definition

Strict Stationarity

A process \{X_t\}_{t \in \mathbb{Z}} is strictly stationary if and only if for every k \geq 1, every t_1 < \cdots < t_k, and every shift h,

(X_{t_1}, \ldots, X_{t_k}) \stackrel{d}{=} (X_{t_1+h}, \ldots, X_{t_k+h}).

The full joint distribution is shift-invariant.

Definition

Weak (Covariance) Stationarity

A process \{X_t\} with \mathbb{E}[X_t^2] < \infty is weakly stationary (or covariance stationary) if:

  1. \mathbb{E}[X_t] = \mu for all t,
  2. \text{Var}(X_t) = \sigma^2 < \infty for all t,
  3. \gamma(h) := \text{Cov}(X_t, X_{t+h}) depends only on h, not on t.

The function \gamma : \mathbb{Z} \to \mathbb{R} is the autocovariance function and is symmetric: \gamma(-h) = \gamma(h).

Strict stationarity does not imply weak stationarity (a strictly stationary Cauchy process has no second moment). Weak stationarity does not imply strict stationarity. Under joint Gaussianity the two coincide because the joint distribution of a Gaussian vector is determined by its first two moments.

Watch Out

Stationarity is not the same as ergodicity

A stationary process can have time averages that fail to converge to ensemble averages. Ergodicity is the additional condition that \frac{1}{n}\sum_{t=1}^n f(X_t) \to \mathbb{E}[f(X_0)] almost surely. For a Gaussian stationary process, ergodicity holds if \gamma(h) \to 0 as h \to \infty. Estimating \mu from one realization requires ergodicity, not just stationarity.
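
The distinction is easy to see numerically. Below is a minimal numpy sketch (the processes, seed, and sample size are illustrative choices, not from the text): an ergodic AR(1) path has a time average near the ensemble mean, while a stationary process built around a level drawn once per realization does not.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# Ergodic: stationary AR(1) with mean 0; gamma(h) = 0.7^|h| gamma(0) -> 0.
x = np.zeros(n)
for t in range(1, n):
    x[t] = 0.7 * x[t - 1] + rng.standard_normal()
print(x.mean())            # close to the ensemble mean 0

# Stationary but not ergodic: X_t = A + eps_t with A drawn once per path.
# gamma(h) = Var(A) = 1 for all h >= 1, which never decays to 0.
a = rng.standard_normal()  # the level is frozen for this realization
y = a + rng.standard_normal(n)
print(y.mean(), a)         # the time average tracks a, not 0
```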

Autocorrelation

Definition

Autocorrelation Function

The autocorrelation function (ACF) of a covariance-stationary process is \rho(h) = \gamma(h) / \gamma(0), the autocovariance normalized by the variance. It satisfies |\rho(h)| \leq 1, \rho(0) = 1, and \rho(-h) = \rho(h).

Definition

Partial Autocorrelation Function

The partial autocorrelation at lag h is the correlation between X_t and X_{t-h} after removing the linear effect of the intermediate lags X_{t-1}, \ldots, X_{t-h+1}. Concretely, regress X_t on X_{t-1}, \ldots, X_{t-h}; the coefficient on X_{t-h} is \alpha(h).

The PACF is obtained from the Yule-Walker equations: \alpha(h) is the last coefficient of the order-h system. For an AR(p) process, \rho(h) satisfies the linear system \rho(h) = \phi_1 \rho(h-1) + \cdots + \phi_p \rho(h-p) for h \geq 1, which in matrix form is \Gamma \boldsymbol{\phi} = \boldsymbol{\gamma} where \Gamma_{ij} = \gamma(|i-j|) is the Toeplitz autocovariance matrix. The Toeplitz structure makes the system cheap to solve, as in the sketch below.
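
A hedged sketch of Yule-Walker estimation for an AR(2): plug sample autocovariances into the Toeplitz system and solve. The parameters and sample size are arbitrary; scipy and statsmodels are assumed available.

```python
import numpy as np
from scipy.linalg import solve_toeplitz
from statsmodels.tsa.stattools import acovf

rng = np.random.default_rng(1)
phi = np.array([0.6, 0.25])   # true AR(2) coefficients (stationary)
n = 20_000
x = np.zeros(n)
for t in range(2, n):
    x[t] = phi[0] * x[t - 1] + phi[1] * x[t - 2] + rng.standard_normal()

# Gamma phi = gamma: the first column of Gamma is (gamma(0), gamma(1)),
# the right-hand side is (gamma(1), gamma(2)).
g = acovf(x, fft=True)        # sample autocovariances gamma(0), gamma(1), ...
phi_hat = solve_toeplitz(g[:2], g[1:3])
print(phi_hat)                # approximately [0.6, 0.25]
```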

ACF and PACF identify model order. For AR(p), the PACF cuts off at lag p and the ACF decays geometrically. For MA(q), the ACF cuts off at lag q and the PACF decays. The Box-Jenkins identification procedure reads p and q off the sample ACF and PACF plots.
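
The cutoff patterns are visible directly in simulation. A minimal statsmodels sketch (orders and coefficients are illustrative):

```python
import numpy as np
from statsmodels.tsa.arima_process import ArmaProcess
from statsmodels.tsa.stattools import acf, pacf

n = 5_000
# ArmaProcess takes the lag polynomials Phi and Theta directly,
# so AR coefficients enter with a minus sign: [1, -phi_1, -phi_2].
ar2 = ArmaProcess(ar=[1, -0.6, -0.25], ma=[1]).generate_sample(n)
ma2 = ArmaProcess(ar=[1], ma=[1, 0.5, 0.4]).generate_sample(n)

print(pacf(ar2, nlags=5).round(2))  # drops to ~0 after lag 2: AR order p = 2
print(acf(ma2, nlags=5).round(2))   # drops to ~0 after lag 2: MA order q = 2
```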

AR, MA, ARMA, ARIMA

The lag operator L acts on a sequence by L X_t = X_{t-1}. Polynomials in L are the natural notation for linear time-series models.

Definition

AR(p), MA(q), ARMA(p,q)

Let \{\epsilon_t\} be white noise, \epsilon_t \sim \text{WN}(0, \sigma^2), meaning uncorrelated with mean zero and constant variance.

  • AR(p): \Phi(L) X_t = \epsilon_t where \Phi(L) = 1 - \phi_1 L - \cdots - \phi_p L^p.
  • MA(q): X_t = \Theta(L) \epsilon_t where \Theta(L) = 1 + \theta_1 L + \cdots + \theta_q L^q.
  • ARMA(p, q): \Phi(L) X_t = \Theta(L) \epsilon_t.
Theorem

Stationarity Condition for AR(p)

Statement

The AR(p) recursion \Phi(L) X_t = \epsilon_t admits a unique causal covariance-stationary solution iff every root of \Phi(z) = 1 - \phi_1 z - \cdots - \phi_p z^p satisfies |z| > 1: X_t = \Phi(L)^{-1} \epsilon_t = \sum_{j=0}^\infty \psi_j \epsilon_{t-j} with coefficients \psi_j that decay geometrically.

Intuition

\Phi(L)^{-1} exists as a power series in L exactly when \Phi(z) \neq 0 on a neighborhood of the closed unit disk. The coefficients \psi_j come from the partial-fraction expansion of \Phi(z)^{-1}; they decay like \rho^j with \rho = 1/\min_i |z_i| < 1, so the smallest root controls the memory of the process.

Proof Sketch

Factor \Phi(z) = \prod_i (1 - z/z_i) and expand each factor as a geometric series 1/(1 - z/z_i) = \sum_j (z/z_i)^j, valid for |z| < |z_i|. The product of these series gives \Phi(z)^{-1} = \sum_j \psi_j z^j with |\psi_j| \leq C \rho^j where \rho = 1/\min_i |z_i| < 1. Substituting L for z and applying to \epsilon_t gives a series that converges in L^2. Uniqueness follows because any other stationary solution has the same Wold representation.

Why It Matters

This is the test you actually run before fitting AR. For AR(1), the condition reduces to |\phi_1| < 1. For AR(2) with parameters (\phi_1, \phi_2), the stationarity region is the triangle \phi_2 + \phi_1 < 1, \phi_2 - \phi_1 < 1, |\phi_2| < 1. Outside this region, the recursion explodes or has a unit root.
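
In practice the root check is a few lines of numpy. A minimal sketch (the helper name is ours, not a library function):

```python
import numpy as np
from numpy.polynomial import polynomial as P

def ar_is_stationary(phis):
    """True iff all roots of Phi(z) = 1 - phi_1 z - ... - phi_p z^p have |z| > 1."""
    coeffs = np.r_[1.0, -np.asarray(phis, dtype=float)]  # low-to-high degree
    return bool(np.all(np.abs(P.polyroots(coeffs)) > 1))

print(ar_is_stationary([0.5]))       # True:  AR(1) with |phi| < 1
print(ar_is_stationary([1.0]))       # False: unit root (random walk)
print(ar_is_stationary([0.5, 0.3]))  # True:  inside the AR(2) triangle
print(ar_is_stationary([0.5, 0.6]))  # False: phi_1 + phi_2 > 1
```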

Failure Mode

If \Phi has a root on the unit circle (|z_i| = 1), the process has a unit root: shocks accumulate forever and \text{Var}(X_t) grows linearly in t. OLS fits to a unit-root series superconverge at rate n instead of \sqrt{n}, and t-statistics follow the Dickey-Fuller distribution rather than Student's t. Naive inference is invalid.

MA(q) is always covariance stationary (a finite linear combination of finite-variance white noise has finite variance). The dual condition for MA(q) is invertibility: if every root of \Theta(z) lies outside the unit disk, then \epsilon_t = \Theta(L)^{-1} X_t is a valid AR(\infty) representation.
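
statsmodels exposes both checks. A short hedged sketch with an MA(1) that is stationary but not invertible (\theta = 2 puts the root of \Theta at z = -1/2, inside the unit disk):

```python
from statsmodels.tsa.arima_process import ArmaProcess

proc = ArmaProcess(ar=[1], ma=[1, 2.0])      # X_t = eps_t + 2 eps_{t-1}
print(proc.isstationary, proc.isinvertible)  # True False
print(proc.maroots)                          # [-0.5], inside the unit disk
```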

Definition

Differencing

The first-difference operator is \nabla = 1 - L, so \nabla X_t = X_t - X_{t-1}. The d-fold difference is \nabla^d = (1-L)^d.

Definition

ARIMA(p, d, q)

A process \{X_t\} is ARIMA(p, d, q) if and only if \{\nabla^d X_t\} is ARMA(p, q). The differencing order d removes d unit roots; the ARMA part models the stationary remainder.

For seasonal data, SARIMA(p,d,q)(P,D,Q)_s adds seasonal AR/MA polynomials in L^s and a seasonal difference \nabla_s = 1 - L^s.
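
In statsmodels the two polynomial orders map directly onto the SARIMAX constructor. A minimal sketch, assuming monthly data with period s = 12; the series here is a synthetic placeholder, not real data:

```python
import numpy as np
from statsmodels.tsa.statespace.sarimax import SARIMAX

rng = np.random.default_rng(3)
y = rng.standard_normal(240).cumsum()   # placeholder: 20 "years" of monthly data

# SARIMA(1,1,1)(1,1,1)_12: order = (p, d, q), seasonal_order = (P, D, Q, s).
model = SARIMAX(y, order=(1, 1, 1), seasonal_order=(1, 1, 1, 12))
result = model.fit(disp=False)
print(result.params)                    # AR, MA, seasonal AR/MA, noise variance
```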

Unit Roots and the ADF Test

A random walk X_t = X_{t-1} + \epsilon_t is the prototype non-stationary series. Its variance grows linearly: \text{Var}(X_t - X_0) = t\sigma^2. Unit-root testing decides whether to model a series in levels or in differences.

The Augmented Dickey-Fuller (ADF) test fits the regression \nabla X_t = \alpha + \beta t + \gamma X_{t-1} + \sum_{j=1}^k \delta_j \nabla X_{t-j} + \epsilon_t and tests H_0 : \gamma = 0 (unit root) against H_1 : \gamma < 0 (stationary). Under H_0, the t-statistic on \hat\gamma does not follow a standard normal; it follows the Dickey-Fuller distribution, computed numerically and tabulated in Hamilton (1994), Chapter 17.

The KPSS test reverses the null: H_0 is stationarity (around a level or trend), H_1 is a unit root. Combining ADF and KPSS provides robustness because each test has different size distortions under the other's null.
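
A hedged sketch of the combined procedure with statsmodels (the series are simulated; in practice substitute your own):

```python
import numpy as np
from statsmodels.tsa.stattools import adfuller, kpss

rng = np.random.default_rng(4)
walk = rng.standard_normal(1_000).cumsum()   # unit root by construction
ar1 = np.zeros(1_000)
for t in range(1, 1_000):
    ar1[t] = 0.7 * ar1[t - 1] + rng.standard_normal()

for name, x in [("random walk", walk), ("stationary AR(1)", ar1)]:
    adf_p = adfuller(x, regression="c")[1]             # H0: unit root
    kpss_p = kpss(x, regression="c", nlags="auto")[1]  # H0: stationarity
    print(f"{name}: ADF p = {adf_p:.3f}, KPSS p = {kpss_p:.3f}")
```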

Watch Out

Differencing vs detrending

A unit-root series X_t = X_{t-1} + \epsilon_t should be differenced. A trend-stationary series X_t = \alpha + \beta t + u_t with stationary u_t should be detrended (subtract the fitted trend). Differencing a trend-stationary series leaves an MA(1) error with a unit root in \Theta(L), breaking invertibility. Detrending a true random walk leaves residuals that are still non-stationary. Test for unit roots first, then transform.
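
The MA(1) signature of wrongly differencing a trend-stationary series is easy to reproduce: at \theta = -1, \rho(1) = \theta/(1+\theta^2) = -0.5, so a lag-1 autocorrelation near -1/2 is the tell. A minimal sketch, assuming white-noise u_t:

```python
import numpy as np
from statsmodels.tsa.stattools import acf

rng = np.random.default_rng(5)
t = np.arange(10_000)
x = 1.0 + 0.05 * t + rng.standard_normal(t.size)  # trend-stationary, u_t white

dx = np.diff(x)                    # dx_t = 0.05 + u_t - u_{t-1}: MA(1), theta = -1
print(acf(dx, nlags=1)[1])         # about -0.5, the non-invertible MA(1) signature

trend = np.polyval(np.polyfit(t, x, 1), t)
print(acf(x - trend, nlags=1)[1])  # about 0: detrended residuals are white
```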

The Wold Decomposition

Every covariance-stationary process splits into a perfectly predictable deterministic part and a linear function of past shocks. This is the foundational result that justifies ARMA modeling.

Theorem

Wold Decomposition (Wold 1938)

Statement

Let \{X_t\} be covariance stationary with \mathbb{E}[X_t] = 0. Then X_t admits the unique decomposition X_t = \sum_{j=0}^\infty \psi_j \epsilon_{t-j} + V_t where:

  • \psi_0 = 1, \sum_{j=0}^\infty \psi_j^2 < \infty,
  • \{\epsilon_t\} is white noise: \mathbb{E}[\epsilon_t] = 0, \mathbb{E}[\epsilon_t^2] = \sigma^2, \mathbb{E}[\epsilon_s \epsilon_t] = 0 for s \neq t,
  • V_t is deterministic: V_t \in \overline{\text{span}}\{X_s : s < t\} in L^2, i.e. V_t is perfectly linearly predictable from the infinite past,
  • \mathbb{E}[\epsilon_t V_s] = 0 for all s, t.

With the normalization \psi_0 = 1, the decomposition is unique.

Intuition

Project X_t onto the closed linear span of its own infinite past. The residual of that projection is a fresh shock, orthogonal to the past by construction. Iterating the projection backward produces the MA(\infty) representation. What remains in the projection itself is the deterministic part: a series whose value at time t is exactly determined by past values.

Proof Sketch

Work in the Hilbert space \mathcal{H} = L^2(\Omega, \mathcal{F}, P) with inner product \langle X, Y \rangle = \mathbb{E}[XY]. Define \mathcal{H}_t = \overline{\text{span}}\{X_s : s \leq t\}, the closed linear span of the past up to time t.

Step 1 (innovations). The innovation \epsilon_t = X_t - P_{\mathcal{H}_{t-1}} X_t is the projection residual when predicting X_t from its strict past. By construction \epsilon_t \perp \mathcal{H}_{t-1}, so \mathbb{E}[\epsilon_s \epsilon_t] = 0 for s < t, and stationarity makes \mathbb{E}[\epsilon_t^2] constant. So \{\epsilon_t\} is white noise.

Step 2 (MA expansion). Set \psi_j = \langle X_t, \epsilon_{t-j} \rangle / \sigma^2. Bessel's inequality gives \sum_{j=0}^\infty \psi_j^2 \sigma^2 \leq \mathbb{E}[X_t^2] < \infty. Define U_t = \sum_{j=0}^\infty \psi_j \epsilon_{t-j}, which converges in L^2. By construction U_t is the projection of X_t onto \overline{\text{span}}\{\epsilon_s : s \leq t\}.

Step 3 (deterministic remainder). Let V_t = X_t - U_t. Then V_t \perp \epsilon_s for all s \leq t. Since X_t \in \mathcal{H}_t and the \epsilon_s span the innovation directions inside \mathcal{H}_t, V_t lies in the orthogonal complement, the remote-past subspace \bigcap_s \mathcal{H}_s. Elements of this subspace are perfectly linearly predictable from any earlier past; in particular V_t \in \mathcal{H}_{t-1}, so V_t is deterministic.

Step 4 (uniqueness). If X_t = \sum \tilde\psi_j \tilde\epsilon_{t-j} + \tilde V_t is another such decomposition, the orthogonality conditions force \tilde\epsilon_t to coincide with the projection residual \epsilon_t, hence \tilde\psi_j = \psi_j and \tilde V_t = V_t.

Why It Matters

Wold says ARMA is not an arbitrary parametric family but the natural finite-parameter approximation to the universal MA(\infty) representation. Any purely non-deterministic covariance-stationary process can be approximated arbitrarily well in L^2 by ARMA models. Combined with \nabla^d differencing for non-stationary inputs, this is the theoretical content of the Box-Jenkins methodology.

Failure Mode

Wold is purely linear: it captures the second-order structure and nothing else. Nonlinear dependencies — volatility clustering (GARCH), regime switching (Hamilton 1989), threshold dynamics (TAR) — are invisible to the Wold representation. A series can be Wold-decomposable into white noise yet have strong predictive structure that ARMA cannot capture. The white-noise innovations are uncorrelated, not independent.

Spectral Representation

The autocovariance function and the spectral density are Fourier-pair descriptions of the same second-order structure. Spectral analysis is what you reach for when periodic or quasi-periodic structure dominates.

Definition

Spectral Density

For a covariance-stationary process with absolutely summable autocovariances (\sum_h |\gamma(h)| < \infty), the spectral density is the Fourier transform of \gamma:

S(\omega) = \frac{1}{2\pi} \sum_{h=-\infty}^\infty \gamma(h) e^{-i\omega h}, \qquad \omega \in [-\pi, \pi].

The inverse relation is \gamma(h) = \int_{-\pi}^\pi e^{i\omega h} S(\omega)\, d\omega.

S(\omega) \geq 0 because it is the limit of expected periodograms; this is Bochner's theorem applied to \gamma. The variance decomposition \gamma(0) = \int_{-\pi}^\pi S(\omega)\, d\omega shows that S(\omega) allocates total variance across frequencies.

Theorem

Spectral Representation Theorem

Statement

Every mean-zero covariance-stationary process \{X_t\} admits the representation X_t = \int_{-\pi}^\pi e^{i\omega t}\, dZ(\omega) where Z(\omega) is a complex-valued process with orthogonal increments: \mathbb{E}[dZ(\omega) \overline{dZ(\omega')}] = 0 for \omega \neq \omega', and \mathbb{E}[|dZ(\omega)|^2] = dF(\omega) for a non-decreasing spectral distribution F. When F is absolutely continuous, dF/d\omega = S(\omega).

Intuition

A stationary process is a continuous superposition of complex exponentials with random uncorrelated amplitudes. The amount of "energy" at frequency \omega is S(\omega)\, d\omega. AR(2) processes with complex roots show pronounced peaks in S(\omega) at the resonance frequency; white noise has the flat spectrum S(\omega) = \sigma^2 / (2\pi).

Proof Sketch

The key tool is Bochner's theorem: a function \gamma : \mathbb{Z} \to \mathbb{C} is the autocovariance of some stationary process iff it is positive semidefinite, which by Bochner means \gamma(h) = \int e^{i\omega h}\, dF(\omega) for some non-decreasing F. Then construct Z on a probability space by setting Z(\omega) = \sum_t X_t \int_{-\pi}^{\omega} e^{-iut}\, du / (2\pi), verify orthogonal increments using stationarity, and check that the inverse Fourier transform recovers X_t.

Why It Matters

The spectral view gives closed-form expressions for ARMA spectra. For ARMA(p, q), S(\omega) = (\sigma^2 / 2\pi)\, |\Theta(e^{-i\omega})|^2 / |\Phi(e^{-i\omega})|^2. AR roots near the unit circle produce sharp peaks; MA roots near the unit circle produce sharp dips. This is the basis of frequency-domain estimation, the Whittle likelihood, and bandpass filtering.
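
The closed form is direct to evaluate numerically. A minimal numpy sketch (the function name and parameter choices are ours, not from a library):

```python
import numpy as np
from numpy.polynomial import polynomial as P

def arma_spectrum(phi, theta, sigma2, omega):
    """S(w) = sigma^2/(2 pi) |Theta(e^{-iw})|^2 / |Phi(e^{-iw})|^2 on a grid of w."""
    z = np.exp(-1j * omega)
    num = np.abs(P.polyval(z, np.r_[1.0, np.asarray(theta, dtype=float)])) ** 2
    den = np.abs(P.polyval(z, np.r_[1.0, -np.asarray(phi, dtype=float)])) ** 2
    return sigma2 / (2 * np.pi) * num / den

# AR(2) with complex roots of Phi: a resonance peak away from 0 and pi.
omega = np.linspace(0, np.pi, 2_000)
s = arma_spectrum([1.4, -0.9], [], 1.0, omega)
print(omega[np.argmax(s)])   # sharp peak near 0.74 rad, the resonance frequency
```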

Failure Mode

The spectral density requires summability of \gamma. Long-memory processes (\gamma(h) \sim h^{2d-1} for d \in (0, 1/2)) have S(\omega) \to \infty as \omega \to 0 at rate \omega^{-2d}. Standard ARMA spectral estimators are misspecified there; fractional differencing (ARFIMA) is the right tool.

Worked Example: AR(1) Spectrum

For X_t = \phi X_{t-1} + \epsilon_t with |\phi| < 1, the autocovariance is \gamma(h) = \sigma^2 \phi^{|h|} / (1 - \phi^2). The spectral density is

S(\omega) = \frac{\sigma^2}{2\pi} \cdot \frac{1}{|1 - \phi e^{-i\omega}|^2} = \frac{\sigma^2}{2\pi} \cdot \frac{1}{1 - 2\phi \cos\omega + \phi^2}.

For \phi > 0, S peaks at \omega = 0 (low-frequency dominance, slow drifts). For \phi < 0, S peaks at \omega = \pi (high-frequency dominance, oscillation). The integral \int_{-\pi}^\pi S(\omega)\, d\omega = \sigma^2 / (1 - \phi^2) = \gamma(0) confirms variance conservation.
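
A quick numerical check of the worked example (pure numpy; the grid size is arbitrary):

```python
import numpy as np

phi, sigma2 = 0.7, 1.0
omega = np.linspace(-np.pi, np.pi, 100_001)
s = sigma2 / (2 * np.pi) / (1 - 2 * phi * np.cos(omega) + phi**2)

# Variance conservation: Riemann sum of S over [-pi, pi] vs gamma(0).
print(np.sum(s) * (omega[1] - omega[0]))  # about 1.9608
print(sigma2 / (1 - phi**2))              # gamma(0) = 1.9608
print(omega[np.argmax(s)])                # peak at omega = 0 since phi > 0
```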

Common Confusions

Watch Out

White noise is not Gaussian

White noise is uncorrelated with zero mean and constant variance. It need not be Gaussian, independent, or even strictly stationary. A sequence of independent Cauchy draws does not qualify at all (no second moment), while bounded non-Gaussian noise such as \pm 1 Bernoulli draws is perfectly fine. The AR/MA theory works for any white-noise innovation; Gaussianity is required only for finite-sample distributions of estimators.

Watch Out

ACF on a single realization is a noisy estimator

The sample ACF \hat\rho(h) = \frac{\sum_{t=1}^{n-h}(X_t - \bar X)(X_{t+h} - \bar X)}{\sum_{t=1}^n (X_t - \bar X)^2} has standard error roughly 1/\sqrt{n} for white noise, but it is biased toward zero in finite samples and the bias grows with h. The Bartlett confidence bands \pm 1.96/\sqrt{n} assume white noise. Plotting raw ACF/PACF without these bands invites overfitting.
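
statsmodels returns Bartlett-based intervals alongside the sample ACF. A minimal sketch on white noise (seed and sizes arbitrary):

```python
import numpy as np
from statsmodels.tsa.stattools import acf

rng = np.random.default_rng(6)
x = rng.standard_normal(500)             # white noise, n = 500

r, conf = acf(x, nlags=10, alpha=0.05)   # confidence intervals via Bartlett's formula
half_width = (conf[:, 1] - conf[:, 0]) / 2
print(1.96 / np.sqrt(len(x)))            # 0.0877
print(half_width[1:4].round(3))          # about the same at small lags
```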

Watch Out

Stationarity can be rejected, never confirmed

Tests for stationarity are tests of model assumptions, not of truth. ADF has low power against stationary-but-persistent alternatives in small samples. KPSS fails to reject stationarity on series with subtle structural breaks. Real series are never exactly stationary; the question is whether the deviation matters for the model you are fitting.

Summary

  • Two stationarity notions: strict (full joint distribution shift-invariant) and weak (mean, variance, autocovariance shift-invariant). Theory mostly uses weak.
  • ACF and PACF identify ARMA model order via Box-Jenkins: AR(p) cuts off in the PACF; MA(q) cuts off in the ACF.
  • AR(p) is stationary iff every root of \Phi(z) lies strictly outside the unit disk. Roots on the unit circle give unit roots, which require differencing.
  • Wold decomposition: every covariance-stationary process is an MA(\infty) plus a deterministic remainder. ARMA is the natural finite-parameter approximation.
  • The spectral density S(\omega) is the Fourier transform of \gamma(h). AR roots near the unit circle produce peaks; MA roots near the unit circle produce dips.

Exercises

Exercise · Core

Problem

Show that the autocovariance function of an MA(1) process X_t = \epsilon_t + \theta \epsilon_{t-1} is \gamma(0) = \sigma^2(1 + \theta^2), \gamma(\pm 1) = \sigma^2 \theta, \gamma(h) = 0 for |h| \geq 2. What is \rho(1) in terms of \theta? For which \theta is the process invertible?

Exercise · Advanced

Problem

Two MA(1) processes with parameters \theta and 1/\theta (and noise variances \sigma^2 and \theta^2 \sigma^2 respectively) have the same autocovariance function. Verify this and explain why invertibility is an identifiability constraint, not a stationarity constraint.

Exercise · Advanced

Problem

Show that the Wold innovations \epsilon_t in the decomposition X_t = \sum_j \psi_j \epsilon_{t-j} + V_t are uncorrelated but need not be independent. Construct a covariance-stationary process whose Wold innovations are dependent.

References

Canonical:

  • Box, G. E. P., Jenkins, G. M., Reinsel, G. C., Ljung, G. M. Time Series Analysis: Forecasting and Control, 5th ed., Wiley, 2015 (originally 1970), Chapters 3-5.
  • Hamilton, J. D. Time Series Analysis, Princeton University Press, 1994, Chapters 3, 4, 6, 17.
  • Brockwell, P. J., Davis, R. A. Time Series: Theory and Methods, 2nd ed., Springer, 1991, Chapters 3, 4, 5.
  • Wold, H. A Study in the Analysis of Stationary Time Series, Almqvist & Wiksell, Stockholm, 1938 (foundational; Wold decomposition).
  • Dickey, D. A., Fuller, W. A. "Distribution of the Estimators for Autoregressive Time Series with a Unit Root." Journal of the American Statistical Association, 74(366), 1979.

Current:

  • Shumway, R. H., Stoffer, D. S. Time Series Analysis and Its Applications: With R Examples, 4th ed., Springer, 2017, Chapters 1-4.
  • Hyndman, R. J., Athanasopoulos, G. Forecasting: Principles and Practice, 3rd ed., OTexts, 2021, Chapters 8-9.
  • Tsay, R. S. Analysis of Financial Time Series, 3rd ed., Wiley, 2010, Chapters 2-3.
  • Kwiatkowski, D., Phillips, P. C. B., Schmidt, P., Shin, Y. "Testing the Null Hypothesis of Stationarity Against the Alternative of a Unit Root." Journal of Econometrics, 54(1-3), 1992.
