Tutorial on Statistical Computing for Extremes with R

Regression models for nonstationary extremes

Léo Belzile

2023-06-30

Nonstationarity

There are various forms of nonstationarity, including

trends,
time-varying variance,
seasonality,
covariate effects.

No general theory

Solution 1: preprocessing

detrend data to approximate stationary.
- for financial time series, fit ARMA-GARCH model
- use Box-Cox transformation with regression model
analyze residuals \(R_i = Y_i - \widehat{f}(\mathbf{X}_i)\), where the postulated model is such that \(\mathsf{E}(Y_i) = f(\mathbf{X}_i)\).

Benefit: more data are available to estimate trends.

Estimating probability of rare events

\[\begin{align*} &\Pr(Y_i > y \mid \mathbf{X}_i) \\\quad &= \Pr\{R_i > y - \widehat{f}(\mathbf{X}_i)\mid \mathbf{X}_i\} \\\quad & = \Pr\{R_i > y - \widehat{f}(\mathbf{X}_i)\mid \mathbf{X}_i, R_i > u\}\Pr(R_i > u) \\& \qquad + \Pr\{R_i > y - \widehat{f}(\mathbf{X}_i)\mid \mathbf{X}_i, R_i\leq u\}\Pr(R_i \leq u) \end{align*}\] where the first term of the last line can be estimated using a generalized Pareto and the latter empirically.

Solution 2: hierarchical models

We can also incorporate parameters in the parameters of the extreme value distribution.

Model \(g_{\mu}(\mu; \mathbf{X})=f_{\mu}(\mathbf{X}; \boldsymbol{\beta}_{\mu})\), \(g_{\sigma}(\sigma; \mathbf{X})=f_{\sigma}(\mathbf{X}; \boldsymbol{\beta}_{\sigma})\)
Link functions \(g\), functions of covariates \(f\)
Typically keep shape constant.

For explanatories \(\mathbf{X}\), fit regression models of the form \[\begin{align*} f(\mathbf{X}) = \beta_0 + \beta_1 \mathrm{X}_1 + \cdots \beta_p \mathrm{X}_p, \end{align*}\] and estimate parameters by maximum likelihood.

Generalized additive models

Rather than linear effects or changepoints (if \(\mathrm{X}_j\) is binary), we could consider smooths

\[\begin{align*} f(\mathrm{X}_j) = \sum_{k=1}^K \beta_k b_k(\mathrm{X}_j) \end{align*}\] where \(b_k\) are (compactly-supported) basis functions.

Scale input first! More numerically stable.

Parsimony is key

There are few exceedances or extremes…
The support restriction translates into up to \(n\) inequality constraints: difficult to optimize.

Take home message: keep models simple!

Penalized regression

Penalization to enforce sparsity or regularize estimation.

For generalized additive models (GAM), penalize square of second derivative to control the wiggliness of the function.
Smooth matrix \(\mathbf{S}\) with entries \(S_{ij} = \int b_i''(x) b_j''(x)\mathrm{d} x\), so the penalty is \(\lambda \boldsymbol{\beta}^\top \mathbf{S}\boldsymbol{\beta}\) for \(\lambda > 0\).
These correspond to improper (i.e., rank-deficient) Gaussian priors.

Tuning parameter

Maximization over \[\begin{align*} \max_{\lambda \in \mathbb{R}_K^{+}} \max_{\boldsymbol{\beta}} \left\{\ell_{\boldsymbol{\lambda}}(\boldsymbol{\beta}) - \frac{1}{2} \boldsymbol{\beta}^\top\mathbf{S}_{\boldsymbol{\lambda}}\boldsymbol{\beta}\right\} \end{align*}\] where \(\mathbf{S}_{\boldsymbol{\lambda}} = \sum_{k=1}^K \lambda_k \mathbf{S}_k\).

Optimal value of \(\boldsymbol{\lambda}\) obtained by maximizing the marginal likelihood, integrating out \(\boldsymbol{\beta}\) using Laplace’s method (Wood et al., 2016).

A word of caution

With time-varying covariates, we need forecasts to compute risk measures in future.
Model often fitted only at specific covariate values (careful at extrapolation beyond range).
Rather than linear trend in time, use covariates that incorporate said trends but are more natural (e.g., climate model output).

Return levels under nonstationarity

In nonstationary models, risk measures of interest are defined conditionally on the value of covariates

For example, the \(1-p\) conditional return level is (Eastoe & Tawn, 2009) \[\begin{align*} \Pr(Y_t > y \mid \mathbf{X}_t =\boldsymbol{x}_t) = p \end{align*}\]

Unconditional return levels

These can be obtained by averaging out over the distribution of covariates that are employed in the model.

\[\begin{align*} \Pr(Y_t > y)=\int_{\mathcal{X}} \Pr(Y_t > y \mid \mathbf{X}_t =\boldsymbol{x}_t) \mathrm{d} P(\boldsymbol{x}_t)=p. \end{align*}\]

May require information about future distribution of covariates if these are time-varying.

As such, return levels may be meaningless quantities.

Threshold stability

The generalized Pareto model with varying scale and shape is not threshold-stable unless, for any \(v>u\), \[\begin{align*} \sigma_v(\boldsymbol{x}_t) = \sigma_u(\boldsymbol{x}_t) + (v-u) \xi(\boldsymbol{x}_t) \end{align*}\]

Restrictive! For constant \(\xi\), need \(\sigma\) linear or constant (log scale) (Eastoe & Tawn, 2009).

Using the inhomogeneous Poisson point process representation avoids these problems.

For nonstationary threshold models \(u(\mathbf{X})\), see Section 3.2.2 of Northrop & Jonathan (2011).

Software

Function evgam from the eponymous package (Youngman, 2022) builds on the mgcv package (Wood, 2017).

The setup is evgam(formula, data, family, ...), where

family is the character string for the extreme value distribution (gev, gpd, rlarg and ald for asymmetric Laplace, used in quantile regression),
formula is a list of formula for parameters (in the order location, scale, shape).

Formulas and smooth

Use s for smooths and te for tensor products of smooths (interactions)

k controls the number of breakpoints
bs controls the basis function, e.g., thin-plate splines (tp), cubic regression splines (cr), cyclic cubic spline (cc).
fx: control whether fixed degrees of freedom for regression spline or else penalized regression spline (default, FALSE)

Data setup

library(evgam)
library(lubridate)
data(frwind, package = "mev")
lyon <- with(frwind,
             xts::xts(x = S2, order.by = date))
ymax <- xts::apply.yearly(lyon, max)
ymax <- ymax[-length(ymax)] # Remove 2023

Example with `evgam`

opt_gev <- evgam::evgam(
  data = data.frame(
    year = year(ymax),
    ymax = ymax),
  formula = list(ymax ~ s(year, bs = "cr"),
                 ~ 1, 
                 ~ 1),
  family = "gev")

`evgam` output

## Summary with coefficients
summary(opt_gev)


** Parametric terms **

location
            Estimate Std. Error t value Pr(>|t|)
(Intercept)    36.55       0.67   54.63   <2e-16

logscale
            Estimate Std. Error t value Pr(>|t|)
(Intercept)     1.36       0.12   10.96   <2e-16

shape
            Estimate Std. Error t value Pr(>|t|)
(Intercept)    -0.12       0.14   -0.88     0.19

** Smooth terms **

location
         edf max.df Chi.sq Pr(>|t|)
s(year) 2.34      9   5.22    0.101

Fitted splines

plot(opt_gev)

Figure 1: Fitted spline for location parameter of generalized extreme value distribution as a function of day of year (scaled to unit interval).

Plot and generalized analysis of deviance suggests constant location.

Estimated quantiles

For given \(p\), solve nonlinear equation \[\prod_{i=1}^n F^N(z; \boldsymbol{\theta}_i, \mathbf{X}_i) = p\] using qev to give unconditional quantile.

Return levels (direct method)

# More straightforwardly
mean(simulate(opt_gev, 
              nsim = 1000L, 
              type = "quantile", 
              probs = 1-1/50))

[1] 49.14449

These return levels are meaningless, since time increases (but would lead to extrapolate beyond range of observed time)…

Quantile regression

We can work the asymmetric Laplace distribution to build a working likelihood for quantile regression at probability level \(\tau\).

The latter has density function \[\begin{align*} f(y; \mu, \sigma) = \frac{\tau(1-\tau)}{\sigma} \exp\left\{-\rho_\tau \left(\frac{y-\mu}{\sigma}\right)\right\}, \end{align*}\] where \(\rho_\tau(y) = y(\tau-\mathrm{I}_{y <0})\) is the check function.

Time-varying threshold with quantile regression

qreg <- evgam::evgam(
  # Cubic cyclic splines
  formula = list(wind ~ s(tday, bs = "cc"), 
                 ~ s(tday, bs = "cc")), 
  data = data.frame(wind = as.numeric(lyon),
                    tday = yday(lyon)),
  family = "ald", 
  ald.args = list(tau = 0.95))

Temporal trend

plot(qreg)

Extract exceedances

u <- fitted(qreg)[,'location']
df <- data.frame(exc = as.numeric(lyon - u),
                  yday = yday(lyon),
                 u = u)
df <- df[lyon >u, ]

opt_gpd <- evgam::evgam(
  formula = list(exc ~ 1, ~1),
  data = df,
  family = "gpd")
post_sim <- simulate(opt_gpd, 
                     type = "response")

Return levels

# Number of exceedances per year
npy <- diff(range(date(lyon))) / nrow(df)
retlev_gpd <- qev(
  p = 1-1/50,
  m = as.numeric(npy),
  loc = df$u,
  scale = post_sim$scale,
  shape = post_sim$shape,
  family = "gpd")
mean(retlev_gpd)

[1] 49.0457

References

Eastoe, E. F., & Tawn, J. A. (2009). Modelling non-stationary extremes with application to surface level ozone. Journal of the Royal Statistical Society: Series C (Applied Statistics), 58(1), 25–45. https://doi.org/10.1111/j.1467-9876.2008.00638.x

Northrop, P. J., & Jonathan, P. (2011). Threshold modelling of spatially dependent non-stationary extremes with application to hurricane-induced wave heights. Environmetrics, 22(7), 799–809. https://doi.org/10.1002/env.1106

Wood, S. N. (2017). Generalized additive models: An introduction with R (2nd ed.). Chapman; Hall/CRC.

Wood, S. N., Pya, N., & Säfken, B. (2016). Smoothing parameter and model selection for general smooth models. Journal of the American Statistical Association, 111(516), 1548–1563. https://doi.org/10.1080/01621459.2016.1180986

Youngman, B. D. (2022). evgam: An R package for generalized additive extreme value models. Journal of Statistical Software, 103(1), 1–26. https://doi.org/10.18637/jss.v103.i03