Regression models for nonstationary extremes
2023-06-30
There are various forms of nonstationarity, including
No general theory
Benefit: more data are available to estimate trends.
\[\begin{align*} &\Pr(Y_i > y \mid \mathbf{X}_i) \\\quad &= \Pr\{R_i > y - \widehat{f}(\mathbf{X}_i)\mid \mathbf{X}_i\} \\\quad & = \Pr\{R_i > y - \widehat{f}(\mathbf{X}_i)\mid \mathbf{X}_i, R_i > u\}\Pr(R_i > u) \\& \qquad + \Pr\{R_i > y - \widehat{f}(\mathbf{X}_i)\mid \mathbf{X}_i, R_i\leq u\}\Pr(R_i \leq u) \end{align*}\] where the first term of the last line can be estimated using a generalized Pareto and the latter empirically.
We can also incorporate parameters in the parameters of the extreme value distribution.
For explanatories \(\mathbf{X}\), fit regression models of the form \[\begin{align*} f(\mathbf{X}) = \beta_0 + \beta_1 \mathrm{X}_1 + \cdots \beta_p \mathrm{X}_p, \end{align*}\] and estimate parameters by maximum likelihood.
Rather than linear effects or changepoints (if \(\mathrm{X}_j\) is binary), we could consider smooths
\[\begin{align*} f(\mathrm{X}_j) = \sum_{k=1}^K \beta_k b_k(\mathrm{X}_j) \end{align*}\] where \(b_k\) are (compactly-supported) basis functions.
Scale input first! More numerically stable.
Take home message: keep models simple!
Penalization to enforce sparsity or regularize estimation.
Maximization over \[\begin{align*} \max_{\lambda \in \mathbb{R}_K^{+}} \max_{\boldsymbol{\beta}} \left\{\ell_{\boldsymbol{\lambda}}(\boldsymbol{\beta}) - \frac{1}{2} \boldsymbol{\beta}^\top\mathbf{S}_{\boldsymbol{\lambda}}\boldsymbol{\beta}\right\} \end{align*}\] where \(\mathbf{S}_{\boldsymbol{\lambda}} = \sum_{k=1}^K \lambda_k \mathbf{S}_k\).
In nonstationary models, risk measures of interest are defined conditionally on the value of covariates
For example, the \(1-p\) conditional return level is (Eastoe & Tawn, 2009) \[\begin{align*} \Pr(Y_t > y \mid \mathbf{X}_t =\boldsymbol{x}_t) = p \end{align*}\]
These can be obtained by averaging out over the distribution of covariates that are employed in the model.
\[\begin{align*} \Pr(Y_t > y)=\int_{\mathcal{X}} \Pr(Y_t > y \mid \mathbf{X}_t =\boldsymbol{x}_t) \mathrm{d} P(\boldsymbol{x}_t)=p. \end{align*}\]
May require information about future distribution of covariates if these are time-varying.
As such, return levels may be meaningless quantities.
The generalized Pareto model with varying scale and shape is not threshold-stable unless, for any \(v>u\), \[\begin{align*} \sigma_v(\boldsymbol{x}_t) = \sigma_u(\boldsymbol{x}_t) + (v-u) \xi(\boldsymbol{x}_t) \end{align*}\]
Restrictive! For constant \(\xi\), need \(\sigma\) linear or constant (log scale) (Eastoe & Tawn, 2009).
Using the inhomogeneous Poisson point process representation avoids these problems.
For nonstationary threshold models \(u(\mathbf{X})\), see Section 3.2.2 of Northrop & Jonathan (2011).
Function evgam
from the eponymous package (Youngman, 2022) builds on the mgcv
package (Wood, 2017).
The setup is evgam(formula, data, family, ...)
, where
family
is the character string for the extreme value distribution (gev
, gpd
, rlarg
and ald
for asymmetric Laplace, used in quantile regression),formula
is a list of formula for parameters (in the order location, scale, shape).Use s
for smooths and te
for tensor products of smooths (interactions)
k
controls the number of breakpointsbs
controls the basis function, e.g., thin-plate splines (tp
), cubic regression splines (cr
), cyclic cubic spline (cc
).fx
: control whether fixed degrees of freedom for regression spline or else penalized regression spline (default, FALSE
)evgam
evgam
output
** Parametric terms **
location
Estimate Std. Error t value Pr(>|t|)
(Intercept) 36.55 0.67 54.63 <2e-16
logscale
Estimate Std. Error t value Pr(>|t|)
(Intercept) 1.36 0.12 10.96 <2e-16
shape
Estimate Std. Error t value Pr(>|t|)
(Intercept) -0.12 0.14 -0.88 0.19
** Smooth terms **
location
edf max.df Chi.sq Pr(>|t|)
s(year) 2.34 9 5.22 0.101
Plot and generalized analysis of deviance suggests constant location.
For given \(p\), solve nonlinear equation \[\prod_{i=1}^n F^N(z; \boldsymbol{\theta}_i, \mathbf{X}_i) = p\] using qev
to give unconditional quantile.
[1] 49.14449
These return levels are meaningless, since time increases (but would lead to extrapolate beyond range of observed time)…
We can work the asymmetric Laplace distribution to build a working likelihood for quantile regression at probability level \(\tau\).
The latter has density function \[\begin{align*} f(y; \mu, \sigma) = \frac{\tau(1-\tau)}{\sigma} \exp\left\{-\rho_\tau \left(\frac{y-\mu}{\sigma}\right)\right\}, \end{align*}\] where \(\rho_\tau(y) = y(\tau-\mathrm{I}_{y <0})\) is the check function.
evgam
: An R package for generalized additive extreme value models. Journal of Statistical Software, 103(1), 1–26. https://doi.org/10.18637/jss.v103.i03