Threshold selection diagnostic of Suveges and Davison

The information matrix test (IMT), proposed by Suveges and Davison (2010), is based on the difference between the expected quadratic score and the second derivative of the log-likelihood. For each threshold u and gap K, the test statistic is asymptotically \(\chi^2\) distributed with one degree of freedom. The approximation is good for \(N>80\) and conservative for smaller sample sizes. The test assumes independence between gaps.
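As a small illustration of the \(\chi^2_1\) reference distribution, the sketch below converts a few made-up test statistics into approximate p-values with base R. The values in `stat` are purely illustrative and are not output from `thselect.sdinfo`.

```r
# Illustrative IMT statistics (hypothetical values, not package output)
stat <- c(0.05, 1.2, 3.84, 6.63)

# Upper-tail probability under a chi-square distribution with 1 degree of freedom
pval <- pchisq(stat, df = 1, lower.tail = FALSE)

# Small statistics give large p-values, i.e. no evidence of misspecification
round(pval, 3)
```

Large p-values indicate that the data are compatible with the limiting model at that (u, K) pair; small p-values suggest misspecification.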

Usage

thselect.sdinfo(xdat, thresh, quantile, plot = FALSE, kmax = 1)

Arguments

xdat: [vector] vector of observations
thresh: [vector] candidate thresholds
quantile: [vector] probability levels used to define the thresholds if thresh is missing
plot: [logical] should the graphical diagnostic be plotted?
kmax: [int] the largest K-gap under consideration for clusters

Value

an invisible list with elements

thresh: a vector of thresholds based on empirical quantiles at supplied levels
stat: a matrix of test statistics
pval: a matrix of approximate p-values (corresponding to probabilities under a \(\chi^2_1\) distribution)
mle: a matrix of maximum likelihood estimates for each given pair of thresholds and gaps
loglik: a matrix of log-likelihood values at the MLE for each given pair of threshold in thresh and gap \(K\) in \(0, \ldots, \mathrm{kmax}\)
quantile: quantile levels for thresholds, if supplied by the user
kmax: the largest gap number

Details

The implementation follows the procedure proposed in Suveges & Davison (2010), with the errata in the published version corrected. The maximum likelihood estimator is based on the limiting mixture distribution of the intervals between exceedances (an exponential with a point mass at zero). The condition \(D^{(K)}(u_n)\) should be checked by the user.
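To make the limiting mixture concrete, here is a minimal sketch of how K-gaps and the corresponding log-likelihood could be computed in base R. It assumes the usual form from Suveges and Davison (2010), in which a normalized K-gap equals zero with probability \(1-\theta\) and is otherwise exponential with rate \(\theta\). The helper names `kgaps` and `kgaps_loglik` are hypothetical and are not part of the package, and the maximization uses `optimize` rather than whatever the package does internally.

```r
# Hypothetical helper: K-gaps S = max(T - K, 0) from interexceedance times T
kgaps <- function(xdat, u, K = 1) {
  exc_times <- which(xdat > u)   # time indices of threshold exceedances
  inter <- diff(exc_times)       # interexceedance times
  pmax(inter - K, 0)
}

# Hypothetical helper: log-likelihood of the limiting mixture,
# with q the estimated probability of exceeding the threshold
kgaps_loglik <- function(theta, gaps, q) {
  n_zero <- sum(gaps == 0)       # gaps within clusters (point mass at zero)
  n_pos <- sum(gaps > 0)         # gaps between clusters (exponential part)
  n_zero * log(1 - theta) + 2 * n_pos * log(theta) - theta * q * sum(gaps)
}

# Usage on simulated independent data (illustrative only)
set.seed(2010)
xdat <- rexp(2000)
u <- quantile(xdat, 0.9)
gaps <- kgaps(xdat, u, K = 1)
q <- mean(xdat > u)
fit <- optimize(kgaps_loglik, interval = c(1e-6, 1 - 1e-6),
                gaps = gaps, q = q, maximum = TRUE)
fit$maximum  # estimate of theta
```

The search interval stops short of 0 and 1 so that the logarithms stay finite.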

Fukutome et al. (2015) propose an ad hoc automated procedure:

1. Calculate the interexceedance times for each K-gap and each threshold, along with the number of clusters.
2. Select the (u, K) pairs for which the IMT statistic is below 0.05 (corresponding to a p-value above 0.82).
3. Among those, select the pair (u, K) for which the number of clusters is the largest.
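The selection steps above can be sketched in a few lines of base R. The function `select_uk` and the matrices `stat` and `nclust` are hypothetical. The layout (rows for thresholds, columns for gaps) is an assumption made for this illustration, and the cluster counts would need to be computed separately since they are not among the listed return values.

```r
# Hypothetical helper: pick the (threshold, gap) pair with the most clusters
# among pairs whose IMT statistic falls below the cutoff
select_uk <- function(stat, nclust, cutoff = 0.05) {
  ok <- !is.na(stat) & stat < cutoff
  if (!any(ok)) {
    return(NULL)                 # no pair passes the screening step
  }
  nclust[!ok] <- -Inf            # exclude pairs that fail the test
  # Row and column index of the first pair attaining the maximum
  which(nclust == max(nclust), arr.ind = TRUE)[1, ]
}

# Illustrative inputs (made-up numbers): 2 thresholds by 2 gaps
stat <- matrix(c(0.01, 0.20, 0.03, 0.04), nrow = 2)
nclust <- matrix(c(50, 80, 60, 40), nrow = 2)
res <- select_uk(stat, nclust)
res
```

Ties are broken by taking the first maximizing pair in column-major order, which is a choice made for this sketch rather than part of the published procedure.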

References

Fukutome, Liniger and Suveges (2015), Automatic threshold and run parameter selection: a climatology for extreme hourly precipitation in Switzerland. Theoretical and Applied Climatology, 120(3), 403-416.

Suveges and Davison (2010), Model misspecification in peaks over threshold analysis. Annals of Applied Statistics, 4(1), 203-221.

White (1982), Maximum Likelihood Estimation of Misspecified Models. Econometrica, 50(1), 1-25.

Author

Leo Belzile

Examples

set.seed(2010) # make the simulated sample reproducible
thselect.sdinfo(
  xdat = rgp(n = 10000),
  quantile = seq(0.1, 0.9, length.out = 10),
  kmax = 3)