Numerical optimization of the generalized Pareto distribution for
data exceeding threshold
.
This function returns an object of class mev_gpd
, with default methods for printing and quantile-quantile plots.
Usage
fit.gpd(
xdat,
threshold = 0,
method = "Grimshaw",
show = FALSE,
MCMC = NULL,
k = 4,
tol = 1e-08,
fpar = NULL,
warnSE = FALSE
)
Arguments
- xdat
a numeric vector of data to be fitted.
- threshold
the chosen threshold.
- method
the method to be used. See Details. Can be abbreviated.
- show
logical; if
TRUE
(the default), print details of the fit.- MCMC
NULL
for frequentist estimates, otherwise a boolean or a list with parameters passed. IfTRUE
, runs a Metropolis-Hastings sampler to get posterior mean estimates. Can be used to pass argumentsniter
,burnin
andthin
to the sampler as a list.- k
bound on the influence function (
method = "obre"
); the constantk
is a robustness parameter (higher bounds are more efficient, low bounds are more robust). Default to 4, must be larger than \(\sqrt{2}\).- tol
numerical tolerance for OBRE weights iterations (
method = "obre"
). Default to1e-8
.- fpar
a named list with fixed parameters, either
scale
orshape
- warnSE
logical; if
TRUE
, a warning is printed if the standard errors cannot be returned from the observed information matrix when the shape is less than -0.5.
Value
If method
is neither 'zs'
nor 'zhang'
, a list containing the following components:
estimate
a vector containing thescale
andshape
parameters (optimized and fixed).std.err
a vector containing the standard errors. Formethod = "obre"
, these are Huber's robust standard errors.vcov
the variance covariance matrix, obtained as the numerical inverse of the observed information matrix. Formethod = "obre"
, this is the sandwich Godambe matrix inverse.threshold
the threshold.method
the method used to fit the parameter. See details.nllh
the negative log-likelihood evaluated at the parameterestimate
.nat
number of points lying above the threshold.pat
proportion of points lying above the threshold.convergence
components taken from the list returned byoptim
. Values other than0
indicate that the algorithm likely did not converge (in particular 1 and 50).counts
components taken from the list returned byoptim
.exceedances
excess over the threshold.
Additionally, if method = "obre"
, a vector of OBRE weights
.
Otherwise, a list containing
threshold
the threshold.method
the method used to fit the parameter. See Details.nat
number of points lying above the threshold.pat
proportion of points lying above the threshold.approx.mean
a vector containing containing the approximate posterior mean estimates.
and in addition if MCMC is neither FALSE
, nor NULL
post.mean
a vector containing the posterior mean estimates.post.se
a vector containing the posterior standard error estimates.accept.rate
proportion of points lying above the threshold.niter
length of resulting Markov Chainburnin
amount of discarded iterations at start, capped at 10000.thin
thinning integer parameter describing
Details
The default method is 'Grimshaw'
, which maximizes the profile likelihood for the ratio scale/shape. Other options include 'obre'
for optimal \(B\)-robust estimator of the parameter of Dupuis (1998), vanilla maximization of the log-likelihood using constrained optimization routine 'auglag'
, 1-dimensional optimization of the profile likelihood using nlm
and optim
. Method 'ismev'
performs the two-dimensional optimization routine gpd.fit
from the ismev
library, with in addition the algebraic gradient.
The approximate Bayesian methods ('zs'
and 'zhang'
) are extracted respectively from Zhang and Stephens (2009) and Zhang (2010) and consists of a approximate posterior mean calculated via importance
sampling assuming a GPD prior is placed on the parameter of the profile likelihood.
Note
Some of the internal functions (which are hidden from the user) allow for modelling of the parameters using covariates. This is not currently implemented within gp.fit
, but users can call internal functions should they wish to use these features.
References
Davison, A.C. (1984). Modelling excesses over high thresholds, with an application, in Statistical extremes and applications, J. Tiago de Oliveira (editor), D. Reidel Publishing Co., 461--482.
Grimshaw, S.D. (1993). Computing Maximum Likelihood Estimates for the Generalized Pareto Distribution, Technometrics, 35(2), 185--191.
Northrop, P.J. and C. L. Coleman (2014). Improved threshold diagnostic plots for extreme value analyses, Extremes, 17(2), 289--303.
Zhang, J. (2010). Improving on estimation for the generalized Pareto distribution, Technometrics 52(3), 335--339.
Zhang, J. and M. A. Stephens (2009). A new and efficient estimation method for the generalized Pareto distribution. Technometrics 51(3), 316--325.
Dupuis, D.J. (1998). Exceedances over High Thresholds: A Guide to Threshold Selection, Extremes, 1(3), 251--261.
Author
Scott D. Grimshaw for the Grimshaw
option. Paul J. Northrop and Claire L. Coleman for the methods optim
, nlm
and ismev
.
J. Zhang and Michael A. Stephens (2009) and Zhang (2010) for the zs
and zhang
approximate methods and L. Belzile for methods auglag
and obre
, the wrapper and MCMC samplers.
If show = TRUE
, the optimal \(B\) robust estimated weights for the largest observations are printed alongside with the
\(p\)-value of the latter, obtained from the empirical distribution of the weights. This diagnostic can be used to guide threshold selection:
small weights for the \(r\)-largest order statistics indicate that the robust fit is driven by the lower tail
and that the threshold should perhaps be increased.
Examples
data(eskrain)
fit.gpd(eskrain, threshold = 35, method = 'Grimshaw', show = TRUE)
#> Method: Grimshaw
#> Log-likelihood: -174.3083
#>
#> Threshold: 35
#> Number Above: 54
#> Proportion Above: 0.5806
#>
#> Estimates
#> scale shape
#> 9.18077 0.01082
#>
#> Standard Errors
#> scale shape
#> 1.6187 0.1121
#>
#> Optimization Information
#> Convergence: successful
fit.gpd(eskrain, threshold = 30, method = 'zs', show = TRUE)
#>
#> Method: Zhang and Stephens
#>
#> Threshold: 30
#> Number Above: 93
#> Proportion Above: 1
#>
#> Approximate posterior mean estimates
#> scale shape
#> 8.9444 0.0422