9 Deterministic approximations
So far, we have focused on stochastic approximations of integrals. In very large models, Markov chain Monte Carlo methods suffer from the curse of dimensionality and it is sometimes useful to resort to cheaper approximations. We begin this review by looking at the asymptotic Gaussian limiting distribution of the maximum a posteriori estimator, the Laplace approximation for integrals (Tierney and Kadane 1986), and its applications to model comparison (Raftery 1995) and to evaluation of the marginal likelihood. We also discuss integrated nested Laplace approximations (Rue, Martino, and Chopin 2009; Wood 2019), used in hierarchical models with Gaussian components to obtain approximations to the marginal distributions. This material also borrows from Section 8.2 and Appendix C.2.2 of Held and Bové (2020).
We make use of Landau's notation to describe the growth rate of some functions: we write $f(n) = \mathcal{O}\{g(n)\}$ if the ratio $f(n)/g(n)$ remains bounded as $n \to \infty$, and $f(n) = \mathrm{o}\{g(n)\}$ if $f(n)/g(n) \to 0$.
9.1 Laplace approximation and its applications
Proposition 9.1 (Laplace approximation for integrals) The Laplace approximation uses a Gaussian approximation to evaluate integrals of the form
The multivariate analog is similar, where now for an integral of the form
The Laplace approximation uses a Taylor series expansion to approximate the integrand, but since a density must be non-negative, it performs the approximation on the log scale and back-transforms the result. It is important to understand that we can replace
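For reference, the standard univariate statement (in generic notation, for a smooth function $h$ with a unique interior maximum at $\hat{x}$) reads
$$\int_a^b \exp\{n h(x)\}\,\mathrm{d}x \approx \left(\frac{2\pi}{n}\right)^{1/2}\{-h''(\hat{x})\}^{-1/2}\exp\{n h(\hat{x})\},$$
while the multivariate analog over $\mathbb{R}^d$ is
$$\int_{\mathbb{R}^d} \exp\{n h(\boldsymbol{x})\}\,\mathrm{d}\boldsymbol{x} \approx \left(\frac{2\pi}{n}\right)^{d/2}|\mathbf{H}(\hat{\boldsymbol{x}})|^{-1/2}\exp\{n h(\hat{\boldsymbol{x}})\},$$
where $\mathbf{H}(\hat{\boldsymbol{x}})$ denotes the negative Hessian of $h$ evaluated at the maximizer $\hat{\boldsymbol{x}}$; the relative error of both approximations is $\mathcal{O}(n^{-1})$.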
Corollary 9.1 (Laplace approximation for marginal likelihood) Consider a simple random sample
If we use this approximation instead, we get
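In generic notation, with a $p$-dimensional parameter $\boldsymbol{\theta}$, maximum likelihood estimator $\hat{\boldsymbol{\theta}}$ and sample size $n$, the resulting approximation is
$$\log p(\boldsymbol{y}) = \log p(\boldsymbol{y} \mid \hat{\boldsymbol{\theta}}) - \frac{p}{2}\log n + \mathcal{O}(1) = -\frac{\mathsf{BIC}}{2} + \mathcal{O}(1),$$
so that, up to bounded terms, the model with the smallest BIC has the largest approximate marginal likelihood (Raftery 1995).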
Example 9.1 (Bayesian model averaging approximation) Consider the diabetes model from Park and Casella (2008). We fit various linear regression models, considering the best models of each size with at most the 10 predictors plus the intercept. In practice, we typically restrict attention to models whose BIC lies within some distance of the lowest BIC value, as the weights of the remaining models are otherwise negligible.
Most of the weight is on a handful of complex models, where the best fitting model only has around
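To make the weighting explicit, here is a minimal sketch (the helper `bic_weights` and the commented model list are hypothetical, not the full diabetes model search) of how BIC values are converted into approximate posterior model probabilities:

# Convert BIC values into approximate posterior model probabilities
bic_weights <- function(bic) {
  delta <- bic - min(bic)   # differences relative to the lowest-BIC model
  w <- exp(-0.5 * delta)    # unnormalized weights proportional to exp(-BIC/2)
  w / sum(w)                # normalize so that the weights sum to one
}
# Example usage with a handful of nested linear models for a data frame 'df':
# fits <- list(lm(y ~ 1, data = df), lm(y ~ x1, data = df), lm(y ~ x1 + x2, data = df))
# bic_weights(sapply(fits, BIC))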
Remark 9.1 (Parametrization for Laplace). Compared with sampling-based methods, the Laplace approximation requires optimization to find the maximum of the function. The Laplace approximation is not invariant to reparametrization: in practice, it is best to perform it on a scale where the likelihood is as close to quadratic as possible in
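To see why, note that for a monotone reparametrization $\vartheta = g(\theta)$ (generic notation), the target density picks up a Jacobian term,
$$p_{\vartheta}(\vartheta \mid \boldsymbol{y}) = p_{\theta}\{g^{-1}(\vartheta) \mid \boldsymbol{y}\}\left|\frac{\mathrm{d}}{\mathrm{d}\vartheta} g^{-1}(\vartheta)\right|,$$
so the quadratic expansion of the log density, and hence the Gaussian approximation, differs between the two scales.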
We can also use Laplace approximation to obtain a crude second-order approximation to the posterior. We suppose that the prior is proper.
We can Taylor expand the log prior and log density around their respective modes, say
In the case of a flat prior, the curvature is zero and the prior contribution vanishes altogether. If we now apply Proposition 8.1 to this unnormalized kernel, we find that the approximate posterior must be Gaussian with precision
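In generic notation, writing $\hat{\boldsymbol{\theta}}_{\mathrm{MAP}}$ for the posterior mode, the resulting approximation is
$$p(\boldsymbol{\theta} \mid \boldsymbol{y}) \approx \mathcal{N}_p\left[\hat{\boldsymbol{\theta}}_{\mathrm{MAP}},\, \left\{-\left.\frac{\partial^2 \log\{p(\boldsymbol{\theta})\,p(\boldsymbol{y}\mid\boldsymbol{\theta})\}}{\partial\boldsymbol{\theta}\partial\boldsymbol{\theta}^{\top}}\right|_{\boldsymbol{\theta}=\hat{\boldsymbol{\theta}}_{\mathrm{MAP}}}\right\}^{-1}\right],$$
that is, the precision matrix is the negative Hessian of the log prior plus log likelihood evaluated at the mode.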
Theorem 9.1 (Bernstein-von Mises theorem) Consider any estimator asymptotically equivalent to the maximum likelihood estimator and suppose that the prior is continuous and positive in a neighborhood of the maximum. Assume further that the regularity conditions for the maximum likelihood estimator hold. Then, in the limit as
The conclusion from this result is that, in large samples, inferences obtained from likelihood-based and Bayesian methods will be equivalent: credible intervals will also have asymptotically valid frequentist coverage.
We can use the statement by replacing the maximum likelihood estimator and the observed information matrix with variants thereof (
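In practice, the statement translates into the large-sample approximation (generic notation)
$$\boldsymbol{\theta} \mid \boldsymbol{y} \;\stackrel{\cdot}{\sim}\; \mathcal{N}_p\{\hat{\boldsymbol{\theta}}_n,\, j^{-1}(\hat{\boldsymbol{\theta}}_n)\},$$
where $\hat{\boldsymbol{\theta}}_n$ is the maximum likelihood estimator and $j(\cdot)$ the observed information matrix, either of which may be swapped for an asymptotically equivalent variant, such as the maximum a posteriori estimator and the negative Hessian of the log posterior.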
Example 9.2 (Gaussian approximations to the posterior) To assess the performance of the Laplace approximation, we consider an exponential likelihood
Let us now use the Laplace approximation to obtain an estimate of the marginal likelihood: because the model is conjugate, the true log marginal likelihood equals
$$\log p(\boldsymbol{y}) = \log\Gamma(n+a) - \log\Gamma(a) + a \log b - (n+a)\log(b + s), \qquad s = \sum_{i=1}^{n} y_i,$$
where $a$ and $b$ are the shape and rate parameters of the gamma prior.
# a, b: shape and rate parameters of the gamma prior (defined previously)
n <- length(waiting); s <- sum(waiting)
# Exact log marginal likelihood of the conjugate exponential-gamma model
log_marg_lik <- lgamma(n + a) - lgamma(a) + a * log(b) - (n + a) * log(b + s)
# Laplace approximation
map <- (n + a - 1) / (s + b)
logpost <- function(x){
  sum(dexp(waiting, rate = x, log = TRUE)) +
    dgamma(x, a, b, log = TRUE)
}
# Hessian evaluated at MAP
H <- -numDeriv::hessian(logpost, x = map)
log_marg_laplace <- 0.5 * log(2 * pi) - 0.5 * c(determinant(H)$modulus) + logpost(map)
For the sample of size
Proposition 9.2 (Posterior expectation using Laplace method) If we are interested in computing the posterior expectation of a positive real-valued functional
Example 9.3 (Posterior mean for the exponential likelihood) Consider the posterior mean
Simplification gives the approximation
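As a rough numerical check, the ratio-of-Laplace-approximations recipe of Proposition 9.2 can be coded directly; this minimal sketch reuses `logpost`, `map`, `H`, `n`, `a`, `b` and `s` from Example 9.2, and the names `lognum`, `map_num` and `H_num` are arbitrary choices for the illustration.

# Numerator integrand of the posterior mean E(lambda | y), on the log scale:
# log g(lambda) + log prior + log likelihood, with g(lambda) = lambda
lognum <- function(x){ log(x) + logpost(x) }
# Mode and negative Hessian of the numerator integrand
map_num <- optimize(lognum, interval = c(1e-8, 10 * map), maximum = TRUE)$maximum
H_num <- -numDeriv::hessian(lognum, x = map_num)
# Ratio of the two Laplace approximations; the (2*pi)^(1/2) factors cancel
post_mean_laplace <- exp(lognum(map_num) - 0.5 * c(determinant(H_num)$modulus) -
                           logpost(map) + 0.5 * c(determinant(H)$modulus))
# Compare with the exact posterior mean of the conjugate gamma posterior
c(laplace = post_mean_laplace, exact = (n + a) / (s + b))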
9.2 Integrated nested Laplace approximation
In many high-dimensional models, use of MCMC is prohibitively expensive, and fast yet accurate calculations are important. One class of models whose special structure is particularly amenable to deterministic approximations is that of hierarchical models with latent Gaussian components.
Consider a model with response
Write the marginal posterior
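In generic notation, with latent Gaussian field $\boldsymbol{x}$ and low-dimensional hyperparameter vector $\boldsymbol{\theta}$, the targets are the marginal posteriors
$$p(x_i \mid \boldsymbol{y}) = \int p(x_i \mid \boldsymbol{\theta}, \boldsymbol{y})\, p(\boldsymbol{\theta} \mid \boldsymbol{y})\,\mathrm{d}\boldsymbol{\theta}, \qquad p(\theta_j \mid \boldsymbol{y}) = \int p(\boldsymbol{\theta} \mid \boldsymbol{y})\,\mathrm{d}\boldsymbol{\theta}_{-j};$$
INLA builds deterministic approximations to $p(\boldsymbol{\theta} \mid \boldsymbol{y})$ and $p(x_i \mid \boldsymbol{\theta}, \boldsymbol{y})$, and combines them by numerical integration over a small set of $\boldsymbol{\theta}$ values.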
To obtain the approximation to the marginal posterior of the hyperparameters, the main steps are as follows (a small numerical sketch of the grid exploration is given after the list):
- finding the mode of the (unnormalized) log posterior of the hyperparameters using Newton's method, approximating the gradient and Hessian via finite differences;
- computing the negative Hessian at the mode to get an approximation to the covariance matrix of the hyperparameters, and using an eigendecomposition to get the principal directions;
- in each principal direction, considering drops in the log posterior as we move away from the mode and defining a coarse grid based on these, keeping only points where the difference relative to the mode is less than some numerical tolerance;
- retrieving the marginal by numerical integration using the central composite design outlined above. We can also avoid the integration altogether and use the approximation at the posterior mode of the hyperparameters.
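As a rough illustration of the grid-exploration step above (not the INLA implementation itself), the following sketch takes a log posterior function `lp` for a low-dimensional hyperparameter vector and returns the retained grid points; the function name, the tolerance `delta` and the step size are arbitrary choices.

explore_grid <- function(lp, init, delta = 2.5, step = 0.5, max_steps = 10) {
  # Find the mode of the log posterior (a Newton-type method could be used instead)
  opt <- optim(init, fn = lp, control = list(fnscale = -1))
  mode <- opt$par
  # Negative Hessian at the mode approximates the precision of the hyperparameters
  H <- -numDeriv::hessian(lp, mode)
  eig <- eigen(H)
  # Principal directions, scaled by the curvature in each direction
  dirs <- eig$vectors %*% diag(1 / sqrt(eig$values), nrow = length(init))
  # Coarse grid in the standardized coordinates
  zseq <- seq(-max_steps, max_steps) * step
  zgrid <- as.matrix(expand.grid(rep(list(zseq), length(init))))
  theta <- sweep(zgrid %*% t(dirs), 2, mode, FUN = "+")
  lpost <- apply(theta, 1, lp)
  # Keep only points whose log posterior is within 'delta' of the value at the mode
  keep <- lpost > lp(mode) - delta
  list(mode = mode, points = theta[keep, , drop = FALSE], logpost = lpost[keep])
}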
To approximate
While we could use the Laplace approximation
To reduce the computational costs, Rue, Martino, and Chopin (2009) propose to avoid the optimization and instead use the conditional distribution implied by the Gaussian approximation, with mean
The INLA R package provides an interface to fit models with Gaussian latent random effects. While the software is particularly popular for spatio-temporal applications using the SPDE approach, we revisit two examples in the sequel where we can exploit the Markov structure.
Example 9.4 (Stochastic volatility model with INLA) Financial returns exhibit stochastic volatility. The INLA documentation provides information about which default priors and hyperparameters are specified. We use a
library(INLA)
# Stochastic volatility model
data(exchangerate, package = "hecbayes")
# Compute response from raw spot exchange rates at noon
y <- 100 * diff(log(exchangerate$dexrate))
# 'y' is now a series of percentage of log daily differences
time <- seq_along(y)
data <- data.frame(y = y, time = time)
# Stochastic volatility model
# https://inla.r-inla-download.org/r-inla.org/doc/likelihood/stochvolgaussian.pdf
# The model uses a log link, and a (log)-gamma prior for the precision
f_stochvol <- formula(y ~ f(time, model = "ar1", param = list(prec = c(1, 0.001))))
mod_stochvol <- inla(f_stochvol, family = "stochvol", data = data)
# Obtain summary
summary <- summary(mod_stochvol)
# plot(mod_stochvol)
marg_prec <- mod_stochvol$marginals.hyperpar[[1]]
marg_phi <- mod_stochvol$marginals.hyperpar[[2]]
Figure 9.3 shows that the correlation
We can use the marginal density approximations to obtain quantiles for summaries of interest. The software also includes utilities to transform the parameters using the change of variable formula.
# Compute density, quantiles, etc. via inla.*marginal functions
## approximate 95% credible interval and marginal post median
INLA::inla.qmarginal(marg_phi, p = c(0.025, 0.5, 0.975))
[1] 0.9706630 0.9847944 0.9929106
# Change of variable to get variance from precision
marg_var <- INLA::inla.tmarginal(
  fun = function(x) { 1 / x },
  marginal = marg_prec)
INLA::inla.qmarginal(marg_var, p = c(0.025, 0.975))
[1] 0.2864908 0.7396801
# Posterior marginal mean and variance of phi
mom1 <- INLA::inla.emarginal(
  fun = function(x){ x },
  marginal = marg_phi)
mom2 <- INLA::inla.emarginal(
  fun = function(x){ x^2 },
  marginal = marg_phi)
c(mean = mom1, sd = sqrt(mom2 - mom1^2))
      mean         sd
0.98405272 0.00576251
Example 9.5 (Tokyo binomial time series) We revisit Example 7.6, but this time fit the model with INLA. We specify the mean model without intercept and fit a logistic regression, with a second-order cyclic random walk prior for the coefficients, and the default priors for the other parameters.
data(Tokyo, package = "INLA")
# Formula (removing intercept)
formula <- y ~ f(time, model = "rw2", cyclic = TRUE) - 1
mod <- INLA::inla(
  formula = formula,
  family = "binomial",
  Ntrials = n,
  data = Tokyo)
Figure 9.4 shows posterior summaries for the
If we want to obtain predictions, we need to augment the model matrix and set missing values for the response variable. These then get imputed alongside the other parameters.
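A minimal sketch of this for Example 9.5 (the choice of which rows to treat as missing, and the object names, are arbitrary):

# Set the last week of the Tokyo series to missing so that the corresponding
# probabilities are estimated alongside the other parameters
Tokyo_miss <- Tokyo
Tokyo_miss$y[360:366] <- NA
mod_pred <- INLA::inla(
  formula = formula,
  family = "binomial",
  Ntrials = n,
  data = Tokyo_miss,
  # compute posterior summaries of the fitted values; 'link = 1' applies the
  # link function of the (first) likelihood to the rows with missing responses
  control.predictor = list(compute = TRUE, link = 1))
# Posterior summaries of the predicted probabilities for the held-out days
mod_pred$summary.fitted.values[360:366, ]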