Solution 3

Exercise 3.1

Consider the Laplace family of distribution, \(\mathsf{La}(\nu, \tau)\), with density \[\begin{align*} g(x; \nu, \tau) = \frac{1}{2\tau} \exp\left(- \frac{|x-\nu|}{\tau}\right), \qquad \nu \in \mathbb{R}, \tau > 0 \end{align*}\] as a candidate distribution for rejection sampling from \(\mathsf{No}(0,1)\).

  1. Provide an inversion sampling algorithm to generate from \(\mathsf{La}(\nu, \tau)\).
  2. Can you use the proposal to generate from a standard Gaussian? for Student-\(t\) with 1 degree of freedom? Justify your answer.
  3. Consider as proposal a location-scale version of the Student-t with \(\nu=3\) degrees of freedom. Find the optimal location and scale parameters and the upper bound \(C\) for your choice.
  4. Use the accept-reject to simulate 1000 independent observations and compute the empirical acceptance rate.

Solution.

  1. The distribution function is \[\begin{align*} F(x) = \begin{cases} \frac{1}{2} \exp\left(\frac{x-\nu}{\tau}\right) & x \leq \nu\\ 1 - \frac{1}{2} \exp\left(-\frac{x-\nu}{\tau}\right) & x >\nu\\ \end{cases} \end{align*}\] and using the quantile transform, set \(X=\nu + \tau \log(2U)\) if \(U \leq 0.5\) and \(X=\nu - \tau\log(2-2U)\) if \(U > 0.5\) for \(U \sim \mathsf{U}(0,1)\).
  2. The Gaussian has lighter tail than the Laplace, so this won’t work. The Cauchy distribution would be a suitable candidate, albeit too heavy tailed.
  3. The optimal value for the location of the Student-\(t\) would be \(\nu\) (here zero for the standard Laplace). We compute the optimal scale via in the code below: \[ \mathrm{argmax}_{\sigma \in \mathbb{R}_{+}}\mathrm{argmin}_{x \in \mathbb{R}} \{\log f(x) - \log g(x; \sigma)\} \]
#' Laplace density
dlaplace <- function(x, loc = 0, scale = 1, log = FALSE){
 stopifnot(scale > 0)
 logdens <-  -log(2*scale) - abs(x-loc)/scale
 if(log){
 return(logdens)
 } else{
 return(exp(logdens))
 }
}
dstudent <- function(x, loc = 0, scale = 1, df = 1, log = FALSE){
  logdens <- -log(scale) + dt(x = (x - loc)/scale, df = df, log = TRUE)
   if(log){
    return(logdens)
   } else{
   return(exp(logdens))
   }
}
# For each value of the scale sigma,
# find the minimum value of x (typically at zero)
opt <- optimize(f = function(sigma){
  optimize(f = function(x){
  dlaplace(x, log = TRUE) - 
    dstudent(x, scale = sigma, df = 3, log = TRUE)}, 
  maximum = TRUE, 
  interval = c(-100, 100))$objective
}, interval = c(0.1,10))
(C <- exp(opt$objective))
[1] 1.261368
(sigma <- opt$minimum)
[1] 0.9272381
# Simulate from accept-reject
ntrials <- 1.1*C*1000
# Simulate from location-scale student
candidate <- sigma*rt(n = ntrials, df = 3)
# Compute log of acceptance rate
logR <- dlaplace(candidate, log = TRUE) - 
  dstudent(candidate, scale = sigma, df = 3, log = TRUE) 
samp <- candidate[logR >= log(C) + -rexp(ntrials)]
# Monte Carlo estimator of the acceptance rate
ntrials/length(samp)
[1] 1.237738
# Plot density
library(ggplot2)
ggplot(data = data.frame(x = samp),
       mapping = aes(x = x)) +
  geom_density() +
  theme_classic()

  1. The Monte Carlo acceptance rate is 1.24, compared with the analytical bound found via numerical optimization of 1.26, to two significant digits.

Exercise 3.2

We revisit Exercise 1.3, which used a half-Cauchy prior for the exponential waiting time of buses.

The ratio-of-uniform method, implemented in the rust R package, can be used to simulate independent draws from the posterior of the rate \(\lambda\). The following code produces

nobs <- 10L # number of observations
ybar <- 8   # average waiting time
B <- 1000L  # number of draws
# Un-normalized log posterior: scaled log likelihood + log prior
upost <- function(x){ 
  dgamma(x = x, shape = nobs + 1L, rate = nobs*ybar, log = TRUE) +
    log(2) + dt(x = x, df = 1, log = TRUE)}
post_samp <- rust::ru(logf = upost, 
                      n = B, 
                      d = 1,  # dimension of parameter (scalar)
                      init = nobs/ybar)$sim_vals # initial value of mode

Estimate using the Monte Carlo sample:

  1. the probability that the waiting time is between 3 and 15 minutes
  2. the average waiting time
  3. the standard deviation of the waiting time

Solution.

# Lambda is the reciprocal mean (1/minute)
times <- 1/post_samp
mean(times > 3 & times < 15)
[1] 0.983
mean(times)
[1] 8.019044
sd(times)
[1] 2.718529

Next, implement a random walk Metropolis–Hastings algorithm to sample draws from the posterior and re-estimate the quantities. Compare the values.

B <- 1e4L
curr <- 8/10 # prior mean
chains <- numeric(B) # container
sd_prop <- 0.1
for(b in seq_len(B)){
  prop <- rnorm(n = 1, mean = curr, sd = sd_prop)
  if(upost(prop) - upost(curr) > -rexp(1)){
    curr <- prop
  }
  chains[b] <- curr
}
# Discard burn-in
times <- 1/chains[-(1:100)]
mean(times > 3 & times < 15)
[1] 0.9759596
mean(times)
[1] 8.108757
sd(times)
[1] 2.66034
# Summary of MCMC
mcmc <- coda::mcmc(chains)
summary(mcmc)

Iterations = 1:10000
Thinning interval = 1 
Number of chains = 1 
Sample size per chain = 10000 

1. Empirical mean and standard deviation for each variable,
   plus standard error of the mean:

          Mean             SD       Naive SE Time-series SE 
     0.1355660      0.0421769      0.0004218      0.0010339 

2. Quantiles for each variable:

   2.5%     25%     50%     75%   97.5% 
0.06707 0.10729 0.13195 0.15895 0.22644 

Exercise 3.3

Repeat the simulations in Example 3.6, this time with a parametrization in terms of log rates \(\lambda_i\) \((i=1,2)\), with the same priors. Use a Metropolis–Hastings algorithm with a Gaussian random walk proposal, updating parameters one at a time. Run four chains in parallel.

  1. Tune the variance to reach an approximate acceptance rate of 0.44.
  2. Produce diagnostic plots (scatterplots of observations, marginal density plots, trace plots and correlograms). See bayesplot or coda. Comment on the convergence and mixing of your Markov chain Monte Carlo.
  3. Report summary statistics of the chains.

Solution. The only thing we need to do is change the parametrization, and add tuning of the variance.

data(upworthy_question, package = "hecbayes")
# Compute sufficient statistics
data <- upworthy_question |>
  dplyr::group_by(question) |>
  dplyr::summarize(ntot = sum(impressions),
                   y = sum(clicks))
# Code log posterior as sum of log likelihood and log prior
loglik <- function(par, counts = data$y, offset = data$ntot, ...){
  lambda <- exp(c(par[1] + log(offset[1]), par[2] + log(offset[2])))
 sum(dpois(x = counts, lambda = lambda, log = TRUE))
}
logprior <- function(par, ...){
  sum(dnorm(x = par, mean = log(0.01), sd = 1.5, log = TRUE))
}
logpost <- function(par, ...){
  loglik(par, ...) + logprior(par, ...)
}
# Compute maximum a posteriori (MAP)
map <- optim(
  par = rep(-4, 2),
  fn = logpost,
  control = list(fnscale = -1),
  offset = data$ntot,
  counts = data$y,
  hessian = TRUE)
# Use MAP as starting value
cur <- map$par
# Compute logpost_cur - we can keep track of this to reduce calculations
logpost_cur <- logpost(cur)
# Proposal covariance
cov_map <- -2*solve(map$hessian)
chol <- chol(cov_map)

set.seed(80601)
niter <- 1e4L
nchains <- 4L
npar <- 2L
chain <- array(0, dim = c(niter, nchains, npar))
naccept <- 0L
for(j in 1:nchains){
  for(i in seq_len(niter)){
    # Multivariate normal proposal - symmetric random walk
    prop <- chol %*% rnorm(n = 2)/1.05 + cur
    logpost_prop <- logpost(prop)
    # Compute acceptance ratio (no q because the ratio is 1)
    logR <- logpost_prop - logpost_cur
    if(logR > -rexp(1)){
      cur <- prop
      logpost_cur <- logpost_prop
      naccept <- naccept + 1L
    }
    chain[i,j,] <- cur
  }
}
# Acceptance rate
naccept/(nchains * niter)
[1] 0.44355
# Create some summary graphs
library(bayesplot)
# Name chains so that the graphs can be labelled
dimnames(chain)[[3]] <- c("lambda[1]","lambda[2]")

# Trace plots, correlograms, density plots and bivariate scatterplot
bayesplot::mcmc_trace(chain)

bayesplot::mcmc_acf_bar(chain)
Warning: The `facets` argument of `facet_grid()` is deprecated as of ggplot2 2.2.0.
ℹ Please use the `rows` argument instead.
ℹ The deprecated feature was likely used in the bayesplot package.
  Please report the issue at <https://github.com/stan-dev/bayesplot/issues/>.

bayesplot::mcmc_dens(chain)

bayesplot::mcmc_scatter(chain)

# Posterior summaries with coda
# Need to create a list of mcmc objects...
mcmc_list <- coda::as.mcmc.list(
    lapply(seq_len(nchains),
           function(ch) coda::as.mcmc(chain[, ch, ])))
summary(mcmc_list)

Iterations = 1:10000
Thinning interval = 1 
Number of chains = 4 
Sample size per chain = 10000 

1. Empirical mean and standard deviation for each variable,
   plus standard error of the mean:

            Mean       SD  Naive SE Time-series SE
lambda[1] -4.513 0.001730 8.648e-06      2.384e-05
lambda[2] -4.442 0.001192 5.962e-06      1.670e-05

2. Quantiles for each variable:

            2.5%    25%    50%    75%  97.5%
lambda[1] -4.516 -4.514 -4.513 -4.511 -4.509
lambda[2] -4.444 -4.443 -4.442 -4.441 -4.440