Consider the Laplace family of distributions, \(\mathsf{La}(\nu, \tau)\), with density \[\begin{align*}
g(x; \nu, \tau) = \frac{1}{2\tau} \exp\left(- \frac{|x-\nu|}{\tau}\right), \qquad \nu \in \mathbb{R}, \tau > 0
\end{align*}\] as a candidate distribution for rejection sampling from \(\mathsf{No}(0,1)\).
Provide an inversion sampling algorithm to generate from \(\mathsf{La}(\nu, \tau)\).
Can you use the proposal to generate from a standard Gaussian? From a Student-\(t\) with 1 degree of freedom? Justify your answer.
Consider as proposal a location-scale version of the Student-\(t\) with \(\nu=3\) degrees of freedom. Find the optimal location and scale parameters and the upper bound \(C\) for your choice.
Use the accept-reject algorithm to simulate 1000 independent observations and compute the empirical acceptance rate.
Solution.
The distribution function is \[\begin{align*}
F(x) = \begin{cases}
\frac{1}{2} \exp\left(\frac{x-\nu}{\tau}\right) & x \leq \nu\\
1 - \frac{1}{2} \exp\left(-\frac{x-\nu}{\tau}\right) & x >\nu\\
\end{cases}
\end{align*}\] and using the quantile transform, set \(X=\nu + \tau \log(2U)\) if \(U \leq 0.5\) and \(X=\nu - \tau\log(2-2U)\) if \(U > 0.5\) for \(U \sim \mathsf{U}(0,1)\).
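In code, the inversion step reads as follows; rlaplace is our own helper name, not a function from a package.
# Inversion sampling from La(loc, scale) via the quantile transform
rlaplace <- function(n, loc = 0, scale = 1){
  stopifnot(scale > 0)
  u <- runif(n)
  ifelse(u <= 0.5,
         loc + scale*log(2*u),
         loc - scale*log(2 - 2*u))
}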
The Gaussian has lighter tails than the Laplace, so the ratio of target to proposal densities is bounded and the Laplace proposal works for the standard Gaussian. The Student-\(t\) with 1 degree of freedom (the Cauchy) has heavier, polynomial tails, so the ratio is unbounded and the Laplace proposal cannot be used for it; the reverse pairing would work, with the Cauchy as a suitable, albeit very heavy-tailed, proposal for the Laplace.
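Explicitly, writing \(\phi\) for the standard Gaussian density, \(f_1\) for the Cauchy density and \(g\) for the \(\mathsf{La}(0, \tau)\) density, \[
\frac{\phi(x)}{g(x)} = \frac{2\tau}{\sqrt{2\pi}} \exp\left(-\frac{x^2}{2} + \frac{|x|}{\tau}\right) \to 0, \qquad \frac{f_1(x)}{g(x)} = \frac{2\tau}{\pi} \frac{\exp(|x|/\tau)}{1+x^2} \to \infty
\] as \(|x| \to \infty\), so only the Gaussian target admits a finite bound \(C\).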
The optimal location for the Student-\(t\) proposal is the Laplace location \(\nu\) (zero here for the standard Laplace), by symmetry. The optimal scale minimizes the upper bound on the density ratio, \[
\sigma^{\star} = \mathrm{argmin}_{\sigma > 0}\, \max_{x \in \mathbb{R}} \{\log f(x) - \log g(x; \sigma)\},
\] where \(f\) is the Laplace density and \(g\) the location-scale Student-\(t\) density; the bound is \(C = \exp\) of the attained value. We compute both numerically in the code below.
#' Laplace density
dlaplace <- function(x, loc = 0, scale = 1, log = FALSE){
  stopifnot(scale > 0)
  logdens <- -log(2*scale) - abs(x - loc)/scale
  if(log){
    return(logdens)
  } else{
    return(exp(logdens))
  }
}
#' Location-scale Student-t density
dstudent <- function(x, loc = 0, scale = 1, df = 1, log = FALSE){
  logdens <- -log(scale) + dt(x = (x - loc)/scale, df = df, log = TRUE)
  if(log){
    return(logdens)
  } else{
    return(exp(logdens))
  }
}
# For each value of the scale sigma, find the maximum of the
# log density ratio over x (attained at zero here), then
# minimize that maximum over sigma
opt <- optimize(f = function(sigma){
  optimize(f = function(x){
    dlaplace(x, log = TRUE) - dstudent(x, scale = sigma, df = 3, log = TRUE)
  }, maximum = TRUE, interval = c(-100, 100))$objective
}, interval = c(0.1, 10))
(C <- exp(opt$objective))
[1] 1.261368
(sigma <- opt$minimum)
[1] 0.9272381
# Simulate from accept-reject
# Oversample by 10%: the expected number of accepted draws is ntrials/C
ntrials <- ceiling(1.1*C*1000)
# Simulate from the location-scale Student proposal
candidate <- sigma*rt(n = ntrials, df = 3)
# Compute the log acceptance ratio
logR <- dlaplace(candidate, log = TRUE) -
  dstudent(candidate, scale = sigma, df = 3, log = TRUE)
# Accept when log U <= logR - log(C), using -rexp(n) ~ log U
samp <- candidate[logR >= log(C) - rexp(ntrials)]
# Monte Carlo estimate of the expected number of trials per accepted draw
ntrials/length(samp)
The Monte Carlo estimate of the expected number of candidates per accepted draw is 1.24, close to the bound \(C \approx 1.26\) obtained by numerical optimization; the corresponding empirical acceptance rate is \(1/1.24 \approx 0.81\).
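As an optional sanity check (our own addition, not part of the exercise), the accepted sample can be compared with the Laplace distribution function derived above through a Kolmogorov–Smirnov test; plaplace is our own helper name.
# Laplace distribution function, from the expression above
plaplace <- function(q, loc = 0, scale = 1){
  ifelse(q <= loc,
         0.5*exp((q - loc)/scale),
         1 - 0.5*exp(-(q - loc)/scale))
}
ks.test(samp, plaplace)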
Exercise 3.2
We revisit Exercise 1.3, which used a half-Cauchy prior on the rate of the exponential waiting time of buses.
The ratio-of-uniforms method, implemented in the rust R package, can be used to simulate independent draws from the posterior of the rate \(\lambda\). The following code produces the posterior sample.
nobs <- 10L # number of observations
ybar <- 8   # average waiting time
B <- 1000L  # number of draws
# Un-normalized log posterior: scaled log likelihood + log prior
upost <- function(x){
  dgamma(x = x, shape = nobs + 1L, rate = nobs*ybar, log = TRUE) +
    log(2) + dt(x = x, df = 1, log = TRUE)
}
post_samp <- rust::ru(
  logf = upost,
  n = B,
  d = 1, # dimension of the parameter (scalar)
  init = nobs/ybar # initial value
)$sim_vals
Estimate using the Monte Carlo sample:
the probability that the waiting time is between 3 and 15 minutes
the average waiting time
the standard deviation of the waiting time
Solution.
# lambda is the rate (per minute); the mean waiting time is 1/lambda
times <- 1/post_samp
mean(times > 3 & times < 15)
[1] 0.983
mean(times)
[1] 8.019044
sd(times)
[1] 2.718529
Next, implement a random walk Metropolis–Hastings algorithm to sample draws from the posterior and re-estimate the quantities. Compare the values.
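The sampler code is not shown here; the following is a minimal sketch that produces the chains vector summarized below, assuming a Gaussian random walk directly on \(\lambda\) with a hand-picked proposal standard deviation (0.1, our choice) and reusing upost from above.
set.seed(80601)
niter <- 1e4L
chains <- numeric(niter)
cur <- 1/ybar  # start near the maximum likelihood estimate
sd_prop <- 0.1 # proposal standard deviation (hand-tuned, our choice)
for(i in seq_len(niter)){
  prop <- cur + rnorm(n = 1, sd = sd_prop)
  # upost returns -Inf for negative rates, so such proposals are rejected
  logR <- upost(prop) - upost(cur)
  if(logR > -rexp(1)){
    cur <- prop
  }
  chains[i] <- cur
}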
# Summary of MCMC
mcmc <- coda::mcmc(chains)
summary(mcmc)
Iterations = 1:10000
Thinning interval = 1
Number of chains = 1
Sample size per chain = 10000
1. Empirical mean and standard deviation for each variable,
plus standard error of the mean:
Mean SD Naive SE Time-series SE
0.1355660 0.0421769 0.0004218 0.0010339
2. Quantiles for each variable:
2.5% 25% 50% 75% 97.5%
0.06707 0.10729 0.13195 0.15895 0.22644
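The three quantities can then be re-estimated from the Markov chain draws of \(\lambda\) and compared with the ratio-of-uniforms values; up to Monte Carlo error the estimates should agree, with larger standard errors for MCMC because of the autocorrelation of the chain.
# Re-estimate the three quantities from the MCMC draws
times_mcmc <- 1/chains
c(prob = mean(times_mcmc > 3 & times_mcmc < 15),
  mean = mean(times_mcmc),
  sd = sd(times_mcmc))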
Exercise 3.3
Repeat the simulations in Example 3.6, this time with a parametrization in terms of log rates \(\lambda_i\) \((i=1,2)\), with the same priors. Use a Metropolis–Hastings algorithm with a Gaussian random walk proposal, updating parameters one at a time. Run four chains in parallel.
Tune the variance to reach an approximate acceptance rate of 0.44.
Produce diagnostic plots (scatterplots of the posterior draws, marginal density plots, trace plots and correlograms); see the bayesplot or coda packages. Comment on the convergence and mixing of your Markov chain Monte Carlo sampler.
Report summary statistics of the chains.
Solution. The only changes needed are the parametrization and the tuning of the proposal variance.
data(upworthy_question, package = "hecbayes")
# Compute sufficient statistics
data <- upworthy_question |>
  dplyr::group_by(question) |>
  dplyr::summarize(ntot = sum(impressions),
                   y = sum(clicks))
# Code log posterior as sum of log likelihood and log prior
loglik <- function(par, counts = data$y, offset = data$ntot, ...){
  lambda <- exp(c(par[1] + log(offset[1]), par[2] + log(offset[2])))
  sum(dpois(x = counts, lambda = lambda, log = TRUE))
}
logprior <- function(par, ...){
  sum(dnorm(x = par, mean = log(0.01), sd = 1.5, log = TRUE))
}
logpost <- function(par, ...){
  loglik(par, ...) + logprior(par, ...)
}
# Compute maximum a posteriori (MAP)
map <- optim(
  par = rep(-4, 2),
  fn = logpost,
  control = list(fnscale = -1),
  offset = data$ntot,
  counts = data$y,
  hessian = TRUE)
# Use MAP as starting value
cur <- map$par
# Keep track of the current log posterior to reduce calculations
logpost_cur <- logpost(cur)
# Proposal covariance
cov_map <- -2*solve(map$hessian)
chol <- chol(cov_map)
set.seed(80601)
niter <- 1e4L
nchains <- 4L
npar <- 2L
chain <- array(0, dim = c(niter, nchains, npar))
naccept <- 0L
for(j in 1:nchains){
  for(i in seq_len(niter)){
    # Multivariate normal proposal - symmetric random walk;
    # the 1.05 scaling was tuned to reach an acceptance rate near 0.44
    prop <- chol %*% rnorm(n = 2)/1.05 + cur
    logpost_prop <- logpost(prop)
    # Compute acceptance ratio (no q term because the proposal is symmetric)
    logR <- logpost_prop - logpost_cur
    if(logR > -rexp(1)){
      cur <- prop
      logpost_cur <- logpost_prop
      naccept <- naccept + 1L
    }
    chain[i,j,] <- cur
  }
}
# Acceptance rate
naccept/(nchains * niter)
[1] 0.44355
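The 1.05 scaling factor was found by trial and error. One simple way to automate the search (a sketch of our own, not part of the original solution) is to run short pilot chains over a grid of scaling factors, reusing logpost, chol and map from above, and retain the factor whose empirical acceptance rate is closest to the 0.44 target.
# Pilot runs to tune the proposal scaling (hypothetical helper)
pilot_rate <- function(kappa, niter = 1000L){
  cur <- map$par
  logpost_cur <- logpost(cur)
  nacc <- 0L
  for(i in seq_len(niter)){
    prop <- chol %*% rnorm(n = 2)/kappa + cur
    logpost_prop <- logpost(prop)
    if(logpost_prop - logpost_cur > -rexp(1)){
      cur <- prop
      logpost_cur <- logpost_prop
      nacc <- nacc + 1L
    }
  }
  nacc/niter
}
kappa_grid <- seq(0.5, 2, by = 0.25)
rates <- sapply(kappa_grid, pilot_rate)
kappa_grid[which.min(abs(rates - 0.44))]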
# Create some summary graphs
library(bayesplot)
# Name chains so that the graphs can be labelled
dimnames(chain)[[3]] <- c("lambda[1]", "lambda[2]")
# Trace plots, correlograms, density plots and bivariate scatterplot
bayesplot::mcmc_trace(chain)
bayesplot::mcmc_acf_bar(chain)
bayesplot::mcmc_dens(chain)
bayesplot::mcmc_scatter(chain)
# Posterior summaries with coda
# Need to create a list of mcmc objects
mcmc_list <- coda::as.mcmc.list(
  lapply(seq_len(nchains),
         function(ch) coda::as.mcmc(chain[, ch, ])))
summary(mcmc_list)
Iterations = 1:10000
Thinning interval = 1
Number of chains = 4
Sample size per chain = 10000
1. Empirical mean and standard deviation for each variable,
plus standard error of the mean:
Mean SD Naive SE Time-series SE
lambda[1] -4.513 0.001730 8.648e-06 2.384e-05
lambda[2] -4.442 0.001192 5.962e-06 1.670e-05
2. Quantiles for each variable:
2.5% 25% 50% 75% 97.5%
lambda[1] -4.516 -4.514 -4.513 -4.511 -4.509
lambda[2] -4.444 -4.443 -4.442 -4.441 -4.440