data(climatechange, package = "hecbayes")
# Transform factor name to integer for group assignment
<- as.integer(climatechange$GCM)
GCM <- list(N = nrow(climatechange),
data K = 4L,
L = max(GCM), # number of GCM provides
y = climatechange$temp, # temperature series
# Treat RCP as a factor (Model
RCP = as.integer(factor(climatechange$RCP)),
GCM = GCM,
# TODO: add sensible prior values for mu
mu = sort(rexp(4)))
<- cmdstanr::cmdstan_model(stan_file = "climatechange.stan")
cc_model <- cc_model$sample(data = data, chains = 4L) cc_fit
Assignment 2
These problems are for credit and to be handed in on ZoneCours on March 16th at the latest at 23:30.
Problem 2.1
The rats
data, extracted from Table 5 of Tarone (1982), contains observations from 70 experiments on animal carcinogenesis bioassay conducted with female F344 rats, recording the number of lung cancer tumors developed. Treating the observations as independent, write \[\begin{align*}
y_i \mid \theta_i &\sim \mathsf{binom}(n_i, \theta_i), \quad i =1, \ldots, 70 \\
\theta_i \mid \alpha, \beta & \sim \mathsf{beta}(\alpha, \beta)
\end{align*}\] with an improper prior \(p(\alpha, \beta) = (\alpha + \beta)^{-5/2}\mathrm{I}(\alpha>0, \beta>0).\)
- Derive the conditional distributions of \(p(\theta_i \mid \alpha, \beta, \boldsymbol{y})\), \(p(\alpha \mid \beta, \boldsymbol{\theta})\) and \(p(\beta \mid \alpha, \boldsymbol{\theta}).\)
- Implement a Gibbs sampling algorithm simulating in turn from each of these conditionals. You can sample the \(\theta_i\) easily since the prior is conditionally conjugate. For the other two parameters, \(\alpha\) and \(\beta\), try using one of the following methods: rejection sampling (try an exponential distribution), ratio-of-uniform (Wakefield et al., 1991) via the R package
rust
, adaptive rejection sampling (Gilks & Wild, 1992; Martino et al., 2018) via thears
R package1 or slice sampling (Neal, 2003).2 - Use instead a sampling algorithm with a Gibbs step for \(\boldsymbol{\theta}\) and a random walk Metropolis–Hastings step for \(\alpha\) and \(\beta\).
- Reparametrize the model in terms of \(\log(\beta)\) and \(\log(\alpha)/\log(\beta)\) instead and run a Metropolis–Hasting step for the two parameters.
- For each sampler and parametrization, plot the bivariate histogram/density/scatter plot for \((\alpha, \beta)\), plus correlograms for both parameters. Comment on the mixing of the algorithms.
Problem 2.2
The purpose of this exercise is to explore different parametrizations and their impact on sampling. The climatechange
data contains mean global temperature increases \(T\) from the twenty year period 1970–1999 to projections 2069–2098 for different climate general circulation models (GCM), and sometimes multiple simulation runs for each GCM. The purpose of the analysis is to obtain a suitable mean temperature change globally for each representative concentration pathway (RCP, one of 2.6, 4.5, 6.0 and 8.5). The data were obtained from the bang
R package and preprocessed to simplify the exposition, but see Northrop & Chandler (2014) for pointers about the methodology.
The hierarchical structure for Model A and temperature \(T\) for RCP \(k\) and GCM \(l\) is: \[\begin{align*} T_{k,l}\mid \boldsymbol{\alpha}, \boldsymbol{\beta}, \tau &\sim \mathsf{Gauss}(\alpha_k + \beta_l, \sigma^2)\\ \boldsymbol{\alpha} \mid \boldsymbol{\mu}, \omega &\sim \mathsf{Gauss}_4(\boldsymbol{\mu}, \omega^2\mathbf{I}_4)\\ \beta_l \mid \sigma &\sim \mathsf{Gauss}(0, \tau^2), \qquad l = 1, \ldots, L. \end{align*}\]
- Prior elicitation:
- what are your prior belief about climate change the range of plausible temperature increases? Write down suitable values for the means \(\boldsymbol{\mu}\) and the standard deviation \(\omega\) (which could be the same for all RCP) under different emission scenarios, if \(\mu_1\) corresponds to RCP 2.6, \(\mu_2\) to RCP 4.5, etc. and your guesses are non-decreasing.
- choose suitable priors for the scale parameters \(\omega,\) \(\tau\) and \(\sigma\).
- Reparametrization: enforce a sum-to-zero hard constraint for \(\beta_l\) (as is done in the Stan code provided below) versus soft-constraint (by reducing the variance and setting the prior to mean zero). Indicate which works better for the problem based on considerations such as mixing and effective sample size.
- For Model A, produce a caterpillar plot of the parameters \(\beta_l\) (\(l=1,\ldots, 38\)) as follows:
- obtain posterior mean for each random effect \(\beta_l\) and sort them in increasing order
- for each parameter \(\beta_l,\) obtain 80% equitailed credible intervals.
- plot the ordered point estimates and intervals against rank.
- Model B: Consider forcing an ordering constraint for \(\boldsymbol{\alpha}\), so that \(\alpha_1 \leq \cdots \leq \alpha_4.\) What happens to the posterior?
- Plot the marginal posterior density of \(\alpha_1, \ldots, \alpha_4\) for the models with and without the ordering constraint and comment.
- Model C: keep RCP categorical, but let each emission scenario have it’s own response variance \(\omega^2_k,\) so that \(\alpha_k \sim \mathsf{Gauss}(\mu_k, \omega^2_k)\) independent.
- Model D: Replace the categorical variables \(\boldsymbol{\alpha}\) with a linear trend as a function of the value of RCP (i.e., treating the value of RCP as continuous rather than categorical, so that \(\alpha\) is a slope parameter and \[T_{k,l}\mid \boldsymbol{\alpha}, \boldsymbol{\beta}, \tau \sim \mathsf{Gauss}(\mu + \alpha \mathsf{RCP} + \beta_l, \sigma^2).\]
- Compare Models A to D: for each, compute the log likelihood for each posterior draw and use it to obtain the WAIC criterion.
- For a chosen adequate model, report the summary statistics of your procedure (posterior means, std. errors, range and effective sample size) for the variance parameters and the mean increase per RCP \(\alpha_1, \ldots, \alpha_4\) (or \(\alpha \mathsf{RCP}\) for Model D).
Helper code for Problem 2.2
I am providing some Stan code (cross-platform) (download here) and show how to implement the model in R while defining suitable quantities, by casting the factors RCP
and GCM
to integer indicators.
The following code enforces the sum-to-zero constraint for \(\boldsymbol{\beta}\) (see the Stan manual). You can get ordered vectors for \(\boldsymbol{\alpha}\) by replacing vector
by ordered
below.
data {
int<lower=1> N;
int<lower=1> K;
int<lower=2> L;
vector[N] y;
array[N] int<lower=0, upper=K> RCP;
array[N] int<lower=0, upper=L> GCM;
ordered[K] mu;
}
parameters {
vector[K] alpha;
sum_to_zero_vector[L] beta;real<lower=0> sigma;
real<lower=0> tau;
real<lower=0> omega;
}
model {
y ~ normal(alpha[RCP] + beta[GCM], sigma);
alpha ~ normal(mu, omega);0, sqrt(L * inv(L - 1)) * tau);
beta ~ normal(// TODO: Add priors for sigma, tau, omega here
}