Assignment 1

These problems are for credit and must be handed in by February 2nd at the latest.

Problem 1.1

Consider the geometric distribution $geom (p)$ with mass function $$f (y; p) = p (1 - p)^{y}, \qquad y = 0, 1, 2, \dots;$$ the latter is used to model the number of failures $Y$ in independent trials before a first success, where each trial succeeds with probability $p$.

a. If $Y ∣ P = p \sim geom (p)$ and $P \sim beta (α_{1}, α_{2})$, show that $P ∣ Y = y$ is beta distributed and obtain the parameters of the latter.
b. Obtain the marginal distribution of $Y$ and show that it is a special case of the beta-negative binomial distribution.
c. Using the tower property, compute the unconditional mean and variance of $Y$. Hint: the formulae will depend on the reciprocal moments of a beta distribution, $E_{P} (P^{- 1})$ and $E_{P} (P^{- 2})$. Complete the kernel to obtain these using the property $Γ (α + 1) = α Γ (α)$.
d. Forward sampling: generate data from the marginal of $Y$ as follows:
   1. Pick values for $(α_{1}, α_{2})$.¹
   2. Draw 10 000 observations from $Y$ by first simulating from $P$, then from $Y ∣ P$.
   3. Discard the values of $P$ and keep only those for $Y$.
   4. Plot the marginal distribution of $Y$ using a bar plot.
e. Using Monte Carlo integration, verify the formulas for the expected value and variance derived previously.
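The forward sampling steps and the Monte Carlo check above can be sketched as follows, using only the Python standard library. This is a minimal illustration, not a reference solution: the values $(α_{1}, α_{2}) = (8, 2)$, the seed and the helper names `draw_geometric` and `forward_sample` are arbitrary choices made for this example. Rather than the closed-form answer, the check compares the sample moments of $Y$ with tower-property estimates built from the draws of $P$, using $E (Y ∣ P = p) = (1 - p) / p$ and $Va (Y ∣ P = p) = (1 - p) / p^{2}$.

```python
import math
import random
from collections import Counter
from statistics import fmean, variance


def draw_geometric(p, rng):
    """Number of failures before the first success, by inverting P(Y >= k) = (1 - p)^k."""
    if p >= 1.0:
        return 0
    u = 1.0 - rng.random()  # uniform on (0, 1], which avoids log(0)
    return int(math.log(u) / math.log1p(-p))


def forward_sample(alpha1, alpha2, size, rng):
    """Draw P ~ beta(alpha1, alpha2), then Y | P = p ~ geom(p)."""
    ps = [rng.betavariate(alpha1, alpha2) for _ in range(size)]
    ys = [draw_geometric(p, rng) for p in ps]
    return ps, ys


rng = random.Random(2024)  # arbitrary seed, for reproducibility only
ps, ys = forward_sample(8.0, 2.0, 10_000, rng)  # illustrative parameter values

# Frequencies of each value of Y: these are the bar heights for a bar plot.
freq = Counter(ys)

# Monte Carlo estimates of the marginal mean and variance of Y.
mc_mean = fmean(ys)
mc_var = variance(ys)

# Tower-property estimates computed from the draws of P alone.
cond_means = [(1 - p) / p for p in ps]
cond_vars = [(1 - p) / p**2 for p in ps]
tower_mean = fmean(cond_means)
tower_var = fmean(cond_vars) + variance(cond_means)

print(f"mean: {mc_mean:.3f} (tower: {tower_mean:.3f})")
print(f"variance: {mc_var:.3f} (tower: {tower_var:.3f})")
```

The `freq` counter can be passed to any plotting library to draw the bar plot; the closed-form expressions from the previous part can then be compared with `mc_mean` and `mc_var`.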

Problem 1.2

The sweden dataset contains the number of accidents $Y$ per day in Sweden for 1961–1962. On specified days (day) of each year, a speed limit was in place (limit). We write the mean model as $$\begin{aligned} E (Y_{i}; λ_{0}, λ_{1}) & = \exp (β_{0} + β_{1} \mathrm{limit}_{i}) \\ & = \begin{cases} λ_{0}, & \mathrm{limit}_{i} = 0, \\ λ_{1}, & \mathrm{limit}_{i} = 1. \end{cases} \end{aligned}$$

Assume that the 184 observations are independent draws from two Poisson populations with means $λ_{0}$ and $λ_{1}$ when limit=0 and limit=1, respectively. See Example 3.5 (Should you phrase your headline as a question?) of the course notes.

Use a noninformative conjugate prior and obtain posterior samples for $λ_{0}$ and $λ_{1}$. Use these to obtain $B = 10000$ posterior samples for the mean ratio $λ_{1} / λ_{0}$ and plot a histogram or density estimator of the latter.
Calculate the posterior probability that the speed limit enforcement reduces the average number of accidents.
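A minimal sketch of the computation, under stated assumptions: with a $gamma (a, b)$ prior (shape, rate) on a Poisson mean and $n$ independent counts with total $\sum y_{i}$, the posterior is $gamma (a + \sum y_{i}, b + n)$. The prior values $a = b = 0.01$ are one possible near-flat choice, not a prescribed one, and the totals below are made-up placeholders rather than the sweden data. The helper name `posterior_rate_draws` is hypothetical.

```python
import random
from statistics import fmean


def posterior_rate_draws(total, n, shape0, rate0, size, rng):
    """Draws from gamma(shape0 + total, rate0 + n), the posterior of a Poisson mean
    under a gamma(shape0, rate0) prior in the shape/rate parametrization."""
    shape = shape0 + total
    rate = rate0 + n
    # random.gammavariate takes (shape, scale), so the scale is 1 / rate.
    return [rng.gammavariate(shape, 1.0 / rate) for _ in range(size)]


rng = random.Random(1)  # arbitrary seed
B = 10_000

# Placeholder totals and sample sizes, NOT the sweden data:
# replace them with the sum and number of counts in each group.
lam0 = posterior_rate_draws(total=120, n=10, shape0=0.01, rate0=0.01, size=B, rng=rng)
lam1 = posterior_rate_draws(total=90, n=10, shape0=0.01, rate0=0.01, size=B, rng=rng)

# Posterior draws of the mean ratio lambda_1 / lambda_0.
ratio = [l1 / l0 for l0, l1 in zip(lam0, lam1)]

# Monte Carlo estimate of the posterior probability that lambda_1 < lambda_0.
prob_reduction = sum(r < 1.0 for r in ratio) / B

print(f"posterior mean of the ratio: {fmean(ratio):.3f}")
print(f"Pr(lambda_1 < lambda_0 | data): {prob_reduction:.3f}")
```

Because the two groups are independent and the priors are independent, the draws of $λ_{0}$ and $λ_{1}$ can be paired arbitrarily to form the ratio; the list `ratio` is what would be passed to a histogram or density estimator.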

Problem 1.3

The waiting dataset contains waiting times (in seconds) from 17:59 until the departure of the next metro at the Université de Montréal station on weekdays over three consecutive months.

Assume first that the waiting times are independent and identically distributed as exponential.
1. Use a conjugate gamma prior such that the average waiting time $1 / λ$ has mean 30 seconds and standard deviation 30 seconds.² Give the values of the corresponding shape and rate parameters of the prior.
2. Plot a histogram of prior predictive draws.
3. Derive the posterior distribution and report its parameter values.
4. Calculate the posterior probability of waiting more than 30 seconds analytically and verify the result via Monte Carlo integration.
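The numerical check suggested in the hint for item 1 can be sketched as follows. This is an illustration only: the values of `shape` and `scale` are arbitrary and are deliberately not the ones requested in item 1, and the helper name `inv_gamma_draws` is hypothetical. The sketch draws $Λ \sim gamma (α, β)$ in the shape/rate parametrization, takes reciprocals, and compares the sample moments with the formulas stated in the hint.

```python
import random
from statistics import fmean, variance


def inv_gamma_draws(shape, scale, size, rng):
    """Draw 1 / Lambda, where Lambda ~ gamma(shape, rate = scale)."""
    # random.gammavariate takes (shape, scale); a gamma rate of `scale`
    # corresponds to a gamma scale of 1 / scale.
    return [1.0 / rng.gammavariate(shape, 1.0 / scale) for _ in range(size)]


shape, scale = 10.0, 18.0  # arbitrary illustration values, not the assignment's answer
rng = random.Random(42)    # arbitrary seed
draws = inv_gamma_draws(shape, scale, 50_000, rng)

# Moments of the inverse gamma distribution, as given in the hint.
theo_mean = scale / (shape - 1)
theo_var = scale**2 / ((shape - 1) ** 2 * (shape - 2))

mc_mean = fmean(draws)
mc_var = variance(draws)

print(f"mean: Monte Carlo {mc_mean:.3f}, formula {theo_mean:.3f}")
print(f"variance: Monte Carlo {mc_var:.3f}, formula {theo_var:.3f}")
```

Once the prior parameters of item 1 are found, substituting them for `shape` and `scale` should give a sample mean and standard deviation both close to 30.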
The post_waiting_weibull dataset contains 10 000 random samples from the posterior of a Weibull model $Weibull (λ, α)$ with a penalized-complexity prior for the shape parameter, $α \sim PC (θ = 0.5)$ (Niekerk et al., 2021), and $λ \sim inv . gamma (γ, ω)$ with scale $γ = 90$ and shape $ω = 4$.
1. Draw $B = 1000$ posterior predictive samples of size $n = 62$ from each of the Weibull and exponential models: for each of the $B$ posterior draws, generate one sample of size $n = 62$.
2. For each sample, compute (i) the sample mean, (ii) the sample standard deviation and (iii) the empirical proportion of observations exceeding 30 seconds. Plot a histogram for each of the three summaries and each model (Weibull and exponential). Superimpose a vertical line indicating the value of the corresponding summary for the original waiting sample. Hence comment on the adequacy (or lack thereof) of the two models.
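The two steps above can be sketched as follows. This is a hedged illustration: it assumes that $λ$ is the Weibull scale and $α$ the shape, the function names are hypothetical, and the "posterior draws" are made-up stand-ins, since neither post_waiting_weibull nor the exponential posterior from the first part is reproduced here.

```python
import random
from statistics import fmean, stdev


def summaries(sample, threshold=30.0):
    """Return (sample mean, sample std. deviation, proportion above threshold)."""
    prop = sum(x > threshold for x in sample) / len(sample)
    return fmean(sample), stdev(sample), prop


def predictive_summaries_weibull(post_draws, n, rng):
    """post_draws: iterable of (scale, shape) posterior draws (assumed parametrization)."""
    out = []
    for scale, shape in post_draws:
        # random.weibullvariate takes (scale, shape), in that order.
        sample = [rng.weibullvariate(scale, shape) for _ in range(n)]
        out.append(summaries(sample))
    return out


def predictive_summaries_exponential(rate_draws, n, rng):
    """rate_draws: iterable of posterior draws of the exponential rate."""
    out = []
    for rate in rate_draws:
        sample = [rng.expovariate(rate) for _ in range(n)]
        out.append(summaries(sample))
    return out


rng = random.Random(7)  # arbitrary seed
B, n = 1000, 62

# Made-up stand-ins for posterior draws, NOT the course data: replace with
# B rows of post_waiting_weibull and B draws from the gamma posterior.
toy_weibull = [(rng.uniform(25.0, 35.0), rng.uniform(0.9, 1.3)) for _ in range(B)]
toy_rates = [rng.uniform(0.02, 0.05) for _ in range(B)]

weib_stats = predictive_summaries_weibull(toy_weibull, n, rng)
expo_stats = predictive_summaries_exponential(toy_rates, n, rng)

# Each list holds B triples; column k gives the values to plot in histogram k.
weib_means = [s[0] for s in weib_stats]
expo_means = [s[0] for s in expo_stats]
```

Applying `summaries` to the observed waiting times gives the three reference values for the vertical lines.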

References

Niekerk, J. van, Bakka, H., & Rue, H. (2021). A principled distance-based prior for the shape of the Weibull model. Statistics & Probability Letters, 174, 109098. https://doi.org/10.1016/j.spl.2021.109098

Footnotes

¹ Take $α_{1} > 3$ for part e.
² Hint: if $Λ \sim gamma (α, β)$, then the reciprocal rate follows $1 / Λ \sim inv . gamma (α, β)$ with $E (Λ^{- 1}) = β / (α - 1)$ for $α > 1$ and $Va (Λ^{- 1}) = β^{2} / \{(α - 1)^{2} (α - 2)\}$ for $α > 2$. Solve to find the values of the parameters and check numerically by generating data from the inverse gamma distribution.