10 Generalized linear models
In class, we have covered the basics of generalized linear models (GLMs), including binary and binomial models with the logistic link function and the Poisson regression model with the log link. The unifying theory behind GLMs will be covered in MATH 408. The goal of this tutorial is to give you additional examples of how to fit these models.
Besides parameter estimation (which R performs via Fisher scoring), we will look at analysis of deviance: a likelihood ratio test for testing whether coefficients are zero, which amounts to a comparison between two nested models. We proceed with the latter in exactly the same way as for analysis of variance, except that we use the asymptotic \(\chi^2\) distribution in place of the Fisher \(F\) distribution. Similarly, the \(P\)-values for the Wald tests are based on the asymptotic distribution of the test statistic, which is Gaussian.
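As a minimal sketch of the analysis of deviance described above, the following fits two nested Poisson models to simulated data (the variables and coefficient values are made up for illustration) and compares them with a likelihood ratio test via anova with test = "Chisq":

```r
# Simulated Poisson data: x2 has no true effect, so the nested comparison
# tests whether its coefficient is zero.
set.seed(123)
n <- 200
x1 <- rnorm(n)
x2 <- rnorm(n)
y <- rpois(n, lambda = exp(0.5 + 0.8 * x1))

fit0 <- glm(y ~ x1, family = poisson())       # reduced model
fit1 <- glm(y ~ x1 + x2, family = poisson())  # full model

# Analysis of deviance: twice the difference in log-likelihoods,
# compared to a chi-squared distribution with 1 degree of freedom.
anova(fit0, fit1, test = "Chisq")

# Wald tests (asymptotically Gaussian z statistics) appear in the summary.
summary(fit1)
```

The deviance difference reported by anova equals twice the gap in log-likelihoods between the two fits, which is why the \(\chi^2\) reference distribution applies.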
The specification of a GLM object in R is analogous to that of an lm object: we use the y ~ x formula syntax to specify the mean relationship. If you have binomial data, the response should be a two-column matrix with integer entries \((k_i, n_i-k_i)\) giving the number of successes and failures, respectively, for each case or cell.
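To illustrate the two-column response just described, here is a small sketch with made-up aggregated data (the dose values and counts are purely illustrative):

```r
# Aggregated binomial data: k_success successes out of n_trials per cell.
dose <- c(1, 2, 3, 4, 5)
n_trials <- c(50, 50, 50, 50, 50)
k_success <- c(5, 12, 25, 38, 44)

# The response is cbind(successes, failures), one row per cell.
fit <- glm(cbind(k_success, n_trials - k_success) ~ dose,
           family = binomial("logit"))
coef(fit)
```

This is equivalent to fitting the binary model on the disaggregated 0/1 data; the two-column form is simply more compact when observations share covariate values.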
The second difference between lm and glm is the presence of the family argument, which allows you to specify the likelihood: family = gaussian() gives back a linear model; family = binomial("logit") gives a binary or binomial regression with the logistic link function; family = poisson() gives Poisson regression. By default, empty parentheses give the so-called canonical link function (identity for Gaussian, logit for binomial and log for Poisson).
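The three families mentioned above can be sketched side by side on simulated data (the coefficient values below are illustrative); in particular, the gaussian() family reproduces ordinary least squares:

```r
# One covariate, three response types generated under each family.
set.seed(42)
x <- rnorm(100)
y_norm <- 1 + 2 * x + rnorm(100)
y_bin  <- rbinom(100, size = 1, prob = plogis(x))
y_pois <- rpois(100, lambda = exp(0.2 + 0.5 * x))

fit_lm   <- glm(y_norm ~ x, family = gaussian())     # same fit as lm(y_norm ~ x)
fit_bin  <- glm(y_bin ~ x, family = binomial("logit"))
fit_pois <- glm(y_pois ~ x, family = poisson())      # canonical log link by default

# gaussian() glm and lm give identical coefficient estimates.
all.equal(coef(fit_lm), coef(lm(y_norm ~ x)))
```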
Let \(\ell\) denote the log-likelihood function for a sample of size \(n\). The glm function uses Fisher scoring to obtain the maximum likelihood estimates, based on the recursion
\[ \boldsymbol{\beta}^{(t+1)} = \boldsymbol{\beta}^{(t)} + \mathcal{I}^{-1}_n(\boldsymbol{\beta}^{(t)}) \left. \frac{\partial \ell}{\partial \boldsymbol{\beta}} \right|_{\boldsymbol{\beta}=\boldsymbol{\beta}^{(t)} },\]
where the Fisher information
\[\mathcal{I}_n(\boldsymbol{\beta}) =\mathrm{E}\left( \frac{\partial \ell}{\partial \boldsymbol{\beta}}\frac{\partial \ell}{\partial \boldsymbol{\beta}^\top}\right)\]
is evaluated at the current value \(\boldsymbol{\beta}^{(t)}\). The iteratively reweighted least squares (IRLS) algorithm implemented in glm is equivalent to Fisher scoring with this expected information; the Newton–Raphson algorithm uses the observed information instead, and the two coincide under the canonical link.
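The recursion above can be sketched directly for logistic regression with the canonical link, where the score is \(\mathbf{X}^\top(\boldsymbol{y}-\boldsymbol{\mu})\) and the Fisher information is \(\mathbf{X}^\top \mathbf{W} \mathbf{X}\) with \(\mathbf{W}=\mathrm{diag}\{\mu_i(1-\mu_i)\}\). This is a minimal sketch on simulated data, not the internal glm code:

```r
# Fisher scoring for logistic regression, by hand.
set.seed(1)
X <- cbind(1, rnorm(100))                    # design matrix: intercept + slope
y <- rbinom(100, 1, plogis(X %*% c(-0.5, 1)))

beta <- c(0, 0)                              # starting value
for (t in 1:25) {
  mu <- plogis(X %*% beta)                   # fitted probabilities
  score <- t(X) %*% (y - mu)                 # gradient of the log-likelihood
  W <- as.vector(mu * (1 - mu))              # variance weights
  info <- t(X) %*% (X * W)                   # expected (Fisher) information
  beta <- beta + solve(info, score)          # scoring update
}

# Should agree with glm's own Fisher scoring to numerical precision.
max(abs(beta - coef(glm(y ~ X[, 2], family = binomial()))))
```

Because the logit link is canonical for the binomial family, the observed and expected information matrices coincide here, so this loop is simultaneously a Newton–Raphson and a Fisher scoring iteration.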