10 Generalized linear models

In class, we have covered the basics of generalized linear models (GLMs), including binary and binomial models with the logistic link function and the Poisson regression model with the log link. The unifying theory behind GLMs will be covered in MATH 408. The goal of this tutorial is to give you additional examples of how to fit these models.

Besides parameter estimation (which proceeds through Fisher scoring in R), we will look at analysis of deviance, a likelihood ratio test of whether coefficients are zero, carried out by comparing two nested models. We proceed with the latter in exactly the same way as for analysis of variance, except that we use the asymptotic \(\chi^2\) distribution in place of the Fisher \(F\) distribution. Similarly, the \(P\)-values for the Wald tests are based on the asymptotic distribution of the test statistic, which is Gaussian.
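As a sketch of how this looks in practice, the following uses R's built-in warpbreaks data (counts of yarn breaks per loom); the particular model comparison is chosen purely for illustration.

```r
# Two nested Poisson models for the count of breaks
fit0 <- glm(breaks ~ wool, family = poisson, data = warpbreaks)
fit1 <- glm(breaks ~ wool + tension, family = poisson, data = warpbreaks)

# Analysis of deviance: the drop in deviance between the nested models
# is compared to its asymptotic chi-squared null distribution
anova(fit0, fit1, test = "Chisq")

# Wald tests for individual coefficients (asymptotically Gaussian)
summary(fit1)
```

The `test = "Chisq"` argument is what replaces the \(F\) test used in the analysis of variance for linear models.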

The specification of a glm object in R is analogous to that of an lm object: we use the y ~ x formula syntax to specify the mean relationship. If you have binomial data, the response should be a two-column matrix with integer elements \((k_i, n_i-k_i)\) specifying the number of successes and failures, respectively, for each case or cell.
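A minimal sketch of the two-column response syntax, with made-up toy data (the variable names k, n, and x are ours, not from the text):

```r
# k_i successes out of n_i trials observed at covariate value x_i
k <- c(2, 5, 9, 14)
n <- c(20, 20, 20, 20)
x <- c(0.1, 0.5, 1.0, 2.0)

# cbind(successes, failures) as the left-hand side of the formula
fit <- glm(cbind(k, n - k) ~ x, family = binomial("logit"))
coef(fit)
```

For binary (Bernoulli) data, the response can instead be a single 0/1 vector or a factor.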

The second difference between lm and glm is the presence of the argument family, which allows you to specify the likelihood:

  • family = gaussian() gives back a linear model;
  • family = binomial("logit") gives you a binary or binomial regression with logistic link function;
  • family = poisson() gives you Poisson regression.

By default, empty parentheses give the so-called canonical link function (identity for Gaussian, logit for binomial and log for Poisson).
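The defaults can be checked directly: with simulated toy data (our own example, not from the text), leaving the parentheses empty gives the same fit as naming the canonical link explicitly.

```r
set.seed(1)
x <- rnorm(50)
y <- rpois(50, exp(0.5 + 0.8 * x))

f1 <- glm(y ~ x, family = poisson)         # default: canonical log link
f2 <- glm(y ~ x, family = poisson("log"))  # same link, named explicitly
all.equal(coef(f1), coef(f2))

# and family = gaussian() recovers the least squares fit of lm()
all.equal(coef(glm(y ~ x, family = gaussian())), coef(lm(y ~ x)))
```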

Let \(\ell\) denote the log-likelihood function for an \(n\) sample. The function glm uses Fisher scoring to obtain the maximum likelihood estimates, based on the recursion \[ \boldsymbol{\beta}^{(t+1)} = \boldsymbol{\beta}^{(t)} + \mathcal{I}^{-1}_n(\boldsymbol{\beta}^{(t)}) \left. \frac{\partial \ell}{\partial \boldsymbol{\beta}} \right|_{\boldsymbol{\beta}=\boldsymbol{\beta}^{(t)} },\] where the Fisher information \[\mathcal{I}_n =\mathrm{E}\left( \frac{\partial \ell}{\partial \boldsymbol{\beta}}\frac{\partial \ell}{\partial \boldsymbol{\beta}^\top}\right)\] is evaluated at the current value \(\boldsymbol{\beta}^{(t)}\). The Newton–Raphson algorithm instead uses the observed information; for the canonical link the two coincide, and the Fisher scoring update can be computed as iteratively reweighted least squares (IRLS).
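The recursion can be sketched by hand for Poisson regression with the log link, where the score is \(\mathbf{X}^\top(\boldsymbol{y}-\boldsymbol{\mu})\) and the Fisher information is \(\mathbf{X}^\top \mathrm{diag}(\boldsymbol{\mu})\mathbf{X}\). This is purely illustrative (simulated data, our own variable names); in practice glm() does this for you.

```r
set.seed(2)
x <- rnorm(100)
X <- cbind(1, x)                      # model matrix with intercept
y <- rpois(100, exp(1 + 0.5 * x))

beta <- c(0, 0)                       # starting value
for (t in 1:25) {
  mu    <- c(exp(X %*% beta))         # mean under the current beta
  score <- t(X) %*% (y - mu)          # gradient of the log-likelihood
  info  <- t(X) %*% (mu * X)          # Fisher information, log link
  beta  <- beta + solve(info, score)  # scoring update
}

# the iterates should agree with R's own fit to numerical precision
cbind(scoring = c(beta), glm = coef(glm(y ~ x, family = poisson)))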