5.2 One-way ANOVA

A one-way ANOVA is a test for equality of means across subpopulations. Under the assumption that the observations are normally distributed with a common variance, this corresponds to a Gaussian linear model whose explanatory variables are binary indicators for the levels of a factor. Suppose the factor has \(L\) levels and let \(n_j\) denote the number of observations in group \(j=1,\ldots, L\), for a total of \(n = n_1 + \cdots + n_L\) observations.

The one-way ANOVA model can be written as \[y_{i_j,j} = \alpha_j + \varepsilon_{i_j,j}, \quad \varepsilon_{i_j,j} \stackrel{\mathrm{iid}}{\sim} \mathcal{N}(0, \sigma^2)\qquad (j = 1, \ldots, L;\; i_j= 1, \ldots, n_j). \] Let \(\boldsymbol{y}_j = (y_{1,j}, \ldots, y_{n_j, j})^\top\) denote the vector of observations for group \(j\), and define \(\boldsymbol{\varepsilon}_j\) similarly; we can stack observations into a single regression with a matrix \(\mathbf{X}\) of indicator variables, viz. \[\begin{pmatrix}\boldsymbol{y}_1 \\\boldsymbol{y}_2 \\ \vdots \\ \boldsymbol{y}_L\end{pmatrix} = \begin{pmatrix} \mathbf{1}_{n_1} & \mathbf{0}_{n_1}&\cdots & \mathbf{0}_{n_1} \\ \mathbf{0}_{n_2} & \mathbf{1}_{n_2}&\ddots & \mathbf{0}_{n_2} \\ \vdots & \ddots & \ddots & \vdots\\ \mathbf{0}_{n_L} & \mathbf{0}_{n_L}&\cdots & \mathbf{1}_{n_L} \end{pmatrix}\begin{pmatrix} \alpha_1 \\ \alpha_2 \\\vdots \\ \alpha_L\end{pmatrix} + \begin{pmatrix}\boldsymbol{\varepsilon}_1 \\\boldsymbol{\varepsilon}_2 \\ \vdots \\ \boldsymbol{\varepsilon}_L\end{pmatrix}. \]
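The stacked regression can be illustrated with a minimal sketch in plain Python. The data in `groups` are made up for illustration and the variable names are ours, not part of the notes. The sketch builds the indicator matrix \(\mathbf{X}\), checks that \(\mathbf{X}^\top\mathbf{X}\) is diagonal with the group sizes \(n_j\) on the diagonal, and shows that the least squares estimates of \(\alpha_j\) coincide with the group sample means.

```python
# Toy data (made up for illustration): L = 3 groups of unequal size.
groups = [[4.0, 5.0, 6.0], [7.0, 9.0], [1.0, 2.0, 3.0, 6.0]]

L = len(groups)

# Stack the responses group by group, as in the displayed regression.
y = [obs for g in groups for obs in g]

# n x L indicator matrix: the row of an observation from group j
# has a 1 in column j and 0 elsewhere.
X = [[1.0 if k == j else 0.0 for k in range(L)]
     for j, g in enumerate(groups) for _ in g]

# X^T X is diagonal, with the group sizes n_j on the diagonal.
XtX = [[sum(row[a] * row[b] for row in X) for b in range(L)]
       for a in range(L)]

# X^T y collects the group totals.
Xty = [sum(row[a] * yi for row, yi in zip(X, y)) for a in range(L)]

# Since X^T X is diagonal, solving the normal equations reduces to
# dividing each group total by the group size.
alpha_hat = [Xty[j] / XtX[j][j] for j in range(L)]
group_means = [sum(g) / len(g) for g in groups]

print(XtX)          # diagonal matrix of group sizes
print(alpha_hat)    # least squares estimates
print(group_means)  # sample means of each group
```

In practice one would rely on a linear algebra library rather than nested lists; the explicit loops are only meant to mirror the matrix notation.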

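In a balanced design, the number of observations \(m\) is the same for each of the \(L\) levels of the factor, so the total sample size is \(n = mL\). We can then write \(\mathbf{X}^\top\mathbf{X}= m\mathbf{I}_L\): the least squares estimator \(\widehat{\alpha}_j\) is the sample mean of group \(j\), with variance \(\sigma^2/m\), so the standard errors of all the \(\widehat{\alpha}_j\) are the same.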

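To test \(\mathrm{H}_0: \alpha_1 = \cdots = \alpha_L\), we can use the usual sum of squares decomposition, comparing the residual sum of squares \(\boldsymbol{y}^\top\mathbf{M}_{\mathbf{1}_n}\boldsymbol{y}\) of the intercept-only model with the residual sum of squares \(\boldsymbol{y}^\top\mathbf{M}_{\mathbf{X}}\boldsymbol{y}\) of the full model. The \(F\)-statistic for this test is \[F = \frac{(\boldsymbol{y}^\top\mathbf{M}_{\mathbf{1}_n}\boldsymbol{y} - \boldsymbol{y}^\top\mathbf{M}_{\mathbf{X}}\boldsymbol{y})/(L-1)}{ \boldsymbol{y}^\top\mathbf{M}_{\mathbf{X}}\boldsymbol{y}/(n-L)} \stackrel{\mathrm{H}_0}{\sim} \mathcal{F}(L-1, n-L),\] where \(n\) is the total number of observations. For large values of \(F\), we reject the null hypothesis \(\alpha_1 = \cdots = \alpha_L\) against the alternative that at least one group mean is different. One crucial assumption is that all groups have the same variance.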
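As a rough sketch of the sum of squares decomposition, the function below computes the \(F\)-statistic from a list of groups using only the Python standard library. The name `one_way_f` and the toy data are assumptions made for illustration. The total sum of squares about the grand mean plays the role of \(\boldsymbol{y}^\top\mathbf{M}_{\mathbf{1}_n}\boldsymbol{y}\) and the within-group sum of squares that of \(\boldsymbol{y}^\top\mathbf{M}_{\mathbf{X}}\boldsymbol{y}\).

```python
def one_way_f(groups):
    """Return (F, df1, df2) for a one-way ANOVA.

    `groups` is a list of lists, one list of observations per level.
    Hypothetical helper written for illustration only.
    """
    L = len(groups)
    n = sum(len(g) for g in groups)
    y = [obs for g in groups for obs in g]
    grand_mean = sum(y) / n

    # Residual sum of squares of the intercept-only model (y' M_1 y).
    ss_total = sum((obs - grand_mean) ** 2 for obs in y)

    # Residual sum of squares of the full model (y' M_X y):
    # squared deviations of each observation from its own group mean.
    ss_within = 0.0
    for g in groups:
        mean_g = sum(g) / len(g)
        ss_within += sum((obs - mean_g) ** 2 for obs in g)

    df1, df2 = L - 1, n - L
    F = ((ss_total - ss_within) / df1) / (ss_within / df2)
    return F, df1, df2


# Toy data (made up for illustration): three groups of unequal size.
groups = [[4.0, 5.0, 6.0], [7.0, 9.0], [1.0, 2.0, 3.0, 6.0]]
F, df1, df2 = one_way_f(groups)
print(F, df1, df2)
```

The \(p\)-value is the upper tail probability of the \(\mathcal{F}(L-1, n-L)\) distribution at the observed \(F\); the standard library has no \(F\) distribution, so one would typically obtain it from a statistical package, for instance `scipy.stats.f.sf(F, df1, df2)` if SciPy is available.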