5 Analysis of variance
Consider the linear model \(\boldsymbol{Y} = \mathbf{1}_n\alpha + \mathbf{Z}\boldsymbol{\beta} + \boldsymbol{\varepsilon}\), where \(\mathbf{X}=(\mathbf{1}_n, \mathbf{Z})\) is a full-rank \(n \times p\) design matrix. As usual, let \(\mathrm{TSS} = \boldsymbol{y}^\top\mathbf{M}_{\mathbf{1}_n}\boldsymbol{y}\) denote the total sum of squares and \(\mathrm{RSS}= \boldsymbol{y}^\top\mathbf{M}_{\mathbf{X}}\boldsymbol{y}\) the residual sum of squares, and write \(\mathrm{ESS} = \mathrm{TSS}-\mathrm{RSS}\) for the explained sum of squares. Assume the larger model is correctly specified, and work under the assumptions of the Gaussian linear model (or asymptotically). The \(F\)-test statistic for the null hypothesis \(\mathrm{H}_0: \boldsymbol{\beta}=\mathbf{0}_{p-1}\) against the alternative \(\mathrm{H}_a: \boldsymbol{\beta} \in \mathbb{R}^{p-1}\) is then \[F = \frac{(\mathrm{TSS}-\mathrm{RSS})/(p-1)}{\mathrm{RSS}/(n-p)}.\] Under the null hypothesis, \(F \sim \mathcal{F}(p-1, n-p)\).
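A small numerical example may help make the formula concrete. The Python sketch below fits the model by least squares, then forms TSS, RSS and \(F\); the document itself works in R, and Python is used here only for illustration. The helper names `solve` and `global_f_test` are hypothetical. The sketch solves the normal equations for brevity, whereas R's `lm` uses a QR decomposition, which is numerically more stable.

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting (hypothetical helper)."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]  # augmented matrix (A | b)
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x


def global_f_test(y, Z):
    """F statistic for H0: beta = 0 in Y = 1_n alpha + Z beta + eps (hypothetical helper).

    y: list of n responses; Z: list of n rows, each holding the p - 1 covariates.
    Returns (F, TSS, RSS, n, p).
    """
    n = len(y)
    X = [[1.0] + list(row) for row in Z]  # design matrix X = (1_n, Z), n x p
    p = len(X[0])
    # Least squares through the normal equations (X^T X) b = X^T y
    XtX = [[sum(X[k][i] * X[k][j] for k in range(n)) for j in range(p)] for i in range(p)]
    Xty = [sum(X[k][i] * y[k] for k in range(n)) for i in range(p)]
    coef = solve(XtX, Xty)
    fitted = [sum(X[k][j] * coef[j] for j in range(p)) for k in range(n)]
    ybar = sum(y) / n
    tss = sum((yi - ybar) ** 2 for yi in y)                 # TSS = y^T M_{1_n} y
    rss = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))  # RSS = y^T M_X y
    F = ((tss - rss) / (p - 1)) / (rss / (n - p))
    return F, tss, rss, n, p
```

Take \(\boldsymbol{y} = (2, 4, 5, 4, 5)\) with a single covariate \(z = (1, 2, 3, 4, 5)\). Calling `global_f_test([2, 4, 5, 4, 5], [[1], [2], [3], [4], [5]])` gives TSS = 6 and RSS = 2.4, so \(F = (3.6/1)/(2.4/3) = 4.5\), up to floating-point rounding.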
An ANOVA table (`anova`) arranges the sum of squares decomposition, the degrees of freedom and the value of the \(F\) test statistic as follows.
Sum of squares | degrees of freedom | scaled sum of squares | test statistic | \(P\)-value |
---|---|---|---|---|
ESS | \(p-1\) | \(\mathrm{ESS}/(p-1)\) | \(F\) | \(1-\texttt{pf}(F, p-1, n-p)\) |
RSS | \(n-p\) | \(\mathrm{RSS}/(n-p)\) | | |
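Every entry of the table can be computed from ESS, RSS, \(n\) and \(p\). The Python sketch below assembles the two rows. `anova_table`, `f_sf`, `reg_inc_beta` and `_betacf` are illustrative names, not a library API. The \(P\)-value is the upper tail of the \(\mathcal{F}(d_1, d_2)\) distribution with \(d_1 = p-1\) and \(d_2 = n-p\). It uses the identity \(\Pr(F > f) = I_{d_2/(d_2 + d_1 f)}(d_2/2, d_1/2)\), where \(I_x(a, b)\) is the regularized incomplete beta function, evaluated here with a continued fraction. That approach is adequate for moderate degrees of freedom.

In R, `pf(F, p - 1, n - p, lower.tail = FALSE)` returns the same tail probability. It avoids the cancellation in `1 - pf(...)` when the \(P\)-value is tiny.

```python
import math


def _betacf(a, b, x, max_iter=300, eps=1e-14):
    """Continued fraction for I_x(a, b), evaluated with the modified Lentz method."""
    tiny = 1e-300  # guards against division by zero
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c = 1.0
    d = 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        # Even step of the continued fraction
        aa = m * (b - m) * x / ((qam + m2) * (a + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        h *= d * c
        # Odd step of the continued fraction
        aa = -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h


def reg_inc_beta(a, b, x):
    """Regularized incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    front = math.exp(math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                     + a * math.log(x) + b * math.log1p(-x))
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _betacf(a, b, x) / a
    return 1.0 - front * _betacf(b, a, 1.0 - x) / b  # symmetry I_x(a,b) = 1 - I_{1-x}(b,a)


def f_sf(f, d1, d2):
    """Upper tail P(F > f) for F ~ F(d1, d2), i.e. 1 - pf(f, d1, d2) in R."""
    if f <= 0.0:
        return 1.0
    return reg_inc_beta(d2 / 2.0, d1 / 2.0, d2 / (d2 + d1 * f))


def anova_table(ess, rss, n, p):
    """Rows (source, SS, df, SS/df, F, P-value) of the ANOVA table (hypothetical helper)."""
    df_model, df_resid = p - 1, n - p
    ms_model, ms_resid = ess / df_model, rss / df_resid
    F = ms_model / ms_resid
    return [
        ("ESS", ess, df_model, ms_model, F, f_sf(F, df_model, df_resid)),
        ("RSS", rss, df_resid, ms_resid, None, None),
    ]
```

For example, `anova_table(3.6, 2.4, 5, 2)` gives \(F = 4.5\) on \((1, 3)\) degrees of freedom, with a \(P\)-value of about 0.124.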