5 Analysis of variance
Consider the linear model \boldsymbol{Y} = \mathbf{1}_n\alpha + \mathbf{Z}\boldsymbol{\beta} + \boldsymbol{\varepsilon}, where \mathbf{X}=(\mathbf{1}_n, \mathbf{Z}) is a full-rank n \times p design matrix. As usual, let \mathrm{TSS} = \boldsymbol{y}^\top\mathbf{M}_{\mathbf{1}_n}\boldsymbol{y} denote the total sum of squares and \mathrm{RSS}= \boldsymbol{y}^\top\mathbf{M}_{\mathbf{X}}\boldsymbol{y} the sum of squared residuals; their difference \mathrm{ESS} = \mathrm{TSS}-\mathrm{RSS} is the explained sum of squares. Under the assumptions of the Gaussian linear model, and assuming the larger model is correctly specified, the F-test statistic for testing the null hypothesis \mathrm{H}_0: \boldsymbol{\beta}=\mathbf{0}_{p-1} against the alternative \mathrm{H}_a: \boldsymbol{\beta} \in \mathbb{R}^{p-1} is F = \frac{(\mathrm{TSS}-\mathrm{RSS})/(p-1)}{\mathrm{RSS}/(n-p)}. Under the null hypothesis, F \sim \mathcal{F}(p-1, n-p); without the normality assumption, this null distribution holds only approximately, in large samples.
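The F statistic above can be computed directly from the two sums of squares. The following is a minimal sketch on simulated data; the sample size, the coefficients and the variable names (`Z`, `y`, `fit`, `Fstat`, `pval`) are illustrative assumptions, not part of the original text.

```r
# Simulated data: every name and value below is an illustrative assumption
set.seed(1)
n <- 50
Z <- cbind(z1 = rnorm(n), z2 = rnorm(n))   # n x (p - 1) matrix of covariates
y <- 1 + 0.5 * Z[, "z1"] + rnorm(n)        # response; z2 has no effect
p <- ncol(Z) + 1                           # number of columns of X = (1_n, Z)

fit <- lm(y ~ Z)                           # full model, intercept included
TSS <- sum((y - mean(y))^2)                # residual sum of squares, intercept-only model
RSS <- sum(resid(fit)^2)                   # residual sum of squares, full model

# F statistic and P-value from the F(p - 1, n - p) null distribution
Fstat <- ((TSS - RSS) / (p - 1)) / (RSS / (n - p))
pval <- pf(Fstat, df1 = p - 1, df2 = n - p, lower.tail = FALSE)

# Compare with the overall F statistic stored by summary()
c(manual = Fstat, lm = unname(summary(fit)$fstatistic["value"]))
```

The two values should agree up to numerical error, since `summary` applied to an `lm` fit with an intercept reports the same global F-test.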
An ANOVA table (`anova`) arranges the information about the sum of squares decomposition, the degrees of freedom and the value of the F-test statistic in the following manner.
Sum of squares | Degrees of freedom | Scaled sum of squares | Test statistic | P-value |
---|---|---|---|---|
ESS | p-1 | \mathrm{ESS}/(p-1) | F | 1-\texttt{pf}(F, p-1, n-p) |
RSS | n-p | \mathrm{RSS}/(n-p) | | |
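As a rough illustration of how such a table can be obtained in R, the sketch below compares the intercept-only model with the full model using `anova`. The data are simulated, and the names `dat`, `null_fit`, `full_fit` and `tab` are assumptions made for the example. Note that the layout printed by R differs from the table above: each row corresponds to a model, with the explained sum of squares in the `Sum of Sq` column.

```r
# Simulated data: all names and values are illustrative assumptions
set.seed(2)
n <- 40
dat <- data.frame(z1 = rnorm(n), z2 = rnorm(n))
dat$y <- 2 + dat$z1 - 0.5 * dat$z2 + rnorm(n)

null_fit <- lm(y ~ 1, data = dat)         # intercept only: residual SS equals TSS
full_fit <- lm(y ~ z1 + z2, data = dat)   # full model with p = 3 coefficients

# Row 1: residual df n - 1 and TSS; row 2: residual df n - p and RSS,
# together with p - 1, ESS = TSS - RSS, the F statistic and its P-value
tab <- anova(null_fit, full_fit)
print(tab)

# Rebuild the P-value from the table entries, as in the last column above
p <- length(coef(full_fit))
1 - pf(tab$F[2], p - 1, n - p)
```

The last line should reproduce the `Pr(>F)` entry of the printed table, up to rounding.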