5 Analysis of variance

Consider the linear model \boldsymbol{Y} = \mathbf{1}_n\alpha + \mathbf{Z}\boldsymbol{\beta} + \boldsymbol{\varepsilon}, where \mathbf{X}=(\mathbf{1}_n, \mathbf{Z}) is a full-rank n \times p design matrix. As usual, let \mathrm{TSS} = \boldsymbol{y}^\top\mathbf{M}_{\mathbf{1}_n}\boldsymbol{y} denote the total sum of squares and \mathrm{RSS}= \boldsymbol{y}^\top\mathbf{M}_{\mathbf{X}}\boldsymbol{y} the sum of squared residuals. Under the assumptions of the Gaussian linear model (or asymptotically), the F-test statistic for testing the null hypothesis \mathrm{H}_0: \boldsymbol{\beta}=\mathbf{0}_{p-1} against the alternative \mathrm{H}_a: \boldsymbol{\beta} \in \mathbb{R}^{p-1}, assuming the larger model is correctly specified, is F = \frac{(\mathrm{TSS}-\mathrm{RSS})/(p-1)}{\mathrm{RSS}/(n-p)}. Under the null hypothesis, F \sim \mathcal{F}(p-1, n-p); this distribution is exact in the Gaussian linear model and a large-sample approximation otherwise.
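
The computation of the F statistic can be sketched numerically. The following is a minimal illustration, not a reference implementation: the helper names `solve` and `f_statistic` and the toy data are made up for this example, and the least squares fit is obtained by solving the normal equations directly in plain Python.

```python
def solve(A, b):
    """Solve the square system A x = b by Gaussian elimination with partial pivoting."""
    k = len(b)
    M = [list(row) + [rhs] for row, rhs in zip(A, b)]  # augmented matrix
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, k):
            factor = M[r][col] / M[col][col]
            for c in range(col, k + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * k
    for r in reversed(range(k)):  # back substitution
        tail = sum(M[r][c] * x[c] for c in range(r + 1, k))
        x[r] = (M[r][k] - tail) / M[r][r]
    return x


def f_statistic(Z, y):
    """Return (TSS, RSS, F) for H0: beta = 0 in the model y = alpha + Z beta + eps."""
    n = len(y)
    X = [[1.0] + [float(z) for z in row] for row in Z]  # X = (1_n, Z)
    p = len(X[0])
    # Normal equations: (X'X) coef = X'y
    XtX = [[sum(X[i][a] * X[i][b] for i in range(n)) for b in range(p)]
           for a in range(p)]
    Xty = [sum(X[i][a] * y[i] for i in range(n)) for a in range(p)]
    coef = solve(XtX, Xty)
    ybar = sum(y) / n
    tss = sum((yi - ybar) ** 2 for yi in y)  # residuals of the intercept-only model
    fitted = [sum(c * x for c, x in zip(coef, row)) for row in X]
    rss = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))  # residuals of the full model
    F = ((tss - rss) / (p - 1)) / (rss / (n - p))
    return tss, rss, F


# Made-up toy data: n = 5 observations, one explanatory variable (p = 2).
Z = [[1], [2], [3], [4], [5]]
y = [2, 4, 5, 4, 5]
tss, rss, F = f_statistic(Z, y)
print(f"TSS = {tss:.3f}, RSS = {rss:.3f}, F = {F:.3f}")
# prints: TSS = 6.000, RSS = 2.400, F = 4.500
```

Solving the normal equations keeps the sketch short and transparent; for ill-conditioned design matrices, a QR-based least squares routine would be the safer choice.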

An ANOVA table (anova) arranges the sum of squares decomposition, the degrees of freedom and the value of the F-test statistic in the following manner, where \mathrm{ESS} = \mathrm{TSS} - \mathrm{RSS} denotes the explained sum of squares.

| Sum of squares | degrees of freedom | scaled sum of squares | test statistic | P-value |
|---|---|---|---|---|
| ESS | p-1 | \mathrm{ESS}/(p-1) | F | 1-\texttt{pf}(F, p-1, n-p) |
| RSS | n-p | \mathrm{RSS}/(n-p) | | |
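
The entries of such a table can be assembled from the two sums of squares and the dimensions n and p. The sketch below is illustrative only: the names `reg_inc_beta`, `f_upper_tail` and `anova_table` are hypothetical, the input values are made up, and the upper tail probability (the analogue of 1-pf(F, p-1, n-p)) is computed through the regularized incomplete beta function with a standard continued fraction, which is one possible approach rather than what any particular software does.

```python
import math


def _beta_cf(a, b, x, max_iter=300, eps=1e-14):
    """Continued fraction for the incomplete beta function (modified Lentz method)."""
    tiny = 1e-300
    c = 1.0
    d = 1.0 - (a + b) * x / (a + 1.0)
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        # Even and odd coefficients of the continued fraction, applied in turn.
        even = m * (b - m) * x / ((a + 2 * m - 1.0) * (a + 2 * m))
        odd = -(a + m) * (a + b + m) * x / ((a + 2 * m) * (a + 2 * m + 1.0))
        for coef in (even, odd):
            d = 1.0 + coef * d
            d = 1.0 / (d if abs(d) > tiny else tiny)
            c = 1.0 + coef / c
            c = c if abs(c) > tiny else tiny
            delta = c * d
            h *= delta
        if abs(delta - 1.0) < eps:  # converged
            break
    return h


def reg_inc_beta(a, b, x):
    """Regularized incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    front = math.exp(math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                     + a * math.log(x) + b * math.log(1.0 - x))
    # Use the symmetry I_x(a, b) = 1 - I_{1-x}(b, a) where the fraction converges faster.
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _beta_cf(a, b, x) / a
    return 1.0 - front * _beta_cf(b, a, 1.0 - x) / b


def f_upper_tail(f, df1, df2):
    """P(F > f) for F ~ F(df1, df2), the analogue of 1 - pf(f, df1, df2)."""
    if f <= 0.0:
        return 1.0
    return reg_inc_beta(df2 / 2.0, df1 / 2.0, df2 / (df2 + df1 * f))


def anova_table(tss, rss, n, p):
    """Rows: (label, sum of squares, df, scaled sum of squares, F, P-value)."""
    ess = tss - rss
    F = (ess / (p - 1)) / (rss / (n - p))
    return [
        ("ESS", ess, p - 1, ess / (p - 1), F, f_upper_tail(F, p - 1, n - p)),
        ("RSS", rss, n - p, rss / (n - p), None, None),
    ]


# Hypothetical inputs: TSS = 6, RSS = 2.4 with n = 5 observations and p = 2.
for row in anova_table(tss=6.0, rss=2.4, n=5, p=2):
    print(row)
```

With a numerator degree of freedom of 2, the upper tail has the closed form (1 + 2f/(n-p))^{-(n-p)/2}, which gives a convenient sanity check for `f_upper_tail`.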