2.5 Summary of week 2

If \mathbf{X} is an n×p design matrix containing covariates and \boldsymbol{y} is our response vector, we can obtain the ordinary least squares (OLS) coefficients for the linear model \boldsymbol{y} = \mathbf{X}\boldsymbol{\beta}+ \boldsymbol{\varepsilon}, \qquad \mathrm{E}(\boldsymbol{\varepsilon})=\boldsymbol{0}_n, by projecting \boldsymbol{y} onto the column space of \mathbf{X}; assuming \mathbf{X} has full column rank, it follows that \mathbf{X}\hat{\boldsymbol{\beta}}=\mathbf{X}(\mathbf{X}^\top\mathbf{X})^{-1}\mathbf{X}^\top\boldsymbol{y} and hence \hat{\boldsymbol{\beta}} = (\mathbf{X}^\top\mathbf{X})^{-1}\mathbf{X}^\top\boldsymbol{y}.
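The closed-form solution can be checked numerically. The following sketch uses NumPy with simulated data; the sample size, coefficients, and seed are arbitrary illustrative choices, not part of the notes.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 50, 3

# Simulated design matrix with an intercept column (illustrative data only)
X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])
beta_true = np.array([1.0, 2.0, -0.5])
y = X @ beta_true + rng.normal(size=n)

# OLS coefficients: solve the normal equations (X^T X) beta = X^T y
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)

# Fitted values: the projection of y onto the column space of X
y_hat = X @ beta_hat
```

Solving the normal equations with np.linalg.solve avoids forming the explicit inverse (\mathbf{X}^\top\mathbf{X})^{-1}, which is numerically preferable.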

The first interpretation (which is used for graphical diagnostics) is the row geometry: each row of \mathbf{X} corresponds to an individual, and the response y_i is a one-dimensional point attached to it. The vector \hat{\boldsymbol{\beta}} describes the parameters of the hyperplane that minimizes the sum of squared vertical distances between the fitted values \hat{y}_i and the responses y_i. The problem is best written using vector-matrix notation, so

\hat{\boldsymbol{\beta}} = \mathrm{argmin}_{\boldsymbol{\beta}} \sum_{i=1}^n (y_i- \mathbf{x}_i\boldsymbol{\beta})^2 = \mathrm{argmin}_{\boldsymbol{\beta}} (\boldsymbol{y} - \mathbf{X}\boldsymbol{\beta})^\top(\boldsymbol{y}-\mathbf{X}\boldsymbol{\beta}), and the minimized value of the objective is the residual sum of squares \boldsymbol{e}^\top\boldsymbol{e}, where \boldsymbol{e} = \boldsymbol{y} - \mathbf{X}\hat{\boldsymbol{\beta}}.
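That \hat{\boldsymbol{\beta}} minimizes this quadratic objective can be illustrated by comparing the residual sum of squares at \hat{\boldsymbol{\beta}} with its value at random perturbations. This is a numerical sketch with simulated data; all names and values are illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 40, 2
X = np.column_stack([np.ones(n), rng.normal(size=n)])
y = X @ np.array([0.5, 1.5]) + rng.normal(size=n)

def ssr(beta):
    """Residual sum of squares (y - X beta)^T (y - X beta)."""
    e = y - X @ beta
    return float(e @ e)

beta_hat = np.linalg.solve(X.T @ X, X.T @ y)

# Any perturbation of beta_hat can only increase the objective
worse = [ssr(beta_hat + rng.normal(scale=0.1, size=p)) for _ in range(200)]
assert all(w >= ssr(beta_hat) for w in worse)
```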

The solution to the OLS problem has a dual interpretation in the column geometry, in which we treat the vector of stacked observations (y_1, \ldots, y_n)^\top (respectively the vertical distances (e_1, \ldots, e_n)^\top) as elements of \mathbb{R}^n. There, the response vector \boldsymbol{y} can be decomposed into fitted values \hat{\boldsymbol{y}} = \mathbf{H}_{\mathbf{X}}\boldsymbol{y} = \mathbf{X}\hat{\boldsymbol{\beta}} and residuals \boldsymbol{e} = \mathbf{M}_{\mathbf{X}}\boldsymbol{y} = \boldsymbol{y} - \mathbf{X}\hat{\boldsymbol{\beta}}, where \mathbf{H}_{\mathbf{X}} = \mathbf{X}(\mathbf{X}^\top\mathbf{X})^{-1}\mathbf{X}^\top is the hat (projection) matrix and \mathbf{M}_{\mathbf{X}} = \mathbf{I}_n - \mathbf{H}_{\mathbf{X}}. By construction, \boldsymbol{e} \perp \hat{\boldsymbol{y}}.
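The projection matrices \mathbf{H}_{\mathbf{X}} and \mathbf{M}_{\mathbf{X}} and the orthogonality of residuals and fitted values can be verified directly. The sketch below uses simulated data (sizes and seed are arbitrary assumptions for illustration).

```python
import numpy as np

rng = np.random.default_rng(2)
n, p = 30, 3
X = rng.normal(size=(n, p))
y = rng.normal(size=n)

# Hat matrix H_X projects onto col(X); M_X = I - H_X projects onto its
# orthogonal complement
H = X @ np.linalg.solve(X.T @ X, X.T)
M = np.eye(n) - H

y_hat = H @ y   # fitted values
e = M @ y       # residuals

# Both matrices are idempotent, and the residuals are orthogonal
# to the fitted values
assert np.allclose(H @ H, H)
assert np.allclose(M @ M, M)
assert abs(e @ y_hat) < 1e-8
```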

We therefore get \boldsymbol{y} = \hat{\boldsymbol{y}} + \boldsymbol{e}, and since \hat{\boldsymbol{y}} and \boldsymbol{e} are orthogonal, they form a right-angled triangle with \boldsymbol{y}, so Pythagoras’ theorem can be used to show that \|\boldsymbol{y}\|^2 = \|\hat{\boldsymbol{y}}\|^2 + \|\boldsymbol{e}\|^2.
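This decomposition of squared norms can also be checked numerically. As before, the data below are simulated and purely illustrative.

```python
import numpy as np

rng = np.random.default_rng(3)
n, p = 25, 4
X = rng.normal(size=(n, p))
y = rng.normal(size=n)

beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
y_hat = X @ beta_hat
e = y - y_hat

# Pythagoras: ||y||^2 = ||y_hat||^2 + ||e||^2
assert np.isclose(y @ y, y_hat @ y_hat + e @ e)
```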