3.1 Revisiting the interpretation of the parameters of a linear model
Geometrically, the linear model \(\boldsymbol{y} = \mathbf{X} \boldsymbol{\beta} + \text{residuals}\) corresponds to the orthogonal projection of \(\boldsymbol{y}\) onto the span of the columns of \(\mathbf{X}\), and gives the line of best fit in that space.
It is perhaps easiest to visualize the two-dimensional case, when \(\mathbf{X} = (\mathbf{1}_n, \mathbf{x}_1)\) is an \(n \times 2\) design matrix and \(\mathbf{x}_1\) is a continuous covariate. In this case, the entries of the coefficient vector \(\boldsymbol{\beta}=(\beta_0, \beta_1)^\top\) represent, respectively, the intercept and the slope.
If \(\mathbf{X} = \mathbf{1}_n\), the model only consists of an intercept, which is interpreted as the mean level. Indeed, the projection matrix corresponding to \(\mathbf{1}_n\), \(\mathbf{H}_{\mathbf{1}_n} = \mathbf{1}_n(\mathbf{1}_n^\top\mathbf{1}_n)^{-1}\mathbf{1}_n^\top\), is a matrix whose entries are all equal to \(n^{-1}\). The fitted values of this model are thus all equal to the sample mean \(\bar{y}\) of \(\boldsymbol{y}\), and the residuals are the centred values \(\boldsymbol{y}-\mathbf{1}_n \bar{y}\), whose mean is zero.
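The intercept-only case can be checked numerically. The following is a minimal sketch, assuming NumPy is available; the simulated response `y`, the seed and the sample size `n = 6` are arbitrary choices made for illustration and do not come from the text.

```python
import numpy as np

rng = np.random.default_rng(0)  # arbitrary seed, illustration only
n = 6
y = rng.normal(size=n)          # simulated response (hypothetical data)

ones = np.ones((n, 1))
# Projection matrix onto span(1_n): H = 1 (1'1)^{-1} 1'
H = ones @ np.linalg.inv(ones.T @ ones) @ ones.T

fitted = H @ y          # fitted values of the intercept-only model
residuals = y - fitted  # centred values y - 1_n * ybar

print(np.allclose(H, 1.0 / n))            # every entry of H equals 1/n
print(np.allclose(fitted, y.mean()))      # fitted values all equal the mean
print(np.isclose(residuals.mean(), 0.0))  # residuals have mean zero
```

All three checks should print `True`, up to floating-point tolerance.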
More generally, for \(\mathbf{X}\) an \(n \times p\) design matrix, the interpretation is as follows: a unit increase in \(\mathrm{x}_{ij}\) (\(\mathrm{x}_{ij} \mapsto \mathrm{x}_{ij}+1\)) leads to a change of \(\beta_j\) units in \(y_i\) (\(y_i \mapsto y_i + \beta_j\)), other things being held constant. Beware of models with higher order polynomials and interactions: if, for example, one is interested in the coefficient for \(\mathbf{x}_j\), but \(\mathbf{x}_j^2\) is also a column of the design matrix, then a change of one unit in \(\mathrm{x}_{ij}\) will not lead to a change of \(\beta_j\) in \(y_i\), because the squared term cannot be held constant while \(\mathrm{x}_{ij}\) varies!
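To make the warning concrete, consider a small worked case under assumptions that are not in the text above: the fitted mean contains only an intercept, a covariate \(x\) with coefficient \(\beta_j\), and its square \(x^2\) with a coefficient denoted here \(\beta_k\). The change induced by \(x \mapsto x+1\) is then
\[\{\beta_0 + \beta_j(x+1) + \beta_k(x+1)^2\} - \{\beta_0 + \beta_j x + \beta_k x^2\} = \beta_j + \beta_k(2x+1),\]
which depends on the starting value \(x\) and reduces to \(\beta_j\) only if \(\beta_k = 0\).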
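The Frisch-Waugh-Lovell (FWL) theorem says that the least squares estimate of the coefficient \(\boldsymbol{\beta}_2\) in the regression \[\boldsymbol{y} =\mathbf{X}_1\boldsymbol{\beta}_1 + \mathbf{X}_2\boldsymbol{\beta}_2 + \boldsymbol{\varepsilon}\] is identical to that obtained from the regression \[\mathbf{M}_1\boldsymbol{y} =\mathbf{M}_1 \mathbf{X}_2\boldsymbol{\beta}_2 + \mathbf{M}_1\boldsymbol{\varepsilon}, \] where \(\mathbf{M}_1 = \mathbf{I}_n - \mathbf{X}_1(\mathbf{X}_1^\top\mathbf{X}_1)^{-1}\mathbf{X}_1^\top\) projects onto the orthogonal complement of the span of \(\mathbf{X}_1\). In other words, one obtains the same estimate by first removing from both \(\boldsymbol{y}\) and \(\mathbf{X}_2\) the part explained by \(\mathbf{X}_1\). This can be useful to disentangle the effect of one variable.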
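The FWL result can be verified on simulated data. The sketch below assumes NumPy; the data-generating process, the coefficient values and the variable names (`X1`, `X2`, `beta_full`, `beta_fwl`) are hypothetical and chosen only so that \(\mathbf{X}_2\) is correlated with \(\mathbf{X}_1\).

```python
import numpy as np

rng = np.random.default_rng(1)  # arbitrary seed, illustration only
n = 200

# X1: intercept plus one covariate; X2: a single column correlated with X1
X1 = np.column_stack([np.ones(n), rng.normal(size=n)])
X2 = (0.5 * X1[:, 1] + rng.normal(size=n)).reshape(-1, 1)
y = X1 @ np.array([1.0, 2.0]) - 3.0 * X2[:, 0] + rng.normal(size=n)

# Full regression of y on (X1, X2)
X = np.hstack([X1, X2])
beta_full = np.linalg.lstsq(X, y, rcond=None)[0]

# Annihilator M1 = I - X1 (X1'X1)^{-1} X1'
M1 = np.eye(n) - X1 @ np.linalg.solve(X1.T @ X1, X1.T)

# Regression of M1 y on M1 X2
beta_fwl = np.linalg.lstsq(M1 @ X2, M1 @ y, rcond=None)[0]

# Residuals of the two regressions
resid_full = y - X @ beta_full
resid_fwl = M1 @ y - (M1 @ X2) @ beta_fwl

print(beta_full[2:], beta_fwl)             # estimates of beta_2 from both fits
print(np.allclose(resid_full, resid_fwl))  # residual vectors coincide
```

Forming the \(n \times n\) matrix \(\mathbf{M}_1\) explicitly is only reasonable for small \(n\); in practice one would typically compute the residuals from regressing on \(\mathbf{X}_1\) instead.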
The intercept coefficient does not correspond to the mean of \(\boldsymbol{y}\) unless the other variables in the design matrix have been centred (meaning they have mean zero). Otherwise, the coefficient \(\beta_0\) associated with the intercept is nothing but the level of \(y\) when all the other variables are set to zero. Adding new variables affects the estimates of the coefficient vector \(\boldsymbol{\beta}\), unless the new variables are orthogonal to the existing columns of the design matrix.
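Both statements can be illustrated numerically. This is a sketch under stated assumptions: NumPy is available, the data are simulated, and the helper `ols` and the variables `x`, `z` and `z_orth` are hypothetical names introduced for the example. The orthogonal column is built by residualizing a random vector on the existing design matrix.

```python
import numpy as np

rng = np.random.default_rng(2)  # arbitrary seed, illustration only
n = 100
x = rng.normal(loc=5.0, size=n)  # covariate with nonzero mean
y = 2.0 + 0.7 * x + rng.normal(size=n)


def ols(X, y):
    """Least squares coefficients for the regression of y on X."""
    return np.linalg.lstsq(X, y, rcond=None)[0]


# Uncentred covariate: the intercept is the level of y at x = 0
b_raw = ols(np.column_stack([np.ones(n), x]), y)

# Centred covariate: the intercept equals the sample mean of y
xc = x - x.mean()
Xc = np.column_stack([np.ones(n), xc])
b_cen = ols(Xc, y)

# New column made orthogonal to the existing columns of Xc
z = rng.normal(size=n)
z_orth = z - Xc @ ols(Xc, z)
b_aug = ols(np.column_stack([Xc, z_orth]), y)

print(b_raw[0], b_cen[0], y.mean())  # only the centred intercept matches the mean
print(b_cen, b_aug[:2])              # first two estimates are unchanged
```

Centring changes the intercept but not the slope, and adding `z_orth` leaves the first two estimates as they were.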