3 Frisch–Waugh–Lovell theorem

The FWL theorem has two components: it gives a formula for partitioned OLS estimates and shows that residuals from sequential regressions are identical.

Consider the following linear regression \[ {\boldsymbol{y}}= \mathbf{X}_1\boldsymbol{\beta}_1+\mathbf{X}_2\boldsymbol{\beta}_2+ \boldsymbol{u}, \label{eq1} \] where the response vector \({\boldsymbol{y}}\) is \(n \times 1\) and the vector of errors \(\boldsymbol{u}\) is a realization of a mean-zero random vector. The \(n \times p\) full-rank design matrix \(\mathbf{X}\) can be written as the partitioned matrix \((\mathbf{X}_1, \mathbf{X}_2)\) with blocks \(\mathbf{X}_1\), an \(n \times p_1\) matrix, and \(\mathbf{X}_2\), an \(n \times p_2\) matrix, where \(p = p_1 + p_2\). Let \(\hat{\boldsymbol{\beta}}_1\) and \(\hat{\boldsymbol{\beta}}_2\) be the ordinary least squares (OLS) parameter estimates from running this regression. Define the orthogonal projection matrix \(\mathbf{H}_\mathbf{X} = \mathbf{X}(\mathbf{X}^\top\mathbf{X})^{-1}\mathbf{X}^\top\) as usual and \(\mathbf{H}_{\mathbf{X}_i} = \mathbf{X}_i(\mathbf{X}_i^\top\mathbf{X}_i)^{-1}\mathbf{X}_i^\top\) for \(i=1, 2\). Similarly, define the complementary projection matrices \(\mathbf{M}_{\mathbf{X}} = \mathbf{I}_n-\mathbf{H}_{\mathbf{X}}\), \(\mathbf{M}_{\mathbf{X}_1}=\mathbf{I}_n-\mathbf{H}_{\mathbf{X}_1}\) and \(\mathbf{M}_{\mathbf{X}_2}=\mathbf{I}_n-\mathbf{H}_{\mathbf{X}_2}\).
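The projection matrices defined above can be built numerically to check their basic properties. The following is a minimal sketch in Python with NumPy, not part of the original notes: the helper name `hat_matrix`, the simulated design, the seed and the dimensions are all illustrative assumptions.

```python
import numpy as np


def hat_matrix(X):
    """Orthogonal projection onto the column space of X.

    Hypothetical helper; assumes X has full column rank.
    """
    # Solve (X'X) B = X' instead of forming the inverse explicitly
    return X @ np.linalg.solve(X.T @ X, X.T)


rng = np.random.default_rng(2024)  # arbitrary seed
n, p1, p2 = 50, 2, 3               # illustrative dimensions
X1 = rng.normal(size=(n, p1))
X2 = rng.normal(size=(n, p2))
X = np.hstack([X1, X2])            # X = (X1, X2), an n x p matrix

H_X = hat_matrix(X)
M_X = np.eye(n) - H_X
M_X1 = np.eye(n) - hat_matrix(X1)

# M_X1 is symmetric and idempotent
print(np.allclose(M_X1, M_X1.T), np.allclose(M_X1 @ M_X1, M_X1))
# M_X1 annihilates the columns of X1
print(np.allclose(M_X1 @ X1, 0.0))
# The column space of X1 is nested in that of X, so M_X1 M_X = M_X
print(np.allclose(M_X1 @ M_X, M_X))
```

The last check is the property that makes the residuals of the full regression invariant to premultiplication by \(\mathbf{M}_{\mathbf{X}_1}\). Forming \(n \times n\) projection matrices is convenient for illustration but wasteful for large \(n\).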

Theorem 3.1 The ordinary least squares estimates of \(\boldsymbol{\beta}_2\) and the residuals from the regression of \(\boldsymbol{y}\) on \(\mathbf{X}_1\) and \(\mathbf{X}_2\) are identical to those obtained by running the regression \[ \mathbf{M}_{\mathbf{X}_1}{\boldsymbol{y}}= \mathbf{M}_{\mathbf{X}_1}\mathbf{X}_2\boldsymbol{\beta}_2 + \text{residuals}. \label{eq2} \]
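In general, premultiplying both sides of the regression model by a projection matrix alters the model, so the fitted values and residuals differ. In particular, the model \[\boldsymbol{y} = \mathbf{X}_1 \boldsymbol{\beta}_1 + \mathbf{X}_2\boldsymbol{\beta}_2 + \boldsymbol{u}\] is not equivalent to \[ \mathbf{M}_{\mathbf{X}_1}\boldsymbol{y} = \mathbf{M}_{\mathbf{X}_1}\mathbf{X}_2 \boldsymbol{\beta}_2 + \mathbf{M}_{\mathbf{X}_1}\boldsymbol{u} \] because \(\boldsymbol{u}\) does not, in general, lie in the image of \(\mathbf{I}_n - \mathbf{H}_{\mathbf{X}}\) (the orthogonal complement of the column space of \(\mathbf{X}\)), so typically \(\mathbf{M}_{\mathbf{X}_1}\boldsymbol{u} \neq \boldsymbol{u}\). The equality does hold, however, for the orthogonal decomposition \[ \mathbf{M}_{\mathbf{X}_1}\boldsymbol{y} = \mathbf{M}_{\mathbf{X}_1}\mathbf{X}_2 \hat{\boldsymbol{\beta}}_2 + \mathbf{M}_{\mathbf{X}_1}\boldsymbol{e}, \] where \(\boldsymbol{e} = (\mathbf{I}_n - \mathbf{H}_{\mathbf{X}})\boldsymbol{y}\) denotes the OLS residuals from the full regression: since \(\boldsymbol{e}\) is orthogonal to the columns of \(\mathbf{X}_1\), we have \(\mathbf{M}_{\mathbf{X}_1}\boldsymbol{e} = \boldsymbol{e}\).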
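The statement of the theorem and the remark on residuals can be checked numerically. The sketch below, in Python with NumPy, simulates data under assumed coefficient values (the seed, dimensions and the vector `beta` are arbitrary choices made for illustration), fits the full regression, and then regresses \(\mathbf{M}_{\mathbf{X}_1}\boldsymbol{y}\) on \(\mathbf{M}_{\mathbf{X}_1}\mathbf{X}_2\).

```python
import numpy as np

rng = np.random.default_rng(0)     # arbitrary seed
n, p1, p2 = 100, 2, 3              # illustrative dimensions

# X1 holds an intercept and one covariate, X2 holds three covariates
X1 = np.column_stack([np.ones(n), rng.normal(size=(n, p1 - 1))])
X2 = rng.normal(size=(n, p2))
X = np.hstack([X1, X2])

beta = np.array([1.0, -0.5, 2.0, 0.0, 0.3])  # assumed "true" coefficients
u = rng.normal(size=n)                       # simulated errors
y = X @ beta + u

# Full regression of y on (X1, X2)
beta_full, *_ = np.linalg.lstsq(X, y, rcond=None)
resid_full = y - X @ beta_full

# FWL regression: project X1 out of both y and X2, then regress
M1 = np.eye(n) - X1 @ np.linalg.solve(X1.T @ X1, X1.T)
y_tilde = M1 @ y
X2_tilde = M1 @ X2
beta2_fwl, *_ = np.linalg.lstsq(X2_tilde, y_tilde, rcond=None)
resid_fwl = y_tilde - X2_tilde @ beta2_fwl

print(np.allclose(beta_full[p1:], beta2_fwl))  # same estimates of beta_2
print(np.allclose(resid_full, resid_fwl))      # same residuals
print(np.allclose(M1 @ resid_full, resid_full))  # residuals unchanged by M1
print(np.allclose(M1 @ u, u))                  # errors are changed by M1
```

The first three comparisons should print `True` and the last one `False`, which illustrates why the identity holds for the fitted decomposition but not for the model with the unobserved errors.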

Below is an algebraic proof of the equality of the OLS coefficients. The following material is optional.

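Proof. The easiest proof uses projection matrices, but we demonstrate the result for OLS coefficients directly. Consider an invertible \(p \times p\) matrix \(\mathbf{C}\), partitioned into blocks with \(\mathbf{C}_{11}\) of size \(p_1 \times p_1\) and \(\mathbf{C}_{22}\) of size \(p_2 \times p_2\) invertible, and denote its inverse by \(\mathbf{D}\), partitioned conformably; then \[ \begin{pmatrix} \mathbf{C}_{11} & \mathbf{C}_{12} \\ \mathbf{C}_{21} &\mathbf{C}_{22} \end{pmatrix}\begin{pmatrix} \mathbf{D}_{11} & \mathbf{D}_{12} \\ \mathbf{D}_{21} &\mathbf{D}_{22} \end{pmatrix} =\mathbf{I}_p \] gives the relationships \[\begin{align*} \mathbf{C}_{11}\mathbf{D}_{11}+\mathbf{C}_{12}\mathbf{D}_{21} &= \mathbf{I}_{p_1}\\ \mathbf{C}_{11}\mathbf{D}_{12}+\mathbf{C}_{12}\mathbf{D}_{22} &= \mathbf{O}_{p_1, p_2}\\ \mathbf{C}_{22}\mathbf{D}_{21}+\mathbf{C}_{21}\mathbf{D}_{11} &= \mathbf{O}_{p_2, p_1}\\ \mathbf{C}_{22}\mathbf{D}_{22}+\mathbf{C}_{21}\mathbf{D}_{12} &= \mathbf{I}_{p_2}\\ \end{align*}\] The third relationship gives \(\mathbf{D}_{21} = -\mathbf{C}_{22}^{-1}\mathbf{C}_{21}\mathbf{D}_{11}\); substituting this into the first, we deduce that the so-called Schur complement of \(\mathbf{C}_{22}\) is \[\mathbf{C}_{11}-\mathbf{C}_{12}\mathbf{C}^{-1}_{22}\mathbf{C}_{21} = \mathbf{D}_{11}^{-1}\] and \[ -\mathbf{C}_{22}^{-1}\mathbf{C}_{21}(\mathbf{C}_{11}-\mathbf{C}_{12}\mathbf{C}^{-1}_{22}\mathbf{C}_{21})^{-1} = \mathbf{D}_{21}. \] Substitute \[ \begin{pmatrix} \mathbf{C}_{11} & \mathbf{C}_{12} \\ \mathbf{C}_{21} &\mathbf{C}_{22} \end{pmatrix} \equiv \begin{pmatrix} \mathbf{X}_1^\top\mathbf{X}_1 & \mathbf{X}_1^\top\mathbf{X}_2\\\mathbf{X}_2^\top\mathbf{X}_1 &\mathbf{X}_2^\top\mathbf{X}_2 \end{pmatrix}. \] Since this matrix is symmetric, so is its inverse, and thus \(\mathbf{D}_{12} = \mathbf{D}_{21}^\top = -\mathbf{D}_{11}\mathbf{C}_{12}\mathbf{C}_{22}^{-1}\). Plugging these results back into the equation for the least squares estimator, \(\hat{\boldsymbol{\beta}} = (\mathbf{X}^\top\mathbf{X})^{-1}\mathbf{X}^\top\boldsymbol{y} = \mathbf{D}\mathbf{X}^\top\boldsymbol{y}\), yields \[\begin{align*} \hat{\boldsymbol{\beta}}_1 &= (\mathbf{D}_{11}\mathbf{X}_1^\top + \mathbf{D}_{12}\mathbf{X}_2^\top)\boldsymbol{y} \\&= \mathbf{D}_{11}( \mathbf{X}_1^\top - \mathbf{C}_{12}\mathbf{C}_{22}^{-1}\mathbf{X}_2^\top)\boldsymbol{y} \\&= \left(\mathbf{C}_{11}-\mathbf{C}_{12}\mathbf{C}^{-1}_{22}\mathbf{C}_{21}\right)^{-1} \mathbf{X}_1^\top\mathbf{M}_{\mathbf{X}_2}\boldsymbol{y} \\&= (\mathbf{X}_1^\top\mathbf{M}_{\mathbf{X}_2}\mathbf{X}_1)^{-1}\mathbf{X}_1^\top\mathbf{M}_{\mathbf{X}_2}\boldsymbol{y}. \end{align*}\] Interchanging the roles of \(\mathbf{X}_1\) and \(\mathbf{X}_2\) gives \(\hat{\boldsymbol{\beta}}_2 = (\mathbf{X}_2^\top\mathbf{M}_{\mathbf{X}_1}\mathbf{X}_2)^{-1}\mathbf{X}_2^\top\mathbf{M}_{\mathbf{X}_1}\boldsymbol{y}\), which is the OLS estimate from the regression of \(\mathbf{M}_{\mathbf{X}_1}\boldsymbol{y}\) on \(\mathbf{M}_{\mathbf{X}_1}\mathbf{X}_2\) because \(\mathbf{M}_{\mathbf{X}_1}\) is symmetric and idempotent.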
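The block-inverse identities used in the proof are easy to get wrong by a sign, so a numerical check can be reassuring. The sketch below, in Python with NumPy, takes \(\mathbf{C} = \mathbf{X}^\top\mathbf{X}\) for a simulated design (seed and dimensions are arbitrary assumptions) and verifies that \(\mathbf{D}_{11}^{-1} = \mathbf{C}_{11} - \mathbf{C}_{12}\mathbf{C}_{22}^{-1}\mathbf{C}_{21}\), that \(\mathbf{D}_{21} = -\mathbf{C}_{22}^{-1}\mathbf{C}_{21}\mathbf{D}_{11}\), and that the first block of the OLS estimate equals \((\mathbf{X}_1^\top\mathbf{M}_{\mathbf{X}_2}\mathbf{X}_1)^{-1}\mathbf{X}_1^\top\mathbf{M}_{\mathbf{X}_2}\boldsymbol{y}\).

```python
import numpy as np

rng = np.random.default_rng(1)     # arbitrary seed
n, p1, p2 = 40, 2, 3               # illustrative dimensions
X1 = rng.normal(size=(n, p1))
X2 = rng.normal(size=(n, p2))
X = np.hstack([X1, X2])
y = rng.normal(size=n)

C = X.T @ X                        # symmetric positive definite here
D = np.linalg.inv(C)

# Conformable blocks of C and of its inverse D
C11, C12 = C[:p1, :p1], C[:p1, p1:]
C21, C22 = C[p1:, :p1], C[p1:, p1:]
D11, D12, D21 = D[:p1, :p1], D[:p1, p1:], D[p1:, :p1]

# Schur complement of C22 equals the inverse of D11
schur = C11 - C12 @ np.linalg.solve(C22, C21)
print(np.allclose(np.linalg.inv(schur), D11))

# Off-diagonal blocks of the inverse
print(np.allclose(D21, -np.linalg.solve(C22, C21) @ D11))
print(np.allclose(D12, -D11 @ C12 @ np.linalg.inv(C22)))

# First block of the OLS estimate versus the closed form with M_{X2}
M2 = np.eye(n) - X2 @ np.linalg.solve(X2.T @ X2, X2.T)
beta1_closed = np.linalg.solve(X1.T @ M2 @ X1, X1.T @ M2 @ y)
beta_full = D @ X.T @ y
print(np.allclose(beta_full[:p1], beta1_closed))
```

Each comparison should print `True`. Explicit inverses are used only to mirror the notation of the proof; in practice one would rely on a least squares solver.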

The proof that the residuals are the same is left as an exercise.

The Frisch–Waugh–Lovell theorem dates back to the work of R. Frisch and F. Waugh (1933) and of M. Lovell (1963).