3 Frisch–Waugh–Lovell theorem
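The FWL theorem has two components: it gives a formula for the OLS estimates of one block of coefficients in a partitioned regression, and it shows that the residuals from the full regression are identical to those obtained by first projecting out the other block of regressors and then regressing on what remains.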
Consider the linear regression \[ {\boldsymbol{y}}= \mathbf{X}_1\boldsymbol{\beta}_1+\mathbf{X}_2\boldsymbol{\beta}_2+ \boldsymbol{u}, \label{eq1} \] where the response vector \({\boldsymbol{y}}\) is \(n \times 1\) and the vector of errors \(\boldsymbol{u}\) is a realization of a mean-zero random vector. The \(n \times p\) full-rank design matrix \(\mathbf{X}\) can be written as the partitioned matrix \((\mathbf{X}_1, \mathbf{X}_2)\) with blocks \(\mathbf{X}_1\), an \(n \times p_1\) matrix, and \(\mathbf{X}_2\), an \(n \times p_2\) matrix, where \(p = p_1 + p_2\). Let \(\hat{\boldsymbol{\beta}}_1\) and \(\hat{\boldsymbol{\beta}}_2\) be the ordinary least squares (OLS) parameter estimates from running this regression. Define the orthogonal projection matrix \(\mathbf{H}_\mathbf{X} = \mathbf{X}(\mathbf{X}^\top\mathbf{X})^{-1}\mathbf{X}^\top\) as usual and \(\mathbf{H}_{\mathbf{X}_i} = \mathbf{X}_i(\mathbf{X}_i^\top\mathbf{X}_i)^{-1}\mathbf{X}_i^\top\) for \(i=1, 2\). Similarly, define the complementary projection matrices \(\mathbf{M}_{\mathbf{X}_1}=\mathbf{I}_n-\mathbf{H}_{\mathbf{X}_1}\) and \(\mathbf{M}_{\mathbf{X}_2}=\mathbf{I}_n-\mathbf{H}_{\mathbf{X}_2}\).
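As a numerical illustration of the two components of the theorem, the following minimal sketch compares the full regression of \(\boldsymbol{y}\) on \((\mathbf{X}_1, \mathbf{X}_2)\) with the regression of \(\mathbf{M}_{\mathbf{X}_2}\boldsymbol{y}\) on \(\mathbf{M}_{\mathbf{X}_2}\mathbf{X}_1\). The simulated data, the dimensions and the helper name `annihilator` are assumptions made for the example only; they are not part of the text.

```python
import numpy as np

# Simulated data (illustrative assumption, not from the text)
rng = np.random.default_rng(0)
n, p1, p2 = 50, 2, 3
X1 = rng.normal(size=(n, p1))
X2 = rng.normal(size=(n, p2))
y = rng.normal(size=n)


def annihilator(Z):
    """Return M_Z = I_n - Z (Z'Z)^{-1} Z' (hypothetical helper name)."""
    H = Z @ np.linalg.solve(Z.T @ Z, Z.T)
    return np.eye(Z.shape[0]) - H


# Full regression of y on X = (X1, X2)
X = np.hstack([X1, X2])
beta_full, *_ = np.linalg.lstsq(X, y, rcond=None)
resid_full = y - X @ beta_full

# Partial regression: project X2 out of both y and X1, then regress
M2 = annihilator(X2)
beta1_fwl, *_ = np.linalg.lstsq(M2 @ X1, M2 @ y, rcond=None)
resid_fwl = M2 @ y - M2 @ X1 @ beta1_fwl

# M2 is symmetric and idempotent, as any orthogonal projector
projector_ok = np.allclose(M2, M2.T) and np.allclose(M2 @ M2, M2)

# Both components of the theorem, up to floating-point error
coef_match = np.allclose(beta_full[:p1], beta1_fwl)
resid_match = np.allclose(resid_full, resid_fwl)
print(projector_ok, coef_match, resid_match)
```

Forming the \(n \times n\) matrix \(\mathbf{M}_{\mathbf{X}_2}\) explicitly is convenient for a small check like this one; for large \(n\), one would more likely compute residuals from a regression on \(\mathbf{X}_2\) instead.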
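Below is an algebraic proof of the equality of the OLS coefficients, namely that \(\hat{\boldsymbol{\beta}}_1\) coincides with the OLS estimate from the regression of \(\mathbf{M}_{\mathbf{X}_2}\boldsymbol{y}\) on \(\mathbf{M}_{\mathbf{X}_2}\mathbf{X}_1\). The following material is optional.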
Proof. The easiest proof uses projection matrices, but we demonstrate the result for OLS coefficients directly. Consider an invertible \(p \times p\) matrix \(\mathbf{C}\) and denote its inverse by \(\mathbf{D}\), both partitioned into blocks whose leading blocks \(\mathbf{C}_{11}\) and \(\mathbf{D}_{11}\) are \(p_1 \times p_1\); then \[ \begin{pmatrix} \mathbf{C}_{11} & \mathbf{C}_{12} \\ \mathbf{C}_{21} &\mathbf{C}_{22} \end{pmatrix}\begin{pmatrix} \mathbf{D}_{11} & \mathbf{D}_{12} \\ \mathbf{D}_{21} &\mathbf{D}_{22} \end{pmatrix} =\mathbf{I}_p \] gives the relationships \[\begin{align*} \mathbf{C}_{11}\mathbf{D}_{11}+\mathbf{C}_{12}\mathbf{D}_{21} &= \mathbf{I}_{p_1}\\ \mathbf{C}_{11}\mathbf{D}_{12}+\mathbf{C}_{12}\mathbf{D}_{22} &= \mathbf{O}_{p_1, p_2}\\ \mathbf{C}_{22}\mathbf{D}_{21}+\mathbf{C}_{21}\mathbf{D}_{11} &= \mathbf{O}_{p_2, p_1}\\ \mathbf{C}_{22}\mathbf{D}_{22}+\mathbf{C}_{21}\mathbf{D}_{12} &= \mathbf{I}_{p_2} \end{align*}\] The third relationship gives \(\mathbf{D}_{21} = -\mathbf{C}_{22}^{-1}\mathbf{C}_{21}\mathbf{D}_{11}\) provided \(\mathbf{C}_{22}\) is invertible; substituting this into the first, we deduce that the so-called Schur complement of \(\mathbf{C}_{22}\) is \[\mathbf{C}_{11}-\mathbf{C}_{12}\mathbf{C}^{-1}_{22}\mathbf{C}_{21} = \mathbf{D}_{11}^{-1}\] and \[ -\mathbf{C}_{22}^{-1}\mathbf{C}_{21}(\mathbf{C}_{11}-\mathbf{C}_{12}\mathbf{C}^{-1}_{22}\mathbf{C}_{21})^{-1} = \mathbf{D}_{21}. \]
Substituting \[ \begin{pmatrix} \mathbf{C}_{11} & \mathbf{C}_{12} \\ \mathbf{C}_{21} &\mathbf{C}_{22} \end{pmatrix} \equiv \begin{pmatrix} \mathbf{X}_1^\top\mathbf{X}_1 & \mathbf{X}_1^\top\mathbf{X}_2\\\mathbf{X}_2^\top\mathbf{X}_1 &\mathbf{X}_2^\top\mathbf{X}_2 \end{pmatrix} \] and plugging this result back into the least squares formula \(\hat{\boldsymbol{\beta}} = (\mathbf{X}^\top\mathbf{X})^{-1}\mathbf{X}^\top\boldsymbol{y} = \mathbf{D}\mathbf{X}^\top\boldsymbol{y}\), with \(\mathbf{D}_{12} = \mathbf{D}_{21}^\top = -\mathbf{D}_{11}\mathbf{C}_{12}\mathbf{C}_{22}^{-1}\) by symmetry of \(\mathbf{X}^\top\mathbf{X}\), yields \[\begin{align*} \hat{\boldsymbol{\beta}}_1 &= (\mathbf{D}_{11}\mathbf{X}_1^\top + \mathbf{D}_{12}\mathbf{X}_2^\top)\boldsymbol{y} \\&= \mathbf{D}_{11}( \mathbf{X}_1^\top - \mathbf{C}_{12}\mathbf{C}_{22}^{-1}\mathbf{X}_2^\top)\boldsymbol{y} \\&= \left(\mathbf{C}_{11}-\mathbf{C}_{12}\mathbf{C}^{-1}_{22}\mathbf{C}_{21}\right)^{-1} \mathbf{X}_1^\top\mathbf{M}_{\mathbf{X}_2}\boldsymbol{y} \\&= (\mathbf{X}_1^\top\mathbf{M}_{\mathbf{X}_2}\mathbf{X}_1)^{-1}\mathbf{X}_1^\top\mathbf{M}_{\mathbf{X}_2}\boldsymbol{y}, \end{align*}\] where the third equality uses \(\mathbf{C}_{12}\mathbf{C}_{22}^{-1}\mathbf{X}_2^\top = \mathbf{X}_1^\top\mathbf{H}_{\mathbf{X}_2}\) and the last uses \(\mathbf{C}_{11}-\mathbf{C}_{12}\mathbf{C}^{-1}_{22}\mathbf{C}_{21} = \mathbf{X}_1^\top(\mathbf{I}_n-\mathbf{H}_{\mathbf{X}_2})\mathbf{X}_1\).
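The block-inverse identities behind this derivation can be checked numerically. The sketch below takes \(\mathbf{C} = \mathbf{X}^\top\mathbf{X}\) for a simulated design matrix (an assumption made for illustration) and verifies that the inverse of \(\mathbf{D}_{11}\) equals the Schur complement \(\mathbf{C}_{11}-\mathbf{C}_{12}\mathbf{C}_{22}^{-1}\mathbf{C}_{21}\), that \(\mathbf{D}_{21}\) has the stated form, and that the Schur complement equals \(\mathbf{X}_1^\top\mathbf{M}_{\mathbf{X}_2}\mathbf{X}_1\).

```python
import numpy as np

# Simulated design matrix (illustrative assumption, not from the text)
rng = np.random.default_rng(1)
n, p1, p2 = 40, 2, 3
X1 = rng.normal(size=(n, p1))
X2 = rng.normal(size=(n, p2))
X = np.hstack([X1, X2])

# C = X'X and its inverse D, split into conformable blocks
C = X.T @ X
D = np.linalg.inv(C)
C11, C12 = C[:p1, :p1], C[:p1, p1:]
C21, C22 = C[p1:, :p1], C[p1:, p1:]
D11, D21 = D[:p1, :p1], D[p1:, :p1]

# Schur complement of C22: C11 - C12 C22^{-1} C21
schur = C11 - C12 @ np.linalg.solve(C22, C21)

# Identity 1: D11^{-1} equals the Schur complement
schur_ok = np.allclose(np.linalg.inv(D11), schur)

# Identity 2: D21 = -C22^{-1} C21 (Schur complement)^{-1}
d21_ok = np.allclose(D21, -np.linalg.solve(C22, C21) @ np.linalg.inv(schur))

# Link to the regression: the Schur complement is X1' M_{X2} X1
M2 = np.eye(n) - X2 @ np.linalg.solve(X2.T @ X2, X2.T)
fwl_ok = np.allclose(schur, X1.T @ M2 @ X1)
print(schur_ok, d21_ok, fwl_ok)
```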
The proof that the residuals are the same is left as an exercise.
The Frisch–Waugh–Lovell theorem dates back to the work of R. Frisch and F. Waugh (1933) and of M. Lovell (1963).