8.2 Cross-validation

Denote by \(\hat{\boldsymbol{\beta}}^\lambda\) the ridge estimator for a given value of the penalty parameter \(\lambda\). The smoothing matrix, as given in the course notes, is \[\mathbf{S}_{\lambda} = \mathbf{Z}(\mathbf{Z}^\top\mathbf{Z} + \lambda \mathbf{I}_p)^{-1}\mathbf{Z}^\top;\] its trace is \(\mathrm{tr}(\mathbf{S}_{\lambda}) = \sum_{j=1}^p d_j^2/(d_j^2+\lambda)\), where \(d_{j}\) is the \(j\)th singular value of \(\mathbf{Z}\). The smoothing matrix \(\mathbf{S}_{\lambda}\) is not a projection matrix: for \(\lambda > 0\), its nonzero eigenvalues \(d_j^2/(d_j^2+\lambda)\) lie strictly inside \((0,1)\), so it shrinks the fitted values rather than projecting onto a subspace.
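The following sketch illustrates these identities numerically; the data are simulated and \(\mathbf{Z}\) is taken to be the centered and scaled model matrix (names such as `Z` and `lambda` are placeholders, not objects from the notes).

```r
## Verify tr(S_lambda) = sum_j d_j^2 / (d_j^2 + lambda) via the SVD of Z
set.seed(1234)
n <- 50; p <- 4
Z <- scale(matrix(rnorm(n * p), n, p))   # centered and scaled inputs
lambda <- 2
S <- Z %*% solve(crossprod(Z) + lambda * diag(p)) %*% t(Z)
d <- svd(Z)$d
c(trace = sum(diag(S)), svd_formula = sum(d^2 / (d^2 + lambda)))
## The nonzero eigenvalues of S lie in (0, 1): shrinkage, not projection
range(eigen(S, symmetric = TRUE)$values)
```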

Similar calculations to those used to derive the leave-one-out cross-validation residuals of the PRESS statistic lead to the formula \[\mathrm{CV}_\lambda = \sum_{i=1}^n e_{-i}^2(\lambda) = \sum_{i=1}^n (y_i - \bar{y}- \mathbf{z}_i \hat{\boldsymbol{\gamma}}_{-i}^{\lambda})^2 = \sum_{i=1}^n \frac{(y_i - \bar{y} -\mathbf{z}_i\hat{\boldsymbol{\gamma}}^\lambda)^2}{\{1-(\mathbf{S}_{\lambda})_{ii}\}^2},\] where \(\mathbf{z}_i\) is the \(i\)th row of \(\mathbf{Z}\) and \(\hat{\boldsymbol{\gamma}}^\lambda\) is the ridge estimator based on the centered response and the matrix \(\mathbf{Z}\). Rather than compute \(\mathbf{S}_{\lambda}\) and its diagonal elements, one can resort to the convenient generalized cross-validation approximation, which replaces each \((\mathbf{S}_{\lambda})_{ii}\) by the average leverage \(\mathrm{tr}(\mathbf{S}_{\lambda})/n\), \[\mathrm{GCV}_\lambda = \sum_{i=1}^n \frac{(y_i - \bar{y} -\mathbf{z}_i\hat{\boldsymbol{\gamma}}^\lambda)^2}{\{1-\mathrm{tr}(\mathbf{S}_{\lambda})/n\}^2};\] the latter is readily computed, since only the trace of the smoothing matrix is required.
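The output below appears to be produced by `MASS::lm.ridge` and its `select` method, which report the Hoerl–Kennard–Baldwin (HKB) and Lawless–Wang (L-W) plug-in estimates of \(\lambda\) alongside the GCV minimizer. The dataset and grid of \(\lambda\) values used in the notes are not shown here, so the following is only a sketch with simulated data that produces output of this format.

```r
library(MASS)
set.seed(1234)
n <- 100; p <- 10
X <- matrix(rnorm(n * p), n, p)
y <- drop(X %*% rep(0.2, p) + rnorm(n))
lambda_grid <- seq(0, 30, by = 0.1)
fit <- lm.ridge(y ~ X, lambda = lambda_grid)
select(fit)   # prints the modified HKB, modified L-W and GCV choices

## Direct computation of GCV from the formula above, via the SVD of Z
## (up to the scaling conventions used internally by lm.ridge)
Z <- scale(X); yc <- y - mean(y)
sv <- svd(Z)
gcv <- sapply(lambda_grid, function(l) {
  gam <- sv$v %*% (sv$d / (sv$d^2 + l) * crossprod(sv$u, yc))
  sum((yc - Z %*% gam)^2) / (1 - sum(sv$d^2 / (sv$d^2 + l)) / n)^2
})
lambda_grid[which.min(gcv)]
```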

## [1] 13.5

## modified HKB estimator is 8.577316 
## modified L-W estimator is 7.881568 
## smallest value of GCV  at 13.7

Note that in this case, the optimal value of \(\lambda\) found by generalized cross-validation is higher than the theoretical optimum. In practice, we may prefer \(K\)-fold cross-validation to leave-one-out cross-validation: the \(n\) leave-one-out fits are based on nearly identical samples, so the resulting error estimates are highly correlated and the criterion can be quite variable.
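A minimal sketch of \(K\)-fold cross-validation for ridge regression using the `glmnet` package (an alternative to the closed-form criteria above, not the method used in the notes; `alpha = 0` selects the ridge penalty, and `glmnet`'s \(\lambda\) is on a different scale than `lm.ridge`'s because the loss is divided by \(n\) and the predictors are standardized internally):

```r
library(glmnet)
## Reuses the simulated X and y from the previous sketch
cvfit <- cv.glmnet(X, y, alpha = 0, nfolds = 10)
cvfit$lambda.min   # value of lambda minimizing the K-fold CV error estimate
cvfit$lambda.1se   # largest lambda within one standard error of the minimum
```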