9.1 Solutions

9.1.1 Exercise 12.4 - Smoothing splines
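
The code below manipulates objects built earlier in the exercise: the times x and centered accelerations y from the mcycle data (package MASS), a cubic B-spline design matrix B, the penalty matrix O of integrated squared second derivatives, and a generalized ridge solver lmridge(). The following is a minimal sketch of one such setup, not the exercise's own construction: the knot placement, the Riemann-sum approximation of O and the body of lmridge() shown here are assumptions. Recall also that the generalized cross-validation criterion is GCV(lambda) = (RSS(lambda)/n) / (1 - tr(S_lambda)/n)^2, where S_lambda = B (B'B + lambda * O)^{-1} B' is the smoothing matrix.

library(MASS)      # mcycle data: simulated motorcycle accident
library(splines)   # splineDesign() for B-spline bases

x <- mcycle$times
y <- mcycle$accel - mean(mcycle$accel)  # centered acceleration

# Cubic B-spline basis with a knot at each distinct design point (assumption)
xu <- unique(x)
knots <- sort(c(rep(range(x), 4L), xu[-c(1L, length(xu))]))
B <- splineDesign(knots = knots, x = x, ord = 4L)

# Penalty O = integral of b''(t) b''(t)^T dt, approximated by a Riemann sum
tg <- seq(min(x), max(x), length.out = 1000L)
B2 <- splineDesign(knots = knots, x = tg, ord = 4L, derivs = rep(2L, length(tg)))
O <- crossprod(B2) * (tg[2L] - tg[1L])

# Generalized ridge estimator: solves (B'B + lambda * O) beta = B'y
lmridge <- function(y, X, O, lambda) {
  c(solve(crossprod(X) + lambda * O, crossprod(X, y)))
}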

# Evaluate the GCV criterion on a grid of lambda values
lambdas <- seq(0.1, 40, length.out = 250L)
# Container for the GCV criterion
gcv <- numeric(length(lambdas))
# Loop over the grid
for(i in seq_along(lambdas)){
  # Smoothing matrix S_lambda = B (B'B + lambda * O)^{-1} B'
  Sm <- B %*% solve(crossprod(B) + lambdas[i] * O) %*% t(B)
  # Generalized ridge regression coefficients
  coefs <- lmridge(y = y, X = B, O = O, lambda = lambdas[i])
  # GCV(lambda) = (RSS/n) / (1 - tr(S)/n)^2; note the squared denominator
  gcv[i] <- c(crossprod(y - B %*% coefs)) / (nrow(B) * (1 - mean(diag(Sm)))^2)
}
# Plot the GCV criterion as a function of lambda
plot(lambdas, gcv, type = "l", xlab = expression(lambda),
     ylab = "GCV criterion value", main = "Generalized cross-validation")
# Mark the lambda that minimizes the GCV criterion
abline(v = lambdas[which.min(gcv)])
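
The loop above factorizes a p x p matrix anew for every lambda. A single Demmler-Reinsch orthogonalization gives all the fits and traces at once; the sketch below assumes B has full column rank (so that chol(crossprod(B)) exists), and the names RtO, M, ed, A, z and gcv_dr are introduced here, not part of the original solution.

# With B'B = R'R and R^{-T} O R^{-1} = U diag(dd) U', the smoothing matrix is
# S_lambda = A diag(1 / (1 + lambda * dd)) A' with A = B R^{-1} U and A'A = I,
# so tr(S_lambda) = sum(1 / (1 + lambda * dd))
R <- chol(crossprod(B))
RtO <- backsolve(R, O, transpose = TRUE)           # R^{-T} O
M <- t(backsolve(R, t(RtO), transpose = TRUE))     # R^{-T} O R^{-1}
ed <- eigen(M, symmetric = TRUE)
A <- B %*% backsolve(R, ed$vectors)                # A = B R^{-1} U
z <- crossprod(A, y)                               # A'y
gcv_dr <- sapply(lambdas, function(lam){
  shrink <- 1 / (1 + lam * ed$values)              # eigenvalues of S_lambda
  rss <- sum((y - A %*% (shrink * z))^2)           # residual sum of squares
  rss / (nrow(B) * (1 - sum(shrink) / nrow(B))^2)  # GCV criterion
})
# Up to numerical error, gcv_dr agrees with the gcv vector from the loop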

# Plot the data
plot(y ~ x, xlab = "time after impact (in milliseconds)", ylab = "centered acceleration",
     main = "Simulated motorcycle accident", bty = "l")
# Undersmoothed fit (lambda too small); uncomment to overlay
#lines(x, B %*% lmridge(y = y, X = B, O = O, lambda = 0.01), col = "green")
# Oversmoothed fit (lambda too large); uncomment to overlay
#lines(x, B %*% lmridge(y = y, X = B, O = O, lambda = 500), col = "blue")
# Smoothing spline fit from smooth.spline, with a knot at each distinct x
lines(predict(smooth.spline(y = y, x = x, all.knots = TRUE)))
# Fit at the GCV-optimal lambda found above
fitted_opt <- B %*% lmridge(y = y, X = B, O = O,
                            lambda = lambdas[which.min(gcv)])
lines(x, fitted_opt, col = "red", lwd = 2)
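
The trace of the smoothing matrix at the selected lambda gives the effective degrees of freedom of the fit, which can be compared with the value that smooth.spline selects by generalized cross-validation (its default criterion); a short sketch, where lam_opt and edf are names introduced here:

# Effective degrees of freedom edf = tr(S_lambda) at the selected lambda
lam_opt <- lambdas[which.min(gcv)]
edf <- sum(diag(B %*% solve(crossprod(B) + lam_opt * O, t(B))))
edf
# Equivalent degrees of freedom reported by smooth.spline
smooth.spline(y = y, x = x, all.knots = TRUE)$df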