Introduction to mixed models

class: center middle main-title section-title-1

# Introduction to mixed models

.class-info[

**Session 10**

.light[MATH 80667A: Experimental Design and Statistical Methods <br>for Quantitative Research in Management <br>
HEC Montréal
]

]

---

layout: true
class: title title-4

---

# Fixed effects

All experiments so far treated factors as **fixed** effects.

- We estimate a mean parameter for each factor (including blocking factors in repeated measures).

.box-inv-4.large[Change of scenery]

---

# Change of scenery

Assume that the levels of a factor form a random sample from a large population.

We are interested in making inference about the **variability** of the factor.

- measures of performance of employees
- results from different labs in an experiment
- subjects in repeated measures

We treat the factor as a **random** effect.

---

# Fixed vs random effects

There is no consensual definition, but Gelman (2005) lists a handful, of which:

> When a sample exhausts the population, the corresponding variable is fixed; when the sample is a small (i.e., negligible) part of the population the corresponding variable is random [Green and Tukey (1960)].

> Effects are fixed if they are interesting in themselves or random if there is interest in the underlying population (e.g., Searle, Casella and McCulloch [(1992), Section 1.4])

---

# Random effect model

Consider a one-way model

`$$\underset{\text{response}}{Y_{ij}} =  \underset{\text{global mean}}{\mu} + \underset{\text{random effect}}{\alpha_j} + \underset{\text{error term}}{\varepsilon_{ij}}.$$`

where

- `$\alpha_j \sim \mathsf{No}(0, \sigma^2_\alpha)$` is normal with mean zero and variance `$\sigma^2_\alpha$`.
- `$\varepsilon_{ij}$` are independent `$\mathsf{No}(0, \sigma^2_\varepsilon)$`

---

# Fictional example

Consider the weekly number of hours spent by staff members at HEC since September.

We collect a random sample of 40 employees and ask them to measure the number of hours they work from school for 8 consecutive weeks.

---
# Fitting mixed models in **R**

We use the `lme4` package in **R** to fit the models.

The `lmerTest` package provides additional functionalities for testing.

- `lmer` function fits linear mixed effect regression

Random effects are specified using the notation `(1 | factor)`.

---

# Model fit

```r
library(lmerTest) # also loads lme4
rmod <- lmer(time ~ (1 | id), data = hecedsm::workhours)
summary_rmod <- summary(rmod)
```

```
Random effects:
 Groups   Name        Variance Std.Dev.
 id       (Intercept) 38.63    6.215   
 Residual              5.68    2.383   
Number of obs: 320, groups:  id, 40

Fixed effects:
            Estimate Std. Error      df t value Pr(>|t|)    
(Intercept)  23.3016     0.9917 39.0000    23.5   <2e-16 ***
```

.tiny[Note that std. dev is square root of variance]

---

# Intra-class correlation

We are interested in the variance of the **random effect**, `$\sigma^2_\alpha$`.

Measurements from the same individuals are correlated. The intra-class correlation between measurements `$Y_{ij}$` and `$Y_{ik}$` from subject `$i$` at times `$j\neq k$` is

`$$\rho = \frac{\sigma^2_\alpha}{\sigma^2_\alpha + \sigma^2_\varepsilon}.$$`

In the example, `$\widehat{\sigma}^2_\alpha=38.63$`, `$\widehat{\sigma}^2_\varepsilon = 5.68$` and `$\widehat{\rho} = 0.87$`.

The mean number of working hours on the premises is `$\widehat{\mu}=23.3$` hours.

---
# Confidence intervals

We can use confidence intervals for the parameters.

Those are based on profile likelihood methods (asymmetric).

```r
(conf <- confint(rmod, oldNames = FALSE))
```

```
##                       2.5 %    97.5 %
## sd_(Intercept)|id  4.978127  7.799018
## sigma              2.198813  2.595272
## (Intercept)       21.335343 25.267782
```

The variability of the measurements and between employees is very different from zero.

???

Even with old school, we can get confidence intervals but only in simple designs like this one-way.

---
# Mixed models

Mixed models include both fixed effects and random effects.

- Fixed effects for experimental manipulations
- Random effects for subject, lab factors

Mixed models make it easier to

- handle correlations between measurements and 
- account for more complex designs.

---

# Theory

Full coverage of linear mixed models and general designs is beyond the scope of the course, but note

- Estimation is performed via restricted maximum likelihood (REML)
- Testing results may differ from repeated measure ANOVA
- Different approximations for `$F$` degrees of freedom (either Kenward-Roger (costly) or Satterthwaite approximation)

---
# Structure of the design

It is important to understand how data were gathered.

Oelhert (2010) guidelines

1. Identify sources of variation
2. Identify whether factors are crossed or nested
3. Determine whether factors should be fixed or random
4. Figure out which interactions can exist and whether they can be fitted.

---
# Crossed vs nested effects

.pull-left-wide[
Nested effects if a factor appears only within a particular level of another factor.

Crossed is for everything else (typically combinations of factors are possible).
]
.pull-right-narrow[
![Russian dolls](img/10/matroshka.jpg)
]

.small[

Example of nested random effects: class nested within schools 
- class 1 is not the same in school 1 than in school 2
<img src="img/10/nested.png" width="70%" style="display: block; margin: auto;" />

]

???

Matroschka from Wikimedia Commons CC-BY-SA 3.0
https://en.wikipedia.org/wiki/Matryoshka_doll#/media/File:Matryoshka_transparent.png

---

# Formulae in **R**

**R** uses the following notation

- `group1/group2` means `group2` is nested within `group1`.

The formula expands to `group1 + group1:group2`.

- `group1*group2` means `group` and `group2` are **crossed**

The formula is a shorthand for `group1 + group2 + group1:group2`.

.small[
To fit the model, identifiers of subjects must be declared as factors (categorical variables).
]

---

# Specifying interactions

Consider factors `$A$`, `$B$` and `$C$`.

- If factor `$A$` is treated as random, interactions with `$A$` must be random too.
- There must be repeated measurements to estimate variability of those interactions.
- Testing relies on the variance components.

???

Mixed models vs repeated measures

- In repeated measure ANOVA, repetitions over each sub-condition are averaged across subject.
- Treat subject identifier as a blocking factor (fixed effect).
- Impose sum-to-zero for coefficients.

---

# Data structure

---

# Example: Curley et al. (2022)

.small[

> Two variables were manipulated within participants: (a) evidence anchor (strong-first versus weak-first); (b) verdict system (two- versus three-verdict systems). Total pre-trial bias score was used as a covariate in the analysis (this score is based on the PJAQ and is explained further in the Materials section). Participants were also given two vignettes (Vignette 1 and Vignette 2); thus, the vignette variable was included in the data analysis [...]

> The dependent variable was the final belief of guilt score, which was measured on an accumulated scale from 0–14, with 0 representing no belief of guilt and 14 representing a total belief that the person is guilty

]

---

# Example: chocolate rating

Example from L. Meier, adapted from Oehlert (2010)

> A group of 10 rural and 10 urban raters rated 4 different chocolate types. Every rater got to eat two samples from the same chocolate type in random order.