Session 9
MATH 80667A: Experimental Design and Statistical Methods
for Quantitative Research in Management
HEC Montréal
Repeated measures
Repeated measures
MANOVA
Each subject (experimental unit) is assigned to a single condition.
In many instances, it may be possible to randomly assign multiple conditions to each experimental unit.
Assign (some or) all treatments to subjects and measure the response.
Benefits:
Impact: within-subjects designs need smaller sample sizes than between-subjects designs.
Potential sources of bias include
As before, we have one experimental factor A with n_a levels, with
\begin{align*}\underset{\text{response}}{Y_{ij}} = \underset{\text{global mean}}{\mu} + \underset{\text{mean difference}}{\alpha_j} + \underset{\text{random effect for subject}}{S_{i}} + \underset{\text{error}}{\varepsilon_{ij}}\end{align*}
where S_i \sim \mathsf{No}(0, \sigma^2_s) and \varepsilon_{ij} \sim \mathsf{No}(0, \sigma^2_e) are random variables.
The errors and random effects are independent from one another.
The model parameters are \mu, the \alpha_j's, \sigma^2_s and \sigma^2_e.
Any two responses from the same subject share S_i, so \mathsf{Cov}(Y_{ij}, Y_{ik}) = \sigma^2_s for j \neq k: the within-subject correlation is constant. This dependence structure within group is termed compound symmetry.
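Under this model the within-subject correlation is \rho = \sigma^2_s/(\sigma^2_s + \sigma^2_e). A quick simulation verifies this (a Python/NumPy sketch with made-up parameter values, not the course's R code):

```python
import numpy as np

# Simulate the random-intercept model Y_ij = mu + alpha_j + S_i + eps_ij
# and check the induced within-subject correlation
# rho = sigma_s^2 / (sigma_s^2 + sigma_e^2).
rng = np.random.default_rng(2024)
n, k = 50_000, 3                      # many subjects for a stable estimate
mu = 10.0
alpha = np.array([-1.0, 0.0, 1.0])    # treatment effects, sum to zero
sigma_s, sigma_e = 2.0, 1.0           # subject-effect and error standard deviations

S = rng.normal(0.0, sigma_s, size=(n, 1))    # one draw per subject, shared across j
eps = rng.normal(0.0, sigma_e, size=(n, k))  # one draw per observation
Y = mu + alpha + S + eps                     # n x k matrix of responses

rho_theory = sigma_s**2 / (sigma_s**2 + sigma_e**2)  # = 0.8 here
rho_hat = np.corrcoef(Y[:, 0], Y[:, 1])[0, 1]
print(rho_theory, round(rho_hat, 3))
```

The empirical correlation between any two conditions matches the theoretical value regardless of which pair of columns is picked, which is exactly the compound-symmetry restriction.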
An experiment conducted in a graduate course at HEC gathered electroencephalography (EEG) data.
The response variable is the amplitude of a brain signal measured at 170 ms after the participant has been exposed to different faces.
Repeated measures were collected on 12 participants, but we focus only on the average of the replications.
The control (real) is a true image, whereas the others were generated using a generative adversarial network (GAN) to be slightly smiling (GAN1) or extremely smiling (GAN2, which looks more fake).
Research question: do the GAN images trigger different reactions (pairwise differences with the control)?
If we average, we have a balanced randomized blocked design with
- id (blocking factor)
- stimulus (experimental factor)

This approach has a drawback: variance estimates can be negative...
We use the afex package to model the within-subject structure.
# Set sum-to-zero constraint for factors
options(contrasts = c("contr.sum", "contr.poly"))
data(AA21, package = "hecedsm")
# Compute mean
AA21_m <- AA21 |>
  dplyr::group_by(id, stimulus) |>
  dplyr::summarize(latency = mean(latency))
library(ggplot2)
ggplot(data = AA21_m,
       aes(x = id, colour = stimulus, y = latency)) +
  geom_point()
model <- afex::aov_ez(
  id = "id",           # subject id
  dv = "latency",      # response
  within = "stimulus", # within-subject
  data = hecedsm::AA21,
  fun_aggregate = mean)
anova(model,               # mixed ANOVA model
      correction = "none", # sphericity
      es = "none")         # effect size
# Anova Table (Type 3 tests)
#
# Response: latency
#          num Df den Df   MSE     F Pr(>F)
# stimulus      2     22 1.955 0.496 0.6155
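The F statistic above is the randomized-block one: the total sum of squares splits into stimulus, subject, and residual pieces, and F = MS_stimulus / MS_residual with (k-1) and (n-1)(k-1) degrees of freedom. A hand computation sketches this (Python/NumPy on simulated stand-in data, not the EEG data):

```python
import numpy as np

# One-way repeated-measures ANOVA computed by hand on a balanced n x k layout.
rng = np.random.default_rng(80667)
n, k = 12, 3                                     # 12 subjects, 3 stimuli
Y = rng.normal(size=(n, k)) + rng.normal(size=(n, 1))  # add a subject effect

grand = Y.mean()
col_means = Y.mean(axis=0)                       # per-stimulus means
row_means = Y.mean(axis=1, keepdims=True)        # per-subject means
ss_total = ((Y - grand) ** 2).sum()
ss_treat = n * ((col_means - grand) ** 2).sum()  # between stimuli
ss_subj = k * ((row_means - grand) ** 2).sum()   # between subjects (blocks)
resid = Y - col_means - row_means + grand
ss_resid = (resid ** 2).sum()

df_treat, df_resid = k - 1, (n - 1) * (k - 1)    # 2 and 22, as in the table
F = (ss_treat / df_treat) / (ss_resid / df_resid)
print(df_treat, df_resid, round(F, 3))
```

For a balanced design the decomposition ss_total = ss_treat + ss_subj + ss_resid is exact, which is why blocking on subject shrinks the error term relative to a between-subjects analysis.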
The validity of the F null distribution relies on the model having the correct structure.
Since we only care about differences between treatments, we can get away with a weaker assumption than compound symmetry.
Sphericity: the variance of the difference between any two treatments is constant.
Typically, people test this assumption first (using, e.g., Mauchly's test of sphericity).
Box suggested multiplying both degrees of freedom of the F statistic by \epsilon < 1 to correct for departures from sphericity.
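Box's \epsilon can be estimated from the covariance matrix \boldsymbol{\Sigma} of the k repeated measures as \hat{\epsilon} = \mathrm{tr}(C\Sigma C^\top)^2 / \{(k-1)\,\mathrm{tr}[(C\Sigma C^\top)^2]\}, where C is any orthonormal contrast matrix; this is the Greenhouse-Geisser estimate. A sketch (Python/NumPy; gg_epsilon is an illustrative helper, afex computes this for you):

```python
import numpy as np

def gg_epsilon(Sigma):
    """Greenhouse-Geisser estimate of Box's sphericity epsilon."""
    k = Sigma.shape[0]
    # Orthonormal contrasts: eigenvectors of the centering matrix
    # with eigenvalue 1 (rows are orthogonal to the ones vector).
    J = np.eye(k) - np.ones((k, k)) / k
    w, V = np.linalg.eigh(J)
    C = V[:, w > 0.5].T                          # (k-1) x k
    A = C @ Sigma @ C.T
    return np.trace(A) ** 2 / ((k - 1) * np.trace(A @ A))

k = 3
cs = 4.0 * np.ones((k, k)) + 1.0 * np.eye(k)     # compound symmetry -> spherical
ns = np.diag([1.0, 2.0, 8.0]) + 0.5              # unequal variances -> not spherical
print(round(gg_epsilon(cs), 3), round(gg_epsilon(ns), 3))
```

Under sphericity C\Sigma C^\top is proportional to the identity and \hat{\epsilon} = 1; otherwise 1/(k-1) \leq \hat{\epsilon} < 1 and the corrected test uses \hat{\epsilon}(k-1) and \hat{\epsilon}(n-1)(k-1) degrees of freedom.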
Another option is to go fully multivariate.
In afex, Mauchly's test and the sphericity corrections are reported by
summary(model)
Mauchly Tests for Sphericity

         Test statistic p-value
stimulus        0.67814 0.14341

Greenhouse-Geisser and Huynh-Feldt Corrections
 for Departure from Sphericity

          GG eps Pr(>F[GG])
stimulus 0.75651     0.5667

           HF eps Pr(>F[HF])
stimulus 0.8514944  0.5872648
In within-subject designs, contrasts are obtained by computing the contrast for every subject. Make sure to check degrees of freedom!
# Set up contrast vector
cont_vec <- list("real vs GAN" = c(1, -0.5, -0.5))
model |> emmeans::emmeans(spec = "stimulus", contr = cont_vec)
## $emmeans
##  stimulus emmean    SE df lower.CL upper.CL
##  real      -10.8 0.942 11    -12.8    -8.70
##  GAN1      -10.8 0.651 11    -12.3    -9.40
##  GAN2      -10.3 0.662 11    -11.8    -8.85
##
## Confidence level used: 0.95
##
## $contrasts
##  contrast    estimate    SE df t.ratio p.value
##  real vs GAN   -0.202 0.552 11  -0.366  0.7213
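The per-subject logic behind this output: applying the weights (1, -0.5, -0.5) to each subject's three condition means yields one number per subject, and the contrast test is a one-sample t-test on those n numbers with n - 1 = 11 degrees of freedom. A sketch (Python/NumPy on simulated stand-in data whose means mimic the output above, not the actual EEG data):

```python
import numpy as np

# One contrast value per subject, then a one-sample t-test on those values.
rng = np.random.default_rng(170)
n = 12
Y = rng.normal(loc=[-10.8, -10.8, -10.3], scale=1.0, size=(n, 3)) \
    + rng.normal(size=(n, 1))                 # shared subject effect

w = np.array([1.0, -0.5, -0.5])               # real vs average of GAN1, GAN2
d = Y @ w                                     # one contrast value per subject
t = d.mean() / (d.std(ddof=1) / np.sqrt(n))
df = n - 1                                    # 11, matching the emmeans output
print(df, round(t, 3))
```

Because the subject effect S_i cancels inside each difference, the contrast is free of the between-subject variability, which is the payoff of the within-subject design.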
From Anandarajan et al. (2002), Canadian Accounting Perspectives
This study questions whether the current or proposed Canadian standard of disclosing a going-concern contingency is viewed as equivalent to the standard adopted in the United States by financial statement users. We examined loan officers’ perceptions across three different formats
Bank loan officers were selected as the appropriate financial statement users for this study.
The experiment was conducted on the user's interpretation of a going-concern contingency when it is provided in one of three disclosure formats:
Postulate the following model: \boldsymbol{Y}_{ij} \sim \mathsf{No}_p(\boldsymbol{\mu}_j, \boldsymbol{\Sigma}), \qquad j = 1, \ldots, J
Each response \boldsymbol{Y}_{ij} is p-dimensional.
We assume multivariate measurements are independent of one another, with
The model is fitted using multivariate linear regression.
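Concretely, with a one-hot group design matrix X, the least-squares fit \hat{\boldsymbol{B}} = (\boldsymbol{X}^\top\boldsymbol{X})^{-1}\boldsymbol{X}^\top\boldsymbol{Y} stacks the J estimated mean vectors. A sketch (Python/NumPy on simulated data) showing the multivariate regression recovers the group sample means:

```python
import numpy as np

# Multivariate linear regression for the MANOVA model: with cell-means
# (one-hot) coding, Bhat is the J x p matrix of group mean vectors.
rng = np.random.default_rng(1)
J, p, n_per = 3, 2, 20
g = np.repeat(np.arange(J), n_per)               # group labels
mus = np.array([[0.0, 0.0], [1.0, 0.5], [2.0, 1.0]])
Y = mus[g] + rng.normal(size=(J * n_per, p))     # p-dimensional responses

X = np.eye(J)[g]                                 # one-hot design matrix
Bhat, *_ = np.linalg.lstsq(X, Y, rcond=None)     # J x p estimated means
means = np.vstack([Y[g == j].mean(axis=0) for j in range(J)])
print(np.allclose(Bhat, means))
```

The common covariance \boldsymbol{\Sigma} is then estimated from the residuals Y - X\hat{B}, pooled across groups.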
Confidence ellipses for bivariate MANOVA with discriminant analysis.
The diagonal line is the best separating boundary between the two groups.
One-way analysis of variance would have lower power to detect differences.
Simultaneous confidence region (ellipse), marginal confidence intervals (blue) and Bonferroni-adjusted intervals (green).
The dashed lines show the univariate projections of the confidence ellipse.
The more complex the model, the more assumptions...
The assumptions are the same as for ANOVA, with in addition:
Normality matters more in small samples.
In addition, for this model to make sense, you need just enough correlation between the responses (Goldilocks principle).
Only combine elements that theoretically or conceptually make sense together.
The null hypothesis is \mathscr{H}_0: \boldsymbol{\mu}_1 = \cdots = \boldsymbol{\mu}_J against the alternative that at least one vector is different from the rest. The null imposes (J-1) \times p restrictions on the parameters.
With J=2 (bivariate), the MANOVA test finds the composite score, with weights for Y_{i1} and Y_{i2}, that maximizes the value of the t-test statistic.
The null distribution is Hotelling's T^2, but a modification of the test statistic can be approximated by an F distribution.
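For two groups, T^2 = \frac{n_1 n_2}{n_1+n_2}(\bar{\boldsymbol{Y}}_1 - \bar{\boldsymbol{Y}}_2)^\top \boldsymbol{S}_p^{-1}(\bar{\boldsymbol{Y}}_1 - \bar{\boldsymbol{Y}}_2) with pooled covariance \boldsymbol{S}_p, and \frac{n_1+n_2-p-1}{(n_1+n_2-2)p}T^2 follows an F(p, n_1+n_2-p-1) distribution under the null. A sketch (Python/NumPy on simulated data; hotelling_T2 is an illustrative helper) checking that with p=1 the statistic reduces to the squared pooled two-sample t:

```python
import numpy as np

def hotelling_T2(Y1, Y2):
    """Two-sample Hotelling T^2 statistic."""
    n1, n2 = len(Y1), len(Y2)
    d = Y1.mean(axis=0) - Y2.mean(axis=0)
    Sp = ((n1 - 1) * np.cov(Y1, rowvar=False) +
          (n2 - 1) * np.cov(Y2, rowvar=False)) / (n1 + n2 - 2)
    Sp = np.atleast_2d(Sp)
    return (n1 * n2 / (n1 + n2)) * d @ np.linalg.solve(Sp, d)

rng = np.random.default_rng(42)
Y1 = rng.normal(0.0, 1.0, size=(15, 1))          # p = 1 for the sanity check
Y2 = rng.normal(0.5, 1.0, size=(18, 1))
T2 = hotelling_T2(Y1, Y2)

n1, n2, p_dim = 15, 18, 1
F = (n1 + n2 - p_dim - 1) / ((n1 + n2 - 2) * p_dim) * T2

# Pooled two-sample t for the univariate case
sp2 = ((n1 - 1) * Y1.var(ddof=1) + (n2 - 1) * Y2.var(ddof=1)) / (n1 + n2 - 2)
t = (Y1.mean() - Y2.mean()) / np.sqrt(sp2 * (1 / n1 + 1 / n2))
print(np.isclose(T2, t ** 2))
```

With p = 1 the F multiplier equals one, so T^2, the F statistic, and t^2 all coincide; for p > 1 the multiplier shrinks T^2 to account for the extra dimensions.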
In higher dimensions, with J \geq 3, there are many statistics that can be used to test equality of means.
The statistics are constructed from the within-group and between-group sums of squares and cross-products (SSCP) matrices.
These are Wilks' \Lambda, Pillai's trace, the Hotelling-Lawley trace and Roy's largest root.
Most give similar conclusions, and they are all equivalent with J=2.
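The classical statistics (Wilks' \Lambda, Pillai's trace, Hotelling-Lawley trace, Roy's largest root) are all functions of the eigenvalues \lambda_i of \boldsymbol{W}^{-1}\boldsymbol{B}, where W and B are the within- and between-group SSCP matrices. A sketch (Python/NumPy on simulated data; manova_stats is an illustrative helper) computing them and checking the J=2 equivalence:

```python
import numpy as np

def manova_stats(W, B):
    """The four classical MANOVA statistics from the eigenvalues of W^{-1} B."""
    lam = np.linalg.eigvals(np.linalg.solve(W, B)).real
    lam = np.clip(lam, 0.0, None)                # drop numerical noise below 0
    return {
        "wilks": np.prod(1.0 / (1.0 + lam)),
        "pillai": np.sum(lam / (1.0 + lam)),
        "hotelling_lawley": np.sum(lam),
        "roy": np.max(lam),
    }

# With J = 2 groups, B has rank 1, so one eigenvalue drives all four
# statistics: they are monotone transformations of each other.
rng = np.random.default_rng(7)
Y1 = rng.normal(0.0, 1.0, size=(20, 2))
Y2 = rng.normal(0.8, 1.0, size=(25, 2))
Y = np.vstack([Y1, Y2])
m, m1, m2 = Y.mean(axis=0), Y1.mean(axis=0), Y2.mean(axis=0)
B = 20 * np.outer(m1 - m, m1 - m) + 25 * np.outer(m2 - m, m2 - m)
W = (Y1 - m1).T @ (Y1 - m1) + (Y2 - m2).T @ (Y2 - m2)

s = manova_stats(W, B)
print({k: round(v, 4) for k, v in s.items()})
```

With more than two groups the eigenvalues differ, and the statistics weight them differently, which is why they can occasionally disagree.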
The number of observations must be sufficiently large.
You can use the software G*Power for power calculations.
To achieve a power of 80%, we need the following number of replicates per group.
|                 | 3 groups |     |     |     | 4 groups |     |     |     | 5 groups |     |     |     |
|-----------------|----|-----|-----|-----|-----|-----|-----|-----|-----|-----|-----|-----|
| effect size \ p | 2  | 4   | 6   | 8   | 2   | 4   | 6   | 8   | 2   | 4   | 6   | 8   |
| very large      | 13 | 16  | 18  | 21  | 14  | 18  | 21  | 23  | 16  | 21  | 24  | 27  |
| large           | 26 | 33  | 38  | 42  | 29  | 37  | 44  | 48  | 34  | 44  | 52  | 58  |
| medium          | 44 | 56  | 66  | 72  | 50  | 64  | 74  | 84  | 60  | 76  | 90  | 100 |
| small           | 98 | 125 | 145 | 160 | 115 | 145 | 165 | 185 | 135 | 170 | 200 | 230 |
Läuter, J. (1978). Sample size requirements for the T^2 test of MANOVA (tables for one-way classification). Biometrical Journal, 20, 389-406.
Researchers often conduct post hoc univariate tests using univariate ANOVA.
In R, the Holm-Bonferroni method is applied for the marginal tests. You need to correct for multiple testing!
A better option is to proceed with descriptive discriminant analysis, a method that finds the linear combinations of the responses that best discriminate between the groups.