
One-way ANOVA

Session 3

MATH 80667A: Experimental Design and Statistical Methods
HEC Montréal

1 / 42

Outline

Hypothesis tests for ANOVA

Model assumptions

2 / 42

F-test for one-way ANOVA

Global null hypothesis

No difference between treatments

  • $H_0$ (null): all of the $K$ treatment groups have the same average $\mu$
  • $H_a$ (alternative): at least two treatments have different averages

Tacitly assume that all observations have the same standard deviation σ.

3 / 42
  • The null hypothesis can be viewed as a special case within a bigger class of possibilities
  • it always corresponds to some restrictions of the alternative class

Building a statistic

  • $y_{ik}$ is observation $i$ of group $k$
  • $\hat{\mu}_1, \ldots, \hat{\mu}_K$ are the sample averages of groups $1, \ldots, K$
  • $\hat{\mu}$ is the overall sample mean

Decomposing variability into bits

$$\underbrace{\sum_{i,k}(y_{ik} - \hat{\mu})^2}_{\substack{\text{total sum of squares}\\ \text{(null model)}}} = \underbrace{\sum_{i,k}(y_{ik} - \hat{\mu}_k)^2}_{\substack{\text{within sum of squares}\\ \text{(alternative model)}}} + \underbrace{\sum_{k} n_k(\hat{\mu}_k - \hat{\mu})^2}_{\substack{\text{between sum of squares}\\ \text{(added variability)}}}$$
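A minimal R sketch verifying this decomposition numerically (simulated data; all names are hypothetical):

set.seed(80667)
K <- 3
y <- rnorm(30, mean = rep(c(0, 1, 2), each = 10), sd = 1)
group <- factor(rep(1:K, each = 10))
mu_hat <- mean(y)                    # overall sample mean
mu_k <- tapply(y, group, mean)       # group sample averages
n_k <- tabulate(group)               # group sizes
TSS <- sum((y - mu_hat)^2)           # total sum of squares
WSS <- sum((y - mu_k[group])^2)      # within sum of squares
BSS <- sum(n_k * (mu_k - mu_hat)^2)  # between sum of squares
all.equal(TSS, WSS + BSS)            # TRUE: the decomposition holds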

4 / 42

F-test statistic

Omnibus test

With K groups and n observations, the statistic is

$$F = \frac{\text{between-group variability}}{\text{within-group variability}} = \frac{\text{between sum of squares}/(K-1)}{\text{within sum of squares}/(n-K)}$$
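A sketch computing the statistic by hand on simulated data (hypothetical setup) and checking it against R's ANOVA table:

set.seed(80667)
y <- rnorm(30, mean = rep(c(0, 1, 2), each = 10), sd = 1)
group <- factor(rep(1:3, each = 10))
K <- nlevels(group); n <- length(y)
mu_k <- tapply(y, group, mean)
WSS <- sum((y - mu_k[group])^2)
BSS <- sum(tabulate(group) * (mu_k - mean(y))^2)
(Fstat <- (BSS / (K - 1)) / (WSS / (n - K)))
anova(lm(y ~ group))  # the 'F value' column matches Fstat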

5 / 42

Ratio of variance

Data with equal mean (left) and different mean per group (right).

6 / 42

What happens under the null regime?

If all groups have the same mean, both the numerator and the denominator are estimators of $\sigma^2$, thus

  • the F ratio should be 1 on average if there are no mean differences;
  • but the numerator is more variable because it is based on only $K$ group averages
    • the benchmark is skewed to the right.
7 / 42

Null distribution and degrees of freedom

The null distribution (benchmark) is a Fisher distribution $\mathsf{F}(\nu_1, \nu_2)$.

The parameters $\nu_1, \nu_2$ are called degrees of freedom.

For the one-way ANOVA:

  • $\nu_1 = K - 1$ is the number of constraints imposed by the null (number of groups minus one)
  • $\nu_2 = n - K$ is the number of observations minus the number of mean parameters estimated under the alternative
8 / 42

The number of constraints comes from the fact that we go from $K$ means under the alternative to a single mean under the null.
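A simulation sketch of this benchmark (assumed setup mirroring the upcoming example: K = 5 groups of 9 observations):

set.seed(80667)
K <- 5; n_k <- 9; n <- K * n_k
group <- factor(rep(1:K, each = n_k))
Fstat <- replicate(1e4, {
  y <- rnorm(n)  # all groups share the same mean: the null holds
  anova(lm(y ~ group))$`F value`[1]
})
# Empirical quantiles track the theoretical F(K - 1, n - K) ones
quantile(Fstat, probs = c(0.5, 0.95))
qf(c(0.5, 0.95), df1 = K - 1, df2 = n - K)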

Fisher distribution

Note: the $\mathsf{F}(\nu_1, \nu_2)$ distribution is indistinguishable from $\chi^2(\nu_1)/\nu_1$ for $\nu_2$ large.
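A quick numerical check of this limiting behaviour:

# The 95% quantile of F(4, nu2) approaches that of chisq(4)/4 as nu2 grows
qf(0.95, df1 = 4, df2 = c(10, 100, 10000))
qchisq(0.95, df = 4) / 4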

9 / 42

Impact of encouragement on teaching

From Davison (2008), Example 9.2

In an investigation on the teaching of arithmetic, 45 pupils were divided at random into five groups of nine. Groups A and B were taught in separate classes by the usual method. Groups C, D, and E were taught together for a number of days. On each day C were praised publicly for their work, D were publicly reproved and E were ignored. At the end of the period all pupils took a standard test.

10 / 42

Formulating a hypothesis

Let $\mu_A, \ldots, \mu_E$ denote the population average (expectation) score on the test for each experimental condition.

The null hypothesis is $H_0: \mu_A = \mu_B = \cdots = \mu_E$ against the alternative $H_a$ that at least one of the population averages is different.

11 / 42

F statistic

# Fit one-way analysis of variance
data(arithmetic, package = "hecedsm")
test <- aov(data = arithmetic,
            formula = score ~ group)
anova(test) # print ANOVA table

term        df  sum of squares  mean square  statistic  p-value
group        4          722.67       180.67      15.27  < 1e-04
Residuals   40          473.33        11.83
12 / 42

P-value

The p-value gives the probability of observing an outcome as extreme as, or more extreme than, the one in the sample if the null hypothesis were true.

# Compute p-value
pf(15.27,
   df1 = 4,
   df2 = 40,
   lower.tail = FALSE)

Probability that an $\mathsf{F}(4, 40)$ random variable exceeds 15.27.

13 / 42

Model assumptions

14 / 42

Quality of approximations

  • The null and alternative hypotheses of the analysis of variance only specify the mean of each group
  • We need to assume more conditions (read the fine print!) to derive the behaviour of the statistic

All statements about p-values are approximate.

15 / 42

Model assumptions

  • Additivity and linearity
  • Independence
  • Equal variance
  • Large sample size

16 / 42

Alternative representation

Write the $i$th observation of the $k$th experimental group as

$$\underbrace{Y_{ik}}_{\text{observation}} = \underbrace{\mu_k}_{\text{mean of group}} + \underbrace{\varepsilon_{ik}}_{\text{error term}},$$

where, for $i = 1, \ldots, n_k$ and $k = 1, \ldots, K$,

  • $\mathsf{E}(\varepsilon_{ik}) = 0$ (mean zero),
  • $\mathsf{Va}(\varepsilon_{ik}) = \sigma^2$ (equal variance), and
  • the errors are independent of one another.
17 / 42

# 1: Additivity

Additive decomposition reads:

(quantity depending on the treatment used) + (quantity depending only on the particular unit)

  • each unit is unaffected by the treatment of the other units
  • average effect of the treatment is constant
18 / 42

Diagnostic plots for additivity

Plot the group averages $\{\hat{\mu}_k\}$ against the residuals $\{e_{ik}\}$, where $e_{ik} = y_{ik} - \hat{\mu}_k$.

By construction, the sample mean of the $e_{ik}$ is always zero.
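A minimal sketch of this diagnostic for the arithmetic example used later in the deck:

data(arithmetic, package = "hecedsm")
model <- aov(score ~ group, data = arithmetic)
# Group averages (fitted values) against residuals
plot(fitted(model), residuals(model),
     xlab = "group average", ylab = "residual")
abline(h = 0, lty = 2)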

19 / 42

Lack of additivity

Less improvement for the scores of stronger students.

The plot and context suggest a multiplicative structure; it is tempting to diagnose unequal variance instead.

20 / 42

Reading diagnostic plots requires practice (and is analogous to reading tea leaves: leaves a lot to interpretation).

Interpretation of residual plots

Look for potential patterns on the y-axis only!

21 / 42

Multiplicative structure

Multiplicative data of the form (quantity depending on the treatment used) × (quantity depending only on the particular unit) tend to have higher variability when the response is larger.

22 / 42

Fixes for multiplicative data

A log-transformation of the response makes the model additive.

For responses bounded between $a$ and $b$, reduce warping effects (see the sketch after the list below) via
$$\ln\left\{\frac{x - a + \delta}{b + \delta - x}\right\}$$

Careful with transformations:

  • lose interpretability
  • change of meaning (different scale/units).
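A sketch of the bounded-response transformation (the constants a, b, and δ are user choices; the values below are hypothetical):

# Test scores bounded between a = 0 and b = 100
y <- c(12, 55, 87, 94)
a <- 0; b <- 100; delta <- 0.5  # delta > 0 avoids infinite values at the bounds
log((y - a + delta) / (b + delta - y))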
23 / 42

If we consider a response on the log scale, the test is for equality of the geometric means!

Interactions

Plot residuals against other explanatories.

24 / 42

Difference in average response: while the treatment seems to lead to a decrease in the response variable, a stratification by age group reveals this only occurs in the less-than-25 age group, with a seemingly reversed effect for the adults. Thus, the marginal model implied by the one-way analysis of variance is misleading.

A note about interactions

An interaction occurs when the effect of experimental group depends on another variable.

In principle, randomization ensures we capture the average marginal effect (even if misleading/useless).

We could incorporate the interacting variable in the model to capture its effect (this makes the model more complex), e.g., using a two-way ANOVA.

25 / 42

# 2: Equal variance

Each observation has the same standard deviation $\sigma$.

ANOVA is quite sensitive to this assumption!

26 / 42

Graphical diagnostics

Plot standardized (rstandard) or studentized residuals (rstudent) against fitted values.

library(ggplot2)
data(arithmetic, package = "hecedsm")
model <- lm(score ~ group, data = arithmetic)
data <- data.frame(
  fitted = fitted(model),
  residuals = rstudent(model))
ggplot(data = data,
       mapping = aes(x = fitted,
                     y = residuals)) +
  geom_point()
27 / 42

Test diagnostics

Can use a statistical test for $H_0: \sigma_1 = \cdots = \sigma_K$.

  • Bartlett's test (assumes normal data)
  • Levene's test: a one-way ANOVA for $|y_{ik} - \hat{\mu}_k|$
  • Brown–Forsythe test: a one-way ANOVA for $|y_{ik} - \mathrm{median}_k|$ (more robust)
  • Fligner–Killeen test: based on ranks

Different tests may yield different conclusions
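All of these are one-liners in R; a quick sketch on the arithmetic data (bartlett.test and fligner.test are in base R, Levene/Brown–Forsythe in the car package):

data(arithmetic, package = "hecedsm")
bartlett.test(score ~ group, data = arithmetic)   # assumes normality
fligner.test(score ~ group, data = arithmetic)    # rank-based
car::leveneTest(score ~ group, data = arithmetic) # Brown-Forsythe by default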

28 / 42

Bartlett's test is uniformly most powerful for normal data.

Levene and Brown–Forsythe are the most commonly used in practice (from what I have seen so far).

Example in R

data(arithmetic, package = "hecedsm")
model <- aov(score ~ group, data = arithmetic)
car::leveneTest(model) # Brown-Forsythe by default
## Levene's Test for Homogeneity of Variance (center = median)
##       Df F value Pr(>F)
## group  4  1.2072 0.3228
##       40

Fail to reject the null: no evidence of unequal variance

29 / 42

Box's take

To make the preliminary test on variances is rather like putting to sea in a rowing boat to find out whether conditions are sufficiently calm for an ocean liner to leave port!

Box, G.E.P. (1953). Non-Normality and Tests on Variances. Biometrika 40(3–4): 318–335.

30 / 42

Solutions

  • In large sample, power is large so probably always reject H0:σ1==σK.
  • If heterogeneity only per experimental condition, use Welch's ANOVA (oneway.test in R).
  • This statistic estimates the std. deviation of each group separately.
  • Could (should?) be the default when you have large number of observations, or enough to reliably estimate mean and std. deviation.
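A minimal sketch of Welch's ANOVA on the arithmetic data; base R's oneway.test does not assume equal variances by default:

data(arithmetic, package = "hecedsm")
oneway.test(score ~ group, data = arithmetic) # var.equal = FALSE by default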
31 / 42

What can go wrong? Spurious findings!

We reject the null hypothesis more often even if there is no difference in means!

32 / 42

Histogram of the null distribution of p-values obtained through simulation using the classical analysis of variance F-test (left) and Welch's unequal-variance alternative (right), based on 10 000 simulations. Each simulated sample consists of 50 observations from a standard normal distribution and 10 observations from a centered normal with a variance of 9. A uniform distribution would have 5% in each of the 20 bins used for the display.
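A sketch reproducing this simulation (fewer replications here to keep it fast):

set.seed(80667)
pvals <- replicate(1000, {  # the slides use 10 000 replications
  y <- c(rnorm(50), rnorm(10, sd = 3))  # second group has variance 9
  group <- factor(rep(c("A", "B"), times = c(50, 10)))
  c(classical = anova(lm(y ~ group))$`Pr(>F)`[1],
    welch = oneway.test(y ~ group)$p.value)
})
rowMeans(pvals < 0.05) # proportion of (spurious) rejections at the 5% level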

More complex heterogeneity patterns

  • Variance-stabilizing transformations (e.g., log for counts)
  • An explicit model for the trend over time, etc., may be necessary in more complex designs (repeated measures) where there is a learning effect.
33 / 42

# 3: Independence

As a Quebecer, I am not allowed to talk about this topic.

No visual diagnostic or test available.

Rather, infer from context.

34 / 42

Knowing the value of one observation tells us nothing about the value taken by the others.

Checking independence

  • Repeated measures are not independent
  • Group structure (e.g., people performing experiment together and exchanging feedback)
  • Time dependence (time series, longitudinal data).
  • Dependence on instrumentation, experimenter, time of the day, etc.

Observations close by tend to be more alike (correlated).

35 / 42

# 4: Sample size (normality?)

Where does the F-distribution come from?

Normality of group average

This holds (in great generality) because of the central limit theorem.

36 / 42

Central limit theorem

In large samples, the sample mean is approximately normally distributed.

The top row shows the data generating mechanism and a sample; the bottom row shows the distribution of the sample mean of n = 30 and n = 50 observations.
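A quick illustration of the theorem (assumed population: a skewed exponential distribution):

set.seed(80667)
# Sampling distribution of the mean of n = 30 exponential observations
means <- replicate(1e4, mean(rexp(30)))
hist(means, breaks = 50, main = "Sample means, n = 30")
# The histogram is approximately normal despite the skewed population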

37 / 42

How large should my sample be?

Rule of thumb: 20 or 30 per group

Gather a sufficient number of observations.

38 / 42

Assessing approximate normality

The closer data are to being normal, the better the large-sample distribution approximation is.

Can check normality via a quantile-quantile plot with standardized residuals $r_i$:

  • on the x-axis, the theoretical quantiles $F^{-1}\{\mathrm{rank}(r_i)/(n+1)\}$ of the residuals, where $F^{-1}$ is the normal quantile function;
  • on the y-axis, the empirical quantiles $r_i$.

In R, use functions qqnorm or car::qqPlot to produce the plots.
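The construction can also be done by hand to match the definition above; a sketch (qqnorm uses slightly different plotting positions):

data(arithmetic, package = "hecedsm")
model <- lm(score ~ group, data = arithmetic)
r <- rstandard(model)
n <- length(r)
theoretical <- qnorm(rank(r) / (n + 1)) # F^{-1}{rank(r_i)/(n+1)}
plot(theoretical, r,
     xlab = "theoretical quantiles", ylab = "empirical quantiles")
abline(a = 0, b = 1)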

39 / 42

More about quantile-quantile plots

The ordered residuals should align on a straight line.

Normal quantile-quantile plot (left) and Tukey's mean-difference QQ-plot (right).

40 / 42

Recap 1

  • One-way analysis of variance compares the averages of experimental groups
  • Null hypothesis: all groups have the same average
  • It is easier to detect that the null hypothesis is false with:
    • large differences between group averages
    • small variability
    • large samples
41 / 42

Recap 2

  • The model assumes independent observations, an additive structure, and equal variability in each group.
  • All statements are approximate; if the model assumptions are invalid, this can lead to spurious results or lower power.
42 / 42
