library(knitr) # for kable
library(magrittr) # for the pipe operator
library(fractional) # for fractional, to simplify matrix display
library(emmeans) # for emmeans and pairs
library(gmodels) # for fit.contrast
Exercise experiment: 8 treatments (different ways of using an exercise bike) and two blocking factors (day and subject). The response is pulse rate.
exercise <- read.csv("exercise.csv", header = T)
colnames(exercise) <-c("Day", "Subject", "Treatment", "Y")
kable(exercise, align = rep('c', 4))
Day | Subject | Treatment | Y |
---|---|---|---|
1 | 1 | 2 | 45 |
1 | 2 | 6 | 25 |
1 | 3 | 7 | 18 |
2 | 1 | 8 | 27 |
2 | 2 | 5 | 20 |
2 | 3 | 6 | 32 |
3 | 1 | 6 | 40 |
3 | 2 | 8 | 23 |
3 | 3 | 2 | 28 |
4 | 1 | 7 | 17 |
4 | 2 | 2 | 32 |
4 | 3 | 4 | 24 |
5 | 1 | 5 | 30 |
5 | 2 | 1 | 36 |
5 | 3 | 8 | 20 |
6 | 1 | 4 | 29 |
6 | 2 | 7 | 13 |
6 | 3 | 3 | 20 |
7 | 1 | 1 | 34 |
7 | 2 | 3 | 18 |
7 | 3 | 5 | 25 |
8 | 1 | 3 | 21 |
8 | 2 | 4 | 22 |
8 | 3 | 1 | 34 |
Design is what is known as a latin square (incomplete block design with two blocking factors)
exercise$Day <- as.factor(exercise$Day)
exercise$Subject <- as.factor(exercise$Subject)
exercise$Treatment <- as.factor(exercise$Treatment)
Ymat <- matrix(exercise$Y, nrow = 8, ncol = 3, byrow = T)
Tmat <- matrix(exercise$Treatment, nrow = 8, ncol = 3, byrow = T)
Ymat
[,1] [,2] [,3]
[1,] 45 25 18
[2,] 27 20 32
[3,] 40 23 28
[4,] 17 32 24
[5,] 30 36 20
[6,] 29 13 20
[7,] 34 18 25
[8,] 21 22 34
Tmat
[,1] [,2] [,3]
[1,] "2" "6" "7"
[2,] "8" "5" "6"
[3,] "6" "8" "2"
[4,] "7" "2" "4"
[5,] "5" "1" "8"
[6,] "4" "7" "3"
[7,] "1" "3" "5"
[8,] "3" "4" "1"
B <- cbind(b0 = 1, contrasts(exercise$Treatment))
fractional(B)
b0 2 3 4 5 6 7 8
1 1 . . . . . . .
2 1 1 . . . . . .
3 1 . 1 . . . . .
4 1 . . 1 . . . .
5 1 . . . 1 . . .
6 1 . . . . 1 . .
7 1 . . . . . 1 .
8 1 . . . . . . 1
solve(B) %>% fractional
1 2 3 4 5 6 7 8
b0 1 . . . . . . .
2 -1 1 . . . . . .
3 -1 . 1 . . . . .
4 -1 . . 1 . . . .
5 -1 . . . 1 . . .
6 -1 . . . . 1 . .
7 -1 . . . . . 1 .
8 -1 . . . . . . 1
See the vignette for the codingMatrices package
exercise.lm <- lm(Y ~ Day + Subject + Treatment, data = exercise)
exercise.lm
Call:
lm(formula = Y ~ Day + Subject + Treatment, data = exercise)
Coefficients:
(Intercept) Day2 Day3 Day4 Day5 Day6 Day7 Day8 Subject2
41.750 -2.429 -1.420 -3.188 -1.473 -2.357 -3.786 -3.991 -6.750
Subject3 Treatment2 Treatment3 Treatment4 Treatment5 Treatment6 Treatment7 Treatment8
-5.250 -1.214 -14.705 -9.571 -10.187 -4.134 -19.902 -12.643
anova(exercise.lm)
Analysis of Variance Table
Response: Y
Df Sum Sq Mean Sq F value Pr(>F)
Day 7 202.29 28.899 1.2045 0.40619
Subject 2 201.00 100.500 4.1888 0.06364 .
Treatment 7 854.39 122.055 5.0873 0.02383 *
Residuals 7 167.95 23.992
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Significant difference between treatments. What happens if we ignore blocks?
exercise_nb.lm <- lm(Y ~ Treatment, data = exercise)
anova(exercise_nb.lm)
Analysis of Variance Table
Response: Y
Df Sum Sq Mean Sq F value Pr(>F)
Treatment 7 1032.96 147.565 6.0129 0.001461 **
Residuals 16 392.67 24.542
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Very different effect of treatments - no longer adjusted for blocks (and not orthogonal).
Which treatments differ? Work out all pairwise comparisons.
marginal <- emmeans(exercise.lm, ~ Treatment) # emmeans and pairs from emmeans package
pairs1 <- pairs(marginal, adjust = "none")
pairs1
contrast estimate SE df t.ratio p.value
Treatment1 - Treatment2 1.214 5.55 7 0.219 0.8332
Treatment1 - Treatment3 14.705 4.39 7 3.349 0.0123
Treatment1 - Treatment4 9.571 4.94 7 1.937 0.0940
Treatment1 - Treatment5 10.188 4.39 7 2.320 0.0534
Treatment1 - Treatment6 4.134 5.44 7 0.760 0.4719
Treatment1 - Treatment7 19.902 5.44 7 3.660 0.0081
Treatment1 - Treatment8 12.643 4.94 7 2.558 0.0376
Treatment2 - Treatment3 13.491 5.44 7 2.481 0.0421
Treatment2 - Treatment4 8.357 4.94 7 1.691 0.1347
Treatment2 - Treatment5 8.973 5.44 7 1.650 0.1429
Treatment2 - Treatment6 2.920 4.39 7 0.665 0.5274
Treatment2 - Treatment7 18.688 4.39 7 4.256 0.0038
Treatment2 - Treatment8 11.429 4.94 7 2.313 0.0540
Treatment3 - Treatment4 -5.134 4.39 7 -1.169 0.2806
Treatment3 - Treatment5 -4.518 4.94 7 -0.914 0.3910
Treatment3 - Treatment6 -10.571 5.55 7 -1.903 0.0987
Treatment3 - Treatment7 5.196 4.94 7 1.052 0.3280
Treatment3 - Treatment8 -2.062 5.44 7 -0.379 0.7157
Treatment4 - Treatment5 0.616 5.44 7 0.113 0.9130
Treatment4 - Treatment6 -5.438 5.44 7 -1.000 0.3506
Treatment4 - Treatment7 10.330 4.39 7 2.353 0.0509
Treatment4 - Treatment8 3.071 5.55 7 0.553 0.5975
Treatment5 - Treatment6 -6.054 4.94 7 -1.225 0.2602
Treatment5 - Treatment7 9.714 5.55 7 1.749 0.1238
Treatment5 - Treatment8 2.455 4.39 7 0.559 0.5935
Treatment6 - Treatment7 15.768 4.94 7 3.191 0.0153
Treatment6 - Treatment8 8.509 4.39 7 1.938 0.0938
Treatment7 - Treatment8 -7.259 5.44 7 -1.335 0.2236
Results are averaged over the levels of: Day, Subject
subset(transform(pairs1), p.value < 0.05) # note that transform just turns an emmeans object into a data frame
What is the type I error rate?
alpha <- 0.05
1 - (1 - alpha)^nrow(transform(pairs1)) # 28 comparisons
[1] 0.7621731
This is the type I error rate if all the tests are independent.
Adjust to get experimentwise type I error rate of (close to) \(\alpha\).
pairs_b <- pairs(marginal, adjust = "bonferroni")
pairs_t <- pairs(marginal, adjust = "tukey")
subset(transform(pairs_b), p.value < alpha)
subset(transform(pairs_t), p.value < alpha)
What about if there is structure in the treatments?
contr.duration <- c(-1, -1, -1, -1, 1, 1, 1, 1)
contr.speed <- c(-1, -1, 1, 1, -1, -1, 1, 1)
contr.pedal <- c(-1, 1, -1, 1, -1, 1, -1, 1)
contr.dxs <- contr.duration * contr.speed
contr.dxp <- contr.duration * contr.pedal
contr.sxp <- contr.speed * contr.pedal
contr.dxsxp <- contr.duration * contr.speed * contr.pedal
contr_mat <- rbind(contr.duration, contr.speed, contr.pedal, contr.dxs, contr.dxp, contr.sxp, contr.dxsxp)
cbind(treat = 1:8, t(contr_mat))
treat contr.duration contr.speed contr.pedal contr.dxs contr.dxp contr.sxp contr.dxsxp
[1,] 1 -1 -1 -1 1 1 1 -1
[2,] 2 -1 -1 1 1 -1 -1 1
[3,] 3 -1 1 -1 -1 1 -1 1
[4,] 4 -1 1 1 -1 -1 1 -1
[5,] 5 1 -1 -1 -1 -1 1 1
[6,] 6 1 -1 1 -1 1 -1 -1
[7,] 7 1 1 -1 1 -1 -1 -1
[8,] 8 1 1 1 1 1 1 1
fit.contrast(exercise.lm, "Treatment", contr_mat / 4) # from gmodels
Estimate Std. Error t value Pr(>|t|)
Treatmentcontr.duration -5.343750 2.718547 -1.9656638 0.090064906
Treatmentcontr.speed -10.321429 2.470869 -4.1772458 0.004151473
Treatmentcontr.pedal 4.308036 2.718547 1.5846830 0.157056785
Treatmentcontr.dxs 1.209821 2.195425 0.5510648 0.598729019
Treatmentcontr.dxp 2.348214 2.470869 0.9503595 0.373580058
Treatmentcontr.sxp 1.888393 2.718547 0.6946331 0.509676502
Treatmentcontr.dxsxp -1.285714 2.267425 -0.5670371 0.588402205
attr(,"class")
[1] "fit_contrast"
Different to simply averaging observations (again because of non-orthogonal blocking)
with(exercise, {
c(
mean(Y[Treatment %in% 5:8]) - mean(Y[Treatment %in% 1:4]), # duration
mean(Y[Treatment %in% c(3, 4, 7, 8)]) - mean(Y[Treatment %in% c(1, 2, 5, 6)]), # speed
mean(Y[Treatment %in% c(2, 4, 6, 8)]) - mean(Y[Treatment %in% c(1, 3, 5, 7)]) # pedal
)
})
[1] -4.416667 -10.750000 5.083333