Workshop 7.3a - Randomized block and repeated measures designs (mixed effects models)

28 Jan 2015

ANCOVA references

Logan (2010) - Chpt 12-14
Quinn & Keough (2002) - Chpt 9-11

Randomized block

A plant pathologist wanted to examine the effects of two different strengths of tobacco virus on the number of lesions on tobacco leaves. She knew from pilot studies that leaves were inherently very variable in response to the virus. In an attempt to account for this leaf to leaf variability, both treatments were applied to each leaf. Eight individual leaves were divided in half, with half of each leaf inoculated with weak strength virus and the other half inoculated with strong virus. So the leaves were blocks and each treatment was represented once in each block. A completely randomised design would have had 16 leaves, with 8 whole leaves randomly allocated to each treatment.

Download Tobacco data set

Format of tobacco.csv data files

LEAF	TREAT	NUMBER
1	Strong	35.898
1	Week	25.02
2	Strong	34.118
2	Week	23.167
3	Strong	35.702
3	Week	24.122
...	...	...

LEAF	The blocking factor - Factor B
TREAT	Categorical representation of the strength of the tobacco virus - main factor of interest Factor A
NUMBER	Number of lesions on that part of the tobacco leaf - response variable

Open the tobacco data file.

Show code

tobacco <- read.table('../downloads/data/tobacco.csv', header=T, sep=',', strip.white=T)
head(tobacco)

  LEAF TREATMENT NUMBER
1   L1    Strong  35.90
2   L1      Weak  25.02
3   L2    Strong  34.12
4   L2      Weak  23.17
5   L3    Strong  35.70
6   L3      Weak  24.12

To appreciate the difference between this design (Complete Randomized Block) in which there is a single within Group effect (Treatment) and a nested design (in which there are between group effects), I will illustrate the current design diagramatically.

Note that each level of the Treatment (Strong and Week) are applied to each Leaf (Block)
Note that Treatments are randomly applied
The Treatment effect is the mean difference between Treatment pairs per leaf
Blocking in this way is very useful when spatial or temporal heterogeneity is likely to add noise that could make it difficualt to detect a difference between Treatments. Hence it is a way of experimentally reducing unexplained variation (compared to nesting which involves statistical reduction of unexplained variation).

Since each level of treatment was applied to each leaf, the data represents a randomized block design with leaves as blocks.

The variable LEAF contains a list of Leaf identification numbers and is supposed to represent a factorial blocking variable. However, because the contents of this variable are numbers, R initially treats them as numbers, and therefore considers the variable to be numeric rather than categorical. In order to force R to treat this variable as a factor (categorical) it is necessary to first convert this numeric variable into a factor

(HINT).

Show code

tobacco$LEAF <- as.factor(tobacco$LEAF)

What are the main hypotheses being tested?
1. H₀ Factor A:
2. H₀ Factor B:

In the table below, list the assumptions of a randomized block design along with how violations of each assumption are diagnosed and/or the risks of violations are minimized.

Assumption	Diagnostic/Risk Minimization
I.
II.
III.
IV.

Is the proposed model balanced?
(Yor N)

Show code

replications(NUMBER~LEAF+TREATMENT, data=tobacco)

     LEAF TREATMENT 
        2         8

!is.list(replications(NUMBER~LEAF+TREATMENT, data=tobacco))

[1] TRUE

There are a number of ways of diagnosing block by within block interactions.
1. The simplest method is to plot the number of lesions for each treatment and leaf combination (ie. an interaction plot
  (HINT). Any evidence of an interaction? Note that we cannot formally test for an interaction because we don't have replicates for each treatment-leaf combination!
2. We can also examine the residual plot from the fitted linear model. A curvilinear pattern in which the residual values switch from positive to negative and back again (or visa versa) over the range of predicted values implies that the scale (magnitude but not polarity) of the main treatment effects differs substantially across the range of blocks. (HINT). Any evidence of an interaction?
3. Perform a Tukey's test for non-additivity evaluated at α = 0.10 or even &alpha = 0.25. This (curvilinear test) formally tests for the presence of a quadratic trend in the relationship between residuals and predicted values. As such, it too is only appropriate for simple interactions of scale. (HINT). Any evidence of an interaction?
4. Examine a lattice graphic to determine the consistency of the effect across each of the blocks(HINT).

Analyse these data with a classic randomized block ANOVA

(HINT) to test the H₀ that there is no difference in the number of lesions produced by the different strength viruses. Complete the table below.

Show code

tobacco.aov <- aov(NUMBER~TREATMENT+Error(LEAF), data=tobacco)
summary(tobacco.aov)

Error: LEAF
          Df Sum Sq Mean Sq F value Pr(>F)
Residuals  7    292    41.7               

Error: Within
          Df Sum Sq Mean Sq F value Pr(>F)   
TREATMENT  1    248   248.3    17.2 0.0043 **
Residuals  7    101    14.5                  
---
Signif. codes:  
0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Source of variation	df	Mean Sq	F-ratio	P-value
LEAF (block)
TREAT
Residuals

Interprete the results.

Now analyse these data with a linear mixed effects model (REML).

Show lme code

library(nlme)
tobacco.lme <- lme(NUMBER~TREATMENT, random=~1|LEAF, data=tobacco, method='ML')
summary(tobacco.lme)

Linear mixed-effects model fit by maximum likelihood
 Data: tobacco 
    AIC   BIC logLik
  102.5 105.6 -47.25

Random effects:
 Formula: ~1 | LEAF
        (Intercept) Residual
StdDev:       3.454    3.557

Fixed effects: NUMBER ~ TREATMENT 
              Value Std.Error DF t-value p-value
(Intercept)   34.94     1.874  7  18.645  0.0000
TREATMENTWeak -7.88     1.901  7  -4.144  0.0043
 Correlation: 
              (Intr)
TREATMENTWeak -0.507

Standardized Within-Group Residuals:
     Min       Q1      Med       Q3      Max 
-1.73025 -0.51798  0.01212  0.43724  1.52636 

Number of Observations: 16
Number of Groups: 8

#unique(confint(tobacco.lme,'TREATMENTWeak')[[1]])
library(gmodels)
ci(tobacco.lme)

              Estimate CI lower CI upper
(Intercept)     34.940    30.51   39.371
TREATMENTWeak   -7.879   -12.38   -3.383
              Std. Error DF   p-value
(Intercept)        1.874  7 3.169e-07
TREATMENTWeak      1.901  7 4.328e-03

anova(tobacco.lme)

            numDF denDF F-value p-value
(Intercept)     1     7   368.5  <.0001
TREATMENT       1     7    17.2  0.0043

Show lmer code

library(lme4)
tobacco.lmer <- lmer(NUMBER~TREATMENT +(1|LEAF), data=tobacco)
summary(tobacco.lmer)

Linear mixed model fit by REML ['lmerMod']
Formula: NUMBER ~ TREATMENT + (1 | LEAF)
   Data: tobacco

REML criterion at convergence: 88.7

Scaled residuals: 
    Min      1Q  Median      3Q     Max 
-1.6185 -0.4845  0.0113  0.4090  1.4278 

Random effects:
 Groups   Name        Variance Std.Dev.
 LEAF     (Intercept) 13.6     3.69    
 Residual             14.5     3.80    
Number of obs: 16, groups:  LEAF, 8

Fixed effects:
              Estimate Std. Error t value
(Intercept)      34.94       1.87   18.64
TREATMENTWeak    -7.88       1.90   -4.14

Correlation of Fixed Effects:
            (Intr)
TREATMENTWk -0.507

anova(tobacco.lmer)

Analysis of Variance Table
          Df Sum Sq Mean Sq F value
TREATMENT  1    248     248    17.2

# Perform SAS-like p-value calculations (requires the lmerTest and pbkrtest package)
library(lmerTest)
tobacco.lmer <- lmer(NUMBER~TREATMENT +(1|LEAF), data=tobacco)
summary(tobacco.lmer)

Linear mixed model fit by REML t-tests use Satterthwaite approximations to degrees of freedom [
merModLmerTest]
Formula: NUMBER ~ TREATMENT + (1 | LEAF)
   Data: tobacco

REML criterion at convergence: 88.7

Scaled residuals: 
    Min      1Q  Median      3Q     Max 
-1.6185 -0.4845  0.0113  0.4090  1.4278 

Random effects:
 Groups   Name        Variance Std.Dev.
 LEAF     (Intercept) 13.6     3.69    
 Residual             14.5     3.80    
Number of obs: 16, groups:  LEAF, 8

Fixed effects:
              Estimate Std. Error    df t value Pr(>|t|)    
(Intercept)      34.94       1.87 11.33   18.64  7.4e-10 ***
TREATMENTWeak    -7.88       1.90  7.00   -4.14   0.0043 ** 
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Correlation of Fixed Effects:
            (Intr)
TREATMENTWk -0.507

anova(tobacco.lmer) # Satterthwaite denominator df method

Analysis of Variance Table of type 3  with  Satterthwaite 
approximation for degrees of freedom
          Sum Sq Mean Sq NumDF DenDF F.value Pr(>F)   
TREATMENT    248     248     1     7    17.2 0.0043 **
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

anova(tobacco.lmer,ddf="Kenward-Roger") # Kenward-Roger denominator df method

Analysis of Variance Table of type 3  with  Kenward-Roger 
approximation for degrees of freedom
          Sum Sq Mean Sq NumDF DenDF F.value Pr(>F)   
TREATMENT    248     248     1     7    17.2 0.0043 **
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

#Examine the effect season via a likelihood ratio test
tobacco.lmer1 <- lmer(NUMBER~TREATMENT +(1|LEAF), data=tobacco, REML=F)
tobacco.lmer2 <- lmer(NUMBER~1 +(1|LEAF), data=tobacco, REML=F)
anova(tobacco.lmer1,tobacco.lmer2, test="F")

Data: tobacco
Models:
..1: NUMBER ~ 1 + (1 | LEAF)
object: NUMBER ~ TREATMENT + (1 | LEAF)
       Df AIC BIC logLik deviance Chisq Chi Df Pr(>Chisq)   
..1     3 110 113  -52.2    104.5                           
object  4 102 106  -47.2     94.5  9.98      1     0.0016 **
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Show ezANOVA code

library(ez)
ezANOVA(dv=.(NUMBER),within=.(TREATMENT),wid=.(LEAF), data=tobacco,type=3)

$ANOVA
     Effect DFn DFd     F        p p<.05   ges
2 TREATMENT   1   7 17.17 0.004328     * 0.387

Generate a summary figure with predicted mean number of lesions (and confidence interval) for each treatment.

Show code

newdata <- expand.grid(TREATMENT=c("STRONG","WEAK"), NUMBER=0)
Xmat <- model.matrix(terms(tobacco.lmer),newdata)
Xmat

  (Intercept) TREATMENTWEAK
1           1             0
2           1             1
attr(,"assign")
[1] 0 1
attr(,"contrasts")
attr(,"contrasts")$TREATMENT
[1] "contr.treatment"

pred <- as.vector(fixef(tobacco.lmer) %*% t(Xmat))
se <- sqrt(diag(Xmat %*% vcov(tobacco.lmer) %*% t(Xmat)))
newdata$fit <- pred
newdata$lower <- pred - 2*se
newdata$upper <- pred + 2*se

library(ggplot2)
library(grid)
ggplot(newdata, aes(y=fit, x=TREATMENT)) +
  geom_errorbar(aes(ymin=lower, ymax=upper), width=0.01)+
  geom_point() +
  scale_x_discrete('Inoculation treatment')+
  scale_y_continuous('Number of lesions')+
  theme_classic() +
  theme(axis.title.y=element_text(vjust=2,size=rel(1.2)),
        axis.title.x=element_text(vjust=-2,size=rel(1.2)),
        plot.margin=unit(c(0.5,0.5,2,2),'lines'))

ggplot(newdata, aes(y=fit, x=TREATMENT)) +
  geom_bar(stat='identity', fill='grey', color='black') +
  geom_errorbar(aes(ymin=lower, ymax=upper), width=0.01)+
  scale_x_discrete('Inoculation treatment')+
  scale_y_continuous('Number of lesions', expand=c(0,0))+
  theme_classic() +
  theme(axis.title.y=element_text(vjust=2,size=rel(1.2)),
        axis.title.x=element_text(vjust=-2,size=rel(1.2)),
        plot.margin=unit(c(0.5,0.5,2,2),'lines'))