Jump to main navigation


Workshop 7.3a - Randomized block and repeated measures designs (mixed effects models)

28 Jan 2015

ANCOVA references

  • Logan (2010) - Chpt 12-14
  • Quinn & Keough (2002) - Chpt 9-11

Randomized block

A plant pathologist wanted to examine the effects of two different strengths of tobacco virus on the number of lesions on tobacco leaves. She knew from pilot studies that leaves were inherently very variable in response to the virus. In an attempt to account for this leaf to leaf variability, both treatments were applied to each leaf. Eight individual leaves were divided in half, with half of each leaf inoculated with weak strength virus and the other half inoculated with strong virus. So the leaves were blocks and each treatment was represented once in each block. A completely randomised design would have had 16 leaves, with 8 whole leaves randomly allocated to each treatment.

Download Tobacco data set

Format of tobacco.csv data files
LEAFTREATNUMBER
1Strong35.898
1Week25.02
2Strong34.118
2Week23.167
3Strong35.702
3Week24.122
.........
LEAFThe blocking factor - Factor B
TREATCategorical representation of the strength of the tobacco virus - main factor of interest Factor A
NUMBERNumber of lesions on that part of the tobacco leaf - response variable
Starlings

Open the tobacco data file.

Show code
tobacco <- read.table('../downloads/data/tobacco.csv', header=T, sep=',', strip.white=T)
head(tobacco)
  LEAF TREATMENT NUMBER
1   L1    Strong  35.90
2   L1      Weak  25.02
3   L2    Strong  34.12
4   L2      Weak  23.17
5   L3    Strong  35.70
6   L3      Weak  24.12

To appreciate the difference between this design (Complete Randomized Block) in which there is a single within Group effect (Treatment) and a nested design (in which there are between group effects), I will illustrate the current design diagramatically.


  • Note that each level of the Treatment (Strong and Week) are applied to each Leaf (Block)
  • Note that Treatments are randomly applied
  • The Treatment effect is the mean difference between Treatment pairs per leaf
  • Blocking in this way is very useful when spatial or temporal heterogeneity is likely to add noise that could make it difficualt to detect a difference between Treatments. Hence it is a way of experimentally reducing unexplained variation (compared to nesting which involves statistical reduction of unexplained variation).

Since each level of treatment was applied to each leaf, the data represents a randomized block design with leaves as blocks.

The variable LEAF contains a list of Leaf identification numbers and is supposed to represent a factorial blocking variable. However, because the contents of this variable are numbers, R initially treats them as numbers, and therefore considers the variable to be numeric rather than categorical. In order to force R to treat this variable as a factor (categorical) it is necessary to first convert this numeric variable into a factor
(HINT).
Show code
tobacco$LEAF <- as.factor(tobacco$LEAF)
  1. What are the main hypotheses being tested?
    1. H0 Factor A:
    2. H0 Factor B:
  2. In the table below, list the assumptions of a randomized block design along with how violations of each assumption are diagnosed and/or the risks of violations are minimized.
    1. AssumptionDiagnostic/Risk Minimization
      I.
      II.
      III.
      IV.
    2. Is the proposed model balanced?
      (Yor N)
    3. Show code
      replications(NUMBER~LEAF+TREATMENT, data=tobacco)
      
           LEAF TREATMENT 
              2         8 
      
      !is.list(replications(NUMBER~LEAF+TREATMENT, data=tobacco))
      
      [1] TRUE
      
  3. There are a number of ways of diagnosing block by within block interactions.
    1. The simplest method is to plot the number of lesions for each treatment and leaf combination (ie. an interaction plot
      (HINT). Any evidence of an interaction? Note that we cannot formally test for an interaction because we don't have replicates for each treatment-leaf combination!
    2. Show code
      library(car)
      interaction.plot(tobacco$LEAF, tobacco$TREATMENT, tobacco$NUMBER)
      
      plot of chunk Q2-3
      Show ggplot code
      library(ggplot2)
      ggplot(tobacco, aes(y=NUMBER, x=LEAF, group=TREATMENT, color=TREATMENT)) +
        geom_line()
      
      plot of chunk Q2-3a
    3. We can also examine the residual plot from the fitted linear model. A curvilinear pattern in which the residual values switch from positive to negative and back again (or visa versa) over the range of predicted values implies that the scale (magnitude but not polarity) of the main treatment effects differs substantially across the range of blocks. (HINT). Any evidence of an interaction?
    4. Show code
      residualPlots(lm(NUMBER~LEAF+TREATMENT, data=tobacco))
      
                 Test stat Pr(>|t|)
      LEAF              NA       NA
      TREATMENT         NA       NA
      Tukey test     -0.18    0.858
      
      residualPlots(lm(NUMBER~LEAF+TREATMENT, data=tobacco))
      
      plot of chunk Q2_3b
                 Test stat Pr(>|t|)
      LEAF              NA       NA
      TREATMENT         NA       NA
      Tukey test     -0.18    0.858
      
    5. Perform a Tukey's test for non-additivity evaluated at α = 0.10 or even &alpha = 0.25. This (curvilinear test) formally tests for the presence of a quadratic trend in the relationship between residuals and predicted values. As such, it too is only appropriate for simple interactions of scale. (HINT). Any evidence of an interaction?
    6. Show code
      library(alr3)
      tukey.nonadd.test(lm(NUMBER~LEAF+TREATMENT, data=tobacco))
      
         Test  Pvalue 
      -0.1795  0.8575 
      
    7. Examine a lattice graphic to determine the consistency of the effect across each of the blocks(HINT).
    8. Show code
      library(lattice)
      print(barchart(NUMBER~TREATMENT|LEAF, data=tobacco))
      
      plot of chunk Q2-2d
  4. Analyse these data with a classic randomized block ANOVA
    (HINT) to test the H0 that there is no difference in the number of lesions produced by the different strength viruses. Complete the table below.
    Show code
    tobacco.aov <- aov(NUMBER~TREATMENT+Error(LEAF), data=tobacco)
    summary(tobacco.aov)
    
    Error: LEAF
              Df Sum Sq Mean Sq F value Pr(>F)
    Residuals  7    292    41.7               
    
    Error: Within
              Df Sum Sq Mean Sq F value Pr(>F)   
    TREATMENT  1    248   248.3    17.2 0.0043 **
    Residuals  7    101    14.5                  
    ---
    Signif. codes:  
    0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
    
    Source of variationdfMean SqF-ratioP-value
    LEAF (block)   
    TREAT
    Residuals   
  5. Interprete the results.
  6. Now analyse these data with a linear mixed effects model (REML).
    Show lme code
    library(nlme)
    tobacco.lme <- lme(NUMBER~TREATMENT, random=~1|LEAF, data=tobacco, method='ML')
    summary(tobacco.lme)
    
    Linear mixed-effects model fit by maximum likelihood
     Data: tobacco 
        AIC   BIC logLik
      102.5 105.6 -47.25
    
    Random effects:
     Formula: ~1 | LEAF
            (Intercept) Residual
    StdDev:       3.454    3.557
    
    Fixed effects: NUMBER ~ TREATMENT 
                  Value Std.Error DF t-value p-value
    (Intercept)   34.94     1.874  7  18.645  0.0000
    TREATMENTWeak -7.88     1.901  7  -4.144  0.0043
     Correlation: 
                  (Intr)
    TREATMENTWeak -0.507
    
    Standardized Within-Group Residuals:
         Min       Q1      Med       Q3      Max 
    -1.73025 -0.51798  0.01212  0.43724  1.52636 
    
    Number of Observations: 16
    Number of Groups: 8 
    
    #unique(confint(tobacco.lme,'TREATMENTWeak')[[1]])
    library(gmodels)
    ci(tobacco.lme)
    
                  Estimate CI lower CI upper
    (Intercept)     34.940    30.51   39.371
    TREATMENTWeak   -7.879   -12.38   -3.383
                  Std. Error DF   p-value
    (Intercept)        1.874  7 3.169e-07
    TREATMENTWeak      1.901  7 4.328e-03
    
    anova(tobacco.lme)
    
                numDF denDF F-value p-value
    (Intercept)     1     7   368.5  <.0001
    TREATMENT       1     7    17.2  0.0043
    
    Show lmer code
    library(lme4)
    tobacco.lmer <- lmer(NUMBER~TREATMENT +(1|LEAF), data=tobacco)
    summary(tobacco.lmer)
    
    Linear mixed model fit by REML ['lmerMod']
    Formula: NUMBER ~ TREATMENT + (1 | LEAF)
       Data: tobacco
    
    REML criterion at convergence: 88.7
    
    Scaled residuals: 
        Min      1Q  Median      3Q     Max 
    -1.6185 -0.4845  0.0113  0.4090  1.4278 
    
    Random effects:
     Groups   Name        Variance Std.Dev.
     LEAF     (Intercept) 13.6     3.69    
     Residual             14.5     3.80    
    Number of obs: 16, groups:  LEAF, 8
    
    Fixed effects:
                  Estimate Std. Error t value
    (Intercept)      34.94       1.87   18.64
    TREATMENTWeak    -7.88       1.90   -4.14
    
    Correlation of Fixed Effects:
                (Intr)
    TREATMENTWk -0.507
    
    anova(tobacco.lmer)
    
    Analysis of Variance Table
              Df Sum Sq Mean Sq F value
    TREATMENT  1    248     248    17.2
    
    # Perform SAS-like p-value calculations (requires the lmerTest and pbkrtest package)
    library(lmerTest)
    tobacco.lmer <- lmer(NUMBER~TREATMENT +(1|LEAF), data=tobacco)
    summary(tobacco.lmer)
    
    Linear mixed model fit by REML t-tests use Satterthwaite approximations to degrees of freedom [
    merModLmerTest]
    Formula: NUMBER ~ TREATMENT + (1 | LEAF)
       Data: tobacco
    
    REML criterion at convergence: 88.7
    
    Scaled residuals: 
        Min      1Q  Median      3Q     Max 
    -1.6185 -0.4845  0.0113  0.4090  1.4278 
    
    Random effects:
     Groups   Name        Variance Std.Dev.
     LEAF     (Intercept) 13.6     3.69    
     Residual             14.5     3.80    
    Number of obs: 16, groups:  LEAF, 8
    
    Fixed effects:
                  Estimate Std. Error    df t value Pr(>|t|)    
    (Intercept)      34.94       1.87 11.33   18.64  7.4e-10 ***
    TREATMENTWeak    -7.88       1.90  7.00   -4.14   0.0043 ** 
    ---
    Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
    
    Correlation of Fixed Effects:
                (Intr)
    TREATMENTWk -0.507
    
    anova(tobacco.lmer) # Satterthwaite denominator df method
    
    Analysis of Variance Table of type 3  with  Satterthwaite 
    approximation for degrees of freedom
              Sum Sq Mean Sq NumDF DenDF F.value Pr(>F)   
    TREATMENT    248     248     1     7    17.2 0.0043 **
    ---
    Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
    
    anova(tobacco.lmer,ddf="Kenward-Roger") # Kenward-Roger denominator df method
    
    Analysis of Variance Table of type 3  with  Kenward-Roger 
    approximation for degrees of freedom
              Sum Sq Mean Sq NumDF DenDF F.value Pr(>F)   
    TREATMENT    248     248     1     7    17.2 0.0043 **
    ---
    Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
    
    #Examine the effect season via a likelihood ratio test
    tobacco.lmer1 <- lmer(NUMBER~TREATMENT +(1|LEAF), data=tobacco, REML=F)
    tobacco.lmer2 <- lmer(NUMBER~1 +(1|LEAF), data=tobacco, REML=F)
    anova(tobacco.lmer1,tobacco.lmer2, test="F")
    
    Data: tobacco
    Models:
    ..1: NUMBER ~ 1 + (1 | LEAF)
    object: NUMBER ~ TREATMENT + (1 | LEAF)
           Df AIC BIC logLik deviance Chisq Chi Df Pr(>Chisq)   
    ..1     3 110 113  -52.2    104.5                           
    object  4 102 106  -47.2     94.5  9.98      1     0.0016 **
    ---
    Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
    
    Show ezANOVA code
    library(ez)
    ezANOVA(dv=.(NUMBER),within=.(TREATMENT),wid=.(LEAF), data=tobacco,type=3)
    
    $ANOVA
         Effect DFn DFd     F        p p<.05   ges
    2 TREATMENT   1   7 17.17 0.004328     * 0.387
    
  7. Generate a summary figure with predicted mean number of lesions (and confidence interval) for each treatment.
    Show code
    newdata <- expand.grid(TREATMENT=c("STRONG","WEAK"), NUMBER=0)
    Xmat <- model.matrix(terms(tobacco.lmer),newdata)
    Xmat
    
      (Intercept) TREATMENTWEAK
    1           1             0
    2           1             1
    attr(,"assign")
    [1] 0 1
    attr(,"contrasts")
    attr(,"contrasts")$TREATMENT
    [1] "contr.treatment"
    
    pred <- as.vector(fixef(tobacco.lmer) %*% t(Xmat))
    se <- sqrt(diag(Xmat %*% vcov(tobacco.lmer) %*% t(Xmat)))
    newdata$fit <- pred
    newdata$lower <- pred - 2*se
    newdata$upper <- pred + 2*se
    
    library(ggplot2)
    library(grid)
    ggplot(newdata, aes(y=fit, x=TREATMENT)) +
      geom_errorbar(aes(ymin=lower, ymax=upper), width=0.01)+
      geom_point() +
      scale_x_discrete('Inoculation treatment')+
      scale_y_continuous('Number of lesions')+
      theme_classic() +
      theme(axis.title.y=element_text(vjust=2,size=rel(1.2)),
            axis.title.x=element_text(vjust=-2,size=rel(1.2)),
            plot.margin=unit(c(0.5,0.5,2,2),'lines'))
    
    plot of chunk Q2-9
    ggplot(newdata, aes(y=fit, x=TREATMENT)) +
      geom_bar(stat='identity', fill='grey', color='black') +
      geom_errorbar(aes(ymin=lower, ymax=upper), width=0.01)+
      scale_x_discrete('Inoculation treatment')+
      scale_y_continuous('Number of lesions', expand=c(0,0))+
      theme_classic() +
      theme(axis.title.y=element_text(vjust=2,size=rel(1.2)),
            axis.title.x=element_text(vjust=-2,size=rel(1.2)),
            plot.margin=unit(c(0.5,0.5,2,2),'lines'))
    
    plot of chunk Q2-9

  Reading data into R

Ensure that the working directory is pointing to the path containing the file to be imported before proceeding.
To import data into R, we read the contents of a file into a data frame. The general format of the command for reading data into a data frame is

> name <- read.table('filename.csv', header=T, sep=',', row.names=column, strip.white=T)

where name is a name you wish the data frame to be referred to as, filename.csv is the name of the csv file that you created in excel and column is the number of the column that had row labels (if there were any). The argument header=T indicates that the variable (vector) names will be created from the names supplied in the first row of the file. The argument sep=',' indicates that entries in the file are separated by a comma (hence a comma delimited file). If the data file does not contain row labels, or you are not sure whether it does, it is best to omit the row.names=column argument. The strip.white=T arguement ensures that no leading or trailing spaces are left in character names (these can be particularly nasty in categorical variables!).

As an example