

Workshop 9.8a - Randomized block and repeated measures designs (mixed effects models)

4 August 2011

Basic statistics references

  • Logan (2010) - Chpt 12-14
  • Quinn & Keough (2002) - Chpt 9-11
Very basic overview of Nested ANOVA

Very basic overview of Randomized block

Randomized block

A plant pathologist wanted to examine the effects of two different strengths of tobacco virus on the number of lesions on tobacco leaves. She knew from pilot studies that leaves were inherently very variable in response to the virus. In an attempt to account for this leaf to leaf variability, both treatments were applied to each leaf. Eight individual leaves were divided in half, with half of each leaf inoculated with weak strength virus and the other half inoculated with strong virus. So the leaves were blocks and each treatment was represented once in each block. A completely randomised design would have had 16 leaves, with 8 whole leaves randomly allocated to each treatment.

Download Tobacco data set

Format of tobacco.csv data files
LEAF   TREATMENT   NUMBER
1      Strong      35.898
1      Weak        25.02
2      Strong      34.118
2      Weak        23.167
3      Strong      35.702
3      Weak        24.122
.      .           .

LEAF        The blocking factor - Factor B
TREATMENT   Categorical representation of the strength of the tobacco virus - main factor of interest (Factor A)
NUMBER      Number of lesions on that part of the tobacco leaf - response variable

Open the tobacco data file.

Show code
> tobacco <- read.table("../downloads/data/tobacco.csv", header = T,
+     sep = ",", strip.white = T)
> head(tobacco)
  LEAF TREATMENT NUMBER
1   L1    Strong  35.90
2   L1      Weak  25.02
3   L2    Strong  34.12
4   L2      Weak  23.17
5   L3    Strong  35.70
6   L3      Weak  24.12

Since each level of treatment was applied to each leaf, the data represent a randomized block design with leaves as blocks.

The variable LEAF contains a list of leaf identification numbers and is supposed to represent a categorical blocking variable. However, because the contents of this variable are numbers, R initially treats them as numbers, and therefore considers the variable to be numeric rather than categorical. In order to force R to treat this variable as a factor (categorical) it is necessary to first convert this numeric variable into a factor (HINT).
Show code
> tobacco$LEAF <- as.factor(tobacco$LEAF)
Q2-1. What are the main hypotheses being tested?
  1. H0 Factor A:
  2. H0 Factor B:
Q2-2. In the table below, list the assumptions of a randomized block design along with how violations of each assumption are diagnosed and/or the risks of violations are minimized.
  1. Assumption    Diagnostic/Risk Minimization
    I.
    II.
    III.
    IV.
  2. Is the proposed model balanced?
    (Y or N)
  3. Show code
    > replications(NUMBER ~ LEAF + TREATMENT, data = tobacco)
    
         LEAF TREATMENT 
            2         8 
    
    > !is.list(replications(NUMBER ~ LEAF + TREATMENT, data = tobacco))
    
    [1] TRUE
    

Q2-3. There are a number of ways of diagnosing block by within block interactions.
  1. The simplest method is to plot the number of lesions for each treatment and leaf combination (i.e. an interaction plot)
    (HINT). Any evidence of an interaction? Note that we cannot formally test for an interaction because we don't have replicates for each treatment-leaf combination!
  2. Show code
    > library(car)
    > interaction.plot(tobacco$LEAF, tobacco$TREATMENT, tobacco$NUMBER)
    
  3. We can also examine the residual plot from the fitted linear model. A curvilinear pattern in which the residual values switch from positive to negative and back again (or vice versa) over the range of predicted values implies that the scale (magnitude but not polarity) of the main treatment effects differs substantially across the range of blocks. (HINT). Any evidence of an interaction?
  4. Show code
    > residualPlots(lm(NUMBER ~ LEAF + TREATMENT, data = tobacco))
    
               Test stat Pr(>|t|)
    LEAF              NA       NA
    TREATMENT         NA       NA
    Tukey test     -0.18    0.858
    
  5. Perform a Tukey's test for non-additivity evaluated at α = 0.10 or even α = 0.25. This (curvilinear) test formally tests for the presence of a quadratic trend in the relationship between residuals and predicted values. As such, it too is only appropriate for simple interactions of scale. (HINT). Any evidence of an interaction?
  6. Show code
    > library(alr3)
    > tukey.nonadd.test(lm(NUMBER ~ LEAF + TREATMENT, data = tobacco))
    
       Test  Pvalue 
    -0.1795  0.8575 
    
  7. Examine a lattice graphic to determine the consistency of the effect across each of the blocks (HINT).
  8. Show code
    > library(lattice)
    > print(barchart(NUMBER ~ TREATMENT | LEAF, data = tobacco))
    

Q2-4. Analyse these data with a classic randomized block ANOVA
(HINT) to test the H0 that there is no difference in the number of lesions produced by the different strength viruses. Complete the table below.
Show code
> tobacco.aov <- aov(NUMBER ~ TREATMENT + Error(LEAF), data = tobacco)
> summary(tobacco.aov)
Error: LEAF
          Df Sum Sq Mean Sq F value Pr(>F)
Residuals  7    292    41.7               

Error: Within
          Df Sum Sq Mean Sq F value Pr(>F)   
TREATMENT  1    248   248.3    17.2 0.0043 **
Residuals  7    101    14.5                  
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 
Source of variation   df   Mean Sq   F-ratio   P-value
LEAF (block)
TREATMENT
Residuals


Q2-5. Interpret the results.

Q2-6. Now analyse these data with a linear mixed effects model (REML).
Show code
> library(nlme)
> tobacco.lme <- lme(NUMBER ~ TREATMENT, random = ~1 | LEAF, data = tobacco,
+     method = "ML")
> summary(tobacco.lme)
Linear mixed-effects model fit by maximum likelihood
 Data: tobacco 
    AIC   BIC logLik
  102.5 105.6 -47.25

Random effects:
 Formula: ~1 | LEAF
        (Intercept) Residual
StdDev:       3.454    3.557

Fixed effects: NUMBER ~ TREATMENT 
              Value Std.Error DF t-value p-value
(Intercept)   34.94     1.874  7  18.645  0.0000
TREATMENTWeak -7.88     1.901  7  -4.144  0.0043
 Correlation: 
              (Intr)
TREATMENTWeak -0.507

Standardized Within-Group Residuals:
     Min       Q1      Med       Q3      Max 
-1.73025 -0.51798  0.01212  0.43724  1.52636 

Number of Observations: 16
Number of Groups: 8 
> #unique(confint(tobacco.lme,'TREATMENTWeak')[[1]])
> library(gmodels)
> ci(tobacco.lme)
              Estimate CI lower CI upper Std. Error DF   p-value
(Intercept)     34.940    30.51   39.371      1.874  7 3.169e-07
TREATMENTWeak   -7.879   -12.38   -3.383      1.901  7 4.328e-03
> anova(tobacco.lme)
            numDF denDF F-value p-value
(Intercept)     1     7   368.5  <.0001
TREATMENT       1     7    17.2  0.0043
Show code
> library(lme4)
> tobacco.lmer <- lmer(NUMBER ~ TREATMENT + (1 | LEAF), data = tobacco)
> summary(tobacco.lmer)
Linear mixed model fit by REML 
Formula: NUMBER ~ TREATMENT + (1 | LEAF) 
   Data: tobacco 
  AIC  BIC logLik deviance REMLdev
 96.7 99.8  -44.4     94.5    88.7
Random effects:
 Groups   Name        Variance Std.Dev.
 LEAF     (Intercept) 13.6     3.69    
 Residual             14.5     3.80    
Number of obs: 16, groups: LEAF, 8

Fixed effects:
              Estimate Std. Error t value
(Intercept)      34.94       1.87   18.65
TREATMENTWeak    -7.88       1.90   -4.14

Correlation of Fixed Effects:
            (Intr)
TREATMENTWk -0.507
> #Calculate the confidence interval for the effect size of the main effect
> #   of treatment
> library(languageR)
> pvals.fnc(tobacco.lmer, withMCMC = FALSE, addPlot = F)
$fixed
              Estimate MCMCmean HPD95lower HPD95upper  pMCMC Pr(>|t|)
(Intercept)     34.940   34.919      30.83     38.740 0.0001    0.000
TREATMENTWeak   -7.879   -7.841     -12.86     -2.231 0.0092    0.001

$random
    Groups        Name Std.Dev. MCMCmedian MCMCmean HPD95lower HPD95upper
1     LEAF (Intercept)    3.692      1.243    1.440       0.00      3.917
2 Residual                3.803      5.038    5.189       3.08      7.531

> #Examine the effect of treatment via a likelihood ratio test
> tobacco.lmer1 <- lmer(NUMBER ~ TREATMENT + (1 | LEAF), data = tobacco,
+     REML = F)
> tobacco.lmer2 <- lmer(NUMBER ~ 1 + (1 | LEAF), data = tobacco, REML = F)
> anova(tobacco.lmer1, tobacco.lmer2, test = "F")
Data: tobacco
Models:
tobacco.lmer2: NUMBER ~ 1 + (1 | LEAF)
tobacco.lmer1: NUMBER ~ TREATMENT + (1 | LEAF)
              Df AIC BIC logLik Chisq Chi Df Pr(>Chisq)   
tobacco.lmer2  3 110 113  -52.2                           
tobacco.lmer1  4 102 106  -47.2  9.98      1     0.0016 **
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 
Show code
> library(ez)
> ezANOVA(dv = .(NUMBER), within = .(TREATMENT), wid = .(LEAF), data = tobacco,
+     type = 3)
$ANOVA
     Effect DFn DFd     F        p p<.05   ges
2 TREATMENT   1   7 17.17 0.004328     * 0.387

Plot the predicted mean number of lesions (and confidence interval) for each treatment
Show code
> newdata <- expand.grid(TREATMENT = c("Strong", "Weak"), NUMBER = 0)
> mod.mat <- model.matrix(terms(tobacco.lmer), newdata)
> newdata$NUMBER = mod.mat %*% fixef(tobacco.lmer)
> pvar1 <- (diag(mod.mat %*% tcrossprod(vcov(tobacco.lmer), mod.mat)))
> tvar1 <- pvar1 + VarCorr(tobacco.lmer)$LEAF[1]
> newdata <- data.frame(newdata, plo = newdata$NUMBER - 2 * sqrt(pvar1),
+     phi = newdata$NUMBER + 2 * sqrt(pvar1), tlo = newdata$NUMBER - 2 * sqrt(tvar1),
+     thi = newdata$NUMBER + 2 * sqrt(tvar1))
> par(mar = c(5, 5, 1, 1))
> plot(NUMBER ~ as.numeric(TREATMENT), newdata, type = "n", axes = F,
+     ann = F, ylim = range(newdata$plo, newdata$phi), xlim = c(0.5, 2.5))
> with(newdata, arrows(as.numeric(TREATMENT), plo, as.numeric(TREATMENT),
+     phi, ang = 90, length = 0.05, code = 3))
> with(newdata, points(as.numeric(TREATMENT), NUMBER, pch = 21, bg = "black",
+     col = "black"))
> axis(1, at = c(1, 2), lab = newdata$TREATMENT)
> axis(2, las = 1)
> mtext("Number of lesions", 2, line = 3, cex = 1.25)
> mtext("Season", 1, line = 3, cex = 1.25)
> box(bty = "l")

  Reading data into R

Ensure that the working directory is pointing to the path containing the file to be imported before proceeding.
To import data into R, we read the contents of a file into a data frame. The general format of the command for reading data into a data frame is

> name <- read.table('filename.csv', header=T, sep=',', row.names=column, strip.white=T)

where name is a name you wish the data frame to be referred to as, filename.csv is the name of the csv file that you created in Excel and column is the number of the column that had row labels (if there were any). The argument header=T indicates that the variable (vector) names will be created from the names supplied in the first row of the file. The argument sep=',' indicates that entries in the file are separated by a comma (hence a comma delimited file). If the data file does not contain row labels, or you are not sure whether it does, it is best to omit the row.names=column argument. The strip.white=T argument ensures that no leading or trailing spaces are left in character names (these can be particularly nasty in categorical variables!).

As an example
> phasmid <- read.table("phasmid.csv", header = T, sep = ",", row.names = 1,
+     strip.white = T)

End of instructions

  Nested ANOVA

A nested ANOVA is a simple extension of the single factor ANOVA. An additional factor (Factor B) is added that is nested within the main factor of interest (Factor A). Nested factors are incorporated in situations where researchers envisage potentially large amounts of variation between different sets of replicates within the main factor.
For example, a limnologist investigating the effect of flow (Factor A: high, medium and low) on macroinvertebrate density might collect samples from a number of high, medium and low flow streams. However, the limnologist might expect that many other factors (perhaps more important than stream flow) also influence the density of macroinvertebrates and that even samples collected in close proximity to one another might be expected to vary considerably in density of macroinvertebrates. Such variability increases the unexplained variability and therefore has the potential to make it very difficult to pick up the effect of the actual factor of interest (in this case flow).
In an attempt to combat this, the researcher decides to collect multiple samples from within each stream. If some of the unexplained variability can be explained by variation between samples within streams, then the test of flow will be more sensitive.
This is called a nested ANOVA; an additional factor (Factor B: stream) has been introduced within the main factor (Factor A: flow). Factor A is a fixed factor as we are only interested in the specific levels chosen (e.g. high, medium and low flow). Factor B, however, is a random factor as we are not just interested in the specific streams that were sampled, but in all the possible streams that they represent. In a nested ANOVA the categories of Factor B nested within each level of Factor A are different. E.g. the streams used for each flow type are different. Therefore it is not possible to test for an interaction between flow type and stream.

There are two null hypotheses being tested in a two factor nested ANOVA. H0 main effect: no effect of Factor A (flow): μ1 = μ2 = ... = μi = μ (as for single factor ANOVA). H0 nesting effect: no effect of Factor B (stream) nested within Factor A (flow). Technically, this is the hypothesis that there is no added variance due to differences between all the possible levels of Factor B within any level of Factor A. E.g., there is no added variance due to differences between all possible high flow streams, between all possible medium flow streams, or between all possible low flow streams. Diagram shows layout of nested design with three treatments (different pattern fills).
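
As a minimal sketch of how the flow/stream example above might be fitted in R (the data frame streams and its variables DENSITY, FLOW and STREAM are hypothetical names used for illustration only):

# STREAM is the nested (random) factor and must be treated as categorical
> streams$STREAM <- as.factor(streams$STREAM)
# the Error() stratum ensures FLOW is tested against the variation
#   among streams within flow types
> streams.aov <- aov(DENSITY ~ FLOW + Error(STREAM), data=streams)
> summary(streams.aov)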

End of instructions

  Convert numeric variable into a factor

> data$var <- as.factor(data$var)

where data is the name of the data set and var is the name of the variable to be made into a factor. While there is no visible evidence that the data are any different, R will now treat the variable as a factor rather than as a numeric variable.

End of instructions

  Nested ANOVA assumptions

In ANOVA, the assumptions of normality and variance homogeneity apply to the residuals for the test of interest. In this example, the nested factor (Factor B, SITE) is random. Therefore, the normality and homogeneity of variance assumptions for Factor A (SEASON) must be based on the distribution, mean and variance of the response variable (S4DUGES) for each level of Factor A, using the means of Factor B within each level of Factor A as the observations, since the error term for Factor A (SEASON) is the variation in the response variable between categories of Factor B within each level of Factor A (MSSITE). Calculate the mean for each level of Factor B (SITE), and use these means to calculate the mean, variance and distribution of each level of Factor A (SEASON).

# aggregate the data set by calculating the means for each level of the nesting factor
> name.ag <- aggregate(dataset, list(Name1=FactorA, Name2=FactorB), mean)
# OR perhaps more efficiently
> library(nlme)
> name.ag <- gsummary(dataset, groups=FactorB)
# use the aggregated means to produce the boxplots
> boxplot(dv~FactorA, name.ag)

where name.ag is a name given to the new aggregated data set, dataset is the name of the original data set, Name1 and Name2 are names to appear in the aggregated data set that represent FactorA (main factor) and FactorB (nesting factor) respectively. Finally, dv is the name of the dependent variable.

This data set generates two boxplots, each constructed from only 3 values and thus of questionable value.

End of instructions

  Nested ANOVA assumptions

# fit the hierarchical linear model
> name.aov <- aov(dv~FactorA + Error(FactorB), data=data)
# print out the ANOVA strata
> summary(name.aov)

where name.aov is a name provided to store the fitted model, dv is the dependent variable, FactorA is the main fixed factor and FactorB is the random nested factor. data is the name of the data set. The 'Error()' argument defines additional error (residual) strata; hence the above template indicates that FactorA should be tested against FactorB.
For example:

# fit the hierarchical linear model
> curdies.aov <- aov(S4DUGES~SEASON + Error(SITE), data=curdies)
# print out the ANOVA strata
> summary(curdies.aov)

In recognition that FactorB is a random nested factor, and thus of little interest, the F-ratio is not provided. Since the effect of FactorB is tested against the overall residual (error) term, to obtain the F-ratio for the test of FactorB, fit the model without defining an additional error stratum. Note that when you do this, all terms are tested against the residual term and therefore the tests of any fixed factors above the random factors (in the hierarchy) will be incorrect.

> curdies.aov1 <- aov(S4DUGES~SEASON + SITE, data=curdies)
> summary(curdies.aov1)

End of instructions

  Two factor ANOVA

Statistical models that incorporate more than one categorical predictor variable are broadly referred to as multifactor analysis of variance. There are two main reasons for the inclusion of multiple factors:

  1. To attempt to reduce the unexplained (or residual) variation in the response variable and therefore increase the sensitivity of the main effect test.
  2. To examine the interaction between factors. That is, whether the effect of one factor on the response variable is dependent on the other factor(s) (i.e. whether or not it is consistent across all levels of the other factor(s)).

Fully factorial linear models are used when the design incorporates two or more factors (independent, categorical variables) that are crossed with each other. That is, all combinations of the factors are included in the design and every level (group) of each factor occurs in combination with every level of the other factor(s). Furthermore, each combination is replicated.

In fully factorial designs both factors are equally important (potentially), and since all factors are crossed, we can also test whether there is an interaction between the factors (does the effect of one factor depend on the level of the other factor(s)). Graphs above depict a) interaction, b) no interaction.


For example, Quinn (1988) investigated the effects of season (two levels, winter/spring and summer/autumn) and adult density (four levels: 8, 15, 30 and 45 animals per 225 cm2) on the number of limpet egg masses. As the design was fully crossed (all four densities were used in each season), he was also able to test for an interaction between season and density. That is, was the effect of density consistent across both seasons, and was the effect of season consistent across all densities?
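
A minimal sketch of how such a fully crossed design might be fitted (the data frame quinn and the variable names EGGS, SEASON and DENSITY are assumed for illustration; the actual data set may use different labels):

# fit the fully factorial model (both main effects plus their interaction)
> quinn.aov <- aov(EGGS ~ SEASON * DENSITY, data=quinn)
# the SEASON:DENSITY row tests whether the effect of density depends on season
> anova(quinn.aov)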

Diagram shows layout of 2 factor fully crossed design. The two factors (each with two levels) are color (black or gray) and pattern (solid or striped). There are three experimental units (replicates) per treatment combination.


End of instructions

  Two-factor ANOVA

> name.aov <- aov(dv~factor1 * factor2, data=data)

where dv is the name of the numeric vector (dependent variable), factor1 and factor2 are the names of the factors (categorical or factor variables) and data is the name of the data frame (data set).

End of instructions

  Fully factorial ANOVA assumption checking

The assumptions of normality and homogeneity of variance apply to each of the factor level combinations, since it is the replicate observations of these that are the test residuals. If the design has two factors (IV1 and IV2) with three and four levels (groups) respectively within these factors, then a boxplot of each of the 3x4=12 combinations needs to be constructed. It is recommended that a variable (called GROUP) be setup to represent the combination of the two categorical (factor) variables.

Simply construct a boxplot with the dependent variable on the y-axis and GROUP on the x-axis. Visually assess whether the data from each group are normal (or at least that the groups are not all consistently skewed in the same direction), and whether the spread of data in each group is similar (or at least not related to the mean for that group). The GROUP variable can also assist in calculating the mean and variance for each group, to further assess variance homogeneity.
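
A minimal sketch of this approach (assuming a data frame called data with factors IV1 and IV2 and dependent variable dv):

# create a single factor representing each IV1 by IV2 combination
> data$GROUP <- interaction(data$IV1, data$IV2)
# boxplot of the response for each of the factor level combinations
> boxplot(dv ~ GROUP, data=data)
# group means and variances to further assess variance homogeneity
> tapply(data$dv, data$GROUP, mean)
> tapply(data$dv, data$GROUP, var)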

End of instructions

  Factorial boxplots

> boxplot(dv~factor1*factor2,data=data)

where dv is the name of the numeric vector (dependent variable), factor1 and factor2 are the names of the factors (categorical or factor variables) and data is the name of the data frame (data set). The '~' is used in formulas to indicate that the left hand side is modelled as a function of the right hand side.

End of instructions

  Plotting mean vs variance

# first calculate the means and variances
> means <- tapply(data$dv, list(data$factor1, data$factor2), mean)
> vars <- tapply(data$dv, list(data$factor1, data$factor2), var)
# then plot the mean against variance
> plot(means,vars)

where dv is the name of the numeric vector (dependent variable), factor1 and factor2 are the names of the factors (categorical or factor variables) and data is the name of the data frame (data set).

End of instructions

  Model balance

Balanced models are those models that have equal sample sizes for each level of a treatment. Unequal sample sizes are generally problematic for statistical analyses as they can be the source of unequal variances. They are additionally problematic for multifactor designs because of the way that total variability is partitioned into the contributions of each of the main effects as well as the contributions of interactions (ie the calculations of sums of squares and degrees of freedom). Sums of squares (and degrees of freedom) are usually (type I SS) partitioned additively. That is:
SSTotal = SSA + SSB + SSAB + SSResidual, and
dfTotal = dfA + dfB + dfAB + dfResidual.
In practice, these formulas are used in reverse. For example, if the model was entered as DV~A+B+AB, first SSResidual is calculated, followed by SSAB, then SSB. SSA is thus calculated as the total sums of squares minus what has already been partitioned off to the other terms (SSA = SSTotal - (SSB + SSAB + SSResidual)). If the model is balanced, then it doesn't matter in what order the model is constructed (DV~B+A+AB will be equivalent).

However, in unbalanced designs, SS cannot be partitioned additively; that is, the actual SS of the terms do not add up to SSTotal (SSTotal ≠ SSA + SSB + SSAB + SSResidual). Of course, when SS is partitioned it all must add up. Consequently, the SS of the last term to be calculated will not be correct. Therefore, in unbalanced designs, the SS values for the terms (and therefore the F-ratios etc.) differ depending on the order in which the terms are added - an undesirable situation.

To account for this, there are alternative methods of calculating SS for the main effects (all provide the same SS for the interaction terms and residuals).
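
The order dependence can be demonstrated directly. In the following sketch (which assumes a hypothetical unbalanced data frame dat with factors A and B and dependent variable DV), the two Type I (sequential) tables will report different SS for whichever main effect is entered second, whereas the Type II table from the car package adjusts each main effect for the other and is order independent.

# Type I (sequential) SS: results depend on the order of terms
> anova(lm(DV ~ A + B, data=dat))
> anova(lm(DV ~ B + A, data=dat))
# Type II SS: each main effect adjusted for the other
> library(car)
> Anova(lm(DV ~ A + B, data=dat), type="II")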

It turns out that it is relatively easy to determine whether or not a model is balanced - use the following syntax:

> !is.list(replications(formula,data))

where formula is the model formula (for example, MASS~SITUATION*MONTH) and data is the name of the data set.
If this function returns a TRUE, your model is balanced. If it returns a FALSE then your model is not balanced and you should consider using Type II or III SS methods.

End of instructions

  Linear regression diagnostics

> plot(name.lm)

where name.lm is the name of a fitted linear model. You will be prompted to hit return to cycle through a series of four diagnostic plots. The first (and most important of these) is the Residual Plot.

End of instructions

  ANOVA table

#print a summary of the ANOVA table
> anova(name.aov)

where name.aov is the name of a fitted linear model.

End of instructions

  Interaction plot

Interaction plots display the degree of consistency (or lack thereof) of the effect of one factor across each level of another factor. Interaction plots can be either bar or line graphs; however, line graphs are more effective. The x-axis represents the levels of one factor, and a separate line is drawn for each level of the other factor. The following interaction plots represent two factors, A (with levels A1, A2, A3) and B (with levels B1, B2).


The parallel lines of the first plot clearly indicate no interaction. The effect of factor A is consistent for both levels of factor B and vice versa. The middle plot demonstrates a moderate interaction and the bottom plot illustrates a severe interaction. Clearly, whether or not there is an effect of factor B (e.g. B1 > B2 or B2 > B1) depends on the level of factor A. The trend is not consistent.

End of instructions

  R Interaction plot

> interaction.plot(dataset$Factor1, dataset$Factor2, dataset$dv)

where Factor1 is the name of the categorical variable (factor) with the most levels (this will be used for the x-axis), Factor2 is the name of the other factorial variable (this will be used to produce the different lines) and dv is the name of the dependent variable within the dataset data set.
For example:

> interaction.plot(copper$DIST, copper$COPPER, copper$WORMS)

End of instructions

  Specific comparisons of means

Following a significant ANOVA result, it is often desirable to specifically compare group means to determine which groups are significantly different. However, multiple comparisons lead to two statistical problems. Firstly, multiple significance tests inflate the Type I error rate (α, the probability of falsely rejecting H0). E.g., testing for differences between 5 groups requires ten pairwise comparisons. If the α for each test is 0.05 (5%), then the probability of at least one Type I error for the family of 10 tests is 1 - 0.95^10 ≈ 0.40 (40%). Secondly, the outcome of each test needs to be independent (orthogonality). E.g. if A>B and B>C then we already know the result of A vs. C.
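
The family-wise error rate quoted above is easily verified:

# probability of at least one Type I error in 10 independent tests at alpha = 0.05
> 1 - (1 - 0.05)^10
[1] 0.4012631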

Post-hoc unplanned pairwise comparisons (e.g. Tukey's test) compare all possible pairs of group means and are useful in an exploratory fashion to reveal major differences. Tukey's test controls the family-wise Type I error rate to no more than 0.05. However, this reduces the power of each of the pairwise comparisons, and only very large differences are detected (a problem that is exacerbated as the number of groups increases).

Planned comparisons are specific comparisons that are usually planned during the design stage of the experiment. No more than p-1 comparisons can be made (where p is the number of groups); however, each comparison, provided it is orthogonal (independent) of the others, can be tested at α = 0.05. Amongst all possible pairwise comparisons, specific comparisons are selected, while others that are meaningless (within the biological context of the investigation) are ignored.

Planned comparisons are defined using what are known as contrasts coefficients. These coefficients are a set of numbers that indicate which groups are to be compared, and what the relative contribution of each group is to the comparison. For example, if a factor has four groups (A, B, C and D) and you wish to specifically compare groups A and B, the contrast coefficients for this comparison would be 1, -1, 0,and 0 respectively. This indicates that groups A and B are to be compared (since their signs oppose) and that groups C and D are omitted from the specific comparison (their values are 0).

It is also possible to compare the average of two or more groups with the average of other groups. For example, to compare the average of groups A and B with the average of group C, the contrast coefficients would be 1, 1, -2, and 0 respectively. Note that 0.5, 0.5, -1 and 0 would be equivalent (a short R sketch follows the points listed below).

The following points relate to the specification of contrast coefficients:

  1. The sum of the coefficients should equal 0
  2. Groups to be compared should have opposing signs
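
As a sketch of how such coefficients are supplied in R (assuming a hypothetical factor FACTOR with four levels A, B, C and D in a data frame called data), the A versus B comparison above could be defined and tested as:

# define the planned contrast: compare A with B, omit C and D
> contrasts(data$FACTOR) <- cbind(c(1, -1, 0, 0))
# fit the model and partition the factor SS into the planned comparison
> data.aov <- aov(dv ~ FACTOR, data=data)
> summary(data.aov, split=list(FACTOR=list('A vs B'=1)))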

End of instructions

  Tukey's Test

# load the 'multcomp' package
> library(multcomp)
# perform Tukey's test
> summary(simtest(dv~factor1*factor2, whichf='factor', data=data, type="Tukey"))
# OR the more up to date method
> summary(glht(data.aov, linfct=mcp(factor="Tukey")))

where dv is the name of the numeric vector (dependent variable), factor1 and factor2 are the names of the factors (categorical or factor variables), and data is the name of the data frame (data set). The argument type="Tukey" indicates that Tukey's p-value adjustment should be made. When the linear model contains multiple categorical predictor variables, the argument whichf='' must be used to indicate which factor the Tukey's test should be performed on.

The coefficients list the specific comparisons made. Each line represents a specific hypothesis test including difference between groups, t-value, raw p value and Tukey's adjusted p value (the other values presented are not important for BIO3011).

End of instructions

  Simple Main Effects

> library(biology)
> quinn.aov1 <- aov(SQRTRECRUITS~DENSITY, data=quinn, subset=c(SEASON=='Summer'))
> AnovaM(lm(quinn.aov1), lm(quinn.aov),type="II")

For the subset argument, type the name of the other categorical variable followed by == (two equals signs) and the name of one of the levels (surrounded by single quotation marks) of this factor. For example, if you had a categorical variable called TEMPERATURE consisting of three levels (High, Medium and Low), to perform an ANOVA using only the High temperature data, the subset expression would be TEMPERATURE=='High'.

End of instructions

  Randomized complete block (RCB) designs

These are designs where one factor is a blocking variable (Factor B) and the other variable (Factor A) is the main treatment factor of interest. These designs are used in situations where the environment is potentially heterogeneous enough to substantially increase the variability in the response variable and thus obscure the effects of the main factor. Experimental units are grouped into blocks (units of space or time that are likely to have less variable background conditions) and then each level of the treatment factor is applied to a single unit within each block. By grouping the experimental units together into blocks, some of the total variability in the response variable that is not explained by the main factor may be explained by the blocks. If so, the residual (unexplained) variation will be reduced resulting in more precise estimates of parameters and more powerful tests of treatments.

A plant ecologist was investigating the effects of leaf damage on plant chemical defenses. Since individual trees vary greatly in the degree of chemical defense, she applied each of three artificial leaf damage treatments (heavy, medium and light damage) to each tree. She applied one of the three treatments to each of three branches within eight randomly selected trees. Trees were the blocks. The diagram shows the layout of a blocking design. The main factor has three treatments (pattern fills), each of which is applied randomly within 3 randomly selected blocks. The decision to include a blocking factor (rather than a single factor, completely randomized design) depends on whether the experimental units (replicates) within a block are likely to be more similar to one another than those between blocks and whether the reduction in MSResidual is likely to compensate for the loss of residual df.

End of instructions

  Randomized blocks

> name.aov <- aov(dv~factorA + Error(factorB), data=data)

where dv is the name of the numeric vector (dependent variable), factorA and factorB represent the main factor of interest and the blocking factor respectively.

End of instructions

  Split-plot designs

Split-plot designs can be thought of as a combination of a randomized complete block (RCB) and nested designs. In their simplest form, they represent a RCB design, with a second factor applied to the whole blocks. One factor (Factor C) is applied to experimental units (usually randomly positioned) within each block (now referred to as plots, or Factor B). This is the within-plot factor. A second factor (between-plots factor, or Factor A) is then applied to the whole plots, and these plots are replicated. Hence a nested design applies between Factor A and the plots, and a blocking design applies between Factor C and the plots. Factor A is also crossed with Factor C, and therefore there is a test of the interaction between the two main factors.


Note that if there are no replicate units of Factor C within each plot (that is, if each level of Factor C is applied only once to each plot), then the interaction of Factor C by Plots within Factor A represents the overall residual. However, since the plots are usually random factors, the effects of Factor A must be tested against the Plots within Factor A term.
If there are replicate units of Factor C within each plot, then the Factor A by Plots within Factor A by Factor C term is the overall residual.

Source   F-ratio (A, C fixed; B random)
A        MSA/MSB
B        MSB/MSResidual
C        MSC/MSB:C
A:C      MSA:C/MSB:C
B:C      MSB:C/MSResidual

For example, a marine biologist investigated the effects of estuary flow on the abundance of mussels and barnacles, and whether these effects varied with tidal height. A split-plot design was used whereby randomly selected estuary sites were the plots, six tidal heights (from 0 to 3.6 m) were applied within each site (within-plot effect), and each site was of either high or low flow (between-plots effect). Diagram illustrates layout for split-plot design. There are three treatments (levels of Factor C, fill patterns for squares) applied within each plot (ovals). One of two treatments (levels of Factor A, fill for ovals) was applied to whole plots. Note that treatments for both Factor A and C are replicated and fully crossed.

End of instructions

  Aggregate a data set

> name.ag <- aggregate(dataset, list(NAME=FactorA, NAME1=FactorB), mean)
# OR
> library(nlme)
> name.ag <- gsummary(dataset, groups=dataset$FactorB)

where name.ag is a name you provide for the aggregated data set, dataset is the name of the data set, NAME is for the first factor in the resulting data set, FactorA is the name of the main factor, NAME1 is for the second factor in the resulting data set and FactorB is the name of the plot (blocking) factor.
For example:

> cu.ag <- aggregate(copper, list(COPPER=copper$COPPER, PLATE=copper$PLATE), mean)
# OR
> library(nlme)
> cu.ag <- gsummary(copper, groups=copper$PLATE)

End of instructions

  Boxplot from aggregated data set

Having generated an aggregated data set, it is possible to use the aggregated means to test the assumptions of normality and homogeneity of variance.

> boxplot(name.ag$dv~name.ag$FactorA)

where name.ag is a name of the aggregated data set, dv is the name of the dependent variable and FactorA is the name of the main factor.
For example:

> boxplot(cu.ag$WORMS ~ cu.ag$COPPER)

End of instructions

  R Compound symmetry/sphericity

# source Murray's syntax for dealing with sphericity
> library(biology)
# after fitting a randomized block, repeated measures or split-plot
> epsi.GG.HF(name.aov)

where name.aov is the name of the fitted randomized block, repeated measures or split-plot model. The epsi.GG.HF function returns a table of epsilon values corresponding to each of the model terms.

End of instructions

  R Split-plot

# Fit the linear model with the different error strata
> name.aov <- aov(dv ~ FactorA + FactorC + FactorA:FactorC + Error(FactorB), data=dataset)
> AnovaM(name.aov)
# alternatively, if sphericity is an issue
> AnovaM(name.aov, RM=T)

where name.aov is a name to provide the fitted linear model, dv is the name of the dependent variable, FactorA is the name of the between plots factor (Factor A), FactorB is the name of the block factor (random) and FactorC is the name of the within plots factor. dataset is the name of the data set.
For example:

> copper.aov <- aov(WORMS ~ COPPER + DIST + COPPER:DIST + Error(PLATE), data=copper)
> AnovaM(copper.aov, RM=T)

End of instructions

  Repeated Measures

Simple repeated measures designs are similar to randomized complete block (RCB) designs. In the same way that we could explain some of the residual variation by blocking, we can also reduce the residual variation by removing the between-replicate variation that exists at the start of the experiment. This is achieved by applying each treatment (repeated measures) to the same subjects, rather than using different subjects for each treatment. In this way it is the same as a RCB. Where it differs from a RCB is that the treatments cannot all be applied to the blocks (subjects) at the same time. For example, it is not possible to determine the effects of different drugs on the heart rate of rats if each of the drugs is applied at the same time! Each of the treatments should be applied in random order; however, when the repeated factor is time, this is not possible. The subjects (blocks or plots) are always a random factor. If a second factor is applied at the level of the whole subject and over each of the repeated measures (e.g. subjects are of one of two species), the analysis then resembles a split-plot design, where this second factor is the between-subjects effect.


An agricultural scientist was investigating the effects of fiber on the digestibility of pasture grasses by cows. Ten randomly selected cows of two different breeds (five of each) were each fed a random sequence of four food types (of different fiber levels). After a week of feeding on a single food type, the digestibility of the food was assessed. Individual cows represented the subjects. Breed of cow was the between-subjects effect and food fiber level was the within-subjects effect. Diagram illustrates layout of 2 factor repeated measures design. Rectangles represent subjects. Squares represent the within-subject treatments (random sequence represents the order in which each treatment is applied). Hence there are four repeated measures. There are two levels of the between-subjects effect (pattern fill of rectangles).
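
A minimal sketch of the corresponding model (the data frame cows and the variables DIGEST, FIBER, BREED and COW are hypothetical names for illustration). Since each cow receives every food type, COW is the random subjects (blocking) term:

# COW is the random subjects factor
> cows$COW <- as.factor(cows$COW)
# BREED is the between-subjects effect, FIBER the within-subjects effect
> cows.aov <- aov(DIGEST ~ BREED * FIBER + Error(COW), data=cows)
> summary(cows.aov)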

End of instructions
