Workshop 9.7a - Nested designs (mixed effects models)

4 August 2011

Topics

Nested ANOVA
- Exercise 1

Basic statistics references

Logan (2010) - Chpt 12-14
Quinn & Keough (2002) - Chpt 9-11

Very basic overview of Nested ANOVA

Very basic overview of Randomized block

Nested ANOVA - one between factor

In an unusually detailed preparation for an Environmental Effects Statement for a proposed discharge of dairy wastes into the Curdies River, in western Victoria, a team of stream ecologists wanted to describe the basic patterns of variation in a stream invertebrate thought to be sensitive to nutrient enrichment. As an indicator species, they focused on a small flatworm, Dugesia, and started by sampling populations of this worm at a range of scales. They sampled in two seasons, representing different flow regimes of the river - Winter and Summer. Within each season, they sampled three randomly chosen (well, haphazardly, because sites are nearly always chosen to be close to road access) sites. A total of six sites in all were visited, 3 in each season. At each site, they sampled six stones, and counted the number of flatworms on each stone.

Download Curdies data set

Format of curdies.csv data files

SEASON	SITE	DUGESIA	S4DUGESIA
WINTER	1	0.648	0.897
..	..	..	..
WINTER	2	1.016	1.004
..	..	..	..
WINTER	3	0.689	0.991
..	..	..	..
SUMMER	4	0	0
..	..	..	..

Each row represents a different stone

SEASON	Season in which flatworms were counted - fixed factor
SITE	Site from where flatworms were counted - nested within SEASON (random factor)
DUGESIA	Number of flatworms counted on a particular stone
S4DUGESIA	4th root transformation of DUGESIA variable

The hierarchical nature of this design can be appreciated when we examine an illustration of the spatial and temporal scale of the measurements.

Within each of the two Seasons, there were three separate Sites (Sites not repeatidly measured across Seasons).
Within each of the Sites, six logs were selected (haphazardly) from which the number of flatworms were counted.

So the Logs (smallest sampling units) are the replicates for the Sites (six reps per Site) and the Site means are the replicates for the two Seasons (three replicates within each Season).

Open

the curdies data file.

Show code

curdies <- read.table("../downloads/data/curdies.csv", header = T, sep = ",", strip.white = T)
head(curdies)

##   SEASON SITE DUGESIA S4DUGES
## 1 WINTER    1  0.6477  0.8971
## 2 WINTER    1  6.0962  1.5713
## 3 WINTER    1  1.3106  1.0700
## 4 WINTER    1  1.7253  1.1461
## 5 WINTER    1  1.4594  1.0991
## 6 WINTER    1  1.0576  1.0141

The SITE variable is supposed to represent a random factorial variable (which site). However, because the contents of this variable are numbers, R initially treats them as numbers, and therefore considers the variable to be numeric rather than categorical. In order to force R to treat this variable as a factor (categorical) it is necessary to first convert this numeric variable into a factor (HINT)

Show code

curdies$SITE <- as.factor(curdies$SITE)

Notice the data set - each of the nested factors is labelled differently - there can be no replicate for the random (nesting) factor.

Q1-1. What are the main hypotheses being tested?

H₀ Effect 1:
H₀ Effect 2:

Q1-2. In the table below, list the assumptions of nested ANOVA along with how violations of each assumption are diagnosed and/or the risks of violations are minimized.

Assumption	Diagnostic/Risk Minimization
I.
II.
III.

Q1-3. Check these assumptions (HINT).

Show code

library(nlme)
curdies.ag <- gsummary(curdies, form = ~SEASON/SITE, base:::mean)
# OR
library(plyr)
curdies.ag <- ddply(curdies, ~SEASON + SITE, numcolwise(mean))

Note that for the effects of SEASON (Factor A in a nested model) there are only three values for each of the two season types. Therefore, boxplots are of limited value! Is there however, any evidence of violations of the assumptions (HINT)?

Show code

boxplot(DUGESIA ~ SEASON, curdies.ag)

(Y or N)
If so, assess whether a transformation will address the violations (HINT) and then make the appropriate corrections

Show code

curdies$S4DUGES <- sqrt(sqrt(curdies$DUGESIA))
curdies.ag <- gsummary(curdies, form = ~SEASON/SITE, base:::mean)
boxplot(S4DUGES ~ SEASON, curdies.ag)

Q1-4. For each of the tests, state which error (or residual) term (state as Mean Square) will be used as the nominator and denominator to calculate the F ratio. Also include degrees of freedom associated with each term.

Effect	Nominator (Mean Sq, df)	Denominator (Mean Sq, df)
SEASON
SITE

Q1-5. If there is no evidence of violations, test the model;
S4DUGES = SEASON + SITE + CONSTANT
using a nested ANOVA

(HINT). Fill (HINT) out the table below, make sure that you have treated SITE as a random factor when compiling the overall results.

Show code

curdies.aov <- aov(S4DUGES ~ SEASON + Error(SITE), data = curdies)
summary(curdies.aov)

## 
## Error: SITE
##           Df Sum Sq Mean Sq F value Pr(>F)   
## SEASON     1   5.57    5.57    34.5 0.0042 **
## Residuals  4   0.65    0.16                  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Error: Within
##           Df Sum Sq Mean Sq F value Pr(>F)
## Residuals 30   4.56   0.152

summary(lm(curdies.aov))

## 
## Call:
## lm(formula = curdies.aov)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
## -0.381 -0.262 -0.138  0.165  0.902 
## 
## Coefficients: (1 not defined because of singularities)
##              Estimate Std. Error t value Pr(>|t|)   
## (Intercept)    0.3811     0.1591    2.40   0.0230 * 
## SEASONWINTER   0.7518     0.2250    3.34   0.0022 **
## SITE2          0.1389     0.2250    0.62   0.5416   
## SITE3         -0.2651     0.2250   -1.18   0.2480   
## SITE4         -0.0303     0.2250   -0.13   0.8938   
## SITE5         -0.2007     0.2250   -0.89   0.3795   
## SITE6              NA         NA      NA       NA   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.39 on 30 degrees of freedom
## Multiple R-squared:  0.577,	Adjusted R-squared:  0.507 
## F-statistic: 8.19 on 5 and 30 DF,  p-value: 5.72e-05

library(gmodels)
ci(lm(curdies.aov))

##              Estimate CI lower CI upper Std. Error  p-value
## (Intercept)    0.3811  0.05622   0.7060     0.1591 0.023031
## SEASONWINTER   0.7518  0.29235   1.2113     0.2250 0.002242
## SITE2          0.1389 -0.32055   0.5984     0.2250 0.541560
## SITE3         -0.2651 -0.72455   0.1944     0.2250 0.247983
## SITE4         -0.0303 -0.48978   0.4292     0.2250 0.893763
## SITE5         -0.2007 -0.66013   0.2588     0.2250 0.379548

Q1-6. For each of the tests, state which error (or residual) term (state as Mean Square) will be used as the nominator and denominator to calculate the F ratio. Also include degrees of freedom associated with each term.

Source of variation	df	Mean Sq	F-ratio	P-value
SEASON
SITE
Residuals

Estimate	Mean	Lower 95% CI	Upper 95% CI
Summer
Effect size (Winter-Summer)

Normally, we are not interested in formally testing the effect of the nested factor to get the correct F test for the nested factor (SITE), examine a representation of the anova table of the fitted linear model that assumes all factors are fixed (HINT)

Q1-7. What are your conclusions (statistical and biological)?

Q1-8. Now analyse these data using a linear mixed effects model (REML).

Show code

library(nlme)
curdies.lme <- lme(S4DUGES ~ SEASON, random = ~1 | SITE, data = curdies)
summary(curdies.lme)

## Linear mixed-effects model fit by REML
##  Data: curdies 
##     AIC   BIC logLik
##   46.43 52.54 -19.21
## 
## Random effects:
##  Formula: ~1 | SITE
##         (Intercept) Residual
## StdDev:     0.04009   0.3897
## 
## Fixed effects: S4DUGES ~ SEASON 
##               Value Std.Error DF t-value p-value
## (Intercept)  0.3041   0.09472 30   3.211  0.0031
## SEASONWINTER 0.7868   0.13395  4   5.873  0.0042
##  Correlation: 
##              (Intr)
## SEASONWINTER -0.707
## 
## Standardized Within-Group Residuals:
##     Min      Q1     Med      Q3     Max 
## -1.2065 -0.7681 -0.2481  0.3531  2.3209 
## 
## Number of Observations: 36
## Number of Groups: 6

# Calculate the confidence interval for the effect size of the main effect of season
intervals(curdies.lme)

## Approximate 95% confidence intervals
## 
##  Fixed effects:
##               lower   est.  upper
## (Intercept)  0.1107 0.3041 0.4976
## SEASONWINTER 0.4148 0.7868 1.1587
## attr(,"label")
## [1] "Fixed effects:"
## 
##  Random Effects:
##   Level: SITE 
##                    lower    est. upper
## sd((Intercept)) 1.88e-07 0.04009  8546
## 
##  Within-group standard error:
##  lower   est.  upper 
## 0.3026 0.3897 0.5019

# unique(confint(curdies.lme,'SEASONWINTER')[[1]]) Examine the ANOVA table with Type I (sequential)
# SS
anova(curdies.lme)

##             numDF denDF F-value p-value
## (Intercept)     1    30   108.5  <.0001
## SEASON          1     4    34.5  0.0042

Show code - lmer

library(lme4)

## Loading required package: Matrix

## Attaching package: 'Matrix'

## The following object is masked from 'package:reshape':
## 
## expand

## Attaching package: 'lme4'

## The following object is masked from 'package:coda':
## 
## HPDinterval

## The following object is masked from 'package:nlme':
## 
## lmList, VarCorr

## The following object is masked from 'package:stats':
## 
## AIC, BIC

curdies.lmer <- lmer(S4DUGES ~ SEASON + (1 | SITE), data = curdies)
summary(curdies.lmer)

## Linear mixed model fit by REML 
## Formula: S4DUGES ~ SEASON + (1 | SITE) 
##    Data: curdies 
##   AIC  BIC logLik deviance REMLdev
##  46.4 52.8  -19.2     32.5    38.4
## Random effects:
##  Groups   Name        Variance Std.Dev.
##  SITE     (Intercept) 1.16e-13 3.41e-07
##  Residual             1.53e-01 3.91e-01
## Number of obs: 36, groups: SITE, 6
## 
## Fixed effects:
##              Estimate Std. Error t value
## (Intercept)    0.3041     0.0922    3.30
## SEASONWINTER   0.7868     0.1304    6.03
## 
## Correlation of Fixed Effects:
##             (Intr)
## SEASONWINTE -0.707

# Calculate the confidence interval for the effect size of the main effect of season
# library(gmodels) ci(curdies.lmer)
library(languageR)

## Error: package 'languageR' was built before R 3.0.0: please re-install it

pvals.fnc(curdies.lmer, withMCMC = FALSE, addPlot = F)

## Error: could not find function "pvals.fnc"

# Examine the effect season via a likelihood ratio test
curdies.lmer1 <- lmer(S4DUGES ~ SEASON + (1 | SITE), data = curdies, REML = F)
curdies.lmer2 <- lmer(S4DUGES ~ 1 + (1 | SITE), data = curdies, REML = F)
anova(curdies.lmer1, curdies.lmer2, test = "F")

## Data: curdies
## Models:
## curdies.lmer2: S4DUGES ~ 1 + (1 | SITE)
## curdies.lmer1: S4DUGES ~ SEASON + (1 | SITE)
##               Df  AIC  BIC logLik Chisq Chi Df Pr(>Chisq)    
## curdies.lmer2  3 51.8 56.6  -22.9                            
## curdies.lmer1  4 40.5 46.9  -16.3  13.3      1    0.00026 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

LOOK into ezStats in ez package

Show code - ezStats

library(ez)

## Error: package 'ez' was built before R 3.0.0: please re-install it

ezStats(dv = .(S4DUGES), between = .(SEASON), wid = .(SITE), data = curdies)

## Error: could not find function "ezStats"

Show code - ezAnova

library(ez)

## Error: package 'ez' was built before R 3.0.0: please re-install it

ezANOVA(dv = .(S4DUGES), between = .(SEASON), wid = .(SITE), data = curdies, type = 3)

## Error: could not find function "ezANOVA"

Plot predicted mean square-root transformed number of Dugesia (and confidence interval) for each season

Show code - summary plots

library(gplots)

## Loading required package: gtools

## Loading required package: gdata

## gdata: read.xls support for 'XLS' (Excel 97-2004) files ENABLED.

##

## gdata: read.xls support for 'XLSX' (Excel 2007+) files ENABLED.

## Attaching package: 'gdata'

## The following object is masked from 'package:stats':
## 
## nobs

## The following object is masked from 'package:utils':
## 
## object.size

## Loading required package: caTools

## Loading required package: grid

## Loading required package: KernSmooth

## KernSmooth 2.23 loaded Copyright M. P. Wand 1997-2009

## Loading required package: MASS

## Attaching package: 'gplots'

## The following object is masked from 'package:stats':
## 
## lowess

library(AICcmodavg)
curdies.predict <- predictSE.mer(curdies.lmer, newdata = data.frame(SEASON = c("SUMMER", "WINTER")))

## Error: no applicable method for 'fixef' applied to an object of class "mer"

par(mar = c(5, 5, 1, 1))
plot(1:2, curdies.predict$fit, type = "n", axes = F, ann = F, ylim = range(0, 1.5), xlim = c(0.5, 2.5))

## Error: object 'curdies.predict' not found

plotCI(curdies.predict$fit, uiw = curdies.predict$se.fit * 2, las = 1, add = T, sfrac = 0.001)

## Error: object 'curdies.predict' not found

axis(1, at = c(1, 2), lab = levels(curdies$SEASON))

## Error: plot.new has not been called yet

axis(2, las = 1)

## Error: plot.new has not been called yet

mtext("Number of Dugesia per stone", 2, line = 3, cex = 1.25)

## Error: plot.new has not been called yet

mtext("Season", 1, line = 3, cex = 1.25)

## Error: plot.new has not been called yet

box(bty = "l")

## Error: plot.new has not been called yet

# OR the more flexible way, but that requires more code and thought:
newdata <- expand.grid(SEASON = c("SUMMER", "WINTER"), S4DUGES = 0)
mod.mat <- model.matrix(terms(curdies.lmer), newdata)
newdata$S4DUGES = mod.mat %*% fixef(curdies.lmer)

## Error: no applicable method for 'fixef' applied to an object of class "mer"

pvar1 <- (diag(mod.mat %*% tcrossprod(vcov(curdies.lmer), mod.mat)))

## Error: no applicable method for 'vcov' applied to an object of class "mer"

tvar1 <- pvar1 + VarCorr(curdies.lmer)$SITE[1]

## Error: object 'pvar1' not found

newdata <- data.frame(newdata, plo = newdata$S4DUGES - 2 * sqrt(pvar1), phi = newdata$S4DUGES + 2 * sqrt(pvar1), 
    tlo = newdata$S4DUGES - 2 * sqrt(tvar1), thi = newdata$S4DUGES + 2 * sqrt(tvar1))

## Error: object 'pvar1' not found

par(mar = c(5, 5, 1, 1))
plot(S4DUGES ~ as.numeric(SEASON), newdata, type = "n", axes = F, ann = F, ylim = range(0, newdata$phi), 
    xlim = c(0.5, 2.5))
points(curdies$DUGESIA ~ jitter(as.numeric(curdies$SEASON)), pch = 16, col = "grey")
with(newdata, arrows(as.numeric(SEASON), plo, as.numeric(SEASON), phi, ang = 90, length = 0.05, code = 3))

## Error: object 'plo' not found

with(newdata, points(as.numeric(SEASON), S4DUGES, pch = 21, bg = "black", col = "black"))
axis(1, at = c(1, 2), lab = newdata$SEASON)
axis(2, las = 1)
mtext("Number of Dugesia per stone", 2, line = 3, cex = 1.25)
mtext("Season", 1, line = 3, cex = 1.25)
box(bty = "l")

Plot predicted mean transformed number of Dugesia (and confidence interval) for each season. Note that these data were modelled with a Gaussian distribution (this might not have been appropriate)

Show code - summary diagram

# newdataTransformed <- expand.grid(SEASON=c('SUMMER','WINTER')) library(AICcmodavg)
# newdataTransformed <-
# data.frame(newdataTransformed,predictSE.mer(curdies.lmer,newdataTransformed, print.matrix=T)^4)
newdata[, -1] <- newdata[, -1]^4
par(mar = c(5, 5, 1, 1))
plot(S4DUGES ~ as.numeric(SEASON), newdata, type = "n", axes = F, ann = F, ylim = range(newdata$plo, 
    newdata$phi), xlim = c(0.5, 2.5))

## Warning: no non-missing arguments to min; returning Inf

## Warning: no non-missing arguments to max; returning -Inf

## Error: need finite 'ylim' values

points(curdies$DUGESIA ~ jitter(as.numeric(curdies$SEASON)), pch = 16, col = "grey")
with(newdata, arrows(as.numeric(SEASON), plo, as.numeric(SEASON), phi, ang = 90, length = 0.05, code = 3))

## Error: object 'plo' not found

with(newdata, points(as.numeric(SEASON), S4DUGES, pch = 21, bg = "black", col = "black"))
# plot(fit~as.numeric(SEASON), newdata, type='n', axes=F, ann=F, ylim=range(0,2),xlim=c(0.5,2.5))
# with(newdata,arrows(as.numeric(SEASON),fit-2*se.fit,as.numeric(SEASON),fit+2*se.fit, ang=90,
# length=0.05, code=3)) with(newdata, points(as.numeric(SEASON),fit, pch=21,
# bg='black',col='black'))
axis(1, at = c(1, 2), lab = newdata$SEASON)
axis(2, las = 1)
mtext("Number of Dugesia per stone", 2, line = 3, cex = 1.25)
mtext("Season", 1, line = 3, cex = 1.25)
box(bty = "l")

Note, backtransforming the standard deviations has the reverse effect to desired (since they are less than 1).

Show code

library(gplots)
library(AICcmodavg)

curdies.predict <- predictSE.mer(curdies.lmer, newdata = data.frame(SEASON = c("SUMMER", "WINTER")))

## Error: no applicable method for 'fixef' applied to an object of class "mer"

curdies.predict$fit <- curdies.predict$fit^2

## Error: object 'curdies.predict' not found

curdies.predict$se.fit <- curdies.predict$se.fit^2

## Error: object 'curdies.predict' not found

par(mar = c(5, 5, 1, 1))
plot(1:2, curdies.predict$fit, type = "n", axes = F, ann = F, ylim = range(0, 1.5), xlim = c(0.5, 2.5))

## Error: object 'curdies.predict' not found

plotCI(curdies.predict$fit, uiw = curdies.predict$se.fit * 2, las = 1, add = T, sfrac = 5e-04)

## Error: object 'curdies.predict' not found

axis(1, at = c(1, 2), lab = levels(curdies$SEASON))

## Error: plot.new has not been called yet

axis(2, las = 1)

## Error: plot.new has not been called yet

mtext("Number of Dugesia per stone", 2, line = 3, cex = 1.25)

## Error: plot.new has not been called yet

mtext("Season", 1, line = 3, cex = 1.25)

## Error: plot.new has not been called yet

box(bty = "l")

## Error: plot.new has not been called yet

Q1-9. Where is the major variation in numbers of flatworms? Between (seasons, sites or stones)?

Show code

library(nlme)
nlme:::VarCorr(lme(S4DUGES ~ 1, random = ~1 | SEASON/SITE, curdies))

##             Variance     StdDev 
## SEASON =    pdLogChol(1)        
## (Intercept) 0.300523     0.54820
## SITE =      pdLogChol(1)        
## (Intercept) 0.001607     0.04009
## Residual    0.151851     0.38968

Q1-10. How might this information influence the design of future experiments on Dugesia in terms of:

What influences the abundance of Dugesia
Where best to focus sampling effort to maximize statistical power?

Q1-11.Finally, construct an appropriate summary figure to accompany the above analyses. Note that this should use the correct replicates for depicting error.

Show code

opar <- par(mar = c(5, 5, 1, 1))
means <- tapply(curdies.ag$DUGESIA, curdies.ag$SEASON, mean)
sds <- tapply(curdies.ag$DUGESIA, curdies.ag$SEASON, sd)
lens <- tapply(curdies.ag$DUGESIA, curdies.ag$SEASON, length)
ses <- sds/sqrt(lens)
xs <- barplot(means, beside = T, ann = F, axes = F, ylim = c(0, max(means + ses)), axisnames = F)
arrows(xs, means - ses, xs, means + ses, code = 3, length = 0.05, ang = 90)
axis(1, at = xs, lab = c("Summer", "Winter"))
mtext("Season", 1, line = 3, cex = 1.25)
axis(2, las = 1)
mtext("Mean number of Dugesia per stone", 2, line = 3, cex = 1.25)
box(bty = "l")

Source	A,C (Fix); B (Rand)
A	MS_A/MS_B
B	MS_B/MS_Residual
C	MS_C/MS_B:C
A:C	MS_A:C/MS_B:C
B:C	MS_B:C/MS_Residual

Source

A,C (Fix); B (Rand)

MS_A/MS_B

MS_B/MS_Residual

MS_C/MS_B:C

A:C

MS_A:C/MS_B:C

B:C

MS_B:C/MS_Residual