Preparations

Load the necessary libraries

library(tidyverse)

## Loading tidyverse: ggplot2
## Loading tidyverse: tibble
## Loading tidyverse: tidyr
## Loading tidyverse: readr
## Loading tidyverse: purrr
## Loading tidyverse: dplyr

## Conflicts with tidy packages ----------------------------------------------

## filter(): dplyr, stats
## lag():    dplyr, stats
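The conflict report means that, with tidyverse attached, calls to `filter()` and `lag()` now resolve to the dplyr functions. These mask the stats functions of the same names. The stats versions can still be reached with the `::` operator. As a small base-R illustration, independent of the tobacco data, `stats::filter()` is a linear filter for series rather than a row-subsetting tool:

```r
# stats::filter() applies a linear filter to a numeric series; here a centred
# 3-point moving average (sides = 2 is the default for convolution filters)
ma = stats::filter(1:5, rep(1/3, 3))
print(as.numeric(ma))   # end points are NA because the window is incomplete

# dplyr::filter(), by contrast, keeps the rows of a data frame that meet a condition,
# e.g. once the data are read in: dplyr::filter(tobacco, TREATMENT == "Weak")
```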

Scenario

A plant pathologist wanted to examine the effects of two different strengths of tobacco virus on the number of lesions on tobacco leaves. She knew from pilot studies that leaves were inherently very variable in their response to the virus. To account for this leaf-to-leaf variability, both treatments were applied to each leaf. Eight individual leaves were divided in half, with one half of each leaf inoculated with weak-strength virus and the other half inoculated with strong-strength virus. The leaves were therefore blocks, and each treatment was represented once in each block. By contrast, a completely randomised design would have required 16 leaves, with 8 whole leaves randomly allocated to each treatment.
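The difference between the two designs can be sketched in base R. The code below first lays out the randomised complete block design described above, with 8 leaves that each receive both treatments. It then lays out the completely randomised alternative. Finally, it simulates lesion counts to show why blocking helps: a difference taken within a leaf cancels that leaf's effect. The leaf SD, treatment difference, residual SD and the `HALF` column are hypothetical values chosen for illustration. They are not estimated from tobacco.csv.

```r
set.seed(1)  # reproducible illustration only

n_leaves = 8
treatments = c("Weak", "Strong")
leaves = paste0("L", seq_len(n_leaves))

# Randomised complete block (RCB) layout: every leaf (block) receives both
# treatments, one per half, giving 8 x 2 = 16 observations
rcb = expand.grid(TREATMENT = treatments, LEAF = leaves)
# Randomly decide which half of each leaf gets which strength
# (HALF is a hypothetical column added for illustration)
rcb$HALF = unlist(lapply(leaves, function(l) sample(c("left", "right"))))

# Completely randomised alternative: 16 whole leaves, 8 per treatment
crd = data.frame(LEAF = paste0("L", 1:16),
                 TREATMENT = sample(rep(treatments, each = 8)))

print(table(rcb$LEAF, rcb$TREATMENT))  # one of each treatment per leaf
print(table(crd$TREATMENT))            # 8 whole leaves per treatment

# Simulate lesion counts with hypothetical effects: large leaf-to-leaf
# variation (SD 10), a Strong - Weak difference of 10, small residual noise (SD 1)
leaf_effect  = setNames(rnorm(n_leaves, mean = 0, sd = 10), leaves)
treat_effect = c(Weak = 0, Strong = 10)
rcb$NUMBER = 25 + leaf_effect[as.character(rcb$LEAF)] +
  treat_effect[as.character(rcb$TREATMENT)] + rnorm(nrow(rcb), sd = 1)

# Rows are ordered Weak, Strong within each leaf, so these vectors align by leaf
strong = rcb$NUMBER[rcb$TREATMENT == "Strong"]
weak   = rcb$NUMBER[rcb$TREATMENT == "Weak"]
diffs  = strong - weak   # the leaf effect cancels within each leaf

print(c(sd_strong = sd(strong), sd_within_leaf_diff = sd(diffs)))
```

Here `sd(diffs)` reflects only residual noise. In contrast, `sd(strong)` also carries the simulated leaf-to-leaf variation, and that extra variation is what treating leaves as blocks is intended to absorb.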

Tobacco plant

Format of the tobacco.csv data file

LEAF	TREATMENT	NUMBER
1	Strong	35.898
1	Weak	25.02
2	Strong	34.118
2	Weak	23.167
3	Strong	35.702
3	Weak	24.122
...	...	...

LEAF	The blocking factor (Factor B)
TREATMENT	Categorical representation of the strength of the tobacco virus (Weak or Strong); the main factor of interest (Factor A)
NUMBER	Number of lesions on that half of the tobacco leaf; the response variable

Read in the data

tobacco = read_csv('data/tobacco.csv', trim_ws=TRUE)

## Parsed with column specification:
## cols(
##   LEAF = col_character(),
##   TREATMENT = col_character(),
##   NUMBER = col_double()
## )

glimpse(tobacco)

## Observations: 16
## Variables: 3
## $ LEAF      <chr> "L1", "L1", "L2", "L2", "L3", "L3", "L4", "L4", "L5"...
## $ TREATMENT <chr> "Strong", "Weak", "Strong", "Weak", "Strong", "Weak"...
## $ NUMBER    <dbl> 35.89776, 25.01984, 34.11786, 23.16740, 35.70215, 24...
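`read_csv()` parses `LEAF` and `TREATMENT` as character vectors. Because `LEAF` is the blocking factor and `TREATMENT` is the factor of interest, it is usually worth converting both to factors before modelling. In the tidyverse this would be `tobacco = tobacco %>% mutate(LEAF = factor(LEAF), TREATMENT = factor(TREATMENT))`. The base-R sketch below does the same on the first six rows of the file, with values rounded as in the format table. Placing `Weak` first, so that it becomes the reference level, is an assumption made here for illustration. It is not necessarily the author's choice.

```r
# First six rows of tobacco.csv (values rounded as in the format table)
tobacco_head = data.frame(
  LEAF      = c("L1", "L1", "L2", "L2", "L3", "L3"),
  TREATMENT = c("Strong", "Weak", "Strong", "Weak", "Strong", "Weak"),
  NUMBER    = c(35.898, 25.02, 34.118, 23.167, 35.702, 24.122),
  stringsAsFactors = FALSE
)

# Convert the categorical columns to factors.
# Listing "Weak" first makes it the reference level, so a treatment-contrast
# model would estimate the Strong - Weak difference (illustrative assumption).
tobacco_head$LEAF      = factor(tobacco_head$LEAF)
tobacco_head$TREATMENT = factor(tobacco_head$TREATMENT, levels = c("Weak", "Strong"))

str(tobacco_head)

# Check the blocked structure: each leaf should carry each treatment exactly once
print(table(tobacco_head$LEAF, tobacco_head$TREATMENT))
```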

Exploratory data analysis

Model formula: \[ \begin{aligned} y_i &\sim \mathcal{N}(\mu_i, \sigma^2)\\ \mu_i &= \mathbf{X}_i\boldsymbol{\beta} + \mathbf{Z}_i\boldsymbol{\gamma} \end{aligned} \]

where \(\boldsymbol{\beta}\) and \(\boldsymbol{\gamma}\) are vectors of the fixed- and random-effects parameters respectively, and \(\mathbf{X}\) is the model matrix representing the overall intercept and the effect of the treatment on the number of lesions. \(\mathbf{Z}\) represents a cell-means model matrix for the random intercepts associated with leaves.
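To make the roles of X and Z concrete, the sketch below builds both matrices with `model.matrix()` for a layout with the same structure as this study: 8 leaves, each with a Weak half and a Strong half. X has an intercept column plus a `TREATMENTStrong` column. This uses treatment contrasts with Weak as the reference level, which is an assumption. Z has one indicator column per leaf, so each row picks out exactly one leaf's random intercept. The values of β, σ and σ_LEAF are hypothetical. The γ values are drawn from N(0, σ²_LEAF), the usual random-intercept assumption, which the formula above leaves implicit.

```r
set.seed(123)  # reproducible illustration only

# Layout with the same structure as the tobacco study: 8 leaves x 2 treatments
dat = expand.grid(TREATMENT = factor(c("Weak", "Strong"), levels = c("Weak", "Strong")),
                  LEAF = factor(paste0("L", 1:8)))

# Fixed-effects model matrix X: intercept + Strong-vs-Weak contrast (16 x 2)
X = model.matrix(~ TREATMENT, data = dat)

# Random-effects matrix Z: cell-means coding, one indicator column per leaf (16 x 8)
Z = model.matrix(~ 0 + LEAF, data = dat)

# Hypothetical parameter values, for illustration only
beta       = c(24, 11)                     # Weak mean, Strong - Weak difference
sigma_leaf = 2                             # SD of the leaf random intercepts
sigma      = 1                             # residual SD
gamma      = rnorm(ncol(Z), 0, sigma_leaf) # one random intercept per leaf

# Linear predictor mu = X beta + Z gamma, then y ~ N(mu, sigma^2)
mu = as.vector(X %*% beta + Z %*% gamma)
y  = rnorm(length(mu), mean = mu, sd = sigma)

print(dim(X))
print(dim(Z))
```

Because every leaf contributes one Weak row and one Strong row, the Strong minus Weak difference in `mu` is the same, `beta[2]`, for every leaf. The leaf's random intercept shifts both halves equally.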

GLMM Part 1

Murray Logan

16 March 2019
