3 |
0 |
2.5 |
1 |
6 |
2 |
5.5 |
3 |
9 |
4 |
8.6 |
5 |
12 |
6 |
\[
\begin{alignat*}{4}
3.0\quad&=\quad&\beta_0\times 1\quad&+\quad&\beta_1\times 0\quad&+\quad&\varepsilon_1\\
2.5\quad&=\quad&\beta_0\times 1\quad&+\quad&\beta_1\times 1\quad&+\quad&\varepsilon_1\\
6.0\quad&=\quad&\beta_0\times 1\quad&+\quad&\beta_1\times 2\quad&+\quad&\varepsilon_2\\
5.5\quad&=\quad&\beta_0\times 1\quad&+\quad&\beta_1\times 3\quad&+\quad&\varepsilon_3\\
9.0\quad&=\quad&\beta_0\times 1\quad&+\quad&\beta_1\times 4\quad&+\quad&\varepsilon_4\\
8.6\quad&=\quad&\beta_0\times 1\quad&+\quad&\beta_1\times 5\quad&+\quad&\varepsilon_5\\
12.0\quad&=\quad&\beta_0\times 1\quad&+\quad&\beta_1\times 6\quad&+\quad&\varepsilon_6\\
\end{alignat*}
\]
|
\[\underbrace{\begin{pmatrix}
3.0\\2.5\\6.0\\5.5\\9.0\\8.6\\12.0
\end{pmatrix}}_\text{Response values} =
\underbrace{\begin{pmatrix}
1&0\\1&1\\1&2\\1&3\\1&4\\1&5\\1&6
\end{pmatrix}}_\text{Model matrix} \times
\underbrace{\begin{pmatrix}
\beta_0\\\beta_1
\end{pmatrix}}_\text{Parameter vector} +
\underbrace{\begin{pmatrix}
\varepsilon_1\\\varepsilon_2\\\varepsilon_3\\\varepsilon_4&\\\varepsilon_5\\\varepsilon_6
\end{pmatrix}}_\text{Residual vector}
\]
Matrix algebra
\[\underbrace{\begin{pmatrix}
3.0\\2.5\\6.0\\5.5\\9.0\\8.6\\12.0
\end{pmatrix}}_\text{Response values} =
\underbrace{\begin{pmatrix}
1&0\\1&1\\1&2\\1&3\\1&4\\1&5\\1&6
\end{pmatrix}}_\text{Model matrix} \times
\underbrace{\begin{pmatrix}
\beta_0\\\beta_1
\end{pmatrix}}_\text{Parameter vector} +
\underbrace{\begin{pmatrix}
\varepsilon_1\\\varepsilon_2\\\varepsilon_3\\\varepsilon_4&\\\varepsilon_5\\\varepsilon_6
\end{pmatrix}}_\text{Residual vector}
\]
\[
Y = X\beta+\epsilon
\]
\[
\hat\beta = (X'X)^{-1}X'Y
\]
Simple ANOVA
Three treatments (One factor - 3 levels), three replicates
Simple ANOVA
Two treatments, three replicates
Categorical predictor
Y A dummy1 dummy2 dummy3
--- --- -------- -------- --------
2 G1 1 0 0
3 G1 1 0 0
4 G1 1 0 0
6 G2 0 1 0
7 G2 0 1 0
8 G2 0 1 0
10 G3 0 0 1
11 G3 0 0 1
12 G3 0 0 1
\[y_{ij}=\mu+\beta_1(dummy_1)_{ij} + \beta_2(dummy_2)_{ij} +\beta_3(dummy_3)_{ij} + \varepsilon_{ij}\]
Overparameterized
\[y_{ij}=\color{darkorange}{\mu}+\color{darkorange}{\beta_1}(dummy_1)_{ij} + \color{darkorange}{\beta_2}(dummy_2)_{ij} +\color{darkorange}{\beta_3}(dummy_3)_{ij} + \varepsilon_{ij}\]
Y A Intercept dummy1 dummy2 dummy3
--- --- ----------- -------- -------- --------
2 G1 1 1 0 0
3 G1 1 1 0 0
4 G1 1 1 0 0
6 G2 1 0 1 0
7 G2 1 0 1 0
8 G2 1 0 1 0
10 G3 1 0 0 1
11 G3 1 0 0 1
12 G3 1 0 0 1
Overparameterized
\[y_{ij}=\color{darkorange}{\mu}+\color{darkorange}{\beta_1}(dummy_1)_{ij} + \color{darkorange}{\beta_2}(dummy_2)_{ij} +\color{darkorange}{\beta_3}(dummy_3)_{ij} + \varepsilon_{ij}\]
- three treatment groups
- four parameters to estimate
- need to re-parameterize
Categorical predictor
\[y_{i}=\mu+\beta_1(dummy_1)_{i} + \beta_2(dummy_2)_{i} +\beta_3(dummy_3)_{i} + \varepsilon_{i}\]
Effects parameterization
\[y_{ij}=\mu+\beta_2(dummy_2)_{ij} + \beta_3(dummy_3)_{ij} + \varepsilon_{ij}\] \[y_{ij}=\mu + \alpha_i + \varepsilon_{ij} \qquad \mathsf{i=p-1}\]
Categorical predictor
Means parameterization
\[y_{i}=\alpha + \beta_2(dummy_2)_{i} +\beta_3(dummy_3)_{i} + \varepsilon_{i}\]
Y A alpha dummy2 dummy3
--- --- ------- -------- --------
2 G1 1 0 0
3 G1 1 0 0
4 G1 1 0 0
6 G2 1 1 0
7 G2 1 1 0
8 G2 1 1 0
10 G3 1 0 1
11 G3 1 0 1
12 G3 1 0 1
Categorical predictor
Effects parameterization
Raw data
|
|
Matrix representation
|
y
|
A
|
2
|
G1
|
3
|
G1
|
4
|
G1
|
6
|
G2
|
7
|
G2
|
8
|
G2
|
10
|
G3
|
11
|
G3
|
12
|
G3
|
|
|
\[y_i = \mu + \alpha_p+\varepsilon_i, \\ \text{where}~p= \text{number of levels minus 1}\] \[\begin{pmatrix}
2\\3\\4\\6\\7\\8\\10\\11\\12
\end{pmatrix} =
\begin{pmatrix}
1&0&0\\1&0&0\\1&0&0\\1&1&0\\1&1&0\\1&1&0\\1&0&1\\1&0&1\\1&0&1
\end{pmatrix} \times
\begin{pmatrix}
\mu\\\alpha_2\\\alpha_3
\end{pmatrix} +
\begin{pmatrix}
\varepsilon_1\\\varepsilon_2\\\varepsilon_3\\\varepsilon_4&\\\varepsilon_5\\\varepsilon_6\\\varepsilon_7&\\\varepsilon_8\\\varepsilon_9
\end{pmatrix}
\]
|
\[\alpha_1 = 3, \alpha_2 = 4, \alpha_3 = 8\]
Categorical predictor
Treatment contrasts
Parameter
|
Estimates
|
Null hypothesis
|
\(Intercept\)
|
mean of control group (\(\mu_1\))
|
H\(_0\): \(\mu=\mu_1=0\)
|
\(\alpha^*_2\)
|
mean of group 2 minus mean of control group (\(\mu_2-\mu_1\))
|
H\(_0\): \(\alpha^*_2=\mu_2-\mu_1=0\)
|
\(\alpha^*_3\)
|
mean of group 3 minus mean of control group (\(\mu_3-\mu_1\))
|
H\(_0\): \(\alpha^*_3=\mu_3-\mu_1=0\)
|
…
| |
|
> contrasts(A) <-contr.treatment
> contrasts(A)
2 3
G1 0 0
G2 1 0
G3 0 1
> summary(lm(Y~A))$coef
Estimate Std. Error t value Pr(>|t|)
(Intercept) 3 0.5773503 5.196152 2.022368e-03
A2 4 0.8164966 4.898979 2.713682e-03
A3 8 0.8164966 9.797959 6.506149e-05
Categorical predictor
Treatment contrasts
Parameter
|
Estimates
|
Null hypothesis
|
\(Intercept\)
|
mean of control group (\(\mu_1\))
|
H\(_0\): \(\mu=\mu_1=0\)
|
\(\alpha^*_2\)
|
mean of group 2 minus mean of control group (\(\mu_2-\mu_1\))
|
H\(_0\): \(\alpha^*_2=\mu_2-\mu_1=0\)
|
\(\alpha^*_3\)
|
mean of group 3 minus mean of control group (\(\mu_3-\mu_1\))
|
H\(_0\): \(\alpha^*_3=\mu_3-\mu_1=0\)
|
…
| |
|
> summary(lm(Y~A))$coef
Estimate Std. Error t value Pr(>|t|)
(Intercept) 3 0.5773503 5.196152 2.022368e-03
A2 4 0.8164966 4.898979 2.713682e-03
A3 8 0.8164966 9.797959 6.506149e-05
Assumptions
- Normality
- Homogeneity of variance
- Independence
The model
2 |
G1 |
1 |
0 |
0 |
3 |
G1 |
1 |
0 |
0 |
4 |
G1 |
1 |
0 |
0 |
6 |
G2 |
0 |
1 |
0 |
7 |
G2 |
0 |
1 |
0 |
8 |
G2 |
0 |
1 |
0 |
10 |
G3 |
0 |
0 |
1 |
11 |
G3 |
0 |
0 |
1 |
12 |
G3 |
0 |
0 |
1 |
\[
\begin{align}
y_{ij} &\sim{} N({\mu_{ij}}, \color{darkorange}{\sigma^2})\\
\mu_{ij} &= \color{darkorange}{\beta_0} + \color{darkorange}{\beta_{j}} X_{ij}\\
\end{align}
\]
|
Priors: \[
\begin{align}
\beta_0 &\sim{} N(0,\sigma_{Int}^2)\\
\beta_j &\sim{} N(0,\sigma_{Treat}^2)\\
\sigma^2 &\sim{} cauchy(0,4)\\
\end{align}
\]
Worked Examples
Format of day.csv data files
|
TREAT
|
BARNACLE
|
ALG1
|
27
|
..
|
..
|
ALG2
|
24
|
..
|
..
|
NB
|
9
|
..
|
..
|
S
|
12
|
..
|
..
|
TREAT
|
Categorical listing of surface types. ALG1 = algal species 1, ALG2 = algal species 2, NB = naturally bare surface, S = scraped bare surface.
|
BARNACLE
|
The number of newly recruited barnacles on each plot after 4 weeks.
|
|
|
> day <- read.csv('../data/day.csv', strip.white=T)
> head(day)
TREAT BARNACLE
1 ALG1 27
2 ALG1 19
3 ALG1 18
4 ALG1 23
5 ALG1 25
6 ALG2 24
Worked Examples
Question: what effects do different substrate types have on barnacle recruitment
Linear model:
\[
\begin{align}
Barnacle_{ij} &\sim{} N({\mu_{ij}}, {\sigma^2})\\
\mu_{ij} &= {\beta_0} + {\beta_{j}} Treat_{ij}\\[2em]
\beta_0 &\sim{} N(0,\sigma_{Int}^2)\\
\beta_j &\sim{} N(0,\sigma_{Treat}^2)\\
\sigma^2 &\sim{} cauchy(0,4)\\
\end{align}
\]
| |