In Workshop 14.2 we introduced a dataset of Gittens (1985) in which the abundances of 8 species of plants were measured from 45 sites within 3 habitat types.
Essentially, the plant ecologist wanted to be able to compare the sites according to their plant communities.
In Workshop 14.2 we performed PCA on these data.
In the current workshop, we will instead start by assuming that the sampling spans multiple communities (the species of which are
likely to display unimodal abundance distributions) and that there are strong environmental gradients operating across the landscape that are likely
to drive strong associations between species abundances and sites.
We will therefore perform a correspondence analysis (CA), which quantifies the contributions of the relative frequencies to a $\chi^2$ statistic.
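To make the link between correspondence analysis and the $\chi^2$ statistic concrete, here is a minimal base-R sketch on a small made-up abundance table (the matrix `toy` and all object names are illustrative assumptions, not the workshop data). It computes the standardized residuals of the relative frequencies, takes their singular values, and checks that the squared singular values sum to the Pearson $\chi^2$ divided by the grand total, which is the quantity that `cca()` reports as inertia (the mean squared contingency coefficient).

```r
# Toy sites-by-species abundance table (made-up values, for illustration only)
toy <- matrix(c(10, 2,  0,
                 4, 8,  3,
                 0, 5, 12),
              nrow = 3, byrow = TRUE,
              dimnames = list(paste0("site", 1:3), paste0("sp", 1:3)))

n <- sum(toy)            # grand total
P <- toy / n             # relative frequencies
r <- rowSums(P)          # row (site) masses
k <- colSums(P)          # column (species) masses
E <- outer(r, k)         # expected relative frequencies under independence
S <- (P - E) / sqrt(E)   # standardized residuals

eig <- svd(S)$d^2        # eigenvalues; the last one is ~0 (trivial dimension)
inertia <- sum(eig)      # total inertia

# Pearson chi-square of the same table (small-count warning suppressed)
chi2 <- unname(suppressWarnings(chisq.test(toy, correct = FALSE))$statistic)

print(round(eig, 4))
print(c(inertia = inertia, chi2_over_n = chi2 / n))
```

The two printed quantities should agree, which is why the eigenvalues can be read as shares of a $\chi^2$ statistic.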
Call:
cca(X = veg[, c(-1, -2)])
Partitioning of mean squared contingency coefficient:
              Inertia Proportion
Total            0.55          1
Unconstrained    0.55          1
Eigenvalues, and their contribution to the mean squared contingency coefficient
Importance of components:
                        CA1   CA2    CA3    CA4    CA5    CA6     CA7
Eigenvalue            0.260 0.156 0.0532 0.0456 0.0179 0.0109 0.00652
Proportion Explained  0.473 0.283 0.0968 0.0829 0.0326 0.0198 0.01185
Cumulative Proportion 0.473 0.756 0.8528 0.9358 0.9684 0.9881 1.00000
Scaling 2 for species and site scores
* Species are scaled proportional to eigenvalues
* Sites are unscaled: weighted dispersion equal on all dimensions
Examine the eigenvalues for each new component (group). They represent the contribution of each new component to the overall $\chi^2$.
The sum of these values should add up to the total
$\chi^2$ value (also known as inertia).
If there were absolutely no associations between the species and sites,
then you would expect each new component to have an eigenvalue of $\mathrm{inertia}/n$, where $n$ is the number of components.
What do the eigenvalues indicate in this case?
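As a rough check of these statements, the eigenvalues can be typed in from the printed summary above and compared against their mean. This is only a sketch: the values are rounded, and the object names are illustrative. With the fitted object available, `veg.ca$CA$eig` would supply the unrounded values.

```r
# Eigenvalues copied from the printed CA summary (rounded values)
eig <- c(CA1 = 0.260, CA2 = 0.156, CA3 = 0.0532, CA4 = 0.0456,
         CA5 = 0.0179, CA6 = 0.0109, CA7 = 0.00652)

total <- sum(eig)               # should be close to the reported inertia of 0.55
average <- total / length(eig)  # expected eigenvalue if there were no associations

print(round(total, 3))
print(round(average, 4))
print(names(eig)[eig > average])  # components whose eigenvalue exceeds the average
```

Components with eigenvalues well above the average capture more structure than would be expected if species and sites were unrelated.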
Calculate the percentage of total $\chi^2$ explained by each of the new correspondence components.
How much of the total $\chi^2$ is explained by correspondence component 1 (as a percentage)?
Calculate the cumulative sum of these percentages.
How much of the total $\chi^2$ is explained by the first three correspondence components (as a percentage)?
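One way to carry out these two calculations is sketched below, again starting from the rounded eigenvalues in the printed summary (so the results may differ from the printed proportions in the last digit). The object names are illustrative.

```r
# Eigenvalues copied from the printed CA summary (rounded values)
eig <- c(CA1 = 0.260, CA2 = 0.156, CA3 = 0.0532, CA4 = 0.0456,
         CA5 = 0.0179, CA6 = 0.0109, CA7 = 0.00652)

pct <- 100 * eig / sum(eig)  # percentage of total chi-square per component
cum_pct <- cumsum(pct)       # running total of those percentages

print(round(rbind(Percent = pct, Cumulative = cum_pct), 1))
```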
Using the eigenvalues and a screeplot, determine how many correspondence components are necessary to represent the original variables (species).
How many correspondence components are necessary?
> screeplot(veg.ca)
> int <- veg.ca$tot.chi/length(veg.ca$CA$eig)
> abline(a = int, b = 0)
Generate a quick biplot ordination (scatterplot of correspondence components) with correspondence component 1 on the x-axis and correspondence component 2 on the y-axis.
Are the patterns of sites associated with any particular species?
> plot(veg.ca, scaling = 2)
Whilst the above biplot illustrates some of the patterns, it does not allow us to directly see whether the communities change in the different habitats.
So let's instead construct the plot at a lower level.
Create the base ordination plot and add the sites (colored according to habitat). Since we are more interested in the habitats than
the actual sites, we can just label the points according to their habitat rather than their site names.
> veg.ord <- ordiplot(veg.ca, type = "n")
> text(veg.ord, "sites", lab = veg$HABITAT, col = as.numeric(veg$HABITAT))
Let's now add the species correlation vectors (component loadings). This will yield a biplot similar to the one from the previous question.
> veg.ord <- ordiplot(veg.ca, type = "n")
> text(veg.ord, "sites", lab = veg$HABITAT, col = as.numeric(veg$HABITAT))
> data.envfit <- envfit(veg.ca, veg[, c(-1, -2)])
> plot(data.envfit, col = "grey")
Now let's fit the habitat vectors onto this ordination.
Before environmental variables can be added to an ordination plot, they must first be converted to numeric representations.
If we wish to display the orientation of each habitat on the ordination plot, then we need to convert the habitat
variable into dummy variables.
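A minimal sketch of this step follows. `model.matrix()` with the intercept removed yields one 0/1 indicator column per habitat; the toy factor `toy_habitat` is made up purely to show the encoding. The second part assumes that the objects `veg` and `veg.ca` from the earlier steps exist and that the vegan package is installed; the name `habitat.envfit` and the colour are arbitrary choices.

```r
# Part 1: what the dummy coding looks like (toy factor, illustration only)
toy_habitat <- data.frame(HABITAT = factor(c("A", "A", "B", "C")))
toy_dummy <- model.matrix(~ -1 + HABITAT, data = toy_habitat)
print(toy_dummy)  # one indicator column per habitat level

# Part 2: apply the same idea to the workshop objects, if they are available
if (exists("veg") && exists("veg.ca") &&
    requireNamespace("vegan", quietly = TRUE)) {
  # Redraw the base ordination with sites labelled by habitat
  veg.ord <- vegan::ordiplot(veg.ca, type = "n")
  text(veg.ord, "sites", labels = veg$HABITAT,
       col = as.numeric(factor(veg$HABITAT)))
  # Dummy-code the habitats and fit them onto the ordination
  habitat <- model.matrix(~ -1 + HABITAT, data = veg)
  habitat.envfit <- vegan::envfit(veg.ca, env = habitat)
  plot(habitat.envfit, col = "blue")  # adds habitat vectors to the open plot
  print(habitat.envfit)               # r2 and permutation p-values per habitat
}
```

Dropping the intercept keeps a column for every habitat, so each habitat gets its own vector on the plot rather than being expressed relative to a reference level.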
To ensure you appreciate the patterns displayed in this ordination plot, answer the following questions.
Species 1 is primarily associated with which correspondence component (axis)?
Species 2 is primarily associated with which correspondence component (axis)?
Species 5 is primarily associated with which correspondence component (axis)?
Habitat A aligns primarily with the
Habitat C strongly reflects the abundances of
It is also interesting to note that the sites predominantly line up along very narrow trajectories.
The environmental fit procedure above included a permutation test that
explored the relationship between each of the habitat types and the reduced ordination space communities (as defined by CA1 and CA2).
What conclusions would you draw from this analysis?
We will also return to the data of Peet & Loucks (1977) that examined the abundances of 8 species of trees (Bur oak, Black oak, White oak, Red oak, American elm, Basswood, Ironwood, Sugar maple) at 10 forest sites in southern Wisconsin, USA.
The data (given below) are the mean measurements of canopy cover for eight species of North American trees in 10 samples (quadrats).
Call:
cca(X = wisc[, -1])
Partitioning of mean squared contingency coefficient:
              Inertia Proportion
Total           0.716          1
Unconstrained   0.716          1
Eigenvalues, and their contribution to the mean squared contingency coefficient
Importance of components:
                        CA1    CA2    CA3    CA4    CA5     CA6      CA7
Eigenvalue            0.532 0.0858 0.0553 0.0237 0.0125 0.00519 0.000869
Proportion Explained  0.744 0.1199 0.0773 0.0332 0.0174 0.00725 0.001210
Cumulative Proportion 0.744 0.8637 0.9409 0.9741 0.9915 0.99879 1.000000
Scaling 2 for species and site scores
* Species are scaled proportional to eigenvalues
* Sites are unscaled: weighted dispersion equal on all dimensions
Examine the eigenvalues for each new component (group).
What do the eigenvalues indicate in this case?
Calculate the percentage of total $\chi^2$ explained by each of the new correspondence components.
How much of the total original $\chi^2$ is explained by correspondence component 1 (as a percentage)?
Calculate the cumulative sum of these percentages.
How much of the total $\chi^2$ is explained by the first three correspondence components (as a percentage)?
Using the eigenvalues and a screeplot, determine how many correspondence components are necessary to represent the original variables (species).
How many correspondence components are necessary?
> screeplot(wisc.ca)
> int <- wisc.ca$tot.chi/length(wisc.ca$CA$eig)
> abline(a = int, b = 0)
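The same comparison can be sketched without the fitted object by typing in the rounded eigenvalues from the printed summary for the Wisconsin data. The bar chart is a base-R stand-in for `screeplot()`, and the object names are illustrative.

```r
# Eigenvalues copied from the printed CA summary for the tree data (rounded)
eig <- c(CA1 = 0.532, CA2 = 0.0858, CA3 = 0.0553, CA4 = 0.0237,
         CA5 = 0.0125, CA6 = 0.00519, CA7 = 0.000869)

average <- sum(eig) / length(eig)  # eigenvalue expected under no association

barplot(eig, ylab = "Inertia")     # simple scree-style bar chart
abline(h = average, lty = 2)       # reference line at the average eigenvalue

retained <- names(eig)[eig > average]
print(retained)                    # components above the reference line
```

The average-eigenvalue line is only one rule of thumb; the shape of the scree (where the bars level off) is worth weighing alongside it.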
Generate a quick biplot ordination (scatterplot of correspondence components) with correspondence component 1 on the x-axis and correspondence component 2 on the y-axis.
Are the patterns of quadrats associated with any particular tree species?