Tutorial 6.2a - t-tests

21 May 2017

Two independent sample t-tests

As a starting point for the two-sample t-test, we will generate some fabricated data representing a single response collected from two populations (Population A and Population B). The characteristics of the populations (which are obviously not normally known; rather, these are the parameters that we seek to estimate and make inferences about), as well as some aspects of the experimental design used to collect the observations, are:

the mean of Population A is 105
the mean of Population B is 77.5
both populations have a standard deviation of 3.0
Populations A and B were sampled with $n=60$ and $n=40$ respectively

Details of data generation

set.seed(1)
nA <- 60  #sample size from Population A
nB <- 40  #sample size from Population B
muA <- 105  #population mean of Population A
muB <- 77.5  #population mean of Population B
sigma <- 3.0  #standard deviation of both populations (equally varied)
yA <- rnorm(nA, muA, sigma)  #Population A sample
yB <- rnorm(nB, muB, sigma)  #Population B sample
y <- c(yA, yB)
x <- factor(rep(c("A", "B"), c(nA, nB)))  #categorical listing of the populations
data <- data.frame(y, x)  # dataset

Assumptions and exploratory data analysis

We are going to use the samples to estimate the population means as well as test the null hypothesis that the populations are equal (i.e. that the population means are the same: Population A minus Population B equals 0).

This simple null hypothesis can be tested using a t-test. However, for the test to be reliable, it assumes that:

the populations from which the samples were collected were normally distributed
the populations from which the samples were collected were equally varied
the collected samples represented the populations in an unbiased manner

The last of these assumptions can only be addressed at the design and data collection phase. These fabricated data were generated from equally varied normal distributions and thus should adhere to the other two assumptions. Nevertheless, boxplots are a useful diagnostic.

boxplot(y~x, data)

tapply(data$y, data$x,var)

       A        B 
6.581824 8.474237

There is no evidence of non-normality or gross heteroscedasticity (non-homogeneity of variance). The sample variances are not wildly different (yet are not the same).
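One rough way to put a number on "not wildly different" is the ratio of the larger to the smaller sample variance. The sketch below regenerates the same fabricated samples and computes that ratio. It also calls var.test() from base R as an optional formal check; that call is an addition for illustration rather than part of the analysis above, and the F-test it performs is known to be sensitive to non-normality.

```r
# Regenerate the fabricated samples used above
set.seed(1)
yA <- rnorm(60, 105, 3)
yB <- rnorm(40, 77.5, 3)
dat <- data.frame(y = c(yA, yB), x = factor(rep(c("A", "B"), c(60, 40))))

# Ratio of the larger to the smaller sample variance
vars <- tapply(dat$y, dat$x, var)
var.ratio <- max(vars) / min(vars)
var.ratio

# Optional F-test of equal variances (variance of A relative to B)
vt <- var.test(y ~ x, data = dat)
vt$estimate
```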

In R, both pooled-variances (Student) and separate-variances (Welch) t-tests are performed using the t.test() function.

For pooled-variances t-test:

t.test(y~x, data, var.equal=TRUE)

	Two Sample t-test

data:  y by x
t = 49.727, df = 98, p-value < 2.2e-16
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 26.39339 28.58754
sample estimates:
mean in group A mean in group B 
      105.32285        77.83238
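To see where the t-value, degrees of freedom and confidence interval come from, the pooled-variances calculation can be sketched by hand. The variable names below (sp2, se, t.stat, ci) are chosen for this illustration only.

```r
# Regenerate the fabricated samples used above
set.seed(1)
yA <- rnorm(60, 105, 3)
yB <- rnorm(40, 77.5, 3)
nA <- length(yA)
nB <- length(yB)

# Pooled variance: weighted average of the two sample variances
sp2 <- ((nA - 1) * var(yA) + (nB - 1) * var(yB)) / (nA + nB - 2)
# Standard error of the difference in means
se <- sqrt(sp2 * (1 / nA + 1 / nB))
# t statistic and degrees of freedom
t.stat <- (mean(yA) - mean(yB)) / se
df <- nA + nB - 2
# Two-tailed p-value and 95% confidence interval for the difference
p.val <- 2 * pt(-abs(t.stat), df)
ci <- (mean(yA) - mean(yB)) + c(-1, 1) * qt(0.975, df) * se
c(t = t.stat, df = df)
ci
```

These values should agree with those printed by t.test(..., var.equal=TRUE).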

For separate-variances t-test:

t.test(y~x, data, var.equal=FALSE)

	Welch Two Sample t-test

data:  y by x
t = 48.479, df = 76.318, p-value < 2.2e-16
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 26.36115 28.61978
sample estimates:
mean in group A mean in group B 
      105.32285        77.83238
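The non-integer degrees of freedom in the Welch output come from the Welch-Satterthwaite approximation. A minimal sketch of that calculation, with variable names chosen for this illustration, is:

```r
# Regenerate the fabricated samples used above
set.seed(1)
yA <- rnorm(60, 105, 3)
yB <- rnorm(40, 77.5, 3)

# Squared standard error of each sample mean
vA <- var(yA) / length(yA)
vB <- var(yB) / length(yB)
# The Welch statistic uses the unpooled standard error
t.welch <- (mean(yA) - mean(yB)) / sqrt(vA + vB)
# Welch-Satterthwaite approximation to the degrees of freedom
df.welch <- (vA + vB)^2 / (vA^2 / (length(yA) - 1) + vB^2 / (length(yB) - 1))
c(t = t.welch, df = df.welch)
```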

Whether using a pooled or separate variances t-test, the conclusions are the same: reject the null hypothesis that the populations are the same. The response of Population A is significantly higher than that of Population B.

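The 95% confidence interval for the difference between the population means spans 26.39 to 28.59 units for the pooled-variances test (26.36 to 28.62 for the Welch test). The interpretation is that, if the sampling were repeated many times, 95% of intervals constructed in this way would contain the true difference in population means.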
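The repeated-sampling interpretation of a confidence interval can be checked with a small simulation. The sketch below draws many pairs of samples from the same two populations and records how often the 95% interval contains the true difference (105 - 77.5 = 27.5). The seed and the number of simulations are arbitrary choices for this illustration.

```r
set.seed(123)  # arbitrary seed for this illustration
true.diff <- 105 - 77.5
n.sims <- 1000

# For each simulated experiment, does the 95% CI contain the true difference?
covered <- replicate(n.sims, {
  a <- rnorm(60, 105, 3)
  b <- rnorm(40, 77.5, 3)
  ci <- t.test(a, b, var.equal = TRUE)$conf.int
  ci[1] <= true.diff && true.diff <= ci[2]
})

# Proportion of intervals that captured the true difference
coverage <- mean(covered)
coverage
```

The proportion should be close to 0.95, although it will vary a little from run to run.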

Paired samples t-test

Let's generate some more fabricated data of a single response collected from pairs of samples representing sites sampled before and after an impact. In this case, there is actually only a single population (the difference between before and after). The degree to which sites respond to the impact differs from site to site (the variability). The characteristics of the populations and samples are:

the mean before response value is 100
the mean effect on this response after the impact is +30
the residual variability (standard deviation of individual observations about their expected values) is 15
the variability among sites (standard deviation) is 20
there were $n=25$ sites before and after the impact

Details of data generation

set.seed(1)
n.sites <- 25
n.repeats <- 2
n <- n.sites * n.repeats
before <- 100
site.sd <- 20
ba.effect <- 30
sigma <- 15

int.effects <- rnorm(n=n.sites, mean=before, sd=site.sd)
ba <- gl(2,1,n, lab=c("Before","After"))
ba.effects <- rep(ba.effect,n.sites)
sites <- gl(n = n.sites, k = n.repeats)

all.effects <- c(int.effects, ba.effects)
Xmat <- model.matrix(~-1+sites*ba-ba)
lin.pred <- Xmat[,] %*% all.effects
eps <- rnorm(n=n,mean=0, sd=sigma)
y <- lin.pred + eps
data1 <- data.frame(y,sites,ba)
#check some of the properties
#before and after means
with(data1, tapply(y,ba,mean))

  Before    After 
104.2735 135.4377

#site standard deviation
sqrt(var(with(data1, tapply(y,sites,mean))))

[1] 21.32765

#traditional wide format
library(reshape)
data2 <- cast(data1,sites~ba, value="y")

Assumptions and exploratory data analysis

A paired t-test is essentially a one-sample t-test of the differences between each pair (the null hypothesis is that the mean difference equals zero). Therefore, the regular assumptions pertain to the differences.

Using the repeated measures (wide) data format, we create a new column that is the difference between the before and after columns - this mimics the response.

data2<-within(data2, Diff <- After-Before)

As with a regular t-test, normality can be explored using boxplots.

boxplot(data2$Diff)

There is no evidence of non-normality.

In R, paired t-tests are also performed using the t.test() function.

t.test(data2$After, data2$Before, paired=TRUE)

	Paired t-test

data:  data2$After and data2$Before
t = 7.6622, df = 24, p-value = 6.714e-08
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 22.76984 39.55870
sample estimates:
mean of the differences 
               31.16427

We would reject the null hypothesis that the difference between before and after is equal to zero. The impact results in a significant increase in the response. On average, the impact led to an increase in response of 31.16 units.
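The equivalence between a paired t-test and a one-sample t-test of the differences can be confirmed directly. The sketch below uses a small set of made-up before and after values (not the fabricated data above) and compares the two calls.

```r
set.seed(42)  # arbitrary seed; these values are made up for illustration
before <- rnorm(25, 100, 20)
after <- before + rnorm(25, 30, 15)

# Paired t-test on the two columns
paired <- t.test(after, before, paired = TRUE)
# One-sample t-test on the per-site differences
one.sample <- t.test(after - before, mu = 0)

# Both approaches give the same t statistic and degrees of freedom
c(paired = unname(paired$statistic), one.sample = unname(one.sample$statistic))
c(paired = unname(paired$parameter), one.sample = unname(one.sample$parameter))
```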

Worked examples

Basic statistics references

Logan (2010) - Chpt 6
Quinn & Keough (2002) - Chpt 6

Hypothesis testing

Furness & Bryant (1996) studied the energy budgets of breeding northern fulmars (Fulmarus glacialis) in Shetland. As part of their study, they recorded the body mass and metabolic rate of eight male and six female fulmars.

Download Furness data set

Format of furness.csv data files

SEX	METRATE	BODYMASS
MALE	2950	875
FEMALE	1956	765
MALE	2308	780
MALE	2135	790
MALE	1945	788

SEX	Sex of breeding northern fulmars (Fulmarus glacialis)
METRATE	Metabolic rate (hJ/day)
BODYMASS	Body mass (g)

Open the furness data file.

furness <- read.csv('../downloads/data/furness.csv',header=T, strip.white=TRUE)
head(furness)

     SEX METRATE BODYMASS
1   Male  2950.0      875
2 Female  1956.1      635
3   Male  2308.7      765
4   Male  2135.6      780
5   Male  1945.6      790
6 Female  1490.5      635

The researchers were interested in testing whether there is a difference in the metabolic rate of male and female breeding northern fulmars. In light of this, list the following:
1. The biological hypotheses of interest
2. The biological null hypotheses
3. The statistical null hypotheses (H₀)

The appropriate statistical test for testing the null hypothesis that the means of two independent populations are equal is a t-test.

Before proceeding, make sure you understand what is meant by normality and equal variance as well as the principles of hypothesis testing using a t-test.

For the null hypothesis test of interest (that the mean population metabolic rates of males and females are the same), calculate the degrees of freedom.

#total number of replicates minus 2 (one for each population)
sum(with(furness, tapply(METRATE, SEX, length)))-2

[1] 12

Calculate the critical t-values for the following hypotheses (α = 0.05):
1. The metabolic rate of males is higher than that of females (one-tailed test)
2. The metabolic rate of males is the same as that of females (two-tailed test)

qt(0.05,df=12,lower.tail=F)

[1] 1.782288

qt(0.05/2,df=12,lower.tail=F)

[1] 2.178813
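The critical values are simply quantiles of the t distribution, so the tail area beyond each one recovers the chosen tail probability. The sketch below verifies this and applies the two-tailed decision rule to a made-up observed statistic (t.obs is hypothetical and not taken from the fulmar data).

```r
df <- 12
alpha <- 0.05

# Critical values for one-tailed and two-tailed tests
t.crit.one <- qt(alpha, df, lower.tail = FALSE)
t.crit.two <- qt(alpha / 2, df, lower.tail = FALSE)

# Upper-tail area beyond each critical value
pt(t.crit.one, df, lower.tail = FALSE)  # equals alpha
pt(t.crit.two, df, lower.tail = FALSE)  # equals alpha / 2

# Two-tailed decision rule with a hypothetical observed t statistic
t.obs <- 1.5
reject <- abs(t.obs) > t.crit.two
reject
```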

Since most hypothesis tests follow the same basic procedure, confirm that you understand the basic steps of hypothesis tests.

In the table below, list the assumptions of a t-test along with how violations of each assumption are diagnosed and/or the risks of violations are minimized.

Assumption Diagnostic/Risk Minimization
I.

II.

III.

We wish to investigate whether or not male and female fulmars have the same metabolic rates, and we intend to use a t-test to test the null hypothesis that the population mean metabolic rate of males is equal to the population mean metabolic rate of females. Having identified the important assumptions of a t-test, use the samples to evaluate whether the assumptions are likely to be violated and thus whether a t-test is likely to be reliable.
Is there any evidence that:
1. The assumption of normality has been violated?
2. The assumption of homogeneity of variance has been violated?

boxplot(METRATE~SEX, data=furness)

Perform a t-test to examine the effect of sex on the metabolic rate of fulmars using whichever is most appropriate: a pooled-variance t-test (for when the population variances are very similar) or a separate-variance t-test (for when the variance of one population is likely to be up to 2.5 times greater or less than that of the other population). Ensure that you are familiar with the output of a t-test.
1. What is the t-value? (Excluding the sign. The sign will depend on whether you compared males to females or females to males, and thus only indicates which group had the higher mean.)
2. What is the df (degrees of freedom)?
3. What is the p-value?

#Separate variance t-test is probably more reliable, as there is some evidence
#that the two populations are not exactly equal in variability
t.test(METRATE~SEX, var.equal=FALSE, data=furness)

	Welch Two Sample t-test

data:  METRATE by SEX
t = -0.77317, df = 10.468, p-value = 0.4565
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -1075.3208   518.8042
sample estimates:
mean in group Female   mean in group Male 
            1285.517             1563.775

Write the results out as though you were writing a research paper/thesis. For example (select the phrase that applies and fill in gaps with your results):
The mean metabolic rate of male fulmars was (choose correct option)
(t = , df = , P = ) the mean metabolic rate of female fulmars.
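When writing up the result, it can help to see how the reported p-value follows from the t-value and degrees of freedom. A minimal sketch, using the rounded values printed in the Welch output above:

```r
t.obs <- -0.77317  # t statistic reported by t.test() above
df <- 10.468       # Welch degrees of freedom reported above

# Two-tailed p-value: twice the area in one tail beyond |t|
p.two.tailed <- 2 * pt(-abs(t.obs), df)
p.two.tailed
```

The result should match the reported p-value up to rounding of the inputs.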

Construct a bar graph showing the mean metabolic rate of male and female fulmars and an indication of the precision of the means with error bars.

# calculate the means
means <- tapply(furness$METRATE, furness$SEX, mean)
# calculate the standard deviation
sds <- tapply(furness$METRATE, furness$SEX, sd)
# calculate the lengths
ns <- tapply(furness$METRATE, furness$SEX, length)
# calculate the standard errors
ses <- sds/sqrt(ns)
# plot the bars
xs <- barplot(means,beside=T)
# load package containing error bar function
library(Hmisc)
# plot the error bars
errbar(xs, means, means+ses, means-ses, add=T)

Alternatively, using base graphics only:

# set the size of the figure margins
opar<-par(mar=c(4,5,0,0))
# calculate the means
means <- tapply(furness$METRATE, furness$SEX, mean)
# calculate the standard deviation
sds <- tapply(furness$METRATE, furness$SEX, sd)
# calculate the lengths
ns <- tapply(furness$METRATE, furness$SEX, length)
# calculate the standard errors
ses <- sds/sqrt(ns)
#calculate the data limits
lim <- max(means+ses)
# plot the bars
xs <- barplot(means,beside=T, axes=FALSE, ann=FALSE, ylim=c(0,lim))
# plot the error bars
arrows(xs,means-ses,xs,means+ses, len=0.1, ang=90, code=3)
# generate the axes
mtext("Sex",1, line=2.5, cex=2)
axis(2,las=1)
mtext("Metabolic rate",2,line=3.5, cex=2)
box(bty="l")

Alternatively, using ggplot2:

library(ggplot2)
library(plyr)
library(gmodels)
dat <- ddply(furness, ~SEX, function(x) {
  data.frame(Metrate=mean(x$METRATE), t(ci(x$METRATE)))
})
head(dat)

     SEX  Metrate Estimate CI.lower CI.upper Std..Error
1 Female 1285.517 1285.517 843.7436 1727.290   171.8572
2   Male 1563.775 1563.775 816.0607 2311.489   316.2085

ggplot(dat, aes(y=Metrate, x=SEX)) +
  geom_point(data=furness, aes(y=METRATE, x=SEX), color='grey')+
  geom_pointrange(aes(ymin=CI.lower, ymax=CI.upper))+
  scale_y_continuous(expression(Metabolic~rate~(HJ/day)))+
  scale_x_discrete('')+
  theme_classic()+theme(axis.title.y=element_text(vjust=2, size=rel(1.25), face='bold'))

Paired data

Here is a modified example from Quinn and Keough (2002). Elgar et al. (1996) studied the effect of lighting on the web structure of an orb-spinning spider. They set up wooden frames with two different light regimes (controlled by black or white mosquito netting), light and dim. A total of 17 orb spiders were allowed to spin their webs in both a light frame and a dim frame, with six days 'rest' between trials for each spider, and the vertical and horizontal diameter of each web was measured. Whether each spider was allocated to a light or dim frame first was randomized. The H₀'s were that each of the two variables (vertical diameter and horizontal diameter of the orb web) were the same in dim and light conditions. Elgar et al. (1996) correctly treated these as paired comparisons because the same spider spun her web in a light frame and a dim frame.

Download Elgar data set

Format of elgar.csv data files

PAIR	VERTDIM	HORIZDIM	VERTLIGHT	HORIZLIGHT
..	..	..	..	..
..	..	..	..	..
..	..	..	..	..

PAIR	Name given to each pair of webs spun by a particular spider
VERTDIM	The vertical dimension or height (mm) of webs spun in dim conditions
HORIZDIM	The horizontal dimension or width (mm) of webs spun in dim conditions
VERTLIGHT	The vertical dimension or height (mm) of webs spun in light conditions
HORIZLIGHT	The horizontal dimension or width (mm) of webs spun in light conditions

Note: for paired t-tests, it is traditional for categories to be column labels rather than entries in a categorical variable. Compare the structure of the elgar data set (paired t-test) with that of the furness data set (standard t-test).

Open the elgar data file.

elgar <- read.csv("../downloads/data/elgar.csv", header=T, strip.white=T)
head(elgar)

  PAIR VERTDIM HORIZDIM VERTLIGHT HORIZLIGHT
1    K     300      295        80         60
2    M     240      260       120        140
3    N     250      280       170        160
4    O     220      250        90        120
5    P     160      160       150        180
6    R     170      150       110         90
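The difference between the wide layout printed above and the long layout used for the furness data can be made concrete with a short sketch. It rebuilds the first six rows shown by head(elgar) for the vertical measurements only and stacks them into a long data frame. The column names LIGHTING and VERT are hypothetical names chosen for this illustration.

```r
# First six rows of the elgar data, as printed by head(elgar) above
wide <- data.frame(
  PAIR = c("K", "M", "N", "O", "P", "R"),
  VERTDIM = c(300, 240, 250, 220, 160, 170),
  VERTLIGHT = c(80, 120, 170, 90, 150, 110)
)

# Long layout: one row per web, with lighting as a categorical variable
long <- data.frame(
  PAIR = rep(wide$PAIR, times = 2),
  LIGHTING = factor(rep(c("Dim", "Light"), each = nrow(wide))),
  VERT = c(wide$VERTDIM, wide$VERTLIGHT)
)
long

# The per-spider differences are easiest to form in the wide layout
diffs <- wide$VERTDIM - wide$VERTLIGHT
diffs
```

The wide layout keeps each spider's two measurements on one row, which is why it is convenient for paired comparisons.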

What is an appropriate statistical test for testing a hypothesis about the difference in dimensions of webs spun in light versus dim conditions? Explain why.

The actual H₀ is that the mean of the differences between the pairs (light and dim for each spider) equals zero. Use a paired t-test to test the H₀ that the mean of the differences in vertical diameter and, separately, in horizontal diameter of the web between the pairs (light and dim for each spider) equals zero.

t.test(elgar$VERTDIM,elgar$VERTLIGHT, paired=T)

	Paired t-test

data:  elgar$VERTDIM and elgar$VERTLIGHT
t = 0.96545, df = 16, p-value = 0.3487
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -24.61885  65.79532
sample estimates:
mean of the differences 
               20.58824

t.test(elgar$HORIZDIM,elgar$HORIZLIGHT, paired=T)

	Paired t-test

data:  elgar$HORIZDIM and elgar$HORIZLIGHT
t = 2.1482, df = 16, p-value = 0.04735
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
  0.6085725 91.7443687
sample estimates:
mean of the differences 
               46.17647

Write the results out as though you were writing a research paper/thesis. For example (select the phrase that applies and fill in gaps with your results):
The mean vertical diameter of spider webs in dim conditions was (choose correct option)
(t = , df = , P = )
the vertical dimensions in light conditions.
The mean horizontal diameter of spider webs in dim conditions was (choose correct option)
(t = , df = , P = )
the horizontal dimensions in light conditions.

Non-parametric tests

We will now revisit the data set of Furness & Bryant (1996) that was used above to investigate the effects of sex on the metabolic rates of breeding northern fulmars (Fulmarus glacialis). Furness & Bryant (1996) also recorded the body mass of the eight male and six female fulmars they captured.

Since the male and female fulmars were all independent of one another, a t-test would be appropriate to test the null hypothesis of no difference in mean body weight of male and female fulmars.

Are the assumptions underlying this test met? (Y or N) Hint: check the relative sizes of the two sample variances and the distribution of body weight for each sex.

boxplot(BODYMASS~SEX, data=furness)

tapply(furness$BODYMASS,furness$SEX,mean)

Female   Male 
643.00 839.75

tapply(furness$BODYMASS,furness$SEX,var)

  Female     Male 
 166.000 6214.786

When the distributional assumptions are violated, parametric tests are unreliable. Under these circumstances, non-parametric tests can be very useful.
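To put the size of the variance difference in perspective, the ratio of the two sample variances reported above can be computed directly. This is simple arithmetic on the printed values rather than a formal test.

```r
# Sample variances of body mass reported by tapply() above
var.female <- 166.000
var.male <- 6214.786

# Ratio of the larger to the smaller variance
var.ratio <- var.male / var.female
var.ratio
```

A ratio of this size is far beyond the factor of about 2.5 mentioned earlier as the range in which a separate-variance t-test is suggested.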

The Wilcoxon-Mann-Whitney test is described as a non-parametric test for comparing two groups.
1. What null hypothesis does this test actually evaluate?
2. What are the underlying assumptions of a Wilcoxon-Mann-Whitney test?
If the assumptions are met, test the null hypothesis of no difference in body weight between male and female fulmars using a Wilcoxon test. Based on this outcome, what are your conclusions?
1. Statistical:
2. Biological (include trend):

wilcox.test(BODYMASS~SEX, data=furness)

	Wilcoxon rank sum test with continuity correction

data:  BODYMASS by SEX
W = 0, p-value = 0.002309
alternative hypothesis: true location shift is not equal to 0
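The W statistic is built from ranks rather than from the raw values, which is why the test makes weaker distributional assumptions. The sketch below computes W by hand for a small set of made-up body masses (not the fulmar measurements) in which every female is lighter than every male, and compares it with the value returned by wilcox.test().

```r
# Made-up body masses (g) for illustration; not the fulmar measurements
female <- c(618, 635, 640, 650, 655, 660)
male <- c(765, 780, 788, 790, 800, 850, 870, 875)

# Rank all observations together, then sum the ranks of the first group
r <- rank(c(female, male))
n.f <- length(female)
# Subtract the smallest rank sum the first group could possibly have
W <- sum(r[seq_len(n.f)]) - n.f * (n.f + 1) / 2
W

# Compare with the statistic reported by wilcox.test()
wt <- wilcox.test(female, male)
unname(wt$statistic)
```

W = 0 arises when every observation in the first group is smaller than every observation in the second, which is also what the fulmar output above indicates for females versus males.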

Construct a bar graph showing the mean mass of male and female fulmars and an indication of the precision of the means with error bars.

##use ggplot to calculate bootstrap confidence intervals and means
##(mean_cl_boot requires the Hmisc package to be installed)
library(ggplot2)
ggplot(furness, aes(y=BODYMASS, x=SEX)) +
  geom_point(data=furness, aes(y=BODYMASS, x=SEX), color='grey')+
  geom_pointrange(stat='summary', fun.data='mean_cl_boot')+
  scale_y_continuous('Body mass (g)')+
  scale_x_discrete('')+
  theme_classic()+theme(axis.title.y=element_text(vjust=2, size=rel(1.25), face='bold'))

# set the size of the figure margins
opar<-par(mar=c(4,5,0,0))
# calculate the means
means <- tapply(furness$BODYMASS, furness$SEX, mean)
# calculate the standard deviation
sds <- tapply(furness$BODYMASS, furness$SEX, sd)
# calculate the lengths
ns <- tapply(furness$BODYMASS, furness$SEX, length)
# calculate the standard errors
ses <- sds/sqrt(ns)
#calculate the data limits
lim <- max(means+ses)
# plot the bars
xs <- barplot(means,beside=T, axes=FALSE, ann=FALSE, ylim=c(0,lim))
# plot the error bars
arrows(xs,means-ses,xs,means+ses, len=0.1, ang=90, code=3)
# generate the axes
mtext("Sex",1, line=2.5, cex=2)
axis(2,las=1)
mtext("Body mass (g)",2,line=3.5, cex=2)
box(bty="l")

Sample number	Sample mean
1	12.1
2	12.7
3	12.5
Mean of sample means	12.433
SD of sample means	0.306