Workshop 11.4a - Logistic regression

23 April 2011

Basic χ² references

Logan (2010) - Chpt 16-17
Quinn & Keough (2002) - Chpt 13-14

Logistic regression

Polis et al. (1998) were intested in modelling the presence/absence of lizards (Uta sp.) against the perimeter to area ratio of 19 islands in the Gulf of California.

Download Polis data set

Format of polis.csv data file

ISLAND	RATIO	PA
Bota	15.41	1
Cabeza	5.63	1
Cerraja	25.92	1
Coronadito	15.17	0
..	..	..

ISLAND	Categorical listing of the name of the 19 islands used - variable not used in analysis.
RATIO	Ratio of perimeter to area of the island.
PA	Presence (1) or absence (0) of Uta lizards on island.

Open the polis data file (HINT).

Show code

> polis <- read.table("../downloads/data/polis.csv", header = T, sep = ",", 
+     strip.white = T)
> head(polis)

      ISLAND RATIO PA
1       Bota 15.41  1
2     Cabeza  5.63  1
3    Cerraja 25.92  1
4 Coronadito 15.17  0
5     Flecha 13.04  1
6   Gemelose 18.85  0

Presence absence data are clearly binary as an observation can only be in 1 of two states (1 or 0, present or absent). Modelling the linear component (systematic) against a normal stochastic component (that is expecting the residuals to follow a normal distribution) is clearly inappropriate. Instead, we could use a binary distribution with either a log link or probit link. For this example, we will use the log link and thus model the residuals on the log odds scale. In a later example, we will use the probit link.

What is the null hypothesis of interest?

Prior to fitting the model, we need to also be sure of the structure of the systematic linear component. For example, is the relationship linear on the log odds scale?

Show code

> plot(PA ~ RATIO, data = polis)
> with(polis, lines(lowess(PA ~ RATIO)))
> 
> #check by fitting cubic spline
> library(splines)
> polis.glmS <- glm(PA ~ ns(RATIO), data = polis, family = "binomial")
> summary(polis.glmS)

Call:
glm(formula = PA ~ ns(RATIO), family = "binomial", data = polis)

Deviance Residuals: 
   Min      1Q  Median      3Q     Max  
-1.607  -0.638   0.237   0.433   2.099  

Coefficients:
            Estimate Std. Error z value Pr(>|z|)  
(Intercept)     3.56       1.68    2.12    0.034 *
ns(RATIO)     -17.24       7.89   -2.18    0.029 *
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 26.287  on 18  degrees of freedom
Residual deviance: 14.221  on 17  degrees of freedom
AIC: 18.22

Number of Fisher Scoring iterations: 6

> xs <- with(polis, seq(min(RATIO), max(RATIO), l = 100))
> ys <- predict(polis.glmS, newdata = data.frame(RATIO = xs), type = "response")
> lines(xs, ys, col = "red")

> 
> # The trend is more or less linear. If anything it is slightly sigmoidal
> # yet so long as it is linear within the bulk of the data, then linearity
> #   is fine.

Fitting the general linear model with a binomial error distribution (logit linkage)
(HINT).
Show code
> polis.glm <- glm(PA ~ RATIO, family = binomial, data = polis)
1. Check the model for lack of fit via:
  - Pearson Χ²
    Show code
    > polis.resid <- sum(resid(polis.glm, type = "pearson")^2) > 1 - pchisq(polis.resid, polis.glm$df.resid)
    [1] 0.5715
    > #No evidence of a lack of fit
  - Deviance (G²)
    Show code
    > 1 - pchisq(polis.glm$deviance, polis.glm$df.resid)
    [1] 0.6514
    > #No evidence of a lack of fit
2. Estimate the dispersion parameter to evaluate over dispersion in the fitted model
  - Via Pearson residuals
    Show code
    > polis.resid/polis.glm$df.resid
    [1] 0.9019
    > #No evidence of over dispersion
  - Via deviance
    Show code
    > polis.glm$deviance/polis.glm$df.resid
    [1] 0.8365
    > #No evidence of over dispersion
3. Compare the expected number of zeros to the actual number of zeros to determine whether the model might be zero inflated.
  - Calculate the proportion of zeros in the data
    Show code
    > polis.tab <- table(polis$PA == 0) > polis.tab/sum(polis.tab)
    FALSE TRUE 0.5263 0.4737
  - Assuming a probability of 0.5 for each trial, you would expect roughly 50:50 (50%) zeros and ones. So the data are not zero inflated.

As the diagnostics do not indicate any issues with the data or the fitted model, identify and interpret the following (HINT);

Show code

> polis.glm <- glm(PA ~ RATIO, family = binomial, data = polis)
> summary(polis.glm)

Call:
glm(formula = PA ~ RATIO, family = binomial, data = polis)

Deviance Residuals: 
   Min      1Q  Median      3Q     Max  
-1.607  -0.638   0.237   0.433   2.099  

Coefficients:
            Estimate Std. Error z value Pr(>|z|)  
(Intercept)    3.606      1.695    2.13    0.033 *
RATIO         -0.220      0.101   -2.18    0.029 *
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 26.287  on 18  degrees of freedom
Residual deviance: 14.221  on 17  degrees of freedom
AIC: 18.22

Number of Fisher Scoring iterations: 6

sample constant (β₀)
sample slope (β₁)
Wald statistic (z value) for main H₀
P-value for main H₀

r² value (HINT)

Show code

> 1 - (polis.glm$dev/polis.glm$null)

[1] 0.459

Another way ot test the fit of the model, and thus test the H₀ that β₁ = 0, is to compare the fit of the full model to the reduce model via ANOVA. Perform this ANOVA (HINT) and identify the following

Show code

> anova(polis.glm, test = "Chisq")

Analysis of Deviance Table

Model: binomial, link: logit

Response: PA

Terms added sequentially (first to last)


      Df Deviance Resid. Df Resid. Dev Pr(>Chi)    
NULL                     18       26.3             
RATIO  1     12.1        17       14.2  0.00051 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

G² statistic
df
P value

Construct a scatterplot of the presence/absence of Uta lizards against perimeter to area ratio for the 19 islands (HINT). Add to this plot, the predicted probability of occurances from the logistic regression (as well as 66% confidence interval - standard error). (HINT)

Show code

> par(mar = c(4, 5, 0, 0))
> plot(PA ~ RATIO, data = polis, type = "n", ann = F, axes = F)
> points(PA ~ RATIO, data = polis, pch = 16)
> xs <- seq(0, 70, l = 1000)
> ys <- predict(polis.glm, newdata = data.frame(RATIO = xs), type = "response", 
+     se = T)
> points(ys$fit ~ xs, col = "black", type = "l")
> lines(ys$fit - 1 * ys$se.fit ~ xs, col = "black", type = "l", lty = 2)
> lines(ys$fit + 1 * ys$se.fit ~ xs, col = "black", type = "l", lty = 2)
> axis(1)
> mtext("Island perimeter to area ratio", 1, cex = 1.5, line = 3)
> axis(2, las = 2)
> mtext(expression(paste(italic(Uta), " lizard presence/absence")), 
+     2, cex = 1.5, line = 3)
> box(bty = "l")

Calculate the LD50 (in this case, the perimeter to area ratio with a predicted probability of 0.5) from the fitted model (HINT).
Show code
> -polis.glm$coef[1]/polis.glm$coef[2]
(Intercept) 16.42
Islands above this ratio are not predicted to contain lizards and islands above this ratio are expected to have lizards.

Examine the odds ratio for the occurrence of Uta lizards.

Show code

> library(biology)
> odds.ratio(polis.glm)

      Odds ratio Lower 95% CI Upper 95% CI
RATIO     0.8029       0.6593       0.9777

> #the chances of Uta lizards being present on an island declines by
> #   approximately 20% (a change factor of 0.803) for every unit increase
> #   in perimeter to area ratio.

Logistic regression - grouped binary data

Bliss (1935) examined the toxic effects of gaseous carbon disulphide on the survival of flour beetles. He presented a number of datasets and associated analyses as a means to demonstrate models for calculating dosage curves.

In the first dataset, rather than indicating whether any individual beetle was dead or alive, Bliss (1935) recorded the number of dead and alive beetles grouped by gaseous carbon disulphide concentration.

Download the Bliss1 data set

Format of bliss1.csv data file

DEAD	ALIVE	CONC
2	28	0
8	22	1
15	15	2
23	7	3
27	3	4

DEAD	The number of dead flour beetles after 5 hours exposure to carbon disulphide gas.
ALIVE	The number of dead flour beetles after 5 hours exposure to carbon disulphide gas.
CONC	The concentration of gaseus carbon disulphide (in log₂ units).

Open the bliss1 data file (HINT).

Show code

> bliss <- read.table("../downloads/data/bliss1.csv", header = T, sep = ",", 
+     strip.white = T)
> head(bliss)

  dead alive conc
1    2    28    0
2    8    22    1
3   15    15    2
4   23     7    3
5   27     3    4

This is another example of clearly binary data (albeit in a different form). There are three obvious link function worth trying

log-link (the default)

Show code

> bliss.glm <- glm(cbind(dead, alive) ~ conc, data = bliss, family = "binomial")
> summary(bliss.glm)

Call:
glm(formula = cbind(dead, alive) ~ conc, family = "binomial", 
    data = bliss)

Deviance Residuals: 
      1        2        3        4        5  
-0.4510   0.3597   0.0000   0.0643  -0.2045  

Coefficients:
            Estimate Std. Error z value Pr(>|z|)    
(Intercept)   -2.324      0.418   -5.56  2.7e-08 ***
conc           1.162      0.181    6.40  1.5e-10 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 64.76327  on 4  degrees of freedom
Residual deviance:  0.37875  on 3  degrees of freedom
AIC: 20.85

Number of Fisher Scoring iterations: 4

> par(mfrow = c(2, 2))
> plot(bliss.glm)

probit

Show code

> bliss.glmP <- glm(cbind(dead, alive) ~ conc, data = bliss, family = binomial(link = "probit"))
> summary(bliss.glmP)

Call:
glm(formula = cbind(dead, alive) ~ conc, family = binomial(link = "probit"), 
    data = bliss)

Deviance Residuals: 
      1        2        3        4        5  
-0.3586   0.2749   0.0189   0.1823  -0.2754  

Coefficients:
            Estimate Std. Error z value Pr(>|z|)    
(Intercept)  -1.3771     0.2278   -6.05  1.5e-09 ***
conc          0.6864     0.0968    7.09  1.3e-12 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 64.76327  on 4  degrees of freedom
Residual deviance:  0.31367  on 3  degrees of freedom
AIC: 20.79

Number of Fisher Scoring iterations: 4

> par(mfrow = c(2, 2))
> plot(bliss.glmP)

complimentary log-log

Show code

> bliss.glmC <- glm(cbind(dead, alive) ~ conc, data = bliss, family = binomial(link = "cloglog"))
> summary(bliss.glmC)

Call:
glm(formula = cbind(dead, alive) ~ conc, family = binomial(link = "cloglog"), 
    data = bliss)

Deviance Residuals: 
     1       2       3       4       5  
-1.083   0.213   0.498   0.559  -0.672  

Coefficients:
            Estimate Std. Error z value Pr(>|z|)    
(Intercept)   -1.994      0.313   -6.38  1.8e-10 ***
conc           0.747      0.109    6.82  8.9e-12 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 64.7633  on 4  degrees of freedom
Residual deviance:  2.2305  on 3  degrees of freedom
AIC: 22.71

Number of Fisher Scoring iterations: 5

> par(mfrow = c(2, 2))
> plot(bliss.glmC)

On the basis of AIC as well as the residuals (which for the complimentary log-log model imply non-linearity), a log link would seem appropriate.

To confirm linearity on the log odds scale we can plot the empirical logit against concentration

Show code

> #examine linearity on the log odds scale
> # we add 1/2 (a small value) incase they are all dead or all alive.
> #This is called an empirical logit
> plot(bliss$conc, log((bliss$dead + 1/2)/(bliss$alive + 1/2)), xlab = "concentration", 
+     ylab = "logit")
> abline(coef(bliss.glm), col = 2)

Alternatively, we could superimpose a logistic curve onto the scatterplot of proportion dead against concentration

Show code

> plot(bliss$conc, bliss$dead/(bliss$dead + bliss$alive), xlab = "Concentration", 
+     ylab = "probability")
> #create an inverse logit function
> inv.logit <- function(x) exp(coef(bliss.glm)[1] + coef(bliss.glm)[2] * 
+     x)/(1 + exp(coef(bliss.glm)[1] + coef(bliss.glm)[2] * x))
> #plot this function as a curve
> curve(inv.logit, add = T, col = 2)

> #seems to fit well

Given that we have elected to use the log-link model, calculate the odds-ratio
Show code
> exp(bliss.glm$coef[2])
conc 3.196

Lets also include a summary figure

Show code

> plot(dead/(dead + alive) ~ conc, data = bliss, type = "n", ann = F, 
+     axes = F)
> points(dead/(dead + alive) ~ conc, data = bliss, pch = 16)
> xs <- seq(0, 4, l = 1000)
> ys <- predict(bliss.glm, newdata = data.frame(conc = xs), type = "response", 
+     se = T)
> points(ys$fit ~ xs, col = "black", type = "l")
> lines(ys$fit - 1 * ys$se.fit ~ xs, col = "black", type = "l", lty = 2)
> lines(ys$fit + 1 * ys$se.fit ~ xs, col = "black", type = "l", lty = 2)
> axis(1)
> mtext("Carbon disulphide concentration", 1, cex = 1.5, line = 3)
> axis(2, las = 2)
> mtext("Probability of death", 2, cex = 1.5, line = 3)
> box(bty = "l")

Logistic (cloglog) regression - grouped binary data

As a further demonstration of dosage curves, Bliss (1935) presented yet another datset on the toxic effects of gaseus carbon sulphide on flour beetle mortality. This time, Bliss (1935) documented the number of individuals exposed and dead following five hours at one of 8 dosage concentrations.

Download the Bliss2 data set

Format of bliss2.csv data file

Dose	Exposed	Mortality
1.6907	59	6
1.7242	60	13
1.7552	62	18
1.7842	56	28
1.8113	63	52
1.8369	59	53
1.8610	62	61
1.8839	60	60

Dose	The concentration of gaseus carbon disulphide.
Exposed	The number of flour beetles exposure to carbon disulphide gas.
Mortality	The number of dead flour beetles after 5 hours exposure to gaseus carbon disulphide.

Open the bliss1 data file (HINT).

Show code

> bliss2 <- read.table("../downloads/data/bliss2.csv", header = T, 
+     sep = ",", strip.white = T)
> head(bliss2)

   Dose Exposed Mortality
1 1.691      59         6
2 1.724      60        13
3 1.755      62        18
4 1.784      56        28
5 1.811      63        52
6 1.837      59        53

Start by exploring a range of possible binary link functions
1. logit - logistic (log odds-ratio) model
  \begin{align*} log\left(\frac{p}{1-p}\right) = \beta_0 + \beta_1Dose \end{align*}
  Show code
  > bliss2.glmL <- glm((Mortality/Exposed) ~ Dose, data = bliss2, family = "binomial", + weights = Exposed)
2. probit - probability unit model - $\phi$ is the inverse cumulative distribution function (cdf) for the standard normal distribution
  \begin{align*} \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\alpha+\beta.X} exp\left(-\frac{1}{2}Z^2\right)dZ = \beta_0 + \beta_1Dose\\ \phi = \beta_0 + \beta_1Dose \end{align*}
  Show code
  > bliss2.glmP <- glm((Mortality/Exposed) ~ Dose, data = bliss2, family = binomial(link = "probit"), + weights = Exposed)
3. cloglog - complimentary log-log model
  \begin{align*} log(-log(1-p))= \beta_0 + \beta_1Dose \end{align*}
  Show code
  > bliss2.glmC <- glm((Mortality/Exposed) ~ Dose, data = bliss2, family = binomial(link = "cloglog"), + weights = Exposed)

Evaluate which of the above models fits the data best on the basis of

Plotting the predicted trends

Show code

> #create the base scatterplot
> plot((Mortality/Exposed) ~ Dose, data = bliss2)
> #first plot a lowess smoother in black
> with(bliss2, lines(lowess((Mortality/Exposed) ~ Dose)))
> #create a sequence of new Dose values from which to predict mortality
> xs <- with(bliss2, seq(min(Dose), max(Dose), l = 100))
> 
> # predict and plot mortality from the logit model
> ys <- predict(bliss2.glmL, newdata = data.frame(Dose = xs), type = "response")
> lines(xs, ys, col = "blue")
> 
> # predict and plot mortality from the probit model
> ys <- predict(bliss2.glmP, newdata = data.frame(Dose = xs), type = "response")
> lines(xs, ys, col = "red")
> 
> # predict and plot mortality from the complimentary log-log model
> ys <- predict(bliss2.glmC, newdata = data.frame(Dose = xs), type = "response")
> lines(xs, ys, col = "green")

> 
> # Whilst either the log or probit models are ok (and largely
> #   indistinguishable with respect to fit), the cloglog is clearly a
> #   better fit

AIC

Show code

> #logit model
> AIC(bliss2.glmL)

[1] 15.23

> library(AICcmodavg)
> AICc(bliss2.glmL)

[1] 43.83

> #probit model
> AIC(bliss2.glmP)

[1] 14.12

> AICc(bliss2.glmP)

[1] 42.72

> #complimentary log-log model
> AIC(bliss2.glmC)

[1] 7.446

> AICc(bliss2.glmC)

[1] 36.04

> # Irrespective of whether we adopt AIC or AICc (corrected for small sample
> #   sizes), the cloglog is considered a better fit.

Which is the better link function on this occasion

Explore the parameter estimates for the cloglog model

Show code

> summary(bliss2.glmC)

Call:
glm(formula = (Mortality/Exposed) ~ Dose, family = binomial(link = "cloglog"), 
    data = bliss2, weights = Exposed)

Deviance Residuals: 
    Min       1Q   Median       3Q      Max  
-0.8033  -0.5513   0.0309   0.3832   1.2888  

Coefficients:
            Estimate Std. Error z value Pr(>|z|)    
(Intercept)   -39.57       3.24   -12.2   <2e-16 ***
Dose           22.04       1.80    12.2   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 284.2024  on 7  degrees of freedom
Residual deviance:   3.4464  on 6  degrees of freedom
AIC: 33.64

Number of Fisher Scoring iterations: 4

Logistic regression - polynomial

As part of a Masters thesis on the distribution and habitat suitability of the endangered Corroboree frog, Hunter (2000) recorded the presence and absence of frogs in relation to a range of climatic factors (including mean minimum spring temperature).

Show code

> library(DAAG)
> head(frogs)

  pres.abs northing easting altitude distance NoOfPools NoOfSites avrain meanmin meanmax
2        1      115    1047     1500      500       232         3  155.0   3.567   14.00
3        1      110    1042     1520      250        66         5  157.7   3.467   13.80
4        1      112    1040     1540      250        32         5  159.7   3.400   13.60
5        1      109    1033     1590      250         9         5  165.0   3.200   13.17
6        1      109    1032     1590      250        67         5  165.0   3.200   13.17
7        1      106    1018     1600      500        12         4  167.3   3.133   13.07

Show code

> plot(pres.abs ~ meanmin, data = frogs, ylab = "Presence-absence", 
+     xlab = "Mean minimum Spring temperature")
> lines(lowess(frogs$pres.abs ~ frogs$meanmin), col = 2)

> #suggests possible quadratic - optimum temperature?
> 
> #we can also look at the structure of the model by plotting on a logit
> #   scale
> #after grouping the data first
> #so we need to create categories on the x-axis
> #Obviously, at the tails of a distribution, the counts diminish
> #   enormously.
> #it is recomended that categories at the tails are pooled
> meanmin.cuts <- cut(frogs$meanmin, quantile(frogs$meanmin, seq(0, 
+     1, 0.1)), include.lowest = TRUE)
> yi <- tapply(frogs$pres.abs, meanmin.cuts, sum)
> #calculate the empirical logit
> emp.logit <- log((yi + 1/2)/(table(meanmin.cuts) - yi + 1/2))
> #Get the interval midpoints
> #midpt of interval (a,b) is a+(b-a)/2
> midpt <- quantile(frogs$meanmin, seq(0, 1, 0.1))[1:10] + (quantile(frogs$meanmin, 
+     seq(0, 1, 0.1))[2:11] - quantile(frogs$meanmin, seq(0, 1, 0.1))[1:10])/2
> 
> plot(midpt, emp.logit, xlab = "Mean minimum Spring temperature", 
+     ylab = "empirical logit")
> lines(lowess(emp.logit ~ midpt), col = 2)
> rug(midpt)

> 
> #fit the polynomial model
> frog.glm <- glm(pres.abs ~ meanmin + I(meanmin^2), data = frogs, 
+     family = binomial)
> summary(frog.glm)

Call:
glm(formula = pres.abs ~ meanmin + I(meanmin^2), family = binomial, 
    data = frogs)

Deviance Residuals: 
   Min      1Q  Median      3Q     Max  
-1.366  -1.003  -0.463   1.058   2.341  

Coefficients:
             Estimate Std. Error z value Pr(>|z|)    
(Intercept)   -21.175      5.281   -4.01  6.1e-05 ***
meanmin        11.665      3.190    3.66  0.00026 ***
I(meanmin^2)   -1.574      0.472   -3.34  0.00085 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 279.99  on 211  degrees of freedom
Residual deviance: 241.76  on 209  degrees of freedom
AIC: 247.8

Number of Fisher Scoring iterations: 5

> plot(frogs$meanmin, frogs$pres.abs, xlab = "Mean minimum Spring temperature", 
+     ylab = "presence/absence")
> #add linear model
> #phat<-function(x) exp(coef(out1)[1] + coef(out1)[2]*x)/
> #   (1+exp(coef(out1)[1] + coef(out1)[2]*x))
> #curve(phat, add=T, col=2)
> #add quadratic
> phat2 <- function(x) exp(coef(frog.glm)[1] + coef(frog.glm)[2] * 
+     x + coef(frog.glm)[3] * x^2)/(1 + exp(coef(frog.glm)[1] + coef(frog.glm)[2] * 
+     x + coef(frog.glm)[3] * x^2))
> curve(phat2, add = T, col = 4)
> #alternatively
> points(frogs$meanmin, predict(frog.glm, type = "response"), col = "red")
> #add lowess
> lines(lowess(frogs$pres.abs ~ frogs$meanmin), col = 1, lty = 2)
> legend(2.2, 0.9, c("linear", "quadratic", "lowess"), col = c(2, 4, 
+     1), lty = c(1, 1, 2), bty = "n", cex = 0.9)

Probit regression

Same data as above

Show code

> polis.glmP <- glm(PA ~ RATIO, data = polis, family = binomial(link = probit))
> summary(polis.glmP)

Call:
glm(formula = PA ~ RATIO, family = binomial(link = probit), data = polis)

Deviance Residuals: 
   Min      1Q  Median      3Q     Max  
-1.611  -0.677   0.188   0.415   2.055  

Coefficients:
            Estimate Std. Error z value Pr(>|z|)  
(Intercept)   2.1345     0.8913    2.39    0.017 *
RATIO        -0.1275     0.0522   -2.44    0.015 *
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 26.287  on 18  degrees of freedom
Residual deviance: 14.164  on 17  degrees of freedom
AIC: 18.16

Number of Fisher Scoring iterations: 7

> 
> 
> polis.glm <- glm(PA ~ RATIO, family = binomial, data = polis)
> 
> 
> par(mar = c(4, 5, 0, 0))
> plot(PA ~ RATIO, data = polis, type = "n", ann = F, axes = F)
> points(PA ~ RATIO, data = polis, pch = 16)
> xs <- seq(0, 70, l = 1000)
> ys <- predict(polis.glmP, newdata = data.frame(RATIO = xs), type = "response", 
+     se = T)
> points(ys$fit ~ xs, col = "black", type = "l")
> lines(ys$fit - 1 * ys$se.fit ~ xs, col = "black", type = "l", lty = 2)
> lines(ys$fit + 1 * ys$se.fit ~ xs, col = "black", type = "l", lty = 2)
> 
> ys <- predict(polis.glm, newdata = data.frame(RATIO = xs), type = "response", 
+     se = T)
> points(ys$fit ~ xs, col = "red", type = "l")
> lines(ys$fit - 1 * ys$se.fit ~ xs, col = "red", type = "l", lty = 2)
> lines(ys$fit + 1 * ys$se.fit ~ xs, col = "red", type = "l", lty = 2)
> 
> axis(1)
> mtext("Island perimeter to area ratio", 1, cex = 1.5, line = 3)
> axis(2, las = 2)
> mtext(expression(paste(italic(Uta), " lizard presence/absence")), 
+     2, cex = 1.5, line = 3)
> box(bty = "l")

> 
> #the probability of Uta lizards being present declines by 0.128 (12.7
> #   units of percentage) for every 1 unit increase in perimeter to area
> #   ratio

Workshop 11.4a - Logistic regression

Basic χ2 references

Logistic regression

Logistic regression - grouped binary data

Logistic (cloglog) regression - grouped binary data

Logistic regression - polynomial

Probit regression

Basic χ² references