Tutorial 12.10 - Integrated Nested Laplace Approximation (INLA) for (generalized) linear models
25 Feb 2016
Overview
Tutorial 12.9 introduced the basic INLA framework as well as a fully fleshed out example using a fabricated data set. The current tutorial will focus on linear models, using both fabricated data and real worked examples.
In this tutorial we have an opportunity to explore INLA for multilevel (hierarchical) models. Although performing very simple linear regression with INLA might be seen as complete overkill (it does not really utilize the main features for which INLA is designed), from a pedagogical perspective, starting with simpler models for which robust and well understood solutions exist allows us to gradually build up an understanding of how INLA works. Model complexity can then be introduced progressively, allowing us to explore techniques while building on solid foundations. All data sets will be fabricated from set parameters so that we always have a 'truth' against which to compare outcomes, and as a point of comparison, each data set will also be analysed via Frequentist and Bayesian MCMC approaches.
Simple regression
library(R2jags)
library(INLA)
library(ggplot2)
Let's say we had set up an experiment in which we applied a continuous treatment ($x$) ranging in magnitude from 1 to 16 to a total of 16 sampling units ($n=16$) and then measured a response ($y$) from each unit. As this section is mainly about the generation of artificial data (and not specifically about what to do with the data), understanding the actual details is optional and can be safely skipped. Consequently, I have folded (toggled) this section away.
- the sample size = 16
- the continuous $x$ variable ranging from 1 to 16
- when the value of $x$ is 0, $y$ is expected to be 40 ($\beta_0=40$)
- a 1 unit increase in $x$ is associated with a 1.5 unit decline in $y$ ($\beta_1=-1.5$)
- the residuals are drawn from a normal distribution with a mean of 0 and a standard deviation of 5 ($\sigma^2=25$)
set.seed(1)
n <- 16
a <- 40      #intercept
b <- -1.5    #slope
sigma2 <- 25 #residual variance (sd=5)
x <- 1:n     #values of the x covariate
eps <- rnorm(n, mean = 0, sd = sqrt(sigma2)) #residuals
y <- a + b * x + eps #response variable
#OR y <- (model.matrix(~x) %*% c(a,b)) + eps
data <- data.frame(y=round(y,1), x) #dataset
head(data) #print out the first six rows of the data set
     y x
1 35.4 1
2 37.9 2
3 31.3 3
4 42.0 4
5 34.1 5
6 26.9 6
ggplot(data, aes(y=y, x=x)) + geom_point() + theme_classic()
$$ y_i = \alpha + \beta x_i + \varepsilon_i \hspace{4em}\varepsilon_i\sim{}\mathcal{N}(0, \sigma^2) $$
summary(lm(y~x, data=data))
Call:
lm(formula = y ~ x, data = data)

Residuals:
     Min       1Q   Median       3Q      Max 
-11.3694  -3.5053   0.6239   2.6522   7.3909 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)  40.7450     2.6718  15.250 4.09e-10 ***
x            -1.5340     0.2763  -5.552 7.13e-05 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 5.095 on 14 degrees of freedom
Multiple R-squared:  0.6876,  Adjusted R-squared:  0.6653
F-statistic: 30.82 on 1 and 14 DF,  p-value: 7.134e-05
confint(lm(y~x, data=data))
                2.5 %     97.5 %
(Intercept) 35.014624 46.4753756
x           -2.126592 -0.9413493
modelString="
model {
   #Likelihood
   for (i in 1:n) {
      y[i]~dnorm(mu[i],tau)
      mu[i] <- alpha+beta*x[i]
   }
   #Priors
   alpha ~ dnorm(0,1.0E-6)
   beta ~ dnorm(0,1.0E-6)
   tau <- 1 / (sigma * sigma)
   sigma~dgamma(0.1,0.001)
   #tau <- exp(log.tau)
   #log.tau ~ dgamma(1,0.01)
}
"
data.list <- with(data, list(y=y, x=x, n=nrow(data)))
data.jags <- jags(data=data.list, inits=NULL,
                  parameters.to.save=c('alpha','beta','sigma'),
                  model.file=textConnection(modelString),
                  n.chains=3, n.iter=100000, n.burnin=20000, n.thin=100)
Compiling model graph
   Resolving undeclared variables
   Allocating nodes
Graph Size: 75

Initializing model
print(data.jags)
Inference for Bugs model at "5", fit using jags,
 3 chains, each with 1e+05 iterations (first 20000 discarded), n.thin = 100
 n.sims = 2400 iterations saved
         mu.vect sd.vect   2.5%    25%    50%    75%   97.5%  Rhat n.eff
alpha     40.813   2.837 35.375 38.950 40.766 42.656  46.533 1.001  2400
beta      -1.540   0.291 -2.119 -1.729 -1.536 -1.344  -0.973 1.001  2400
sigma      5.383   1.089  3.739  4.600  5.195  6.001   7.939 1.000  2400
deviance  98.660   2.585 95.609 96.706 98.040 99.958 105.200 1.001  2300

For each parameter, n.eff is a crude measure of effective sample size,
and Rhat is the potential scale reduction factor (at convergence, Rhat=1).

DIC info (using the rule, pD = var(deviance)/2)
pD = 3.3 and DIC = 102.0
DIC is an estimate of expected predictive error (lower deviance is better).
data.inla <- inla(y~x, data=data, control.compute=list(dic=TRUE, cpo=TRUE, waic=TRUE))
The inla() function returns an object of class inla for which two methods are available (summary and plot). For an inla fitted model, the summary method returns the fixed and random effects as well as a number of information criteria and limited fit diagnostics.
summary(data.inla)
Call: c("inla(formula = y ~ x, data = data, control.compute = list(dic = TRUE, ", " cpo = TRUE, waic = TRUE))") Time used: Pre-processing Running inla Post-processing Total 0.1013 0.0632 0.0360 0.2006 Fixed effects: mean sd 0.025quant 0.5quant 0.975quant mode kld (Intercept) 40.7441 2.5688 35.6573 40.7440 45.8219 40.7442 0 x -1.5339 0.2657 -2.0599 -1.5339 -1.0087 -1.5339 0 The model has no random effects Model hyperparameters: mean sd 0.025quant 0.5quant 0.975quant mode Precision for the Gaussian observations 0.0438 0.0155 0.0199 0.0418 0.0798 0.0378 Expected number of effective parameters(std dev): 2.00(0.00) Number of equivalent replicates : 8.00 Deviance Information Criterion (DIC) ...: 100.64 Effective number of parameters .........: 2.634 Watanabe-Akaike information criterion (WAIC) ...: 101.40 Effective number of parameters .................: 2.942 Marginal log-Likelihood: -64.55 CPO and PIT are computed Posterior marginals for linear predictor and fitted values computed
Let's focus on the different components of this output. The 'Fixed effects' section presents summaries of the marginal posterior distributions of the intercept and slope. These results are very similar to those yielded by the Frequentist and Bayesian MCMC approaches.
               mean     sd 0.025quant 0.5quant 0.975quant    mode kld
(Intercept) 40.7441 2.5688    35.6573  40.7440    45.8219 40.7442   0
x           -1.5339 0.2657    -2.0599  -1.5339    -1.0087 -1.5339   0
                                          mean     sd 0.025quant 0.5quant 0.975quant   mode
Precision for the Gaussian observations 0.0438 0.0155     0.0199   0.0418     0.0798 0.0378
Expected number of effective parameters(std dev): 2.00(0.00)
Number of equivalent replicates : 8.00
The number of equivalent replicates is the number of replicates (sample size) per effective parameter. In this case, there are 16 observations and two effective parameters, so there are $16/2=8$ equivalent replicates. Low values (relative to the overall sample size) are indicative of a poor fit.
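These quantities are also stored directly in the fitted object; a minimal sketch (using the neffp element, which is described later in this tutorial):

data.inla$neffp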
Out-of-sample or leave-one-out predictive performance
Arguably, the best way to assess the fit of a model is via an estimate of the model's predictive accuracy, the gold standard of which is out-of-sample estimation via cross-validation (in which each observed response is compared to a prediction derived from a model trained on the data without that focal response value). However, as this approach requires repeated model fits, it is not overly practical for large complex models, MCMC simulations or very sparse data. Traditional alternatives to cross-validation include Akaike's Information Criterion (AIC) and, for Bayesian analyses, the Deviance Information Criterion (DIC).
Deviance Information Criterion (DIC) ...: 100.64
Effective number of parameters .........: 2.634
Watanabe-Akaike information criterion (WAIC) ...: 101.40
Effective number of parameters .................: 2.942
The effective number of parameters (pD) is the posterior mean deviance minus the deviance measured at the posterior mean of the parameters. Both the DIC and pD are similar to those reported by JAGS.
The Watanabe-Akaike Information Criterion, or Widely Applicable Information Criterion (WAIC), is, by contrast to AIC (and DIC), a more fully Bayesian approach to estimating out-of-sample predictive accuracy, based on the log pointwise posterior predictive density.
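Since dic=TRUE and waic=TRUE were requested via control.compute, both criteria (and their effective numbers of parameters) can also be extracted directly from the fitted object. A minimal sketch; the element names below assume a reasonably recent version of INLA:

data.inla$dic$dic     #deviance information criterion
data.inla$dic$p.eff   #effective number of parameters (DIC)
data.inla$waic$waic   #Watanabe-Akaike information criterion
data.inla$waic$p.eff  #effective number of parameters (WAIC)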
Two additional out-of-sample estimates (measures of fit) supported by INLA include:
- Conditional Predictive Ordinate (CPO) is the density of the observed value of $y_i$ within the out-of-sample ($y_{-i}$) posterior predictive distribution.
A small CPO value associated with an observation suggests that this observation is unlikely (or surprising) in light of the model, priors and other
data in the model. In addition, the sum of the CPO values (or alternatively, the negative of the mean natural logarithm of the CPO values) is a measure of fit.
data.inla$cpo$cpo
 [1] 0.047749222 0.074109301 0.041001671 0.018540996 0.076175384 0.046941565 0.073009041 0.062423313
 [9] 0.069163109 0.072927701 0.021584385 0.073660042 0.057706132 0.001797082 0.033613593 0.071977093
sum(data.inla$cpo$cpo)
[1] 0.8423796
-mean(log(data.inla$cpo$cpo))
[1] 3.173897
data.inla$cpo$failure
[1] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
- Probability Integral Transforms (PIT) provides a version of CPO that is calibrated to the level of the Gaussian field so that it is clearer
whether or not any of the values are 'small' (all values must be between 0 and 1). A histogram of PIT values that does not look approximately uniform (flat) indicates a lack of model fit.
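A quick (and rough) way to inspect the PIT values is a simple histogram of the pit element; an approximately uniform shape is consistent with an adequate fit:

hist(data.inla$cpo$pit, breaks=10, main='', xlab='PIT')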
plot(data.inla, plot.fixed.effects=FALSE, plot.lincomb=FALSE, plot.random.effects=FALSE, plot.hyperparameters=FALSE, plot.predictor=FALSE, plot.q=FALSE, plot.cpo=TRUE, single=FALSE)
The influence of priors
When fitting the INLA model above, we did not specify priors. Consequently, the default priors were employed. To review what these priors are, we can use the inla.show.hyperspec() function - which returns a list containing information about all the hyper-priors used in a model.
inla.show.hyperspec(data.inla)
List of 3 $ predictor:List of 1 $ hyper:List of 1 $ theta:List of 8 $ name : atomic [1:1] log precision $ short.name: atomic [1:1] prec $ initial : atomic [1:1] 11 $ fixed : atomic [1:1] TRUE $ prior : atomic [1:1] loggamma $ param : atomic [1:2] 1e+00 1e-05 $ family :List of 1 $ :List of 3 $ label: chr "gaussian" $ hyper:List of 1 $ theta:List of 8 $ name : atomic [1:1] log precision $ short.name: atomic [1:1] prec $ initial : atomic [1:1] 4 $ fixed : atomic [1:1] FALSE $ prior : atomic [1:1] loggamma $ param : atomic [1:2] 1e+00 5e-05 $ link :List of 1 $ hyper: list() $ fixed :List of 2 $ :List of 3 $ label : chr "(Intercept)" $ prior.mean: num 0 $ prior.prec: num 0 $ :List of 3 $ label : chr "x" $ prior.mean: num 0 $ prior.prec: num 0.001
In order to derive the specification of weakly informative priors for variance components (priors that allow the data to dictate the parameter estimates whilst constraining the range of the marginal distribution within sensible bounds), we can consider the range over which the marginal distribution of a variance component would be sensible. For the range [-R,R], the equivalent gamma parameters are: $$ \begin{align} a &= \frac{d}{2}\\ b &= \frac{R^2d}{2\left(t^d_{1-(1-q)/2}\right)^2} \end{align} $$ where $d$ is the degrees of freedom (for a Cauchy marginal, $d=1$) and $t^d_{1-(1-q)/2}$ is the $1-(1-q)/2$ quantile of a Student $t$ distribution with $d$ degrees of freedom. Hence, if we considered the variance likely to be in the range of [0,10], we could define a loggamma prior of: $$ \begin{align} a &= \frac{1}{2} = 0.5\\ b &= \frac{10^2}{2\times 161.447} = 0.31\\ \tau &\sim{} \log\Gamma(0.5,0.31) \end{align} $$
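As a quick sanity check of the arithmetic above (assuming $q=0.95$, so that $t^1_{0.975}=12.706$ and $161.447 \approx 12.706^2$):

R <- 10  #upper bound of the plausible variance range
d <- 1   #degrees of freedom (Cauchy marginal)
q <- 0.95
(a <- d/2)
(b <- (R^2 * d)/(2 * qt(1 - (1 - q)/2, df=d)^2)) #approximately 0.31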
The default priors for the fixed effects (intercept and slope) are Normal distributions with a mean of 0. The intercept prior has a precision of 0 (effectively flat), whilst the slope prior has a precision of 0.001, which implies a standard deviation of approximately 31 (since $\sigma = \frac{1}{\sqrt{\tau}}$). As a general rule, three standard deviations envelope most of a distribution, and thus this prior defines a distribution whose density is almost entirely within the range [-93,93]. Whilst a range this wide might be reasonable for a parameter like the intercept, it is arguably very wide for the slope given the range of the predictor variable.
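The arithmetic behind those figures (base R only):

(sd.slope <- 1/sqrt(0.001)) #prior standard deviation implied by a precision of 0.001 (~31.6)
3 * sd.slope                #approximate half-width of a +/- 3 standard deviation envelope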
In order to generate realistic informative Gaussian priors (for the purpose of constraining the posterior to a logical range) for fixed parameters, the following formulae are useful: $$ \begin{align} \mu &= \frac{z_2\theta_1 - z_1\theta_2}{z_2-z_1}\\ \sigma &= \frac{\theta_2 - \theta_1}{z_2-z_1} \end{align} $$ where $\theta_1$ and $\theta_2$ are two quantiles on the response scale and $z_1$ and $z_2$ are the corresponding quantiles on the standard normal scale. Hence, if we considered that the slope is likely to lie within [-10,10] (that is, $\theta_1=0$ at the median and $\theta_2=10$ at the 97.5% quantile), we could specify a Normal prior with a mean of ($\mu=\frac{qnorm(0.975,0,1)\times 0 - qnorm(0.5,0,1)\times 10}{qnorm(0.975,0,1)-qnorm(0.5,0,1)} = 0$) and a standard deviation of ($\sigma=\frac{10 - 0}{qnorm(0.975,0,1)-qnorm(0.5,0,1)} = 5.102$). In INLA (which defines priors in terms of precision rather than standard deviation), the associated prior would be $\beta \sim{} \mathcal{N}(0, 0.196)$.
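A sketch of that calculation in R, treating $\theta_1=0$ as the median and $\theta_2=10$ as the 97.5% quantile (as assumed above):

theta1 <- 0;  z1 <- qnorm(0.5, 0, 1)
theta2 <- 10; z2 <- qnorm(0.975, 0, 1)
(mu <- (z2*theta1 - z1*theta2)/(z2 - z1))   #0
(sigma <- (theta2 - theta1)/(z2 - z1))      #5.102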
Let's plot the prior and posterior for the slope.
#the default slope prior precision (0.001) corresponds to a standard deviation of 1/sqrt(0.001);
#here that standard deviation is scaled by the 97.5% standard normal quantile for plotting
(SD <- (1/sqrt(0.001) * qnorm(0.975, 0, 1)))
[1] 61.9795
prior <- data.frame(x=seq(-3*SD, 3*SD, len=1000))
prior$density <- dnorm(prior$x, 0, SD)
post <- data.frame(data.inla$marginals.fixed[[2]])
head(post)
          x            y
1 -4.190463 1.765936e-16
2 -3.659143 9.807886e-11
3 -3.127823 1.422271e-06
4 -2.862163 6.240289e-05
5 -2.596503 1.483329e-03
6 -2.330843 2.134545e-02
ggplot(prior, aes(y=density, x=x)) + geom_line(aes(color='Prior')) + geom_line(data=post, aes(y=y, x=x, color='Posterior')) + scale_color_discrete('')+ theme_classic()+theme(legend.position=c(1,1), legend.justification=c(1,1))
s <- inla.contrib.sd(data.inla, nsamples = 1000)
s$hyper
                                     mean        sd     2.5%    97.5%
sd for the Gaussian observations 4.991154 0.9202552 3.527553 7.121831
data.inla <- inla(y~x, data=data,
                  control.compute=list(dic=TRUE, cpo=TRUE, waic=TRUE),
                  control.fixed=list(mean=0, prec=0.196, mean.intercept=0, prec.intercept=0.001),
                  control.family=list(hyper=list(prec=list(prior='loggamma', param=c(0.1,0.1)))))
summary(data.inla)
Call: c("inla(formula = y ~ x, data = data, control.compute = list(dic = TRUE, ", " cpo = TRUE, waic = TRUE), control.family = list(hyper = list(prec = list(prior = \"loggamma\", ", " param = c(0.1, 0.1)))), control.fixed = list(mean = 0, prec = 0.196, ", " mean.intercept = 0, prec.intercept = 0.001))") Time used: Pre-processing Running inla Post-processing Total 0.0911 0.0326 0.0285 0.1523 Fixed effects: mean sd 0.025quant 0.5quant 0.975quant mode kld (Intercept) 40.2459 2.7140 34.8045 40.2675 45.5553 40.3057 0 x -1.4833 0.2804 -2.0322 -1.4855 -0.9215 -1.4894 0 The model has no random effects Model hyperparameters: mean sd 0.025quant 0.5quant 0.975quant mode Precision for the Gaussian observations 0.0388 0.0146 0.0167 0.0368 0.073 0.0328 Expected number of effective parameters(std dev): 1.977(0.0069) Number of equivalent replicates : 8.092 Deviance Information Criterion (DIC) ...: 100.79 Effective number of parameters .........: 2.636 Watanabe-Akaike information criterion (WAIC) ...: 101.14 Effective number of parameters .................: 2.604 Marginal log-Likelihood: -56.99 CPO and PIT are computed Posterior marginals for linear predictor and fitted values computed
data.inla <- inla(y~x, data=data,
                  control.compute=list(dic=TRUE, cpo=TRUE, waic=TRUE),
                  control.fixed=list(mean=0, prec=0.0196, mean.intercept=0, prec.intercept=0.001),
                  control.family=list(hyper=list(prec=list(prior='loggamma', param=c(0.1,0.1)))))
summary(data.inla)
Call: c("inla(formula = y ~ x, data = data, control.compute = list(dic = TRUE, ", " cpo = TRUE, waic = TRUE), control.family = list(hyper = list(prec = list(prior = \"loggamma\", ", " param = c(0.1, 0.1)))), control.fixed = list(mean = 0, prec = 0.0196, ", " mean.intercept = 0, prec.intercept = 0.001))") Time used: Pre-processing Running inla Post-processing Total 0.0923 0.0239 0.0286 0.1448 Fixed effects: mean sd 0.025quant 0.5quant 0.975quant mode kld (Intercept) 40.4222 2.7267 34.9757 40.4364 45.7778 40.4616 0 x -1.5041 0.2822 -2.0593 -1.5054 -0.9414 -1.5077 0 The model has no random effects Model hyperparameters: mean sd 0.025quant 0.5quant 0.975quant mode Precision for the Gaussian observations 0.0388 0.0146 0.0167 0.0368 0.073 0.0328 Expected number of effective parameters(std dev): 1.991(0.0028) Number of equivalent replicates : 8.036 Deviance Information Criterion (DIC) ...: 100.80 Effective number of parameters .........: 2.651 Watanabe-Akaike information criterion (WAIC) ...: 101.16 Effective number of parameters .................: 2.62 Marginal log-Likelihood: -57.94 CPO and PIT are computed Posterior marginals for linear predictor and fitted values computed
s <- inla.contrib.sd(data.inla, nsamples = 1000)
s$hyper
                                     mean       sd     2.5%    97.5%
sd for the Gaussian observations 5.337874 1.088706 3.663546 7.863846
#the specified slope prior precision (0.196) corresponds to a standard deviation of 1/sqrt(0.196);
#here that standard deviation is scaled by the 97.5% standard normal quantile for plotting
(SD <- (1/sqrt(0.196) * qnorm(0.975, 0, 1)))
[1] 4.427107
prior <- data.frame(x=seq(-3*SD, 3*SD, len=1000))
prior$density <- dnorm(prior$x, 0, SD)
post <- data.frame(data.inla$marginals.fixed[[2]])
head(post)
          x            y
1 -4.325697 2.134404e-16
2 -3.761375 1.037706e-10
3 -3.197053 1.349437e-06
4 -2.914892 5.713188e-05
5 -2.632731 1.331189e-03
6 -2.350570 1.925008e-02
ggplot(prior, aes(y=density, x=x)) + geom_line(aes(color='Prior')) + geom_line(data=post, aes(y=y, x=x, color='Posterior')) + scale_color_discrete('')+ theme_classic()+theme(legend.position=c(1,1), legend.justification=c(1,1))
More detailed information
In addition to the PIT and CPO distributions, there are numerous other summary plots available, each of which summarizes the marginal distribution of a different type of model parameter (fixed, random, hyper-parameters etc.) or prediction. These can all be produced in one hit using the plot() function. However, I will produce them individually in order to assist the discussion. Moreover, as a result of complexities arising from the need for the plot() function to work in RStudio, INLA's plot() function and knitr are slightly incompatible at the moment...
Additionally, the fitted INLA object (a list) contains a large number of items, each of which captures different aspects of the approximated parameters.
names(data.inla)
[1] "names.fixed" "summary.fixed" "marginals.fixed" [4] "summary.lincomb" "marginals.lincomb" "size.lincomb" [7] "summary.lincomb.derived" "marginals.lincomb.derived" "size.lincomb.derived" [10] "mlik" "cpo" "po" [13] "waic" "model.random" "summary.random" [16] "marginals.random" "size.random" "summary.linear.predictor" [19] "marginals.linear.predictor" "summary.fitted.values" "marginals.fitted.values" [22] "size.linear.predictor" "summary.hyperpar" "marginals.hyperpar" [25] "internal.summary.hyperpar" "internal.marginals.hyperpar" "offset.linear.predictor" [28] "model.spde2.blc" "summary.spde2.blc" "marginals.spde2.blc" [31] "size.spde2.blc" "model.spde3.blc" "summary.spde3.blc" [34] "marginals.spde3.blc" "size.spde3.blc" "logfile" [37] "misc" "dic" "mode" [40] "neffp" "joint.hyper" "nhyper" [43] "version" "Q" "graph" [46] "ok" "cpu.used" "all.hyper" [49] ".args" "call" "model.matrix"
This is obviously a very large list. In order to make more sense of all that is available, we can group the returned INLA list into functional categories:
- parameter estimates
INLA object                 Description
names.fixed                 the names of the fixed effects parameters
summary.fixed               summaries (mean, standard deviation and quantiles) of the fixed effects posteriors (on the link scale)
marginals.fixed             list of posterior marginal densities of the fixed effects
size.random                 number of levels of each of the structured random covariates
summary.random              summaries (mean, standard deviation and quantiles) of the structured random effects (smoothers etc)
marginals.random            list of posterior marginal densities of the random effects
summary.hyperpar            summaries (mean, standard deviation and quantiles) of the model hyper-parameters (precision, overdispersion, correlation etc)
marginals.hyperpar          list of posterior marginal densities of the model hyper-parameters

- predictions and contrasts

INLA object                 Description
model.matrix                model matrix of the fixed effects - a row for each row of the input data
summary.linear.predictor    summaries (mean, standard deviation and quantiles) of the linear predictor - a prediction (on the link scale) associated with each row of the input data
marginals.linear.predictor  list of posterior marginal densities of the linear predictor
summary.fitted.values       summaries (mean, standard deviation and quantiles) of the fitted values - a prediction (on the response scale) associated with each row of the input data
marginals.fitted.values     list of posterior marginal densities of the fitted values
summary.lincomb             summaries (mean, standard deviation and quantiles) of the defined linear combinations (on the link scale)
marginals.lincomb           list of posterior marginal densities of the defined linear combinations
summary.lincomb.derived     summaries (mean, standard deviation and quantiles) of the defined linear combinations (on the response scale)
marginals.lincomb.derived   list of posterior marginal densities of the defined linear combinations (response scale)

- model performance measures

INLA object                 Description
mlik                        log marginal likelihood of the model (when mlik=TRUE)
dic                         deviance information criterion of the model (when dic=TRUE)
waic                        Watanabe-Akaike (Widely Applicable) information criterion of the model (when waic=TRUE)
neffp                       expected number of parameters (and standard deviation) and number of replicates per parameter in the model
cpo                         conditional predictive ordinate (cpo), probability integral transform (pit) and vector of assumption failures (failures) of the model (when cpo=TRUE)
Fixed effects
plot(data.inla, plot.fixed.effects = TRUE, plot.lincomb = FALSE, plot.random.effects = FALSE, plot.hyperparameters = FALSE, plot.predictor = FALSE, plot.q = FALSE, plot.cpo = FALSE, single = FALSE)
data.inla$summary.fixed
                 mean        sd 0.025quant  0.5quant 0.975quant      mode          kld
(Intercept) 40.422174 2.7267362   34.97573 40.436435 45.7778033 40.461613 3.011495e-11
x           -1.504087 0.2821624   -2.05931 -1.505424 -0.9413974 -1.507741 2.972068e-11
data.inla$marginals.fixed
$`(Intercept)` x y [1,] 13.15496 4.410151e-17 [2,] 18.60840 1.905779e-11 [3,] 24.06185 2.152067e-07 [4,] 26.78857 8.435060e-06 [5,] 29.51529 1.784278e-04 [6,] 32.24201 2.276125e-03 [7,] 33.83266 8.280310e-03 [8,] 35.92316 3.499961e-02 [9,] 36.96775 6.200022e-02 [10,] 37.65244 8.427154e-02 [11,] 38.19117 1.028661e-01 [12,] 38.64056 1.178672e-01 [13,] 39.04390 1.299800e-01 [14,] 39.41565 1.392874e-01 [15,] 39.76649 1.459042e-01 [16,] 40.10455 1.499193e-01 [17,] 40.27099 1.509687e-01 [18,] 40.43643 1.513822e-01 [19,] 40.60209 1.511608e-01 [20,] 40.76790 1.503044e-01 [21,] 41.10458 1.466640e-01 [22,] 41.45260 1.404086e-01 [23,] 41.81934 1.314582e-01 [24,] 42.21872 1.195765e-01 [25,] 42.66654 1.045571e-01 [26,] 43.18698 8.627934e-02 [27,] 43.85476 6.385703e-02 [28,] 44.86945 3.628873e-02 [29,] 46.88034 8.629793e-03 [30,] 48.60235 1.975121e-03 [31,] 51.32907 1.352744e-04 [32,] 54.05579 5.756324e-06 [33,] 56.78251 1.349249e-07 [34,] 62.23596 1.021137e-11 [35,] 67.68940 2.068547e-17 $x x y [1,] -4.32569687 2.134404e-16 [2,] -3.76137496 1.037706e-10 [3,] -3.19705306 1.349437e-06 [4,] -2.91489210 5.713188e-05 [5,] -2.63273115 1.331189e-03 [6,] -2.35057020 1.925008e-02 [7,] -2.17334332 8.304301e-02 [8,] -1.96491059 3.492400e-01 [9,] -1.85981324 6.148759e-01 [10,] -1.79056620 8.315525e-01 [11,] -1.73580056 1.011021e+00 [12,] -1.69001537 1.154394e+00 [13,] -1.64879258 1.269090e+00 [14,] -1.61067721 1.356196e+00 [15,] -1.57459777 1.416978e+00 [16,] -1.53973520 1.452399e+00 [17,] -1.52252620 1.460799e+00 [18,] -1.50542406 1.463031e+00 [19,] -1.48827869 1.459127e+00 [20,] -1.47107359 1.449088e+00 [21,] -1.43609861 1.410468e+00 [22,] -1.39985194 1.346777e+00 [23,] -1.36155020 1.257371e+00 [24,] -1.31985206 1.140529e+00 [25,] -1.27276242 9.936500e-01 [26,] -1.21787798 8.165285e-01 [27,] -1.14702386 6.008029e-01 [28,] -1.03864956 3.384795e-01 [29,] -0.82220815 7.976568e-02 [30,] -0.65760448 2.185504e-02 [31,] -0.37544353 1.705125e-03 [32,] -0.09328257 8.038838e-05 [33,] 0.18887838 2.048111e-06 [34,] 0.75320028 1.812337e-10 [35,] 1.31752219 4.198486e-16
library(dplyr)
data.inla.fixed <- reshape2:::melt(data.inla$marginals.fixed) %>%
    reshape2:::dcast(L1+Var1~Var2, value='value')
ggplot(data.inla.fixed, aes(y=y, x=x)) +
    geom_line() +
    facet_wrap(~L1, scales='free', nrow=1) +
    theme_classic()
Predictor
plot(data.inla, plot.fixed.effects=FALSE, plot.lincomb=FALSE, plot.random.effects=FALSE, plot.hyperparameters=FALSE, plot.predictor=TRUE, plot.q=FALSE, plot.cpo=FALSE, single=FALSE)
data.inla$summary.linear.predictor
mean sd 0.025quant 0.5quant 0.975quant mode kld predictor.01 38.91874 2.483244 33.96448 38.93146 43.79792 38.95390 1.007436e-07 predictor.02 37.41459 2.247989 32.92995 37.42602 41.83181 37.44617 1.014385e-07 predictor.03 35.91045 2.024777 31.87159 35.92057 39.88955 35.93843 1.027624e-07 predictor.04 34.40632 1.818051 30.78060 34.41514 37.97993 34.43071 1.051053e-07 predictor.05 32.90219 1.634077 29.64458 32.90971 36.11533 32.92298 1.090693e-07 predictor.06 31.39807 1.481359 28.44673 31.40428 34.31258 31.41524 1.154613e-07 predictor.07 29.89397 1.370386 27.16634 29.89887 32.59239 29.90751 1.250763e-07 predictor.08 28.38989 1.311798 25.78241 28.39345 30.97613 28.39978 1.380414e-07 predictor.09 26.88582 1.312630 24.28112 26.88804 29.47847 26.89203 1.530650e-07 predictor.10 25.38177 1.372775 22.66255 25.38262 28.09812 25.38426 1.678379e-07 predictor.11 23.87775 1.485046 20.93976 23.87723 26.82085 23.87651 1.806989e-07 predictor.12 22.37373 1.638769 19.13512 22.37184 25.62556 22.36874 1.923988e-07 predictor.13 20.86972 1.823489 17.26929 20.86646 24.49158 20.86098 2.011791e-07 predictor.14 19.36572 2.030767 15.35897 19.36108 23.40225 19.35322 2.077685e-07 predictor.15 17.86174 2.254388 13.41636 17.85572 22.34529 17.84548 2.126772e-07 predictor.16 16.35775 2.489953 11.45012 16.35035 21.31196 16.33773 2.163935e-07
data.inla$summary.fitted.values
mean sd 0.025quant 0.5quant 0.975quant mode fitted.predictor.01 38.91874 2.483244 33.96448 38.93146 43.79792 38.95390 fitted.predictor.02 37.41459 2.247989 32.92995 37.42602 41.83181 37.44617 fitted.predictor.03 35.91045 2.024777 31.87159 35.92057 39.88955 35.93843 fitted.predictor.04 34.40632 1.818051 30.78060 34.41514 37.97993 34.43071 fitted.predictor.05 32.90219 1.634077 29.64458 32.90971 36.11533 32.92298 fitted.predictor.06 31.39807 1.481359 28.44673 31.40428 34.31258 31.41524 fitted.predictor.07 29.89397 1.370386 27.16634 29.89887 32.59239 29.90751 fitted.predictor.08 28.38989 1.311798 25.78241 28.39345 30.97613 28.39978 fitted.predictor.09 26.88582 1.312630 24.28112 26.88804 29.47847 26.89203 fitted.predictor.10 25.38177 1.372775 22.66255 25.38262 28.09812 25.38426 fitted.predictor.11 23.87775 1.485046 20.93976 23.87723 26.82085 23.87651 fitted.predictor.12 22.37373 1.638769 19.13512 22.37184 25.62556 22.36874 fitted.predictor.13 20.86972 1.823489 17.26929 20.86646 24.49158 20.86098 fitted.predictor.14 19.36572 2.030767 15.35897 19.36108 23.40225 19.35322 fitted.predictor.15 17.86174 2.254388 13.41636 17.85572 22.34529 17.84548 fitted.predictor.16 16.35775 2.489953 11.45012 16.35035 21.31196 16.33773
Hyper-parameters
plot(data.inla, plot.fixed.effects=FALSE, plot.lincomb=FALSE, plot.random.effects=FALSE, plot.hyperparameters=TRUE, plot.predictor=FALSE, plot.q=FALSE, plot.cpo=FALSE, single=FALSE)
data.inla$summary.hyperpar
                                              mean         sd 0.025quant   0.5quant 0.975quant       mode
Precision for the Gaussian observations 0.03882442 0.01458731 0.01670641 0.03679713 0.07303003 0.03278234
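As an aside, rather than sampling (as inla.contrib.sd() does below), the precision marginal can be transformed analytically into a marginal for the standard deviation. This is a sketch assuming the standard INLA marginal utilities inla.tmarginal() and inla.zmarginal() are available in the installed version:

marg.prec <- data.inla$marginals.hyperpar[[1]]
marg.sd <- inla.tmarginal(function(x) 1/sqrt(x), marg.prec) #transform precision to standard deviation
inla.zmarginal(marg.sd) #summary statistics of the transformed marginal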
s <- inla.contrib.sd(data.inla, nsamples = 1000)
s$hyper
                                     mean        sd     2.5%    97.5%
sd for the Gaussian observations 5.393922 0.9865159 3.717143 7.535196
This equates to a residual standard deviation of approximately 5.394, which is similar to the JAGS estimate.
Predictions
There are two main ways to generate predictions in INLA:
- append additional data cases with missing (NA) response to the model input data. Missing response data will not contribute to the likelihood, yet will
be imputed (estimated/predicted given the model trends). In order to illustrate producing a summary plot, I will predict the response for a sequence of predictor
values.
newdata <- data.frame(x=seq(min(data$x, na.rm=TRUE), max(data$x, na.rm=TRUE), len=100), y=NA)
data.pred <- rbind(data, newdata)
#fit the model
data.inla1 <- inla(y~x, data=data.pred,
                   control.compute=list(dic=TRUE, cpo=TRUE, waic=TRUE),
                   control.predictor=list(compute=TRUE))
#examine the regular summary - note there are no changes from the first fit
summary(data.inla1)
Call: c("inla(formula = y ~ x, data = data.pred, control.compute = list(dic = TRUE, ", " cpo = TRUE, waic = TRUE), control.predictor = list(compute = TRUE))" ) Time used: Pre-processing Running inla Post-processing Total 0.0663 0.0489 0.0239 0.1391 Fixed effects: mean sd 0.025quant 0.5quant 0.975quant mode kld (Intercept) 40.7441 2.5688 35.6573 40.7440 45.8219 40.7442 0 x -1.5339 0.2657 -2.0599 -1.5339 -1.0087 -1.5339 0 The model has no random effects Model hyperparameters: mean sd 0.025quant 0.5quant 0.975quant mode Precision for the Gaussian observations 0.0438 0.0155 0.0199 0.0418 0.0798 0.0378 Expected number of effective parameters(std dev): 2.00(0.00) Number of equivalent replicates : 8.00 Deviance Information Criterion (DIC) ...: 100.64 Effective number of parameters .........: 2.634 Watanabe-Akaike information criterion (WAIC) ...: 101.40 Effective number of parameters .................: 2.942 Marginal log-Likelihood: -64.55 CPO and PIT are computed Posterior marginals for linear predictor and fitted values computed
#extract the predictor values associated with the appended data
newdata <- cbind(newdata, data.inla1$summary.linear.predictor[(nrow(data)+1):nrow(data.pred),])
head(newdata)
                     x  y     mean       sd 0.025quant 0.5quant 0.975quant     mode          kld
predictor.017 1.000000 NA 39.21085 2.339342   34.58500 39.21057   43.83805 39.21031 1.031930e-07
predictor.018 1.151515 NA 38.97844 2.305138   34.42022 38.97816   43.53798 38.97790 1.031943e-07
predictor.019 1.303030 NA 38.74603 2.271133   34.25505 38.74575   43.23831 38.74550 1.031957e-07
predictor.020 1.454545 NA 38.51362 2.237335   34.08947 38.51334   42.93905 38.51309 1.031972e-07
predictor.021 1.606061 NA 38.28120 2.203755   33.92346 38.28094   42.64021 38.28069 1.031987e-07
predictor.022 1.757576 NA 38.04879 2.170402   33.75700 38.04853   42.34183 38.04828 1.032003e-07
newdata <- reshape:::rename(newdata, c("0.025quant"="lower", "0.975quant"="upper"))
ggplot(newdata, aes(y=mean, x=x)) +
    geom_point(data=data, aes(y=y)) +
    geom_ribbon(aes(ymin=lower, ymax=upper), fill='blue', alpha=0.2) +
    geom_line() +
    theme_classic()
- define linear combinations (contrasts) for the predictions. Note, the inla.make.lincombs() function seems a little fragile. It is vital that it be passed a data.frame. Furthermore, the model matrix needs to be converted to a data frame using the as.data.frame() function rather than the data.frame() function, as the latter alters the column names and causes problems within INLA.
newdata <- data.frame(x=seq(min(data$x, na.rm=TRUE), max(data$x, na.rm=TRUE), len=100))
Xmat <- model.matrix(~x, data=newdata)
lincomb <- inla.make.lincombs(as.data.frame(Xmat))
#fit the model
data.inla1 <- inla(y~x, lincomb=lincomb, #include the linear combinations
                   data=data,
                   control.compute=list(dic=TRUE, cpo=TRUE, waic=TRUE),
                   control.inla=list(lincomb.derived.only=FALSE))
#examine the regular summary - note there are no changes from the first fit
#summary(data.inla1)
head(data.inla1$summary.lincomb)
      ID     mean       sd 0.025quant 0.5quant 0.975quant     mode          kld
lc001  1 39.20315 2.359954   34.52673 39.20440   43.86357 39.20635 1.371747e-11
lc002  2 38.97090 2.325452   34.36286 38.97213   43.56320 38.97404 1.372809e-11
lc003  3 38.73864 2.291151   34.19858 38.73985   43.26322 38.74173 1.370907e-11
lc004  4 38.50639 2.257060   34.03389 38.50758   42.96365 38.50942 1.371347e-11
lc005  5 38.27414 2.223188   33.86877 38.27530   42.66452 38.27711 1.373200e-11
lc006  6 38.04188 2.189545   33.70319 38.04303   42.36584 38.04480 1.369481e-11
newdata <- cbind(newdata, data.inla1$summary.lincomb)
newdata <- reshape:::rename(newdata, c("0.025quant"="lower", "0.975quant"="upper"))
ggplot(newdata, aes(y=mean, x=x)) +
    geom_point(data=data, aes(y=y)) +
    geom_ribbon(aes(ymin=lower, ymax=upper), fill='blue', alpha=0.2) +
    geom_line() +
    theme_classic()
One very good way to evaluate the fit of a model is to assess how well it estimates known parameters. This example generated simulated data from a set of known parameters. We can therefore compare the model parameter estimates to these truths.
orig <- data.frame(parameter=factor(c('(Intercept)','x','Sigma2')), value=c(40,-1.5,5))
mod <- subset(data.inla$summary.fixed, select=c('mean','sd','0.025quant','0.975quant'))
mod <- reshape::rename(mod, c("0.025quant"="lower", "0.975quant"="upper"))
colnames(s$hyper) <- c('mean','sd','lower','upper')
mod <- rbind(mod, s$hyper)
mod$parameter <- factor(c('(Intercept)','x','Sigma2'))
ggplot(mod, aes(x=parameter, y=mean)) +
    geom_hline(yintercept=0, linetype='dashed') +
    geom_blank() +
    geom_linerange(aes(x=as.numeric(parameter)-0.05, ymin=lower, ymax=upper)) +
    geom_point(data=orig, aes(y=value, x=as.numeric(factor(parameter))+0.05, fill='Original'), shape=21) +
    geom_point(aes(x=as.numeric(parameter)-0.05, fill='Model'), shape=21) +
    coord_flip() +
    scale_fill_discrete('') +
    scale_y_continuous('') +
    theme_classic() + theme(legend.position=c(1,1), legend.justification=c(1,1))
Hence, it appears that the estimated (modelled) parameters are entirely consistent with the original parameters from which the data were simulated.
Multiple linear regression
In the previous example, I displayed many of the basic components of the fitted INLA model. In the remaining examples, we will build upon some of these and ignore others completely - depending on what is required...
Let's say we had set up a natural experiment in which we measured a response ($y$) from each of 100 sampling units ($n=100$) across a landscape. At the same time, we also measured two other continuous covariates ($x_1$ and $x_2$) from each of the sampling units. As this section is mainly about the generation of artificial data (and not specifically about what to do with the data), understanding the actual details is optional and can be safely skipped. Consequently, I have folded (toggled) this section away.
set.seed(3)
n <- 100
intercept <- 5
temp <- runif(n)
nitro <- runif(n) + 0.8*temp
int.eff <- 2
temp.eff <- 0.85
nitro.eff <- 0.5
res <- rnorm(n, 0, 1)
coef <- c(int.eff, temp.eff, nitro.eff, int.eff)
mm <- model.matrix(~temp*nitro)
y <- t(coef %*% t(mm)) + res
data <- data.frame(y, x1=temp, x2=nitro, cx1=scale(temp,scale=F), cx2=scale(nitro,scale=F))
$$ y_i = \alpha + \beta_1 x_{1i} + \beta_2 x_{2i} + \varepsilon_i \hspace{4em}\varepsilon_i\sim{}\mathcal{N}(0, \sigma^2) $$
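For reference, the multiplicative (interaction) model fitted alongside the additive model below simply adds a product term:
$$ y_i = \alpha + \beta_1 x_{1i} + \beta_2 x_{2i} + \beta_3 x_{1i}x_{2i} + \varepsilon_i \hspace{4em}\varepsilon_i\sim{}\mathcal{N}(0, \sigma^2) $$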
#additive model
summary(lm(y~cx1+cx2, data=data))
Call: lm(formula = y ~ cx1 + cx2, data = data) Residuals: Min 1Q Median 3Q Max -2.45927 -0.78738 -0.00688 0.71051 2.88492 Coefficients: Estimate Std. Error t value Pr(>|t|) (Intercept) 3.8234 0.1131 33.793 < 2e-16 *** cx1 3.0250 0.4986 6.067 2.52e-08 *** cx2 1.3878 0.4281 3.242 0.00163 ** --- Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 Residual standard error: 1.131 on 97 degrees of freedom Multiple R-squared: 0.5361, Adjusted R-squared: 0.5265 F-statistic: 56.05 on 2 and 97 DF, p-value: < 2.2e-16
confint(lm(y~cx1+cx2, data=data))
                2.5 %   97.5 %
(Intercept) 3.5988464 4.047962
cx1         2.0353869 4.014675
cx2         0.5381745 2.237488
#multiplicative model
summary(lm(y~cx1*cx2, data=data))
Call: lm(formula = y ~ cx1 * cx2, data = data) Residuals: Min 1Q Median 3Q Max -2.34877 -0.85435 0.06905 0.71265 2.57068 Coefficients: Estimate Std. Error t value Pr(>|t|) (Intercept) 3.6710 0.1315 27.924 < 2e-16 *** cx1 2.9292 0.4914 5.961 4.15e-08 *** cx2 1.3445 0.4207 3.196 0.00189 ** cx1:cx2 2.6651 1.2305 2.166 0.03281 * --- Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 Residual standard error: 1.111 on 96 degrees of freedom Multiple R-squared: 0.5577, Adjusted R-squared: 0.5439 F-statistic: 40.35 on 3 and 96 DF, p-value: < 2.2e-16
confint(lm(y~cx1*cx2, data=data))
                2.5 %   97.5 %
(Intercept) 3.4100621 3.931972
cx1         1.9537887 3.904642
cx2         0.5094571 2.179451
cx1:cx2     0.2224680 5.107650
#additive model
modelString="
model {
   #Likelihood
   for (i in 1:n) {
      y[i]~dnorm(mu[i],tau)
      mu[i] <- beta0+beta1*x1[i]+beta2*x2[i]
   }
   #Priors
   beta0 ~ dnorm(0.01,1.0E-6)
   beta1 ~ dnorm(0,1.0E-6)
   beta2 ~ dnorm(0,1.0E-6)
   tau <- 1 / (sigma * sigma)
   sigma~dunif(0,100)
}
"
data.list <- with(data, list(y=y, x1=cx1, x2=cx2, n=nrow(data)))
data.jags <- jags(data=data.list, inits=NULL,
                  parameters.to.save=c('alpha','beta1','beta2','sigma'),
                  model.file=textConnection(modelString),
                  n.chains=3, n.iter=100000, n.burnin=20000, n.thin=100)
Compiling model graph
   Resolving undeclared variables
   Allocating nodes
Graph Size: 612

Initializing model
print(data.jags)
Inference for Bugs model at "6", fit using jags, 3 chains, each with 1e+05 iterations (first 20000 discarded), n.thin = 100 n.sims = 2400 iterations saved mu.vect sd.vect 2.5% 25% 50% 75% 97.5% Rhat n.eff beta1 3.022 0.504 2.013 2.684 3.006 3.360 4.008 1.001 2400 beta2 1.388 0.435 0.498 1.113 1.403 1.689 2.199 1.001 2400 sigma 1.149 0.082 1.002 1.091 1.144 1.202 1.325 1.000 2400 deviance 309.506 2.931 305.926 307.384 308.842 310.830 317.254 1.001 2400 For each parameter, n.eff is a crude measure of effective sample size, and Rhat is the potential scale reduction factor (at convergence, Rhat=1). DIC info (using the rule, pD = var(deviance)/2) pD = 4.3 and DIC = 313.8 DIC is an estimate of expected predictive error (lower deviance is better).
#multiplicative model
modelString="
model {
   #Likelihood
   for (i in 1:n) {
      y[i]~dnorm(mu[i],tau)
      mu[i] <- beta0+beta1*x1[i]+beta2*x2[i]+beta3*x1[i]*x2[i]
   }
   #Priors
   beta0 ~ dnorm(0.01,1.0E-6)
   beta1 ~ dnorm(0,1.0E-6)
   beta2 ~ dnorm(0,1.0E-6)
   beta3 ~ dnorm(0,1.0E-6)
   tau <- 1 / (sigma * sigma)
   sigma~dunif(0,100)
}
"
data.list <- with(data, list(y=y, x1=cx1, x2=cx2, n=nrow(data)))
data.jags <- jags(data=data.list, inits=NULL,
                  parameters.to.save=c('alpha','beta1','beta2','sigma'),
                  model.file=textConnection(modelString),
                  n.chains=3, n.iter=100000, n.burnin=20000, n.thin=100)
Compiling model graph
   Resolving undeclared variables
   Allocating nodes
Graph Size: 713

Initializing model
print(data.jags)
Inference for Bugs model at "7", fit using jags, 3 chains, each with 1e+05 iterations (first 20000 discarded), n.thin = 100 n.sims = 2400 iterations saved mu.vect sd.vect 2.5% 25% 50% 75% 97.5% Rhat n.eff beta1 2.949 0.504 1.971 2.608 2.951 3.284 3.939 1.001 2300 beta2 1.333 0.425 0.484 1.041 1.333 1.618 2.166 1.001 2400 sigma 1.127 0.084 0.977 1.070 1.121 1.181 1.313 1.001 2400 deviance 305.835 3.304 301.533 303.456 305.062 307.486 313.832 1.001 2400 For each parameter, n.eff is a crude measure of effective sample size, and Rhat is the potential scale reduction factor (at convergence, Rhat=1). DIC info (using the rule, pD = var(deviance)/2) pD = 5.5 and DIC = 311.3 DIC is an estimate of expected predictive error (lower deviance is better).
#additive model
data.inla.a <- inla(y~cx1+cx2, data=data,
                    control.compute=list(dic=TRUE, cpo=TRUE, waic=TRUE))
#multiplicative model
data.inla.m <- inla(y~cx1*cx2, data=data,
                    control.compute=list(dic=TRUE, cpo=TRUE, waic=TRUE))
We will first of all explore the INLA summaries for the two models, paying particular attention to the KLD values, the expected number of effective parameters and the number of equivalent replicates.
summary(data.inla.a)
Call: c("inla(formula = y ~ cx1 + cx2, data = data, control.compute = list(dic = TRUE, ", " cpo = TRUE, waic = TRUE))") Time used: Pre-processing Running inla Post-processing Total 0.0643 0.0534 0.0237 0.1415 Fixed effects: mean sd 0.025quant 0.5quant 0.975quant mode kld (Intercept) 3.8234 0.1125 3.6022 3.8234 4.0443 3.8234 0 cx1 3.0245 0.4957 2.0500 3.0245 3.9980 3.0245 0 cx2 1.3880 0.4256 0.5513 1.3880 2.2238 1.3880 0 The model has no random effects Model hyperparameters: mean sd 0.025quant 0.5quant 0.975quant mode Precision for the Gaussian observations 0.7968 0.1133 0.5935 0.7905 1.037 0.7797 Expected number of effective parameters(std dev): 3.001(2e-04) Number of equivalent replicates : 33.32 Deviance Information Criterion (DIC) ...: 312.74 Effective number of parameters .........: 3.647 Watanabe-Akaike information criterion (WAIC) ...: 312.77 Effective number of parameters .................: 3.546 Marginal log-Likelihood: -173.86 CPO and PIT are computed Posterior marginals for linear predictor and fitted values computed
summary(data.inla.m)
Call: c("inla(formula = y ~ cx1 * cx2, data = data, control.compute = list(dic = TRUE, ", " cpo = TRUE, waic = TRUE))") Time used: Pre-processing Running inla Post-processing Total 0.0658 0.0525 0.0239 0.1423 Fixed effects: mean sd 0.025quant 0.5quant 0.975quant mode kld (Intercept) 3.6712 0.1307 3.4143 3.6712 3.9279 3.6712 0 cx1 2.9288 0.4884 1.9685 2.9288 3.8882 2.9288 0 cx2 1.3446 0.4181 0.5226 1.3446 2.1659 1.3446 0 cx1:cx2 2.6613 1.2224 0.2579 2.6613 5.0621 2.6613 0 The model has no random effects Model hyperparameters: mean sd 0.025quant 0.5quant 0.975quant mode Precision for the Gaussian observations 0.8273 0.1182 0.6153 0.8208 1.078 0.8094 Expected number of effective parameters(std dev): 3.999(4e-04) Number of equivalent replicates : 25.00 Deviance Information Criterion (DIC) ...: 309.98 Effective number of parameters .........: 4.648 Watanabe-Akaike information criterion (WAIC) ...: 309.77 Effective number of parameters .................: 4.227 Marginal log-Likelihood: -174.76 CPO and PIT are computed Posterior marginals for linear predictor and fitted values computed
KLD values for all parameters in both models are 0, and thus the Simplified Laplace Approximation is appropriate. The expected number of effective parameters is approximately 3 for the additive model and 4 for the multiplicative model. Not surprisingly, the additive model therefore has more equivalent replicates per parameter than the multiplicative model. In either case, there appear to be ample replicates per parameter to ensure reliable parameter estimation.
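If desired, the KLD values can be pulled directly from the kld column of each model's fixed effects summary:

data.inla.a$summary.fixed[, 'kld']
data.inla.m$summary.fixed[, 'kld']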
The multiplicative model has evidence of an interaction (the 95% credible interval for the interaction parameter does not include zero) and both DIC and WAIC are lower for the multiplicative model than for the additive model. Consequently, the multiplicative model would be considered the better of the two models.
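A compact side-by-side comparison of the information criteria (using the dic and waic elements requested via control.compute):

rbind(additive       = c(DIC=data.inla.a$dic$dic, WAIC=data.inla.a$waic$waic),
      multiplicative = c(DIC=data.inla.m$dic$dic, WAIC=data.inla.m$waic$waic))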
We will examine CPO and PIT from both models.
sum(data.inla.a$cpo$cpo)
[1] 24.45695
-mean(log(data.inla.a$cpo$cpo))
[1] 1.563885
plot(data.inla.a, plot.fixed.effects=FALSE, plot.lincomb=FALSE, plot.random.effects=FALSE, plot.hyperparameters=FALSE, plot.predictor=FALSE, plot.q=FALSE, plot.cpo=TRUE, single=FALSE)
sum(data.inla.m$cpo$cpo)
[1] 24.4262
-mean(log(data.inla.m$cpo$cpo))
[1] 1.54899
plot(data.inla.m, plot.fixed.effects=FALSE, plot.lincomb=FALSE, plot.random.effects=FALSE, plot.hyperparameters=FALSE, plot.predictor=FALSE, plot.q=FALSE, plot.cpo=TRUE, single=FALSE)
Predictions
Let's produce partial (conditional) plots for each of the main effects, at a couple of levels of the other covariate (since there was evidence of an interaction). I will do this via the imputation method; however, it could equally be performed via the linear combinations method.
#partial effect of cx1
newdata1 <- expand.grid(cx1=seq(min(data$cx1, na.rm=TRUE), max(data$cx1, na.rm=TRUE), len=100),
                        cx2=c(mean(data$cx2, na.rm=TRUE)-sd(data$cx2, na.rm=TRUE),
                              mean(data$cx2, na.rm=TRUE),
                              mean(data$cx2, na.rm=TRUE)+sd(data$cx2, na.rm=TRUE)),
                        y=NA)
#partial residuals of cx1
newdata1r <- expand.grid(cx1=data$cx1,
                         cx2=c(mean(data$cx2, na.rm=TRUE)-sd(data$cx2, na.rm=TRUE),
                               mean(data$cx2, na.rm=TRUE),
                               mean(data$cx2, na.rm=TRUE)+sd(data$cx2, na.rm=TRUE)),
                         y=NA)
#partial effect of cx2
newdata2 <- expand.grid(cx2=seq(min(data$cx2, na.rm=TRUE), max(data$cx2, na.rm=TRUE), len=100),
                        cx1=c(mean(data$cx1, na.rm=TRUE)-sd(data$cx1, na.rm=TRUE),
                              mean(data$cx1, na.rm=TRUE),
                              mean(data$cx1, na.rm=TRUE)+sd(data$cx1, na.rm=TRUE)),
                        y=NA)
#partial residuals of cx2
newdata2r <- expand.grid(cx2=data$cx2,
                         cx1=c(mean(data$cx1, na.rm=TRUE)-sd(data$cx1, na.rm=TRUE),
                               mean(data$cx1, na.rm=TRUE),
                               mean(data$cx1, na.rm=TRUE)+sd(data$cx1, na.rm=TRUE)),
                         y=NA)
newdata <- rbind(data.frame(Pred='cx1', newdata1), data.frame(Pred='cx2', newdata2),
                 data.frame(Pred='cx1r', newdata1r), data.frame(Pred='cx2r', newdata2r))
data.pred <- rbind(data.frame(Pred='Orig', subset(data, select=c(cx1,cx2,y))), newdata)
#fit the model
data.inla.m1 <- inla(y~cx1*cx2, data=data.pred,
                     control.compute=list(dic=TRUE, cpo=TRUE, waic=TRUE),
                     control.predictor=list(compute=TRUE))
#extract the predictor values associated with the appended data
newdata <- cbind(newdata, data.inla.m1$summary.linear.predictor[(nrow(data)+1):nrow(data.pred),])
newdata <- reshape:::rename(newdata, c("0.025quant"="lower", "0.975quant"="upper"))
newdata1 <- droplevels(subset(newdata, Pred=='cx1'))
newdata1$Cov <- factor(newdata1$cx2, labels=c('Low','Med','High'))
newdata2 <- droplevels(subset(newdata, Pred=='cx2'))
newdata2$Cov <- factor(newdata2$cx1, labels=c('Low','Med','High'))
newdata1r <- droplevels(subset(newdata, Pred=='cx1r'))
newdata1r$Cov <- factor(newdata1$cx2, labels=c('Low','Med','High'))
newdata1r$y <- newdata1r$mean + (newdata1r$mean-data$y)
newdata2r <- droplevels(subset(newdata, Pred=='cx2r'))
newdata2r$Cov <- factor(newdata2$cx1, labels=c('Low','Med','High'))
newdata2r$y <- newdata2r$mean + (newdata2r$mean-data$y)
g1 <- ggplot(newdata1, aes(y=mean, x=cx1, shape=Cov, fill=Cov, color=Cov)) +
    geom_point(data=newdata1r, aes(y=y)) +
    geom_ribbon(aes(ymin=lower, ymax=upper), alpha=0.2, color=NA) +
    scale_fill_discrete('cx2') + scale_color_discrete('cx2') + scale_shape_discrete('cx2') +
    geom_line() +
    theme_classic()
g2 <- ggplot(newdata2, aes(y=mean, x=cx2, shape=Cov, fill=Cov, color=Cov)) +
    geom_point(data=newdata2r, aes(y=y)) +
    geom_ribbon(aes(ymin=lower, ymax=upper), alpha=0.2, color=NA) +
    scale_fill_discrete('cx1') + scale_color_discrete('cx1') + scale_shape_discrete('cx1') +
    geom_line() +
    theme_classic()
library(gridExtra)
grid.arrange(g1, g2, nrow=1)
Single factor ANOVA
Imagine we had designed an experiment in which we measured the response ($y$) under five different treatments ($x$), each replicated 10 times ($n=10$ per treatment). As this section is mainly about the generation of artificial data (and not specifically about what to do with the data), understanding the actual details is optional and can be safely skipped. Consequently, I have folded (toggled) this section away.
- the sample size per treatment=10
- the categorical $x$ variable with 5 levels
- the population means associated with the 5 treatment levels are 40, 45, 55, 40, 30 ($\alpha_1=40$, $\alpha_2=45$, $\alpha_3=55$, $\alpha_4=40$, $\alpha_5=30$)
- the residuals are drawn from a normal distribution with a mean of 0 and a standard deviation of 3 ($\sigma^2=9$)
set.seed(1)
ngroups <- 5  #number of populations
nsample <- 10 #number of reps in each
pop.means <- c(40, 45, 55, 40, 30) #population means
sigma <- 3    #residual standard deviation
n <- ngroups * nsample #total sample size
eps <- rnorm(n, 0, sigma) #residuals
x <- gl(ngroups, nsample, n, lab=LETTERS[1:5]) #factor
means <- rep(pop.means, rep(nsample, ngroups))
X <- model.matrix(~x-1) #create a design matrix
y <- as.numeric(X %*% pop.means + eps)
data <- data.frame(y, x)
head(data) #print out the first six rows of the data set
         y x
1 38.12064 A
2 40.55093 A
3 37.49311 A
4 44.78584 A
5 40.98852 A
6 37.53859 A
$$ y_i = \alpha + \beta X_{i} + \varepsilon_i \hspace{4em}\varepsilon_i\sim{}\mathcal{N}(0, \sigma^2) $$
data.lm <- lm(y~x, data=data)
summary(data.lm)
Call: lm(formula = y ~ x, data = data) Residuals: Min 1Q Median 3Q Max -7.3906 -1.2752 0.3278 1.7931 4.3892 Coefficients: Estimate Std. Error t value Pr(>|t|) (Intercept) 40.39661 0.81333 49.668 < 2e-16 *** xB 5.34993 1.15023 4.651 2.91e-05 *** xC 14.20237 1.15023 12.347 4.74e-16 *** xD -0.03442 1.15023 -0.030 0.976 xE -9.99420 1.15023 -8.689 3.50e-11 *** --- Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 Residual standard error: 2.572 on 45 degrees of freedom Multiple R-squared: 0.9129, Adjusted R-squared: 0.9052 F-statistic: 117.9 on 4 and 45 DF, p-value: < 2.2e-16
modelString="
model {
   #Likelihood
   for (i in 1:n) {
      y[i]~dnorm(mean[i],tau)
      mean[i] <- inprod(beta[],X[i,])
   }
   #Priors
   for (i in 1:ngroups) {
      beta[i] ~ dnorm(0, 1.0E-6)
   }
   sigma ~ dunif(0, 100)
   tau <- 1 / (sigma * sigma)
   #Contrasts
   Group.means[1] <- beta[1]
   for (i in 2:ngroups) {
      Group.means[i] <- beta[i]+beta[1]
   }
   #Pairwise effects
   for (i in 1:ngroups) {
      for (j in 1:ngroups) {
         eff[i,j] <- Group.means[i] - Group.means[j]
      }
   }
   #Specific comparisons
   Comp1 <- Group.means[3] - Group.means[5]
   Comp2 <- (Group.means[1] + Group.means[2])/2 - (Group.means[3] + Group.means[4] + Group.means[5])/3
}
"
X <- model.matrix(~x, data)
data.list <- with(data, list(y=y, X=X, n=nrow(data), ngroups=ncol(X)))
data.jags <- jags(data=data.list, inits=NULL,
                  parameters.to.save=c('beta','sigma','eff','Comp1','Comp2'),
                  model.file=textConnection(modelString),
                  n.chains=3, n.iter=10000, n.burnin=2000, n.thin=100)
Compiling model graph
   Resolving undeclared variables
   Allocating nodes
Graph Size: 406

Initializing model
print(data.jags)
Inference for Bugs model at "6", fit using jags, 3 chains, each with 10000 iterations (first 2000 discarded), n.thin = 100 n.sims = 240 iterations saved mu.vect sd.vect 2.5% 25% 50% 75% 97.5% Rhat n.eff Comp1 24.082 1.190 21.973 23.314 24.100 24.837 26.545 1.006 210 Comp2 1.310 0.798 -0.236 0.771 1.382 1.809 2.765 1.001 240 beta[1] 40.440 0.844 38.720 39.847 40.472 40.974 42.135 1.011 140 beta[2] 5.326 1.222 2.887 4.578 5.278 6.102 7.821 1.021 120 beta[3] 14.133 1.224 11.678 13.358 14.106 14.953 16.338 0.999 240 beta[4] -0.124 1.155 -2.139 -0.883 -0.121 0.614 2.279 1.031 120 beta[5] -9.949 1.142 -11.916 -10.737 -9.991 -9.144 -7.772 1.005 240 eff[1,1] 0.000 0.000 0.000 0.000 0.000 0.000 0.000 1.000 1 eff[2,1] 5.326 1.222 2.887 4.578 5.278 6.102 7.821 1.021 120 eff[3,1] 14.133 1.224 11.678 13.358 14.106 14.953 16.338 0.999 240 eff[4,1] -0.124 1.155 -2.139 -0.883 -0.121 0.614 2.279 1.031 120 eff[5,1] -9.949 1.142 -11.916 -10.737 -9.991 -9.144 -7.772 1.005 240 eff[1,2] -5.326 1.222 -7.821 -6.102 -5.278 -4.578 -2.887 1.010 140 eff[2,2] 0.000 0.000 0.000 0.000 0.000 0.000 0.000 1.000 1 eff[3,2] 8.806 1.343 6.244 7.881 8.764 9.745 11.556 1.024 140 eff[4,2] -5.450 1.215 -7.740 -6.381 -5.380 -4.673 -3.025 0.996 240 eff[5,2] -15.276 1.222 -17.554 -15.996 -15.369 -14.446 -12.564 0.996 240 eff[1,3] -14.133 1.224 -16.338 -14.953 -14.106 -13.358 -11.678 0.999 240 eff[2,3] -8.806 1.343 -11.556 -9.745 -8.764 -7.881 -6.244 1.016 140 eff[3,3] 0.000 0.000 0.000 0.000 0.000 0.000 0.000 1.000 1 eff[4,3] -14.257 1.168 -16.241 -15.121 -14.245 -13.474 -11.660 1.018 98 eff[5,3] -24.082 1.190 -26.545 -24.837 -24.100 -23.314 -21.973 1.006 210 eff[1,4] 0.124 1.155 -2.279 -0.614 0.121 0.883 2.139 1.031 120 eff[2,4] 5.450 1.215 3.025 4.673 5.380 6.381 7.740 0.996 240 eff[3,4] 14.257 1.168 11.660 13.474 14.245 15.121 16.241 1.020 94 eff[4,4] 0.000 0.000 0.000 0.000 0.000 0.000 0.000 1.000 1 eff[5,4] -9.825 1.220 -12.114 -10.643 -9.854 -8.901 -7.713 1.001 240 eff[1,5] 9.949 1.142 7.772 9.144 9.991 10.737 11.916 1.005 240 eff[2,5] 15.276 1.222 12.564 14.446 15.369 15.996 17.554 0.996 240 eff[3,5] 24.082 1.190 21.973 23.314 24.100 24.837 26.545 1.006 210 eff[4,5] 9.825 1.220 7.713 8.901 9.854 10.643 12.114 0.999 240 eff[5,5] 0.000 0.000 0.000 0.000 0.000 0.000 0.000 1.000 1 sigma 2.655 0.298 2.153 2.462 2.622 2.823 3.228 0.998 240 deviance 237.918 3.727 232.555 235.288 237.245 239.878 246.813 1.010 170 For each parameter, n.eff is a crude measure of effective sample size, and Rhat is the potential scale reduction factor (at convergence, Rhat=1). DIC info (using the rule, pD = var(deviance)/2) pD = 6.9 and DIC = 244.8 DIC is an estimate of expected predictive error (lower deviance is better).
In preparation for fitting this model, we will also consider a number of contrasts (as we did for the Bayesian (JAGS) version of the analysis). Firstly, we will append missing response data in order to generate cell-means predictions (for the summary plot) and secondly, we will define linear combinations (contrasts) to provide the pairwise and planned comparisons.
newdata <- data.frame(x=levels(data$x), y=NA)
data.pred <- rbind(data, newdata)
#Define linear combinations (pairwise comparisons)
library(multcomp)
tuk.mat <- contrMat(n=table(newdata$x), type="Tukey")
Xmat <- model.matrix(~x, data=newdata)
pairwise.Xmat <- tuk.mat %*% Xmat
pairwise.Xmat
      (Intercept) xB xC xD xE
B - A           0  1  0  0  0
C - A           0  0  1  0  0
D - A           0  0  0  1  0
E - A           0  0  0  0  1
C - B           0 -1  1  0  0
D - B           0 -1  0  1  0
E - B           0 -1  0  0  1
D - C           0  0 -1  1  0
E - C           0  0 -1  0  1
E - D           0  0  0 -1  1
#Define linear combinations (planned contrasts)
comp.Xmat <- rbind(Comp1=c(0,0,1,0,-1),
                   Comp2=c(1/2,1/2,-1/3,-1/3,-1/3))
#make these appropriate for the treatment contrasts applied in the model fit
comp.Xmat <- cbind(0, comp.Xmat %*% contr.treatment(5))
rownames(comp.Xmat) <- c("Grp 3 vs 5","Grp 1,2 vs 3,4,5")
Xmat <- rbind(pairwise.Xmat, comp.Xmat)
Xmat
                 (Intercept)   xB         xC         xD         xE
B - A                      0  1.0  0.0000000  0.0000000  0.0000000
C - A                      0  0.0  1.0000000  0.0000000  0.0000000
D - A                      0  0.0  0.0000000  1.0000000  0.0000000
E - A                      0  0.0  0.0000000  0.0000000  1.0000000
C - B                      0 -1.0  1.0000000  0.0000000  0.0000000
D - B                      0 -1.0  0.0000000  1.0000000  0.0000000
E - B                      0 -1.0  0.0000000  0.0000000  1.0000000
D - C                      0  0.0 -1.0000000  1.0000000  0.0000000
E - C                      0  0.0 -1.0000000  0.0000000  1.0000000
E - D                      0  0.0  0.0000000 -1.0000000  1.0000000
Grp 3 vs 5                 0  0.0  1.0000000  0.0000000 -1.0000000
Grp 1,2 vs 3,4,5           0  0.5 -0.3333333 -0.3333333 -0.3333333
lincomb <- inla.make.lincombs(as.data.frame(Xmat))
Now let's fit the model.
#fit the model
data.inla1 <- inla(y~x, lincomb=lincomb, #include the linear combinations
                   data=data.pred,
                   control.compute=list(dic=TRUE, cpo=TRUE, waic=TRUE),
                   control.inla=list(lincomb.derived.only=TRUE))
#examine the regular summary
summary(data.inla1)
Call: c("inla(formula = y ~ x, data = data.pred, lincomb = lincomb, control.compute = list(dic = TRUE, ", " cpo = TRUE, waic = TRUE), control.inla = list(lincomb.derived.only = TRUE))" ) Time used: Pre-processing Running inla Post-processing Total 0.1363 0.0730 0.0552 0.2646 Fixed effects: mean sd 0.025quant 0.5quant 0.975quant mode kld (Intercept) 40.4027 0.8021 38.8235 40.4026 41.9806 40.4025 0 xB 5.3404 1.1346 3.1059 5.3405 7.5715 5.3407 0 xC 14.1871 1.1346 11.9524 14.1873 16.4180 14.1877 0 xD -0.0405 1.1346 -2.2748 -0.0405 2.1908 -0.0403 0 xE -9.9939 1.1346 -12.2279 -9.9939 -7.7624 -9.9939 0 Linear combinations (derived): ID mean sd 0.025quant 0.5quant 0.975quant mode kld lc01 1 5.3404 1.1346 3.1059 5.3405 7.5715 5.3407 0 lc02 2 14.1871 1.1346 11.9524 14.1873 16.4180 14.1877 0 lc03 3 -0.0405 1.1346 -2.2748 -0.0405 2.1908 -0.0403 0 lc04 4 -9.9939 1.1346 -12.2279 -9.9939 -7.7624 -9.9939 0 lc05 5 8.8467 1.1355 6.6106 8.8468 11.0799 8.8470 0 lc06 6 -5.3809 1.1355 -7.6166 -5.3810 -3.1474 -5.3810 0 lc07 7 -15.3342 1.1355 -17.5697 -15.3344 -13.1005 -15.3346 0 lc08 8 -14.2276 1.1355 -16.4632 -14.2278 -11.9939 -14.2280 0 lc09 9 -24.1810 1.1355 -26.4163 -24.1812 -21.9471 -24.1816 0 lc10 10 -9.9534 1.1355 -12.1890 -9.9535 -7.7198 -9.9536 0 lc11 11 24.1810 1.1355 21.9444 24.1812 26.4138 24.1816 0 lc12 12 1.2859 0.7326 -0.1565 1.2859 2.7269 1.2859 0 The model has no random effects Model hyperparameters: mean sd 0.025quant 0.5quant 0.975quant mode Precision for the Gaussian observations 0.1577 0.0325 0.1018 0.1551 0.229 0.1506 Expected number of effective parameters(std dev): 4.995(9e-04) Number of equivalent replicates : 10.01 Deviance Information Criterion (DIC) ...: 242.49 Effective number of parameters .........: 5.651 Watanabe-Akaike information criterion (WAIC) ...: 243.15 Effective number of parameters .................: 5.727 Marginal log-Likelihood: -142.21 CPO and PIT are computed Posterior marginals for linear predictor and fitted values computed
Parameter estimates are entirely consistent with the Frequentist and JAGS outcomes. Let's explore the various contrasts. These are captured in the summary.lincomb.derived element of the INLA list. To make this a little more informative (so we can see what each contrast represents), I will apply the row names from Xmat.
rownames(data.inla1$summary.lincomb.derived) <- rownames(Xmat)
data.inla1$summary.lincomb.derived
ID mean sd 0.025quant 0.5quant 0.975quant mode kld B - A 1 5.34035821 1.1345555 3.1059307 5.34045527 7.571501 5.34073924 0 C - A 2 14.18709661 1.1345572 11.9524362 14.18727158 16.418038 14.18770537 0 D - A 3 -0.04051494 1.1345548 -2.2748016 -0.04046525 2.190752 -0.04027242 0 E - A 4 -9.99387467 1.1345544 -12.2279022 -9.99391263 -7.762378 -9.99388837 0 C - B 5 8.84673840 1.1354933 6.6105973 8.84678312 11.079883 8.84696648 0 D - B 6 -5.38087314 1.1354931 -7.6166440 -5.38095391 -3.147399 -5.38101187 0 E - B 7 -15.33423287 1.1354941 -17.5697471 -15.33440142 -13.100526 -15.33462821 0 D - C 8 -14.22761155 1.1354940 -16.4631542 -14.22777033 -11.993931 -14.22797835 0 E - C 9 -24.18097128 1.1354959 -26.4162589 -24.18121785 -21.947056 -24.18159469 0 E - D 10 -9.95335973 1.1354934 -12.1890125 -9.95348082 -7.719779 -9.95361634 0 Grp 3 vs 5 11 24.18097128 1.1354959 21.9444267 24.18115124 26.413765 24.18159469 0 Grp 1,2 vs 3,4,5 12 1.28594339 0.7325942 -0.1565176 1.28589156 2.726922 1.28585466 0
Finally, it is time to extract the predicted values from the linear predictor and plot the cell means.
#summary figure
newdata <- cbind(newdata,
    data.inla1$summary.linear.predictor[(nrow(data)+1):nrow(data.pred),])
newdata <- reshape:::rename(newdata, c("0.025quant"="lower", "0.975quant"="upper"))
ggplot(newdata, aes(y=mean, x=x)) +
    geom_linerange(aes(ymin=lower, ymax=upper)) +
    geom_point() +
    theme_classic()
Factorial ANOVA
Imagine we had designed an experiment in which we measured the response ($y$) under a combination of two different potential influences (Factor A: levels $a1$ and $a2$; and Factor B: levels $b1$, $b2$ and $b3$), each combination replicated 10 times ($n=10$). As this section is mainly about the generation of artificial data (and not specifically about what to do with the data), understanding the actual details is optional and they can be safely skipped. Consequently, I have folded (toggled) this section away.
- the sample size per treatment=10
- factor A with 2 levels
- factor B with 3 levels
- the 6 effects parameters are 40, 15, 5, 0, -15,10 ($\mu_{a1b1}=40$, $\mu_{a2b1}=40+15=55$, $\mu_{a1b2}=40+5=45$, $\mu_{a1b3}=40+0=40$, $\mu_{a2b2}=40+15+5-15=45$ and $\mu_{a2b3}=40+15+0+10=65$)
- the data are drawn from normal distributions with a mean of 0 and standard deviation of 3 ($\sigma^2=9$)
set.seed(1)
nA <- 2  #number of levels of A
nB <- 3  #number of levels of B
nsample <- 10  #number of reps in each
A <- gl(nA, 1, nA, lab=paste("a", 1:nA, sep=""))
B <- gl(nB, 1, nB, lab=paste("b", 1:nB, sep=""))
data <- expand.grid(A=A, B=B, n=1:nsample)
X <- model.matrix(~A*B, data=data)
eff <- c(40, 15, 5, 0, -15, 10)
sigma <- 3  #residual standard deviation
n <- nrow(data)
eps <- rnorm(n, 0, sigma)  #residuals
data$y <- as.numeric(X %*% eff + eps)
head(data)  #print out the first six rows of the data set
A B n y 1 a1 b1 1 38.12064 2 a2 b1 1 55.55093 3 a1 b2 1 42.49311 4 a2 b2 1 49.78584 5 a1 b3 1 40.98852 6 a2 b3 1 62.53859
with(data,interaction.plot(A,B,y))
## ALTERNATIVELY, we could supply the population means
## and get the effect parameters from these.
## To correspond to the model matrix, enter the population
## means in the order of:
## a1b1, a2b1, a1b2, a2b2, a1b3, a2b3
pop.means <- as.matrix(c(40, 55, 45, 45, 40, 65), byrow=F)
## Generate a minimum model matrix for the effects
XX <- model.matrix(~A*B, expand.grid(A=factor(1:2), B=factor(1:3)))
## Use the solve() function to solve what are effectively simultaneous equations
(eff <- as.vector(solve(XX, pop.means)))
[1] 40 15 5 0 -15 10
data$y <- as.numeric(X %*% eff+eps)
$$ y_i = \alpha + \beta X_{i} + \varepsilon_i \hspace{4em}\varepsilon_i\sim{}\mathcal{N}(0, \sigma^2) $$
data.lm <- lm(y~A*B, data=data) summary(data.lm)
Call: lm(formula = y ~ A * B, data = data) Residuals: Min 1Q Median 3Q Max -7.3944 -1.5753 0.2281 1.5575 5.1909 Coefficients: Estimate Std. Error t value Pr(>|t|) (Intercept) 41.0988 0.8218 50.010 < 2e-16 *** Aa2 14.6515 1.1622 12.606 < 2e-16 *** Bb2 4.6386 1.1622 3.991 0.0002 *** Bb3 -0.7522 1.1622 -0.647 0.5202 Aa2:Bb2 -15.7183 1.6436 -9.563 3.24e-13 *** Aa2:Bb3 9.3352 1.6436 5.680 5.54e-07 *** --- Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 Residual standard error: 2.599 on 54 degrees of freedom Multiple R-squared: 0.9245, Adjusted R-squared: 0.9175 F-statistic: 132.3 on 5 and 54 DF, p-value: < 2.2e-16
modelString="
model {
  #Likelihood
  for (i in 1:n) {
    y[i]~dnorm(mean[i],tau)
    mean[i] <- inprod(beta[],X[i,])
  }
  #Priors and derivatives
  for (i in 1:p) {
    beta[i] ~ dnorm(0, 1.0E-6)  #prior
  }
  sigma ~ dunif(0, 100)
  tau <- 1 / (sigma * sigma)
}
"
X <- model.matrix(~A*B, data)
data.list <- with(data,
  list(y=y, X=X, n=nrow(data), p=ncol(X))
)
data.jags <- jags(data=data.list, inits=NULL,
  parameters.to.save=c('beta','sigma'),
  model.file=textConnection(modelString),
  n.chains=3, n.iter=10000, n.burnin=2000, n.thin=100)
Compiling model graph Resolving undeclared variables Allocating nodes Graph Size: 502 Initializing model
print(data.jags)
Inference for Bugs model at "8", fit using jags, 3 chains, each with 10000 iterations (first 2000 discarded), n.thin = 100 n.sims = 240 iterations saved mu.vect sd.vect 2.5% 25% 50% 75% 97.5% Rhat n.eff beta[1] 40.999 0.894 39.444 40.348 40.972 41.622 42.780 1.005 240 beta[2] 14.849 1.136 12.625 14.026 14.859 15.554 16.912 0.997 240 beta[3] 4.793 1.204 2.600 4.029 4.806 5.659 6.884 1.012 240 beta[4] -0.646 1.214 -3.104 -1.375 -0.585 0.182 1.544 0.996 240 beta[5] -16.021 1.739 -19.540 -16.976 -15.918 -14.945 -12.609 0.996 240 beta[6] 9.216 1.565 6.271 8.162 9.269 10.179 12.699 1.002 240 sigma 2.665 0.281 2.238 2.446 2.650 2.837 3.309 1.000 240 deviance 286.159 3.935 280.494 283.270 285.659 288.017 293.962 1.017 210 For each parameter, n.eff is a crude measure of effective sample size, and Rhat is the potential scale reduction factor (at convergence, Rhat=1). DIC info (using the rule, pD = var(deviance)/2) pD = 7.7 and DIC = 293.9 DIC is an estimate of expected predictive error (lower deviance is better).
newdata <- expand.grid(A=levels(data$A), B=levels(data$B), y=NA) data.pred <- rbind(subset(data, select=c(A,B,y)), newdata)
In addition to fitting the model in INLA, it might be useful to also derive the marginal effects of one of the factors (A) at each level of the other factor (B).
#Define linear combinations (planned contrasts)
Xmat <- model.matrix(~A*B, data=newdata)
cbind(newdata, Xmat)
A B y (Intercept) Aa2 Bb2 Bb3 Aa2:Bb2 Aa2:Bb3 1 a1 b1 NA 1 0 0 0 0 0 2 a2 b1 NA 1 1 0 0 0 0 3 a1 b2 NA 1 0 1 0 0 0 4 a2 b2 NA 1 1 1 0 1 0 5 a1 b3 NA 1 0 0 1 0 0 6 a2 b3 NA 1 1 0 1 0 1
library(plyr)
Xmat <- daply(cbind(newdata, Xmat), ~B, function(x) {
  x[2,] - x[1,]
})
Xmat
B A B y (Intercept) Aa2 Bb2 Bb3 Aa2:Bb2 Aa2:Bb3 b1 NA NA NA 0 1 0 0 0 0 b2 NA NA NA 0 1 0 0 1 0 b3 NA NA NA 0 1 0 0 0 1
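The same contrast matrix could arguably be built with base R indexing rather than plyr. A small sketch that assumes the row order of newdata shown above (the levels of A alternating within each level of B); since this version carries no A, B or y columns, it could be passed to inla.make.lincombs() directly.
Xmat.alt <- model.matrix(~A*B, data=newdata)
Xmat.alt <- Xmat.alt[c(2, 4, 6), ] - Xmat.alt[c(1, 3, 5), ]  #a2 minus a1 within b1, b2 and b3
rownames(Xmat.alt) <- levels(data$B)
Xmat.alt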
lincomb <- inla.make.lincombs(as.data.frame(Xmat[,-1:-3]))
Now let's fit the model.
#fit the model
data.inla <- inla(y~A*B, data=data.pred,
    lincomb=lincomb,
    control.compute=list(dic=TRUE, cpo=TRUE, waic=TRUE),
    control.inla=list(lincomb.derived.only=TRUE))
#examine the regular summary - note there are no changes from the first fit
summary(data.inla)
Call: c("inla(formula = y ~ A * B, data = data.pred, lincomb = lincomb, ", " control.compute = list(dic = TRUE, cpo = TRUE, waic = TRUE), ", " control.inla = list(lincomb.derived.only = TRUE))") Time used: Pre-processing Running inla Post-processing Total 0.0820 0.0597 0.0309 0.1727 Fixed effects: mean sd 0.025quant 0.5quant 0.975quant mode kld (Intercept) 41.1152 0.8119 39.5179 41.1150 42.7118 41.1147 0 Aa2 14.6213 1.1474 12.3621 14.6216 16.8761 14.6223 0 Bb2 4.6088 1.1479 2.3485 4.6091 6.8648 4.6098 0 Bb3 -0.7619 1.1479 -3.0216 -0.7619 1.4946 -0.7616 0 Aa2:Bb2 -15.6643 1.6225 -18.8558 -15.6650 -12.4729 -15.6661 0 Aa2:Bb3 9.3525 1.6225 6.1598 9.3523 12.5428 9.3519 0 Linear combinations (derived): ID mean sd 0.025quant 0.5quant 0.975quant mode kld lc1 1 14.6213 1.1474 12.3621 14.6216 16.8761 14.6223 0 lc2 2 -1.0430 1.1491 -3.3039 -1.0434 1.2168 -1.0438 0 lc3 3 23.9738 1.1491 21.7117 23.9739 26.2326 23.9742 0 The model has no random effects Model hyperparameters: mean sd 0.025quant 0.5quant 0.975quant mode Precision for the Gaussian observations 0.1534 0.029 0.103 0.1513 0.2164 0.1476 Expected number of effective parameters(std dev): 5.991(0.0014) Number of equivalent replicates : 10.02 Deviance Information Criterion (DIC) ...: 292.00 Effective number of parameters .........: 6.649 Watanabe-Akaike information criterion (WAIC) ...: 292.64 Effective number of parameters .................: 6.634 Marginal log-Likelihood: -169.80 CPO and PIT are computed Posterior marginals for linear predictor and fitted values computed
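As with the earlier example, the derived combinations are easier to read if we label them; here each row is the effect of a2 (relative to a1) at the corresponding level of B, taken from the rownames of the Xmat constructed above.
rownames(data.inla$summary.lincomb.derived) <- paste('a2 - a1 at', rownames(Xmat))
data.inla$summary.lincomb.derived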
Finally, it is time to extract the predicted values from the linear predictor and plot the cell means.
#summary figure
newdata <- cbind(newdata,
    data.inla$summary.linear.predictor[(nrow(data)+1):nrow(data.pred),])
newdata <- reshape:::rename(newdata, c("0.025quant"="lower", "0.975quant"="upper"))
head(newdata)
A B y mean sd lower 0.5quant upper mode kld predictor.61 a1 b1 NA 41.11522 0.8118338 39.52008 41.11503 42.71099 41.11468 1.674684e-09 predictor.62 a2 b1 NA 55.73648 0.8123780 54.13916 55.73664 57.33253 55.73696 1.885519e-09 predictor.63 a1 b2 NA 45.72401 0.8126511 44.12616 45.72417 47.32061 45.72447 1.878400e-09 predictor.64 a2 b2 NA 44.68097 0.8129257 43.08342 44.68085 46.27874 44.68063 1.702529e-09 predictor.65 a1 b3 NA 40.35329 0.8126503 38.75616 40.35321 41.95042 40.35307 1.732000e-09 predictor.66 a2 b3 NA 64.32707 0.8129237 62.72894 64.32714 65.92440 64.32729 1.822437e-09
ggplot(newdata, aes(y=mean, x=B, shape=A)) + geom_linerange(aes(ymin=lower, ymax=upper))+ geom_point() + geom_line(aes(x=as.numeric(B), linetype=A)) + theme_classic()+ theme(legend.position=c(0,1),legend.justification=c(0,1))
Logistic regression (binary response)
Let's say we wanted to model the presence/absence of an item ($y$) against a continuous predictor ($x$). As this section is mainly about the generation of artificial data (and not specifically about what to do with the data), understanding the actual details is optional and they can be safely skipped. Consequently, I have folded (toggled) this section away.
- the sample size = 20
- the continuous $x$ variable is a random uniform spread of measurements between 1 and 20
- the rate of change in log odds ratio per unit change in x (slope) = 0.5. The magnitude of the slope is an indicator of how abruptly the log odds ratio changes. A very abrupt change would occur if there was very little variability in the trend. That is, a large slope would be indicative of a threshold effect. A lower slope indicates a gradual change in odds ratio and thus a less obvious and precise switch between the likelihood of 1 vs 0.
- the inflection point (point where the slope of the line is greatest) = 10. The inflection point indicates the value of the predictor ($x$) at which the probability is 0.5 (log odds of 0), switching between 1 being more likely and 0 being more likely. This is also known as the LD50.
- the intercept (which has no real interpretation) is equal to the negative of the slope multiplied by the inflection point
- to generate the values of $y$ expected at each $x$ value, we evaluate the linear predictor. These expected values are then back-transformed onto the (0,1) scale using the inverse logit function $\frac{e^{linear~predictor}}{1+e^{linear~predictor}}$ (see the quick check after this list)
- finally, we generate $y$ values by using the expected $y$ values as probabilities when drawing random numbers from a binomial distribution. This step adds random noise to the expected $y$ values and returns only 1's and 0's.
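As a quick sanity check on the inverse-logit step described above, the same transformation is available in base R as plogis(); the linear predictor below is made up purely for illustration.
#the hand-coded inverse logit is equivalent to plogis()
lp <- -5 + 0.5 * (1:20)                       #a hypothetical linear predictor
all.equal(exp(lp)/(1 + exp(lp)), plogis(lp))  #TRUE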
set.seed(8)
#The number of samples
n.x <- 20
#Create x values that are uniformly distributed throughout the range of 1 to 20
x <- sort(runif(n = n.x, min = 1, max = 20))
#The slope is the rate of change in log odds ratio for each unit change in x
# the smaller the slope, the slower the change (more variability in data too)
slope = 0.5
#Inflection point is where the slope of the line is greatest
#this is also the LD50 point
inflect <- 10
#Intercept (no interpretation)
intercept <- -1*(slope*inflect)
#The linear predictor
linpred <- intercept + slope*x
#Predicted y values
y.pred <- exp(linpred)/(1+exp(linpred))
#Add some noise and make binary (n.y is generated here but not used further)
n.y <- rbinom(n=n.x, 20, p=0.9)
y <- rbinom(n = n.x, size=1, prob = y.pred)
dat <- data.frame(y, x)
$$ \begin{align} y_i &\sim{} Bin(1, \pi_i)\\ log\left(\frac{\pi_i}{1-\pi_i}\right)&=\beta_0+\beta_1x_i\\ \end{align} $$
dat.glm <- glm(y~x, data=dat, family="binomial") summary(dat.glm)
Call: glm(formula = y ~ x, family = "binomial", data = dat) Deviance Residuals: Min 1Q Median 3Q Max -1.97157 -0.33665 -0.08191 0.30035 1.59628 Coefficients: Estimate Std. Error z value Pr(>|z|) (Intercept) -6.9899 3.1599 -2.212 0.0270 * x 0.6559 0.2936 2.234 0.0255 * --- Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 (Dispersion parameter for binomial family taken to be 1) Null deviance: 27.526 on 19 degrees of freedom Residual deviance: 11.651 on 18 degrees of freedom AIC: 15.651 Number of Fisher Scoring iterations: 6
modelString=" model{ for (i in 1:N) { y[i] ~ dbern(p[i]) logit(p[i]) <- max(-20,min(20,beta0+beta1*x[i])) } beta0 ~ dnorm(0,1.0E-06) beta1 ~ dnorm(0,1.0E-06) } " dat.list <- with(dat, list(y=y, x=x, N=nrow(dat))) dat.jags <- jags(data=dat.list,model.file=textConnection(modelString), param=c('beta0','beta1'), n.chains=3, n.iter=20000, n.burnin=10000, n.thin=100)
Compiling model graph Resolving undeclared variables Allocating nodes Graph Size: 147 Initializing model
print(dat.jags)
Inference for Bugs model at "9", fit using jags, 3 chains, each with 20000 iterations (first 10000 discarded), n.thin = 100 n.sims = 300 iterations saved mu.vect sd.vect 2.5% 25% 50% 75% 97.5% Rhat n.eff beta0 -9.497 3.970 -18.493 -12.068 -8.922 -6.449 -3.740 1.004 300 beta1 0.898 0.370 0.353 0.612 0.844 1.123 1.772 1.003 300 deviance 13.872 2.108 11.683 12.301 13.291 14.811 19.290 1.018 200 For each parameter, n.eff is a crude measure of effective sample size, and Rhat is the potential scale reduction factor (at convergence, Rhat=1). DIC info (using the rule, pD = var(deviance)/2) pD = 2.2 and DIC = 16.1 DIC is an estimate of expected predictive error (lower deviance is better).
library(rstan) modelString=" data { int<lower=1> n; int<lower=1> nX; int y[n]; matrix [n,nX] X; } parameters { vector [nX] beta; } transformed parameters { vector [n] mu; mu<-X*beta; } model { beta ~ normal(0,1000); y ~ binomial_logit(1, mu); } " Xmat <- model.matrix(~x,data=dat) dat.list <- with(dat, list(y = y, X = Xmat, nX=ncol(Xmat),n = nrow(dat))) library(rstan) dat.rstan <- stan(data=dat.list, model_code=modelString, chains=3, iter=1000, warmup=500, thin=2, save_dso=TRUE )
SAMPLING FOR MODEL '8378cae68bcb88c343bcb3d714aeb387' NOW (CHAIN 1). Chain 1, Iteration: 1 / 1000 [ 0%] (Warmup) Chain 1, Iteration: 100 / 1000 [ 10%] (Warmup) Chain 1, Iteration: 200 / 1000 [ 20%] (Warmup) Chain 1, Iteration: 300 / 1000 [ 30%] (Warmup) Chain 1, Iteration: 400 / 1000 [ 40%] (Warmup) Chain 1, Iteration: 500 / 1000 [ 50%] (Warmup) Chain 1, Iteration: 501 / 1000 [ 50%] (Sampling) Chain 1, Iteration: 600 / 1000 [ 60%] (Sampling) Chain 1, Iteration: 700 / 1000 [ 70%] (Sampling) Chain 1, Iteration: 800 / 1000 [ 80%] (Sampling) Chain 1, Iteration: 900 / 1000 [ 90%] (Sampling) Chain 1, Iteration: 1000 / 1000 [100%] (Sampling)# # Elapsed Time: 0.060304 seconds (Warm-up) # 0.053233 seconds (Sampling) # 0.113537 seconds (Total) # SAMPLING FOR MODEL '8378cae68bcb88c343bcb3d714aeb387' NOW (CHAIN 2). Chain 2, Iteration: 1 / 1000 [ 0%] (Warmup) Chain 2, Iteration: 100 / 1000 [ 10%] (Warmup) Chain 2, Iteration: 200 / 1000 [ 20%] (Warmup) Chain 2, Iteration: 300 / 1000 [ 30%] (Warmup) Chain 2, Iteration: 400 / 1000 [ 40%] (Warmup) Chain 2, Iteration: 500 / 1000 [ 50%] (Warmup) Chain 2, Iteration: 501 / 1000 [ 50%] (Sampling) Chain 2, Iteration: 600 / 1000 [ 60%] (Sampling) Chain 2, Iteration: 700 / 1000 [ 70%] (Sampling) Chain 2, Iteration: 800 / 1000 [ 80%] (Sampling) Chain 2, Iteration: 900 / 1000 [ 90%] (Sampling) Chain 2, Iteration: 1000 / 1000 [100%] (Sampling)# # Elapsed Time: 0.078958 seconds (Warm-up) # 0.046298 seconds (Sampling) # 0.125256 seconds (Total) # SAMPLING FOR MODEL '8378cae68bcb88c343bcb3d714aeb387' NOW (CHAIN 3). Chain 3, Iteration: 1 / 1000 [ 0%] (Warmup) Chain 3, Iteration: 100 / 1000 [ 10%] (Warmup) Chain 3, Iteration: 200 / 1000 [ 20%] (Warmup) Chain 3, Iteration: 300 / 1000 [ 30%] (Warmup) Chain 3, Iteration: 400 / 1000 [ 40%] (Warmup) Chain 3, Iteration: 500 / 1000 [ 50%] (Warmup) Chain 3, Iteration: 501 / 1000 [ 50%] (Sampling) Chain 3, Iteration: 600 / 1000 [ 60%] (Sampling) Chain 3, Iteration: 700 / 1000 [ 70%] (Sampling) Chain 3, Iteration: 800 / 1000 [ 80%] (Sampling) Chain 3, Iteration: 900 / 1000 [ 90%] (Sampling) Chain 3, Iteration: 1000 / 1000 [100%] (Sampling)# # Elapsed Time: 0.063185 seconds (Warm-up) # 0.053685 seconds (Sampling) # 0.11687 seconds (Total) #
print(dat.rstan, pars=c('beta'))
Inference for Stan model: 8378cae68bcb88c343bcb3d714aeb387. 3 chains, each with iter=1000; warmup=500; thin=2; post-warmup draws per chain=250, total post-warmup draws=750. mean se_mean sd 2.5% 25% 50% 75% 97.5% n_eff Rhat beta[1] -9.56 0.28 3.86 -18.39 -11.73 -9.04 -6.74 -3.78 189 1 beta[2] 0.90 0.03 0.36 0.36 0.64 0.85 1.09 1.71 193 1 Samples were drawn using NUTS(diag_e) at Mon Feb 22 12:02:04 2016. For each parameter, n_eff is a crude measure of effective sample size, and Rhat is the potential scale reduction factor on split chains (at convergence, Rhat=1).
library(brms) dat.brm <- brm(y~x, data=dat, family='binomial', prior=c(set_prior('normal(0,1000)', class='b')), n.chains=3, n.iter=1000, warmup=500, n.thin=2 )
SAMPLING FOR MODEL 'binomial(logit) brms-model' NOW (CHAIN 1). Chain 1, Iteration: 1 / 1000 [ 0%] (Warmup) Chain 1, Iteration: 100 / 1000 [ 10%] (Warmup) Chain 1, Iteration: 200 / 1000 [ 20%] (Warmup) Chain 1, Iteration: 300 / 1000 [ 30%] (Warmup) Chain 1, Iteration: 400 / 1000 [ 40%] (Warmup) Chain 1, Iteration: 500 / 1000 [ 50%] (Warmup) Chain 1, Iteration: 501 / 1000 [ 50%] (Sampling) Chain 1, Iteration: 600 / 1000 [ 60%] (Sampling) Chain 1, Iteration: 700 / 1000 [ 70%] (Sampling) Chain 1, Iteration: 800 / 1000 [ 80%] (Sampling) Chain 1, Iteration: 900 / 1000 [ 90%] (Sampling) Chain 1, Iteration: 1000 / 1000 [100%] (Sampling)# # Elapsed Time: 0.061118 seconds (Warm-up) # 0.038603 seconds (Sampling) # 0.099721 seconds (Total) # SAMPLING FOR MODEL 'binomial(logit) brms-model' NOW (CHAIN 2). Chain 2, Iteration: 1 / 1000 [ 0%] (Warmup) Chain 2, Iteration: 100 / 1000 [ 10%] (Warmup) Chain 2, Iteration: 200 / 1000 [ 20%] (Warmup) Chain 2, Iteration: 300 / 1000 [ 30%] (Warmup) Chain 2, Iteration: 400 / 1000 [ 40%] (Warmup) Chain 2, Iteration: 500 / 1000 [ 50%] (Warmup) Chain 2, Iteration: 501 / 1000 [ 50%] (Sampling) Chain 2, Iteration: 600 / 1000 [ 60%] (Sampling) Chain 2, Iteration: 700 / 1000 [ 70%] (Sampling) Chain 2, Iteration: 800 / 1000 [ 80%] (Sampling) Chain 2, Iteration: 900 / 1000 [ 90%] (Sampling) Chain 2, Iteration: 1000 / 1000 [100%] (Sampling)# # Elapsed Time: 0.070109 seconds (Warm-up) # 0.048011 seconds (Sampling) # 0.11812 seconds (Total) # SAMPLING FOR MODEL 'binomial(logit) brms-model' NOW (CHAIN 3). Chain 3, Iteration: 1 / 1000 [ 0%] (Warmup) Chain 3, Iteration: 100 / 1000 [ 10%] (Warmup) Chain 3, Iteration: 200 / 1000 [ 20%] (Warmup) Chain 3, Iteration: 300 / 1000 [ 30%] (Warmup) Chain 3, Iteration: 400 / 1000 [ 40%] (Warmup) Chain 3, Iteration: 500 / 1000 [ 50%] (Warmup) Chain 3, Iteration: 501 / 1000 [ 50%] (Sampling) Chain 3, Iteration: 600 / 1000 [ 60%] (Sampling) Chain 3, Iteration: 700 / 1000 [ 70%] (Sampling) Chain 3, Iteration: 800 / 1000 [ 80%] (Sampling) Chain 3, Iteration: 900 / 1000 [ 90%] (Sampling) Chain 3, Iteration: 1000 / 1000 [100%] (Sampling)# # Elapsed Time: 0.130434 seconds (Warm-up) # 0.058031 seconds (Sampling) # 0.188465 seconds (Total) #
summary(dat.brm)
Family: binomial (logit) Formula: y ~ x Data: dat (Number of observations: 20) Samples: 3 chains, each with n.iter = 1000; n.warmup = 500; n.thin = 2; total post-warmup samples = 750 WAIC: 17.16 Fixed Effects: Estimate Est.Error l-95% CI u-95% CI Eff.Sample Rhat Intercept -9.43 4.09 -18.32 -2.80 219 1 x 0.89 0.38 0.29 1.68 215 1 Samples were drawn using NUTS(diag_e). For each parameter, Eff.Sample is a crude measure of effective sample size, and Rhat is the potential scale reduction factor on split chains (at convergence, Rhat = 1).
stancode(dat.brm)
functions { } data { int<lower=1> N; # number of observations int Y[N]; # response variable int<lower=1> K; # number of fixed effects matrix[N, K] X; # FE design matrix int trials; # number of trials } transformed data { } parameters { real b_Intercept; # fixed effects Intercept vector[K] b; # fixed effects } transformed parameters { vector[N] eta; # linear predictor # compute linear predictor eta <- X * b + b_Intercept; } model { # prior specifications b_Intercept ~ normal(0,1000); b ~ normal(0,1000); # likelihood contribution Y ~ binomial_logit(trials, eta); } generated quantities { }
fitted <- subset(dat, select=c(x,y)) fitted$y <- NA newdata <- expand.grid(x=seq(min(dat$x), max(dat$x), len=100), y=NA) dat.pred <- rbind(dat,fitted,newdata)
Now let's fit the model.
#fit the model
dat.inla <- inla(y~x, family='binomial',
    data=dat.pred,
    control.family=list(link='logit'),
    control.predictor=list(link=1, compute=TRUE),
    control.compute=list(dic=TRUE, cpo=TRUE, waic=TRUE))
#examine the regular summary
summary(dat.inla)
Call: c("inla(formula = y ~ x, family = \"binomial\", data = dat.pred, control.compute = list(dic = TRUE, ", " cpo = TRUE, waic = TRUE), control.predictor = list(link = 1, ", " compute = TRUE), control.family = list(link = \"logit\"))" ) Time used: Pre-processing Running inla Post-processing Total 0.0679 0.0406 0.0192 0.1277 Fixed effects: mean sd 0.025quant 0.5quant 0.975quant mode kld (Intercept) -7.5996 3.1591 -14.6917 -7.2414 -2.3638 -6.4312 0 x 0.7134 0.2935 0.2284 0.6797 1.3727 0.6036 0 The model has no random effects The model has no hyperparameters Expected number of effective parameters(std dev): 2.00(0.00) Number of equivalent replicates : 10.00 Deviance Information Criterion (DIC) ...: 15.37 Effective number of parameters .........: 1.847 Watanabe-Akaike information criterion (WAIC) ...: 15.74 Effective number of parameters .................: 1.875 Marginal log-Likelihood: -9.883 CPO and PIT are computed Posterior marginals for linear predictor and fitted values computed
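Since the estimates above are on the log-odds (logit) scale, it can be handy to express the slope as an odds ratio. A minimal sketch, assuming the fitted object has retained its marginals.fixed element (which it does by default):
#transform the posterior marginal of the slope onto the odds-ratio scale
or.marg <- inla.tmarginal(function(x) exp(x), dat.inla$marginals.fixed[['x']])
inla.zmarginal(or.marg)  #posterior summary of the odds ratio per unit change in x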
newdata <- cbind(newdata, dat.inla$summary.fitted.values[(nrow(dat)+nrow(fitted)+1):nrow(dat.pred),]) head(newdata)
x y mean sd 0.025quant 0.5quant 0.975quant mode fitted.predictor.041 1.024733 NA 0.01285490 0.03964898 1.677056e-06 0.001438164 0.1066886 2.284295e-10 fitted.predictor.042 1.203403 NA 0.01350352 0.04040270 2.134741e-06 0.001623817 0.1109843 5.081955e-10 fitted.predictor.043 1.382074 NA 0.01419668 0.04119379 2.716943e-06 0.001833402 0.1154430 1.082068e-09 fitted.predictor.044 1.560745 NA 0.01493801 0.04202428 3.457403e-06 0.002069996 0.1200704 2.214625e-09 fitted.predictor.045 1.739415 NA 0.01573143 0.04289634 4.398962e-06 0.002337068 0.1248719 4.387127e-09 fitted.predictor.046 1.918086 NA 0.01658128 0.04381226 5.595989e-06 0.002638525 0.1298534 8.434306e-09
newdata <- reshape:::rename(newdata, c("0.025quant"="lower", "0.975quant"="upper"))
#fitted <- cbind(fitted,
#    dat.inla$summary.fitted.values[(nrow(dat)+1):(nrow(dat)+nrow(fitted)),])
#fitted <- reshape:::rename(fitted, c("0.025quant"="lower", "0.975quant"="upper"))
#fitted$partial.obs=fitted$mean + (fitted$mean - dat$y)
#ndata <- data.splt
#ndata$fit <- fitted$mean
#ndata$pred <- pred$mean
#ndata$Res <- (ndata$fit - data.splt$y)
#ndata$Pobs <- ndata$pred + ndata$Res
#head(ndata)
library(grid)
ggplot(newdata, aes(y=mean, x=x)) +
    geom_blank() +
    geom_point(data=dat, aes(y=y, x=x)) +
    geom_ribbon(aes(ymin=lower, ymax=upper), fill='blue', alpha=0.2) +
    geom_line() +
    theme_classic() +
    theme(legend.key.width=unit(2,'lines'), legend.position=c(0,1),
          legend.justification=c(0,1))
Grouped binomial regression (proportional counts)
Recall that the binomial distribution represents the density (probability) of all possible successes out of a total of N trials. Hence the binomial distribution is also a suitable error distribution for analysing grouped binary data. An example of group data might be the number of seeds that germinate out of a total of 10 seeds, or the number of female kangaroos with joeys out of a total of 120 female kangaroos.
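As an aside, grouped binomial data can be supplied to glm() either as a two-column matrix of successes and failures or as observed proportions with the trial sizes passed as prior weights. A tiny sketch with made-up numbers (the data frame eg is purely illustrative):
#two equivalent ways of specifying a grouped binomial model
eg <- data.frame(success=c(8, 5, 2), total=c(10, 9, 10), x=c(1, 2, 3))
m1 <- glm(cbind(success, total - success) ~ x, data=eg, family=binomial)
m2 <- glm(success/total ~ x, weights=total, data=eg, family=binomial)
coef(m1)
coef(m2)  #identical estimates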
For this demonstration, we will model the number of successes against a uniformly distributed predictor (x). The number of trials in each group (level of the predictor) will vary slightly (yet randomly) so as to mimic complications that inevitably occur in real experiments.
- the number of levels of $x$ = 10
- the continuous $x$ variable is a random uniform spread of measurements between 10 and 20
- the rate of change in log odds ratio per unit change in x (slope) = -0.25. The magnitude of the slope is an indicator of how abruptly the log odds ratio changes. A very abrupt change would occur if there was very little variability in the trend. That is, a large slope would be indicative of a threshold effect. A lower slope indicates a gradual change in odds ratio and thus a less obvious and precise switch between the likelihood of 1 vs 0.
- the inflection point (point where the slope of the line is greatest) = 15. The inflection point indicates the value of the predictor ($x$) at which the probability of success is 0.5 (log odds of 0), switching between successes being more likely and failures being more likely. This is also known as the LD50.
- the intercept (which has no real interpretation) is equal to the negative of the slope multiplied by the inflection point
- to generate the values of $y$ expected at each $x$ value, we evaluate the linear predictor.
- generate random numbers of trials per group by drawing integers out of a binomial distribution with a total size of 20 and probability of 0.9. Hence the maximum number of trials per group will be 20, yet some will be less than that.
- the number of success per group are drawn from a binomial distribution with a total size equal to the trial sizes generated in the previous step and probabilities equal to the expected $y$ values
- finally, the number of failures per group are calculated as the difference between the total trial size and number of successes per group.
set.seed(8)
#The number of levels of x
n.x <- 10
#Create x values that are uniformly distributed throughout the range of 10 to 20
x <- sort(runif(n = n.x, min = 10, max = 20))
#The slope is the rate of change in log odds ratio for each unit change in x
# the smaller the slope, the slower the change (more variability in data too)
slope = -.25
#Inflection point is where the slope of the line is greatest
#this is also the LD50 point
inflect <- 15
#Intercept (no interpretation)
intercept <- -1*(slope*inflect)
#The linear predictor
linpred <- intercept + slope*x
#Predicted y values
y.pred <- exp(linpred)/(1+exp(linpred))
#Generate the trial sizes and the number of successes and failures per group
n.trial <- rbinom(n=n.x, 20, prob=0.9)
success <- rbinom(n = n.x, size = n.trial, prob = y.pred)
failure <- n.trial - success
dat <- data.frame(success, failure, x)
$$ \begin{align} y_i &\sim{} Bin(n_i, \pi_i)\\ log\left(\frac{\pi_i}{1-\pi_i}\right)&=\beta_0+\beta_1x_i\\ \end{align} $$ where $n_i$ is the number of trials in group $i$.
dat<-within(dat, total<-success+failure) dat.glm <- glm(cbind(success,failure)~x, data=dat, family="binomial", weight=total) summary(dat.glm)
Call: glm(formula = cbind(success, failure) ~ x, family = "binomial", data = dat, weights = total) Deviance Residuals: Min 1Q Median 3Q Max -6.3073 -1.2530 0.5005 2.1157 5.8746 Coefficients: Estimate Std. Error z value Pr(>|z|) (Intercept) 4.0472 0.2525 16.03 <2e-16 *** x -0.2591 0.0158 -16.40 <2e-16 *** --- Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 (Dispersion parameter for binomial family taken to be 1) Null deviance: 394.60 on 9 degrees of freedom Residual deviance: 105.33 on 8 degrees of freedom AIC: 716.15 Number of Fisher Scoring iterations: 3
dat.list <- with(dat, list(success=success, total=success+failure, x=x, N=nrow(dat))) modelString=" model{ for (i in 1:N) { success[i] ~ dbin(p[i],total[i]) logit(p[i]) <- max(-20,min(20,beta0+beta1*x[i])) } beta0 ~ dnorm(0,1.0E-06) beta1 ~ dnorm(0,1.0E-06) } " writeLines(modelString, con='../downloads/BUGSscripts/tut11.4bS2.bug') library(R2jags) dat.logit.jags <- jags(data=dat.list,model.file='../downloads/BUGSscripts/tut11.4bS2.bug', param=c('beta0','beta1'), n.chains=3, n.iter=200000, n.burnin=100000, n.thin=100)
Compiling model graph Resolving undeclared variables Allocating nodes Graph Size: 87 Initializing model
print(dat.logit.jags)
Inference for Bugs model at "../downloads/BUGSscripts/tut11.4bS2.bug", fit using jags, 3 chains, each with 2e+05 iterations (first 1e+05 discarded), n.thin = 100 n.sims = 3000 iterations saved mu.vect sd.vect 2.5% 25% 50% 75% 97.5% Rhat n.eff beta0 4.008 1.086 1.892 3.305 3.972 4.753 6.148 1.002 1300 beta1 -0.256 0.068 -0.387 -0.302 -0.255 -0.210 -0.125 1.002 1300 deviance 40.637 1.984 38.680 39.218 40.011 41.478 45.896 1.001 3000 For each parameter, n.eff is a crude measure of effective sample size, and Rhat is the potential scale reduction factor (at convergence, Rhat=1). DIC info (using the rule, pD = var(deviance)/2) pD = 2.0 and DIC = 42.6 DIC is an estimate of expected predictive error (lower deviance is better).
library(rstan) modelString=" data { int<lower=1> n; int<lower=1> nX; int y[n]; int nTrials[n]; matrix [n,nX] X; } parameters { vector [nX] beta; } transformed parameters { vector [n] mu; mu<-X*beta; } model { beta ~ normal(0,1000); y ~ binomial_logit(nTrials, mu); } " Xmat <- model.matrix(~x,data=dat) dat.list <- with(dat, list(y = success, nTrials=success+failure, X = Xmat, nX=ncol(Xmat),n = nrow(dat))) library(rstan) dat.rstan <- stan(data=dat.list, model_code=modelString, chains=3, iter=1000, warmup=500, thin=2, save_dso=TRUE )
SAMPLING FOR MODEL '6c0358c6c330110e714238536a9a5894' NOW (CHAIN 1). Chain 1, Iteration: 1 / 1000 [ 0%] (Warmup) Chain 1, Iteration: 100 / 1000 [ 10%] (Warmup) Chain 1, Iteration: 200 / 1000 [ 20%] (Warmup) Chain 1, Iteration: 300 / 1000 [ 30%] (Warmup) Chain 1, Iteration: 400 / 1000 [ 40%] (Warmup) Chain 1, Iteration: 500 / 1000 [ 50%] (Warmup) Chain 1, Iteration: 501 / 1000 [ 50%] (Sampling) Chain 1, Iteration: 600 / 1000 [ 60%] (Sampling) Chain 1, Iteration: 700 / 1000 [ 70%] (Sampling) Chain 1, Iteration: 800 / 1000 [ 80%] (Sampling) Chain 1, Iteration: 900 / 1000 [ 90%] (Sampling) Chain 1, Iteration: 1000 / 1000 [100%] (Sampling)# # Elapsed Time: 0.060866 seconds (Warm-up) # 0.032433 seconds (Sampling) # 0.093299 seconds (Total) # SAMPLING FOR MODEL '6c0358c6c330110e714238536a9a5894' NOW (CHAIN 2). Chain 2, Iteration: 1 / 1000 [ 0%] (Warmup) Chain 2, Iteration: 100 / 1000 [ 10%] (Warmup) Chain 2, Iteration: 200 / 1000 [ 20%] (Warmup) Chain 2, Iteration: 300 / 1000 [ 30%] (Warmup) Chain 2, Iteration: 400 / 1000 [ 40%] (Warmup) Chain 2, Iteration: 500 / 1000 [ 50%] (Warmup) Chain 2, Iteration: 501 / 1000 [ 50%] (Sampling) Chain 2, Iteration: 600 / 1000 [ 60%] (Sampling) Chain 2, Iteration: 700 / 1000 [ 70%] (Sampling) Chain 2, Iteration: 800 / 1000 [ 80%] (Sampling) Chain 2, Iteration: 900 / 1000 [ 90%] (Sampling) Chain 2, Iteration: 1000 / 1000 [100%] (Sampling)# # Elapsed Time: 0.055367 seconds (Warm-up) # 0.038589 seconds (Sampling) # 0.093956 seconds (Total) # SAMPLING FOR MODEL '6c0358c6c330110e714238536a9a5894' NOW (CHAIN 3). Chain 3, Iteration: 1 / 1000 [ 0%] (Warmup) Chain 3, Iteration: 100 / 1000 [ 10%] (Warmup) Chain 3, Iteration: 200 / 1000 [ 20%] (Warmup) Chain 3, Iteration: 300 / 1000 [ 30%] (Warmup) Chain 3, Iteration: 400 / 1000 [ 40%] (Warmup) Chain 3, Iteration: 500 / 1000 [ 50%] (Warmup) Chain 3, Iteration: 501 / 1000 [ 50%] (Sampling) Chain 3, Iteration: 600 / 1000 [ 60%] (Sampling) Chain 3, Iteration: 700 / 1000 [ 70%] (Sampling) Chain 3, Iteration: 800 / 1000 [ 80%] (Sampling) Chain 3, Iteration: 900 / 1000 [ 90%] (Sampling) Chain 3, Iteration: 1000 / 1000 [100%] (Sampling)# # Elapsed Time: 0.126566 seconds (Warm-up) # 0.072445 seconds (Sampling) # 0.199011 seconds (Total) #
print(dat.rstan, pars=c('beta'))
Inference for Stan model: 6c0358c6c330110e714238536a9a5894. 3 chains, each with iter=1000; warmup=500; thin=2; post-warmup draws per chain=250, total post-warmup draws=750. mean se_mean sd 2.5% 25% 50% 75% 97.5% n_eff Rhat beta[1] 4.16 0.07 1.08 2.07 3.41 4.19 4.84 6.34 219 1.00 beta[2] -0.26 0.00 0.07 -0.39 -0.31 -0.27 -0.22 -0.13 217 1.01 Samples were drawn using NUTS(diag_e) at Mon Feb 22 12:12:59 2016. For each parameter, n_eff is a crude measure of effective sample size, and Rhat is the potential scale reduction factor on split chains (at convergence, Rhat=1).
library(brms) dat<-within(dat, total<-success+failure) dat.brm <- brm(success|trials(total)~x, data=dat, family='binomial', prior=c(set_prior('normal(0,1000)', class='b')), n.chains=3, n.iter=1000, warmup=500, n.thin=2 )
SAMPLING FOR MODEL 'binomial(logit) brms-model' NOW (CHAIN 1). Chain 1, Iteration: 1 / 1000 [ 0%] (Warmup) Chain 1, Iteration: 100 / 1000 [ 10%] (Warmup) Chain 1, Iteration: 200 / 1000 [ 20%] (Warmup) Chain 1, Iteration: 300 / 1000 [ 30%] (Warmup) Chain 1, Iteration: 400 / 1000 [ 40%] (Warmup) Chain 1, Iteration: 500 / 1000 [ 50%] (Warmup) Chain 1, Iteration: 501 / 1000 [ 50%] (Sampling) Chain 1, Iteration: 600 / 1000 [ 60%] (Sampling) Chain 1, Iteration: 700 / 1000 [ 70%] (Sampling) Chain 1, Iteration: 800 / 1000 [ 80%] (Sampling) Chain 1, Iteration: 900 / 1000 [ 90%] (Sampling) Chain 1, Iteration: 1000 / 1000 [100%] (Sampling)# # Elapsed Time: 0.061524 seconds (Warm-up) # 0.040658 seconds (Sampling) # 0.102182 seconds (Total) # SAMPLING FOR MODEL 'binomial(logit) brms-model' NOW (CHAIN 2). Chain 2, Iteration: 1 / 1000 [ 0%] (Warmup) Chain 2, Iteration: 100 / 1000 [ 10%] (Warmup) Chain 2, Iteration: 200 / 1000 [ 20%] (Warmup) Chain 2, Iteration: 300 / 1000 [ 30%] (Warmup) Chain 2, Iteration: 400 / 1000 [ 40%] (Warmup) Chain 2, Iteration: 500 / 1000 [ 50%] (Warmup) Chain 2, Iteration: 501 / 1000 [ 50%] (Sampling) Chain 2, Iteration: 600 / 1000 [ 60%] (Sampling) Chain 2, Iteration: 700 / 1000 [ 70%] (Sampling) Chain 2, Iteration: 800 / 1000 [ 80%] (Sampling) Chain 2, Iteration: 900 / 1000 [ 90%] (Sampling) Chain 2, Iteration: 1000 / 1000 [100%] (Sampling)# # Elapsed Time: 0.073432 seconds (Warm-up) # 0.039209 seconds (Sampling) # 0.112641 seconds (Total) # SAMPLING FOR MODEL 'binomial(logit) brms-model' NOW (CHAIN 3). Chain 3, Iteration: 1 / 1000 [ 0%] (Warmup) Chain 3, Iteration: 100 / 1000 [ 10%] (Warmup) Chain 3, Iteration: 200 / 1000 [ 20%] (Warmup) Chain 3, Iteration: 300 / 1000 [ 30%] (Warmup) Chain 3, Iteration: 400 / 1000 [ 40%] (Warmup) Chain 3, Iteration: 500 / 1000 [ 50%] (Warmup) Chain 3, Iteration: 501 / 1000 [ 50%] (Sampling) Chain 3, Iteration: 600 / 1000 [ 60%] (Sampling) Chain 3, Iteration: 700 / 1000 [ 70%] (Sampling) Chain 3, Iteration: 800 / 1000 [ 80%] (Sampling) Chain 3, Iteration: 900 / 1000 [ 90%] (Sampling) Chain 3, Iteration: 1000 / 1000 [100%] (Sampling)# # Elapsed Time: 0.061522 seconds (Warm-up) # 0.036074 seconds (Sampling) # 0.097596 seconds (Total) #
summary(dat.brm)
Family: binomial (logit) Formula: success | trials(total) ~ x Data: dat (Number of observations: 10) Samples: 3 chains, each with n.iter = 1000; n.warmup = 500; n.thin = 2; total post-warmup samples = 750 WAIC: 41.57 Fixed Effects: Estimate Est.Error l-95% CI u-95% CI Eff.Sample Rhat Intercept 4.10 1.00 2.24 6.00 200 1 x -0.26 0.06 -0.38 -0.14 200 1 Samples were drawn using NUTS(diag_e). For each parameter, Eff.Sample is a crude measure of effective sample size, and Rhat is the potential scale reduction factor on split chains (at convergence, Rhat = 1).
stancode(dat.brm)
functions { } data { int<lower=1> N; # number of observations int Y[N]; # response variable int<lower=1> K; # number of fixed effects matrix[N, K] X; # FE design matrix int trials[N]; # number of trials } transformed data { } parameters { real b_Intercept; # fixed effects Intercept vector[K] b; # fixed effects } transformed parameters { vector[N] eta; # linear predictor # compute linear predictor eta <- X * b + b_Intercept; } model { # prior specifications b_Intercept ~ normal(0,1000); b ~ normal(0,1000); # likelihood contribution Y ~ binomial_logit(trials, eta); } generated quantities { }
fitted <- subset(dat, select=c(x,success,failure,total)) fitted$success <- NA newdata <- expand.grid(success=NA, failure=NA,x=seq(min(dat$x), max(dat$x), len=100), total=NA) dat.pred <- rbind(dat,fitted,newdata)
Now let's fit the model.
#fit the model
dat.inla <- inla(success~x, family='binomial', Ntrials=total,
    data=dat.pred,
    control.family=list(link='logit'),
    control.predictor=list(link=1, compute=TRUE),
    control.compute=list(dic=TRUE, cpo=TRUE, waic=TRUE))
#examine the regular summary
summary(dat.inla)
Call: c("inla(formula = success ~ x, family = \"binomial\", data = dat.pred, ", " Ntrials = total, control.compute = list(dic = TRUE, cpo = TRUE, ", " waic = TRUE), control.predictor = list(link = 1, compute = TRUE), ", " control.family = list(link = \"logit\"))") Time used: Pre-processing Running inla Post-processing Total 0.0825 0.0438 0.0288 0.1551 Fixed effects: mean sd 0.025quant 0.5quant 0.975quant mode kld (Intercept) 4.0391 1.088 1.9497 4.0230 6.2217 3.9907 0 x -0.2577 0.068 -0.3940 -0.2567 -0.1266 -0.2548 0 The model has no random effects The model has no hyperparameters Expected number of effective parameters(std dev): 2.001(0.00) Number of equivalent replicates : 4.999 Deviance Information Criterion (DIC) ...: 42.59 Effective number of parameters .........: 1.987 Watanabe-Akaike information criterion (WAIC) ...: 41.93 Effective number of parameters .................: 1.15 Marginal log-Likelihood: -26.41 CPO and PIT are computed Posterior marginals for linear predictor and fitted values computed
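Given the repeated references to the LD50 above, a rough point estimate of it (the value of $x$ at which the probability of success is 0.5) can be pulled from the fixed-effect means. This is only a sketch that uses posterior means and ignores the uncertainty in the ratio:
#approximate LD50 from the posterior means of the fixed effects
fe <- dat.inla$summary.fixed[, 'mean']
-fe[1]/fe[2]  #should be close to the inflection point of 15 used to generate the data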
newdata <- cbind(newdata, dat.inla$summary.fitted.values[(nrow(dat)+nrow(fitted)+1):nrow(dat.pred),]) head(newdata)
success failure x total mean sd 0.025quant 0.5quant 0.975quant fitted.predictor.021 NA NA 12.07823 NA 0.7125421 0.05985063 0.5878929 0.7153579 0.8212576 fitted.predictor.022 NA NA 12.15141 NA 0.7088373 0.05946393 0.5852983 0.7115181 0.8171994 fitted.predictor.023 NA NA 12.22459 NA 0.7050996 0.05906512 0.5826879 0.7076480 0.8130769 fitted.predictor.024 NA NA 12.29776 NA 0.7013292 0.05865439 0.5800613 0.7037477 0.8088906 fitted.predictor.025 NA NA 12.37094 NA 0.6975263 0.05823199 0.5774180 0.6998179 0.8046408 fitted.predictor.026 NA NA 12.44412 NA 0.6936912 0.05779817 0.5747575 0.6958587 0.8003281 mode fitted.predictor.021 0.7212332 fitted.predictor.022 0.7171021 fitted.predictor.023 0.7129469 fitted.predictor.024 0.7087681 fitted.predictor.025 0.7045663 fitted.predictor.026 0.7003420
newdata <- reshape:::rename(newdata, c("0.025quant"="lower", "0.975quant"="upper"))
#fitted <- cbind(fitted,
#    dat.inla$summary.fitted.values[(nrow(dat)+1):(nrow(dat)+nrow(fitted)),])
#fitted <- reshape:::rename(fitted, c("0.025quant"="lower", "0.975quant"="upper"))
#fitted$partial.obs=fitted$mean + (fitted$mean - dat$y)
library(grid)
ggplot(newdata, aes(y=mean, x=x)) +
    geom_blank() +
    geom_point(data=dat, aes(y=I(success/total), x=x)) +
    geom_ribbon(aes(ymin=lower, ymax=upper), fill='blue', alpha=0.2) +
    geom_line() +
    scale_y_continuous('Probability of success') +
    theme_classic() +
    theme(legend.key.width=unit(2,'lines'), legend.position=c(0,1),
          legend.justification=c(0,1),
          axis.title.y=element_text(margin=margin(r=2, unit='lines')))
Beta regression (percent cover)
Percent cover (or any percentage for that matter) can logically only range from 0 to 1 (0 to 100%). When the observed values are sufficiently far from both 0 and 1, a Gaussian distribution might be adequate. However, when observations are close to 0 or 1, Gaussian-based models are likely to be poor representations of the true situation at these boundaries. Indeed, predictions close to these boundaries (or their confidence intervals) may actually fall outside the logical range.
The beta distribution is defined for real numbers between 0 and 1, making it a natural candidate for proportions. It is a relatively flexible distribution in that it can take on a wide variety of forms (illustrated in the short sketch below). As with models built on the binomial distribution, the beta distribution is typically paired with a logit link function.
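To get a feel for that flexibility, the following quick sketch (with arbitrarily chosen shape parameters) overlays a few beta densities:
#illustrate the variety of shapes the beta distribution can take
curve(dbeta(x, 1, 1), from=0, to=1, ylim=c(0, 5), ylab='Density', xlab='y')  #flat (uniform)
curve(dbeta(x, 0.5, 0.5), add=TRUE, lty=2)  #U-shaped, piled up at the boundaries
curve(dbeta(x, 5, 2), add=TRUE, lty=3)      #skewed towards 1
curve(dbeta(x, 20, 20), add=TRUE, lty=4)    #symmetric and peaked around 0.5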
- the number of levels of $x$ = 10
- the continuous $x$ variable is a random uniform spread of measurements between 10 and 20
- the rate of change in log odds ratio per unit change in x (slope) = -0.8. The magnitude of the slope is an indicator of how abruptly the log odds ratio changes. A very abrupt change would occur if there was very little variability in the trend. That is, a large slope would be indicative of a threshold effect. A lower slope indicates a gradual change in odds ratio and thus a less obvious and precise switch between the likelihood of 1 vs 0.
- the inflection point (point where the slope of the line is greatest) = 16. The inflection point indicates the value of the predictor ($x$) at which the expected proportion is 0.5 (log odds of 0). This is also known as the LD50.
- the intercept (which has no real interpretation) is equal to the negative of the slope multiplied by the inflection point
- to generate the values of $y$ expected at each $x$ value, we evaluate the linear predictor.
- generate percentage cover observations by drawing values from a beta distribution with shape parameters defined according to: $$ \begin{align} \text{shape1} (\alpha) &=\Bigg(\frac{1-\mu}{\sigma^2} - \frac{1}{\mu}\Bigg)\mu^2\\ \text{shape2} (\beta) &=\alpha\Bigg(\frac{1}{\mu} - 1 \Bigg)\\ \end{align} $$ these cover values are expressed as proportions rather than percentages since values need to be between 0 and 1. The mean ($\mu$) is determined by the linear predictor and we will nominate the variance ($\sigma^2$) as 0.01.
set.seed(8)
#The number of levels of x
n.x <- 10
#Create x values that are uniformly distributed throughout the range of 10 to 20
x <- sort(runif(n = n.x, min = 10, max = 20))
slope = -.8
#Inflection point is where the slope of the line is greatest
#this is also the LD50 point
inflect <- 16
#Intercept (no interpretation)
intercept <- -1*(slope*inflect)
#The linear predictor
linpred <- intercept + slope*x
y.pred <- exp(linpred)/(1+exp(linpred))
#The nominated variance of the beta distribution
sigma = 0.01
#Convert a mean and variance into the two beta shape parameters
estBetaParams <- function(mu, var) {
  alpha <- ((1 - mu) / var - 1 / mu) * mu^2
  beta <- alpha * (1 / mu - 1)
  return(params = list(alpha = alpha, beta = beta))
}
betaShapes = estBetaParams(y.pred, sigma)
y = rbeta(n=n.x, shape1=betaShapes$alpha, shape2=betaShapes$beta)
dat <- data.frame(y, x)
$$ \begin{align} y_i &\sim{} Beta(\alpha_i, \beta_i)\\ log\left(\frac{\mu_i}{1-\mu_i}\right)&=\beta_0+\beta_1x_i\\ \alpha_i &=\Bigg(\frac{1-\mu_i}{\sigma^2} - \frac{1}{\mu_i}\Bigg)\mu_i^2\\ \beta_i &=\alpha_i\Bigg(\frac{1}{\mu_i} - 1 \Bigg)\\ \end{align} $$
library(betareg) dat.glm <- betareg(y~x, data=dat) summary(dat.glm)
Call: betareg(formula = y ~ x, data = dat) Standardized weighted residuals 2: Min 1Q Median 3Q Max -2.9455 -0.9286 -0.0994 0.9147 1.3906 Coefficients (mean model with logit link): Estimate Std. Error z value Pr(>|z|) (Intercept) 11.7680 2.0298 5.798 6.72e-09 *** x -0.7634 0.1281 -5.957 2.57e-09 *** Phi coefficients (precision model with identity link): Estimate Std. Error z value Pr(>|z|) (phi) 9.427 4.278 2.204 0.0275 * --- Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 Type of estimator: ML (maximum likelihood) Log-likelihood: 12.96 on 3 Df Pseudo R-squared: 0.6214 Number of iterations: 32 (BFGS) + 8 (Fisher scoring)
modelString=" model{ for (i in 1:N) { y[i] ~ dbeta(a[i],b[i]) a[i] <- eta[i] * phi b[i] <- (1 - eta[i]) * phi logit(eta[i]) <- beta0+beta1*x[i] } beta0 ~ dnorm(0,1.0E-06) beta1 ~ dnorm(0,1.0E-06) phi ~ dgamma(0.01, 0.01) } " library(R2jags) dat.list <- with(dat, list(y=y, x=x, N=nrow(dat))) dat.jags <- jags(data=dat.list,model.file=textConnection(modelString), param=c('beta0','beta1','phi'), n.chains=3, n.iter=20000, n.burnin=10000, n.thin=100)
Compiling model graph Resolving undeclared variables Allocating nodes Graph Size: 88 Initializing model
print(dat.jags)
Inference for Bugs model at "5", fit using jags, 3 chains, each with 20000 iterations (first 10000 discarded), n.thin = 100 n.sims = 300 iterations saved mu.vect sd.vect 2.5% 25% 50% 75% 97.5% Rhat n.eff beta0 11.783 2.378 7.636 10.183 11.458 13.363 16.476 1.020 120 beta1 -0.764 0.143 -1.052 -0.856 -0.753 -0.668 -0.500 1.026 90 phi 8.832 4.228 2.579 5.747 8.345 11.059 19.175 1.002 300 deviance -22.691 2.689 -25.702 -24.548 -23.459 -21.564 -15.303 1.025 250 For each parameter, n.eff is a crude measure of effective sample size, and Rhat is the potential scale reduction factor (at convergence, Rhat=1). DIC info (using the rule, pD = var(deviance)/2) pD = 3.6 and DIC = -19.1 DIC is an estimate of expected predictive error (lower deviance is better).
library(rstan) modelString=" data { int<lower=1> n; // number of observations vector[n] y; // response variable int<lower=1> nX; // number of fixed effects matrix[n, nX] X; // FE design matrix } transformed data { } parameters { vector[nX] beta; // fixed effects real<lower=0> phi; // precision parameter } transformed parameters { vector[n] eta; // linear predictor // compute linear predictor eta <- X * beta; for (i in 1:n) { eta[i] <- inv_logit(eta[i]); } } model { // prior specifications phi ~ gamma(0.01, 0.01); beta ~ normal(0, 1000); // likelihood contribution y ~ beta(eta * phi, (1 - eta) * phi); } generated quantities { } " Xmat <- model.matrix(~x,data=dat) dat.list <- with(dat, list(y = y, X = Xmat,nX=ncol(Xmat),n = nrow(dat))) library(rstan) dat.rstan <- stan(data=dat.list, model_code=modelString, chains=3, iter=1000, warmup=500, thin=2, save_dso=TRUE )
SAMPLING FOR MODEL 'd261ea666f0b7a9209e5feedf9d26f12' NOW (CHAIN 1). Chain 1, Iteration: 1 / 1000 [ 0%] (Warmup) Chain 1, Iteration: 100 / 1000 [ 10%] (Warmup) Chain 1, Iteration: 200 / 1000 [ 20%] (Warmup) Chain 1, Iteration: 300 / 1000 [ 30%] (Warmup) Chain 1, Iteration: 400 / 1000 [ 40%] (Warmup) Chain 1, Iteration: 500 / 1000 [ 50%] (Warmup) Chain 1, Iteration: 501 / 1000 [ 50%] (Sampling) Chain 1, Iteration: 600 / 1000 [ 60%] (Sampling) Chain 1, Iteration: 700 / 1000 [ 70%] (Sampling) Chain 1, Iteration: 800 / 1000 [ 80%] (Sampling) Chain 1, Iteration: 900 / 1000 [ 90%] (Sampling) Chain 1, Iteration: 1000 / 1000 [100%] (Sampling)# # Elapsed Time: 0.250474 seconds (Warm-up) # 0.133006 seconds (Sampling) # 0.38348 seconds (Total) # SAMPLING FOR MODEL 'd261ea666f0b7a9209e5feedf9d26f12' NOW (CHAIN 2). Chain 2, Iteration: 1 / 1000 [ 0%] (Warmup) Chain 2, Iteration: 100 / 1000 [ 10%] (Warmup) Chain 2, Iteration: 200 / 1000 [ 20%] (Warmup) Chain 2, Iteration: 300 / 1000 [ 30%] (Warmup) Chain 2, Iteration: 400 / 1000 [ 40%] (Warmup) Chain 2, Iteration: 500 / 1000 [ 50%] (Warmup) Chain 2, Iteration: 501 / 1000 [ 50%] (Sampling) Chain 2, Iteration: 600 / 1000 [ 60%] (Sampling) Chain 2, Iteration: 700 / 1000 [ 70%] (Sampling) Chain 2, Iteration: 800 / 1000 [ 80%] (Sampling) Chain 2, Iteration: 900 / 1000 [ 90%] (Sampling) Chain 2, Iteration: 1000 / 1000 [100%] (Sampling)# # Elapsed Time: 0.255301 seconds (Warm-up) # 0.200749 seconds (Sampling) # 0.45605 seconds (Total) # SAMPLING FOR MODEL 'd261ea666f0b7a9209e5feedf9d26f12' NOW (CHAIN 3). Chain 3, Iteration: 1 / 1000 [ 0%] (Warmup) Chain 3, Iteration: 100 / 1000 [ 10%] (Warmup) Chain 3, Iteration: 200 / 1000 [ 20%] (Warmup) Chain 3, Iteration: 300 / 1000 [ 30%] (Warmup) Chain 3, Iteration: 400 / 1000 [ 40%] (Warmup) Chain 3, Iteration: 500 / 1000 [ 50%] (Warmup) Chain 3, Iteration: 501 / 1000 [ 50%] (Sampling) Chain 3, Iteration: 600 / 1000 [ 60%] (Sampling) Chain 3, Iteration: 700 / 1000 [ 70%] (Sampling) Chain 3, Iteration: 800 / 1000 [ 80%] (Sampling) Chain 3, Iteration: 900 / 1000 [ 90%] (Sampling) Chain 3, Iteration: 1000 / 1000 [100%] (Sampling)# # Elapsed Time: 0.292455 seconds (Warm-up) # 0.172565 seconds (Sampling) # 0.46502 seconds (Total) #
print(dat.rstan, pars=c('beta','phi'))
Inference for Stan model: d261ea666f0b7a9209e5feedf9d26f12. 3 chains, each with iter=1000; warmup=500; thin=2; post-warmup draws per chain=250, total post-warmup draws=750. mean se_mean sd 2.5% 25% 50% 75% 97.5% n_eff Rhat beta[1] 11.72 0.15 2.47 6.40 10.14 11.69 13.35 16.37 289 1.01 beta[2] -0.76 0.01 0.15 -1.05 -0.86 -0.76 -0.67 -0.43 283 1.01 phi 7.86 0.17 3.02 2.55 5.72 7.62 9.81 14.15 321 1.00 Samples were drawn using NUTS(diag_e) at Mon Feb 22 21:30:57 2016. For each parameter, n_eff is a crude measure of effective sample size, and Rhat is the potential scale reduction factor on split chains (at convergence, Rhat=1).
library(brms) dat.brm <- brm(y~x, data=dat, family='beta', prior=c(set_prior('normal(0,1000)', class='b')), n.chains=3, n.iter=1000, warmup=500, n.thin=2 )
SAMPLING FOR MODEL 'beta(logit) brms-model' NOW (CHAIN 1). Chain 1, Iteration: 1 / 1000 [ 0%] (Warmup) Chain 1, Iteration: 100 / 1000 [ 10%] (Warmup) Chain 1, Iteration: 200 / 1000 [ 20%] (Warmup) Chain 1, Iteration: 300 / 1000 [ 30%] (Warmup) Chain 1, Iteration: 400 / 1000 [ 40%] (Warmup) Chain 1, Iteration: 500 / 1000 [ 50%] (Warmup) Chain 1, Iteration: 501 / 1000 [ 50%] (Sampling) Chain 1, Iteration: 600 / 1000 [ 60%] (Sampling) Chain 1, Iteration: 700 / 1000 [ 70%] (Sampling) Chain 1, Iteration: 800 / 1000 [ 80%] (Sampling) Chain 1, Iteration: 900 / 1000 [ 90%] (Sampling) Chain 1, Iteration: 1000 / 1000 [100%] (Sampling)# # Elapsed Time: 7.54483 seconds (Warm-up) # 17.3168 seconds (Sampling) # 24.8616 seconds (Total) # SAMPLING FOR MODEL 'beta(logit) brms-model' NOW (CHAIN 2). Chain 2, Iteration: 1 / 1000 [ 0%] (Warmup) Chain 2, Iteration: 100 / 1000 [ 10%] (Warmup) Chain 2, Iteration: 200 / 1000 [ 20%] (Warmup) Chain 2, Iteration: 300 / 1000 [ 30%] (Warmup) Chain 2, Iteration: 400 / 1000 [ 40%] (Warmup) Chain 2, Iteration: 500 / 1000 [ 50%] (Warmup) Chain 2, Iteration: 501 / 1000 [ 50%] (Sampling) Chain 2, Iteration: 600 / 1000 [ 60%] (Sampling) Chain 2, Iteration: 700 / 1000 [ 70%] (Sampling) Chain 2, Iteration: 800 / 1000 [ 80%] (Sampling) Chain 2, Iteration: 900 / 1000 [ 90%] (Sampling) Chain 2, Iteration: 1000 / 1000 [100%] (Sampling)# # Elapsed Time: 0.112244 seconds (Warm-up) # 0.056446 seconds (Sampling) # 0.16869 seconds (Total) # SAMPLING FOR MODEL 'beta(logit) brms-model' NOW (CHAIN 3). Chain 3, Iteration: 1 / 1000 [ 0%] (Warmup) Chain 3, Iteration: 100 / 1000 [ 10%] (Warmup) Chain 3, Iteration: 200 / 1000 [ 20%] (Warmup) Chain 3, Iteration: 300 / 1000 [ 30%] (Warmup) Chain 3, Iteration: 400 / 1000 [ 40%] (Warmup) Chain 3, Iteration: 500 / 1000 [ 50%] (Warmup) Chain 3, Iteration: 501 / 1000 [ 50%] (Sampling) Chain 3, Iteration: 600 / 1000 [ 60%] (Sampling) Chain 3, Iteration: 700 / 1000 [ 70%] (Sampling) Chain 3, Iteration: 800 / 1000 [ 80%] (Sampling) Chain 3, Iteration: 900 / 1000 [ 90%] (Sampling) Chain 3, Iteration: 1000 / 1000 [100%] (Sampling)# # Elapsed Time: 0.085228 seconds (Warm-up) # 0.061105 seconds (Sampling) # 0.146333 seconds (Total) #
summary(dat.brm)
Family: beta (logit) Formula: y ~ x Data: dat (Number of observations: 10) Samples: 3 chains, each with iter = 1000; warmup = 500; thin = 2; total post-warmup samples = 750 WAIC: Not computed Fixed Effects: Estimate Est.Error l-95% CI u-95% CI Eff.Sample Rhat Intercept 13.43 3.26 7.56 18.08 2 1.77 x -0.86 0.19 -1.13 -0.50 2 1.70 Family Specific Parameters: Estimate Est.Error l-95% CI u-95% CI Eff.Sample Rhat phi 19.82 17.26 3.08 49.04 2 5.94 Samples were drawn using sampling(NUTS). For each parameter, Eff.Sample is a crude measure of effective sample size, and Rhat is the potential scale reduction factor on split chains (at convergence, Rhat = 1).
stancode(dat.brm)
functions { } data { int<lower=1> N; // number of observations vector[N] Y; // response variable int<lower=1> K; // number of fixed effects matrix[N, K] X; // FE design matrix vector[K] X_means; // column means of X } transformed data { } parameters { real temp_Intercept; // temporary Intercept vector[K] b; // fixed effects real<lower=0> phi; // precision parameter } transformed parameters { vector[N] eta; // linear predictor // compute linear predictor eta <- X * b + temp_Intercept; for (n in 1:N) { eta[n] <- inv_logit(eta[n]); } } model { // prior specifications b ~ normal(0,1000); phi ~ gamma(0.01, 0.01); // likelihood contribution Y ~ beta(eta * phi, (1 - eta) * phi); } generated quantities { real b_Intercept; // fixed effects intercept b_Intercept <- temp_Intercept - dot_product(X_means, b); }
fitted <- subset(dat, select=c(x,y)) fitted$y <- NA newdata <- expand.grid(x=seq(min(dat$x), max(dat$x), len=100), y=NA) dat.pred <- rbind(dat,fitted,newdata)
Now let's fit the model.
#fit the model
library(INLA)
dat.inla <- inla(y~x, family='beta',
    data=dat.pred,
    control.family=list(link='logit'),
    control.predictor=list(link=1, compute=TRUE),
    control.compute=list(dic=TRUE, cpo=TRUE, waic=TRUE))
#examine the regular summary
summary(dat.inla)
Call: c("inla(formula = y ~ x, family = \"beta\", data = dat.pred, control.compute = list(dic = TRUE, ", " cpo = TRUE, waic = TRUE), control.predictor = list(link = 1, ", " compute = TRUE), control.family = list(link = \"logit\"))" ) Time used: Pre-processing Running inla Post-processing Total 0.0695 0.0710 0.0248 0.1653 Fixed effects: mean sd 0.025quant 0.5quant 0.975quant mode kld (Intercept) 11.6845 2.0768 7.7662 11.6131 15.9882 11.4692 0 x -0.7599 0.1252 -1.0177 -0.7562 -0.5220 -0.7488 0 The model has no random effects Model hyperparameters: mean sd 0.025quant 0.5quant 0.975quant mode precision parameter for the beta observations 8.711 3.662 3.357 8.142 17.46 6.989 Expected number of effective parameters(std dev): 2.00(0.00) Number of equivalent replicates : 5.00 Deviance Information Criterion (DIC) ...: -21.16 Effective number of parameters .........: 2.456 Watanabe-Akaike information criterion (WAIC) ...: -20.19 Effective number of parameters .................: 2.797 Marginal log-Likelihood: 5.913 CPO and PIT are computed Posterior marginals for linear predictor and fitted values computed
newdata <- cbind(newdata, dat.inla$summary.fitted.values[(nrow(dat)+nrow(fitted)+1):nrow(dat.pred),]) head(newdata)
x y mean sd 0.025quant 0.5quant 0.975quant mode fitted.predictor.021 12.07823 NA 0.9140861 0.04629234 0.7996400 0.9228020 0.9775402 0.9387765 fitted.predictor.022 12.15141 NA 0.9100518 0.04750029 0.7931865 0.9187774 0.9758870 0.9348454 fitted.predictor.023 12.22459 NA 0.9058414 0.04871891 0.7865733 0.9145631 0.9741167 0.9307015 fitted.predictor.024 12.29776 NA 0.9014486 0.04994645 0.7797996 0.9101522 0.9722216 0.9263219 fitted.predictor.025 12.37094 NA 0.8968674 0.05118101 0.7728648 0.9055379 0.9701937 0.9217106 fitted.predictor.026 12.44412 NA 0.8920916 0.05242054 0.7657683 0.9007135 0.9680244 0.9168501
newdata <- reshape:::rename(newdata, c("0.025quant"="lower", "0.975quant"="upper")) library(grid) ggplot(newdata, aes(y=mean*100, x=x)) + geom_blank()+ geom_point(data=dat, aes(y=y*100, x=x)) + geom_ribbon(aes(ymin=lower*100, ymax=upper*100), fill='blue', alpha=0.2) + geom_line() + scale_y_continuous(expression(Cover~abundance~('%')))+ theme_classic()+ theme(legend.key.width=unit(2,'lines'), legend.position=c(0,1), legend.justification=c(0,1))
Poisson regression (count data)
Let's say we wanted to model the abundance of an item ($y$) against a continuous predictor ($x$). As this section is mainly about the generation of artificial data (and not specifically about what to do with the data), understanding the actual details is optional and they can be safely skipped. Consequently, I have folded (toggled) this section away.
- the sample size = 20
- the continuous $x$ variable is a random uniform spread of measurements between 1 and 20
- the rate of change in log $y$ per unit of $x$ (slope) = 0.1.
- the intercept (the value of log $y$ when $x$ is 0) = 0.6
- to generate the values of $y$ expected at each $x$ value, we evaluate the linear predictor (created by multiplying the model matrix by the vector of regression parameters). These expected values are then back-transformed onto the (0,$\infty$) scale using the exponential function $e^{linear~predictor}$
- finally, we generate $y$ values by using the expected $y$ values ($\lambda$) as the means when drawing random numbers from a Poisson distribution. This step adds random noise to the expected $y$ values and returns only 0's and positive integers.
set.seed(8)
#The number of samples
n.x <- 20
#Create x values that are uniformly distributed throughout the range of 1 to 20
x <- sort(runif(n = n.x, min = 1, max = 20))
mm <- model.matrix(~x)
intercept <- 0.6
slope = 0.1
#The linear predictor
linpred <- mm %*% c(intercept, slope)
#Predicted y values (means)
lambda <- exp(linpred)
#Add some noise by drawing counts from a Poisson distribution
y <- rpois(n=n.x, lambda=lambda)
dat <- data.frame(y, x)
data.pois <- dat
$$ \begin{align} y_i &\sim{} Pois(\lambda_i)\\ log(\lambda_i)&=\beta_0+\beta_1x_1\\ \end{align} $$
dat.glm <- glm(y~x, data=data.pois, family="poisson") summary(dat.glm)
Call: glm(formula = y ~ x, family = "poisson", data = data.pois) Deviance Residuals: Min 1Q Median 3Q Max -1.6353 -0.7049 0.0440 0.5624 2.0457 Coefficients: Estimate Std. Error z value Pr(>|z|) (Intercept) 0.56002 0.25395 2.205 0.0274 * x 0.11151 0.01858 6.000 1.97e-09 *** --- Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 (Dispersion parameter for poisson family taken to be 1) Null deviance: 55.614 on 19 degrees of freedom Residual deviance: 17.226 on 18 degrees of freedom AIC: 90.319 Number of Fisher Scoring iterations: 4
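Before moving on, a crude check that the Poisson assumption (variance equal to the mean) is reasonable: the ratio of residual deviance to residual degrees of freedom should be close to 1, with values well above 1 hinting at overdispersion.
#approximate dispersion check for the Poisson fit
deviance(dat.glm)/df.residual(dat.glm)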
modelString=" model{ for (i in 1:N) { y[i] ~ dpois(lambda[i]) log(lambda[i]) <- max(-20,min(20,beta0+beta1*x[i])) } beta0 ~ dnorm(0,1.0E-06) beta1 ~ dnorm(0,1.0E-06) } " dat.list <- with(data.pois, list(y=y, x=x, N=nrow(data.pois))) dat.jags <- jags(data=dat.list,model.file=textConnection(modelString), param=c('beta0','beta1'), n.chains=3, n.iter=20000, n.burnin=10000, n.thin=100)
Compiling model graph Resolving undeclared variables Allocating nodes Graph Size: 147 Initializing model
print(dat.jags)
Inference for Bugs model at "5", fit using jags, 3 chains, each with 20000 iterations (first 10000 discarded), n.thin = 100 n.sims = 300 iterations saved mu.vect sd.vect 2.5% 25% 50% 75% 97.5% Rhat n.eff beta0 0.556 0.246 0.077 0.407 0.533 0.718 1.088 1.009 300 beta1 0.111 0.018 0.073 0.099 0.111 0.124 0.143 1.007 300 deviance 88.241 1.734 86.376 86.873 87.781 89.023 92.674 1.000 300 For each parameter, n.eff is a crude measure of effective sample size, and Rhat is the potential scale reduction factor (at convergence, Rhat=1). DIC info (using the rule, pD = var(deviance)/2) pD = 1.5 and DIC = 89.8 DIC is an estimate of expected predictive error (lower deviance is better).
library(rstan) modelString=" data { int<lower=1> n; int<lower=1> nX; int y[n]; matrix [n,nX] X; } parameters { vector [nX] beta; } transformed parameters { vector [n] mu; mu<-X*beta; } model { beta ~ normal(0,1000); y ~ poisson_log(mu); } " Xmat <- model.matrix(~x,data=data.pois) dat.list <- with(data.pois, list(y = y, X = Xmat, nX=ncol(Xmat),n = nrow(data.pois))) library(rstan) dat.rstan <- stan(data=dat.list, model_code=modelString, chains=3, iter=1000, warmup=500, thin=2, save_dso=TRUE )
SAMPLING FOR MODEL '95c0e3e05517b98739c4f0c17ec17bdd' NOW (CHAIN 1). Chain 1, Iteration: 1 / 1000 [ 0%] (Warmup) Chain 1, Iteration: 100 / 1000 [ 10%] (Warmup) Chain 1, Iteration: 200 / 1000 [ 20%] (Warmup) Chain 1, Iteration: 300 / 1000 [ 30%] (Warmup) Chain 1, Iteration: 400 / 1000 [ 40%] (Warmup) Chain 1, Iteration: 500 / 1000 [ 50%] (Warmup) Chain 1, Iteration: 501 / 1000 [ 50%] (Sampling) Chain 1, Iteration: 600 / 1000 [ 60%] (Sampling) Chain 1, Iteration: 700 / 1000 [ 70%] (Sampling) Chain 1, Iteration: 800 / 1000 [ 80%] (Sampling) Chain 1, Iteration: 900 / 1000 [ 90%] (Sampling) Chain 1, Iteration: 1000 / 1000 [100%] (Sampling)# # Elapsed Time: 0.086074 seconds (Warm-up) # 0.040789 seconds (Sampling) # 0.126863 seconds (Total) # SAMPLING FOR MODEL '95c0e3e05517b98739c4f0c17ec17bdd' NOW (CHAIN 2). Chain 2, Iteration: 1 / 1000 [ 0%] (Warmup) Chain 2, Iteration: 100 / 1000 [ 10%] (Warmup) Chain 2, Iteration: 200 / 1000 [ 20%] (Warmup) Chain 2, Iteration: 300 / 1000 [ 30%] (Warmup) Chain 2, Iteration: 400 / 1000 [ 40%] (Warmup) Chain 2, Iteration: 500 / 1000 [ 50%] (Warmup) Chain 2, Iteration: 501 / 1000 [ 50%] (Sampling) Chain 2, Iteration: 600 / 1000 [ 60%] (Sampling) Chain 2, Iteration: 700 / 1000 [ 70%] (Sampling) Chain 2, Iteration: 800 / 1000 [ 80%] (Sampling) Chain 2, Iteration: 900 / 1000 [ 90%] (Sampling) Chain 2, Iteration: 1000 / 1000 [100%] (Sampling)# # Elapsed Time: 0.051455 seconds (Warm-up) # 0.030687 seconds (Sampling) # 0.082142 seconds (Total) # SAMPLING FOR MODEL '95c0e3e05517b98739c4f0c17ec17bdd' NOW (CHAIN 3). Chain 3, Iteration: 1 / 1000 [ 0%] (Warmup) Chain 3, Iteration: 100 / 1000 [ 10%] (Warmup) Chain 3, Iteration: 200 / 1000 [ 20%] (Warmup) Chain 3, Iteration: 300 / 1000 [ 30%] (Warmup) Chain 3, Iteration: 400 / 1000 [ 40%] (Warmup) Chain 3, Iteration: 500 / 1000 [ 50%] (Warmup) Chain 3, Iteration: 501 / 1000 [ 50%] (Sampling) Chain 3, Iteration: 600 / 1000 [ 60%] (Sampling) Chain 3, Iteration: 700 / 1000 [ 70%] (Sampling) Chain 3, Iteration: 800 / 1000 [ 80%] (Sampling) Chain 3, Iteration: 900 / 1000 [ 90%] (Sampling) Chain 3, Iteration: 1000 / 1000 [100%] (Sampling)# # Elapsed Time: 0.068899 seconds (Warm-up) # 0.036493 seconds (Sampling) # 0.105392 seconds (Total) #
print(dat.rstan, pars=c('beta'))
Inference for Stan model: 95c0e3e05517b98739c4f0c17ec17bdd. 3 chains, each with iter=1000; warmup=500; thin=2; post-warmup draws per chain=250, total post-warmup draws=750. mean se_mean sd 2.5% 25% 50% 75% 97.5% n_eff Rhat beta[1] 0.54 0.01 0.24 0.05 0.39 0.55 0.70 1.01 255 1.01 beta[2] 0.11 0.00 0.02 0.08 0.10 0.11 0.12 0.15 239 1.01 Samples were drawn using NUTS(diag_e) at Tue Feb 23 11:04:58 2016. For each parameter, n_eff is a crude measure of effective sample size, and Rhat is the potential scale reduction factor on split chains (at convergence, Rhat=1).
library(brms) dat.brm <- brm(y~x, data=data.pois, family='poisson', prior=c(set_prior('normal(0,1000)', class='b')), n.chains=3, n.iter=1000, warmup=500, n.thin=2 )
SAMPLING FOR MODEL 'poisson(log) brms-model' NOW (CHAIN 1). Chain 1, Iteration: 1 / 1000 [ 0%] (Warmup) Chain 1, Iteration: 100 / 1000 [ 10%] (Warmup) Chain 1, Iteration: 200 / 1000 [ 20%] (Warmup) Chain 1, Iteration: 300 / 1000 [ 30%] (Warmup) Chain 1, Iteration: 400 / 1000 [ 40%] (Warmup) Chain 1, Iteration: 500 / 1000 [ 50%] (Warmup) Chain 1, Iteration: 501 / 1000 [ 50%] (Sampling) Chain 1, Iteration: 600 / 1000 [ 60%] (Sampling) Chain 1, Iteration: 700 / 1000 [ 70%] (Sampling) Chain 1, Iteration: 800 / 1000 [ 80%] (Sampling) Chain 1, Iteration: 900 / 1000 [ 90%] (Sampling) Chain 1, Iteration: 1000 / 1000 [100%] (Sampling)# # Elapsed Time: 0.021825 seconds (Warm-up) # 0.01908 seconds (Sampling) # 0.040905 seconds (Total) # SAMPLING FOR MODEL 'poisson(log) brms-model' NOW (CHAIN 2). Chain 2, Iteration: 1 / 1000 [ 0%] (Warmup) Chain 2, Iteration: 100 / 1000 [ 10%] (Warmup) Chain 2, Iteration: 200 / 1000 [ 20%] (Warmup) Chain 2, Iteration: 300 / 1000 [ 30%] (Warmup) Chain 2, Iteration: 400 / 1000 [ 40%] (Warmup) Chain 2, Iteration: 500 / 1000 [ 50%] (Warmup) Chain 2, Iteration: 501 / 1000 [ 50%] (Sampling) Chain 2, Iteration: 600 / 1000 [ 60%] (Sampling) Chain 2, Iteration: 700 / 1000 [ 70%] (Sampling) Chain 2, Iteration: 800 / 1000 [ 80%] (Sampling) Chain 2, Iteration: 900 / 1000 [ 90%] (Sampling) Chain 2, Iteration: 1000 / 1000 [100%] (Sampling)# # Elapsed Time: 0.03022 seconds (Warm-up) # 0.028595 seconds (Sampling) # 0.058815 seconds (Total) # SAMPLING FOR MODEL 'poisson(log) brms-model' NOW (CHAIN 3). Chain 3, Iteration: 1 / 1000 [ 0%] (Warmup) Chain 3, Iteration: 100 / 1000 [ 10%] (Warmup) Chain 3, Iteration: 200 / 1000 [ 20%] (Warmup) Chain 3, Iteration: 300 / 1000 [ 30%] (Warmup) Chain 3, Iteration: 400 / 1000 [ 40%] (Warmup) Chain 3, Iteration: 500 / 1000 [ 50%] (Warmup) Chain 3, Iteration: 501 / 1000 [ 50%] (Sampling) Chain 3, Iteration: 600 / 1000 [ 60%] (Sampling) Chain 3, Iteration: 700 / 1000 [ 70%] (Sampling) Chain 3, Iteration: 800 / 1000 [ 80%] (Sampling) Chain 3, Iteration: 900 / 1000 [ 90%] (Sampling) Chain 3, Iteration: 1000 / 1000 [100%] (Sampling)# # Elapsed Time: 0.021021 seconds (Warm-up) # 0.018216 seconds (Sampling) # 0.039237 seconds (Total) #
summary(dat.brm)
Family: poisson (log) Formula: y ~ x Data: data.pois (Number of observations: 20) Samples: 3 chains, each with iter = 1000; warmup = 500; thin = 2; total post-warmup samples = 750 WAIC: Not computed Fixed Effects: Estimate Est.Error l-95% CI u-95% CI Eff.Sample Rhat Intercept 0.54 0.26 -0.02 1.00 305 1.02 x 0.11 0.02 0.08 0.15 362 1.02 Samples were drawn using sampling(NUTS). For each parameter, Eff.Sample is a crude measure of effective sample size, and Rhat is the potential scale reduction factor on split chains (at convergence, Rhat = 1).
stancode(dat.brm)
functions { } data { int<lower=1> N; // number of observations int Y[N]; // response variable int<lower=1> K; // number of fixed effects matrix[N, K] X; // FE design matrix vector[K] X_means; // column means of X } transformed data { } parameters { real temp_Intercept; // temporary Intercept vector[K] b; // fixed effects } transformed parameters { vector[N] eta; // linear predictor // compute linear predictor eta <- X * b + temp_Intercept; } model { // prior specifications b ~ normal(0,1000); // likelihood contribution Y ~ poisson_log(eta); } generated quantities { real b_Intercept; // fixed effects intercept b_Intercept <- temp_Intercept - dot_product(X_means, b); }
fitted <- subset(dat, select = c(x, y))
fitted$y <- NA
newdata <- expand.grid(x = seq(min(data.pois$x), max(data.pois$x), len = 100), y = NA)
dat.pred <- rbind(dat, fitted, newdata)
Now let's fit the model.
#fit the model
dat.inla <- inla(y ~ x, family = 'poisson',
   data = dat.pred,
   control.family = list(link = 'log'),
   control.predictor = list(link = 1, compute = TRUE),
   control.compute = list(dic = TRUE, cpo = TRUE, waic = TRUE))
#examine the regular summary
summary(dat.inla)
Call: c("inla(formula = y ~ x, family = \"poisson\", data = dat.pred, control.compute = list(dic = TRUE, ", " cpo = TRUE, waic = TRUE), control.predictor = list(link = 1, ", " compute = TRUE), control.family = list(link = \"log\"))" ) Time used: Pre-processing Running inla Post-processing Total 0.0898 0.0448 0.0357 0.1704 Fixed effects: mean sd 0.025quant 0.5quant 0.975quant mode kld (Intercept) 0.5624 0.2540 0.0469 0.5683 1.0450 0.5803 0 x 0.1115 0.0186 0.0755 0.1114 0.1484 0.1111 0 The model has no random effects The model has no hyperparameters Expected number of effective parameters(std dev): 2.002(0.00) Number of equivalent replicates : 9.991 Deviance Information Criterion (DIC) ...: 90.31 Effective number of parameters .........: 2.00 Watanabe-Akaike information criterion (WAIC) ...: 90.32 Effective number of parameters .................: 1.809 Marginal log-Likelihood: -52.09 CPO and PIT are computed Posterior marginals for linear predictor and fitted values computed
newdata <- cbind(newdata, dat.inla$summary.fitted.values[(nrow(dat)+nrow(fitted)+1):nrow(dat.pred),]) head(newdata)
x y mean sd 0.025quant 0.5quant 0.975quant mode fitted.predictor.041 1.024733 NA 2.021898 0.4765762 1.217135 1.978066 3.077859 1.893767 fitted.predictor.042 1.203403 NA 2.061043 0.4793782 1.249221 2.017707 3.121132 1.934330 fitted.predictor.043 1.382074 NA 2.100965 0.4821355 1.282138 2.058139 3.165053 1.975705 fitted.predictor.044 1.560745 NA 2.141682 0.4848464 1.315907 2.099378 3.209754 2.017948 fitted.predictor.045 1.739415 NA 2.183208 0.4875087 1.350549 2.141438 3.255151 2.061031 fitted.predictor.046 1.918086 NA 2.225560 0.4901204 1.386083 2.184338 3.301238 2.104974
newdata <- reshape:::rename(newdata, c("0.025quant"="lower", "0.975quant"="upper")) library(grid) ggplot(newdata, aes(y=mean, x=x)) + geom_blank()+ geom_point(data=data.pois, aes(y=y, x=x)) + geom_ribbon(aes(ymin=lower, ymax=upper), fill='blue', alpha=0.2) + geom_line() + theme_classic()+ theme(legend.key.width=unit(2,'lines'), legend.position=c(0,1), legend.justification=c(0,1))
Poisson regression (count data) with overdispersion
There are numerous ways to deal with overdispersion. The traditional way was to use a 'quasi' family: fit a regular (e.g. Poisson) model and then inflate the standard errors in line with the estimated degree of overdispersion. This approach is considered by many to be a bit of a hack (a quasi-Poisson fit is shown for comparison further below).
If overdispersion is the consequence of latent (un-measured or un-modelled) influences inflating the variance, then one alternative is to incorporate a latent variable that will 'soak up' this additional variance. This is the rationale behind including a unit-level (observation-level) random effect.
- define a function that will generate random samples from a quasi-poisson distribution $$QuasiPoisson\left(\frac{\mu}{\phi-1}, \frac{1}{\phi}\right)$$ (implemented here as a negative binomial with size $\mu/(\phi-1)$ and probability $1/\phi$), where $\mu$ is the expected value (mean) and $\phi$ is the dispersion factor such that $var(y)=\phi\mu$.
- the sample size = 20
- the continuous $x$ variable is a random uniform spread of measurements between 1 and 20
- the rate of change in log $y$ per unit of $x$ (slope) = 0.1.
- the value of log$y$ when $x$ equals 0 (the intercept) = 0.6
- to generate the values of $y$ expected at each $x$ value, we evaluate the linear predictor (created by calculating the outer product of the model matrix and the regression parameters). These expected values are then back-transformed onto a scale bounded by (0,$\infty$) by applying the exponential function (the inverse of the log link), $e^{linear~predictor}$
- finally, we generate $y$ values by using the expected $y$ values ($\lambda$) as the means when drawing random numbers from the quasi-poisson distribution defined above (with dispersion $\phi=4$). This step adds random noise to the expected $y$ values and returns only 0's and positive integers.
#Quasi-poisson distribution
rqpois <- function(n, lambda, phi) {
   mu <- lambda
   r <- rnbinom(n, size = mu/phi/(1 - 1/phi), prob = 1/phi)
   return(r)
}
set.seed(8)
#The number of samples
n.x <- 20
#Create x values that are uniformly distributed throughout the range of 1 to 20
x <- sort(runif(n = n.x, min = 1, max = 20))
mm <- model.matrix(~x)
intercept <- 0.6
slope <- 0.1
#The linear predictor
linpred <- mm %*% c(intercept, slope)
#Predicted y values
lambda <- exp(linpred)
#Add some noise by drawing counts from the quasi-poisson distribution
y <- rqpois(n = n.x, lambda = lambda, phi = 4)
dat.qp <- data.frame(y, x)
data.qp <- dat.qp
$$ \begin{align} y_i &\sim{} Pois(\lambda_i)\\ log(\lambda_i)&=\beta_0+\beta_1x_i + \gamma_{j(i)}\\ \gamma_j &\sim{} \mathcal{N}(0, \sigma^2)\\ \end{align} $$
library(lme4) data.qp$unit=1:nrow(data.qp) dat.glm <- glmer(y~x+(1|unit), data=data.qp, family="poisson") summary(dat.glm)
Generalized linear mixed model fit by maximum likelihood (Laplace Approximation) ['glmerMod'] Family: poisson ( log ) Formula: y ~ x + (1 | unit) Data: data.qp AIC BIC logLik deviance df.resid 114.6 117.6 -54.3 108.6 17 Scaled residuals: Min 1Q Median 3Q Max -1.37873 -0.65032 0.04504 0.41908 0.68899 Random effects: Groups Name Variance Std.Dev. unit (Intercept) 0.4251 0.652 Number of obs: 20, groups: unit, 20 Fixed effects: Estimate Std. Error z value Pr(>|z|) (Intercept) 0.19971 0.47627 0.419 0.674981 x 0.12684 0.03833 3.309 0.000936 *** --- Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 Correlation of Fixed Effects: (Intr) x -0.915
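For comparison with the 'quasi' approach mentioned earlier, here is a quasi-Poisson fit (a sketch for comparison only, not part of the main workflow; dat.glmQ is simply a name introduced here). It returns the same point estimates as an ordinary Poisson GLM, but with standard errors inflated by the square root of the estimated dispersion:

#Traditional quasi-Poisson fit - same point estimates, inflated standard errors
dat.glmQ <- glm(y ~ x, data = data.qp, family = "quasipoisson")
summary(dat.glmQ)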
data.qp$unit=1:nrow(data.qp) modelString=" model{ for (i in 1:N) { y[i] ~ dpois(lambda[i]) log(lambda[i]) <- max(-20,min(20,beta0+beta1*x[i] + gamma[U[i]])) } beta0 ~ dnorm(0,1.0E-06) beta1 ~ dnorm(0,1.0E-06) for (j in 1:nU) { gamma[j] ~ dnorm(0, tau) } ## Half-cauchy (scale=25) prior tau <- pow(sigma,-2) sigma <-z/sqrt(chSq) z ~ dnorm(0, .0016)I(0,) chSq ~ dgamma(0.5, 0.5) } " dat.list <- with(data.qp, list(y=y, x=x, U=as.numeric(unit), nU=nrow(data.qp), N=nrow(data.qp))) dat.jags <- jags(data=dat.list,model.file=textConnection(modelString), param=c('beta0','beta1','sigma'), n.chains=3, n.iter=20000, n.burnin=10000, n.thin=100)
Compiling model graph Resolving undeclared variables Allocating nodes Graph Size: 197 Initializing model
print(dat.jags)
Inference for Bugs model at "5", fit using jags, 3 chains, each with 20000 iterations (first 10000 discarded), n.thin = 100 n.sims = 300 iterations saved mu.vect sd.vect 2.5% 25% 50% 75% 97.5% Rhat n.eff beta0 0.001 0.651 -1.431 -0.362 0.075 0.421 1.201 1.047 60 beta1 0.141 0.053 0.046 0.110 0.137 0.172 0.250 1.049 67 sigma 0.852 0.284 0.456 0.666 0.802 0.965 1.576 1.006 210 deviance 84.483 6.824 74.009 79.661 84.080 88.029 98.306 0.997 300 For each parameter, n.eff is a crude measure of effective sample size, and Rhat is the potential scale reduction factor (at convergence, Rhat=1). DIC info (using the rule, pD = var(deviance)/2) pD = 23.4 and DIC = 107.9 DIC is an estimate of expected predictive error (lower deviance is better).
data.qp$unit=1:nrow(data.qp) library(rstan) modelString=" data { int<lower=1> n; int<lower=1> nX; int<lower=1> nU; int y[n]; matrix [n,nX] X; int U[n]; } parameters { vector [nX] beta; vector [nU] gamma; real<lower=0> sigma; } transformed parameters { vector [n] mu; mu<-X*beta; for (i in 1:n) { mu[i] <- mu[i] + gamma[U[i]]; } } model { beta ~ normal(0,1000); sigma ~ cauchy(0,25); gamma ~ normal(0,sigma); y ~ poisson_log(mu); } " Xmat <- model.matrix(~x,data=data.qp) dat.list <- with(data.qp, list(y = y, X = Xmat, nX=ncol(Xmat), U=as.numeric(unit), nU=nrow(data.qp), n = nrow(data.qp))) library(rstan) dat.rstan <- stan(data=dat.list, model_code=modelString, chains=3, iter=1000, warmup=500, thin=2, save_dso=TRUE )
SAMPLING FOR MODEL '6b01137f7762e83c8564fa8b7bbd08b8' NOW (CHAIN 1). Chain 1, Iteration: 1 / 1000 [ 0%] (Warmup) Chain 1, Iteration: 100 / 1000 [ 10%] (Warmup) Chain 1, Iteration: 200 / 1000 [ 20%] (Warmup) Chain 1, Iteration: 300 / 1000 [ 30%] (Warmup) Chain 1, Iteration: 400 / 1000 [ 40%] (Warmup) Chain 1, Iteration: 500 / 1000 [ 50%] (Warmup) Chain 1, Iteration: 501 / 1000 [ 50%] (Sampling) Chain 1, Iteration: 600 / 1000 [ 60%] (Sampling) Chain 1, Iteration: 700 / 1000 [ 70%] (Sampling) Chain 1, Iteration: 800 / 1000 [ 80%] (Sampling) Chain 1, Iteration: 900 / 1000 [ 90%] (Sampling) Chain 1, Iteration: 1000 / 1000 [100%] (Sampling)# # Elapsed Time: 0.291649 seconds (Warm-up) # 0.151107 seconds (Sampling) # 0.442756 seconds (Total) # SAMPLING FOR MODEL '6b01137f7762e83c8564fa8b7bbd08b8' NOW (CHAIN 2). Chain 2, Iteration: 1 / 1000 [ 0%] (Warmup) Chain 2, Iteration: 100 / 1000 [ 10%] (Warmup) Chain 2, Iteration: 200 / 1000 [ 20%] (Warmup) Chain 2, Iteration: 300 / 1000 [ 30%] (Warmup) Chain 2, Iteration: 400 / 1000 [ 40%] (Warmup) Chain 2, Iteration: 500 / 1000 [ 50%] (Warmup) Chain 2, Iteration: 501 / 1000 [ 50%] (Sampling) Chain 2, Iteration: 600 / 1000 [ 60%] (Sampling) Chain 2, Iteration: 700 / 1000 [ 70%] (Sampling) Chain 2, Iteration: 800 / 1000 [ 80%] (Sampling) Chain 2, Iteration: 900 / 1000 [ 90%] (Sampling) Chain 2, Iteration: 1000 / 1000 [100%] (Sampling)# # Elapsed Time: 0.279321 seconds (Warm-up) # 0.102494 seconds (Sampling) # 0.381815 seconds (Total) # SAMPLING FOR MODEL '6b01137f7762e83c8564fa8b7bbd08b8' NOW (CHAIN 3). Chain 3, Iteration: 1 / 1000 [ 0%] (Warmup) Chain 3, Iteration: 100 / 1000 [ 10%] (Warmup) Chain 3, Iteration: 200 / 1000 [ 20%] (Warmup) Chain 3, Iteration: 300 / 1000 [ 30%] (Warmup) Chain 3, Iteration: 400 / 1000 [ 40%] (Warmup) Chain 3, Iteration: 500 / 1000 [ 50%] (Warmup) Chain 3, Iteration: 501 / 1000 [ 50%] (Sampling) Chain 3, Iteration: 600 / 1000 [ 60%] (Sampling) Chain 3, Iteration: 700 / 1000 [ 70%] (Sampling) Chain 3, Iteration: 800 / 1000 [ 80%] (Sampling) Chain 3, Iteration: 900 / 1000 [ 90%] (Sampling) Chain 3, Iteration: 1000 / 1000 [100%] (Sampling)# # Elapsed Time: 0.306079 seconds (Warm-up) # 0.110866 seconds (Sampling) # 0.416945 seconds (Total) #
print(dat.rstan, pars=c('beta','sigma'))
Inference for Stan model: 6b01137f7762e83c8564fa8b7bbd08b8. 3 chains, each with iter=1000; warmup=500; thin=2; post-warmup draws per chain=250, total post-warmup draws=750. mean se_mean sd 2.5% 25% 50% 75% 97.5% n_eff Rhat beta[1] 0.01 0.03 0.59 -1.25 -0.36 0.06 0.42 1.01 314 1.00 beta[2] 0.14 0.00 0.05 0.05 0.11 0.13 0.17 0.23 287 1.00 sigma 0.85 0.02 0.26 0.47 0.67 0.81 0.98 1.44 248 1.01 Samples were drawn using NUTS(diag_e) at Tue Feb 23 12:30:49 2016. For each parameter, n_eff is a crude measure of effective sample size, and Rhat is the potential scale reduction factor on split chains (at convergence, Rhat=1).
data.qp$unit=1:nrow(data.qp) library(brms) dat.brm <- brm(y~x+(1|unit), data=data.qp, family='poisson', prior=c(set_prior('normal(0,1000)', class='b')), chains=3, iter=1000, warmup=500, thin=2 )
SAMPLING FOR MODEL 'poisson(log) brms-model' NOW (CHAIN 1). Chain 1, Iteration: 1 / 1000 [ 0%] (Warmup) Chain 1, Iteration: 100 / 1000 [ 10%] (Warmup) Chain 1, Iteration: 200 / 1000 [ 20%] (Warmup) Chain 1, Iteration: 300 / 1000 [ 30%] (Warmup) Chain 1, Iteration: 400 / 1000 [ 40%] (Warmup) Chain 1, Iteration: 500 / 1000 [ 50%] (Warmup) Chain 1, Iteration: 501 / 1000 [ 50%] (Sampling) Chain 1, Iteration: 600 / 1000 [ 60%] (Sampling) Chain 1, Iteration: 700 / 1000 [ 70%] (Sampling) Chain 1, Iteration: 800 / 1000 [ 80%] (Sampling) Chain 1, Iteration: 900 / 1000 [ 90%] (Sampling) Chain 1, Iteration: 1000 / 1000 [100%] (Sampling)# # Elapsed Time: 0.139169 seconds (Warm-up) # 0.059387 seconds (Sampling) # 0.198556 seconds (Total) # SAMPLING FOR MODEL 'poisson(log) brms-model' NOW (CHAIN 2). Chain 2, Iteration: 1 / 1000 [ 0%] (Warmup) Chain 2, Iteration: 100 / 1000 [ 10%] (Warmup) Chain 2, Iteration: 200 / 1000 [ 20%] (Warmup) Chain 2, Iteration: 300 / 1000 [ 30%] (Warmup) Chain 2, Iteration: 400 / 1000 [ 40%] (Warmup) Chain 2, Iteration: 500 / 1000 [ 50%] (Warmup) Chain 2, Iteration: 501 / 1000 [ 50%] (Sampling) Chain 2, Iteration: 600 / 1000 [ 60%] (Sampling) Chain 2, Iteration: 700 / 1000 [ 70%] (Sampling) Chain 2, Iteration: 800 / 1000 [ 80%] (Sampling) Chain 2, Iteration: 900 / 1000 [ 90%] (Sampling) Chain 2, Iteration: 1000 / 1000 [100%] (Sampling)# # Elapsed Time: 0.165589 seconds (Warm-up) # 0.062659 seconds (Sampling) # 0.228248 seconds (Total) # SAMPLING FOR MODEL 'poisson(log) brms-model' NOW (CHAIN 3). Chain 3, Iteration: 1 / 1000 [ 0%] (Warmup) Chain 3, Iteration: 100 / 1000 [ 10%] (Warmup) Chain 3, Iteration: 200 / 1000 [ 20%] (Warmup) Chain 3, Iteration: 300 / 1000 [ 30%] (Warmup) Chain 3, Iteration: 400 / 1000 [ 40%] (Warmup) Chain 3, Iteration: 500 / 1000 [ 50%] (Warmup) Chain 3, Iteration: 501 / 1000 [ 50%] (Sampling) Chain 3, Iteration: 600 / 1000 [ 60%] (Sampling) Chain 3, Iteration: 700 / 1000 [ 70%] (Sampling) Chain 3, Iteration: 800 / 1000 [ 80%] (Sampling) Chain 3, Iteration: 900 / 1000 [ 90%] (Sampling) Chain 3, Iteration: 1000 / 1000 [100%] (Sampling)# # Elapsed Time: 0.140647 seconds (Warm-up) # 0.056132 seconds (Sampling) # 0.196779 seconds (Total) #
summary(dat.brm)
Family: poisson (log) Formula: y ~ x + (1 | unit) Data: data.qp (Number of observations: 20) Samples: 3 chains, each with iter = 1000; warmup = 500; thin = 2; total post-warmup samples = 750 WAIC: Not computed Random Effects: ~unit (Number of levels: 20) Estimate Est.Error l-95% CI u-95% CI Eff.Sample Rhat sd(Intercept) 0.83 0.26 0.46 1.43 171 1.01 Fixed Effects: Estimate Est.Error l-95% CI u-95% CI Eff.Sample Rhat Intercept 0.07 0.55 -1.04 1.14 480 1 x 0.13 0.05 0.05 0.23 348 1 Samples were drawn using sampling(NUTS). For each parameter, Eff.Sample is a crude measure of effective sample size, and Rhat is the potential scale reduction factor on split chains (at convergence, Rhat = 1).
stancode(dat.brm)
functions { } data { int<lower=1> N; // number of observations int Y[N]; // response variable int<lower=1> K; // number of fixed effects matrix[N, K] X; // FE design matrix vector[K] X_means; // column means of X // data for random effects of unit int<lower=1> J_1[N]; // RE levels int<lower=1> N_1; // number of levels int<lower=1> K_1; // number of REs real Z_1[N]; // RE design matrix } transformed data { } parameters { real temp_Intercept; // temporary Intercept vector[K] b; // fixed effects vector[N_1] pre_1; // unscaled REs real<lower=0> sd_1; // RE standard deviation } transformed parameters { vector[N] eta; // linear predictor vector[N_1] r_1; // REs // compute linear predictor eta <- X * b + temp_Intercept; r_1 <- sd_1 * (pre_1); // scale REs for (n in 1:N) { eta[n] <- eta[n] + Z_1[n] * r_1[J_1[n]]; } } model { // prior specifications b ~ normal(0,1000); sd_1 ~ student_t(3, 0, 5); pre_1 ~ normal(0, 1); // likelihood contribution Y ~ poisson_log(eta); } generated quantities { real b_Intercept; // fixed effects intercept b_Intercept <- temp_Intercept - dot_product(X_means, b); }
fitted <- subset(data.qp, select = c(x, y, unit))
fitted$y <- NA
newdata <- expand.grid(x = seq(min(data.qp$x), max(data.qp$x), len = 100), y = NA, unit = NA)
dat.pred <- rbind(data.qp, fitted, newdata)
Now let's fit the model.
#fit the model
dat.inla <- inla(y ~ x + f(unit, model = 'iid'), family = 'poisson',
   data = dat.pred,
   control.family = list(link = 'log'),
   control.predictor = list(link = 1, compute = TRUE),
   control.compute = list(dic = TRUE, cpo = TRUE, waic = TRUE))
#examine the regular summary
summary(dat.inla)
Call: c("inla(formula = y ~ x + f(unit, model = \"iid\"), family = \"poisson\", ", " data = dat.pred, control.compute = list(dic = TRUE, cpo = TRUE, ", " waic = TRUE), control.predictor = list(link = 1, compute = TRUE), ", " control.family = list(link = \"log\"))") Time used: Pre-processing Running inla Post-processing Total 0.0950 0.0728 0.0331 0.2008 Fixed effects: mean sd 0.025quant 0.5quant 0.975quant mode kld (Intercept) 0.2176 0.4647 -0.7614 0.2394 1.0738 0.2822 0 x 0.1263 0.0378 0.0549 0.1251 0.2046 0.1229 0 Random effects: Name Model unit IID model Model hyperparameters: mean sd 0.025quant 0.5quant 0.975quant mode Precision for unit 3.154 2.022 0.8659 2.647 8.429 1.889 Expected number of effective parameters(std dev): 13.15(1.516) Number of equivalent replicates : 1.521 Deviance Information Criterion (DIC) ...: 100.91 Effective number of parameters .........: 13.48 Watanabe-Akaike information criterion (WAIC) ...: 100.97 Effective number of parameters .................: 10.12 Marginal log-Likelihood: -70.59 CPO and PIT are computed Posterior marginals for linear predictor and fitted values computed
newdata <- cbind(newdata, dat.inla$summary.fitted.values[(nrow(data.qp)+nrow(fitted)+1):nrow(dat.pred),]) head(newdata)
x y unit mean sd 0.025quant 0.5quant 0.975quant mode fitted.predictor.041 1.024733 NA NA 1.546157 0.6600472 0.5723928 1.443850 3.115993 1.263051 fitted.predictor.042 1.203403 NA NA 1.577515 0.6633672 0.5928836 1.476424 3.151031 1.297411 fitted.predictor.043 1.382074 NA NA 1.609575 0.6666484 0.6140817 1.509731 3.186584 1.332625 fitted.predictor.044 1.560745 NA NA 1.642354 0.6698890 0.6360093 1.543786 3.222667 1.368681 fitted.predictor.045 1.739415 NA NA 1.675869 0.6730872 0.6586892 1.578607 3.259294 1.405544 fitted.predictor.046 1.918086 NA NA 1.710138 0.6762414 0.6821445 1.614211 3.296481 1.443231
newdata <- reshape:::rename(newdata, c("0.025quant"="lower", "0.975quant"="upper")) library(grid) ggplot(newdata, aes(y=mean, x=x)) + geom_blank()+ geom_point(data=data.qp, aes(y=y, x=x)) + geom_ribbon(aes(ymin=lower, ymax=upper), fill='blue', alpha=0.2) + geom_line() + theme_classic()+ theme(legend.key.width=unit(2,'lines'), legend.position=c(0,1), legend.justification=c(0,1))
Negative Binomial regression (overdispersed count data)
The Negative Binomial distribution strictly describes the number of failures encountered before a given number of successes in a sequence of independent trials. In practice, it is more usefully viewed as a gamma-Poisson mixture: a Poisson distribution whose mean itself varies according to a gamma distribution. As such, it is defined by two parameters (which can be expressed in terms of location and scale), with variance $var(y)=\mu+\mu^2/size$. As the scale ($size$) parameter becomes very large, the variance approaches the mean and the Negative Binomial converges on the Poisson.
Hence, rather than assume that the variance is equal to the mean ($\lambda = \mu = \sigma^2$) as the Poisson does, we can use a distribution that estimates the scale as well as the location parameter.
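The mean-variance relationship described above can be illustrated with a quick simulation sketch (not part of the analysis): the variance of random Negative Binomial draws should be approximately $\mu+\mu^2/size$.

#Empirical check of the Negative Binomial mean-variance relationship
set.seed(123)
mu <- 10; size <- 2
yy <- rnbinom(100000, mu = mu, size = size)
c(mean = mean(yy), variance = var(yy), expected.variance = mu + mu^2/size)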
- the sample size = 20
- the continuous $x$ variable is a random uniform spread of measurements between 1 and 20
- the rate of change in log $y$ per unit of $x$ (slope) = 0.1.
- the value of log$y$ when $x$ equals 0 (the intercept) = 0.6
- to generate the values of $y$ expected at each $x$ value, we evaluate the linear predictor (created by calculating the outer product of the model matrix and the regression parameters). These expected values are then back-transformed onto a scale bounded by (0,$\infty$) by applying the exponential function (the inverse of the log link), $e^{linear~predictor}$
- finally, we generate $y$ values by using the expected $y$ values ($\lambda$) as the means when drawing random numbers from a Negative Binomial distribution (with size = 1). This step adds random noise to the expected $y$ values and returns only 0's and positive integers.
set.seed(37) #16 #35
#The number of samples
n.x <- 20
#Create x values that are uniformly distributed throughout the range of 1 to 20
x <- sort(runif(n = n.x, min = 1, max = 20))
mm <- model.matrix(~x)
intercept <- 0.6
slope <- 0.1
#The linear predictor
linpred <- mm %*% c(intercept, slope)
#Predicted y values
lambda <- exp(linpred)
#Add some noise by drawing counts from a negative binomial distribution
y <- rnbinom(n = n.x, mu = lambda, size = 1)
dat.nb <- data.frame(y, x)
data.nb <- dat.nb
$$ \begin{align} y_i &\sim{} NB(p_i, size)\\ p_i &= size/(size+\lambda_i)\\ log(\lambda_i)&=\beta_0+\beta_1x_i\\ size &\sim{} U(0.001, 10000)\\ \end{align} $$
library(MASS) dat.glm <- glm.nb(y~x, data=data.nb) summary(dat.glm)
Call: glm.nb(formula = y ~ x, data = data.nb, init.theta = 2.359878187, link = log) Deviance Residuals: Min 1Q Median 3Q Max -2.7874 -0.8940 -0.3172 0.4420 1.3660 Coefficients: Estimate Std. Error z value Pr(>|z|) (Intercept) 0.74543 0.42534 1.753 0.07968 . x 0.09494 0.03441 2.759 0.00579 ** --- Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 (Dispersion parameter for Negative Binomial(2.3599) family taken to be 1) Null deviance: 30.443 on 19 degrees of freedom Residual deviance: 21.806 on 18 degrees of freedom AIC: 115.85 Number of Fisher Scoring iterations: 1 Theta: 2.36 Std. Err.: 1.10 2 x log-likelihood: -109.849
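As a quick (optional) check that estimating the extra scale parameter is warranted, the Negative Binomial fit can be compared against a plain Poisson fit via AIC (dat.glmP is a name introduced here just for this comparison):

#Compare a plain Poisson fit against the Negative Binomial fit
dat.glmP <- glm(y ~ x, data = data.nb, family = "poisson")
AIC(dat.glmP, dat.glm)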
data.qp$unit=1:nrow(data.qp) modelString=" model{ for (i in 1:N) { y[i] ~ dnegbin(p[i],size) p[i] <- size/(size+lambda[i]) log(lambda[i]) <- max(-20,min(20,beta0+beta1*x[i])) } beta0 ~ dnorm(0,1.0E-06) beta1 ~ dnorm(0,1.0E-06) size ~ dunif(0.001, 1000) } " dat.list <- with(data.nb, list(y=y, x=x, N=nrow(data.nb))) dat.jags <- jags(data=dat.list,model.file=textConnection(modelString), param=c('beta0','beta1','size'), n.chains=3, n.iter=20000, n.burnin=10000, n.thin=100)
Compiling model graph Resolving undeclared variables Allocating nodes Graph Size: 190 Initializing model
print(dat.jags)
Inference for Bugs model at "5", fit using jags, 3 chains, each with 20000 iterations (first 10000 discarded), n.thin = 100 n.sims = 300 iterations saved mu.vect sd.vect 2.5% 25% 50% 75% 97.5% Rhat n.eff beta0 0.761 0.425 0.014 0.471 0.732 1.014 1.616 0.997 300 beta1 0.094 0.033 0.022 0.073 0.095 0.117 0.150 0.999 300 size 3.916 7.867 1.038 2.034 2.742 3.766 8.598 1.060 160 deviance 113.293 3.196 110.049 111.142 112.441 114.103 121.912 1.043 160 For each parameter, n.eff is a crude measure of effective sample size, and Rhat is the potential scale reduction factor (at convergence, Rhat=1). DIC info (using the rule, pD = var(deviance)/2) pD = 5.1 and DIC = 118.4 DIC is an estimate of expected predictive error (lower deviance is better).
library(rstan) modelString=" data { int<lower=1> n; int<lower=1> nX; int y[n]; matrix [n,nX] X; } parameters { vector [nX] beta; real<lower=0> shape; } transformed parameters { vector [n] eta; eta<-X*beta; } model { beta ~ normal(0,1000); shape ~ student_t(3,0,5); y ~ neg_binomial_2_log(eta, shape); } " Xmat <- model.matrix(~x,data=data.nb) dat.list <- with(data.nb, list(y = y, X = Xmat, nX=ncol(Xmat), n = nrow(data.nb))) library(rstan) dat.rstan <- stan(data=dat.list, model_code=modelString, chains=3, iter=1000, warmup=500, thin=2, save_dso=TRUE )
SAMPLING FOR MODEL '7ed6b7a1636f285e8c38d5fd272fc919' NOW (CHAIN 1). Chain 1, Iteration: 1 / 1000 [ 0%] (Warmup) Chain 1, Iteration: 100 / 1000 [ 10%] (Warmup) Chain 1, Iteration: 200 / 1000 [ 20%] (Warmup) Chain 1, Iteration: 300 / 1000 [ 30%] (Warmup) Chain 1, Iteration: 400 / 1000 [ 40%] (Warmup) Chain 1, Iteration: 500 / 1000 [ 50%] (Warmup) Chain 1, Iteration: 501 / 1000 [ 50%] (Sampling) Chain 1, Iteration: 600 / 1000 [ 60%] (Sampling) Chain 1, Iteration: 700 / 1000 [ 70%] (Sampling) Chain 1, Iteration: 800 / 1000 [ 80%] (Sampling) Chain 1, Iteration: 900 / 1000 [ 90%] (Sampling) Chain 1, Iteration: 1000 / 1000 [100%] (Sampling)# # Elapsed Time: 3.8897 seconds (Warm-up) # 0.22708 seconds (Sampling) # 4.11678 seconds (Total) # SAMPLING FOR MODEL '7ed6b7a1636f285e8c38d5fd272fc919' NOW (CHAIN 2). Chain 2, Iteration: 1 / 1000 [ 0%] (Warmup) Chain 2, Iteration: 100 / 1000 [ 10%] (Warmup) Chain 2, Iteration: 200 / 1000 [ 20%] (Warmup) Chain 2, Iteration: 300 / 1000 [ 30%] (Warmup) Chain 2, Iteration: 400 / 1000 [ 40%] (Warmup) Chain 2, Iteration: 500 / 1000 [ 50%] (Warmup) Chain 2, Iteration: 501 / 1000 [ 50%] (Sampling) Chain 2, Iteration: 600 / 1000 [ 60%] (Sampling) Chain 2, Iteration: 700 / 1000 [ 70%] (Sampling) Chain 2, Iteration: 800 / 1000 [ 80%] (Sampling) Chain 2, Iteration: 900 / 1000 [ 90%] (Sampling) Chain 2, Iteration: 1000 / 1000 [100%] (Sampling)# # Elapsed Time: 6.20803 seconds (Warm-up) # 12.3776 seconds (Sampling) # 18.5856 seconds (Total) # SAMPLING FOR MODEL '7ed6b7a1636f285e8c38d5fd272fc919' NOW (CHAIN 3). Chain 3, Iteration: 1 / 1000 [ 0%] (Warmup) Chain 3, Iteration: 100 / 1000 [ 10%] (Warmup) Chain 3, Iteration: 200 / 1000 [ 20%] (Warmup) Chain 3, Iteration: 300 / 1000 [ 30%] (Warmup) Chain 3, Iteration: 400 / 1000 [ 40%] (Warmup) Chain 3, Iteration: 500 / 1000 [ 50%] (Warmup) Chain 3, Iteration: 501 / 1000 [ 50%] (Sampling) Chain 3, Iteration: 600 / 1000 [ 60%] (Sampling) Chain 3, Iteration: 700 / 1000 [ 70%] (Sampling) Chain 3, Iteration: 800 / 1000 [ 80%] (Sampling) Chain 3, Iteration: 900 / 1000 [ 90%] (Sampling) Chain 3, Iteration: 1000 / 1000 [100%] (Sampling)# # Elapsed Time: 3.37625 seconds (Warm-up) # 0.126583 seconds (Sampling) # 3.50283 seconds (Total) #
print(dat.rstan, pars=c('beta','shape'))
Inference for Stan model: 7ed6b7a1636f285e8c38d5fd272fc919. 3 chains, each with iter=1000; warmup=500; thin=2; post-warmup draws per chain=250, total post-warmup draws=750. mean se_mean sd 2.5% 25% 50% 75% 97.5% n_eff Rhat beta[1] 0.57 0.21 0.56 -0.40 0.15 0.60 0.97 1.74 7 1.31 beta[2] 0.11 0.01 0.04 0.03 0.08 0.10 0.13 0.18 11 1.25 shape 1.57 0.05 0.31 0.87 1.37 1.59 1.84 1.98 43 1.08 Samples were drawn using NUTS(diag_e) at Tue Feb 23 16:08:26 2016. For each parameter, n_eff is a crude measure of effective sample size, and Rhat is the potential scale reduction factor on split chains (at convergence, Rhat=1).
library(brms) dat.brm <- brm(y~x, data=data.nb, family='negbinomial', prior=c(set_prior('normal(0,1000)', class='b')), chains=3, iter=1000, warmup=500, thin=2 )
SAMPLING FOR MODEL 'negbinomial(log) brms-model' NOW (CHAIN 1). Chain 1, Iteration: 1 / 1000 [ 0%] (Warmup) Chain 1, Iteration: 100 / 1000 [ 10%] (Warmup) Chain 1, Iteration: 200 / 1000 [ 20%] (Warmup) Chain 1, Iteration: 300 / 1000 [ 30%] (Warmup) Chain 1, Iteration: 400 / 1000 [ 40%] (Warmup) Chain 1, Iteration: 500 / 1000 [ 50%] (Warmup) Chain 1, Iteration: 501 / 1000 [ 50%] (Sampling) Chain 1, Iteration: 600 / 1000 [ 60%] (Sampling) Chain 1, Iteration: 700 / 1000 [ 70%] (Sampling) Chain 1, Iteration: 800 / 1000 [ 80%] (Sampling) Chain 1, Iteration: 900 / 1000 [ 90%] (Sampling) Chain 1, Iteration: 1000 / 1000 [100%] (Sampling)# # Elapsed Time: 3.63316 seconds (Warm-up) # 12.482 seconds (Sampling) # 16.1151 seconds (Total) # SAMPLING FOR MODEL 'negbinomial(log) brms-model' NOW (CHAIN 2). Chain 2, Iteration: 1 / 1000 [ 0%] (Warmup) Chain 2, Iteration: 100 / 1000 [ 10%] (Warmup) Chain 2, Iteration: 200 / 1000 [ 20%] (Warmup) Chain 2, Iteration: 300 / 1000 [ 30%] (Warmup) Chain 2, Iteration: 400 / 1000 [ 40%] (Warmup) Chain 2, Iteration: 500 / 1000 [ 50%] (Warmup) Chain 2, Iteration: 501 / 1000 [ 50%] (Sampling) Chain 2, Iteration: 600 / 1000 [ 60%] (Sampling) Chain 2, Iteration: 700 / 1000 [ 70%] (Sampling) Chain 2, Iteration: 800 / 1000 [ 80%] (Sampling) Chain 2, Iteration: 900 / 1000 [ 90%] (Sampling) Chain 2, Iteration: 1000 / 1000 [100%] (Sampling)# # Elapsed Time: 7.01036 seconds (Warm-up) # 12.4126 seconds (Sampling) # 19.423 seconds (Total) # SAMPLING FOR MODEL 'negbinomial(log) brms-model' NOW (CHAIN 3). Chain 3, Iteration: 1 / 1000 [ 0%] (Warmup) Chain 3, Iteration: 100 / 1000 [ 10%] (Warmup) Chain 3, Iteration: 200 / 1000 [ 20%] (Warmup) Chain 3, Iteration: 300 / 1000 [ 30%] (Warmup) Chain 3, Iteration: 400 / 1000 [ 40%] (Warmup) Chain 3, Iteration: 500 / 1000 [ 50%] (Warmup) Chain 3, Iteration: 501 / 1000 [ 50%] (Sampling) Chain 3, Iteration: 600 / 1000 [ 60%] (Sampling) Chain 3, Iteration: 700 / 1000 [ 70%] (Sampling) Chain 3, Iteration: 800 / 1000 [ 80%] (Sampling) Chain 3, Iteration: 900 / 1000 [ 90%] (Sampling) Chain 3, Iteration: 1000 / 1000 [100%] (Sampling)# # Elapsed Time: 5.72697 seconds (Warm-up) # 4.75621 seconds (Sampling) # 10.4832 seconds (Total) #
summary(dat.brm)
Family: negbinomial (log) Formula: y ~ x Data: data.nb (Number of observations: 20) Samples: 3 chains, each with iter = 1000; warmup = 500; thin = 2; total post-warmup samples = 750 WAIC: Not computed Fixed Effects: Estimate Est.Error l-95% CI u-95% CI Eff.Sample Rhat Intercept 0.79 0.48 -0.04 1.91 80 1.02 x 0.09 0.04 0.01 0.17 104 1.01 Family Specific Parameters: Estimate Est.Error l-95% CI u-95% CI Eff.Sample Rhat shape 1.57 0.29 0.91 1.98 89 1.01 Samples were drawn using sampling(NUTS). For each parameter, Eff.Sample is a crude measure of effective sample size, and Rhat is the potential scale reduction factor on split chains (at convergence, Rhat = 1).
stancode(dat.brm)
functions { } data { int<lower=1> N; // number of observations int Y[N]; // response variable int<lower=1> K; // number of fixed effects matrix[N, K] X; // FE design matrix vector[K] X_means; // column means of X } transformed data { } parameters { real temp_Intercept; // temporary Intercept vector[K] b; // fixed effects real<lower=0> shape; // shape parameter } transformed parameters { vector[N] eta; // linear predictor // compute linear predictor eta <- X * b + temp_Intercept; } model { // prior specifications b ~ normal(0,1000); shape ~ student_t(3, 0, 5); // likelihood contribution Y ~ neg_binomial_2_log(eta, shape); } generated quantities { real b_Intercept; // fixed effects intercept b_Intercept <- temp_Intercept - dot_product(X_means, b); }
fitted <- subset(data.nb, select = c(x, y))
fitted$y <- NA
newdata <- expand.grid(x = seq(min(data.nb$x), max(data.nb$x), len = 100), y = NA)
dat.pred <- rbind(data.nb, fitted, newdata)
Now let's fit the model.
#fit the model
dat.inla <- inla(y ~ x, family = 'nbinomial',
   data = dat.pred,
   control.family = list(link = 'log'),
   control.predictor = list(link = 1, compute = TRUE),
   control.compute = list(dic = TRUE, cpo = TRUE, waic = TRUE))
#examine the regular summary
summary(dat.inla)
Call: c("inla(formula = y ~ x, family = \"nbinomial\", data = dat.pred, ", " control.compute = list(dic = TRUE, cpo = TRUE, waic = TRUE), ", " control.predictor = list(link = 1, compute = TRUE), control.family = list(link = \"log\"))" ) Time used: Pre-processing Running inla Post-processing Total 0.0866 0.0483 0.0302 0.1650 Fixed effects: mean sd 0.025quant 0.5quant 0.975quant mode kld (Intercept) 0.7439 0.4569 -0.1380 0.7379 1.6633 0.7283 0 x 0.0949 0.0368 0.0226 0.0948 0.1680 0.0944 0 The model has no random effects Model hyperparameters: mean sd 0.025quant 0.5quant 0.975quant mode size for the nbinomial observations (overdispersion) 1.859 0.7529 0.794 1.725 3.702 1.485 Expected number of effective parameters(std dev): 2.00(1e-04) Number of equivalent replicates : 9.998 Deviance Information Criterion (DIC) ...: 115.27 Effective number of parameters .........: 2.474 Watanabe-Akaike information criterion (WAIC) ...: 115.09 Effective number of parameters .................: 2.055 Marginal log-Likelihood: -63.84 CPO and PIT are computed Posterior marginals for linear predictor and fitted values computed
newdata <- cbind(newdata, dat.inla$summary.fitted.values[(nrow(data.nb)+nrow(fitted)+1):nrow(dat.pred),]) head(newdata)
x y mean sd 0.025quant 0.5quant 0.975quant mode fitted.predictor.041 1.042191 NA 2.546251 1.166255 1.029584 2.311028 5.447691 1.940691 fitted.predictor.042 1.217094 NA 2.582946 1.165233 1.058728 2.350080 5.477622 1.982258 fitted.predictor.043 1.391996 NA 2.620294 1.164145 1.088655 2.389805 5.507934 2.024570 fitted.predictor.044 1.566899 NA 2.658306 1.162991 1.119382 2.430214 5.538646 2.067639 fitted.predictor.045 1.741802 NA 2.696996 1.161771 1.150928 2.471320 5.569776 2.111475 fitted.predictor.046 1.916705 NA 2.736379 1.160484 1.183221 2.513135 5.601344 2.156089
newdata <- reshape:::rename(newdata, c("0.025quant"="lower", "0.975quant"="upper")) library(grid) ggplot(newdata, aes(y=mean, x=x)) + geom_blank()+ geom_point(data=data.nb, aes(y=y, x=x)) + geom_ribbon(aes(ymin=lower, ymax=upper), fill='blue', alpha=0.2) + geom_line() + theme_classic()+ theme(legend.key.width=unit(2,'lines'), legend.position=c(0,1), legend.justification=c(0,1))
Zero-inflated Poisson (ZIP) regression (count data)
Let's say we wanted to model the abundance of an item ($y$) against a continuous predictor ($x$). As this section is mainly about the generation of artificial data (and not specifically about what to do with the data), understanding the actual details is optional and can be safely skipped. Consequently, I have folded (toggled) this section away.
- the sample size = 20
- the continuous $x$ variable is a random uniform spread of measurements between 1 and 20
- the rate of change in log $y$ per unit of $x$ (slope) = 0.1.
- the value of log$y$ when $x$ equals 0 (the intercept) = 0.6
- to generate the values of $y$ expected at each $x$ value, we evaluate the linear predictor (created by calculating the outer product of the model matrix and the regression parameters). These expected values are then back-transformed onto a scale bounded by (0,$\infty$) by applying the exponential function (the inverse of the log link), $e^{linear~predictor}$
- finally, we generate $y$ values by using the expected $y$ values ($\lambda$) as the means when drawing random numbers from a zero-inflated Poisson distribution in which approximately 40% of observations are excess zeros. This step adds random noise to the expected $y$ values and returns only 0's and positive integers.
set.seed(37) #34.5 #4 #10 #16 #17 #26
#The number of samples
n.x <- 20
#Create x values that are uniformly distributed throughout the range of 1 to 20
x <- sort(runif(n = n.x, min = 1, max = 20))
mm <- model.matrix(~x)
intercept <- 0.6
slope <- 0.1
#The linear predictor
linpred <- mm %*% c(intercept, slope)
#Predicted y values
lambda <- exp(linpred)
#Add some noise and excess zeros by drawing from a zero-inflated Poisson distribution
library(gamlss.dist)
#fixed latent binomial
y <- rZIP(n.x, lambda, 0.4)
#latent binomial influenced by the linear predictor
#y <- rZIP(n.x, lambda, 1-exp(linpred)/(1+exp(linpred)))
dat.zip <- data.frame(y, x)
data.zip <- dat.zip
summary(glm(y ~ x, dat.zip, family = "poisson"))
Call: glm(formula = y ~ x, family = "poisson", data = dat.zip) Deviance Residuals: Min 1Q Median 3Q Max -2.5415 -2.3769 -0.9753 1.1736 3.6380 Coefficients: Estimate Std. Error z value Pr(>|z|) (Intercept) 0.67048 0.33018 2.031 0.0423 * x 0.02961 0.02663 1.112 0.2662 --- Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 (Dispersion parameter for poisson family taken to be 1) Null deviance: 85.469 on 19 degrees of freedom Residual deviance: 84.209 on 18 degrees of freedom AIC: 124.01 Number of Fisher Scoring iterations: 6
plot(glm(y~x, dat.zip, family="poisson"))
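Before reaching for a zero-inflated model, it can be useful to compare the observed number of zeros with the number expected under the Poisson fit (a small diagnostic sketch; dat.glmP is just a name introduced here):

#Observed versus Poisson-expected number of zeros
dat.glmP <- glm(y ~ x, data = dat.zip, family = "poisson")
c(observed = sum(dat.zip$y == 0), expected = sum(dpois(0, lambda = fitted(dat.glmP))))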
library(pscl) summary(zeroinfl(y ~ x | 1, dist = "poisson", data = dat.zip))
Call: zeroinfl(formula = y ~ x | 1, data = dat.zip, dist = "poisson") Pearson residuals: Min 1Q Median 3Q Max -1.0015 -0.9556 -0.3932 0.9663 1.6195 Count model coefficients (poisson with log link): Estimate Std. Error z value Pr(>|z|) (Intercept) 0.70474 0.31960 2.205 0.027449 * x 0.08734 0.02532 3.449 0.000563 *** Zero-inflation model coefficients (binomial with logit link): Estimate Std. Error z value Pr(>|z|) (Intercept) -0.2292 0.4563 -0.502 0.615 --- Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 Number of iterations in BFGS optimization: 13 Log-likelihood: -36.17 on 3 Df
plot(resid(zeroinfl(y ~ x | 1, dist = "poisson", data = dat.zip))~fitted(zeroinfl(y ~ x | 1, dist = "poisson", data = dat.zip)))
library(gamlss) summary(gamlss(y~x,data=dat.zip, family=ZIP))
GAMLSS-RS iteration 1: Global Deviance = 72.4363 GAMLSS-RS iteration 2: Global Deviance = 72.3428 GAMLSS-RS iteration 3: Global Deviance = 72.3428 ******************************************************************* Family: c("ZIP", "Poisson Zero Inflated") Call: gamlss(formula = y ~ x, family = ZIP, data = dat.zip) Fitting method: RS() ------------------------------------------------------------------- Mu link function: log Mu Coefficients: Estimate Std. Error t value Pr(>|t|) (Intercept) 0.70582 0.31958 2.209 0.04123 * x 0.08719 0.02533 3.442 0.00311 ** --- Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 ------------------------------------------------------------------- Sigma link function: logit Sigma Coefficients: Estimate Std. Error t value Pr(>|t|) (Intercept) -0.2292 0.4563 -0.502 0.622 ------------------------------------------------------------------- No. of observations in the fit: 20 Degrees of Freedom for the fit: 3 Residual Deg. of Freedom: 17 at cycle: 3 Global Deviance: 72.34282 AIC: 78.34282 SBC: 81.33002 *******************************************************************
predict(gamlss(y~x,data=dat.zip, family=ZIP), se.fit=TRUE, what="mu")
GAMLSS-RS iteration 1: Global Deviance = 72.4363 GAMLSS-RS iteration 2: Global Deviance = 72.3428 GAMLSS-RS iteration 3: Global Deviance = 72.3428
$fit 1 2 3 4 5 6 7 8 9 10 11 0.7966905 0.9236189 1.0263933 1.0369823 1.1305358 1.3763304 1.4054417 1.4299603 1.4951229 1.6161339 1.7035853 12 13 14 15 16 17 18 19 20 1.7629994 1.8678543 1.9838052 1.9951833 2.1187071 2.1249555 2.1431616 2.1837285 2.3064727 $se.fit 1 2 3 4 5 6 7 8 9 10 11 0.3517894 0.3089432 0.2758980 0.2726011 0.2445792 0.1856062 0.1807322 0.1770893 0.1696661 0.1656458 0.1710016 12 13 14 15 16 17 18 19 20 0.1783138 0.1973009 0.2251987 0.2282363 0.2637860 0.2656903 0.2712879 0.2840032 0.3241532
$$ p(y_i \mid \theta,\lambda_i) = \left\{ \begin{array}{l l} \theta + (1-\theta)\times \text{Pois}(0|\lambda_i) & \quad \text{if } y_i=0\\ (1-\theta)\times \text{Pois}(y_i|\lambda_i) & \quad \text{if } y_i>0 \end{array} \right. $$ $$ \begin{align} log(\lambda_i)&=\beta_0+\beta_1x_i\\ logit(\theta)&=\gamma_0\\ \end{align} $$
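To make the mixture above concrete, here is a minimal sketch of the ZIP probability mass function in plain R (dzip is a hypothetical helper, not a function from any of the packages used here), where theta is the zero-inflation probability and lambda the Poisson mean:

#ZIP density: an excess zero with probability theta, otherwise an ordinary Poisson draw
dzip <- function(y, lambda, theta) {
   ifelse(y == 0,
          theta + (1 - theta) * dpois(0, lambda),
          (1 - theta) * dpois(y, lambda))
}
dzip(0:5, lambda = 3, theta = 0.4)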
library(pscl) summary(zeroinfl(y ~ x | 1, dist = "poisson", data = dat.zip))
Call: zeroinfl(formula = y ~ x | 1, data = dat.zip, dist = "poisson") Pearson residuals: Min 1Q Median 3Q Max -1.0015 -0.9556 -0.3932 0.9663 1.6195 Count model coefficients (poisson with log link): Estimate Std. Error z value Pr(>|z|) (Intercept) 0.70474 0.31960 2.205 0.027449 * x 0.08734 0.02532 3.449 0.000563 *** Zero-inflation model coefficients (binomial with logit link): Estimate Std. Error z value Pr(>|z|) (Intercept) -0.2292 0.4563 -0.502 0.615 --- Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 Number of iterations in BFGS optimization: 13 Log-likelihood: -36.17 on 3 Df
summary(zeroinfl(y ~ x | x, dist = "poisson", data = dat.zip))
Call: zeroinfl(formula = y ~ x | x, data = dat.zip, dist = "poisson") Pearson residuals: Min 1Q Median 3Q Max -1.2761 -0.7814 -0.5933 0.9078 2.1317 Count model coefficients (poisson with log link): Estimate Std. Error z value Pr(>|z|) (Intercept) 0.61113 0.33001 1.852 0.064044 . x 0.09389 0.02574 3.647 0.000265 *** Zero-inflation model coefficients (binomial with logit link): Estimate Std. Error z value Pr(>|z|) (Intercept) -2.7305 1.8897 -1.445 0.148 x 0.2170 0.1461 1.485 0.137 --- Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 Number of iterations in BFGS optimization: 17 Log-likelihood: -34.41 on 4 Df
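Whether the zero-inflation component should itself depend on $x$ can be assessed informally by comparing the two zeroinfl fits above, for example via AIC (a sketch only; recall the data were simulated with a fixed zero-inflation probability):

#Compare constant versus x-dependent zero-inflation components
AIC(zeroinfl(y ~ x | 1, dist = "poisson", data = dat.zip),
    zeroinfl(y ~ x | x, dist = "poisson", data = dat.zip))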
library(gamlss) summary(gamlss(y~x,data=dat.zip, family=ZIP))
GAMLSS-RS iteration 1: Global Deviance = 72.4363 GAMLSS-RS iteration 2: Global Deviance = 72.3428 GAMLSS-RS iteration 3: Global Deviance = 72.3428 ******************************************************************* Family: c("ZIP", "Poisson Zero Inflated") Call: gamlss(formula = y ~ x, family = ZIP, data = dat.zip) Fitting method: RS() ------------------------------------------------------------------- Mu link function: log Mu Coefficients: Estimate Std. Error t value Pr(>|t|) (Intercept) 0.70582 0.31958 2.209 0.04123 * x 0.08719 0.02533 3.442 0.00311 ** --- Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 ------------------------------------------------------------------- Sigma link function: logit Sigma Coefficients: Estimate Std. Error t value Pr(>|t|) (Intercept) -0.2292 0.4563 -0.502 0.622 ------------------------------------------------------------------- No. of observations in the fit: 20 Degrees of Freedom for the fit: 3 Residual Deg. of Freedom: 17 at cycle: 3 Global Deviance: 72.34282 AIC: 78.34282 SBC: 81.33002 *******************************************************************
dat.zip.list <- with(dat.zip,list(Y=y, X=x,N=nrow(data.zip), z=ifelse(y==0,0,1))) modelString=" model { for (i in 1:N) { z[i] ~ dbern(one.minus.theta) Y[i] ~ dpois(lambda[i]) lambda[i] <- z[i]*eta[i] log(eta[i]) <- beta0 + beta1*X[i] } one.minus.theta <- 1-theta logit(theta) <- gamma0 beta0 ~ dnorm(0,1.0E-06) beta1 ~ dnorm(0,1.0E-06) gamma0 ~ dnorm(0,1.0E-06) } " library(R2jags) dat.zip.jags <- jags(model=textConnection(modelString), data=dat.zip.list, inits=NULL, param=c('beta0','beta1', 'gamma0','theta'), n.chain=3, n.iter=20000, n.thin=10, n.burnin=10000)
Compiling model graph Resolving undeclared variables Allocating nodes Graph Size: 149 Initializing model
print(dat.zip.jags)
Inference for Bugs model at "5", fit using jags, 3 chains, each with 20000 iterations (first 10000 discarded), n.thin = 10 n.sims = 3000 iterations saved mu.vect sd.vect 2.5% 25% 50% 75% 97.5% Rhat n.eff beta0 0.711 0.308 0.101 0.504 0.719 0.915 1.318 1.001 3000 beta1 0.086 0.025 0.037 0.070 0.085 0.102 0.133 1.001 3000 gamma0 -0.206 0.461 -1.129 -0.502 -0.194 0.113 0.683 1.001 3000 theta 0.451 0.109 0.244 0.377 0.452 0.528 0.664 1.001 3000 deviance 75.580 2.443 72.827 73.793 74.952 76.659 81.836 1.003 1300 For each parameter, n.eff is a crude measure of effective sample size, and Rhat is the potential scale reduction factor (at convergence, Rhat=1). DIC info (using the rule, pD = var(deviance)/2) pD = 3.0 and DIC = 78.6 DIC is an estimate of expected predictive error (lower deviance is better).
library(rstan) modelString=" functions { real zero_inflated_poisson_log(int y, real eta, real eta_zi) { if (y == 0) { return log_sum_exp(bernoulli_logit_log(1, eta_zi), bernoulli_logit_log(0, eta_zi) + poisson_log_log(0, eta)); } else { return bernoulli_logit_log(0, eta_zi) + poisson_log_log(y, eta); } } } data { int<lower=1> n; int<lower=1> nX; int y[n]; matrix [n*2,nX] X; } parameters { vector [nX] beta; real<lower=0> shape; } transformed parameters { vector [n*2] eta; eta<-X*beta; } model { beta ~ normal(0,1000); for (i in 1:n) { y[i] ~ zero_inflated_poisson(eta[i], eta[i+n]); } } " Xmat <- model.matrix(~x,data=data.zip) dat.list <- with(data.zip, list(y = y, X = rbind(Xmat,Xmat), nX=ncol(Xmat), n = nrow(data.zip))) library(rstan) dat.rstan <- stan(data=dat.list, model_code=modelString, chains=3, iter=1000, warmup=500, thin=2, save_dso=TRUE )
SAMPLING FOR MODEL '983801421387f281f0182fb32509705d' NOW (CHAIN 1). Chain 1, Iteration: 1 / 1000 [ 0%] (Warmup) Chain 1, Iteration: 100 / 1000 [ 10%] (Warmup) Chain 1, Iteration: 200 / 1000 [ 20%] (Warmup) Chain 1, Iteration: 300 / 1000 [ 30%] (Warmup) Chain 1, Iteration: 400 / 1000 [ 40%] (Warmup) Chain 1, Iteration: 500 / 1000 [ 50%] (Warmup) Chain 1, Iteration: 501 / 1000 [ 50%] (Sampling) Chain 1, Iteration: 600 / 1000 [ 60%] (Sampling) Chain 1, Iteration: 700 / 1000 [ 70%] (Sampling) Chain 1, Iteration: 800 / 1000 [ 80%] (Sampling) Chain 1, Iteration: 900 / 1000 [ 90%] (Sampling) Chain 1, Iteration: 1000 / 1000 [100%] (Sampling)# # Elapsed Time: 0.454001 seconds (Warm-up) # 0.115789 seconds (Sampling) # 0.56979 seconds (Total) # SAMPLING FOR MODEL '983801421387f281f0182fb32509705d' NOW (CHAIN 2). Chain 2, Iteration: 1 / 1000 [ 0%] (Warmup) Chain 2, Iteration: 100 / 1000 [ 10%] (Warmup) Chain 2, Iteration: 200 / 1000 [ 20%] (Warmup) Chain 2, Iteration: 300 / 1000 [ 30%] (Warmup) Chain 2, Iteration: 400 / 1000 [ 40%] (Warmup) Chain 2, Iteration: 500 / 1000 [ 50%] (Warmup) Chain 2, Iteration: 501 / 1000 [ 50%] (Sampling) Chain 2, Iteration: 600 / 1000 [ 60%] (Sampling) Chain 2, Iteration: 700 / 1000 [ 70%] (Sampling) Chain 2, Iteration: 800 / 1000 [ 80%] (Sampling) Chain 2, Iteration: 900 / 1000 [ 90%] (Sampling) Chain 2, Iteration: 1000 / 1000 [100%] (Sampling)# # Elapsed Time: 0.155689 seconds (Warm-up) # 0.035454 seconds (Sampling) # 0.191143 seconds (Total) # SAMPLING FOR MODEL '983801421387f281f0182fb32509705d' NOW (CHAIN 3). Chain 3, Iteration: 1 / 1000 [ 0%] (Warmup) Chain 3, Iteration: 100 / 1000 [ 10%] (Warmup) Chain 3, Iteration: 200 / 1000 [ 20%] (Warmup) Chain 3, Iteration: 300 / 1000 [ 30%] (Warmup) Chain 3, Iteration: 400 / 1000 [ 40%] (Warmup) Chain 3, Iteration: 500 / 1000 [ 50%] (Warmup) Chain 3, Iteration: 501 / 1000 [ 50%] (Sampling) Chain 3, Iteration: 600 / 1000 [ 60%] (Sampling) Chain 3, Iteration: 700 / 1000 [ 70%] (Sampling) Chain 3, Iteration: 800 / 1000 [ 80%] (Sampling) Chain 3, Iteration: 900 / 1000 [ 90%] (Sampling) Chain 3, Iteration: 1000 / 1000 [100%] (Sampling)# # Elapsed Time: 0.200173 seconds (Warm-up) # 0.322706 seconds (Sampling) # 0.522879 seconds (Total) #
print(dat.rstan, pars=c('beta'))
Inference for Stan model: 983801421387f281f0182fb32509705d. 3 chains, each with iter=1000; warmup=500; thin=2; post-warmup draws per chain=250, total post-warmup draws=750. mean se_mean sd 2.5% 25% 50% 75% 97.5% n_eff Rhat beta[1] 0.41 0.14 0.24 0.03 0.22 0.4 0.55 0.86 3 1.50 beta[2] 0.10 0.01 0.02 0.06 0.09 0.1 0.11 0.13 4 1.27 Samples were drawn using NUTS(diag_e) at Thu Feb 25 13:24:10 2016. For each parameter, n_eff is a crude measure of effective sample size, and Rhat is the potential scale reduction factor on split chains (at convergence, Rhat=1).
## dat.rstan.model <- stan_model(model_code=modelString) ## dat.rstan <- sampling(dat.rstan.model,data=dat.list,chains=3,iter=1000,warmup=500,thin=2)
library(brms) dat.brm <- brm(y~x, data=data.zip, family='zero_inflated_poisson', prior=c(set_prior('normal(0,1000)', class='b')), chains=3, iter=1000, warmup=500, thin=2 )
SAMPLING FOR MODEL 'zero_inflated_poisson(log) brms-model' NOW (CHAIN 1). Chain 1, Iteration: 1 / 1000 [ 0%] (Warmup) Chain 1, Iteration: 100 / 1000 [ 10%] (Warmup) Chain 1, Iteration: 200 / 1000 [ 20%] (Warmup) Chain 1, Iteration: 300 / 1000 [ 30%] (Warmup) Chain 1, Iteration: 400 / 1000 [ 40%] (Warmup) Chain 1, Iteration: 500 / 1000 [ 50%] (Warmup) Chain 1, Iteration: 501 / 1000 [ 50%] (Sampling) Chain 1, Iteration: 600 / 1000 [ 60%] (Sampling) Chain 1, Iteration: 700 / 1000 [ 70%] (Sampling) Chain 1, Iteration: 800 / 1000 [ 80%] (Sampling) Chain 1, Iteration: 900 / 1000 [ 90%] (Sampling) Chain 1, Iteration: 1000 / 1000 [100%] (Sampling)# # Elapsed Time: 0.048984 seconds (Warm-up) # 0.03777 seconds (Sampling) # 0.086754 seconds (Total) # SAMPLING FOR MODEL 'zero_inflated_poisson(log) brms-model' NOW (CHAIN 2). Chain 2, Iteration: 1 / 1000 [ 0%] (Warmup) Chain 2, Iteration: 100 / 1000 [ 10%] (Warmup) Chain 2, Iteration: 200 / 1000 [ 20%] (Warmup) Chain 2, Iteration: 300 / 1000 [ 30%] (Warmup) Chain 2, Iteration: 400 / 1000 [ 40%] (Warmup) Chain 2, Iteration: 500 / 1000 [ 50%] (Warmup) Chain 2, Iteration: 501 / 1000 [ 50%] (Sampling) Chain 2, Iteration: 600 / 1000 [ 60%] (Sampling) Chain 2, Iteration: 700 / 1000 [ 70%] (Sampling) Chain 2, Iteration: 800 / 1000 [ 80%] (Sampling) Chain 2, Iteration: 900 / 1000 [ 90%] (Sampling) Chain 2, Iteration: 1000 / 1000 [100%] (Sampling)# # Elapsed Time: 0.04523 seconds (Warm-up) # 0.041655 seconds (Sampling) # 0.086885 seconds (Total) # SAMPLING FOR MODEL 'zero_inflated_poisson(log) brms-model' NOW (CHAIN 3). Chain 3, Iteration: 1 / 1000 [ 0%] (Warmup) Chain 3, Iteration: 100 / 1000 [ 10%] (Warmup) Chain 3, Iteration: 200 / 1000 [ 20%] (Warmup) Chain 3, Iteration: 300 / 1000 [ 30%] (Warmup) Chain 3, Iteration: 400 / 1000 [ 40%] (Warmup) Chain 3, Iteration: 500 / 1000 [ 50%] (Warmup) Chain 3, Iteration: 501 / 1000 [ 50%] (Sampling) Chain 3, Iteration: 600 / 1000 [ 60%] (Sampling) Chain 3, Iteration: 700 / 1000 [ 70%] (Sampling) Chain 3, Iteration: 800 / 1000 [ 80%] (Sampling) Chain 3, Iteration: 900 / 1000 [ 90%] (Sampling) Chain 3, Iteration: 1000 / 1000 [100%] (Sampling)# # Elapsed Time: 0.042881 seconds (Warm-up) # 0.042045 seconds (Sampling) # 0.084926 seconds (Total) #
summary(dat.brm)
Family: zero_inflated_poisson (log) Formula: y ~ x Data: data.zip (Number of observations: 20) Samples: 3 chains, each with iter = 1000; warmup = 500; thin = 2; total post-warmup samples = 750 WAIC: Not computed Fixed Effects: Estimate Est.Error l-95% CI u-95% CI Eff.Sample Rhat Intercept 0.4 0.36 -0.38 1.08 495 1 x 0.1 0.03 0.05 0.16 460 1 Samples were drawn using sampling(NUTS). For each parameter, Eff.Sample is a crude measure of effective sample size, and Rhat is the potential scale reduction factor on split chains (at convergence, Rhat = 1).
stancode(dat.brm)
functions { /* zero-inflated poisson log-PDF of a single response * Args: * y: the response value * eta: linear predictor for poisson part * eta_zi: linear predictor for zero-inflation part * Returns: * a scalar to be added to the log posterior */ real zero_inflated_poisson_log(int y, real eta, real eta_zi) { if (y == 0) { return log_sum_exp(bernoulli_logit_log(1, eta_zi), bernoulli_logit_log(0, eta_zi) + poisson_log_log(0, eta)); } else { return bernoulli_logit_log(0, eta_zi) + poisson_log_log(y, eta); } } } data { int<lower=1> N; // number of observations int<lower=1> N_trait; // number of obs / 2 int Y[N_trait]; // response variable int<lower=1> K; // number of fixed effects matrix[N, K] X; // FE design matrix vector[K] X_means; // column means of X } transformed data { } parameters { real temp_Intercept; // temporary Intercept vector[K] b; // fixed effects } transformed parameters { vector[N] eta; // linear predictor // compute linear predictor eta <- X * b + temp_Intercept; } model { // prior specifications b ~ normal(0,1000); // likelihood contribution for (n in 1:N_trait) { Y[n] ~ zero_inflated_poisson(eta[n], eta[n + N_trait]); } } generated quantities { real b_Intercept; // fixed effects intercept b_Intercept <- temp_Intercept - dot_product(X_means, b); }
standata(dat.brm)
$N
[1] 40

$Y
 [1]  0  3  1  4  5  1  5  6  4  0  0  0  0  4  0  0  9  0  0 12

$N_trait
[1] 20

$K
[1] 1

$X
             x
1  -9.45822567
2  -8.00251059
3  -6.82381320
4  -6.70236995
5  -5.62942498
6  -2.81045852
7  -2.47658683
8  -2.19538785
9  -1.44805234
10 -0.06020261
11  0.94275959
12  1.62416640
13  2.82672610
14  4.15654241
15  4.28703487
16  5.70370304
17  5.77536504
18  5.98416729
19  6.44941992
20  7.85714787
21 -9.45822567
22 -8.00251059
23 -6.82381320
24 -6.70236995
25 -5.62942498
26 -2.81045852
27 -2.47658683
28 -2.19538785
29 -1.44805234
30 -0.06020261
31  0.94275959
32  1.62416640
33  2.82672610
34  4.15654241
35  4.28703487
36  5.70370304
37  5.77536504
38  5.98416729
39  6.44941992
40  7.85714787

$X_means
       x
10.50042
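Note that although there are only 20 observations, N is 40 and N_trait is 20: this (older) brms parameterization stacks two copies of the data, with the first N_trait rows supplying the Poisson linear predictor and the second N_trait rows supplying the zero-inflation linear predictor (hence eta[n] and eta[n + N_trait] in the likelihood loop of the Stan code above). The single fixed-effect column has also been centred, with the centring constant returned as X_means so that the intercept can be re-combined in the generated quantities block. As a minimal sketch (assuming data.zip is the data frame passed to brm() above), the centred column can be reproduced directly:

# the values in standata(dat.brm)$X are just the covariate centred on its mean
with(data.zip, head(x - mean(x)))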
# include an intercept for the zero-inflation component by modelling the
# 'main' (Poisson) and 'spec' (zero-inflation) parts of the model explicitly
dat.brm <- brm(y ~ 0 + main + spec + main:x, data=data.zip,
   family='zero_inflated_poisson',
   prior=c(set_prior('normal(0,1000)', class='b')),
   chains=3, iter=1000, warmup=500, thin=2)
SAMPLING FOR MODEL 'zero_inflated_poisson(log) brms-model' NOW (CHAIN 1).

Chain 1, Iteration:    1 / 1000 [  0%]  (Warmup)
Chain 1, Iteration:  100 / 1000 [ 10%]  (Warmup)
Chain 1, Iteration:  200 / 1000 [ 20%]  (Warmup)
Chain 1, Iteration:  300 / 1000 [ 30%]  (Warmup)
Chain 1, Iteration:  400 / 1000 [ 40%]  (Warmup)
Chain 1, Iteration:  500 / 1000 [ 50%]  (Warmup)
Chain 1, Iteration:  501 / 1000 [ 50%]  (Sampling)
Chain 1, Iteration:  600 / 1000 [ 60%]  (Sampling)
Chain 1, Iteration:  700 / 1000 [ 70%]  (Sampling)
Chain 1, Iteration:  800 / 1000 [ 80%]  (Sampling)
Chain 1, Iteration:  900 / 1000 [ 90%]  (Sampling)
Chain 1, Iteration: 1000 / 1000 [100%]  (Sampling)
#  Elapsed Time: 0.163529 seconds (Warm-up)
#                0.065766 seconds (Sampling)
#                0.229295 seconds (Total)

SAMPLING FOR MODEL 'zero_inflated_poisson(log) brms-model' NOW (CHAIN 2).

Chain 2, Iteration:    1 / 1000 [  0%]  (Warmup)
Chain 2, Iteration:  100 / 1000 [ 10%]  (Warmup)
Chain 2, Iteration:  200 / 1000 [ 20%]  (Warmup)
Chain 2, Iteration:  300 / 1000 [ 30%]  (Warmup)
Chain 2, Iteration:  400 / 1000 [ 40%]  (Warmup)
Chain 2, Iteration:  500 / 1000 [ 50%]  (Warmup)
Chain 2, Iteration:  501 / 1000 [ 50%]  (Sampling)
Chain 2, Iteration:  600 / 1000 [ 60%]  (Sampling)
Chain 2, Iteration:  700 / 1000 [ 70%]  (Sampling)
Chain 2, Iteration:  800 / 1000 [ 80%]  (Sampling)
Chain 2, Iteration:  900 / 1000 [ 90%]  (Sampling)
Chain 2, Iteration: 1000 / 1000 [100%]  (Sampling)
#  Elapsed Time: 0.117409 seconds (Warm-up)
#                0.059019 seconds (Sampling)
#                0.176428 seconds (Total)

SAMPLING FOR MODEL 'zero_inflated_poisson(log) brms-model' NOW (CHAIN 3).

Chain 3, Iteration:    1 / 1000 [  0%]  (Warmup)
Chain 3, Iteration:  100 / 1000 [ 10%]  (Warmup)
Chain 3, Iteration:  200 / 1000 [ 20%]  (Warmup)
Chain 3, Iteration:  300 / 1000 [ 30%]  (Warmup)
Chain 3, Iteration:  400 / 1000 [ 40%]  (Warmup)
Chain 3, Iteration:  500 / 1000 [ 50%]  (Warmup)
Chain 3, Iteration:  501 / 1000 [ 50%]  (Sampling)
Chain 3, Iteration:  600 / 1000 [ 60%]  (Sampling)
Chain 3, Iteration:  700 / 1000 [ 70%]  (Sampling)
Chain 3, Iteration:  800 / 1000 [ 80%]  (Sampling)
Chain 3, Iteration:  900 / 1000 [ 90%]  (Sampling)
Chain 3, Iteration: 1000 / 1000 [100%]  (Sampling)
#  Elapsed Time: 0.132163 seconds (Warm-up)
#                0.064132 seconds (Sampling)
#                0.196295 seconds (Total)
summary(dat.brm)
 Family: zero_inflated_poisson (log)
Formula: y ~ 0 + main + spec + main:x
   Data: data.zip (Number of observations: 20)
Samples: 3 chains, each with iter = 1000; warmup = 500; thin = 2;
         total post-warmup samples = 750
   WAIC: Not computed

Fixed Effects:
       Estimate Est.Error l-95% CI u-95% CI Eff.Sample Rhat
main       0.66      0.32     0.00     1.24        384 1.01
spec      -0.26      0.47    -1.18     0.68        601 1.00
main:x     0.09      0.03     0.04     0.14        413 1.00

Samples were drawn using sampling(NUTS). For each parameter, Eff.Sample is a
crude measure of effective sample size, and Rhat is the potential scale
reduction factor on split chains (at convergence, Rhat = 1).
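In this parameterization, main is the intercept and main:x the slope of the Poisson (log link) component, while spec is the intercept of the zero-inflation component on the logit scale. As a rough back-of-the-envelope calculation from the point estimate, the implied probability of an excess (structural) zero is obtained by back-transforming the spec estimate:

plogis(-0.26)   # inverse logit of the spec estimate: ~0.44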
stancode(dat.brm)
functions {
  /* zero-inflated poisson log-PDF of a single response
   * Args:
   *   y: the response value
   *   eta: linear predictor for poisson part
   *   eta_zi: linear predictor for zero-inflation part
   * Returns:
   *   a scalar to be added to the log posterior
   */
  real zero_inflated_poisson_log(int y, real eta, real eta_zi) {
    if (y == 0) {
      return log_sum_exp(bernoulli_logit_log(1, eta_zi),
                         bernoulli_logit_log(0, eta_zi) + poisson_log_log(0, eta));
    } else {
      return bernoulli_logit_log(0, eta_zi) + poisson_log_log(y, eta);
    }
  }
}
data {
  int<lower=1> N;          // number of observations
  int<lower=1> N_trait;    // number of obs / 2
  int Y[N_trait];          // response variable
  int<lower=1> K;          // number of fixed effects
  matrix[N, K] X;          // FE design matrix
  vector[K] X_means;       // column means of X
}
transformed data {
}
parameters {
  vector[K] b;             // fixed effects
}
transformed parameters {
  vector[N] eta;           // linear predictor
  // compute linear predictor
  eta <- X * b;
}
model {
  // prior specifications
  b ~ normal(0,1000);
  // likelihood contribution
  for (n in 1:N_trait) {
    Y[n] ~ zero_inflated_poisson(eta[n], eta[n + N_trait]);
  }
}
generated quantities {
}
standata(dat.brm)
$N
[1] 40

$Y
 [1]  0  3  1  4  5  1  5  6  4  0  0  0  0  4  0  0  9  0  0 12

$N_trait
[1] 20

$K
[1] 3

$X
   main spec    main:x
1     1    0  1.042191
2     1    0  2.497906
3     1    0  3.676603
4     1    0  3.798047
5     1    0  4.870992
6     1    0  7.689958
7     1    0  8.023830
8     1    0  8.305029
9     1    0  9.052364
10    1    0 10.440214
11    1    0 11.443176
12    1    0 12.124583
13    1    0 13.327143
14    1    0 14.656959
15    1    0 14.787451
16    1    0 16.204120
17    1    0 16.275782
18    1    0 16.484584
19    1    0 16.949836
20    1    0 18.357564
21    0    1  0.000000
22    0    1  0.000000
23    0    1  0.000000
24    0    1  0.000000
25    0    1  0.000000
26    0    1  0.000000
27    0    1  0.000000
28    0    1  0.000000
29    0    1  0.000000
30    0    1  0.000000
31    0    1  0.000000
32    0    1  0.000000
33    0    1  0.000000
34    0    1  0.000000
35    0    1  0.000000
36    0    1  0.000000
37    0    1  0.000000
38    0    1  0.000000
39    0    1  0.000000
40    0    1  0.000000

$X_means
    main     spec   main:x
0.500000 0.500000 5.250208
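The design matrix makes the two sub-models explicit: rows 1-20 (main = 1, spec = 0) feed the Poisson part of the linear predictor and rows 21-40 (main = 0, spec = 1) feed the zero-inflation part. Because the intercept was suppressed in the formula (0 +), the covariate is no longer centred (main:x contains the raw x values), and the model being fitted is:

$$
\log(\lambda_i) = \beta_{main} + \beta_{main:x}\,x_i
\hspace{4em}
\mathrm{logit}(\pi_i) = \beta_{spec}
$$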