Tutorial 7.3b - Multiple linear regression (Bayesian)
12 Jan 2018
Multiple and complex regression analyses can be useful for situations in which patterns in a response variable can not be adequately described by a single straight line resulting from a single predictor and/or a simple linear equation.
As the multiple linear regression design is very much consistent between frequentist and Bayesian approaches, you are advised to review the tutorial on frequentist multiple linear regression. Much of the important assumptions and exploratory data analysis issued discussed in that tutorial are also relevant in a Bayesian framework, yet for brevity reasons will not be repeated here.
General form of linear models
Additive model
$$y_i=\beta_0+\beta_1x_{i1}+\beta_2x_{i2}+...+\beta_jx_{ij}+\epsilon_i$$ where $\beta_0$ is the population y-intercept (value of $y$ when all partial slopes equal zero), $\beta_1$, $\beta_2$, etc are the partial population slopes of $Y$ on $X_1$, $X_2$, etc respectively holding the other $X$ constant. $\epsilon_i$ is the random unexplained error or residual component.
The additive model assumes that the effect of one predictor variable (partial slope) is independent of the levels of the other predictor variables.
Multiplicative model
$$y_i=\beta_0+\beta_1x_{i1}+\beta_2x_{i2}+\beta_3x_{i1}x_{i2}+...+\epsilon_i$$ where $\beta_3x_{i1}x_{i2}$ is the interactive effect of $X_1$ and $X_2$ on $Y$ and it examines the degree to which the effect of one of the predictor variables depends on the levels of the other predictor variable(s).
Scenario and Data
Lets say we had set up a natural experiment in which we measured a response ($y$) from each of 20 sampling units ($n=20$) across a landscape. At the same time, we also measured two other continuous covariates ($x1$ and $x2$) from each of the sampling units. As this section is mainly about the generation of artificial data (and not specifically about what to do with the data), understanding the actual details are optional and can be safely skipped. Consequently, I have folded (toggled) this section away.
set.seed(3) n = 100 intercept = 5 temp = runif(n) nitro = runif(n) + 0.8 * temp int.eff = 2 temp.eff <- 0.85 nitro.eff <- 0.5 res = rnorm(n, 0, 1) coef <- c(int.eff, temp.eff, nitro.eff, int.eff) mm <- model.matrix(~temp * nitro) y <- t(coef %*% t(mm)) + res data <- data.frame(y, x1 = temp, x2 = nitro, cx1 = scale(temp, scale = F), cx2 = scale(nitro, scale = F)) head(data)
y x1 x2 cx1 1 3.513305 0.1680415 0.9007709 -0.31604197 2 5.090382 0.8075164 1.3281453 0.32343291 3 4.036943 0.3849424 0.5170847 -0.09914114 4 4.006436 0.3277343 0.9741312 -0.15634918 5 5.381677 0.6021007 1.0869787 0.11801718 6 4.530071 0.6043941 0.8240744 0.12031056 cx2 1 0.02986272 2 0.45723717 3 -0.35382350 4 0.10322304 5 0.21607055 6 -0.04683372
With these sort of data, we are primarily interested in investigating whether there is a relationship between the continuous response variable and the components linear predictor (continuous predictors). We could model the relationship via either:
- an additive model in which the effects of each predictor contribute in an additive way to the response - we do not allow for an interaction as we consider an interaction either not of great importance or likely to be absent.
- and multiplicative model in which the effects of each predictor and their interaction contribute to the response - we allow for the impact of one predictor to vary across the range of the other predictor.
Centering data
When a linear model contains a covariate (continuous predictor variable) in addition to another predictor (continuous or categorical), it is nearly always advisable that the continuous predictor variables be centered prior to the analysis. Centering is a process by which the mean of a variable is subtracted from each of the values such that the scale of the variable is shifted so as to be centered around 0. Hence the mean of the new centered variable will be 0, yet it will retain the same variance.
Raw dataCentred data
There are multiple reasons for this:
- Firstly, it provides some biological meaning to the y-intercept. Recall that the y-intercept is the value of Y when X is equal to zero. If X is centered, then the y-intercept represents the value of Y at the mid-point of the X range. The y-intercept of an un-centered X typically represents a un-real value of Y (as an X of 0 is often beyond the reasonable range of values).
- Secondly, in multiplicative models (in which predictors and their interactions are included), main effects and interaction terms built from centered predictors will not be correlated to one another (see below)
- Thirdly, for more complex models, centering the covariates can increase the likelihood that the modelling engine converges (arrives at a numerically stable and reliable outcome).
In R, centering is easily achieved with the scale function. Note, the scale() function centers and scales (divides by standard deviation) data. We only really need to center the data, so we provide the argument scale=FALSE. Also note that the scale() function attaches the pre-centered mean (and standard deviation if scaling performed) as attributes to the scaled data in order to facilitate back-scaling to the original scale. While these attributes are often convenient, they do cause issues for some of the Bayesian routines and so we will strip these attributes using the as.numeric() function. Instead, we will create separate scalar variables to store the pre-scaled means.
data <- within(data, { cx1 <- as.numeric(scale(x1, scale = FALSE)) cx2 <- as.numeric(scale(x2, scale = FALSE)) }) head(data)
y x1 x2 cx1 1 3.513305 0.1680415 0.9007709 -0.31604197 2 5.090382 0.8075164 1.3281453 0.32343291 3 4.036943 0.3849424 0.5170847 -0.09914114 4 4.006436 0.3277343 0.9741312 -0.15634918 5 5.381677 0.6021007 1.0869787 0.11801718 6 4.530071 0.6043941 0.8240744 0.12031056 cx2 1 0.02986272 2 0.45723717 3 -0.35382350 4 0.10322304 5 0.21607055 6 -0.04683372
mean.x1 = mean(data$x1) mean.x2 = mean(data$x2)
Assumptions
- All of the observations are independent - this must be addressed at the design and collection stages
- The response variable (and thus the residuals) should be normally distributed. A boxplot of the entire variable is usually useful for diagnosing major issues with normality.
- The response variable should be equally varied (variance should not be related to mean as these are supposed to be estimated separately). Scatterplots with linear smoothers can be useful for exploring the spread of observations around the trendline. The spread of observations around the trendline should not increase (or decrease) along its length.
- The predictor variables should be uniformly or normally distributed. Again, boxplots can be useful.
- The relationships between the linear predictors (right hand side of the regression formula) and the response variable should be linear. Scatterplots with smoothers can be useful for identifying possible non-linearity.
- (Multi)collinearity - see below
- The number of predictor variables must be less than the number of observations otherwise the linear model will be over-parameterized (more parameters to estimate than there are independent data from which estimations are calculated).
(Multi)collinearity - a predictor variable must not be correlated to the combination of other predictor variables (known collectively as the linear predictor). Multicollinearity has major detrimental effects on model fitting:
- instability of the estimated partial regression slopes (small changes in the data or variable inclusion can cause dramatic changes in parameter estimates)
- inflated standard errors and confidence intervals of model parameters, thereby increasing the type II error rate (reducing power) of parameter hypothesis tests
- investigate pairwise correlations between all the predictor variables either by a correlation matrix or a scatterplot matrix
- calculate tolerance ($1-r^2$ of the relationship between a predictor variable and all the other predictor variables) for each of the predictor variables. Tolerance is a measure of the degree of collinearity and values less $<0.2$ should be considered and values $<0.1$ given series attention. Variance inflation factor (VIF) are the inverse of tolerance and thus values greater than 5, or worse, 10 indicate collinearity.
- PCA (principle components analysis) eigenvalues (from a correlation matrix for all the predictor variables) close to zero indicate collinearity and component loadings may be useful in determining which predictor variables cause collinearity.
- remove the highly correlated predictor variable(s), starting with the least most biologically interesting variable(s)
- PCA (principle components analysis) regression - regress the response variable against the principal components resulting from a correlation matrix for all the predictor variables. Each of these principal components by definition are completely independent, but the resulting parameter estimates must be back-calculated in order to have any biological meaning.
- apply a regression tree - regression trees recursively partitioning (subsetting) the data in accordance to individual variables that explain the greatest remaining variance. Since at each iteration, each predictor variable is effectively evaluated in isololation, (multi)collinearity is not an issue.
These assumptions should be explored using the techniques highlighted here
Model fitting or statistical analysis
Consistent with Tutorial 7.2b we will explore Bayesian modelling of multiple linear regression using a variety of tools (such as MCMCpack, JAGS, RSTAN, RSTANARM and BRMS). Whilst JAGS and RSTAN are extremely flexible and thus allow models to be formulated that contain not only the simple model, but also additional derivatives, the other approaches are more restrictive. Consequently, I will mostly restrict models to just the minimum necessary and all derivatives will instead be calculated in R itself from the returned posteriors. Hence for each model, I will generate a mcmc list (data.mcmc.list) containing the mcmc sample matrix for each chain. This mcmc list will be considered a standard starting point for all other manipulations.
Multiple linear regression models can include predictors (terms) that are incorporated additively (no interactions) or multiplicatively (with interactions). As such we will explore these separately for each modelling tool.
The observed response ($y_i$) are assumed to be drawn from a normal distribution with a given mean ($\mu$) and standard deviation ($\sigma$). The expected values ($\mu$) are themselves determined by the linear predictor ($\beta_0 + \beta X_i$). In this case, $\beta_0$ represents the y-intercept (value of $y$ when all of the $x$'s are equal to zero) and the set of $\beta$'s represent the rates of change in $y$ for every unit change in each $x$ (the effect) holding each other $x$ constant.
Note that since we should always center all predictors (by subtracting the mean of each $x$ from the repective values of each $x$), the y-intercept represents the value of $y$ at the average value of each $x$.
MCMC sampling requires priors on all parameters. We will employ weakly informative priors. Specifying 'uninformative' priors is always a bit of a balancing act. If the priors are too vague (wide) the MCMC sampler can wander off into nonscence areas of likelihood rather than concentrate around areas of highest likelihood (desired when wanting the outcomes to be largely driven by the data). On the other hand, if the priors are too strong, they may have an influence on the parameters. In such a simple model, this balance is very forgiving - it is for more complex models that prior choice becomes more important.
For this simple model, we will go with zero-centered Gaussian (normal) priors with relatively large standard deviations (100) for both the intercept and the treatment effect and a wide half-cauchy (scale=5) for the standard deviation. $$ \begin{align} y_i &\sim{} N(\mu_i, \sigma)\\ \mu_i &= \beta_0 + \beta X_i\\[1em] \beta_0 &\sim{} N(0,100)\\ \beta &\sim{} N(0,100)\\ \sigma &\sim{} cauchy(0,5)\\ \end{align} $$
Additive model
library(MCMCpack) data.mcmcpack.add <- MCMCregress(y ~ cx1 + cx2, data = data)
Multiplicative model
library(MCMCpack) data.mcmcpack.mult <- MCMCregress(y ~ cx1 * cx2, data = data)
If we define the model in terms of matrices, the JAGS model definition is identical for both additive and multiplicative models. $$\begin{align} y_i&\sim{}N(\mu_i, \tau)\\ \mu_i &= \beta_0 + \beta X_i\\ \beta_0&\sim{}N(0,1.0{E-6}) \hspace{1cm}\mathsf{non-informative~prior~for~interept}\\ \beta_j&\sim{}N(0,1.0{E-6}) \hspace{1cm}\mathsf{non-informative~prior~for~partial~slopes}\\ \tau &= 1/\sigma^2\\ \sigma&\sim{}U(0,100)\\ \end{align} $$
Define the model
modelString = " model { #Likelihood for (i in 1:n) { y[i]~dnorm(mu[i],tau) mu[i] <- beta0 + inprod(beta[],X[i,]) } #Priors beta0 ~ dnorm(0.01,1.0E-6) for (j in 1:nX) { beta[j] ~ dnorm(0.01,1.0E-6) } tau <- 1 / (sigma * sigma) sigma~dunif(0,100) } "
Additive model
Define the data list
Arrange the data as a list (as required by BUGS). As input, JAGS will need to be supplied with:
- the response variable (y)
- the predictor matrix (X)
- the number of predictor variables (nX)
- the total number of observed items (n)
X = model.matrix(~cx1 + cx2, data = data) data.list <- with(data, list(y = y, X = X[, -1], nX = ncol(X) - 1, n = nrow(data)))
Define the MCMC chain parameters
Next we should define the behavioural parameters of the MCMC sampling chains. Include the following:
- the nodes (estimated parameters) to monitor (return samples for)
- the number of MCMC chains (3)
- the number of burnin steps (1000)
- the thinning factor (10)
- the number of MCMC iterations - determined by the number of samples to save, the rate of thinning and the number of chains
params <- c("beta0", "beta", "sigma") nChains = 3 burnInSteps = 3000 thinSteps = 10 numSavedSteps = 15000 #across all chains nIter = ceiling(burnInSteps + (numSavedSteps * thinSteps)/nChains) nIter
[1] 53000
Fit the model
Now run the JAGS code via the R2jags interface. Note that the first time jags is run after the R2jags package is loaded, it is often necessary to run any kind of randomization function just to initiate the .Random.seed variable.
## load the R2jags package library(R2jags)
data.r2jags.add <- jags(data = data.list, inits = NULL, parameters.to.save = params, model.file = textConnection(modelString), n.chains = nChains, n.iter = nIter, n.burnin = burnInSteps, n.thin = thinSteps)
Compiling model graph Resolving undeclared variables Allocating nodes Graph information: Observed stochastic nodes: 100 Unobserved stochastic nodes: 4 Total graph size: 618 Initializing model
print(data.r2jags.add)
Inference for Bugs model at "5", fit using jags, 3 chains, each with 53000 iterations (first 3000 discarded), n.thin = 10 n.sims = 15000 iterations saved mu.vect sd.vect 2.5% 25% 50% 75% 97.5% Rhat n.eff beta[1] 3.028 0.501 2.037 2.692 3.032 3.365 4.009 1.001 15000 beta[2] 1.389 0.432 0.545 1.101 1.394 1.675 2.235 1.001 15000 beta0 3.823 0.115 3.597 3.745 3.822 3.901 4.048 1.001 8000 sigma 1.146 0.083 1.001 1.088 1.140 1.198 1.322 1.001 15000 deviance 309.526 2.883 305.925 307.427 308.888 310.918 316.903 1.002 3100 For each parameter, n.eff is a crude measure of effective sample size, and Rhat is the potential scale reduction factor (at convergence, Rhat=1). DIC info (using the rule, pD = var(deviance)/2) pD = 4.2 and DIC = 313.7 DIC is an estimate of expected predictive error (lower deviance is better).
data.mcmc.list.add <- as.mcmc(data.r2jags.add)
Multiplicative model
Define the data list
Arrange the data as a list (as required by BUGS). As input, JAGS will need to be supplied with:
- the response variable (y)
- the predictor matrix (X)
- the number of predictor terms (nX)
- the total number of observed items (n)
X = model.matrix(~cx1 * cx2, data = data) data.list <- with(data, list(y = y, X = X[, -1], nX = ncol(X) - 1, n = nrow(data)))
Define the MCMC chain parameters
Next we should define the behavioural parameters of the MCMC sampling chains. Include the following:
- the nodes (estimated parameters) to monitor (return samples for)
- the number of MCMC chains (3)
- the number of burnin steps (1000)
- the thinning factor (10)
- the number of MCMC iterations - determined by the number of samples to save, the rate of thinning and the number of chains
params <- c("beta0", "beta", "sigma") nChains = 3 burnInSteps = 3000 thinSteps = 10 numSavedSteps = 15000 #across all chains nIter = ceiling(burnInSteps + (numSavedSteps * thinSteps)/nChains) nIter
[1] 53000
Fit the model
Now run the JAGS code via the R2jags interface. Note that the first time jags is run after the R2jags package is loaded, it is often necessary to run any kind of randomization function just to initiate the .Random.seed variable.
## load the R2jags package library(R2jags)
data.r2jags.mult <- jags(data = data.list, inits = NULL, parameters.to.save = params, model.file = textConnection(modelString), n.chains = nChains, n.iter = nIter, n.burnin = burnInSteps, n.thin = thinSteps)
Compiling model graph Resolving undeclared variables Allocating nodes Graph information: Observed stochastic nodes: 100 Unobserved stochastic nodes: 5 Total graph size: 721 Initializing model
print(data.r2jags.mult)
Inference for Bugs model at "5", fit using jags, 3 chains, each with 53000 iterations (first 3000 discarded), n.thin = 10 n.sims = 15000 iterations saved mu.vect sd.vect 2.5% 25% 50% 75% 97.5% Rhat n.eff beta[1] 2.931 0.499 1.969 2.593 2.929 3.265 3.902 1.001 15000 beta[2] 1.344 0.426 0.508 1.053 1.343 1.631 2.181 1.001 15000 beta[3] 2.675 1.256 0.182 1.845 2.669 3.509 5.158 1.001 15000 beta0 3.671 0.134 3.413 3.580 3.671 3.760 3.932 1.001 12000 sigma 1.126 0.082 0.981 1.070 1.120 1.176 1.307 1.001 8000 deviance 305.828 3.280 301.532 303.441 305.137 307.457 314.045 1.001 15000 For each parameter, n.eff is a crude measure of effective sample size, and Rhat is the potential scale reduction factor (at convergence, Rhat=1). DIC info (using the rule, pD = var(deviance)/2) pD = 5.4 and DIC = 311.2 DIC is an estimate of expected predictive error (lower deviance is better).
data.mcmc.list.mult <- as.mcmc(data.r2jags.mult)
Whilst Gibbs sampling provides an elegantly simple MCMC sampling routine, very complex hierarchical models can take enormous numbers of iterations (often prohibitory large) to converge on a stable posterior distribution.
To address this, Andrew Gelman (and other collaborators) have implemented a variation on Hamiltonian Monte Carlo (HMC: a sampler that selects subsequent samples in a way that reduces the correlation between samples, thereby speeding up convergence) called the No-U-Turn (NUTS) sampler. All of these developments are brought together into a tool called Stan ("Sampling Through Adaptive Neighborhoods").
By design (to appeal to the vast BUGS users), Stan models are defined in a manner reminiscent of BUGS. Stan first converts these models into C++ code which is then compiled to allow very rapid computation.
Consistent with the use of C++, the model must be accompanied by variable declarations for all inputs and parameters.
One important difference between Stan and JAGS is that whereas BUGS (and thus JAGS) use precision rather than variance, Stan uses variance.
Stan itself is a stand-alone command line application. However, conveniently, the authors of Stan have also developed an R interface to Stan called Rstan which can be used much like R2jags.
Define the model
The minimum model in Stan required to fit the above simple regression follows. Note the following modifications from the model defined in JAGS:- the normal distribution is defined by variance rather than precision
- rather than using a uniform prior for sigma, I am using a half-Cauchy
We now translate the likelihood model into STAN code.
$$\begin{align}
y_i&\sim{}N(\mu_i, \sigma)\\
\mu_i &= \beta_0+\beta X_i\\
\beta_0&\sim{}N(0,100)\\
\beta&\sim{}N(0,100)\\
\sigma&\sim{}Cauchy(0,5)\\
\end{align}
$$
data { int<lower=1> n; // total number of observations vector[n] Y; // response variable int<lower=1> nX; // number of effects matrix[n, nX] X; // model matrix } transformed data { matrix[n, nX - 1] Xc; // centered version of X vector[nX - 1] means_X; // column means of X before centering for (i in 2:nX) { means_X[i - 1] = mean(X[, i]); Xc[, i - 1] = X[, i] - means_X[i - 1]; } } parameters { vector[nX-1] beta; // population-level effects real cbeta0; // center-scale intercept real<lower=0> sigma; // residual SD } transformed parameters { } model { vector[n] mu; mu = Xc * beta + cbeta0; // prior specifications beta ~ normal(0, 100); cbeta0 ~ normal(0, 100); sigma ~ cauchy(0, 5); // likelihood contribution Y ~ normal(mu, sigma); } generated quantities { real beta0; // population-level intercept vector[n] log_lik; beta0 = cbeta0 - dot_product(means_X, beta); for (i in 1:n) { log_lik[i] = normal_lpdf(Y[i] | Xc[i] * beta + cbeta0, sigma); } }
Additive model
Define the data list
Arrange the data as a list (as required by BUGS). As input, JAGS will need to be supplied with:
- the response variable (y)
- the predictor matrix (X)
- the number of predictor variables (nX)
- the total number of observed items (n)
X = model.matrix(~cx1 + cx2, data = data) data.list <- with(data, list(Y = y, X = X, nX = ncol(X), n = nrow(data)))
Define the MCMC chain parameters
Next we should define the behavioural parameters of the No-U-Turn sampling chains. Include the following:
- the nodes (estimated parameters) to monitor (return samples for)
- the number of MCMC chains (3)
- the number of burnin steps (1000)
- the thinning factor (10)
- the number of MCMC iterations - determined by the number of samples to save, the rate of thinning and the number of chains
nChains = 3 burnInSteps = 1000 thinSteps = 3 numSavedSteps = 3000 #across all chains nIter = ceiling(burnInSteps + (numSavedSteps * thinSteps)/nChains) nIter
[1] 4000
Fit the model
Now compile and run the Stan code via the rstan interface. Note that the first time jags is run after the rstan package is loaded, it is often necessary to run any kind of randomization function just to initiate the .Random.seed variable.
During the warmup stage, the No-U-Turn sampler (NUTS) attempts to determine the optimum stepsize - the stepsize that achieves the target acceptance rate (0.8 or 80% by default) without divergence (occurs when the stepsize is too large relative to the curvature of the log posterior - and results in approximations that are likely to diverge and be biased) and without hitting the maximum treedepth (10). At each iteration of the NUTS algorithm, the number of leapfrog steps doubles (as it increases the treedepth) and only terminates when either the NUTS criterion are satisfied or the tree depth reaches the maximum (10 by default).
## load the rstan package library(rstan)
data.rstan.add <- stan(data = data.list, model_code = modelString, chains = nChains, iter = nIter, warmup = burnInSteps, thin = thinSteps, save_dso = TRUE)
In file included from /usr/local/lib/R/site-library/BH/include/boost/config.hpp:39:0, from /usr/local/lib/R/site-library/BH/include/boost/math/tools/config.hpp:13, from /usr/local/lib/R/site-library/StanHeaders/include/stan/math/rev/core/var.hpp:7, from /usr/local/lib/R/site-library/StanHeaders/include/stan/math/rev/core/gevv_vvv_vari.hpp:5, from /usr/local/lib/R/site-library/StanHeaders/include/stan/math/rev/core.hpp:12, from /usr/local/lib/R/site-library/StanHeaders/include/stan/math/rev/mat.hpp:4, from /usr/local/lib/R/site-library/StanHeaders/include/stan/math.hpp:4, from /usr/local/lib/R/site-library/StanHeaders/include/src/stan/model/model_header.hpp:4, from file2b18778a1a1f.cpp:8: /usr/local/lib/R/site-library/BH/include/boost/config/compiler/gcc.hpp:186:0: warning: "BOOST_NO_CXX11_RVALUE_REFERENCES" redefined # define BOOST_NO_CXX11_RVALUE_REFERENCES ^ <command-line>:0:0: note: this is the location of the previous definition SAMPLING FOR MODEL '4e0a54cd22440d9847d7705bdb1ff803' NOW (CHAIN 1). Gradient evaluation took 2.5e-05 seconds 1000 transitions using 10 leapfrog steps per transition would take 0.25 seconds. Adjust your expectations accordingly! Iteration: 1 / 4000 [ 0%] (Warmup) Iteration: 400 / 4000 [ 10%] (Warmup) Iteration: 800 / 4000 [ 20%] (Warmup) Iteration: 1001 / 4000 [ 25%] (Sampling) Iteration: 1400 / 4000 [ 35%] (Sampling) Iteration: 1800 / 4000 [ 45%] (Sampling) Iteration: 2200 / 4000 [ 55%] (Sampling) Iteration: 2600 / 4000 [ 65%] (Sampling) Iteration: 3000 / 4000 [ 75%] (Sampling) Iteration: 3400 / 4000 [ 85%] (Sampling) Iteration: 3800 / 4000 [ 95%] (Sampling) Iteration: 4000 / 4000 [100%] (Sampling) Elapsed Time: 0.073416 seconds (Warm-up) 0.143677 seconds (Sampling) 0.217093 seconds (Total) SAMPLING FOR MODEL '4e0a54cd22440d9847d7705bdb1ff803' NOW (CHAIN 2). Gradient evaluation took 2.4e-05 seconds 1000 transitions using 10 leapfrog steps per transition would take 0.24 seconds. Adjust your expectations accordingly! Iteration: 1 / 4000 [ 0%] (Warmup) Iteration: 400 / 4000 [ 10%] (Warmup) Iteration: 800 / 4000 [ 20%] (Warmup) Iteration: 1001 / 4000 [ 25%] (Sampling) Iteration: 1400 / 4000 [ 35%] (Sampling) Iteration: 1800 / 4000 [ 45%] (Sampling) Iteration: 2200 / 4000 [ 55%] (Sampling) Iteration: 2600 / 4000 [ 65%] (Sampling) Iteration: 3000 / 4000 [ 75%] (Sampling) Iteration: 3400 / 4000 [ 85%] (Sampling) Iteration: 3800 / 4000 [ 95%] (Sampling) Iteration: 4000 / 4000 [100%] (Sampling) Elapsed Time: 0.060339 seconds (Warm-up) 0.177215 seconds (Sampling) 0.237554 seconds (Total) SAMPLING FOR MODEL '4e0a54cd22440d9847d7705bdb1ff803' NOW (CHAIN 3). Gradient evaluation took 1e-05 seconds 1000 transitions using 10 leapfrog steps per transition would take 0.1 seconds. Adjust your expectations accordingly! Iteration: 1 / 4000 [ 0%] (Warmup) Iteration: 400 / 4000 [ 10%] (Warmup) Iteration: 800 / 4000 [ 20%] (Warmup) Iteration: 1001 / 4000 [ 25%] (Sampling) Iteration: 1400 / 4000 [ 35%] (Sampling) Iteration: 1800 / 4000 [ 45%] (Sampling) Iteration: 2200 / 4000 [ 55%] (Sampling) Iteration: 2600 / 4000 [ 65%] (Sampling) Iteration: 3000 / 4000 [ 75%] (Sampling) Iteration: 3400 / 4000 [ 85%] (Sampling) Iteration: 3800 / 4000 [ 95%] (Sampling) Iteration: 4000 / 4000 [100%] (Sampling) Elapsed Time: 0.0688 seconds (Warm-up) 0.149029 seconds (Sampling) 0.217829 seconds (Total)
data.rstan.add
Inference for Stan model: 4e0a54cd22440d9847d7705bdb1ff803. 3 chains, each with iter=4000; warmup=1000; thin=3; post-warmup draws per chain=1000, total post-warmup draws=3000. mean se_mean sd 2.5% 25% 50% 75% 97.5% n_eff Rhat beta[1] 3.02 0.01 0.51 2.03 2.69 3.02 3.34 4.05 2747 1 beta[2] 1.38 0.01 0.43 0.52 1.10 1.39 1.68 2.22 2854 1 cbeta0 3.83 0.00 0.12 3.59 3.75 3.83 3.91 4.06 2317 1 sigma 1.15 0.00 0.08 0.99 1.09 1.14 1.20 1.32 2806 1 beta0 3.83 0.00 0.12 3.59 3.75 3.83 3.91 4.06 2317 1 log_lik[1] -1.21 0.00 0.11 -1.46 -1.28 -1.20 -1.13 -1.02 2200 1 log_lik[2] -1.11 0.00 0.09 -1.31 -1.17 -1.11 -1.05 -0.96 2916 1 log_lik[3] -1.45 0.00 0.14 -1.74 -1.54 -1.44 -1.35 -1.21 2829 1 log_lik[4] -1.16 0.00 0.09 -1.35 -1.22 -1.16 -1.10 -1.00 2219 1 log_lik[5] -1.37 0.00 0.10 -1.59 -1.44 -1.37 -1.30 -1.19 2327 1 log_lik[6] -1.12 0.00 0.08 -1.28 -1.17 -1.12 -1.07 -0.98 2313 1 log_lik[7] -1.27 0.00 0.16 -1.63 -1.36 -1.24 -1.15 -1.02 2806 1 log_lik[8] -2.43 0.01 0.25 -2.95 -2.59 -2.42 -2.26 -1.99 2248 1 log_lik[9] -1.42 0.00 0.16 -1.77 -1.51 -1.40 -1.31 -1.15 2796 1 log_lik[10] -1.09 0.00 0.08 -1.24 -1.14 -1.09 -1.04 -0.95 2402 1 log_lik[11] -1.21 0.00 0.08 -1.39 -1.27 -1.21 -1.15 -1.05 2839 1 log_lik[12] -1.16 0.00 0.10 -1.37 -1.22 -1.15 -1.09 -0.98 2774 1 log_lik[13] -1.22 0.00 0.10 -1.44 -1.28 -1.21 -1.14 -1.03 2784 1 log_lik[14] -1.81 0.00 0.20 -2.24 -1.94 -1.80 -1.67 -1.46 2828 1 log_lik[15] -1.14 0.00 0.11 -1.40 -1.20 -1.13 -1.07 -0.96 2934 1 log_lik[16] -1.12 0.00 0.09 -1.32 -1.17 -1.11 -1.06 -0.96 2917 1 log_lik[17] -1.07 0.00 0.08 -1.24 -1.12 -1.07 -1.02 -0.93 2768 1 log_lik[18] -1.07 0.00 0.08 -1.23 -1.12 -1.07 -1.02 -0.93 2581 1 log_lik[19] -1.07 0.00 0.08 -1.24 -1.12 -1.07 -1.02 -0.92 2737 1 log_lik[20] -1.06 0.00 0.07 -1.21 -1.11 -1.06 -1.01 -0.92 2809 1 log_lik[21] -1.98 0.00 0.22 -2.45 -2.12 -1.96 -1.82 -1.58 2405 1 log_lik[22] -1.16 0.00 0.12 -1.44 -1.22 -1.15 -1.08 -0.97 2216 1 log_lik[23] -1.17 0.00 0.11 -1.42 -1.23 -1.16 -1.09 -0.98 2231 1 log_lik[24] -1.07 0.00 0.08 -1.25 -1.13 -1.07 -1.02 -0.92 2452 1 log_lik[25] -1.24 0.00 0.10 -1.45 -1.30 -1.23 -1.17 -1.07 2104 1 log_lik[26] -1.37 0.00 0.12 -1.63 -1.45 -1.36 -1.29 -1.16 2896 1 log_lik[27] -1.09 0.00 0.08 -1.24 -1.14 -1.09 -1.04 -0.94 2487 1 log_lik[28] -1.27 0.00 0.13 -1.55 -1.35 -1.26 -1.18 -1.06 2881 1 log_lik[29] -2.30 0.01 0.31 -2.98 -2.50 -2.28 -2.08 -1.75 2682 1 log_lik[30] -1.40 0.00 0.12 -1.67 -1.48 -1.39 -1.32 -1.18 2285 1 log_lik[31] -1.22 0.00 0.09 -1.40 -1.27 -1.21 -1.16 -1.06 2306 1 log_lik[32] -1.06 0.00 0.07 -1.21 -1.11 -1.06 -1.01 -0.92 2518 1 log_lik[33] -2.19 0.00 0.27 -2.77 -2.36 -2.18 -2.01 -1.73 2877 1 log_lik[34] -2.26 0.00 0.24 -2.78 -2.42 -2.25 -2.09 -1.83 2785 1 log_lik[35] -1.07 0.00 0.08 -1.22 -1.12 -1.07 -1.02 -0.93 2802 1 log_lik[36] -3.41 0.01 0.37 -4.20 -3.64 -3.39 -3.15 -2.74 2146 1 log_lik[37] -1.53 0.00 0.21 -2.00 -1.65 -1.51 -1.38 -1.18 2829 1 log_lik[38] -1.22 0.00 0.13 -1.50 -1.29 -1.20 -1.13 -1.01 2842 1 log_lik[39] -1.86 0.00 0.15 -2.17 -1.95 -1.85 -1.75 -1.59 2213 1 log_lik[40] -1.32 0.00 0.11 -1.55 -1.39 -1.31 -1.24 -1.12 2662 1 log_lik[41] -1.58 0.00 0.13 -1.85 -1.67 -1.58 -1.49 -1.35 2326 1 log_lik[42] -2.93 0.01 0.36 -3.71 -3.15 -2.91 -2.68 -2.30 2290 1 log_lik[43] -1.43 0.00 0.14 -1.73 -1.51 -1.42 -1.32 -1.18 2520 1 log_lik[44] -1.09 0.00 0.08 -1.26 -1.14 -1.08 -1.03 -0.94 2925 1 log_lik[45] -1.31 0.00 0.13 -1.60 -1.39 -1.30 -1.22 -1.09 2624 1 log_lik[46] -1.16 0.00 0.09 -1.36 -1.22 -1.16 -1.10 -1.00 2348 1 log_lik[47] -1.47 0.00 0.17 -1.84 -1.57 -1.45 -1.34 -1.18 2530 1 log_lik[48] -1.23 0.00 0.13 -1.52 -1.31 -1.22 -1.14 -1.03 2654 1 log_lik[49] -2.42 0.00 0.24 -2.94 -2.57 -2.40 -2.24 -2.00 2770 1 log_lik[50] -1.07 0.00 0.08 -1.23 -1.12 -1.07 -1.02 -0.93 2752 1 log_lik[51] -1.78 0.00 0.21 -2.21 -1.91 -1.76 -1.63 -1.42 2456 1 log_lik[52] -4.28 0.01 0.65 -5.68 -4.69 -4.24 -3.82 -3.13 2861 1 log_lik[53] -1.33 0.00 0.18 -1.75 -1.43 -1.31 -1.21 -1.05 2878 1 log_lik[54] -1.30 0.00 0.15 -1.64 -1.38 -1.28 -1.19 -1.05 2271 1 log_lik[55] -1.59 0.00 0.19 -2.01 -1.70 -1.58 -1.46 -1.28 2922 1 log_lik[56] -1.07 0.00 0.08 -1.23 -1.12 -1.07 -1.02 -0.92 2743 1 log_lik[57] -1.07 0.00 0.08 -1.22 -1.12 -1.07 -1.02 -0.92 2799 1 log_lik[58] -1.29 0.00 0.15 -1.62 -1.38 -1.27 -1.19 -1.05 2840 1 log_lik[59] -1.52 0.00 0.16 -1.86 -1.62 -1.51 -1.41 -1.25 2484 1 log_lik[60] -3.72 0.01 0.59 -5.02 -4.09 -3.68 -3.30 -2.69 2836 1 log_lik[61] -1.14 0.00 0.11 -1.40 -1.20 -1.13 -1.07 -0.96 2936 1 log_lik[62] -1.63 0.00 0.20 -2.07 -1.75 -1.61 -1.49 -1.30 2649 1 log_lik[63] -1.30 0.00 0.11 -1.55 -1.37 -1.29 -1.23 -1.10 2260 1 log_lik[64] -2.03 0.01 0.31 -2.71 -2.23 -2.01 -1.81 -1.52 2685 1 log_lik[65] -2.47 0.01 0.33 -3.19 -2.67 -2.44 -2.24 -1.90 2972 1 log_lik[66] -1.21 0.00 0.10 -1.43 -1.27 -1.20 -1.14 -1.03 2514 1 log_lik[67] -1.81 0.00 0.27 -2.43 -1.97 -1.78 -1.61 -1.36 2935 1 log_lik[68] -1.75 0.00 0.18 -2.15 -1.87 -1.74 -1.62 -1.44 2830 1 log_lik[69] -1.18 0.00 0.10 -1.40 -1.24 -1.18 -1.11 -1.01 2569 1 log_lik[70] -1.85 0.00 0.22 -2.33 -1.99 -1.84 -1.69 -1.47 2684 1 log_lik[71] -1.07 0.00 0.07 -1.22 -1.12 -1.07 -1.02 -0.93 2542 1 log_lik[72] -1.30 0.00 0.14 -1.61 -1.39 -1.28 -1.20 -1.07 2787 1 log_lik[73] -1.11 0.00 0.09 -1.30 -1.16 -1.11 -1.05 -0.95 2444 1 log_lik[74] -1.09 0.00 0.09 -1.28 -1.14 -1.08 -1.03 -0.94 2834 1 log_lik[75] -1.25 0.00 0.10 -1.47 -1.32 -1.24 -1.17 -1.06 2795 1 log_lik[76] -1.46 0.00 0.10 -1.67 -1.52 -1.46 -1.39 -1.28 2265 1 log_lik[77] -2.93 0.01 0.37 -3.73 -3.17 -2.91 -2.66 -2.29 2806 1 log_lik[78] -2.23 0.01 0.30 -2.86 -2.42 -2.21 -2.02 -1.70 2431 1 log_lik[79] -2.47 0.01 0.32 -3.19 -2.67 -2.44 -2.25 -1.91 2812 1 log_lik[80] -1.54 0.00 0.14 -1.84 -1.62 -1.52 -1.44 -1.29 2938 1 log_lik[81] -1.07 0.00 0.08 -1.23 -1.12 -1.07 -1.02 -0.92 2751 1 log_lik[82] -1.54 0.00 0.13 -1.82 -1.62 -1.53 -1.45 -1.31 2380 1 log_lik[83] -1.11 0.00 0.09 -1.30 -1.17 -1.11 -1.06 -0.96 2357 1 log_lik[84] -1.45 0.00 0.22 -1.97 -1.58 -1.42 -1.29 -1.10 2814 1 log_lik[85] -1.22 0.00 0.13 -1.51 -1.29 -1.20 -1.13 -1.02 2712 1 log_lik[86] -1.27 0.00 0.12 -1.52 -1.34 -1.26 -1.19 -1.07 2698 1 log_lik[87] -1.84 0.00 0.24 -2.36 -2.00 -1.82 -1.67 -1.42 2501 1 log_lik[88] -1.17 0.00 0.10 -1.37 -1.23 -1.16 -1.10 -1.00 2797 1 log_lik[89] -2.54 0.00 0.26 -3.07 -2.71 -2.52 -2.36 -2.08 2900 1 log_lik[90] -1.19 0.00 0.11 -1.45 -1.26 -1.18 -1.11 -1.01 2699 1 log_lik[91] -1.42 0.00 0.17 -1.80 -1.53 -1.41 -1.31 -1.14 2488 1 log_lik[92] -1.31 0.00 0.13 -1.59 -1.39 -1.29 -1.21 -1.09 2640 1 log_lik[93] -1.17 0.00 0.08 -1.34 -1.22 -1.17 -1.12 -1.02 2268 1 log_lik[94] -1.29 0.00 0.12 -1.55 -1.36 -1.28 -1.21 -1.08 2884 1 log_lik[95] -4.17 0.01 0.62 -5.49 -4.55 -4.13 -3.75 -3.06 2957 1 log_lik[96] -1.72 0.00 0.16 -2.05 -1.82 -1.70 -1.60 -1.43 2421 1 log_lik[97] -1.07 0.00 0.08 -1.22 -1.12 -1.07 -1.02 -0.93 2860 1 log_lik[98] -1.11 0.00 0.09 -1.31 -1.17 -1.11 -1.05 -0.96 2847 1 log_lik[99] -1.54 0.00 0.15 -1.86 -1.63 -1.53 -1.44 -1.28 2492 1 log_lik[100] -1.38 0.00 0.12 -1.63 -1.45 -1.37 -1.30 -1.17 2828 1 lp__ -62.83 0.03 1.45 -66.53 -63.52 -62.53 -61.76 -61.01 2476 1 Samples were drawn using NUTS(diag_e) at Thu Aug 17 15:32:58 2017. For each parameter, n_eff is a crude measure of effective sample size, and Rhat is the potential scale reduction factor on split chains (at convergence, Rhat=1).
Multiplicative model
Define the data list
Arrange the data as a list (as required by BUGS). As input, JAGS will need to be supplied with:
- the response variable (y)
- the predictor matrix (X)
- the number of predictor variables (nX)
- the total number of observed items (n)
X = model.matrix(~cx1 * cx2, data = data) data.list <- with(data, list(Y = y, X = X, nX = ncol(X), n = nrow(data)))
Define the MCMC chain parameters
Next we should define the behavioural parameters of the No-U-Turn sampling chains. Include the following:
- the nodes (estimated parameters) to monitor (return samples for)
- the number of MCMC chains (3)
- the number of burnin steps (1000)
- the thinning factor (10)
- the number of MCMC iterations - determined by the number of samples to save, the rate of thinning and the number of chains
nChains = 3 burnInSteps = 1000 thinSteps = 3 numSavedSteps = 3000 #across all chains nIter = ceiling(burnInSteps + (numSavedSteps * thinSteps)/nChains) nIter
[1] 4000
Fit the model
Now compile and run the Stan code via the rstan interface. Note that the first time jags is run after the rstan package is loaded, it is often necessary to run any kind of randomization function just to initiate the .Random.seed variable.
During the warmup stage, the No-U-Turn sampler (NUTS) attempts to determine the optimum stepsize - the stepsize that achieves the target acceptance rate (0.8 or 80% by default) without divergence (occurs when the stepsize is too large relative to the curvature of the log posterior - and results in approximations that are likely to diverge and be biased) and without hitting the maximum treedepth (10). At each iteration of the NUTS algorithm, the number of leapfrog steps doubles (as it increases the treedepth) and only terminates when either the NUTS criterion are satisfied or the tree depth reaches the maximum (10 by default).
## load the rstan package library(rstan)
data.rstan.mult <- stan(data = data.list, model_code = modelString, chains = nChains, iter = nIter, warmup = burnInSteps, thin = thinSteps, save_dso = TRUE)
SAMPLING FOR MODEL '4e0a54cd22440d9847d7705bdb1ff803' NOW (CHAIN 1). Gradient evaluation took 1.6e-05 seconds 1000 transitions using 10 leapfrog steps per transition would take 0.16 seconds. Adjust your expectations accordingly! Iteration: 1 / 4000 [ 0%] (Warmup) Iteration: 400 / 4000 [ 10%] (Warmup) Iteration: 800 / 4000 [ 20%] (Warmup) Iteration: 1001 / 4000 [ 25%] (Sampling) Iteration: 1400 / 4000 [ 35%] (Sampling) Iteration: 1800 / 4000 [ 45%] (Sampling) Iteration: 2200 / 4000 [ 55%] (Sampling) Iteration: 2600 / 4000 [ 65%] (Sampling) Iteration: 3000 / 4000 [ 75%] (Sampling) Iteration: 3400 / 4000 [ 85%] (Sampling) Iteration: 3800 / 4000 [ 95%] (Sampling) Iteration: 4000 / 4000 [100%] (Sampling) Elapsed Time: 0.075668 seconds (Warm-up) 0.14075 seconds (Sampling) 0.216418 seconds (Total) SAMPLING FOR MODEL '4e0a54cd22440d9847d7705bdb1ff803' NOW (CHAIN 2). Gradient evaluation took 2e-05 seconds 1000 transitions using 10 leapfrog steps per transition would take 0.2 seconds. Adjust your expectations accordingly! Iteration: 1 / 4000 [ 0%] (Warmup) Iteration: 400 / 4000 [ 10%] (Warmup) Iteration: 800 / 4000 [ 20%] (Warmup) Iteration: 1001 / 4000 [ 25%] (Sampling) Iteration: 1400 / 4000 [ 35%] (Sampling) Iteration: 1800 / 4000 [ 45%] (Sampling) Iteration: 2200 / 4000 [ 55%] (Sampling) Iteration: 2600 / 4000 [ 65%] (Sampling) Iteration: 3000 / 4000 [ 75%] (Sampling) Iteration: 3400 / 4000 [ 85%] (Sampling) Iteration: 3800 / 4000 [ 95%] (Sampling) Iteration: 4000 / 4000 [100%] (Sampling) Elapsed Time: 0.058992 seconds (Warm-up) 0.139282 seconds (Sampling) 0.198274 seconds (Total) SAMPLING FOR MODEL '4e0a54cd22440d9847d7705bdb1ff803' NOW (CHAIN 3). Gradient evaluation took 1e-05 seconds 1000 transitions using 10 leapfrog steps per transition would take 0.1 seconds. Adjust your expectations accordingly! Iteration: 1 / 4000 [ 0%] (Warmup) Iteration: 400 / 4000 [ 10%] (Warmup) Iteration: 800 / 4000 [ 20%] (Warmup) Iteration: 1001 / 4000 [ 25%] (Sampling) Iteration: 1400 / 4000 [ 35%] (Sampling) Iteration: 1800 / 4000 [ 45%] (Sampling) Iteration: 2200 / 4000 [ 55%] (Sampling) Iteration: 2600 / 4000 [ 65%] (Sampling) Iteration: 3000 / 4000 [ 75%] (Sampling) Iteration: 3400 / 4000 [ 85%] (Sampling) Iteration: 3800 / 4000 [ 95%] (Sampling) Iteration: 4000 / 4000 [100%] (Sampling) Elapsed Time: 0.062121 seconds (Warm-up) 0.147839 seconds (Sampling) 0.20996 seconds (Total)
data.rstan.mult
Inference for Stan model: e7d4e08d9f1bbcf9d6d79f126f9e56c7. 3 chains, each with iter=4000; warmup=1000; thin=3; post-warmup draws per chain=1000, total post-warmup draws=3000. mean se_mean sd 2.5% 25% 50% 75% 97.5% n_eff Rhat beta[1] 2.94 0.01 0.51 1.96 2.60 2.94 3.29 3.95 2830 1 beta[2] 1.34 0.01 0.43 0.51 1.05 1.33 1.62 2.19 2755 1 beta[3] 2.70 0.02 1.24 0.28 1.88 2.70 3.52 5.17 2701 1 cbeta0 3.82 0.00 0.11 3.60 3.75 3.82 3.90 4.05 2854 1 sigma 1.13 0.00 0.08 0.98 1.07 1.12 1.18 1.31 3000 1 beta0 3.67 0.00 0.14 3.40 3.58 3.67 3.76 3.94 2515 1 lp__ -60.99 0.03 1.67 -65.06 -61.83 -60.64 -59.76 -58.82 2502 1 Samples were drawn using NUTS(diag_e) at Tue Aug 15 15:21:02 2017. For each parameter, n_eff is a crude measure of effective sample size, and Rhat is the potential scale reduction factor on split chains (at convergence, Rhat=1).
The STAN team has put together pre-compiled modules (functions) to make specifying and applying STAN models much simpler. Each function offers a consistent interface that is Also reminiscent of major frequentist linear modelling routines in R.
Whilst it is not necessary to specify priors when using rstanarm functions (as defaults will be generated), there is no guarantee that the routines for determining these defaults will persist over time. Furthermore, it is always better to define your own priors if for no other reason that it forces you to thing about what you re doing. Consistent with the pure STAN version, we will employ the following priors:
- weakly informative Gaussian prior for the intercept $\beta_0 \sim{} N(0, 100)$
- weakly informative Gaussian prior for the treatment effect $\beta_1 \sim{} N(0, 100)$
- half-cauchy prior for the variance $\sigma \sim{} Cauchy(0, 5)$
Note, I am using the refresh=0 option so as to suppress the larger regular output in the interest of keeping output to what is necessary for this tutorial. When running outside of a tutorial context, the regular verbose output is useful as it provides a way to gauge progress.
Additive model
library(rstanarm) library(broom) library(coda)
data.rstanarm.add = stan_glm(y ~ cx1 + cx2, data = data, iter = 2000, warmup = 200, chains = 3, thin = 2, refresh = 0, prior_intercept = normal(0, 100), prior = normal(0, 100), prior_aux = cauchy(0, 2))
Gradient evaluation took 3.9e-05 seconds 1000 transitions using 10 leapfrog steps per transition would take 0.39 seconds. Adjust your expectations accordingly! Elapsed Time: 0.198223 seconds (Warm-up) 0.678626 seconds (Sampling) 0.876849 seconds (Total) Gradient evaluation took 1.5e-05 seconds 1000 transitions using 10 leapfrog steps per transition would take 0.15 seconds. Adjust your expectations accordingly! Elapsed Time: 0.202665 seconds (Warm-up) 0.536397 seconds (Sampling) 0.739062 seconds (Total) Gradient evaluation took 1.6e-05 seconds 1000 transitions using 10 leapfrog steps per transition would take 0.16 seconds. Adjust your expectations accordingly! Elapsed Time: 0.160257 seconds (Warm-up) 0.56306 seconds (Sampling) 0.723317 seconds (Total)
print(data.rstanarm.add)
stan_glm family: gaussian [identity] formula: y ~ cx1 + cx2 ------ Estimates: Median MAD_SD (Intercept) 3.8 0.1 cx1 3.0 0.5 cx2 1.4 0.4 sigma 1.1 0.1 Sample avg. posterior predictive distribution of y (X = xbar): Median MAD_SD mean_PPD 3.8 0.2 ------ For info on the priors used see help('prior_summary.stanreg').
tidyMCMC(data.rstanarm.add, conf.int = TRUE, conf.method = "HPDinterval")
term estimate std.error conf.low conf.high 1 (Intercept) 3.825270 0.11429998 3.6128940 4.066603 2 cx1 3.021411 0.50685407 1.9964692 3.978835 3 cx2 1.381486 0.43201840 0.5227742 2.206361 4 sigma 1.148550 0.08466719 0.9874055 1.319120
Multiplicative model
data.rstanarm.mult = stan_glm(y ~ cx1 * cx2, data = data, iter = 2000, warmup = 200, chains = 3, thin = 2, refresh = 0, prior_intercept = normal(0, 100), prior = normal(0, 100), prior_aux = cauchy(0, 2))
Gradient evaluation took 3.4e-05 seconds 1000 transitions using 10 leapfrog steps per transition would take 0.34 seconds. Adjust your expectations accordingly! Elapsed Time: 0.112049 seconds (Warm-up) 0.580318 seconds (Sampling) 0.692367 seconds (Total) Gradient evaluation took 1.4e-05 seconds 1000 transitions using 10 leapfrog steps per transition would take 0.14 seconds. Adjust your expectations accordingly! Elapsed Time: 0.059281 seconds (Warm-up) 0.420321 seconds (Sampling) 0.479602 seconds (Total) Gradient evaluation took 1.5e-05 seconds 1000 transitions using 10 leapfrog steps per transition would take 0.15 seconds. Adjust your expectations accordingly! Elapsed Time: 0.210373 seconds (Warm-up) 0.70337 seconds (Sampling) 0.913743 seconds (Total)
print(data.rstanarm.mult)
stan_glm family: gaussian [identity] formula: y ~ cx1 * cx2 ------ Estimates: Median MAD_SD (Intercept) 3.7 0.1 cx1 2.9 0.5 cx2 1.3 0.4 cx1:cx2 2.7 1.2 sigma 1.1 0.1 Sample avg. posterior predictive distribution of y (X = xbar): Median MAD_SD mean_PPD 3.8 0.2 ------ For info on the priors used see help('prior_summary.stanreg').
tidyMCMC(data.rstanarm.mult, conf.int = TRUE, conf.method = "HPDinterval")
term estimate std.error conf.low conf.high 1 (Intercept) 3.668833 0.13766864 3.3846348 3.921644 2 cx1 2.933656 0.49365987 1.9342004 3.875018 3 cx2 1.335035 0.41932624 0.5170813 2.187972 4 cx1:cx2 2.703573 1.23876604 0.3719395 5.211188 5 sigma 1.128580 0.08523699 0.9605405 1.293866
The brms package serves a similar goal to the rstanarm package - to provide a simple user interface to STAN. However, unlike the rstanarm implementation, brms simply converts the formula, data, priors and family into STAN model code and data before executing stan with those elements.
Whilst it is not necessary to specify priors when using brms functions (as defaults will be generated), there is no guarantee that the routines for determining these defaults will persist over time. Furthermore, it is always better to define your own priors if for no other reason that it forces you to thing about what you are doing. Consistent with the pure STAN version, we will employ the following priors:
- weakly informative Gaussian prior for the intercept $\beta_0 \sim{} N(0, 100)$
- weakly informative Gaussian prior for the treatment effect $\beta_1 \sim{} N(0, 100)$
- half-cauchy prior for the variance $\sigma \sim{} Cauchy(0, 5)$
Note, I am using the refresh=0. option so as to suppress the larger regular output in the interest of keeping output to what is necessary for this tutorial. When running outside of a tutorial context, the regular verbose output is useful as it provides a way to gauge progress.
Additive model
library(brms) library(broom) library(coda)
data.brms.add = brm(y ~ cx1 + cx2, data = data, iter = 2000, warmup = 200, chains = 3, thin = 2, refresh = 0, prior = c(prior(normal(0, 100), class = "Intercept"), prior(normal(0, 100), class = "b"), prior(cauchy(0, 5), class = "sigma")))
Gradient evaluation took 2.2e-05 seconds 1000 transitions using 10 leapfrog steps per transition would take 0.22 seconds. Adjust your expectations accordingly! Elapsed Time: 0.013242 seconds (Warm-up) 0.072156 seconds (Sampling) 0.085398 seconds (Total) Gradient evaluation took 1.1e-05 seconds 1000 transitions using 10 leapfrog steps per transition would take 0.11 seconds. Adjust your expectations accordingly! Elapsed Time: 0.012669 seconds (Warm-up) 0.068201 seconds (Sampling) 0.08087 seconds (Total) Gradient evaluation took 2.1e-05 seconds 1000 transitions using 10 leapfrog steps per transition would take 0.21 seconds. Adjust your expectations accordingly! Elapsed Time: 0.013731 seconds (Warm-up) 0.065669 seconds (Sampling) 0.0794 seconds (Total)
print(data.brms.add)
Family: gaussian(identity) Formula: y ~ cx1 + cx2 Data: data (Number of observations: 100) Samples: 3 chains, each with iter = 2000; warmup = 200; thin = 2; total post-warmup samples = 2700 ICs: LOO = Not computed; WAIC = Not computed Population-Level Effects: Estimate Est.Error l-95% CI u-95% CI Eff.Sample Rhat Intercept 3.82 0.12 3.59 4.04 2700 1 cx1 3.04 0.52 2.04 4.08 2427 1 cx2 1.38 0.44 0.48 2.24 2463 1 Family Specific Parameters: Estimate Est.Error l-95% CI u-95% CI Eff.Sample Rhat sigma 1.15 0.08 0.99 1.33 2405 1 Samples were drawn using sampling(NUTS). For each parameter, Eff.Sample is a crude measure of effective sample size, and Rhat is the potential scale reduction factor on split chains (at convergence, Rhat = 1).
tidyMCMC(data.brms.add, conf.int = TRUE, conf.method = "HPDinterval")
term estimate std.error conf.low conf.high 1 b_Intercept 3.818513 0.11580124 3.6098266 4.062734 2 b_cx1 3.042654 0.51964636 2.0187823 4.042107 3 b_cx2 1.380825 0.44437714 0.5271996 2.258048 4 sigma 1.147234 0.08407413 0.9784836 1.302088
Multiplicative model
data.brms.mult = brm(y ~ cx1 * cx2, data = data, iter = 2000, warmup = 200, chains = 3, thin = 2, refresh = 0, prior = c(prior(normal(0, 100), class = "Intercept"), prior(normal(0, 100), class = "b"), prior(cauchy(0, 5), class = "sigma")))
Gradient evaluation took 1.9e-05 seconds 1000 transitions using 10 leapfrog steps per transition would take 0.19 seconds. Adjust your expectations accordingly! Elapsed Time: 0.023432 seconds (Warm-up) 0.10201 seconds (Sampling) 0.125442 seconds (Total) Gradient evaluation took 9e-06 seconds 1000 transitions using 10 leapfrog steps per transition would take 0.09 seconds. Adjust your expectations accordingly! Elapsed Time: 0.021031 seconds (Warm-up) 0.075103 seconds (Sampling) 0.096134 seconds (Total) Gradient evaluation took 9e-06 seconds 1000 transitions using 10 leapfrog steps per transition would take 0.09 seconds. Adjust your expectations accordingly! Elapsed Time: 0.019726 seconds (Warm-up) 0.07707 seconds (Sampling) 0.096796 seconds (Total)
print(data.brms.mult)
Family: gaussian(identity) Formula: y ~ cx1 * cx2 Data: data (Number of observations: 100) Samples: 3 chains, each with iter = 2000; warmup = 200; thin = 2; total post-warmup samples = 2700 ICs: LOO = Not computed; WAIC = Not computed Population-Level Effects: Estimate Est.Error l-95% CI u-95% CI Eff.Sample Rhat Intercept 3.67 0.13 3.42 3.94 2534 1 cx1 2.92 0.49 1.94 3.91 2700 1 cx2 1.35 0.43 0.54 2.19 2528 1 cx1:cx2 2.66 1.27 0.11 5.16 2411 1 Family Specific Parameters: Estimate Est.Error l-95% CI u-95% CI Eff.Sample Rhat sigma 1.12 0.08 0.98 1.3 2585 1 Samples were drawn using sampling(NUTS). For each parameter, Eff.Sample is a crude measure of effective sample size, and Rhat is the potential scale reduction factor on split chains (at convergence, Rhat = 1).
tidyMCMC(data.brms.mult, conf.int = TRUE, conf.method = "HPDinterval")
term estimate std.error conf.low conf.high 1 b_Intercept 3.671568 0.13285306 3.43612342 3.943035 2 b_cx1 2.923696 0.49283728 1.89910831 3.857202 3 b_cx2 1.353976 0.42606897 0.59537098 2.231514 4 b_cx1:cx2 2.659722 1.26670106 0.04200689 5.067596 5 sigma 1.123292 0.08287744 0.97447809 1.290851
MCMC diagnostics
In addition to the regular model diagnostic checks (such as residual plots), for Bayesian analyses, it is necessary to explore the characteristics of the MCMC chains and the sampler in general. Recall that the purpose of MCMC sampling is to replicate the posterior distribution of the model likelihood and priors by drawing a known number of samples from this posterior (thereby formulating a probability distribution). This is only reliable if the MCMC samples accurately reflect the posterior.
Unfortunately, since we only know the posterior in the most trivial of circumstances, it is necessary to rely on indirect measures of how accurately the MCMC samples are likely to reflect the likelihood. I will breifly outline the most important diagnostics, however, please refer to Tutorial 4.3, Secton 3.1: Markov Chain Monte Carlo sampling for a discussion of these diagnostics.
- Traceplots for each parameter illustrate the MCMC sample values after each successive
iteration along the chain. Bad chain mixing (characterized by any sort of pattern) suggests
that the MCMC sampling chains may not have completely traversed all features of the posterior
distribution and that more iterations are required to ensure the distribution has been accurately
represented.
- Autocorrelation plot for each paramter illustrate the degree of correlation between
MCMC samples separated by different lags. For example, a lag of 0 represents the degree of
correlation between each MCMC sample and itself (obviously this will be a correlation of 1).
A lag of 1 represents the degree of correlation between each MCMC sample and the next sample along the Chain
and so on. In order to be able to generate unbiased estimates of parameters, the MCMC samples should be
independent (uncorrelated). In the figures below, this would be violated in the top autocorrelation plot and met in the bottom
autocorrelation plot.
- Rhat statistic for each parameter provides a measure of sampling efficiency/effectiveness. Ideally, all values should be less than 1.05. If there are values of 1.05 or greater it suggests that the sampler was not very efficient or effective. Not only does this mean that the sampler was potentiall slower than it could have been, more importantly, it could indicate that the sampler spent time sampling in a region of the likelihood that is less informative. Such a situation can arise from either a misspecified model or overly vague priors that permit sampling in otherwise nonscence parameter space.
Prior to inspecting any summaries of the parameter estimates, it is prudent to inspect a range of chain convergence diagnostics. Rather than dublicate this for both additive and multiplicative models, we will only explore the multiplicative model.
- Trace plots
View trace plots
library(MCMCpack) plot(data.mcmcpack.mult)
- Raftery diagnostic
View Raftery diagnostic
library(MCMCpack) raftery.diag(data.mcmcpack.mult)
Quantile (q) = 0.025 Accuracy (r) = +/- 0.005 Probability (s) = 0.95 Burn-in Total Lower bound Dependence (M) (N) (Nmin) factor (I) (Intercept) 2 3834 3746 1.020 cx1 2 3834 3746 1.020 cx2 2 3650 3746 0.974 cx1:cx2 2 3680 3746 0.982 sigma2 2 3710 3746 0.990
- Autocorrelation diagnostic
View autocorrelations
library(MCMCpack) autocorr.diag(data.mcmcpack.mult)
(Intercept) cx1 cx2 cx1:cx2 sigma2 Lag 0 1.000000000 1.0000000000 1.00000000 1.0000000000 1.000000000 Lag 1 0.005947655 0.0005726118 -0.03747573 -0.0009129132 0.026538088 Lag 5 0.004657780 0.0004847322 0.01289516 -0.0066712267 -0.001533080 Lag 10 -0.014139624 -0.0055672420 -0.01006061 0.0089793972 0.006885994 Lag 50 -0.005874175 -0.0037488027 0.01129924 -0.0125086637 0.018083774
Again, prior to examining the summaries, we should have explored the convergence diagnostics. Rather than dublicate this for both additive and multiplicative models, we will only explore the multiplicative model.
library(coda) data.mcmc = as.mcmc(data.r2jags.mult)
- Trace plots
plot(data.mcmc)
When there are a lot of parameters, this can result in a very large number of traceplots. To focus on just certain parameters (such as $\beta$s)
preds <- c("beta0", "beta[1]", "beta[2]") plot(as.mcmc(data.r2jags.mult)[, preds])
- Raftery diagnostic
raftery.diag(data.mcmc)
[[1]] Quantile (q) = 0.025 Accuracy (r) = +/- 0.005 Probability (s) = 0.95 Burn-in Total Lower bound Dependence (M) (N) (Nmin) factor (I) beta0 20 35610 3746 9.51 beta[1] 20 36200 3746 9.66 beta[2] 20 38660 3746 10.30 beta[3] 20 37410 3746 9.99 deviance 20 38660 3746 10.30 sigma 20 37410 3746 9.99 [[2]] Quantile (q) = 0.025 Accuracy (r) = +/- 0.005 Probability (s) = 0.95 Burn-in Total Lower bound Dependence (M) (N) (Nmin) factor (I) beta0 20 38030 3746 10.20 beta[1] 20 39950 3746 10.70 beta[2] 20 36800 3746 9.82 beta[3] 20 38030 3746 10.20 deviance 20 38030 3746 10.20 sigma 10 37410 3746 9.99 [[3]] Quantile (q) = 0.025 Accuracy (r) = +/- 0.005 Probability (s) = 0.95 Burn-in Total Lower bound Dependence (M) (N) (Nmin) factor (I) beta0 20 37410 3746 9.99 beta[1] 20 37410 3746 9.99 beta[2] 20 38660 3746 10.30 beta[3] 20 38030 3746 10.20 deviance 20 37410 3746 9.99 sigma 20 36800 3746 9.82
- Autocorrelation diagnostic
autocorr.diag(data.mcmc)
beta0 beta[1] beta[2] beta[3] deviance sigma Lag 0 1.000000000 1.000000000 1.000000e+00 1.0000000000 1.000000000 1.000000000 Lag 10 -0.003559426 0.008596973 -6.002651e-04 0.0009684002 -0.011764207 0.003113188 Lag 50 0.007319792 -0.001489343 -1.080306e-02 0.0012955417 -0.008450594 0.005073925 Lag 100 -0.015299928 0.001915914 5.393375e-05 0.0082757392 0.001525074 0.017887984 Lag 500 0.002792247 0.015376731 8.762674e-03 0.0065778151 -0.009101738 0.010133970
Again, prior to examining the summaries, we should have explored the convergence diagnostics. Rather than dublicate this for both additive and multiplicative models, we will only explore the multiplicative model. There are numerous ways of working with STAN model fits (for exploring diagnostics and summarization).
- extract the mcmc samples and convert them into a mcmc.list to leverage the various coda routines
- use the numerous routines that come with the rstan package
- use the routines that come with the bayesplot package
- explore the diagnostics interactively via shinystan
- via coda
- Traceplots
- Autocorrelation
library(coda) s = as.array(data.rstan.mult) wch = grep("beta", dimnames(s)$parameters) s = s[, , wch] mcmc <- do.call(mcmc.list, plyr:::alply(s[, , -(length(s[1, 1, ]))], 2, as.mcmc)) plot(mcmc)
library(coda) s = as.array(data.rstan.mult) wch = grep("beta", dimnames(s)$parameters) s = s[, , wch] mcmc <- do.call(mcmc.list, plyr:::alply(s[, , -(length(s[1, 1, ]))], 2, as.mcmc)) autocorr.diag(mcmc)
beta[1] beta[2] beta[3] cbeta0 Lag 0 1.000000000 1.00000000 1.000000000 1.0000000000 Lag 1 0.006483455 0.02858605 0.033585670 0.0182224466 Lag 5 -0.008761706 0.01281171 -0.013686626 0.0147104379 Lag 10 0.005060245 -0.01900523 -0.015181308 -0.0212298669 Lag 50 -0.028188168 -0.01532768 0.005026116 0.0007438316
- via rstan
- Traceplots
stan_trace(data.rstan.mult)
- Raftery diagnostic
raftery.diag(data.rstan.mult)
Quantile (q) = 0.025 Accuracy (r) = +/- 0.005 Probability (s) = 0.95 You need a sample size of at least 3746 with these values of q, r and s
- Autocorrelation diagnostic
stan_ac(data.rstan.mult)
- Rhat values. These values are a measure of sampling efficiency/effectiveness. Ideally, all values should be less than 1.05.
If there are values of 1.05 or greater it suggests that the sampler was not very efficient or effective. Not only does this
mean that the sampler was potentiall slower than it could have been, more importantly, it could indicate that the sampler spent time sampling
in a region of the likelihood that is less informative. Such a situation can arise from either a misspecified model or
overly vague priors that permit sampling in otherwise nonscence parameter space.
stan_rhat(data.rstan.mult)
- Another measure of sampling efficiency is Effective Sample Size (ess).
ess indicate the number samples (or proportion of samples that the sampling algorithm deamed effective. The sampler rejects samples
on the basis of certain criterion and when it does so, the previous sample value is used. Hence while the MCMC sampling chain
may contain 1000 samples, if there are only 10 effective samples (1%), the estimated properties are not likely to be reliable.
stan_ess(data.rstan.mult)
- Traceplots
- via bayesplot
- Trace plots and density plots
library(bayesplot) mcmc_trace(as.matrix(data.rstan.mult), regex_pars = "beta|sigma")
library(bayesplot) mcmc_combo(as.matrix(data.rstan.mult), regex_pars = "beta|sigma")
- Density plots
library(bayesplot) mcmc_dens(as.matrix(data.rstan.mult), regex_pars = "beta|sigma")
- Trace plots and density plots
- via shinystan
library(shinystan) launch_shinystan(data.rstan.mult)
- It is worth exploring the influence of our priors.
Again, prior to examining the summaries, we should have explored the convergence diagnostics. Rather than dublicate this for both additive and multiplicative models, we will only explore the multiplicative model. There are numerous ways of working with STAN model fits (for exploring diagnostics and summarization).
- extract the mcmc samples and convert them into a mcmc.list to leverage the various coda routines
- use the numerous routines that come with the rstan package
- use the routines that come with the bayesplot package
- explore the diagnostics interactively via shinystan
- via coda
- Traceplots
- Autocorrelation
library(coda) s = as.array(data.rstanarm.mult) mcmc <- do.call(mcmc.list, plyr:::alply(s[, , -(length(s[1, 1, ]))], 2, as.mcmc)) plot(mcmc)
library(coda) s = as.array(data.rstanarm.mult) mcmc <- do.call(mcmc.list, plyr:::alply(s[, , -(length(s[1, 1, ]))], 2, as.mcmc)) autocorr.diag(mcmc)
(Intercept) cx1 cx2 cx1:cx2 Lag 0 1.00000000 1.000000000 1.000000000 1.000000000 Lag 1 0.20527489 0.003647371 0.039860083 -0.020218444 Lag 5 -0.04746531 0.008339269 -0.016665899 -0.024428764 Lag 10 0.01484256 0.033294835 0.002029290 0.027973299 Lag 50 -0.01554411 0.012833299 -0.005277095 0.002660752
- via rstan
- Traceplots
stan_trace(data.rstanarm.mult)
- Raftery diagnostic
raftery.diag(data.rstanarm.mult)
Quantile (q) = 0.025 Accuracy (r) = +/- 0.005 Probability (s) = 0.95 You need a sample size of at least 3746 with these values of q, r and s
- Autocorrelation diagnostic
stan_ac(data.rstanarm.mult)
- Rhat values. These values are a measure of sampling efficiency/effectiveness. Ideally, all values should be less than 1.05.
If there are values of 1.05 or greater it suggests that the sampler was not very efficient or effective. Not only does this
mean that the sampler was potentiall slower than it could have been, more importantly, it could indicate that the sampler spent time sampling
in a region of the likelihood that is less informative. Such a situation can arise from either a misspecified model or
overly vague priors that permit sampling in otherwise nonscence parameter space.
stan_rhat(data.rstanarm.mult)
- Another measure of sampling efficiency is Effective Sample Size (ess).
ess indicate the number samples (or proportion of samples that the sampling algorithm deamed effective. The sampler rejects samples
on the basis of certain criterion and when it does so, the previous sample value is used. Hence while the MCMC sampling chain
may contain 1000 samples, if there are only 10 effective samples (1%), the estimated properties are not likely to be reliable.
stan_ess(data.rstanarm.mult)
- Traceplots
- via bayesplot
- Trace plots and density plots
library(bayesplot) mcmc_trace(as.array(data.rstanarm.mult), regex_pars = "Intercept|x|sigma")
library(bayesplot) mcmc_combo(as.array(data.rstanarm.mult))
- Density plots
library(bayesplot) mcmc_dens(as.array(data.rstanarm.mult))
- Trace plots and density plots
- via rstanarm
The rstanarm package provides additional posterior checks.- Posterior vs Prior - this compares the posterior estimate for each parameter against the associated prior.
If the spread of the priors is small relative to the posterior, then it is likely that the priors are too influential.
On the other hand, overly wide priors can lead to computational issues.
library(rstanarm) posterior_vs_prior(data.rstanarm.mult, color_by = "vs", group_by = TRUE, facet_args = list(scales = "free_y"))
Gradient evaluation took 6.5e-05 seconds 1000 transitions using 10 leapfrog steps per transition would take 0.65 seconds. Adjust your expectations accordingly! Elapsed Time: 0.103395 seconds (Warm-up) 0.112204 seconds (Sampling) 0.215599 seconds (Total) Gradient evaluation took 1.4e-05 seconds 1000 transitions using 10 leapfrog steps per transition would take 0.14 seconds. Adjust your expectations accordingly! Elapsed Time: 0.155166 seconds (Warm-up) 0.139936 seconds (Sampling) 0.295102 seconds (Total)
- Posterior vs Prior - this compares the posterior estimate for each parameter against the associated prior.
If the spread of the priors is small relative to the posterior, then it is likely that the priors are too influential.
On the other hand, overly wide priors can lead to computational issues.
- via shinystan
library(shinystan) launch_shinystan(data.rstanarm.mult))
- It is worth exploring the influence of our priors.
Again, prior to examining the summaries, we should have explored the convergence diagnostics. Rather than dublicate this for both additive and multiplicative models, we will only explore the multiplicative model. There are numerous ways of working with STAN model fits (for exploring diagnostics and summarization).
- extract the mcmc samples and convert them into a mcmc.list to leverage the various coda routines
- use the numerous routines that come with the rstan package
- use the routines that come with the bayesplot package
- explore the diagnostics interactively via shinystan
- via coda
- Traceplots
- Autocorrelation
library(coda) mcmc = as.mcmc(data.brms.mult) plot(mcmc)
library(coda) mcmc = as.mcmc(data.brms.mult) autocorr.diag(mcmc)
Error in ts(x, start = start(x), end = end(x), deltat = thin(x)): invalid time series parameters specified
- via rstan
- Traceplots
stan_trace(data.brms.mult$fit)
- Raftery diagnostic
raftery.diag(data.brms.mult)
Quantile (q) = 0.025 Accuracy (r) = +/- 0.005 Probability (s) = 0.95 You need a sample size of at least 3746 with these values of q, r and s
- Autocorrelation diagnostic
stan_ac(data.brms.mult$fit)
- Rhat values. These values are a measure of sampling efficiency/effectiveness. Ideally, all values should be less than 1.05.
If there are values of 1.05 or greater it suggests that the sampler was not very efficient or effective. Not only does this
mean that the sampler was potentiall slower than it could have been, more importantly, it could indicate that the sampler spent time sampling
in a region of the likelihood that is less informative. Such a situation can arise from either a misspecified model or
overly vague priors that permit sampling in otherwise nonscence parameter space.
stan_rhat(data.brms.mult$fit)
- Another measure of sampling efficiency is Effective Sample Size (ess).
ess indicate the number samples (or proportion of samples that the sampling algorithm deamed effective. The sampler rejects samples
on the basis of certain criterion and when it does so, the previous sample value is used. Hence while the MCMC sampling chain
may contain 1000 samples, if there are only 10 effective samples (1%), the estimated properties are not likely to be reliable.
stan_ess(data.brms.mult$fit)
- Traceplots
Model validation
Model validation involves exploring the model diagnostics and fit to ensure that the model is broadly appropriate for the data. As such, exploration of the residuals should be routine.
For more complex models (those that contain multiple effects, it is also advisable to plot the residuals against each of the individual predictors. For sampling designs that involve sample collection over space or time, it is also a good idea to explore whether there are any temporal or spatial patterns in the residuals.
There are numerous situations (e.g. when applying specific variance-covariance structures to a model) where raw residuals do not reflect the interior workings of the model. Typically, this is because they do not take into account the variance-covariance matrix or assume a very simple variance-covariance matrix. Since the purpose of exploring residuals is to evaluate the model, for these cases, it is arguably better to draw conclusions based on standardized (or studentized) residuals.
Unfortunately the definitions of standardized and studentized residuals appears to vary and the two terms get used interchangeably. I will adopt the following definitions:
Standardized residuals: | the raw residuals divided by the true standard deviation of the residuals (which of course is rarely known). | |
Studentized residuals: | the raw residuals divided by the standard deviation of the residuals. Note that externally studentized residuals are calculated by dividing the raw residuals by a unique standard deviation for each observation that is calculated from regressions having left each successive observation out. | |
Pearson residuals: | the raw residuals divided by the standard deviation of the response variable. |
The mark of a good model is being able to predict well. In an ideal world, we would have sufficiently large sample size as to permit us to hold a fraction (such as 25%) back thereby allowing us to train the model on 75% of the data and then see how well the model can predict the withheld 25%. Unfortunately, such a luxury is still rare in ecology.
The next best option is to see how well the model can predict the observed data. Models tend to struggle most with the extremes of trends and have particular issues when the extremes approach logical boundaries (such as zero for count data and standard deviations). We can use the fitted model to generate random predicted observations and then explore some properties of these compared to the actual observed data.
Residuals are not computed directly within MCMCpack. However, we can calculate them manually form the posteriors.
mcmc = as.data.frame(data.mcmcpack.mult) # generate a model matrix newdata = data Xmat = model.matrix(~cx1 * cx2, newdata) ## get median parameter estimates coefs = apply(mcmc[, 1:4], 2, median) fit = as.vector(coefs %*% t(Xmat)) resid = data$y - fit ggplot() + geom_point(data = NULL, aes(y = resid, x = fit))
Residuals against predictors
mcmc = as.data.frame(data.mcmcpack.mult) # generate a model matrix newdata = newdata Xmat = model.matrix(~cx1 * cx2, newdata) ## get median parameter estimates coefs = apply(mcmc[, 1:4], 2, median) fit = as.vector(coefs %*% t(Xmat)) resid = data$y - fit newdata = newdata %>% cbind(fit, resid) newdata.melt = newdata %>% gather(key = Pred, value = Value, cx1:cx2) ggplot(newdata.melt) + geom_point(aes(y = resid, x = Value)) + facet_wrap(~Pred)
And now for studentized residuals
mcmc = as.data.frame(data.mcmcpack.mult) # generate a model matrix newdata = data Xmat = model.matrix(~cx1 * cx2, newdata) ## get median parameter estimates coefs = apply(mcmc[, 1:4], 2, median) fit = as.vector(coefs %*% t(Xmat)) resid = data$y - fit sresid = resid/sd(resid) ggplot() + geom_point(data = NULL, aes(y = sresid, x = fit))
Conclusions: for this simple model, the studentized residuals yield the same pattern as the raw residuals (or the Pearson residuals for that matter).
Lets see how well data simulated from the model reflects the raw data
mcmc = as.matrix(data.mcmcpack.mult) # generate a model matrix Xmat = model.matrix(~cx1 * cx2, data) ## get median parameter estimates coefs = mcmc[, 1:4] fit = coefs %*% t(Xmat) ## draw samples from this model yRep = sapply(1:nrow(mcmc), function(i) rnorm(nrow(data), fit[i, ], sqrt(mcmc[i, "sigma2"]))) ggplot() + geom_density(data = NULL, aes(x = as.vector(yRep), fill = "Model"), alpha = 0.5) + geom_density(data = data, aes(x = y, fill = "Obs"), alpha = 0.5)
We can also explore the posteriors of each parameter.
library(bayesplot) mcmc_intervals(as.matrix(data.mcmcpack.mult), regex_pars = "Intercept|cx|sigma")
mcmc_areas(as.matrix(data.mcmcpack.mult), regex_pars = "Intercept|cx|sigma")
Residuals are not computed directly within JAGS. However, we can calculate them manually form the posteriors.
mcmc = data.r2jags.mult$BUGSoutput$sims.matrix %>% as.data.frame %>% dplyr:::select(beta0, contains("beta"), sigma) %>% as.matrix # generate a model matrix newdata = data Xmat = model.matrix(~cx1 * cx2, newdata) ## get median parameter estimates coefs = apply(mcmc[, 1:4], 2, median) fit = as.vector(coefs %*% t(Xmat)) resid = data$y - fit ggplot() + geom_point(data = NULL, aes(y = resid, x = fit))
Residuals against predictors
mcmc = data.r2jags.mult$BUGSoutput$sims.matrix %>% as.data.frame %>% dplyr:::select(beta0, contains("beta"), sigma) %>% as.matrix # generate a model matrix newdata = newdata Xmat = model.matrix(~cx1 * cx2, newdata) ## get median parameter estimates coefs = apply(mcmc[, 1:4], 2, median) fit = as.vector(coefs %*% t(Xmat)) resid = data$y - fit newdata = newdata %>% cbind(fit, resid) newdata.melt = newdata %>% gather(key = Pred, value = Value, cx1:cx2) ggplot(newdata.melt) + geom_point(aes(y = resid, x = Value)) + facet_wrap(~Pred)
And now for studentized residuals
mcmc = data.r2jags.mult$BUGSoutput$sims.matrix %>% as.data.frame %>% dplyr:::select(beta0, contains("beta"), sigma) %>% as.matrix # generate a model matrix newdata = data Xmat = model.matrix(~cx1 * cx2, newdata) ## get median parameter estimates coefs = apply(mcmc[, 1:4], 2, median) fit = as.vector(coefs %*% t(Xmat)) resid = data$y - fit sresid = resid/sd(resid) ggplot() + geom_point(data = NULL, aes(y = sresid, x = fit))
Conclusions: for this simple model, the studentized residuals yield the same pattern as the raw residuals (or the Pearson residuals for that matter).
Lets see how well data simulated from the model reflects the raw data
mcmc = data.r2jags.mult$BUGSoutput$sims.matrix %>% as.data.frame %>% dplyr:::select(beta0, contains("beta"), sigma) %>% as.matrix # generate a model matrix Xmat = model.matrix(~cx1 * cx2, data) ## get median parameter estimates coefs = mcmc[, 1:4] fit = coefs %*% t(Xmat) ## draw samples from this model yRep = sapply(1:nrow(mcmc), function(i) rnorm(nrow(data), fit[i, ], mcmc[i, "sigma"])) ggplot() + geom_density(data = NULL, aes(x = as.vector(yRep), fill = "Model"), alpha = 0.5) + geom_density(data = data, aes(x = y, fill = "Obs"), alpha = 0.5)
We can also explore the posteriors of each parameter.
library(bayesplot) mcmc_intervals(data.r2jags.mult$BUGSoutput$sims.matrix, regex_pars = "beta|sigma")
mcmc_areas(data.r2jags.mult$BUGSoutput$sims.matrix, regex_pars = "beta|sigma")
Residuals are not computed directly within RSTAN. However, we can calculate them manually form the posteriors.
mcmc = as.data.frame(data.rstan.mult) %>% dplyr:::select(beta0, starts_with("beta"), sigma) %>% as.matrix # generate a model matrix newdata = data Xmat = model.matrix(~cx1 * cx2, newdata) ## get median parameter estimates coefs = apply(mcmc[, 1:4], 2, median) fit = as.vector(coefs %*% t(Xmat)) resid = data$y - fit ggplot() + geom_point(data = NULL, aes(y = resid, x = fit))
Residuals against predictors
mcmc = as.data.frame(data.rstan.mult) %>% dplyr:::select(beta0, starts_with("beta"), sigma) %>% as.matrix # generate a model matrix newdata = newdata Xmat = model.matrix(~cx1 * cx2, newdata) ## get median parameter estimates coefs = apply(mcmc[, 1:4], 2, median) fit = as.vector(coefs %*% t(Xmat)) resid = data$y - fit newdata = newdata %>% cbind(fit, resid) newdata.melt = newdata %>% gather(key = Pred, value = Value, cx1:cx2) ggplot(newdata.melt) + geom_point(aes(y = resid, x = Value)) + facet_wrap(~Pred)
And now for studentized residuals
mcmc = as.data.frame(data.rstan.mult) %>% dplyr:::select(beta0, starts_with("beta"), sigma) %>% as.matrix # generate a model matrix newdata = data Xmat = model.matrix(~cx1 * cx2, newdata) ## get median parameter estimates coefs = apply(mcmc[, 1:4], 2, median) fit = as.vector(coefs %*% t(Xmat)) resid = data$y - fit sresid = resid/sd(resid) ggplot() + geom_point(data = NULL, aes(y = sresid, x = fit))
Conclusions: for this simple model, the studentized residuals yield the same pattern as the raw residuals (or the Pearson residuals for that matter).
Lets see how well data simulated from the model reflects the raw data
mcmc = as.data.frame(data.rstan.mult) %>% dplyr:::select(beta0, starts_with("beta"), sigma) %>% as.matrix # generate a model matrix Xmat = model.matrix(~cx1 * cx2, data) ## get median parameter estimates coefs = mcmc[, 1:4] fit = coefs %*% t(Xmat) ## draw samples from this model yRep = sapply(1:nrow(mcmc), function(i) rnorm(nrow(data), fit[i, ], mcmc[i, "sigma"])) ggplot() + geom_density(data = NULL, aes(x = as.vector(yRep), fill = "Model"), alpha = 0.5) + geom_density(data = data, aes(x = y, fill = "Obs"), alpha = 0.5)
We can also explore the posteriors of each parameter.
library(bayesplot) mcmc_intervals(as.matrix(data.rstan.mult), regex_pars = "beta|sigma")
mcmc_areas(as.matrix(data.rstan.mult), regex_pars = "beta|sigma")
Residuals can be computed directly within RSTANARM.
resid = resid(data.rstanarm.mult) fit = fitted(data.rstanarm.mult) ggplot() + geom_point(data = NULL, aes(y = resid, x = fit))
Residuals against predictors
resid = resid(data.rstanarm.mult) dat.melt = data %>% mutate(resid = resid) %>% gather(key = Pred, value = Value, cx1:cx2) ggplot(dat.melt) + geom_point(aes(y = resid, x = Value)) + facet_wrap(~Pred)
And now for studentized residuals
resid = resid(data.rstanarm.mult) sresid = resid/sd(resid) fit = fitted(data.rstanarm.mult) ggplot() + geom_point(data = NULL, aes(y = sresid, x = fit))
Conclusions: for this simple model, the studentized residuals yield the same pattern as the raw residuals (or the Pearson residuals for that matter).
Lets compare draws (predictions) from the posterior (in blue) with the distributions of observed data (red) via violin plots. Violin plots are similar to boxplots except that they display more visual information about the density distribution.
y_pred = posterior_predict(data.rstanarm.mult) newdata = data %>% cbind(t(y_pred)) %>% gather(key = "Rep", value = "Value", -y:-cx2) newdata.melt = newdata %>% gather(key = "Pred", value = "Pred_val", cx1:cx2) data.melt = data %>% gather(key = "Pred", value = "Pred_val", cx1:cx2) ggplot(newdata.melt, aes(Value, x = Pred_val)) + geom_violin(color = "blue", fill = "blue", alpha = 0.5) + geom_violin(data = data.melt, aes(y = y, x = Pred_val), fill = "red", color = "red", alpha = 0.5) + facet_wrap(~Pred)
Yet another way to approach validation is to explore trends in posteriors in the context of the observed data.
## Calculate the fitted values newdata = rbind(data.frame(cx1 = seq(min(data$cx1, na.rm = TRUE), max(data$cx1, na.rm = TRUE), len = 100), cx2 = 0), data.frame(cx1 = 0, cx2 = seq(min(data$cx2, na.rm = TRUE), max(data$cx2, na.rm = TRUE), len = 100))) fit = posterior_predict(data.rstanarm.mult, newdata = newdata) newdata = newdata %>% cbind(tidyMCMC(as.mcmc(fit), conf.int = TRUE, conf.method = "HPDinterval")) newdata.melt = newdata %>% gather(key = "Pred", value = "Value", cx1:cx2) %>% filter(Value != 0) data.melt = data %>% gather(key = "Pred", value = "Value", cx1:cx2) ggplot(newdata.melt, aes(y = estimate, x = Value)) + geom_point(data = data.melt, aes(y = y)) + geom_line() + geom_ribbon(aes(ymin = conf.low, ymax = conf.high), fill = "blue", alpha = 0.3) + scale_y_continuous("Y") + scale_x_continuous("X") + theme_classic() + facet_wrap(~Pred)
Conclusions the predicted trends do encapsulate the actual data, suggesting that the model is a reasonable representation of the underlying processes. Note, these are prediction intervals rather than confidence intervals as we are seeking intervals within which we can predict individual observations rather than means.
We can also explore the posteriors of each parameter.
library(bayesplot) mcmc_intervals(as.matrix(data.rstanarm.mult), regex_pars = "Intercept|cx|sigma")
mcmc_areas(as.matrix(data.rstanarm.mult), regex_pars = "Intercept|cx|sigma")
Residuals can be computed directly within BRMS. By default, the residuals and fitted extractor functions in brms return summarized versions (means, SE and credibility intervals). We are only interested in the mean (Estimate) estimates.
resid = resid(data.brms.mult)[, "Estimate"] fit = fitted(data.brms.mult)[, "Estimate"] ggplot() + geom_point(data = NULL, aes(y = resid, x = fit))
Residuals against predictors
resid = resid(data.brms.mult)[, "Estimate"] dat.melt = data %>% mutate(resid = resid) %>% gather(key = Pred, value = Value, cx1:cx2) ggplot(dat.melt) + geom_point(aes(y = resid, x = Value)) + facet_wrap(~Pred)
And now for studentized residuals
resid = resid(data.brms.mult)[, "Estimate"] sresid = resid/sd(resid) fit = fitted(data.brms.mult)[, "Estimate"] ggplot() + geom_point(data = NULL, aes(y = sresid, x = fit))
Conclusions: for this simple model, the studentized residuals yield the same pattern as the raw residuals (or the Pearson residuals for that matter).
Lets compare draws (predictions) from the posterior (in blue) with the distributions of observed data (red) via violin plots. Violin plots are similar to boxplots except that they display more visual information about the density distribution.
y_pred = posterior_predict(data.brms.mult) newdata = data %>% cbind(t(y_pred)) %>% gather(key = "Rep", value = "Value", -y:-cx2) newdata.melt = newdata %>% gather(key = "Pred", value = "Pred_val", cx1:cx2) data.melt = data %>% gather(key = "Pred", value = "Pred_val", cx1:cx2) ggplot(newdata.melt, aes(Value, x = Pred_val)) + geom_violin(color = "blue", fill = "blue", alpha = 0.5) + geom_violin(data = data.melt, aes(y = y, x = Pred_val), fill = "red", color = "red", alpha = 0.5) + facet_wrap(~Pred)
Yet another way to approach validation is to explore trends in posteriors in the context of the observed data.
## Calculate the fitted values newdata = rbind(data.frame(cx1 = seq(min(data$cx1, na.rm = TRUE), max(data$cx1, na.rm = TRUE), len = 100), cx2 = 0), data.frame(cx1 = 0, cx2 = seq(min(data$cx2, na.rm = TRUE), max(data$cx2, na.rm = TRUE), len = 100))) fit = posterior_predict(data.brms.mult, newdata = newdata) newdata = newdata %>% cbind(tidyMCMC(as.mcmc(fit), conf.int = TRUE, conf.method = "HPDinterval")) newdata.melt = newdata %>% gather(key = "Pred", value = "Value", cx1:cx2) %>% filter(Value != 0) data.melt = data %>% gather(key = "Pred", value = "Value", cx1:cx2) ggplot(newdata.melt, aes(y = estimate, x = Value)) + geom_point(data = data.melt, aes(y = y)) + geom_line() + geom_ribbon(aes(ymin = conf.low, ymax = conf.high), fill = "blue", alpha = 0.3) + scale_y_continuous("Y") + scale_x_continuous("X") + theme_classic() + facet_wrap(~Pred)
Conclusions the predicted trends do encapsulate the actual data, suggesting that the model is a reasonable representation of the underlying processes. Note, these are prediction intervals rather than confidence intervals as we are seeking intervals within which we can predict individual observations rather than means.
We can also explore the posteriors of each parameter.
library(bayesplot) mcmc_intervals(as.matrix(data.brms.mult), regex_pars = "b_|sigma")
mcmc_areas(as.matrix(data.brms.mult), regex_pars = "b_|sigma")
Parameter estimates (posterior summaries)
Although all parameters in a Bayesian analysis are considered random and are considered a distribution, rarely would it be useful to present tables of all the samples from each distribution. On the other hand, plots of the posterior distributions are do have some use. Nevertheless, most workers prefer to present simple statistical summaries of the posteriors. Popular choices include the median (or mean) and 95% credibility intervals.
library(coda) mcmcpvalue <- function(samp) { ## elementary version that creates an empirical p-value for the ## hypothesis that the columns of samp have mean zero versus a general ## multivariate distribution with elliptical contours. ## differences from the mean standardized by the observed ## variance-covariance factor ## Note, I put in the bit for single terms if (length(dim(samp)) == 0) { std <- backsolve(chol(var(samp)), cbind(0, t(samp)) - mean(samp), transpose = TRUE) sqdist <- colSums(std * std) sum(sqdist[-1] > sqdist[1])/length(samp) } else { std <- backsolve(chol(var(samp)), cbind(0, t(samp)) - colMeans(samp), transpose = TRUE) sqdist <- colSums(std * std) sum(sqdist[-1] > sqdist[1])/nrow(samp) } }
Additive model (MCMCpack)
summary(data.mcmcpack.add)
Iterations = 1001:11000 Thinning interval = 1 Number of chains = 1 Sample size per chain = 10000 1. Empirical mean and standard deviation for each variable, plus standard error of the mean: Mean SD Naive SE Time-series SE (Intercept) 3.823 0.1140 0.001140 0.001145 cx1 3.011 0.5045 0.005045 0.005045 cx2 1.397 0.4354 0.004354 0.004419 sigma2 1.306 0.1921 0.001921 0.001968 2. Quantiles for each variable: 2.5% 25% 50% 75% 97.5% (Intercept) 3.6041 3.746 3.823 3.899 4.046 cx1 2.0282 2.671 3.018 3.345 4.010 cx2 0.5306 1.106 1.399 1.693 2.252 sigma2 0.9839 1.168 1.286 1.419 1.734
# OR library(broom) tidyMCMC(data.mcmcpack.add, conf.int = TRUE, conf.method = "HPDinterval")
term estimate std.error conf.low conf.high 1 (Intercept) 3.822627 0.1139509 3.6048428 4.046831 2 cx1 3.011318 0.5044755 2.0435576 4.015099 3 cx2 1.396664 0.4353955 0.5283247 2.245911 4 sigma2 1.305520 0.1920807 0.9492838 1.677171
- when cx2 is held constant, a one unit increase in cx1 is associated with a
3.011318
change in y. That is, y increases at a rate of3.011318
per unit increase in cx1 when standardized for cx2. - when cx1 is held constant, a one unit increase in cx2 is associated with a
1.396664
change in y. That is, y increases at a rate of1.396664
per unit increase in cx2 when standardized for cx1.
While workers attempt to become comfortable with a new statistical framework, it is only natural that they like to evaluate and comprehend new structures and output alongside more familiar concepts. One way to facilitate this is via Bayesian p-values that are somewhat analogous to the frequentist p-values for investigating the hypothesis that a parameter is equal to zero.
## since values are less than zero mcmcpvalue(data.mcmcpack.add[, 2])
[1] 0
mcmcpvalue(data.mcmcpack.add[, 3])
[1] 0.0019
With a p-value of essentially 0, we would conclude that there is almost no evidence that the slope was likely to be equal to zero, suggesting there is a relationship.
Multiplicative model (MCMCpack)
summary(data.mcmcpack.mult)
Iterations = 1001:11000 Thinning interval = 1 Number of chains = 1 Sample size per chain = 10000 1. Empirical mean and standard deviation for each variable, plus standard error of the mean: Mean SD Naive SE Time-series SE (Intercept) 3.671 0.1330 0.001330 0.001330 cx1 2.928 0.4994 0.004994 0.004994 cx2 1.342 0.4194 0.004194 0.004040 cx1:cx2 2.667 1.2398 0.012398 0.012398 sigma2 1.256 0.1843 0.001843 0.001877 2. Quantiles for each variable: 2.5% 25% 50% 75% 97.5% (Intercept) 3.4147 3.581 3.672 3.759 3.938 cx1 1.9555 2.597 2.929 3.260 3.910 cx2 0.4986 1.072 1.343 1.622 2.161 cx1:cx2 0.2512 1.828 2.669 3.495 5.080 sigma2 0.9460 1.125 1.238 1.369 1.660
# OR library(broom) tidyMCMC(data.mcmcpack.mult, conf.int = TRUE, conf.method = "HPDinterval")
term estimate std.error conf.low conf.high 1 (Intercept) 3.671103 0.1330200 3.4112189 3.933302 2 cx1 2.928409 0.4993563 1.9354791 3.878062 3 cx2 1.341735 0.4193631 0.4912293 2.150196 4 cx1:cx2 2.667311 1.2398248 0.3215486 5.133875 5 sigma2 1.256300 0.1843042 0.9336364 1.632568
- at the average level of cx2 (=0), a one unit increase in cx1 is associated with a
2.9284093
change in y. That is, y increases at a rate of2.9284093
per unit increase in cx1 when standardized for cx2. - at the average level of cx1 (=0), a one unit increase in cx2 is associated with a
1.3417354
change in y. That is, y increases at a rate of1.3417354
per unit increase in cx2 when standardized for cx1. - the degree to which the rate of change in response associated with a one unit change in cx1 changes over the range of
cx2 (and vice versa) is
2.6673106
.
The 95% confidence intervals for the interaction partial slope does not overlap with 0 implying a significant interaction between cx1 and cx2. This suggests that the nature of the relationship between y and cx1 depends on the level of cx2 (and vice versa). The estimates of the effect of cx1 are only appropriate when cx2 = 0 etc.
While workers attempt to become comfortable with a new statistical framework, it is only natural that they like to evaluate and comprehend new structures and output alongside more familiar concepts. One way to facilitate this is via Bayesian p-values that are somewhat analogous to the frequentist p-values for investigating the hypothesis that a parameter is equal to zero.
## since values are less than zero mcmcpvalue(data.mcmcpack.mult[, 2])
[1] 0
mcmcpvalue(data.mcmcpack.mult[, 3])
[1] 0.0022
mcmcpvalue(data.mcmcpack.mult[, 4])
[1] 0.0311
With a p-value of essentially 0, we would conclude that there is almost no evidence that the slope was likely to be equal to zero, suggesting there is a relationship.
Additive model (JAGS)
print(data.r2jags.add)
Inference for Bugs model at "5", fit using jags, 3 chains, each with 53000 iterations (first 3000 discarded), n.thin = 10 n.sims = 15000 iterations saved mu.vect sd.vect 2.5% 25% 50% 75% 97.5% Rhat n.eff beta[1] 3.028 0.501 2.037 2.692 3.032 3.365 4.009 1.001 15000 beta[2] 1.389 0.432 0.545 1.101 1.394 1.675 2.235 1.001 15000 beta0 3.823 0.115 3.597 3.745 3.822 3.901 4.048 1.001 8000 sigma 1.146 0.083 1.001 1.088 1.140 1.198 1.322 1.001 15000 deviance 309.526 2.883 305.925 307.427 308.888 310.918 316.903 1.002 3100 For each parameter, n.eff is a crude measure of effective sample size, and Rhat is the potential scale reduction factor (at convergence, Rhat=1). DIC info (using the rule, pD = var(deviance)/2) pD = 4.2 and DIC = 313.7 DIC is an estimate of expected predictive error (lower deviance is better).
# OR library(broom) tidyMCMC(as.mcmc(data.r2jags.add), conf.int = TRUE, conf.method = "HPDinterval")
term estimate std.error conf.low conf.high 1 beta0 3.822567 0.11519912 3.6077546 4.057015 2 beta[1] 3.028129 0.50095089 2.0517612 4.018021 3 beta[2] 1.388584 0.43169505 0.5412069 2.229200 4 deviance 309.526490 2.88347247 305.5279931 315.076927 5 sigma 1.145686 0.08280216 0.9979548 1.318215
- when cx2 is held constant, a one unit increase in cx1 is associated with a
3.0281293
change in y. That is, y increases at a rate of3.0281293
per unit increase in cx1 when standardized for cx2. - when cx1 is held constant, a one unit increase in cx2 is associated with a
1.388584
change in y. That is, y increases at a rate of1.388584
per unit increase in cx2 when standardized for cx1.
While workers attempt to become comfortable with a new statistical framework, it is only natural that they like to evaluate and comprehend new structures and output alongside more familiar concepts. One way to facilitate this is via Bayesian p-values that are somewhat analogous to the frequentist p-values for investigating the hypothesis that a parameter is equal to zero.
## since values are less than zero mcmcpvalue(data.r2jags.add$BUGSoutput$sims.matrix[, "beta[1]"])
[1] 0
mcmcpvalue(data.r2jags.add$BUGSoutput$sims.matrix[, "beta[2]"])
[1] 0.001866667
With a p-value of essentially 0, we would conclude that there is almost no evidence that the slope was likely to be equal to zero, suggesting there is a relationship.
Multiplicative model (JAGS)
print(data.r2jags.mult)
Inference for Bugs model at "5", fit using jags, 3 chains, each with 53000 iterations (first 3000 discarded), n.thin = 10 n.sims = 15000 iterations saved mu.vect sd.vect 2.5% 25% 50% 75% 97.5% Rhat n.eff beta[1] 2.931 0.499 1.969 2.593 2.929 3.265 3.902 1.001 15000 beta[2] 1.344 0.426 0.508 1.053 1.343 1.631 2.181 1.001 15000 beta[3] 2.675 1.256 0.182 1.845 2.669 3.509 5.158 1.001 15000 beta0 3.671 0.134 3.413 3.580 3.671 3.760 3.932 1.001 12000 sigma 1.126 0.082 0.981 1.070 1.120 1.176 1.307 1.001 8000 deviance 305.828 3.280 301.532 303.441 305.137 307.457 314.045 1.001 15000 For each parameter, n.eff is a crude measure of effective sample size, and Rhat is the potential scale reduction factor (at convergence, Rhat=1). DIC info (using the rule, pD = var(deviance)/2) pD = 5.4 and DIC = 311.2 DIC is an estimate of expected predictive error (lower deviance is better).
# OR library(broom) tidyMCMC(as.mcmc(data.r2jags.mult), conf.int = TRUE, conf.method = "HPDinterval")
term estimate std.error conf.low conf.high 1 beta0 3.670651 0.13361307 3.4145748 3.933344 2 beta[1] 2.930546 0.49865289 1.9650810 3.893332 3 beta[2] 1.344297 0.42580446 0.5242358 2.194335 4 beta[3] 2.675152 1.25623900 0.1764777 5.139000 5 deviance 305.827579 3.27978099 301.0348960 312.277506 6 sigma 1.126052 0.08177241 0.9662253 1.287832
- at the average level of cx2 (=0), a one unit increase in cx1 is associated with a
2.9305463
change in y. That is, y increases at a rate of2.9305463
per unit increase in cx1 when standardized for cx2. - at the average level of cx1 (=0), a one unit increase in cx2 is associated with a
1.3442967
change in y. That is, y increases at a rate of1.3442967
per unit increase in cx2 when standardized for cx1. - the degree to which the rate of change in response associated with a one unit change in cx1 changes over the range of
cx2 (and vice versa) is
2.6751525
.
The 95% confidence intervals for the interaction partial slope does not overlap with 0 implying a significant interaction between cx1 and cx2. This suggests that the nature of the relationship between y and cx1 depends on the level of cx2 (and vice versa). The estimates of the effect of cx1 are only appropriate when cx2 = 0 etc.
While workers attempt to become comfortable with a new statistical framework, it is only natural that they like to evaluate and comprehend new structures and output alongside more familiar concepts. One way to facilitate this is via Bayesian p-values that are somewhat analogous to the frequentist p-values for investigating the hypothesis that a parameter is equal to zero.
## since values are less than zero mcmcpvalue(data.r2jags.mult$BUGSoutput$sims.matrix[, "beta[1]"])
[1] 0
mcmcpvalue(data.r2jags.mult$BUGSoutput$sims.matrix[, "beta[2]"])
[1] 0.001933333
mcmcpvalue(data.r2jags.mult$BUGSoutput$sims.matrix[, "beta[3]"])
[1] 0.0366
With a p-value of essentially 0, we would conclude that there is almost no evidence that the slope was likely to be equal to zero, suggesting there is a relationship.
Additive model (RSTAN)
print(data.rstan.add, pars = c("beta0", "beta", "sigma"))
Inference for Stan model: 4e0a54cd22440d9847d7705bdb1ff803. 3 chains, each with iter=4000; warmup=1000; thin=3; post-warmup draws per chain=1000, total post-warmup draws=3000. mean se_mean sd 2.5% 25% 50% 75% 97.5% n_eff Rhat beta0 3.83 0.00 0.12 3.59 3.75 3.83 3.91 4.06 2317 1 beta[1] 3.02 0.01 0.51 2.03 2.69 3.02 3.34 4.05 2747 1 beta[2] 1.38 0.01 0.43 0.52 1.10 1.39 1.68 2.22 2854 1 sigma 1.15 0.00 0.08 0.99 1.09 1.14 1.20 1.32 2806 1 Samples were drawn using NUTS(diag_e) at Thu Aug 17 15:32:58 2017. For each parameter, n_eff is a crude measure of effective sample size, and Rhat is the potential scale reduction factor on split chains (at convergence, Rhat=1).
# OR library(broom) tidyMCMC(data.rstan.add, conf.int = TRUE, conf.method = "HPDinterval", pars = c("beta0", "beta", "sigma"))
term estimate std.error conf.low conf.high 1 beta0 3.826110 0.11727211 3.5953559 4.058548 2 beta[1] 3.022736 0.50695556 2.0124349 4.027372 3 beta[2] 1.384733 0.43249804 0.5363925 2.228436 4 sigma 1.145670 0.08415868 0.9808821 1.301693
- when cx2 is held constant, a one unit increase in cx1 is associated with a
1.3847333
change in y. That is, y increases at a rate of1.3847333
per unit increase in cx1 when standardized for cx2. - when cx1 is held constant, a one unit increase in cx2 is associated with a
3.82611
change in y. That is, y increases at a rate of3.82611
per unit increase in cx2 when standardized for cx1.
While workers attempt to become comfortable with a new statistical framework, it is only natural that they like to evaluate and comprehend new structures and output alongside more familiar concepts. One way to facilitate this is via Bayesian p-values that are somewhat analogous to the frequentist p-values for investigating the hypothesis that a parameter is equal to zero.
## since values are less than zero mcmcpvalue(as.matrix(data.rstan.add)[, "beta[1]"])
[1] 0
mcmcpvalue(as.matrix(data.rstan.add)[, "beta[2]"])
[1] 0.0006666667
With a p-value of essentially 0, we would conclude that there is almost no evidence that the slope was likely to be equal to zero, suggesting there is a relationship.
Multiplicative model (RSTAN)
print(data.rstan.mult, pars = c("beta0", "beta", "sigma"))
Inference for Stan model: 4e0a54cd22440d9847d7705bdb1ff803. 3 chains, each with iter=4000; warmup=1000; thin=3; post-warmup draws per chain=1000, total post-warmup draws=3000. mean se_mean sd 2.5% 25% 50% 75% 97.5% n_eff Rhat beta0 3.67 0.00 0.14 3.40 3.58 3.67 3.76 3.94 2515 1 beta[1] 2.94 0.01 0.51 1.96 2.60 2.94 3.29 3.95 2830 1 beta[2] 1.34 0.01 0.43 0.51 1.05 1.33 1.62 2.19 2755 1 beta[3] 2.70 0.02 1.24 0.28 1.88 2.70 3.52 5.17 2701 1 sigma 1.13 0.00 0.08 0.98 1.07 1.12 1.18 1.31 3000 1 Samples were drawn using NUTS(diag_e) at Wed Aug 16 16:06:53 2017. For each parameter, n_eff is a crude measure of effective sample size, and Rhat is the potential scale reduction factor on split chains (at convergence, Rhat=1).
# OR library(broom) tidyMCMC(data.rstan.mult, conf.int = TRUE, conf.method = "HPDinterval", pars = c("beta0", "beta", "sigma"))
term estimate std.error conf.low conf.high 1 beta0 3.667709 0.13512216 3.3976772 3.923658 2 beta[1] 2.944087 0.50916320 1.9886013 3.974140 3 beta[2] 1.335580 0.43089574 0.4927849 2.175954 4 beta[3] 2.695545 1.23745247 0.2529844 5.066406 5 sigma 1.127368 0.08492246 0.9646372 1.292453
- at the average level of cx2 (=0), a one unit increase in cx1 is associated with a
1.3355795
change in y. That is, y increases at a rate of1.3355795
per unit increase in cx1 when standardized for cx2. - at the average level of cx1 (=0), a one unit increase in cx2 is associated with a
2.6955452
change in y. That is, y increases at a rate of2.6955452
per unit increase in cx2 when standardized for cx1. - the degree to which the rate of change in response associated with a one unit change in cx1 changes over the range of
cx2 (and vice versa) is
3.8218391
.
The 95% confidence intervals for the interaction partial slope does not overlap with 0 implying a significant interaction between cx1 and cx2. This suggests that the nature of the relationship between y and cx1 depends on the level of cx2 (and vice versa). The estimates of the effect of cx1 are only appropriate when cx2 = 0 etc.
While workers attempt to become comfortable with a new statistical framework, it is only natural that they like to evaluate and comprehend new structures and output alongside more familiar concepts. One way to facilitate this is via Bayesian p-values that are somewhat analogous to the frequentist p-values for investigating the hypothesis that a parameter is equal to zero.
## since values are less than zero mcmcpvalue(as.matrix(data.rstan.mult)[, "beta[1]"])
[1] 0
mcmcpvalue(as.matrix(data.rstan.mult)[, "beta[2]"])
[1] 0.001666667
mcmcpvalue(as.matrix(data.rstan.mult)[, "beta[3]"])
[1] 0.032
With a p-value of essentially 0, we would conclude that there is almost no evidence that the slope was likely to be equal to zero, suggesting there is a relationship.
An alternative way of quantifying the impact of an interaction is to compare models with and without
the interactions. In a Bayesian context, this can be achieved by comparing the leave-one-out
cross-validation statistics. Leave-one-out (LOO) cross-validation explores how well a series of models
can predict withheld values (Vehtari, Gelman, and Gabry, 2016b)
. The LOO Information Criterion (LOOIC) is analogous to the AIC except that
the LOOIC takes priors into consideration, does not assume that the posterior distribution is drawn from
a multivariate normal and integrates over parameter uncertainty so as to yield a distribution of looic
rather than just a point estimate. The LOOIC does however assume that all observations are equally
influential (it does not matter which observations are left out). This assumption can be examined via the
Pareta k estimate (values greater than 0.5 or more conservatively 0.75 are considered overly influential).
library(loo) (full = loo(extract_log_lik(data.rstan.mult)))
Computed from 3000 by 100 log-likelihood matrix Estimate SE elpd_loo -155.3 6.2 p_loo 4.4 0.7 looic 310.5 12.4 All Pareto k estimates are good (k < 0.5) See help('pareto-k-diagnostic') for details.
(reduced = loo(extract_log_lik(data.rstan.add)))
Computed from 3000 by 100 log-likelihood matrix Estimate SE elpd_loo -156.8 6.9 p_loo 3.8 0.7 looic 313.6 13.7 All Pareto k estimates are good (k < 0.5) See help('pareto-k-diagnostic') for details.
par(mfrow = 1:2, mar = c(5, 3.8, 1, 0) + 0.1, las = 3) plot(full, label_points = TRUE) plot(reduced, label_points = TRUE)
compare_models(full, reduced)
Error in discrete == discrete[1]: comparison of these types is not implemented
Additive model (RSTANARM)
summary(data.rstanarm.add)
Model Info: function: stan_glm family: gaussian [identity] formula: y ~ cx1 + cx2 algorithm: sampling priors: see help('prior_summary') sample: 2700 (posterior sample size) num obs: 100 Estimates: mean sd 2.5% 25% 50% 75% 97.5% (Intercept) 3.8 0.1 3.6 3.7 3.8 3.9 4.1 cx1 3.0 0.5 2.0 2.7 3.0 3.4 4.0 cx2 1.4 0.4 0.5 1.1 1.4 1.7 2.2 sigma 1.1 0.1 1.0 1.1 1.1 1.2 1.3 mean_PPD 3.8 0.2 3.5 3.7 3.8 3.9 4.1 log-posterior -165.0 1.5 -168.8 -165.8 -164.6 -163.9 -163.2 Diagnostics: mcse Rhat n_eff (Intercept) 0.0 1.0 1591 cx1 0.0 1.0 2700 cx2 0.0 1.0 2700 sigma 0.0 1.0 1641 mean_PPD 0.0 1.0 2101 log-posterior 0.0 1.0 1335 For each parameter, mcse is Monte Carlo standard error, n_eff is a crude measure of effective sample size, and Rhat is the potential scale reduction factor on split chains (at convergence Rhat=1).
# OR library(broom) tidyMCMC(data.rstanarm.add$stanfit, conf.int = TRUE, conf.method = "HPDinterval")
term estimate std.error conf.low conf.high 1 (Intercept) 3.825270 0.11429998 3.6128940 4.066603 2 cx1 3.021411 0.50685407 1.9964692 3.978835 3 cx2 1.381486 0.43201840 0.5227742 2.206361 4 sigma 1.148550 0.08466719 0.9874055 1.319120 5 mean_PPD 3.823862 0.15921259 3.5193260 4.137062 6 log-posterior -164.988433 1.48402493 -167.9667057 -162.954149
- when cx2 is held constant, a one unit increase in cx1 is associated with a
3.021411
change in y. That is, y increases at a rate of3.021411
per unit increase in cx1 when standardized for cx2. - when cx1 is held constant, a one unit increase in cx2 is associated with a
1.3814856
change in y. That is, y increases at a rate of1.3814856
per unit increase in cx2 when standardized for cx1.
While workers attempt to become comfortable with a new statistical framework, it is only natural that they like to evaluate and comprehend new structures and output alongside more familiar concepts. One way to facilitate this is via Bayesian p-values that are somewhat analogous to the frequentist p-values for investigating the hypothesis that a parameter is equal to zero.
## since values are less than zero mcmcpvalue(as.matrix(data.rstanarm.add)[, "cx1"])
[1] 0
mcmcpvalue(as.matrix(data.rstanarm.add)[, "cx2"])
[1] 0.001481481
With a p-value of essentially 0, we would conclude that there is almost no evidence that the slope was likely to be equal to zero, suggesting there is a relationship.
Multiplicative model (RSTANARM)
summary(data.rstanarm.mult)
Model Info: function: stan_glm family: gaussian [identity] formula: y ~ cx1 * cx2 algorithm: sampling priors: see help('prior_summary') sample: 2700 (posterior sample size) num obs: 100 Estimates: mean sd 2.5% 25% 50% 75% 97.5% (Intercept) 3.7 0.1 3.4 3.6 3.7 3.8 3.9 cx1 2.9 0.5 1.9 2.6 2.9 3.3 3.9 cx2 1.3 0.4 0.5 1.1 1.3 1.6 2.2 cx1:cx2 2.7 1.2 0.3 1.9 2.7 3.5 5.1 sigma 1.1 0.1 1.0 1.1 1.1 1.2 1.3 mean_PPD 3.8 0.2 3.5 3.7 3.8 3.9 4.1 log-posterior -164.1 1.7 -168.2 -164.9 -163.7 -162.8 -161.9 Diagnostics: mcse Rhat n_eff (Intercept) 0.0 1.0 1784 cx1 0.0 1.0 2661 cx2 0.0 1.0 2474 cx1:cx2 0.0 1.0 2700 sigma 0.0 1.0 1134 mean_PPD 0.0 1.0 2086 log-posterior 0.0 1.0 1374 For each parameter, mcse is Monte Carlo standard error, n_eff is a crude measure of effective sample size, and Rhat is the potential scale reduction factor on split chains (at convergence Rhat=1).
# OR library(broom) tidyMCMC(data.rstanarm.mult$stanfit, conf.int = TRUE, conf.method = "HPDinterval")
term estimate std.error conf.low conf.high 1 (Intercept) 3.668833 0.13766864 3.3846348 3.921644 2 cx1 2.933656 0.49365987 1.9342004 3.875018 3 cx2 1.335035 0.41932624 0.5170813 2.187972 4 cx1:cx2 2.703573 1.23876604 0.3719395 5.211188 5 sigma 1.128580 0.08523699 0.9605405 1.293866 6 mean_PPD 3.823219 0.15952737 3.5108288 4.130149 7 log-posterior -164.059918 1.68184862 -167.3759390 -161.738628
- at the average level of cx2 (=0), a one unit increase in cx1 is associated with a
2.9336564
change in y. That is, y increases at a rate of2.9336564
per unit increase in cx1 when standardized for cx2. - at the average level of cx1 (=0), a one unit increase in cx2 is associated with a
1.3350352
change in y. That is, y increases at a rate of1.3350352
per unit increase in cx2 when standardized for cx1. - the degree to which the rate of change in response associated with a one unit change in cx1 changes over the range of
cx2 (and vice versa) is
2.7035727
.
The 95% confidence intervals for the interaction partial slope does not overlap with 0 implying a significant interaction between cx1 and cx2. This suggests that the nature of the relationship between y and cx1 depends on the level of cx2 (and vice versa). The estimates of the effect of cx1 are only appropriate when cx2 = 0 etc.
While workers attempt to become comfortable with a new statistical framework, it is only natural that they like to evaluate and comprehend new structures and output alongside more familiar concepts. One way to facilitate this is via Bayesian p-values that are somewhat analogous to the frequentist p-values for investigating the hypothesis that a parameter is equal to zero.
## since values are less than zero mcmcpvalue(as.matrix(data.rstanarm.mult)[, "cx1"])
[1] 0
mcmcpvalue(as.matrix(data.rstanarm.mult)[, "cx2"])
[1] 0.002222222
mcmcpvalue(as.matrix(data.rstanarm.mult)[, "cx1:cx2"])
[1] 0.03185185
With a p-value of essentially 0, we would conclude that there is almost no evidence that the slope was likely to be equal to zero, suggesting there is a relationship.
With a p-value of essentially 0, we would conclude that there is almost no evidence that the slope was likely to be equal to zero, suggesting there is a relationship.
Alternatively we can generate posterior intervals for each parameter.
posterior_interval(data.rstanarm.mult, prob = 0.95)
2.5% 97.5% (Intercept) 3.3933953 3.940445 cx1 1.9391715 3.879278 cx2 0.5196027 2.194059 cx1:cx2 0.2548959 5.109150 sigma 0.9753235 1.314384
An alternative way of quantifying the impact of an interaction is to compare models with and without
the interactions. In a Bayesian context, this can be achieved by comparing the leave-one-out
cross-validation statistics. Leave-one-out (LOO) cross-validation explores how well a series of models
can predict withheld values (Vehtari, Gelman, and Gabry, 2016b)
. The LOO Information Criterion (LOOIC) is analogous to the AIC except that
the LOOIC takes priors into consideration, does not assume that the posterior distribution is drawn from
a multivariate normal and integrates over parameter uncertainty so as to yield a distribution of looic
rather than just a point estimate. The LOOIC does however assume that all observations are equally
influential (it does not matter which observations are left out). This assumption can be examined via the
Pareta k estimate (values greater than 0.5 or more conservatively 0.75 are considered overly influential).
(full = loo(data.rstanarm.mult))
Computed from 2700 by 100 log-likelihood matrix Estimate SE elpd_loo -155.3 6.2 p_loo 4.4 0.7 looic 310.5 12.4 All Pareto k estimates are good (k < 0.5) See help('pareto-k-diagnostic') for details.
(reduced = loo(data.rstanarm.add))
Computed from 2700 by 100 log-likelihood matrix Estimate SE elpd_loo -156.8 6.8 p_loo 3.8 0.7 looic 313.6 13.7 All Pareto k estimates are good (k < 0.5) See help('pareto-k-diagnostic') for details.
par(mfrow = 1:2, mar = c(5, 3.8, 1, 0) + 0.1, las = 3) plot(full, label_points = TRUE) plot(reduced, label_points = TRUE)
compare_models(full, reduced)
elpd_diff se -1.5 1.8
Additive model (BRMS)
summary(data.brms.add)
Family: gaussian(identity) Formula: y ~ cx1 + cx2 Data: data (Number of observations: 100) Samples: 3 chains, each with iter = 2000; warmup = 200; thin = 2; total post-warmup samples = 2700 ICs: LOO = Not computed; WAIC = Not computed Population-Level Effects: Estimate Est.Error l-95% CI u-95% CI Eff.Sample Rhat Intercept 3.82 0.12 3.59 4.04 2700 1 cx1 3.04 0.52 2.04 4.08 2427 1 cx2 1.38 0.44 0.48 2.24 2463 1 Family Specific Parameters: Estimate Est.Error l-95% CI u-95% CI Eff.Sample Rhat sigma 1.15 0.08 0.99 1.33 2405 1 Samples were drawn using sampling(NUTS). For each parameter, Eff.Sample is a crude measure of effective sample size, and Rhat is the potential scale reduction factor on split chains (at convergence, Rhat = 1).
# OR library(broom) tidyMCMC(data.brms.add$fit, conf.int = TRUE, conf.method = "HPDinterval")
term estimate std.error conf.low conf.high 1 b_Intercept 3.818513 0.11580124 3.6098266 4.062734 2 b_cx1 3.042654 0.51964636 2.0187823 4.042107 3 b_cx2 1.380825 0.44437714 0.5271996 2.258048 4 sigma 1.147234 0.08407413 0.9784836 1.302088
- when cx2 is held constant, a one unit increase in cx1 is associated with a
3.0426537
change in y. That is, y increases at a rate of3.0426537
per unit increase in cx1 when standardized for cx2. - when cx1 is held constant, a one unit increase in cx2 is associated with a
1.3808245
change in y. That is, y increases at a rate of1.3808245
per unit increase in cx2 when standardized for cx1.
While workers attempt to become comfortable with a new statistical framework, it is only natural that they like to evaluate and comprehend new structures and output alongside more familiar concepts. One way to facilitate this is via Bayesian p-values that are somewhat analogous to the frequentist p-values for investigating the hypothesis that a parameter is equal to zero.
## since values are less than zero mcmcpvalue(as.matrix(data.brms.add)[, "b_cx1"])
[1] 0
mcmcpvalue(as.matrix(data.brms.add)[, "b_cx2"])
[1] 0.002592593
With a p-value of essentially 0, we would conclude that there is almost no evidence that the slope was likely to be equal to zero, suggesting there is a relationship.
Multiplicative model (BRMS)
summary(data.brms.mult)
Family: gaussian(identity) Formula: y ~ cx1 * cx2 Data: data (Number of observations: 100) Samples: 3 chains, each with iter = 2000; warmup = 200; thin = 2; total post-warmup samples = 2700 ICs: LOO = Not computed; WAIC = Not computed Population-Level Effects: Estimate Est.Error l-95% CI u-95% CI Eff.Sample Rhat Intercept 3.67 0.13 3.42 3.94 2534 1 cx1 2.92 0.49 1.94 3.91 2700 1 cx2 1.35 0.43 0.54 2.19 2528 1 cx1:cx2 2.66 1.27 0.11 5.16 2411 1 Family Specific Parameters: Estimate Est.Error l-95% CI u-95% CI Eff.Sample Rhat sigma 1.12 0.08 0.98 1.3 2585 1 Samples were drawn using sampling(NUTS). For each parameter, Eff.Sample is a crude measure of effective sample size, and Rhat is the potential scale reduction factor on split chains (at convergence, Rhat = 1).
# OR library(broom) tidyMCMC(data.brms.mult$fit, conf.int = TRUE, conf.method = "HPDinterval")
term estimate std.error conf.low conf.high 1 b_Intercept 3.671568 0.13285306 3.43612342 3.943035 2 b_cx1 2.923696 0.49283728 1.89910831 3.857202 3 b_cx2 1.353976 0.42606897 0.59537098 2.231514 4 b_cx1:cx2 2.659722 1.26670106 0.04200689 5.067596 5 sigma 1.123292 0.08287744 0.97447809 1.290851
- at the average level of cx2 (=0), a one unit increase in cx1 is associated with a
2.923696
change in y. That is, y increases at a rate of2.923696
per unit increase in cx1 when standardized for cx2. - at the average level of cx1 (=0), a one unit increase in cx2 is associated with a
1.3539765
change in y. That is, y increases at a rate of1.3539765
per unit increase in cx2 when standardized for cx1. - the degree to which the rate of change in response associated with a one unit change in cx1 changes over the range of
cx2 (and vice versa) is
2.6597222
.
The 95% confidence intervals for the interaction partial slope does not overlap with 0 implying a significant interaction between cx1 and cx2. This suggests that the nature of the relationship between y and cx1 depends on the level of cx2 (and vice versa). The estimates of the effect of cx1 are only appropriate when cx2 = 0 etc.
While workers attempt to become comfortable with a new statistical framework, it is only natural that they like to evaluate and comprehend new structures and output alongside more familiar concepts. One way to facilitate this is via Bayesian p-values that are somewhat analogous to the frequentist p-values for investigating the hypothesis that a parameter is equal to zero.
## since values are less than zero mcmcpvalue(as.matrix(data.brms.mult)[, "b_cx1"])
[1] 0
mcmcpvalue(as.matrix(data.brms.mult)[, "b_cx2"])
[1] 0.0007407407
mcmcpvalue(as.matrix(data.brms.mult)[, "b_cx1:cx2"])
[1] 0.04
With a p-value of essentially 0, we would conclude that there is almost no evidence that the slope was likely to be equal to zero, suggesting there is a relationship.
With a p-value of essentially 0, we would conclude that there is almost no evidence that the slope was likely to be equal to zero, suggesting there is a relationship.
Alternatively we can generate posterior intervals for each parameter.
posterior_interval(as.matrix(data.brms.mult), prob = 0.95)
2.5% 97.5% b_Intercept 3.4249254 3.935567 b_cx1 1.9378137 3.907302 b_cx2 0.5370439 2.190452 b_cx1:cx2 0.1139071 5.162096 sigma 0.9772463 1.299092 lp__ -64.9204721 -58.831482
An alternative way of quantifying the impact of an interaction is to compare models with and without
the interactions. In a Bayesian context, this can be achieved by comparing the leave-one-out
cross-validation statistics. Leave-one-out (LOO) cross-validation explores how well a series of models
can predict withheld values (Vehtari, Gelman, and Gabry, 2016b)
. The LOO Information Criterion (LOOIC) is analogous to the AIC except that
the LOOIC takes priors into consideration, does not assume that the posterior distribution is drawn from
a multivariate normal and integrates over parameter uncertainty so as to yield a distribution of looic
rather than just a point estimate. The LOOIC does however assume that all observations are equally
influential (it does not matter which observations are left out). This assumption can be examined via the
Pareta k estimate (values greater than 0.5 or more conservatively 0.75 are considered overly influential).
(full = loo(data.brms.mult))
LOOIC SE 310.51 12.48
(reduced = loo(data.brms.add))
LOOIC SE 313.62 13.7
par(mfrow = 1:2, mar = c(5, 3.8, 1, 0) + 0.1, las = 3) plot(full, label_points = TRUE) plot(reduced, label_points = TRUE)
compare_models(full, reduced)
Error in discrete == discrete[1]: comparison of these types is not implemented
Graphical summaries
A nice graphic is often a great accompaniment to a statistical analysis. Although there are no fixed assumptions associated with graphing (in contrast to statistical analyses), we often want the graphical summaries to reflect the associated statistical analyses. After all, the sample is just one perspective on the population(s). What we are more interested in is being able to estimate and depict likely population parameters/trends.
Thus, whilst we could easily provide a plot displaying the raw data along with simple measures of location and spread, arguably, we should use estimates that reflect the fitted model. In this case, it would be appropriate to plot the credibility interval associated with each group.
Additive model (MCMCpack)
With appropriate use of model matrices and data wrangling, it is possible to produce a single prediction data set along with ggplot() syntax to produce a multi-panel figure.
mcmc = data.mcmcpack.add ## Calculate the fitted values newdata = rbind(data.frame(cx1 = seq(min(data$cx1, na.rm = TRUE), max(data$cx1, na.rm = TRUE), len = 100), cx2 = 0, Pred = 1), data.frame(cx1 = 0, cx2 = seq(min(data$cx2, na.rm = TRUE), max(data$cx2, na.rm = TRUE), len = 100), Pred = 2)) Xmat = model.matrix(~cx1 + cx2, newdata) coefs = mcmc[, 1:3] fit = coefs %*% t(Xmat) newdata = newdata %>% mutate(x1 = cx1 + mean.x1, x2 = cx2 + mean.x2) %>% cbind(tidyMCMC(fit, conf.int = TRUE, conf.method = "HPDinterval")) %>% mutate(x = dplyr:::recode(Pred, x1, x2)) ggplot(newdata, aes(y = estimate, x = x)) + geom_line() + geom_ribbon(aes(ymin = conf.low, ymax = conf.high), fill = "blue", alpha = 0.3) + scale_y_continuous("Y") + scale_x_continuous("X") + theme_classic() + facet_wrap(~Pred)
We cannot simply add the raw data to this figure. The reason for this is that the trends represent the effect of one predictor holding the other variable constant. Therefore, the observations we represent on the figure must likewise be standardized. We can achieve this by adding the partial residuals to the figure. Partial residuals are the fitted values plus the residuals.
## Calculate partial residuals fitted values fdata = rdata = rbind(data.frame(cx1 = data$cx1, cx2 = 0, Pred = 1), data.frame(cx1 = 0, cx2 = data$cx2, Pred = 2)) fMat = rMat = model.matrix(~cx1 + cx2, fdata) fit = as.vector(apply(coefs, 2, median) %*% t(fMat)) resid = as.vector(data$y - apply(coefs, 2, median) %*% t(rMat)) rdata = rdata %>% mutate(partial.resid = resid + fit) %>% mutate(x1 = cx1 + mean.x1, x2 = cx2 + mean.x2) %>% mutate(x = dplyr:::recode(Pred, x1, x2)) ggplot(newdata, aes(y = estimate, x = x)) + geom_point(data = rdata, aes(y = partial.resid), color = "gray") + geom_line() + geom_ribbon(aes(ymin = conf.low, ymax = conf.high), fill = "blue", alpha = 0.3) + scale_y_continuous("Y") + theme_classic() + facet_wrap(~Pred, strip.position = "bottom", labeller = label_bquote("x" * .(Pred))) + theme(axis.title.x = element_blank(), strip.background = element_blank(), strip.placement = "outside")
However, this method (whist partially elegant) does become overly opaque if we need more extensive axes labels since the x-axes labels are actually strip labels (which must largely be defined outside of the ggplot structure. The alternative is to simply produce each partial plot separately before arranging them together in the one figure.
mcmc = data.mcmcpack.add ## Calculate the fitted values newdata = data.frame(cx1 = seq(min(data$cx1, na.rm = TRUE), max(data$cx1, na.rm = TRUE), len = 100), cx2 = 0) Xmat = model.matrix(~cx1 + cx2, newdata) coefs = mcmc[, 1:3] fit = coefs %*% t(Xmat) newdata = newdata %>% mutate(x1 = cx1 + mean.x1, x2 = cx2 + mean.x2) %>% cbind(tidyMCMC(fit, conf.int = TRUE, conf.method = "HPDinterval")) ## Now the partial residuals fdata = rdata = data.frame(cx1 = data$cx1, cx2 = 0) fMat = rMat = model.matrix(~cx1 + cx2, fdata) fit = as.vector(apply(coefs, 2, median) %*% t(fMat)) resid = as.vector(data$y - apply(coefs, 2, median) %*% t(rMat)) rdata = rdata %>% mutate(partial.resid = resid + fit) %>% mutate(x1 = cx1 + mean.x1, x2 = cx2 + mean.x2) g1 = ggplot(newdata, aes(y = estimate, x = x1)) + geom_point(data = rdata, aes(y = partial.resid), color = "grey") + geom_line() + geom_ribbon(aes(ymin = conf.low, ymax = conf.high), fill = "blue", alpha = 0.3) + scale_y_continuous("Y") + scale_x_continuous("X1") + theme_classic() newdata = data.frame(cx2 = seq(min(data$cx2, na.rm = TRUE), max(data$cx2, na.rm = TRUE), len = 100), cx1 = 0) Xmat = model.matrix(~cx1 + cx2, newdata) coefs = mcmc[, 1:3] fit = coefs %*% t(Xmat) newdata = newdata %>% mutate(x1 = cx1 + mean.x1, x2 = cx2 + mean.x2) %>% cbind(tidyMCMC(fit, conf.int = TRUE, conf.method = "HPDinterval")) ## Now the partial residuals fdata = rdata = data.frame(cx1 = 0, cx2 = data$cx2) fMat = rMat = model.matrix(~cx1 + cx2, fdata) fit = as.vector(apply(coefs, 2, median) %*% t(fMat)) resid = as.vector(data$y - apply(coefs, 2, median) %*% t(rMat)) rdata = rdata %>% mutate(partial.resid = resid + fit) %>% mutate(x1 = cx1 + mean.x1, x2 = cx2 + mean.x2) g2 = ggplot(newdata, aes(y = estimate, x = x2)) + geom_point(data = rdata, aes(y = partial.resid), color = "grey") + geom_line() + geom_ribbon(aes(ymin = conf.low, ymax = conf.high), fill = "blue", alpha = 0.3) + scale_y_continuous("Y") + scale_x_continuous("X2") + theme_classic() grid.arrange(g1, g2, ncol = 2)
Multiplicative model (MCMCpack)
For the multiplicative model, we could elect to split the trends up so as to explore the effects of one predictor at several set levels of another predictor. In this example, we will explore the effects of x1 when x2 is equal to its mean in the original data as well as one and two standard deviations below and above this mean.
mcmc = data.mcmcpack.mult ## Calculate the fitted values newdata = expand.grid(cx1 = seq(min(data$cx1, na.rm = TRUE), max(data$cx1, na.rm = TRUE), len = 100), cx2 = mean(data$cx2) + sd(data$cx2) %*% -2:2) Xmat = model.matrix(~cx1 * cx2, newdata) coefs = mcmc[, 1:4] fit = coefs %*% t(Xmat) newdata = newdata %>% mutate(x1 = cx1 + mean.x1, x2 = cx2 + mean.x2) %>% cbind(tidyMCMC(fit, conf.int = TRUE, conf.method = "HPDinterval")) %>% mutate(x2 = factor(x2, labels = paste("X2:~", c(-2, -1, 0, 1, 2), "*sigma"))) ## Partial residuals fdata = rdata = expand.grid(cx1 = data$cx1, cx2 = mean(data$cx2) + sd(data$cx2) * -2:2) fMat = rMat = model.matrix(~cx1 * cx2, fdata) fit = as.vector(apply(coefs, 2, median) %*% t(fMat)) resid = as.vector(data$y - apply(coefs, 2, median) %*% t(rMat)) rdata = rdata %>% mutate(partial.resid = resid + fit) %>% mutate(x1 = cx1 + mean.x1, x2 = cx2 + mean.x2) ## Partition the partial residuals such that each x1 trend only includes ## x2 data that is within that range in the observed data findNearest = function(x, y) { ff = fields:::rdist(x, y) apply(ff, 1, function(x) which(x == min(x))) } fn = findNearest(x = data[, c("x1", "x2")], y = rdata[, c("x1", "x2")]) rdata = rdata[fn, ] %>% mutate(x2 = factor(x2, labels = paste("X2:~", c(-2, -1, 0, 1, 2), "*sigma"))) ggplot(newdata, aes(y = estimate, x = x1)) + geom_line() + geom_blank(aes(y = 9)) + geom_point(data = rdata, aes(y = partial.resid), color = "grey") + geom_ribbon(aes(ymin = conf.low, ymax = conf.high), fill = "blue", alpha = 0.3) + scale_y_continuous("Y") + scale_x_continuous("X1") + facet_wrap(~x2, labeller = label_parsed, nrow = 1, scales = "free_y") + theme_classic() + theme(strip.background = element_blank())
Alternatively, we could explore the interaction by plotting a two dimensional surface as a heat map. In the example below, relative confidence (upper - lower) is indicated via contours.
mcmc = data.mcmcpack.mult ## Calculate the fitted values newdata = expand.grid(cx1 = seq(min(data$cx1, na.rm = TRUE), max(data$cx1, na.rm = TRUE), len = 100), cx2 = seq(min(data$cx2, na.rm = TRUE), max(data$cx2, na.rm = TRUE), len = 100)) Xmat = model.matrix(~cx1 * cx2, newdata) coefs = mcmc[, 1:4] fit = coefs %*% t(Xmat) newdata = newdata %>% mutate(x1 = cx1 + mean.x1, x2 = cx2 + mean.x2) %>% cbind(tidyMCMC(fit, conf.int = TRUE, conf.method = "HPDinterval")) ## Partial residuals fdata = rdata = expand.grid(cx1 = data$cx1, cx2 = mean(data$cx2) + sd(data$cx2) * -2:2) fMat = rMat = model.matrix(~cx1 * cx2, fdata) fit = as.vector(apply(coefs, 2, median) %*% t(fMat)) resid = as.vector(data$y - apply(coefs, 2, median) %*% t(rMat)) rdata = rdata %>% mutate(partial.resid = resid + fit) %>% mutate(x1 = cx1 + mean.x1, x2 = cx2 + mean.x2) ggplot(newdata, aes(y = x1, x = x2)) + geom_tile(aes(fill = estimate)) + geom_contour(aes(z = conf.high - conf.low)) + scale_fill_gradientn("Y", colors = heat.colors(10)) + geom_point(data = data, aes(size = y)) + scale_y_continuous("X1") + scale_x_continuous("X2") + theme_classic()
Additive model (JAGS)
With appropriate use of model matrices and data wrangling, it is possible to produce a single prediction data set along with ggplot() syntax to produce a multi-panel figure.
mcmc = data.r2jags.add$BUGSoutput$sims.matrix ## Calculate the fitted values newdata = rbind(data.frame(cx1 = seq(min(data$cx1, na.rm = TRUE), max(data$cx1, na.rm = TRUE), len = 100), cx2 = 0, Pred = 1), data.frame(cx1 = 0, cx2 = seq(min(data$cx2, na.rm = TRUE), max(data$cx2, na.rm = TRUE), len = 100), Pred = 2)) Xmat = model.matrix(~cx1 + cx2, newdata) coefs = mcmc[, c("beta0", "beta[1]", "beta[2]")] fit = coefs %*% t(Xmat) newdata = newdata %>% mutate(x1 = cx1 + mean.x1, x2 = cx2 + mean.x2) %>% cbind(tidyMCMC(fit, conf.int = TRUE, conf.method = "HPDinterval")) %>% mutate(x = dplyr:::recode(Pred, x1, x2)) ggplot(newdata, aes(y = estimate, x = x)) + geom_line() + geom_ribbon(aes(ymin = conf.low, ymax = conf.high), fill = "blue", alpha = 0.3) + scale_y_continuous("Y") + scale_x_continuous("X") + theme_classic() + facet_wrap(~Pred)
We cannot simply add the raw data to this figure. The reason for this is that the trends represent the effect of one predictor holding the other variable constant. Therefore, the observations we represent on the figure must likewise be standardized. We can achieve this by adding the partial residuals to the figure. Partial residuals are the fitted values plus the residuals.
## Calculate partial residuals fitted values fdata = rdata = rbind(data.frame(cx1 = data$cx1, cx2 = 0, Pred = 1), data.frame(cx1 = 0, cx2 = data$cx2, Pred = 2)) fMat = rMat = model.matrix(~cx1 + cx2, fdata) fit = as.vector(apply(coefs, 2, median) %*% t(fMat)) resid = as.vector(data$y - apply(coefs, 2, median) %*% t(rMat)) rdata = rdata %>% mutate(partial.resid = resid + fit) %>% mutate(x1 = cx1 + mean.x1, x2 = cx2 + mean.x2) %>% mutate(x = dplyr:::recode(Pred, x1, x2)) ggplot(newdata, aes(y = estimate, x = x)) + geom_point(data = rdata, aes(y = partial.resid), color = "gray") + geom_line() + geom_ribbon(aes(ymin = conf.low, ymax = conf.high), fill = "blue", alpha = 0.3) + scale_y_continuous("Y") + theme_classic() + facet_wrap(~Pred, strip.position = "bottom", labeller = label_bquote("x" * .(Pred))) + theme(axis.title.x = element_blank(), strip.background = element_blank(), strip.placement = "outside")
However, this method (whist partially elegant) does become overly opaque if we need more extensive axes labels since the x-axes labels are actually strip labels (which must largely be defined outside of the ggplot structure. The alternative is to simply produce each partial plot separately before arranging them together in the one figure.
mcmc = data.r2jags.add$BUGSoutput$sims.matrix ## Calculate the fitted values newdata = data.frame(cx1 = seq(min(data$cx1, na.rm = TRUE), max(data$cx1, na.rm = TRUE), len = 100), cx2 = 0) Xmat = model.matrix(~cx1 + cx2, newdata) coefs = mcmc[, c("beta0", "beta[1]", "beta[2]")] fit = coefs %*% t(Xmat) newdata = newdata %>% mutate(x1 = cx1 + mean.x1, x2 = cx2 + mean.x2) %>% cbind(tidyMCMC(fit, conf.int = TRUE, conf.method = "HPDinterval")) ## Now the partial residuals fdata = rdata = data.frame(cx1 = data$cx1, cx2 = 0) fMat = rMat = model.matrix(~cx1 + cx2, fdata) fit = as.vector(apply(coefs, 2, median) %*% t(fMat)) resid = as.vector(data$y - apply(coefs, 2, median) %*% t(rMat)) rdata = rdata %>% mutate(partial.resid = resid + fit) %>% mutate(x1 = cx1 + mean.x1, x2 = cx2 + mean.x2) g1 = ggplot(newdata, aes(y = estimate, x = x1)) + geom_point(data = rdata, aes(y = partial.resid), color = "grey") + geom_line() + geom_ribbon(aes(ymin = conf.low, ymax = conf.high), fill = "blue", alpha = 0.3) + scale_y_continuous("Y") + scale_x_continuous("X1") + theme_classic() newdata = data.frame(cx2 = seq(min(data$cx2, na.rm = TRUE), max(data$cx2, na.rm = TRUE), len = 100), cx1 = 0) Xmat = model.matrix(~cx1 + cx2, newdata) coefs = mcmc[, c("beta0", "beta[1]", "beta[2]")] fit = coefs %*% t(Xmat) newdata = newdata %>% mutate(x1 = cx1 + mean.x1, x2 = cx2 + mean.x2) %>% cbind(tidyMCMC(fit, conf.int = TRUE, conf.method = "HPDinterval")) ## Now the partial residuals fdata = rdata = data.frame(cx1 = 0, cx2 = data$cx2) fMat = rMat = model.matrix(~cx1 + cx2, fdata) fit = as.vector(apply(coefs, 2, median) %*% t(fMat)) resid = as.vector(data$y - apply(coefs, 2, median) %*% t(rMat)) rdata = rdata %>% mutate(partial.resid = resid + fit) %>% mutate(x1 = cx1 + mean.x1, x2 = cx2 + mean.x2) g2 = ggplot(newdata, aes(y = estimate, x = x2)) + geom_point(data = rdata, aes(y = partial.resid), color = "grey") + geom_line() + geom_ribbon(aes(ymin = conf.low, ymax = conf.high), fill = "blue", alpha = 0.3) + scale_y_continuous("Y") + scale_x_continuous("X2") + theme_classic() grid.arrange(g1, g2, ncol = 2)
Multiplicative model (JAGS)
For the multiplicative model, we could elect to split the trends up so as to explore the effects of one predictor at several set levels of another predictor. In this example, we will explore the effects of x1 when x2 is equal to its mean in the original data as well as one and two standard deviations below and above this mean.
mcmc = data.r2jags.mult$BUGSoutput$sims.matrix ## Calculate the fitted values newdata = expand.grid(cx1 = seq(min(data$cx1, na.rm = TRUE), max(data$cx1, na.rm = TRUE), len = 100), cx2 = mean(data$cx2) + sd(data$cx2) %*% -2:2) Xmat = model.matrix(~cx1 * cx2, newdata) coefs = mcmc[, c("beta0", "beta[1]", "beta[2]", "beta[3]")] fit = coefs %*% t(Xmat) newdata = newdata %>% mutate(x1 = cx1 + mean.x1, x2 = cx2 + mean.x2) %>% cbind(tidyMCMC(fit, conf.int = TRUE, conf.method = "HPDinterval")) %>% mutate(x2 = factor(x2, labels = paste("X2:~", c(-2, -1, 0, 1, 2), "*sigma"))) ## Partial residuals fdata = rdata = expand.grid(cx1 = data$cx1, cx2 = mean(data$cx2) + sd(data$cx2) * -2:2) fMat = rMat = model.matrix(~cx1 * cx2, fdata) fit = as.vector(apply(coefs, 2, median) %*% t(fMat)) resid = as.vector(data$y - apply(coefs, 2, median) %*% t(rMat)) rdata = rdata %>% mutate(partial.resid = resid + fit) %>% mutate(x1 = cx1 + mean.x1, x2 = cx2 + mean.x2) ## Partition the partial residuals such that each x1 trend only includes ## x2 data that is within that range in the observed data findNearest = function(x, y) { ff = fields:::rdist(x, y) apply(ff, 1, function(x) which(x == min(x))) } fn = findNearest(x = data[, c("x1", "x2")], y = rdata[, c("x1", "x2")]) rdata = rdata[fn, ] %>% mutate(x2 = factor(x2, labels = paste("X2:~", c(-2, -1, 0, 1, 2), "*sigma"))) ggplot(newdata, aes(y = estimate, x = x1)) + geom_line() + geom_blank(aes(y = 9)) + geom_point(data = rdata, aes(y = partial.resid), color = "grey") + geom_ribbon(aes(ymin = conf.low, ymax = conf.high), fill = "blue", alpha = 0.3) + scale_y_continuous("Y") + scale_x_continuous("X1") + facet_wrap(~x2, labeller = label_parsed, nrow = 1, scales = "free_y") + theme_classic() + theme(strip.background = element_blank())
Alternatively, we could explore the interaction by plotting a two dimensional surface as a heat map. In the example below, relative confidence (upper - lower) is indicated via contours.
mcmc = data.r2jags.mult$BUGSoutput$sims.matrix ## Calculate the fitted values newdata = expand.grid(cx1 = seq(min(data$cx1, na.rm = TRUE), max(data$cx1, na.rm = TRUE), len = 100), cx2 = seq(min(data$cx2, na.rm = TRUE), max(data$cx2, na.rm = TRUE), len = 100)) Xmat = model.matrix(~cx1 * cx2, newdata) coefs = mcmc[, c("beta0", "beta[1]", "beta[2]", "beta[3]")] fit = coefs %*% t(Xmat) newdata = newdata %>% mutate(x1 = cx1 + mean.x1, x2 = cx2 + mean.x2) %>% cbind(tidyMCMC(fit, conf.int = TRUE, conf.method = "HPDinterval")) ## Partial residuals fdata = rdata = expand.grid(cx1 = data$cx1, cx2 = mean(data$cx2) + sd(data$cx2) * -2:2) fMat = rMat = model.matrix(~cx1 * cx2, fdata) fit = as.vector(apply(coefs, 2, median) %*% t(fMat)) resid = as.vector(data$y - apply(coefs, 2, median) %*% t(rMat)) rdata = rdata %>% mutate(partial.resid = resid + fit) %>% mutate(x1 = cx1 + mean.x1, x2 = cx2 + mean.x2) ggplot(newdata, aes(y = x1, x = x2)) + geom_tile(aes(fill = estimate)) + geom_contour(aes(z = conf.high - conf.low)) + scale_fill_gradientn("Y", colors = heat.colors(10)) + geom_point(data = data, aes(size = y)) + scale_y_continuous("X1") + scale_x_continuous("X2") + theme_classic()
Additive model (RSTAN)
With appropriate use of model matrices and data wrangling, it is possible to produce a single prediction data set along with ggplot() syntax to produce a multi-panel figure.
mcmc = as.matrix(data.rstan.add) ## Calculate the fitted values newdata = rbind(data.frame(cx1 = seq(min(data$cx1, na.rm = TRUE), max(data$cx1, na.rm = TRUE), len = 100), cx2 = 0, Pred = 1), data.frame(cx1 = 0, cx2 = seq(min(data$cx2, na.rm = TRUE), max(data$cx2, na.rm = TRUE), len = 100), Pred = 2)) Xmat = model.matrix(~cx1 + cx2, newdata) coefs = mcmc[, c("beta0", "beta[1]", "beta[2]")] fit = coefs %*% t(Xmat) newdata = newdata %>% mutate(x1 = cx1 + mean.x1, x2 = cx2 + mean.x2) %>% cbind(tidyMCMC(fit, conf.int = TRUE, conf.method = "HPDinterval")) %>% mutate(x = dplyr:::recode(Pred, x1, x2)) ggplot(newdata, aes(y = estimate, x = x)) + geom_line() + geom_ribbon(aes(ymin = conf.low, ymax = conf.high), fill = "blue", alpha = 0.3) + scale_y_continuous("Y") + scale_x_continuous("X") + theme_classic() + facet_wrap(~Pred)
We cannot simply add the raw data to this figure. The reason for this is that the trends represent the effect of one predictor holding the other variable constant. Therefore, the observations we represent on the figure must likewise be standardized. We can achieve this by adding the partial residuals to the figure. Partial residuals are the fitted values plus the residuals.
## Calculate partial residuals fitted values fdata = rdata = rbind(data.frame(cx1 = data$cx1, cx2 = 0, Pred = 1), data.frame(cx1 = 0, cx2 = data$cx2, Pred = 2)) fMat = rMat = model.matrix(~cx1 + cx2, fdata) fit = as.vector(apply(coefs, 2, median) %*% t(fMat)) resid = as.vector(data$y - apply(coefs, 2, median) %*% t(rMat)) rdata = rdata %>% mutate(partial.resid = resid + fit) %>% mutate(x1 = cx1 + mean.x1, x2 = cx2 + mean.x2) %>% mutate(x = dplyr:::recode(Pred, x1, x2)) ggplot(newdata, aes(y = estimate, x = x)) + geom_point(data = rdata, aes(y = partial.resid), color = "gray") + geom_line() + geom_ribbon(aes(ymin = conf.low, ymax = conf.high), fill = "blue", alpha = 0.3) + scale_y_continuous("Y") + theme_classic() + facet_wrap(~Pred, strip.position = "bottom", labeller = label_bquote("x" * .(Pred))) + theme(axis.title.x = element_blank(), strip.background = element_blank(), strip.placement = "outside")
However, this method (whist partially elegant) does become overly opaque if we need more extensive axes labels since the x-axes labels are actually strip labels (which must largely be defined outside of the ggplot structure. The alternative is to simply produce each partial plot separately before arranging them together in the one figure.
mcmc = as.matrix(data.rstan.add) ## Calculate the fitted values newdata = data.frame(cx1 = seq(min(data$cx1, na.rm = TRUE), max(data$cx1, na.rm = TRUE), len = 100), cx2 = 0) Xmat = model.matrix(~cx1 + cx2, newdata) coefs = mcmc[, c("beta0", "beta[1]", "beta[2]")] fit = coefs %*% t(Xmat) newdata = newdata %>% mutate(x1 = cx1 + mean.x1, x2 = cx2 + mean.x2) %>% cbind(tidyMCMC(fit, conf.int = TRUE, conf.method = "HPDinterval")) ## Now the partial residuals fdata = rdata = data.frame(cx1 = data$cx1, cx2 = 0) fMat = rMat = model.matrix(~cx1 + cx2, fdata) fit = as.vector(apply(coefs, 2, median) %*% t(fMat)) resid = as.vector(data$y - apply(coefs, 2, median) %*% t(rMat)) rdata = rdata %>% mutate(partial.resid = resid + fit) %>% mutate(x1 = cx1 + mean.x1, x2 = cx2 + mean.x2) g1 = ggplot(newdata, aes(y = estimate, x = x1)) + geom_point(data = rdata, aes(y = partial.resid), color = "grey") + geom_line() + geom_ribbon(aes(ymin = conf.low, ymax = conf.high), fill = "blue", alpha = 0.3) + scale_y_continuous("Y") + scale_x_continuous("X1") + theme_classic() newdata = data.frame(cx2 = seq(min(data$cx2, na.rm = TRUE), max(data$cx2, na.rm = TRUE), len = 100), cx1 = 0) Xmat = model.matrix(~cx1 + cx2, newdata) coefs = mcmc[, c("beta0", "beta[1]", "beta[2]")] fit = coefs %*% t(Xmat) newdata = newdata %>% mutate(x1 = cx1 + mean.x1, x2 = cx2 + mean.x2) %>% cbind(tidyMCMC(fit, conf.int = TRUE, conf.method = "HPDinterval")) ## Now the partial residuals fdata = rdata = data.frame(cx1 = 0, cx2 = data$cx2) fMat = rMat = model.matrix(~cx1 + cx2, fdata) fit = as.vector(apply(coefs, 2, median) %*% t(fMat)) resid = as.vector(data$y - apply(coefs, 2, median) %*% t(rMat)) rdata = rdata %>% mutate(partial.resid = resid + fit) %>% mutate(x1 = cx1 + mean.x1, x2 = cx2 + mean.x2) g2 = ggplot(newdata, aes(y = estimate, x = x2)) + geom_point(data = rdata, aes(y = partial.resid), color = "grey") + geom_line() + geom_ribbon(aes(ymin = conf.low, ymax = conf.high), fill = "blue", alpha = 0.3) + scale_y_continuous("Y") + scale_x_continuous("X2") + theme_classic() grid.arrange(g1, g2, ncol = 2)
Multiplicative model (RSTAN)
For the multiplicative model, we could elect to split the trends up so as to explore the effects of one predictor at several set levels of another predictor. In this example, we will explore the effects of x1 when x2 is equal to its mean in the original data as well as one and two standard deviations below and above this mean.
mcmc = as.matrix(data.rstan.mult) ## Calculate the fitted values newdata = expand.grid(cx1 = seq(min(data$cx1, na.rm = TRUE), max(data$cx1, na.rm = TRUE), len = 100), cx2 = mean(data$cx2) + sd(data$cx2) %*% -2:2) Xmat = model.matrix(~cx1 * cx2, newdata) coefs = mcmc[, c("beta0", "beta[1]", "beta[2]", "beta[3]")] fit = coefs %*% t(Xmat) newdata = newdata %>% mutate(x1 = cx1 + mean.x1, x2 = cx2 + mean.x2) %>% cbind(tidyMCMC(fit, conf.int = TRUE, conf.method = "HPDinterval")) %>% mutate(x2 = factor(x2, labels = paste("X2:~", c(-2, -1, 0, 1, 2), "*sigma"))) ## Partial residuals fdata = rdata = expand.grid(cx1 = data$cx1, cx2 = mean(data$cx2) + sd(data$cx2) * -2:2) fMat = rMat = model.matrix(~cx1 * cx2, fdata) fit = as.vector(apply(coefs, 2, median) %*% t(fMat)) resid = as.vector(data$y - apply(coefs, 2, median) %*% t(rMat)) rdata = rdata %>% mutate(partial.resid = resid + fit) %>% mutate(x1 = cx1 + mean.x1, x2 = cx2 + mean.x2) ## Partition the partial residuals such that each x1 trend only includes ## x2 data that is within that range in the observed data findNearest = function(x, y) { ff = fields:::rdist(x, y) apply(ff, 1, function(x) which(x == min(x))) } fn = findNearest(x = data[, c("x1", "x2")], y = rdata[, c("x1", "x2")]) rdata = rdata[fn, ] %>% mutate(x2 = factor(x2, labels = paste("X2:~", c(-2, -1, 0, 1, 2), "*sigma"))) ggplot(newdata, aes(y = estimate, x = x1)) + geom_line() + geom_blank(aes(y = 9)) + geom_point(data = rdata, aes(y = partial.resid), color = "grey") + geom_ribbon(aes(ymin = conf.low, ymax = conf.high), fill = "blue", alpha = 0.3) + scale_y_continuous("Y") + scale_x_continuous("X1") + facet_wrap(~x2, labeller = label_parsed, nrow = 1, scales = "free_y") + theme_classic() + theme(strip.background = element_blank())
Alternatively, we could explore the interaction by plotting a two dimensional surface as a heat map. In the example below, relative confidence (upper - lower) is indicated via contours.
mcmc = as.matrix(data.rstan.mult) ## Calculate the fitted values newdata = expand.grid(cx1 = seq(min(data$cx1, na.rm = TRUE), max(data$cx1, na.rm = TRUE), len = 100), cx2 = seq(min(data$cx2, na.rm = TRUE), max(data$cx2, na.rm = TRUE), len = 100)) Xmat = model.matrix(~cx1 * cx2, newdata) coefs = mcmc[, c("beta0", "beta[1]", "beta[2]", "beta[3]")] fit = coefs %*% t(Xmat) newdata = newdata %>% mutate(x1 = cx1 + mean.x1, x2 = cx2 + mean.x2) %>% cbind(tidyMCMC(fit, conf.int = TRUE, conf.method = "HPDinterval")) ## Partial residuals fdata = rdata = expand.grid(cx1 = data$cx1, cx2 = mean(data$cx2) + sd(data$cx2) * -2:2) fMat = rMat = model.matrix(~cx1 * cx2, fdata) fit = as.vector(apply(coefs, 2, median) %*% t(fMat)) resid = as.vector(data$y - apply(coefs, 2, median) %*% t(rMat)) rdata = rdata %>% mutate(partial.resid = resid + fit) %>% mutate(x1 = cx1 + mean.x1, x2 = cx2 + mean.x2) ggplot(newdata, aes(y = x1, x = x2)) + geom_tile(aes(fill = estimate)) + geom_contour(aes(z = conf.high - conf.low)) + scale_fill_gradientn("Y", colors = heat.colors(10)) + geom_point(data = data, aes(size = y)) + scale_y_continuous("X1") + scale_x_continuous("X2") + theme_classic()
Additive model (RSTANARM)
Although we could calculated the fitted values via matrix multiplication of the coefficients and the model matrix (as for MCMCpack, RJAGS and RSTAN), for more complex models, it is more convenient to use the posterior_linpred function that comes with rstantools.
## Calculate the fitted values newdata = rbind(data.frame(cx1 = seq(min(data$cx1, na.rm = TRUE), max(data$cx1, na.rm = TRUE), len = 100), cx2 = 0, Pred = 1), data.frame(cx1 = 0, cx2 = seq(min(data$cx2, na.rm = TRUE), max(data$cx2, na.rm = TRUE), len = 100), Pred = 2)) fit = posterior_linpred(data.rstanarm.add, newdata = newdata) newdata = newdata %>% mutate(x1 = cx1 + mean.x1, x2 = cx2 + mean.x2) %>% cbind(tidyMCMC(as.mcmc(fit), conf.int = TRUE, conf.method = "HPDinterval")) %>% mutate(x = dplyr:::recode(Pred, x1, x2)) ## Partial residual rdata = rbind(data.frame(cx1 = data$cx1, cx2 = 0, Pred = 1), data.frame(cx1 = 0, cx2 = data$cx2, Pred = 2)) pp = posterior_linpred(data.rstanarm.add, newdata = rdata) fit = as.vector(apply(pp, 2, median)) resid = resid(data.rstanarm.add) rdata = rdata %>% mutate(partial.resid = resid + fit) %>% mutate(x1 = cx1 + mean.x1, x2 = cx2 + mean.x2) %>% mutate(x = dplyr:::recode(Pred, x1, x2)) ggplot(newdata, aes(y = estimate, x = x)) + geom_point(data = rdata, aes(y = partial.resid), color = "gray") + geom_line() + geom_ribbon(aes(ymin = conf.low, ymax = conf.high), fill = "blue", alpha = 0.3) + scale_y_continuous("Y") + theme_classic() + facet_wrap(~Pred, strip.position = "bottom", labeller = label_bquote("x" * .(Pred)), sclaes = "free") + theme(axis.title.x = element_blank(), strip.background = element_blank(), strip.placement = "outside")
Error in facet_wrap(~Pred, strip.position = "bottom", labeller = label_bquote("x" * : unused argument (sclaes = "free")
A more general solution would be to add the partial residuals to the figure. Partial residuals are the fitted values plus the residuals. In this simple case, that equates to exactly the same as the raw observations since $$resid = obs - fitted$$ and the fitted values depend only on the single predictor we are interested in.
mcmc = as.matrix(data.rstanarm.add) newdata = rbind(data.frame(cx1 = seq(min(data$cx1, na.rm = TRUE), max(data$cx1, na.rm = TRUE), len = 100), cx2 = 0, Pred = 1), data.frame(cx1 = 0, cx2 = seq(min(data$cx2, na.rm = TRUE), max(data$cx2, na.rm = TRUE), len = 100), Pred = 2)) Xmat = model.matrix(~cx1 + cx2, data = newdata) coefs = mcmc[, c("(Intercept)", "cx1", "cx2")] fit = coefs %*% t(Xmat) newdata = newdata %>% mutate(x1 = cx1 + mean.x1, x2 = cx2 + mean.x2) %>% cbind(tidyMCMC(as.mcmc(fit), conf.int = TRUE, conf.method = "HPDinterval")) %>% mutate(x = dplyr:::recode(Pred, x1, x2)) ## Calculate partial residuals fitted values fdata = rdata = rbind(data.frame(cx1 = data$cx1, cx2 = 0, Pred = 1), data.frame(cx1 = 0, cx2 = data$cx2, Pred = 2)) fMat = rMat = model.matrix(~cx1 + cx2, fdata) fit = as.vector(apply(coefs, 2, median) %*% t(fMat)) resid = as.vector(data$y - apply(coefs, 2, median) %*% t(rMat)) rdata = rdata %>% mutate(partial.resid = resid + fit) %>% mutate(x1 = cx1 + mean.x1, x2 = cx2 + mean.x2) %>% mutate(x = dplyr:::recode(Pred, x1, x2)) ggplot(newdata, aes(y = estimate, x = x)) + geom_point(data = rdata, aes(y = partial.resid), color = "gray") + geom_line() + geom_ribbon(aes(ymin = conf.low, ymax = conf.high), fill = "blue", alpha = 0.3) + scale_y_continuous("Y") + theme_classic() + facet_wrap(~Pred, strip.position = "bottom", labeller = label_bquote("x" * .(Pred)), scales = "free") + theme(axis.title.x = element_blank(), strip.background = element_blank(), strip.placement = "outside")
However, this method (whist partially elegant) does become overly opaque if we need more extensive axes labels since the x-axes labels are actually strip labels (which must largely be defined outside of the ggplot structure. The alternative is to simply produce each partial plot separately before arranging them together in the one figure.
mcmc = as.matrix(data.rstanarm.add) coefs = mcmc[, c("(Intercept)", "cx1", "cx2")] ## Calculate the fitted values newdata = data.frame(cx1 = seq(min(data$cx1, na.rm = TRUE), max(data$cx1, na.rm = TRUE), len = 100), cx2 = 0) Xmat = model.matrix(~cx1 + cx2, newdata) fit = coefs %*% t(Xmat) newdata = newdata %>% mutate(x1 = cx1 + mean.x1, x2 = cx2 + mean.x2) %>% cbind(tidyMCMC(fit, conf.int = TRUE, conf.method = "HPDinterval")) ## Now the partial residuals fdata = rdata = data.frame(cx1 = data$cx1, cx2 = 0) fMat = rMat = model.matrix(~cx1 + cx2, fdata) fit = as.vector(apply(coefs, 2, median) %*% t(fMat)) resid = as.vector(data$y - apply(coefs, 2, median) %*% t(rMat)) rdata = rdata %>% mutate(partial.resid = resid + fit) %>% mutate(x1 = cx1 + mean.x1, x2 = cx2 + mean.x2) g1 = ggplot(newdata, aes(y = estimate, x = x1)) + geom_point(data = rdata, aes(y = partial.resid), color = "grey") + geom_line() + geom_ribbon(aes(ymin = conf.low, ymax = conf.high), fill = "blue", alpha = 0.3) + scale_y_continuous("Y") + scale_x_continuous("X1") + theme_classic() newdata = data.frame(cx2 = seq(min(data$cx2, na.rm = TRUE), max(data$cx2, na.rm = TRUE), len = 100), cx1 = 0) Xmat = model.matrix(~cx1 + cx2, newdata) fit = coefs %*% t(Xmat) newdata = newdata %>% mutate(x1 = cx1 + mean.x1, x2 = cx2 + mean.x2) %>% cbind(tidyMCMC(fit, conf.int = TRUE, conf.method = "HPDinterval")) ## Now the partial residuals fdata = rdata = data.frame(cx1 = 0, cx2 = data$cx2) fMat = rMat = model.matrix(~cx1 + cx2, fdata) fit = as.vector(apply(coefs, 2, median) %*% t(fMat)) resid = as.vector(data$y - apply(coefs, 2, median) %*% t(rMat)) rdata = rdata %>% mutate(partial.resid = resid + fit) %>% mutate(x1 = cx1 + mean.x1, x2 = cx2 + mean.x2) g2 = ggplot(newdata, aes(y = estimate, x = x2)) + geom_point(data = rdata, aes(y = partial.resid), color = "grey") + geom_line() + geom_ribbon(aes(ymin = conf.low, ymax = conf.high), fill = "blue", alpha = 0.3) + scale_y_continuous("Y") + scale_x_continuous("X2") + theme_classic() grid.arrange(g1, g2, ncol = 2)
Multiplicative model (RSTANARM)
For the multiplicative model, we could elect to split the trends up so as to explore the effects of one predictor at several set levels of another predictor. In this example, we will explore the effects of x1 when x2 is equal to its mean in the original data as well as one and two standard deviations below and above this mean.
mcmc = as.matrix(data.rstanarm.mult) ## Calculate the fitted values newdata = expand.grid(cx1 = seq(min(data$cx1, na.rm = TRUE), max(data$cx1, na.rm = TRUE), len = 100), cx2 = mean(data$cx2) + sd(data$cx2) %*% -2:2) Xmat = model.matrix(~cx1 * cx2, newdata) coefs = mcmc[, c("(Intercept)", "cx1", "cx2", "cx1:cx2")] fit = coefs %*% t(Xmat) newdata = newdata %>% mutate(x1 = cx1 + mean.x1, x2 = cx2 + mean.x2) %>% cbind(tidyMCMC(fit, conf.int = TRUE, conf.method = "HPDinterval")) %>% mutate(x2 = factor(x2, labels = paste("X2:~", c(-2, -1, 0, 1, 2), "*sigma"))) ## Partial residuals fdata = rdata = expand.grid(cx1 = data$cx1, cx2 = mean(data$cx2) + sd(data$cx2) * -2:2) fMat = rMat = model.matrix(~cx1 * cx2, fdata) fit = as.vector(apply(coefs, 2, median) %*% t(fMat)) resid = as.vector(data$y - apply(coefs, 2, median) %*% t(rMat)) rdata = rdata %>% mutate(partial.resid = resid + fit) %>% mutate(x1 = cx1 + mean.x1, x2 = cx2 + mean.x2) ## Partition the partial residuals such that each x1 trend only includes ## x2 data that is within that range in the observed data findNearest = function(x, y) { ff = fields:::rdist(x, y) apply(ff, 1, function(x) which(x == min(x))) } fn = findNearest(x = data[, c("x1", "x2")], y = rdata[, c("x1", "x2")]) rdata = rdata[fn, ] %>% mutate(x2 = factor(x2, labels = paste("X2:~", c(-2, -1, 0, 1, 2), "*sigma"))) ggplot(newdata, aes(y = estimate, x = x1)) + geom_line() + geom_blank(aes(y = 9)) + geom_point(data = rdata, aes(y = partial.resid), color = "grey") + geom_ribbon(aes(ymin = conf.low, ymax = conf.high), fill = "blue", alpha = 0.3) + scale_y_continuous("Y") + scale_x_continuous("X1") + facet_wrap(~x2, labeller = label_parsed, nrow = 1, scales = "free_y") + theme_classic() + theme(strip.background = element_blank())
Alternatively, we could explore the interaction by plotting a two dimensional surface as a heat map. In the example below, relative confidence (upper - lower) is indicated via contours.
mcmc = as.matrix(data.rstanarm.mult) ## Calculate the fitted values newdata = expand.grid(cx1 = seq(min(data$cx1, na.rm = TRUE), max(data$cx1, na.rm = TRUE), len = 100), cx2 = seq(min(data$cx2, na.rm = TRUE), max(data$cx2, na.rm = TRUE), len = 100)) Xmat = model.matrix(~cx1 * cx2, newdata) coefs = mcmc[, c("(Intercept)", "cx1", "cx2", "cx1:cx2")] fit = coefs %*% t(Xmat) newdata = newdata %>% mutate(x1 = cx1 + mean.x1, x2 = cx2 + mean.x2) %>% cbind(tidyMCMC(fit, conf.int = TRUE, conf.method = "HPDinterval")) ## Partial residuals fdata = rdata = expand.grid(cx1 = data$cx1, cx2 = mean(data$cx2) + sd(data$cx2) * -2:2) fMat = rMat = model.matrix(~cx1 * cx2, fdata) fit = as.vector(apply(coefs, 2, median) %*% t(fMat)) resid = as.vector(data$y - apply(coefs, 2, median) %*% t(rMat)) rdata = rdata %>% mutate(partial.resid = resid + fit) %>% mutate(x1 = cx1 + mean.x1, x2 = cx2 + mean.x2) ggplot(newdata, aes(y = x1, x = x2)) + geom_tile(aes(fill = estimate)) + geom_contour(aes(z = conf.high - conf.low)) + scale_fill_gradientn("Y", colors = heat.colors(10)) + geom_point(data = data, aes(size = y)) + scale_y_continuous("X1") + scale_x_continuous("X2") + theme_classic()
Additive model (BRMS)
Although we could calculated the fitted values via matrix multiplication of the coefficients and the model matrix (as for MCMCpack, RJAGS and RSTAN), for more complex models, it is more convenient to use the marginal_effects function that comes with brms.
plot(marginal_effects(data.brms.add), points = TRUE)
# OR eff = plot(marginal_effects(data.brms.add), points = TRUE, plot = FALSE) do.call("grid.arrange", c(eff, nrow = 1))
This is a great way of producing a quick plot. However, notice that the x axes are still on the scale of the centered predictors. A more general solution would be to add the partial residuals to the figure. Partial residuals are the fitted values plus the residuals. In this simple case, that equates to exactly the same as the raw observations since $$resid = obs - fitted$$ and the fitted values depend only on the single predictor we are interested in.
mcmc = as.matrix(data.brms.add) newdata = rbind(data.frame(cx1 = seq(min(data$cx1, na.rm = TRUE), max(data$cx1, na.rm = TRUE), len = 100), cx2 = 0, Pred = 1), data.frame(cx1 = 0, cx2 = seq(min(data$cx2, na.rm = TRUE), max(data$cx2, na.rm = TRUE), len = 100), Pred = 2)) Xmat = model.matrix(~cx1 + cx2, data = newdata) coefs = mcmc[, c("b_Intercept", "b_cx1", "b_cx2")] fit = coefs %*% t(Xmat) newdata = newdata %>% mutate(x1 = cx1 + mean.x1, x2 = cx2 + mean.x2) %>% cbind(tidyMCMC(as.mcmc(fit), conf.int = TRUE, conf.method = "HPDinterval")) %>% mutate(x = dplyr:::recode(Pred, x1, x2)) ## Calculate partial residuals fitted values fdata = rdata = rbind(data.frame(cx1 = data$cx1, cx2 = 0, Pred = 1), data.frame(cx1 = 0, cx2 = data$cx2, Pred = 2)) fMat = rMat = model.matrix(~cx1 + cx2, fdata) fit = as.vector(apply(coefs, 2, median) %*% t(fMat)) resid = as.vector(data$y - apply(coefs, 2, median) %*% t(rMat)) rdata = rdata %>% mutate(partial.resid = resid + fit) %>% mutate(x1 = cx1 + mean.x1, x2 = cx2 + mean.x2) %>% mutate(x = dplyr:::recode(Pred, x1, x2)) ggplot(newdata, aes(y = estimate, x = x)) + geom_point(data = rdata, aes(y = partial.resid), color = "gray") + geom_line() + geom_ribbon(aes(ymin = conf.low, ymax = conf.high), fill = "blue", alpha = 0.3) + scale_y_continuous("Y") + theme_classic() + facet_wrap(~Pred, strip.position = "bottom", labeller = label_bquote("x" * .(Pred)), scales = "free") + theme(axis.title.x = element_blank(), strip.background = element_blank(), strip.placement = "outside")
However, this method (whist partially elegant) does become overly opaque if we need more extensive axes labels since the x-axes labels are actually strip labels (which must largely be defined outside of the ggplot structure. The alternative is to simply produce each partial plot separately before arranging them together in the one figure.
mcmc = as.matrix(data.brms.add) coefs = mcmc[, c("b_Intercept", "b_cx1", "b_cx2")] ## Calculate the fitted values newdata = data.frame(cx1 = seq(min(data$cx1, na.rm = TRUE), max(data$cx1, na.rm = TRUE), len = 100), cx2 = 0) Xmat = model.matrix(~cx1 + cx2, newdata) fit = coefs %*% t(Xmat) newdata = newdata %>% mutate(x1 = cx1 + mean.x1, x2 = cx2 + mean.x2) %>% cbind(tidyMCMC(fit, conf.int = TRUE, conf.method = "HPDinterval")) ## Now the partial residuals fdata = rdata = data.frame(cx1 = data$cx1, cx2 = 0) fMat = rMat = model.matrix(~cx1 + cx2, fdata) fit = as.vector(apply(coefs, 2, median) %*% t(fMat)) resid = as.vector(data$y - apply(coefs, 2, median) %*% t(rMat)) rdata = rdata %>% mutate(partial.resid = resid + fit) %>% mutate(x1 = cx1 + mean.x1, x2 = cx2 + mean.x2) g1 = ggplot(newdata, aes(y = estimate, x = x1)) + geom_point(data = rdata, aes(y = partial.resid), color = "grey") + geom_line() + geom_ribbon(aes(ymin = conf.low, ymax = conf.high), fill = "blue", alpha = 0.3) + scale_y_continuous("Y") + scale_x_continuous("X1") + theme_classic() newdata = data.frame(cx2 = seq(min(data$cx2, na.rm = TRUE), max(data$cx2, na.rm = TRUE), len = 100), cx1 = 0) Xmat = model.matrix(~cx1 + cx2, newdata) fit = coefs %*% t(Xmat) newdata = newdata %>% mutate(x1 = cx1 + mean.x1, x2 = cx2 + mean.x2) %>% cbind(tidyMCMC(fit, conf.int = TRUE, conf.method = "HPDinterval")) ## Now the partial residuals fdata = rdata = data.frame(cx1 = 0, cx2 = data$cx2) fMat = rMat = model.matrix(~cx1 + cx2, fdata) fit = as.vector(apply(coefs, 2, median) %*% t(fMat)) resid = as.vector(data$y - apply(coefs, 2, median) %*% t(rMat)) rdata = rdata %>% mutate(partial.resid = resid + fit) %>% mutate(x1 = cx1 + mean.x1, x2 = cx2 + mean.x2) g2 = ggplot(newdata, aes(y = estimate, x = x2)) + geom_point(data = rdata, aes(y = partial.resid), color = "grey") + geom_line() + geom_ribbon(aes(ymin = conf.low, ymax = conf.high), fill = "blue", alpha = 0.3) + scale_y_continuous("Y") + scale_x_continuous("X2") + theme_classic() grid.arrange(g1, g2, ncol = 2)
Multiplicative model (BRMS)
For the multiplicative model, we could elect to split the trends up so as to explore the effects of one predictor at several set levels of another predictor. In this example, we will explore the effects of x1 when x2 is equal to its mean in the original data as well as one and two standard deviations below and above this mean.
We will first explore the simple built in marginal_effects() function.
plot(marginal_effects(data.brms.mult, effects = "cx1:cx2"), points = TRUE)
# OR Define a function that will calculate mean plus or minus # 2 and 1 standard deviations msd2 = function(x) { means = mean(x, na.rm = TRUE) sd = sd(x, na.rm = TRUE) means + (-2:2) * sd } plot(marginal_effects(data.brms.mult, effects = "cx1:cx2", int_conditions = list(cx2 = msd2)), points = TRUE)
# OR we could arrange the effect of cx1 separately for # different values of cx2 (mean plus or minus 1 and 2 # standard deviations) cond = data.frame(cx2 = msd2(data$cx2), row.names = paste0("cx2: mean ", -2:2, "*sd")) plot(marginal_effects(data.brms.mult, effects = "cx1", conditions = cond, select_points = 0.1), points = TRUE)
## Yet another way would be as a 2D surface plot(marginal_effects(data.brms.mult, effects = "cx1:cx2", surface = TRUE), points = TRUE, stype = "raster")
mcmc = as.matrix(data.brms.mult) ## Calculate the fitted values newdata = expand.grid(cx1 = seq(min(data$cx1, na.rm = TRUE), max(data$cx1, na.rm = TRUE), len = 100), cx2 = mean(data$cx2) + sd(data$cx2) %*% -2:2) Xmat = model.matrix(~cx1 * cx2, newdata) coefs = mcmc[, c("b_Intercept", "b_cx1", "b_cx2", "b_cx1:cx2")] fit = coefs %*% t(Xmat) newdata = newdata %>% mutate(x1 = cx1 + mean.x1, x2 = cx2 + mean.x2) %>% cbind(tidyMCMC(fit, conf.int = TRUE, conf.method = "HPDinterval")) %>% mutate(x2 = factor(x2, labels = paste("X2:~", c(-2, -1, 0, 1, 2), "*sigma"))) ## Partial residuals fdata = rdata = expand.grid(cx1 = data$cx1, cx2 = mean(data$cx2) + sd(data$cx2) * -2:2) fMat = rMat = model.matrix(~cx1 * cx2, fdata) fit = as.vector(apply(coefs, 2, median) %*% t(fMat)) resid = as.vector(data$y - apply(coefs, 2, median) %*% t(rMat)) rdata = rdata %>% mutate(partial.resid = resid + fit) %>% mutate(x1 = cx1 + mean.x1, x2 = cx2 + mean.x2) ## Partition the partial residuals such that each x1 trend only includes ## x2 data that is within that range in the observed data findNearest = function(x, y) { ff = fields:::rdist(x, y) apply(ff, 1, function(x) which(x == min(x))) } fn = findNearest(x = data[, c("x1", "x2")], y = rdata[, c("x1", "x2")]) rdata = rdata[fn, ] %>% mutate(x2 = factor(x2, labels = paste("X2:~", c(-2, -1, 0, 1, 2), "*sigma"))) ggplot(newdata, aes(y = estimate, x = x1)) + geom_line() + geom_blank(aes(y = 9)) + geom_point(data = rdata, aes(y = partial.resid), color = "grey") + geom_ribbon(aes(ymin = conf.low, ymax = conf.high), fill = "blue", alpha = 0.3) + scale_y_continuous("Y") + scale_x_continuous("X1") + facet_wrap(~x2, labeller = label_parsed, nrow = 1, scales = "free_y") + theme_classic() + theme(strip.background = element_blank())
Alternatively, we could explore the interaction by plotting a two dimensional surface as a heat map. In the example below, relative confidence (upper - lower) is indicated via contours.
mcmc = as.matrix(data.brms.mult) ## Calculate the fitted values newdata = expand.grid(cx1 = seq(min(data$cx1, na.rm = TRUE), max(data$cx1, na.rm = TRUE), len = 100), cx2 = seq(min(data$cx2, na.rm = TRUE), max(data$cx2, na.rm = TRUE), len = 100)) Xmat = model.matrix(~cx1 * cx2, newdata) coefs = mcmc[, c("b_Intercept", "b_cx1", "b_cx2", "b_cx1:cx2")] fit = coefs %*% t(Xmat) newdata = newdata %>% mutate(x1 = cx1 + mean.x1, x2 = cx2 + mean.x2) %>% cbind(tidyMCMC(fit, conf.int = TRUE, conf.method = "HPDinterval")) ## Partial residuals fdata = rdata = expand.grid(cx1 = data$cx1, cx2 = mean(data$cx2) + sd(data$cx2) * -2:2) fMat = rMat = model.matrix(~cx1 * cx2, fdata) fit = as.vector(apply(coefs, 2, median) %*% t(fMat)) resid = as.vector(data$y - apply(coefs, 2, median) %*% t(rMat)) rdata = rdata %>% mutate(partial.resid = resid + fit) %>% mutate(x1 = cx1 + mean.x1, x2 = cx2 + mean.x2) ggplot(newdata, aes(y = x1, x = x2)) + geom_tile(aes(fill = estimate)) + geom_contour(aes(z = conf.high - conf.low)) + scale_fill_gradientn("Y", colors = heat.colors(10)) + geom_point(data = data, aes(size = y)) + scale_y_continuous("X1") + scale_x_continuous("X2") + theme_classic()
Effect sizes
In addition to deriving the distribution means for the slope parameter, we could make use of the Bayesian framework to derive the distribution of the effect size. In so doing, effect size could be considered as either the rate of change or alternatively, the difference between pairs of values along the predictor gradient. For the latter case, there are multiple ways of calculating an effect size, but the two most common are:
- Raw effect size
- the difference between two groups (as already calculated)
- Cohen's D
- the effect size standardized by division with the pooled standard deviation $$ D = (\mu_A - \mu_B)/\sigma $$
- Percent effect size
- expressing the effect size as a percent of one of the pairs. That is, whether you expressing a percentage increase or a percentage decline depends on which of the pairs of values are considered a reference value. Care must be exercised to ensure no division by zeros occur.
For simple linear models, effect size based on a rate is essentially the same as above except that it is expressed per unit of the predictor. Of course in many instances, one unit change in the predictor represents too subtle a shift in the underlying gradient to likely yield any ecologically meaningful or appreciable change in response.
Lets explore a range of effect sizes:
- Raw effect size between the largest and smallest x
- Cohen's D
- Percentage change between the largest and smallest x
- Fractional change between the largest and smallest x
Probability that a change in x1 is associated with greater than a 50% increase in y at various levels of x2. Clearly, in order to explore this inference, we must first express the change in y as a percentage. This in turn requires us to calculate start and end points from which to calculate the magnitude of the effect (amount of increase in y) as well as the percentage decline. Hence, we start by predicting the distribution of y at the lowest and highest values of x1 at five levels of x2 (representing two standard deviations below the cx2 mean, one standard deviation below the cx2 mean, the cx2 mean, one standard deviation above the cx2 mean and 2 standard deviations above the cx2 mean.
For this exercise we will only use the multiplicative model. Needless to say, the process would be very similar for the additive model.
mcmc = data.mcmcpack.mult newdata = expand.grid(cx1 = c(min(data$cx1), max(data$cx1)), cx2 = (-2:2) * sd(data$cx2)) Xmat = model.matrix(~cx1 * cx2, newdata) coefs = mcmc[, c("(Intercept)", "cx1", "cx2", "cx1:cx2")] fit = coefs %*% t(Xmat) s1 = seq(1, 9, b = 2) s2 = seq(2, 10, b = 2) ## Raw effect size (RES = tidyMCMC(as.mcmc(fit[, s2] - fit[, s1]), conf.int = TRUE, conf.method = "HPDinterval"))
term estimate std.error conf.low conf.high 1 2 1.131538 0.9834683 -0.8617237 2.992687 2 4 2.007505 0.6623440 0.6986248 3.288608 3 6 2.883473 0.4916937 1.9057792 3.818553 4 8 3.759440 0.6135136 2.5552802 4.941969 5 10 4.635408 0.9179371 2.8090733 6.391762
## Cohen's D cohenD = (fit[, s2] - fit[, s1])/sqrt(mcmc[, "sigma2"]) (cohenDES = tidyMCMC(as.mcmc(cohenD), conf.int = TRUE, conf.method = "HPDinterval"))
term estimate std.error conf.low conf.high 1 2 1.018302 0.8809005 -0.6266923 2.824726 2 4 1.805760 0.6063423 0.6348380 3.010847 3 6 2.593218 0.4784125 1.6946060 3.568473 4 8 3.380676 0.5997235 2.2250809 4.547363 5 10 4.168134 0.8717915 2.4901356 5.850737
# Percentage change (relative to Group A) ESp = 100 * (fit[, s2] - fit[, s1])/fit[, s1] (PES = tidyMCMC(as.mcmc(ESp), conf.int = TRUE, conf.method = "HPDinterval"))
term estimate std.error conf.low conf.high 1 2 60.59249 60.30857 -36.51167 183.4926 2 4 93.25863 40.13721 18.95693 172.6912 3 6 130.69428 37.49792 62.55793 204.8880 4 8 175.48589 68.84283 74.16229 312.6143 5 10 237.13768 368.17308 52.26804 489.9922
# Probability that the effect is greater than 50% (an increase of >50%) (p50 = apply(ESp, 2, function(x) sum(x > 50)/length(x)))
2 4 6 8 10 0.5051 0.8749 0.9978 0.9985 0.9966
## fractional change (FES = tidyMCMC(as.mcmc(fit[, s2]/fit[, s1]), conf.int = TRUE, conf.method = "HPDinterval"))
term estimate std.error conf.low conf.high 1 2 1.605925 0.6030857 0.6348833 2.834926 2 4 1.932586 0.4013721 1.1895693 2.726912 3 6 2.306943 0.3749792 1.6255793 3.048880 4 8 2.754859 0.6884283 1.7416229 4.126143 5 10 3.371377 3.6817308 1.5226804 5.899922
Conclusions:
- On average, when x2 is equal to its mean, Y increases by
2.8834728
over the observed range of x1. We are 95% confident that the increase is between1.9057792
and3.8185531
. - The Cohen's D associated change over the observed range of x1 is
2.5932182
. - On average, Y increases by
130.694%
over the observed range of x1 (at average x2). We are 95% confident that the increase is between62.558%
and204.888%
. - The probability that Y increases by more than 50% over the observed range of x1 (average x2) is
0.998
. - On average, Y increases by a factor of
2.307%
over the observed range of x1 (average x2). We are 95% confident that the decline is between a factor of1.626%
and3.049%
.
Lets explore a range of effect sizes:
- Raw effect size between the largest and smallest x
- Cohen's D
- Percentage change between the largest and smallest x
- Fractional change between the largest and smallest x
Probability that a change in x1 is associated with greater than a 50% increase in y at various levels of x2. Clearly, in order to explore this inference, we must first express the change in y as a percentage. This in turn requires us to calculate start and end points from which to calculate the magnitude of the effect (amount of increase in y) as well as the percentage decline. Hence, we start by predicting the distribution of y at the lowest and highest values of x1 at five levels of x2 (representing two standard deviations below the cx2 mean, one standard deviation below the cx2 mean, the cx2 mean, one standard deviation above the cx2 mean and 2 standard deviations above the cx2 mean.
For this exercise we will only use the multiplicative model. Needless to say, the process would be very similar for the additive model.
mcmc = data.r2jags.mult$BUGSoutput$sims.matrix newdata = expand.grid(cx1 = c(min(data$cx1), max(data$cx1)), cx2 = (-2:2) * sd(data$cx2)) Xmat = model.matrix(~cx1 * cx2, newdata) coefs = mcmc[, c("beta0", "beta[1]", "beta[2]", "beta[3]")] fit = coefs %*% t(Xmat) s1 = seq(1, 9, b = 2) s2 = seq(2, 10, b = 2) ## Raw effect size (RES = tidyMCMC(as.mcmc(fit[, s2] - fit[, s1]), conf.int = TRUE, conf.method = "HPDinterval"))
term estimate std.error conf.low conf.high 1 2 1.128491 0.9881108 -0.8113367 3.042310 2 4 2.007034 0.6622062 0.7082004 3.281786 3 6 2.885577 0.4910011 1.9349269 3.833589 4 8 3.764120 0.6197240 2.5853036 5.011755 5 10 4.642663 0.9313671 2.7349541 6.390688
## Cohen's D cohenD = (fit[, s2] - fit[, s1])/sqrt(mcmc[, "sigma2"])
Error in mcmc[, "sigma2"]: subscript out of bounds
(cohenDES = tidyMCMC(as.mcmc(cohenD), conf.int = TRUE, conf.method = "HPDinterval"))
term estimate std.error conf.low conf.high 1 2 1.018302 0.8809005 -0.6266923 2.824726 2 4 1.805760 0.6063423 0.6348380 3.010847 3 6 2.593218 0.4784125 1.6946060 3.568473 4 8 3.380676 0.5997235 2.2250809 4.547363 5 10 4.168134 0.8717915 2.4901356 5.850737
# Percentage change (relative to Group A) ESp = 100 * (fit[, s2] - fit[, s1])/fit[, s1] (PES = tidyMCMC(as.mcmc(ESp), conf.int = TRUE, conf.method = "HPDinterval"))
term estimate std.error conf.low conf.high 1 2 61.20075 64.56737 -41.40202 180.6583 2 4 93.37902 40.80821 19.36101 172.7273 3 6 130.88092 37.48191 64.85883 205.7658 4 8 176.42530 71.66748 67.72479 311.9882 5 10 262.64539 2612.68958 50.93741 509.6312
# Probability that the effect is greater than 50% (an increase of >50%) (p50 = apply(ESp, 2, function(x) sum(x > 50)/length(x)))
2 4 6 8 10 0.5030000 0.8716667 0.9978667 0.9990667 0.9964667
## fractional change (FES = tidyMCMC(as.mcmc(fit[, s2]/fit[, s1]), conf.int = TRUE, conf.method = "HPDinterval"))
term estimate std.error conf.low conf.high 1 2 1.612008 0.6456737 0.5859798 2.806583 2 4 1.933790 0.4080821 1.1936101 2.727273 3 6 2.308809 0.3748191 1.6485883 3.057658 4 8 2.764253 0.7166748 1.6772479 4.119882 5 10 3.626454 26.1268958 1.5093741 6.096312
Conclusions:
- On average, when x2 is equal to its mean, Y increases by
2.885577
over the observed range of x1. We are 95% confident that the increase is between1.9349269
and3.8335886
. - The Cohen's D associated change over the observed range of x1 is
2.5932182
. - On average, Y increases by
130.881%
over the observed range of x1 (at average x2). We are 95% confident that the increase is between64.859%
and205.766%
. - The probability that Y increases by more than 50% over the observed range of x1 (average x2) is
0.998
. - On average, Y increases by a factor of
2.309%
over the observed range of x1 (average x2). We are 95% confident that the decline is between a factor of1.649%
and3.058%
.
Lets explore a range of effect sizes:
- Raw effect size between the largest and smallest x
- Cohen's D
- Percentage change between the largest and smallest x
- Fractional change between the largest and smallest x
Probability that a change in x1 is associated with greater than a 50% increase in y at various levels of x2. Clearly, in order to explore this inference, we must first express the change in y as a percentage. This in turn requires us to calculate start and end points from which to calculate the magnitude of the effect (amount of increase in y) as well as the percentage decline. Hence, we start by predicting the distribution of y at the lowest and highest values of x1 at five levels of x2 (representing two standard deviations below the cx2 mean, one standard deviation below the cx2 mean, the cx2 mean, one standard deviation above the cx2 mean and 2 standard deviations above the cx2 mean.
For this exercise we will only use the multiplicative model. Needless to say, the process would be very similar for the additive model.
mcmc = as.matrix(data.rstan.mult) newdata = expand.grid(cx1 = c(min(data$cx1), max(data$cx1)), cx2 = (-2:2) * sd(data$cx1)) Xmat = model.matrix(~cx1 * cx2, newdata) coefs = mcmc[, c("beta0", "beta[1]", "beta[2]", "beta[3]")] fit = coefs %*% t(Xmat) s1 = seq(1, 9, b = 2) s2 = seq(2, 10, b = 2) ## Raw effect size (RES = tidyMCMC(as.mcmc(fit[, s2] - fit[, s1]), conf.int = TRUE, conf.method = "HPDinterval"))
term estimate std.error conf.low conf.high 1 2 1.378868 0.8964917 -0.3259671 3.191336 2 4 2.138889 0.6370163 0.8738894 3.372097 3 6 2.898910 0.5013501 1.9580862 3.913157 4 8 3.658931 0.5834243 2.5187370 4.817305 5 10 4.418952 0.8202961 2.7627498 5.970749
## Cohen's D cohenD = (fit[, s2] - fit[, s1])/sqrt(mcmc[, "sigma2"])
Error in mcmc[, "sigma2"]: subscript out of bounds
(cohenDES = tidyMCMC(as.mcmc(cohenD), conf.int = TRUE, conf.method = "HPDinterval"))
term estimate std.error conf.low conf.high 1 2 1.241077 0.7956905 -0.2242797 2.896785 2 4 1.917148 0.5758933 0.8032340 3.056407 3 6 2.593218 0.4784125 1.6946060 3.568473 4 8 3.269289 0.5699119 2.1779064 4.378729 5 10 3.945360 0.7870300 2.4733235 5.523548
# Percentage change (relative to Group A) ESp = 100 * (fit[, s2] - fit[, s1])/fit[, s1] (PES = tidyMCMC(as.mcmc(ESp), conf.int = TRUE, conf.method = "HPDinterval"))
term estimate std.error conf.low conf.high 1 2 69.38721 54.62118 -23.99487 177.2210 2 4 98.87413 39.22694 26.59480 175.7967 3 6 132.15307 38.36353 67.01962 213.2552 4 8 171.39922 67.04252 66.87743 295.4455 5 10 229.60617 482.03370 58.23209 432.2243
# Probability that the effect is greater than 50% (an increase of >50%) (p50 = apply(ESp, 2, function(x) sum(x > 50)/length(x)))
2 4 6 8 10 0.5980000 0.9163333 0.9973333 0.9993333 0.9973333
## fractional change (FES = tidyMCMC(as.mcmc(fit[, s2]/fit[, s1]), conf.int = TRUE, conf.method = "HPDinterval"))
term estimate std.error conf.low conf.high 1 2 1.693872 0.5462118 0.7600513 2.772210 2 4 1.988741 0.3922694 1.2659480 2.757967 3 6 2.321531 0.3836353 1.6701962 3.132552 4 8 2.713992 0.6704252 1.6687743 3.954455 5 10 3.296062 4.8203370 1.5823209 5.322243
Conclusions:
- On average, when x2 is equal to its mean, Y increases by
2.8989101
over the observed range of x1. We are 95% confident that the increase is between1.9580862
and3.9131569
. - The Cohen's D associated change over the observed range of x1 is
2.5932182
. - On average, Y increases by
132.153%
over the observed range of x1 (at average x2). We are 95% confident that the increase is between67.020%
and213.255%
. - The probability that Y increases by more than 50% over the observed range of x1 (average x2) is
0.997
. - On average, Y increases by a factor of
2.322%
over the observed range of x1 (average x2). We are 95% confident that the decline is between a factor of1.670%
and3.133%
.
Lets explore a range of effect sizes:
- Raw effect size between the largest and smallest x
- Cohen's D
- Percentage change between the largest and smallest x
- Fractional change between the largest and smallest x
Probability that a change in x1 is associated with greater than a 50% increase in y at various levels of x2. Clearly, in order to explore this inference, we must first express the change in y as a percentage. This in turn requires us to calculate start and end points from which to calculate the magnitude of the effect (amount of increase in y) as well as the percentage decline. Hence, we start by predicting the distribution of y at the lowest and highest values of x1 at five levels of x2 (representing two standard deviations below the cx2 mean, one standard deviation below the cx2 mean, the cx2 mean, one standard deviation above the cx2 mean and 2 standard deviations above the cx2 mean.
For this exercise we will only use the multiplicative model. Needless to say, the process would be very similar for the additive model.
mcmc = as.matrix(data.rstanarm.mult) newdata = expand.grid(cx1 = c(min(data$cx1), max(data$cx1)), cx2 = (-2:2) * sd(data$cx1)) Xmat = model.matrix(~cx1 * cx2, newdata) coefs = mcmc[, c("(Intercept)", "cx1", "cx2", "cx1:cx2")] fit = coefs %*% t(Xmat) s1 = seq(1, 9, b = 2) s2 = seq(2, 10, b = 2) ## Raw effect size (RES = tidyMCMC(as.mcmc(fit[, s2] - fit[, s1]), conf.int = TRUE, conf.method = "HPDinterval"))
term estimate std.error conf.low conf.high 1 2 1.364070 0.8886508 -0.3986485 3.095723 2 4 2.126355 0.6252966 0.9360712 3.408929 3 6 2.888639 0.4860847 1.9045201 3.815556 4 8 3.650924 0.5705679 2.5904201 4.829344 5 10 4.413208 0.8116676 2.8424069 5.966333
## Cohen's D cohenD = (fit[, s2] - fit[, s1])/sqrt(mcmc[, "sigma2"])
Error in mcmc[, "sigma2"]: subscript out of bounds
(cohenDES = tidyMCMC(as.mcmc(cohenD), conf.int = TRUE, conf.method = "HPDinterval"))
term estimate std.error conf.low conf.high 1 2 1.241077 0.7956905 -0.2242797 2.896785 2 4 1.917148 0.5758933 0.8032340 3.056407 3 6 2.593218 0.4784125 1.6946060 3.568473 4 8 3.269289 0.5699119 2.1779064 4.378729 5 10 3.945360 0.7870300 2.4733235 5.523548
# Percentage change (relative to Group A) ESp = 100 * (fit[, s2] - fit[, s1])/fit[, s1] (PES = tidyMCMC(as.mcmc(ESp), conf.int = TRUE, conf.method = "HPDinterval"))
term estimate std.error conf.low conf.high 1 2 68.56476 54.43507 -26.75850 173.5301 2 4 97.87279 38.40045 29.88173 174.5653 3 6 131.01647 36.90149 60.00110 199.9777 4 8 169.96282 62.60247 70.59416 293.3944 5 10 221.68966 160.07276 62.33684 446.4973
# Probability that the effect is greater than 50% (an increase of >50%) (p50 = apply(ESp, 2, function(x) sum(x > 50)/length(x)))
2 4 6 8 10 0.5988889 0.9155556 0.9974074 0.9992593 0.9981481
## fractional change (FES = tidyMCMC(as.mcmc(fit[, s2]/fit[, s1]), conf.int = TRUE, conf.method = "HPDinterval"))
term estimate std.error conf.low conf.high 1 2 1.685648 0.5443507 0.732415 2.735301 2 4 1.978728 0.3840045 1.298817 2.745653 3 6 2.310165 0.3690149 1.600011 2.999777 4 8 2.699628 0.6260247 1.705942 3.933944 5 10 3.216897 1.6007276 1.623368 5.464973
Conclusions:
- On average, when x2 is equal to its mean, Y increases by
2.8886394
over the observed range of x1. We are 95% confident that the increase is between1.9045201
and3.8155558
. - The Cohen's D associated change over the observed range of x1 is
2.5932182
. - On average, Y increases by
131.016%
over the observed range of x1 (at average x2). We are 95% confident that the increase is between60.001%
and199.978%
. - The probability that Y increases by more than 50% over the observed range of x1 (average x2) is
0.997
. - On average, Y increases by a factor of
2.310%
over the observed range of x1 (average x2). We are 95% confident that the decline is between a factor of1.600%
and3.000%
.
Lets explore a range of effect sizes:
- Raw effect size between the largest and smallest x
- Cohen's D
- Percentage change between the largest and smallest x
- Fractional change between the largest and smallest x
Probability that a change in x1 is associated with greater than a 50% increase in y at various levels of x2. Clearly, in order to explore this inference, we must first express the change in y as a percentage. This in turn requires us to calculate start and end points from which to calculate the magnitude of the effect (amount of increase in y) as well as the percentage decline. Hence, we start by predicting the distribution of y at the lowest and highest values of x1 at five levels of x2 (representing two standard deviations below the cx2 mean, one standard deviation below the cx2 mean, the cx2 mean, one standard deviation above the cx2 mean and 2 standard deviations above the cx2 mean.
For this exercise we will only use the multiplicative model. Needless to say, the process would be very similar for the additive model.
mcmc = as.matrix(data.brms.mult) newdata = expand.grid(cx1 = c(min(data$cx1), max(data$cx1)), cx2 = (-2:2) * sd(data$cx1)) Xmat = model.matrix(~cx1 * cx2, newdata) coefs = mcmc[, c("b_Intercept", "b_cx1", "b_cx2", "b_cx1:cx2")] fit = coefs %*% t(Xmat) s1 = seq(1, 9, b = 2) s2 = seq(2, 10, b = 2) ## Raw effect size (RES = tidyMCMC(as.mcmc(fit[, s2] - fit[, s1]), conf.int = TRUE, conf.method = "HPDinterval"))
term estimate std.error conf.low conf.high 1 2 1.378991 0.9190167 -0.3987054 3.229530 2 4 2.128911 0.6422492 0.7482612 3.300124 3 6 2.878832 0.4852747 1.8699665 3.798013 4 8 3.628753 0.5600127 2.5331742 4.766964 5 10 4.378673 0.8042712 2.8863192 6.043824
## Cohen's D cohenD = (fit[, s2] - fit[, s1])/sqrt(mcmc[, "sigma2"])
Error in mcmc[, "sigma2"]: subscript out of bounds
(cohenDES = tidyMCMC(as.mcmc(cohenD), conf.int = TRUE, conf.method = "HPDinterval"))
term estimate std.error conf.low conf.high 1 2 1.241077 0.7956905 -0.2242797 2.896785 2 4 1.917148 0.5758933 0.8032340 3.056407 3 6 2.593218 0.4784125 1.6946060 3.568473 4 8 3.269289 0.5699119 2.1779064 4.378729 5 10 3.945360 0.7870300 2.4733235 5.523548
# Percentage change (relative to Group A) ESp = 100 * (fit[, s2] - fit[, s1])/fit[, s1] (PES = tidyMCMC(as.mcmc(ESp), conf.int = TRUE, conf.method = "HPDinterval"))
term estimate std.error conf.low conf.high 1 2 70.11175 56.09519 -28.50460 177.9270 2 4 98.42547 39.60222 26.48813 179.2685 3 6 130.18361 36.80878 62.53690 203.1577 4 8 166.87273 58.53950 68.10948 279.8138 5 10 213.35276 125.34446 58.92578 414.6204
# Probability that the effect is greater than 50% (an increase of >50%) (p50 = apply(ESp, 2, function(x) sum(x > 50)/length(x)))
2 4 6 8 10 0.5988889 0.9074074 0.9985185 1.0000000 0.9985185
## fractional change (FES = tidyMCMC(as.mcmc(fit[, s2]/fit[, s1]), conf.int = TRUE, conf.method = "HPDinterval"))
term estimate std.error conf.low conf.high 1 2 1.701117 0.5609519 0.714954 2.779270 2 4 1.984255 0.3960222 1.264881 2.792685 3 6 2.301836 0.3680878 1.625369 3.031577 4 8 2.668727 0.5853950 1.681095 3.798138 5 10 3.133528 1.2534446 1.589258 5.146204
Conclusions:
- On average, when x2 is equal to its mean, Y increases by
2.8788319
over the observed range of x1. We are 95% confident that the increase is between1.8699665
and3.7980128
. - The Cohen's D associated change over the observed range of x1 is
2.5932182
. - On average, Y increases by
130.184%
over the observed range of x1 (at average x2). We are 95% confident that the increase is between62.537%
and203.158%
. - The probability that Y increases by more than 50% over the observed range of x1 (average x2) is
0.999
. - On average, Y increases by a factor of
2.302%
over the observed range of x1 (average x2). We are 95% confident that the decline is between a factor of1.625%
and3.032%
.
Posteriors
Finite Population Standard Deviations
Variance components, the amount of added variance attributed to each influence, are traditionally estimated for so called random effects. These are the effects for which the levels employed in the design are randomly selected to represent a broader range of possible levels. For such effects, effect sizes (differences between each level and a reference level) are of little value. Instead, the 'importance' of the variables are measured in units of variance components.
On the other hand, regular variance components for fixed factors (those whose measured levels represent the only levels of interest) are not logical - since variance components estimate variance as if the levels are randomly selected from a larger population. Nevertheless, in order to compare and contrast the scale of variability of both fixed and random factors, it is necessary to measure both on the same scale (sample or population based variance).
Finite-population variance components (Gelman, 2005)
assume that the levels of all
factors (fixed and random) in the design are all the possible levels available.
In other words, they are assumed to represent finite populations of levels.
Sample (rather than population) statistics are then used to calculate these
finite-population variances (or standard deviations).
Since standard deviation (and variance) are bound at zero, standard deviation posteriors are typically non-normal. Consequently, medians and HPD intervals are more robust estimates.
library(broom) mcmc = data.mcmcpack.mult Xmat = model.matrix(~cx1 * cx2, data = data) sd.x1 = abs(mcmc[, "cx1"]) * sd(Xmat[, "cx1"]) sd.x2 = abs(mcmc[, "cx2"]) * sd(Xmat[, "cx2"]) sd.x1x2 = abs(mcmc[, "cx1:cx2"]) * sd(Xmat[, "cx1:cx2"]) sd.x = sd.x1 + sd.x2 + sd.x1x2 # generate a model matrix newdata = data Xmat = model.matrix(~cx1 * cx2, newdata) ## get median parameter estimates coefs = mcmc[, c("(Intercept)", "cx1", "cx2", "cx1:cx2")] fit = coefs %*% t(Xmat) resid = sweep(fit, 2, data$y, "-") sd.resid = apply(resid, 1, sd) sd.all = cbind(sd.x1, sd.x2, sd.x1x2, sd.resid) (fpsd = tidyMCMC(sd.all, conf.int = TRUE, conf.method = "HPDinterval"))
term estimate std.error conf.low conf.high 1 sd.x1 0.8385457 0.14298995 0.55422160 1.1104773 2 sd.x2 0.4476210 0.13949494 0.16383800 0.7171473 3 sd.x1x2 0.2463365 0.11067311 0.02945066 0.4554639 4 sd.resid 1.1106423 0.01421401 1.09355250 1.1387527
# OR expressed as a percentage (fpsd.p = tidyMCMC(100 * sd.all/rowSums(sd.all), estimate.method = "median", conf.int = TRUE, conf.method = "HPDinterval"))
term estimate std.error conf.low conf.high 1 sd.x1 31.816643 5.126155 21.433069 41.56264 2 sd.x2 16.976661 5.066772 6.574339 26.59877 3 sd.x1x2 9.300278 3.880973 1.426584 16.43793 4 sd.resid 41.898287 2.536467 37.537311 47.24753
## we can even plot this as a Bayesian ANOVA table ggplot(fpsd, aes(y = estimate, x = term)) + geom_pointrange(aes(ymin = conf.low, ymax = conf.high)) + geom_text(aes(label = sprintf("%.2f%%", fpsd.p$estimate), vjust = -1)) + scale_y_continuous("Finite population standard deviation") + scale_x_discrete() + coord_flip() + theme_classic()
Conclusions:
Approximately 58.1%
of the total finite population standard deviation is due to x, x2 and their interaction.
mcmc = data.r2jags.mult$BUGSoutput$sims.matrix Xmat = model.matrix(~cx1 * cx2, data = data) sd.x1 = abs(mcmc[, "beta[1]"]) * sd(Xmat[, "cx1"]) sd.x2 = abs(mcmc[, "beta[2]"]) * sd(Xmat[, "cx2"]) sd.x1x2 = abs(mcmc[, "beta[3]"]) * sd(Xmat[, "cx1:cx2"]) sd.x = sd.x1 + sd.x2 + sd.x1x2 # generate a model matrix newdata = data Xmat = model.matrix(~cx1 * cx2, newdata) ## get median parameter estimates coefs = mcmc[, c("beta0", "beta[1]", "beta[2]", "beta[3]")] fit = coefs %*% t(Xmat) resid = sweep(fit, 2, data$y, "-") sd.resid = apply(resid, 1, sd) sd.all = cbind(sd.x1, sd.x2, sd.x1x2, sd.resid) (fpsd = tidyMCMC(sd.all, conf.int = TRUE, conf.method = "HPDinterval"))
term estimate std.error conf.low conf.high 1 sd.x1 0.8391576 0.14278852 0.562698059 1.1148498 2 sd.x2 0.4484404 0.14175853 0.174846530 0.7318690 3 sd.x1x2 0.2471354 0.11203956 0.005098644 0.4380983 4 sd.resid 1.1109865 0.01441035 1.093559848 1.1388625
# OR expressed as a percentage (fpsd.p = tidyMCMC(100 * sd.all/rowSums(sd.all), estimate.method = "median", conf.int = TRUE, conf.method = "HPDinterval"))
term estimate std.error conf.low conf.high 1 sd.x1 31.78126 5.102172 21.6079476 41.43618 2 sd.x2 17.00764 5.135077 6.8692354 26.99571 3 sd.x1x2 9.29281 3.926346 0.7970709 16.21722 4 sd.resid 41.87848 2.540247 37.4649686 47.15805
## we can even plot this as a Bayesian ANOVA table ggplot(fpsd, aes(y = estimate, x = term)) + geom_pointrange(aes(ymin = conf.low, ymax = conf.high)) + geom_text(aes(label = sprintf("%.2f%%", fpsd.p$estimate), vjust = -1)) + scale_y_continuous("Finite population standard deviation") + scale_x_discrete() + coord_flip() + theme_classic()
Conclusions:
Approximately 58.1%
of the total finite population standard deviation is due to x1, x2 and their interaction.
mcmc = as.matrix(data.rstan.mult) Xmat = model.matrix(~cx1 * cx2, data = data) sd.x1 = abs(mcmc[, "beta[1]"]) * sd(Xmat[, "cx1"]) sd.x2 = abs(mcmc[, "beta[2]"]) * sd(Xmat[, "cx2"]) sd.x1x2 = abs(mcmc[, "beta[3]"]) * sd(Xmat[, "cx1:cx2"]) sd.x = sd.x1 + sd.x2 + sd.x1x2 # generate a model matrix newdata = data Xmat = model.matrix(~cx1 * cx2, newdata) ## get median parameter estimates coefs = mcmc[, c("beta0", "beta[1]", "beta[2]", "beta[3]")] fit = coefs %*% t(Xmat) resid = sweep(fit, 2, data$y, "-") sd.resid = apply(resid, 1, sd) sd.all = cbind(sd.x1, sd.x2, sd.x1x2, sd.resid) (fpsd = tidyMCMC(sd.all, conf.int = TRUE, conf.method = "HPDinterval"))
term estimate std.error conf.low conf.high 1 sd.x1 0.8430350 0.1457981 0.56943304 1.1379891 2 sd.x2 0.4455197 0.1435024 0.16435682 0.7257383 3 sd.x1x2 0.2486064 0.1111387 0.01755885 0.4479512 4 sd.resid 1.1110181 0.0144178 1.09365822 1.1400552
# OR expressed as a percentage (fpsd.p = tidyMCMC(100 * sd.all/rowSums(sd.all), estimate.method = "median", conf.int = TRUE, conf.method = "HPDinterval"))
term estimate std.error conf.low conf.high 1 sd.x1 32.023023 5.259373 21.2694513 41.69414 2 sd.x2 16.829913 5.180634 6.4358085 26.67465 3 sd.x1x2 9.386689 3.882299 0.9254482 16.13724 4 sd.resid 41.924201 2.522040 37.2404261 46.86077
## we can even plot this as a Bayesian ANOVA table ggplot(fpsd, aes(y = estimate, x = term)) + geom_pointrange(aes(ymin = conf.low, ymax = conf.high)) + geom_text(aes(label = sprintf("%.2f%%", fpsd.p$estimate), vjust = -1)) + scale_y_continuous("Finite population standard deviation") + scale_x_discrete() + coord_flip() + theme_classic()
Conclusions:
Approximately 58.2%
of the total finite population standard deviation is due to x1, x2 and their interaction.
mcmc = as.matrix(data.rstanarm.mult) Xmat = model.matrix(~cx1 * cx2, data = data) sd.x1 = abs(mcmc[, "cx1"]) * sd(Xmat[, "cx1"]) sd.x2 = abs(mcmc[, "cx2"]) * sd(Xmat[, "cx2"]) sd.x1x2 = abs(mcmc[, "cx1:cx2"]) * sd(Xmat[, "cx1:cx2"]) sd.x = sd.x1 + sd.x2 + sd.x1x2 # generate a model matrix newdata = data Xmat = model.matrix(~cx1 * cx2, newdata) ## get median parameter estimates coefs = mcmc[, c("(Intercept)", "cx1", "cx2", "cx1:cx2")] fit = coefs %*% t(Xmat) resid = sweep(fit, 2, data$y, "-") sd.resid = apply(resid, 1, sd) sd.all = cbind(sd.x1, sd.x2, sd.x1x2, sd.resid) (fpsd = tidyMCMC(sd.all, conf.int = TRUE, conf.method = "HPDinterval"))
term estimate std.error conf.low conf.high 1 sd.x1 0.8400482 0.14135878 0.55385543 1.1096057 2 sd.x2 0.4453660 0.13954895 0.17246030 0.7297465 3 sd.x1x2 0.2495511 0.11078771 0.03275807 0.4602081 4 sd.resid 1.1105389 0.01428812 1.09360418 1.1390562
# OR expressed as a percentage (fpsd.p = tidyMCMC(100 * sd.all/rowSums(sd.all), estimate.method = "median", conf.int = TRUE, conf.method = "HPDinterval"))
term estimate std.error conf.low conf.high 1 sd.x1 31.854366 5.075864 21.965836 41.59395 2 sd.x2 16.876390 5.018949 7.012028 26.68624 3 sd.x1x2 9.352654 3.874954 1.516877 16.38607 4 sd.resid 41.903312 2.489766 37.960454 47.38992
## we can even plot this as a Bayesian ANOVA table ggplot(fpsd, aes(y = estimate, x = term)) + geom_pointrange(aes(ymin = conf.low, ymax = conf.high)) + geom_text(aes(label = sprintf("%.2f%%", fpsd.p$estimate), vjust = -1)) + scale_y_continuous("Finite population standard deviation") + scale_x_discrete() + coord_flip() + theme_classic()
Conclusions:
Approximately 58.1%
of the total finite population standard deviation is due to x1, x2 and their interaction.
mcmc = as.matrix(data.brms.mult) Xmat = model.matrix(~cx1 * cx2, data = data) sd.x1 = abs(mcmc[, "b_cx1"]) * sd(Xmat[, "cx1"]) sd.x2 = abs(mcmc[, "b_cx2"]) * sd(Xmat[, "cx2"]) sd.x1x2 = abs(mcmc[, "b_cx1:cx2"]) * sd(Xmat[, "cx1:cx2"]) sd.x = sd.x1 + sd.x2 + sd.x1x2 # generate a model matrix newdata = data Xmat = model.matrix(~cx1 * cx2, newdata) ## get median parameter estimates coefs = mcmc[, c("(Intercept)", "b_cx1", "b_cx2", "b_cx1:cx2")]
Error in mcmc[, c("(Intercept)", "b_cx1", "b_cx2", "b_cx1:cx2")]: subscript out of bounds
fit = coefs %*% t(Xmat) resid = sweep(fit, 2, data$y, "-") sd.resid = apply(resid, 1, sd) sd.all = cbind(sd.x1, sd.x2, sd.x1x2, sd.resid) (fpsd = tidyMCMC(sd.all, conf.int = TRUE, conf.method = "HPDinterval"))
term estimate std.error conf.low conf.high 1 sd.x1 0.8371960 0.14112323 0.543806866 1.1045040 2 sd.x2 0.4515870 0.14210529 0.198571997 0.7442691 3 sd.x1x2 0.2458671 0.11272300 0.003856768 0.4310691 4 sd.resid 1.1105389 0.01428812 1.093604181 1.1390562
# OR expressed as a percentage (fpsd.p = tidyMCMC(100 * sd.all/rowSums(sd.all), estimate.method = "median", conf.int = TRUE, conf.method = "HPDinterval"))
term estimate std.error conf.low conf.high 1 sd.x1 31.725295 5.188656 21.6538435 41.74478 2 sd.x2 16.987658 5.079056 7.2092581 26.68051 3 sd.x1x2 9.270729 3.962796 0.3883622 15.83973 4 sd.resid 42.019970 2.593114 37.2740427 47.27606
## we can even plot this as a Bayesian ANOVA table ggplot(fpsd, aes(y = estimate, x = term)) + geom_pointrange(aes(ymin = conf.low, ymax = conf.high)) + geom_text(aes(label = sprintf("%.2f%%", fpsd.p$estimate), vjust = -1)) + scale_y_continuous("Finite population standard deviation") + scale_x_discrete() + coord_flip() + theme_classic()
Conclusions:
Approximately 58.0%
of the total finite population standard deviation is due to x1, x2 and their interaction.
$R^2$
In a frequentist context, the $R^2$ value is seen as a useful indicator of goodness of fit. Whilst it has long been acknowledged that this measure is not appropriate for comparing models (for such purposes information criterion such as AIC are more appropriate), it is nevertheless useful for estimating the amount (percent) of variance explained by the model.
In a frequentist context, $R^2$ is calculated as the variance in predicted values divided by the variance in the observed (response) values.
Unfortunately, this classical formulation does not translate simply into a Bayesian context since
the equivalently calculated numerator can be larger than the an equivalently calculated denominator - thereby resulting in an $R^2$
greater than 100%. Gelman, Goodrich, Gabry, and Ali (2017)
proposed an alternative
formulation in which the denominator comprises the sum of the explained variance and the variance of the residuals.
So in the standard regression model notation of: $$ \begin{align} y_i \sim{}& N(\mu_i, \sigma)\\ \mu_i =& \mathbf{X}\boldsymbol{\beta} \end{align} $$ The $R^2$ could be formulated as: $$ R^2 = \frac{\sigma^2_f}{\sigma^2_f + \sigma^2_e} $$ where $\sigma^2_f = var(\mu)$, ($\mu = \mathbf{X}\boldsymbol{\beta})$) and for Gaussian models $\sigma^2_e = var(y-\mu)$
library(broom) mcmc <- data.mcmcpack.mult Xmat = model.matrix(~cx1 * cx2, data) coefs = mcmc[, c("(Intercept)", "cx1", "cx2", "cx1:cx2")] fit = coefs %*% t(Xmat) resid = sweep(fit, 2, data$y, "-") var_f = apply(fit, 1, var) var_e = apply(resid, 1, var) R2 = var_f/(var_f + var_e) tidyMCMC(as.mcmc(R2), conf.int = TRUE, conf.method = "HPDinterval")
term estimate std.error conf.low conf.high 1 var1 0.5514478 0.04547871 0.4617169 0.6348432
# for comparison with frequentist summary(lm(y ~ cx1 * cx2, data))
Call: lm(formula = y ~ cx1 * cx2, data = data) Residuals: Min 1Q Median 3Q Max -2.34877 -0.85435 0.06905 0.71265 2.57068 Coefficients: Estimate Std. Error t value Pr(>|t|) (Intercept) 3.6710 0.1315 27.924 < 2e-16 *** cx1 2.9292 0.4914 5.961 4.15e-08 *** cx2 1.3445 0.4207 3.196 0.00189 ** cx1:cx2 2.6651 1.2305 2.166 0.03281 * --- Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 Residual standard error: 1.111 on 96 degrees of freedom Multiple R-squared: 0.5577, Adjusted R-squared: 0.5439 F-statistic: 40.35 on 3 and 96 DF, p-value: < 2.2e-16
library(broom) mcmc <- data.r2jags.mult$BUGSoutput$sims.matrix Xmat = model.matrix(~cx1 * cx2, data) coefs = mcmc[, c("beta0", "beta[1]", "beta[2]", "beta[3]")] fit = coefs %*% t(Xmat) resid = sweep(fit, 2, data$y, "-") var_f = apply(fit, 1, var) var_e = apply(resid, 1, var) R2 = var_f/(var_f + var_e) tidyMCMC(as.mcmc(R2), conf.int = TRUE, conf.method = "HPDinterval")
term estimate std.error conf.low conf.high 1 var1 0.55194 0.04571563 0.460811 0.6348946
# for comparison with frequentist summary(lm(y ~ cx1 * cx2, data))
Call: lm(formula = y ~ cx1 * cx2, data = data) Residuals: Min 1Q Median 3Q Max -2.34877 -0.85435 0.06905 0.71265 2.57068 Coefficients: Estimate Std. Error t value Pr(>|t|) (Intercept) 3.6710 0.1315 27.924 < 2e-16 *** cx1 2.9292 0.4914 5.961 4.15e-08 *** cx2 1.3445 0.4207 3.196 0.00189 ** cx1:cx2 2.6651 1.2305 2.166 0.03281 * --- Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 Residual standard error: 1.111 on 96 degrees of freedom Multiple R-squared: 0.5577, Adjusted R-squared: 0.5439 F-statistic: 40.35 on 3 and 96 DF, p-value: < 2.2e-16
library(broom) mcmc <- as.matrix(data.rstan.mult) Xmat = model.matrix(~cx1 * cx2, data) coefs = mcmc[, c("beta0", "beta[1]", "beta[2]", "beta[3]")] fit = coefs %*% t(Xmat) resid = sweep(fit, 2, data$y, "-") var_f = apply(fit, 1, var) var_e = apply(resid, 1, var) R2 = var_f/(var_f + var_e) tidyMCMC(as.mcmc(R2), conf.int = TRUE, conf.method = "HPDinterval")
term estimate std.error conf.low conf.high 1 var1 0.5527234 0.04505697 0.4577562 0.6298844
# for comparison with frequentist summary(lm(y ~ cx1 * cx2, data))
Call: lm(formula = y ~ cx1 * cx2, data = data) Residuals: Min 1Q Median 3Q Max -2.34877 -0.85435 0.06905 0.71265 2.57068 Coefficients: Estimate Std. Error t value Pr(>|t|) (Intercept) 3.6710 0.1315 27.924 < 2e-16 *** cx1 2.9292 0.4914 5.961 4.15e-08 *** cx2 1.3445 0.4207 3.196 0.00189 ** cx1:cx2 2.6651 1.2305 2.166 0.03281 * --- Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 Residual standard error: 1.111 on 96 degrees of freedom Multiple R-squared: 0.5577, Adjusted R-squared: 0.5439 F-statistic: 40.35 on 3 and 96 DF, p-value: < 2.2e-16
library(broom) mcmc <- as.matrix(data.rstanarm.mult) Xmat = model.matrix(~cx1 * cx2, data) coefs = mcmc[, c("(Intercept)", "cx1", "cx2", "cx1:cx2")] fit = coefs %*% t(Xmat) resid = sweep(fit, 2, data$y, "-") var_f = apply(fit, 1, var) var_e = apply(resid, 1, var) R2 = var_f/(var_f + var_e) tidyMCMC(as.mcmc(R2), conf.int = TRUE, conf.method = "HPDinterval")
term estimate std.error conf.low conf.high 1 var1 0.5517803 0.04514623 0.462438 0.6348679
# for comparison with frequentist summary(lm(y ~ cx1 * cx2, data))
Call: lm(formula = y ~ cx1 * cx2, data = data) Residuals: Min 1Q Median 3Q Max -2.34877 -0.85435 0.06905 0.71265 2.57068 Coefficients: Estimate Std. Error t value Pr(>|t|) (Intercept) 3.6710 0.1315 27.924 < 2e-16 *** cx1 2.9292 0.4914 5.961 4.15e-08 *** cx2 1.3445 0.4207 3.196 0.00189 ** cx1:cx2 2.6651 1.2305 2.166 0.03281 * --- Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 Residual standard error: 1.111 on 96 degrees of freedom Multiple R-squared: 0.5577, Adjusted R-squared: 0.5439 F-statistic: 40.35 on 3 and 96 DF, p-value: < 2.2e-16
library(broom) mcmc <- as.matrix(data.brms.mult) Xmat = model.matrix(~cx1 * cx2, data) coefs = mcmc[, c("b_Intercept", "b_cx1", "b_cx2", "b_cx1:cx2")] fit = coefs %*% t(Xmat) resid = sweep(fit, 2, data$y, "-") var_f = apply(fit, 1, var) var_e = apply(resid, 1, var) R2 = var_f/(var_f + var_e) tidyMCMC(as.mcmc(R2), conf.int = TRUE, conf.method = "HPDinterval")
term estimate std.error conf.low conf.high 1 var1 0.5519288 0.04630979 0.4565537 0.6331142
# for comparison with frequentist summary(lm(y ~ cx1 * cx2, data))
Call: lm(formula = y ~ cx1 * cx2, data = data) Residuals: Min 1Q Median 3Q Max -2.34877 -0.85435 0.06905 0.71265 2.57068 Coefficients: Estimate Std. Error t value Pr(>|t|) (Intercept) 3.6710 0.1315 27.924 < 2e-16 *** cx1 2.9292 0.4914 5.961 4.15e-08 *** cx2 1.3445 0.4207 3.196 0.00189 ** cx1:cx2 2.6651 1.2305 2.166 0.03281 * --- Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 Residual standard error: 1.111 on 96 degrees of freedom Multiple R-squared: 0.5577, Adjusted R-squared: 0.5439 F-statistic: 40.35 on 3 and 96 DF, p-value: < 2.2e-16
Bayesian model selection (Sparsity)
A statistical model is by definition a low-dimensional (=over simplification) representation of what is really likely to be a very complex system. As a result, no model is right. Some models however can provide useful insights into some of the processes operating on the system.
Frequentist statistics have various methods (Model selection, dredging, lasso, cross validation) for selecting parsimonious models. These are models that provide a good comprimise between minimizing unexplained patterns and minimizing model complexity. The basic premise is that since no model can hope to capture the full complexity of a system with all its subtleties, only the very major patterns can be estimated. Overly complex models are likely to be representing artificial complexity present only in the specific observed data (not the general population).
The Bayesian approach is to apply priors to the non-variance parameters such that parameters close to zero are further shrunk towards zero whilst priors on parameters further away from zero are less effected. The most popular form of prior for sparsity is the horseshoe prior, so called because the shape of a component of this prior resembles a horseshoe (with most of the mass either close to 0 or close to 1).
Rather than apply weakly informative Gaussian priors on parameters as: $$ \beta_j \sim{} N(0, \sigma^2) $$ the horseshoe prior is defined as: $$ \begin{align} \beta_j &\sim{} N(0, \tau^2\lambda_j^2)\\ \tau &\sim{} cauchy(0,1)\\ \lambda_j &\sim{} cauchy(0,1),\hspace{0.5cm}j=1,...,D\\ \end{align} $$ where $D$ is the number of (non-intercept or variance) parameters. $\tau$ represents the global scale that weights or shrinks all parameters towards zero and $\lambda_j$ are thick tailed local scales that allow some of the $j$ parameters to escape shrinkage.
More recently, Piironen and Vehtari (2017)
have argued that whilst the above horseshoe priors do guarantee that strong effects (parameters) will not be over-shrunk,
there is the potential for weekly identified effects (those based on relatively little data) to be misrepresented in the posteriors. As an
alternative Piironen and Vehtari (2017)
advocated the use of regularized horseshoe priors in which the amount of shrinkage applied to the largest effects can be
controlled.
$$ \begin{align} \beta_j &\sim{} N(0, \tau^2\tilde{\lambda_j}^2)\\ \tau &\sim{} cauchy(0,1)\\ \tilde{\lambda_j}^2 &=\frac{c^2\lambda_j^2}{c^2+\tau^2\lambda_j^2}\\ \lambda_j &\sim{} cauchy(0,1),\hspace{0.5cm}j=1,...,D\\ \end{align} $$ where $c$ (slab width, actually variance) is a constant. For small effects (when $\tau^2\lambda_j^2 \ll c^2$), the prior approaches a regular prior. However, for large effects (when $\tau^2\lambda_j^2 \gg c^2$), the prior approaches $N(0,c^2)$.
.. recommend applying a inverse-gamma prior on $c^2$ $$ \begin{align} c^2 &\sim{} Inv-Gamma(\alpha,\beta)\\ \alpha &=v/2\\ \beta &=vs^2/2\\ \end{align} $$
Not available for MCMCpack
Not available for JAGS
The following code is a slight modification of the code presented in Appendix C.1 of Piironen and Vehtari (2017)
.
data { int < lower =0 > n; # number of observations int < lower =0 > nX; # number of predictors vector [ n] Y; # outputs matrix [n ,nX] X; # inputs real < lower =0 > scale_icept ; # prior std for the intercept real < lower =0 > scale_global ; # scale for the half -t prior for tau real < lower =1 > nu_global ; # degrees of freedom for the half -t priors for tau real < lower =1 > nu_local ; # degrees of freedom for the half - t priors for lambdas real < lower =0 > slab_scale ; # slab scale for the regularized horseshoe real < lower =0 > slab_df ; # slab degrees of freedom for the regularized horseshoe } transformed data { matrix[n, nX - 1] Xc; // centered version of X vector[nX - 1] means_X; // column means of X before centering for (i in 2:nX) { means_X[i - 1] = mean(X[, i]); Xc[, i - 1] = X[, i] - means_X[i - 1]; } } parameters { real logsigma ; real cbeta0 ; vector [ nX-1] z; real < lower =0 > tau ; # global shrinkage parameter vector < lower =0 >[ nX-1] lambda ; # local shrinkage parameter real < lower =0 > caux ; } transformed parameters { real < lower =0 > sigma ; # noise std vector < lower =0 >[ nX-1] lambda_tilde ; # ’ truncated ’ local shrinkage parameter real < lower =0 > c; # slab scale vector [ nX-1] beta ; # regression coefficients vector [ n] mu; # latent function values sigma = exp ( logsigma ); c = slab_scale * sqrt ( caux ); lambda_tilde = sqrt ( c ^2 * square ( lambda ) ./ (c ^2 + tau ^2* square ( lambda )) ); beta = z .* lambda_tilde * tau ; mu = cbeta0 + Xc* beta ; } model { # half -t priors for lambdas and tau , and inverse - gamma for c ^2 z ~ normal (0 , 1); lambda ~ student_t ( nu_local , 0, 1); tau ~ student_t ( nu_global , 0 , scale_global * sigma ); caux ~ inv_gamma (0.5* slab_df , 0.5* slab_df ); cbeta0 ~ normal (0 , scale_icept ); Y ~ normal (mu , sigma ); } generated quantities { real beta0; // population-level intercept vector[n] log_lik; beta0 = cbeta0 - dot_product(means_X, beta); for (i in 1:n) { log_lik[i] = normal_lpdf(Y[i] | Xc[i] * beta + cbeta0, sigma); } }
X = model.matrix(~cx1 + cx2, data = data) data.list <- with(data, list(Y = y, X = X, nX = ncol(X), n = nrow(data), scale_icept = 100, scale_global = 1, nu_global = 1, nu_local = 1, slab_scale = 2, slab_df = 4)) data.rstan.sparsity <- stan(data = data.list, model_code = modelString, chains = 3, iter = 4000, warmup = 2000, thin = 3, save_dso = TRUE)
In file included from /usr/local/lib/R/site-library/BH/include/boost/config.hpp:39:0, from /usr/local/lib/R/site-library/BH/include/boost/math/tools/config.hpp:13, from /usr/local/lib/R/site-library/StanHeaders/include/stan/math/rev/core/var.hpp:7, from /usr/local/lib/R/site-library/StanHeaders/include/stan/math/rev/core/gevv_vvv_vari.hpp:5, from /usr/local/lib/R/site-library/StanHeaders/include/stan/math/rev/core.hpp:12, from /usr/local/lib/R/site-library/StanHeaders/include/stan/math/rev/mat.hpp:4, from /usr/local/lib/R/site-library/StanHeaders/include/stan/math.hpp:4, from /usr/local/lib/R/site-library/StanHeaders/include/src/stan/model/model_header.hpp:4, from file4d7250064089.cpp:8: /usr/local/lib/R/site-library/BH/include/boost/config/compiler/gcc.hpp:186:0: warning: "BOOST_NO_CXX11_RVALUE_REFERENCES" redefined # define BOOST_NO_CXX11_RVALUE_REFERENCES ^ <command-line>:0:0: note: this is the location of the previous definition SAMPLING FOR MODEL '00bfb1e363378528725b0dadb922f0fc' NOW (CHAIN 1). Gradient evaluation took 3.4e-05 seconds 1000 transitions using 10 leapfrog steps per transition would take 0.34 seconds. Adjust your expectations accordingly! Iteration: 1 / 4000 [ 0%] (Warmup) Iteration: 400 / 4000 [ 10%] (Warmup) Iteration: 800 / 4000 [ 20%] (Warmup) Iteration: 1200 / 4000 [ 30%] (Warmup) Iteration: 1600 / 4000 [ 40%] (Warmup) Iteration: 2000 / 4000 [ 50%] (Warmup) Iteration: 2001 / 4000 [ 50%] (Sampling) Iteration: 2400 / 4000 [ 60%] (Sampling) Iteration: 2800 / 4000 [ 70%] (Sampling) Iteration: 3200 / 4000 [ 80%] (Sampling) Iteration: 3600 / 4000 [ 90%] (Sampling) Iteration: 4000 / 4000 [100%] (Sampling) Elapsed Time: 0.421584 seconds (Warm-up) 0.418319 seconds (Sampling) 0.839903 seconds (Total) SAMPLING FOR MODEL '00bfb1e363378528725b0dadb922f0fc' NOW (CHAIN 2). Gradient evaluation took 1.7e-05 seconds 1000 transitions using 10 leapfrog steps per transition would take 0.17 seconds. Adjust your expectations accordingly! Iteration: 1 / 4000 [ 0%] (Warmup) Iteration: 400 / 4000 [ 10%] (Warmup) Iteration: 800 / 4000 [ 20%] (Warmup) Iteration: 1200 / 4000 [ 30%] (Warmup) Iteration: 1600 / 4000 [ 40%] (Warmup) Iteration: 2000 / 4000 [ 50%] (Warmup) Iteration: 2001 / 4000 [ 50%] (Sampling) Iteration: 2400 / 4000 [ 60%] (Sampling) Iteration: 2800 / 4000 [ 70%] (Sampling) Iteration: 3200 / 4000 [ 80%] (Sampling) Iteration: 3600 / 4000 [ 90%] (Sampling) Iteration: 4000 / 4000 [100%] (Sampling) Elapsed Time: 0.532437 seconds (Warm-up) 0.429048 seconds (Sampling) 0.961485 seconds (Total) SAMPLING FOR MODEL '00bfb1e363378528725b0dadb922f0fc' NOW (CHAIN 3). Gradient evaluation took 1.7e-05 seconds 1000 transitions using 10 leapfrog steps per transition would take 0.17 seconds. Adjust your expectations accordingly! Iteration: 1 / 4000 [ 0%] (Warmup) Iteration: 400 / 4000 [ 10%] (Warmup) Iteration: 800 / 4000 [ 20%] (Warmup) Iteration: 1200 / 4000 [ 30%] (Warmup) Iteration: 1600 / 4000 [ 40%] (Warmup) Iteration: 2000 / 4000 [ 50%] (Warmup) Iteration: 2001 / 4000 [ 50%] (Sampling) Iteration: 2400 / 4000 [ 60%] (Sampling) Iteration: 2800 / 4000 [ 70%] (Sampling) Iteration: 3200 / 4000 [ 80%] (Sampling) Iteration: 3600 / 4000 [ 90%] (Sampling) Iteration: 4000 / 4000 [100%] (Sampling) Elapsed Time: 0.434669 seconds (Warm-up) 0.388257 seconds (Sampling) 0.822926 seconds (Total)
tidyMCMC(data.rstan.sparsity, pars = c("beta[1]", "beta[2]"), conf.int = TRUE, conf.type = "HPDinterval", rhat = TRUE, ess = TRUE)
term estimate std.error conf.low conf.high rhat ess 1 beta[1] 2.895646 0.5093930 1.9116555 3.920760 1.0001824 2001 2 beta[2] 1.375823 0.4562855 0.4740256 2.260056 0.9992755 1865
library(bayesplot) mcmc_areas(as.matrix(data.rstan.sparsity), regex_par = "beta")
Obviously, these data are not really appropriate for model selection as there are only two predictors. Both predictors have substantial effects mass larger than zero.
The RSTANARM implementation of horseshoe priors follows closely the recommendations of Piironen and Vehtari (2017)
.
In particular, the global scale parameter is provided as a ratio of expected number of non-zero coefficients to expected number of zero coefficients.
n = nrow(data) nX = 2 p0 = 1 global_scale = p0/(nX - p0)/sqrt(n) data.rstanarm.sparsity = stan_glm(y ~ cx1 + cx2, data = data, iter = 2000, warmup = 200, chains = 3, thin = 2, refresh = 0, prior_intercept = normal(0, 100), prior = hs(df = 1, global_df = 1, global_scale = global_scale), prior_aux = cauchy(0, 2))
Gradient evaluation took 6e-05 seconds 1000 transitions using 10 leapfrog steps per transition would take 0.6 seconds. Adjust your expectations accordingly! Elapsed Time: 0.759242 seconds (Warm-up) 3.96208 seconds (Sampling) 4.72132 seconds (Total) Gradient evaluation took 2e-05 seconds 1000 transitions using 10 leapfrog steps per transition would take 0.2 seconds. Adjust your expectations accordingly! Elapsed Time: 0.656009 seconds (Warm-up) 3.67586 seconds (Sampling) 4.33187 seconds (Total) Gradient evaluation took 1.8e-05 seconds 1000 transitions using 10 leapfrog steps per transition would take 0.18 seconds. Adjust your expectations accordingly! Elapsed Time: 0.743072 seconds (Warm-up) 2.80689 seconds (Sampling) 3.54996 seconds (Total)
print(data.rstanarm.sparsity)
stan_glm family: gaussian [identity] formula: y ~ cx1 + cx2 ------ Estimates: Median MAD_SD (Intercept) 3.8 0.1 cx1 3.0 0.6 cx2 1.2 0.5 sigma 1.1 0.1 Sample avg. posterior predictive distribution of y (X = xbar): Median MAD_SD mean_PPD 3.8 0.2 ------ For info on the priors used see help('prior_summary.stanreg').
tidyMCMC(data.rstanarm.sparsity$stanfit, conf.int = TRUE, conf.method = "HPDinterval", rhat = TRUE, ess = TRUE)
term estimate std.error conf.low conf.high rhat ess 1 (Intercept) 3.824113 0.11437874 3.5946995 4.042791 0.9993167 2673 2 cx1 3.053599 0.55226650 1.9556994 4.081797 1.0007825 2423 3 cx2 1.209863 0.49005692 0.1433270 2.096745 1.0024060 2196 4 sigma 1.154752 0.08662372 0.9797404 1.310512 0.9995021 2389 5 mean_PPD 3.826130 0.16242530 3.4951206 4.121557 0.9996932 2652 6 log-posterior -179.717342 2.43657813 -184.8057338 -175.828604 1.0013223 1438
library(bayesplot) mcmc_areas(as.matrix(data.rstanarm.sparsity), regex_par = "cx")
Obviously, these data are not really appropriate for model selection as there are only two predictors. Both predictors have substantial effects mass larger than zero.
The BRMS implementation of horseshoe priors follows closely the recommendations of Piironen and Vehtari (2017)
.
In particular, the global scale parameter is provided as a ratio of expected number of non-zero coefficients to expected number of zero coefficients.
nX = 2 p0 = 1 par_ratio = p0/(nX - p0) data.brms.sparsity = brm(y ~ cx1 + cx2, data = data, iter = 2000, warmup = 200, chains = 3, thin = 2, refresh = 0, prior = c(prior(normal(0, 100), class = "Intercept"), prior(horseshoe(df = 1, par_ratio = par_ratio), class = "b"), prior(cauchy(0, 5), class = "sigma")))
Gradient evaluation took 3.7e-05 seconds 1000 transitions using 10 leapfrog steps per transition would take 0.37 seconds. Adjust your expectations accordingly! Elapsed Time: 0.116006 seconds (Warm-up) 0.635486 seconds (Sampling) 0.751492 seconds (Total) Gradient evaluation took 1.7e-05 seconds 1000 transitions using 10 leapfrog steps per transition would take 0.17 seconds. Adjust your expectations accordingly! Elapsed Time: 0.122749 seconds (Warm-up) 0.606301 seconds (Sampling) 0.72905 seconds (Total) Gradient evaluation took 1.9e-05 seconds 1000 transitions using 10 leapfrog steps per transition would take 0.19 seconds. Adjust your expectations accordingly! Elapsed Time: 0.12116 seconds (Warm-up) 0.50536 seconds (Sampling) 0.62652 seconds (Total)
print(data.brms.sparsity)
Family: gaussian(identity) Formula: y ~ cx1 + cx2 Data: data (Number of observations: 100) Samples: 3 chains, each with iter = 2000; warmup = 200; thin = 2; total post-warmup samples = 2700 ICs: LOO = NA; WAIC = NA; R2 = NA Population-Level Effects: Estimate Est.Error l-95% CI u-95% CI Eff.Sample Rhat Intercept 3.82 0.11 3.60 4.05 2419 1 cx1 2.96 0.53 1.95 4.03 2437 1 cx2 1.31 0.47 0.29 2.17 1815 1 Family Specific Parameters: Estimate Est.Error l-95% CI u-95% CI Eff.Sample Rhat sigma 1.15 0.08 1 1.33 2301 1 Samples were drawn using sampling(NUTS). For each parameter, Eff.Sample is a crude measure of effective sample size, and Rhat is the potential scale reduction factor on split chains (at convergence, Rhat = 1).
tidyMCMC(data.brms.sparsity$fit, conf.int = TRUE, conf.method = "HPDinterval", rhat = TRUE, ess = TRUE)
term estimate std.error conf.low conf.high rhat ess 1 b_Intercept 3.818970 0.11359657 3.6086274 4.052044 0.9995747 2419 2 b_cx1 2.958484 0.52869532 1.9568866 4.031718 0.9999373 2437 3 b_cx2 1.305360 0.46931653 0.3701380 2.200090 1.0011298 1815 4 sigma 1.152482 0.08273292 0.9900069 1.317621 1.0017566 2301 5 hs_c2 2.183763 2.35770828 0.2967781 6.056975 1.0020866 1340
library(bayesplot) mcmc_areas(as.matrix(data.brms.sparsity), regex_par = "cx")
Obviously, these data are not really appropriate for model selection as there are only two predictors. Both predictors have substantial effects mass larger than zero.
References
Gelman, A. (2005). “Analysis of Variance - Why it is More Important Than Ever”. In: The Annals of Statistics 33.1, pp. 1–53.
Gelman, A., B. Goodrich, J. Gabry, et al. (2017). “R-squared for Bayesian regression models”.
Piironen, J. and A. Vehtari (2017). Sparsity information and regularization in the horseshoe and other shrinkage priors. URL: http://arxiv.org/abs/1707.01694.
Vehtari, A, A. Gelman and J. Gabry (2016b). “Practical Bayesian model evaluation using leave-one-out cross-validation and WAIC”. In: Statistics and Computing.
Worked Examples
- Logan (2010) - Chpt 9
- Quinn & Keough (2002) - Chpt 6
Multiple Linear Regression
Paruelo & Lauenroth (1996) analyzed the geographic distribution and the effects of climate variables on the relative abundance of a number of plant functional types (PFT's) including shrubs, forbs, succulents (e.g. cacti), C3 grasses and C4 grasses. They used data from 73 sites across temperate central North America (see pareulo.syd) and calculated the relative abundance of C3 grasses at each site as a response variable
Download Paruelo data setFormat of paruelo.csv data file | |||||||||||||||||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
|
paruelo <- read.table("../downloads/data/paruelo.csv", header = T, sep = ",", strip.white = T) head(paruelo)
C3 LAT LONG MAP MAT JJAMAP DJFMAP 1 0.65 46.40 119.55 199 12.4 0.12 0.45 2 0.65 47.32 114.27 469 7.5 0.24 0.29 3 0.76 45.78 110.78 536 7.2 0.24 0.20 4 0.75 43.95 101.87 476 8.2 0.35 0.15 5 0.33 46.90 102.82 484 4.8 0.40 0.14 6 0.03 38.87 99.38 623 12.0 0.40 0.11
-
Perform exploratory data analysis to help guide what sort of analysis will be suitable
and whether the various assumptions are likely to be met.
# via car's scatterplotMatrix function library(car) scatterplotMatrix(~C3 + LAT + LONG + MAP + MAT + JJAMAP + DJFMAP, data = paruelo, diagonal = "boxplot")
# via lattice library(lattice) splom.lat <- splom(paruelo, type = c("p", "r")) print(splom.lat)
# via ggplot2 - warning these are slow! library(GGally) ggpairs(paruelo, lower = list(continuous = "smooth"), diag = list(continuous = "density"), axisLabels = "none")
# splom.gg <- plotmatrix(paruelo)+geom_smooth(method='lm') # print(splom.gg)
-
C3 abundance is clearly non-normal. Since C3 abundance is relative abundance (which logically must range from 0 to 1),
arguably, the most appropriate approach would be to model these data with a binomial (or perhaps beta) distribution.
Indeed, this is the approach that we will take in Tutorial 10.4 and Tutorial 10.5a
A more simplistic approach that can be applied within simple OLS regression, is to attempt to normalize the response variable via a scale transformation.
Since the C3 relative abundances have values of zero, the authors elected to perform a square-root transformation. Generally speaking, this can be a very dangerous course of action if back-transformations from the fitted model are required due to the nature of squaring sets of numbers that are a mixture of negatives and positives or even less than 1 and greater than 1.
This example therefore potentially serves as a good example of the dangers of root transformations. Try applying a temporary square root transformation (HINT). Does this improve some of these specific assumptions (y or n)?
Whilst in many model fitting and graphing routines are able to perform transformation inline, for more complex examples, it is often advisable to also create transformed versions of variables.
# via car's scatterplotMatrix function library(car) scatterplotMatrix(~sqrt(C3) + LAT + LONG + MAP + MAT + JJAMAP + log10(DJFMAP), data = paruelo, diagonal = "boxplot")
# via ggplot2 - warning these are slow! library(GGally) library(dplyr) paruelo = paruelo %>% mutate(sqrtC3 = sqrt(C3), lDJFMAP = log10(DJFMAP)) paruelo %>% dplyr:::select(sqrtC3, LAT, LONG, MAP, MAT, lDJFMAP) %>% ggpairs(lower = list(continuous = "smooth"), diag = list(continuous = "density"), axisLabels = "none")
- The scatterplot matrices suggest that some of the predictors might be correlated to one another.
Although the above diagnostics are useful at identifying potential (Multi)collinearity issues, they do not examine collinearity
directly. (Multi)collinearity can more be diagnosed directly via
tolerance and variance inflation factor (VIF) measures.
- Calculate the VIF values for each of the predictor variables (note, this is typically done in a frequentist framework to save time HINT).
- Calculate the tolerance values for each of the predictor variables (HINT).
library(car) vif(lm(sqrt(C3) ~ LAT + LONG + MAP + MAT + JJAMAP + log10(DJFMAP), data = paruelo))
LAT LONG MAP MAT JJAMAP log10(DJFMAP) 3.560103 4.988318 2.794157 3.752353 3.194724 5.467330
library(car) 1/vif(lm(sqrt(C3) ~ LAT + LONG + MAP + MAT + JJAMAP + log10(DJFMAP), data = paruelo))
LAT LONG MAP MAT JJAMAP log10(DJFMAP) 0.2808908 0.2004684 0.3578897 0.2664995 0.3130161 0.1829046
- Obviously, this model will violate collinearity.
It is highly likely that LAT and LONG will be related to the LAT:LONG interaction term. It turns out that if we center the variables, then the individual terms will no longer be correlated to the interaction. Center the LAT and LONG variables (
HINT) and
(HINT)
paruelo = paruelo %>% mutate(cLAT = as.vector(scale(paruelo$LAT, scale = F)), cLONG = as.vector(scale(paruelo$LONG, scale = F))) mean.LAT = mean(paruelo$LAT) mean.LONG = mean(paruelo$LONG)
-
Fit the appropriate Bayesian model.
library(MCMCpack) paruelo.mcmcpack = MCMCregress(sqrt(C3) ~ cLAT * cLONG, data = paruelo)
modelString = " model { #Likelihood for (i in 1:n) { y[i]~dnorm(mu[i],tau) mu[i] <- beta0 + inprod(beta[],X[i,]) } #Priors beta0 ~ dnorm(0.01,1.0E-6) for (j in 1:nX) { beta[j] ~ dnorm(0.01,1.0E-6) } tau <- 1 / (sigma * sigma) sigma~dunif(0,100) } " X = model.matrix(~cLAT * cLONG, data = paruelo) paruelo.list <- with(paruelo, list(y = sqrt(C3), X = X[, -1], nX = ncol(X) - 1, n = nrow(paruelo))) params <- c("beta0", "beta", "sigma") burnInSteps = 3000 nChains = 3 numSavedSteps = 15000 thinSteps = 10 nIter = ceiling((numSavedSteps * thinSteps)/nChains) paruelo.r2jags <- jags(data = paruelo.list, inits = NULL, parameters.to.save = params, model.file = textConnection(modelString), n.chains = nChains, n.iter = nIter, n.burnin = burnInSteps, n.thin = thinSteps)
Compiling model graph Resolving undeclared variables Allocating nodes Graph information: Observed stochastic nodes: 73 Unobserved stochastic nodes: 5 Total graph size: 530 Initializing model
modelString=" data { int
n; // total number of observations vector[n] Y; // response variable int nX; // number of effects matrix[n, nX] X; // model matrix } transformed data { matrix[n, nX - 1] Xc; // centered version of X vector[nX - 1] means_X; // column means of X before centering for (i in 2:nX) { means_X[i - 1] = mean(X[, i]); Xc[, i - 1] = X[, i] - means_X[i - 1]; } } parameters { vector[nX-1] beta; // population-level effects real cbeta0; // center-scale intercept real sigma; // residual SD } transformed parameters { } model { vector[n] mu; mu = Xc * beta + cbeta0; // prior specifications beta ~ normal(0, 10); cbeta0 ~ normal(0, 10); sigma ~ cauchy(0, 5); // likelihood contribution Y ~ normal(mu, sigma); } generated quantities { real beta0; // population-level intercept vector[n] log_lik; beta0 = cbeta0 - dot_product(means_X, beta); for (i in 1:n) { log_lik[i] = normal_lpdf(Y[i] | Xc[i] * beta + cbeta0, sigma); } } " X = model.matrix(~cLAT * cLONG, data = paruelo) paruelo.list <- with(paruelo, list(Y = sqrt(C3), X = X, nX = ncol(X), n = nrow(paruelo))) library(rstan) paruelo.rstan <- stan(data = paruelo.list, model_code = modelString, chains = 3, iter = 5000, warmup = 500, thin = 2)
In file included from /usr/local/lib/R/site-library/BH/include/boost/config.hpp:39:0, from /usr/local/lib/R/site-library/BH/include/boost/math/tools/config.hpp:13, from /usr/local/lib/R/site-library/StanHeaders/include/stan/math/rev/core/var.hpp:7, from /usr/local/lib/R/site-library/StanHeaders/include/stan/math/rev/core/gevv_vvv_vari.hpp:5, from /usr/local/lib/R/site-library/StanHeaders/include/stan/math/rev/core.hpp:12, from /usr/local/lib/R/site-library/StanHeaders/include/stan/math/rev/mat.hpp:4, from /usr/local/lib/R/site-library/StanHeaders/include/stan/math.hpp:4, from /usr/local/lib/R/site-library/StanHeaders/include/src/stan/model/model_header.hpp:4, from file48e38d66a6e.cpp:8: /usr/local/lib/R/site-library/BH/include/boost/config/compiler/gcc.hpp:186:0: warning: "BOOST_NO_CXX11_RVALUE_REFERENCES" redefined # define BOOST_NO_CXX11_RVALUE_REFERENCES ^ <command-line>:0:0: note: this is the location of the previous definition SAMPLING FOR MODEL 'd98dbf6a02725fc3fce11306b77873e9' NOW (CHAIN 1). Gradient evaluation took 2.9e-05 seconds 1000 transitions using 10 leapfrog steps per transition would take 0.29 seconds. Adjust your expectations accordingly! Iteration: 1 / 5000 [ 0%] (Warmup) Iteration: 500 / 5000 [ 10%] (Warmup) Iteration: 501 / 5000 [ 10%] (Sampling) Iteration: 1000 / 5000 [ 20%] (Sampling) Iteration: 1500 / 5000 [ 30%] (Sampling) Iteration: 2000 / 5000 [ 40%] (Sampling) Iteration: 2500 / 5000 [ 50%] (Sampling) Iteration: 3000 / 5000 [ 60%] (Sampling) Iteration: 3500 / 5000 [ 70%] (Sampling) Iteration: 4000 / 5000 [ 80%] (Sampling) Iteration: 4500 / 5000 [ 90%] (Sampling) Iteration: 5000 / 5000 [100%] (Sampling) Elapsed Time: 0.139951 seconds (Warm-up) 0.643681 seconds (Sampling) 0.783632 seconds (Total) SAMPLING FOR MODEL 'd98dbf6a02725fc3fce11306b77873e9' NOW (CHAIN 2). Gradient evaluation took 1.1e-05 seconds 1000 transitions using 10 leapfrog steps per transition would take 0.11 seconds. Adjust your expectations accordingly! Iteration: 1 / 5000 [ 0%] (Warmup) Iteration: 500 / 5000 [ 10%] (Warmup) Iteration: 501 / 5000 [ 10%] (Sampling) Iteration: 1000 / 5000 [ 20%] (Sampling) Iteration: 1500 / 5000 [ 30%] (Sampling) Iteration: 2000 / 5000 [ 40%] (Sampling) Iteration: 2500 / 5000 [ 50%] (Sampling) Iteration: 3000 / 5000 [ 60%] (Sampling) Iteration: 3500 / 5000 [ 70%] (Sampling) Iteration: 4000 / 5000 [ 80%] (Sampling) Iteration: 4500 / 5000 [ 90%] (Sampling) Iteration: 5000 / 5000 [100%] (Sampling) Elapsed Time: 0.108938 seconds (Warm-up) 0.497058 seconds (Sampling) 0.605996 seconds (Total) SAMPLING FOR MODEL 'd98dbf6a02725fc3fce11306b77873e9' NOW (CHAIN 3). Gradient evaluation took 9e-06 seconds 1000 transitions using 10 leapfrog steps per transition would take 0.09 seconds. Adjust your expectations accordingly! Iteration: 1 / 5000 [ 0%] (Warmup) Iteration: 500 / 5000 [ 10%] (Warmup) Iteration: 501 / 5000 [ 10%] (Sampling) Iteration: 1000 / 5000 [ 20%] (Sampling) Iteration: 1500 / 5000 [ 30%] (Sampling) Iteration: 2000 / 5000 [ 40%] (Sampling) Iteration: 2500 / 5000 [ 50%] (Sampling) Iteration: 3000 / 5000 [ 60%] (Sampling) Iteration: 3500 / 5000 [ 70%] (Sampling) Iteration: 4000 / 5000 [ 80%] (Sampling) Iteration: 4500 / 5000 [ 90%] (Sampling) Iteration: 5000 / 5000 [100%] (Sampling) Elapsed Time: 0.099245 seconds (Warm-up) 0.667992 seconds (Sampling) 0.767237 seconds (Total)
paruelo.rstanarm = stan_glm(sqrt(C3) ~ cLAT * cLONG, data = paruelo, iter = 5000, warmup = 500, chains = 3, thin = 2, refresh = 0, prior_intercept = normal(0, 10), prior = normal(0, 10), prior_aux = cauchy(0, 5))
Gradient evaluation took 3.9e-05 seconds 1000 transitions using 10 leapfrog steps per transition would take 0.39 seconds. Adjust your expectations accordingly! Elapsed Time: 0.043872 seconds (Warm-up) 0.263119 seconds (Sampling) 0.306991 seconds (Total) Gradient evaluation took 1.3e-05 seconds 1000 transitions using 10 leapfrog steps per transition would take 0.13 seconds. Adjust your expectations accordingly! Elapsed Time: 0.042429 seconds (Warm-up) 0.251179 seconds (Sampling) 0.293608 seconds (Total) Gradient evaluation took 1.5e-05 seconds 1000 transitions using 10 leapfrog steps per transition would take 0.15 seconds. Adjust your expectations accordingly! Elapsed Time: 0.045349 seconds (Warm-up) 0.273638 seconds (Sampling) 0.318987 seconds (Total)
paruelo.brm = brm(sqrt(C3) ~ cLAT * cLONG, data = paruelo, iter = 5000, warmup = 500, chains = 3, thin = 2, refresh = 0, prior = c(prior(normal(0, 10), class = "Intercept"), prior(normal(0, 10), class = "b"), prior(cauchy(0, 5), class = "sigma")))
Gradient evaluation took 3.4e-05 seconds 1000 transitions using 10 leapfrog steps per transition would take 0.34 seconds. Adjust your expectations accordingly! Elapsed Time: 0.097905 seconds (Warm-up) 0.553687 seconds (Sampling) 0.651592 seconds (Total) Gradient evaluation took 9e-06 seconds 1000 transitions using 10 leapfrog steps per transition would take 0.09 seconds. Adjust your expectations accordingly! Elapsed Time: 0.09545 seconds (Warm-up) 0.488293 seconds (Sampling) 0.583743 seconds (Total) Gradient evaluation took 1.9e-05 seconds 1000 transitions using 10 leapfrog steps per transition would take 0.19 seconds. Adjust your expectations accordingly! Elapsed Time: 0.099694 seconds (Warm-up) 0.486619 seconds (Sampling) 0.586313 seconds (Total)
- Explore MCMC diagnostics
library(MCMCpack) plot(paruelo.mcmcpack)
raftery.diag(paruelo.mcmcpack)
Quantile (q) = 0.025 Accuracy (r) = +/- 0.005 Probability (s) = 0.95 Burn-in Total Lower bound Dependence (M) (N) (Nmin) factor (I) (Intercept) 2 3834 3746 1.020 cLAT 2 3865 3746 1.030 cLONG 2 3741 3746 0.999 cLAT:cLONG 2 3741 3746 0.999 sigma2 2 3711 3746 0.991
autocorr.diag(paruelo.mcmcpack)
(Intercept) cLAT cLONG cLAT:cLONG sigma2 Lag 0 1.000000000 1.000000000 1.000000000 1.0000000000 1.0000000000 Lag 1 0.004983757 0.003311830 -0.022298086 -0.0095335071 0.0453269793 Lag 5 0.004865672 -0.003696004 -0.010582477 -0.0091698711 -0.0007370183 Lag 10 -0.003554399 -0.006906004 -0.008250076 0.0070361094 0.0060836234 Lag 50 0.006350222 0.002874780 0.003297339 -0.0005618509 0.0095939327
library(R2jags) library(coda) paruelo.mcmc = as.mcmc(paruelo.r2jags) plot(paruelo.mcmc)
raftery.diag(paruelo.mcmc)
[[1]] Quantile (q) = 0.025 Accuracy (r) = +/- 0.005 Probability (s) = 0.95 Burn-in Total Lower bound Dependence (M) (N) (Nmin) factor (I) beta0 20 39680 3746 10.6 beta[1] 20 39000 3746 10.4 beta[2] 10 37660 3746 10.1 beta[3] 10 37660 3746 10.1 deviance 10 37660 3746 10.1 sigma 20 38330 3746 10.2 [[2]] Quantile (q) = 0.025 Accuracy (r) = +/- 0.005 Probability (s) = 0.95 Burn-in Total Lower bound Dependence (M) (N) (Nmin) factor (I) beta0 10 37660 3746 10.10 beta[1] 20 37020 3746 9.88 beta[2] 10 37660 3746 10.10 beta[3] 20 39680 3746 10.60 deviance 20 39680 3746 10.60 sigma 10 37670 3746 10.10 [[3]] Quantile (q) = 0.025 Accuracy (r) = +/- 0.005 Probability (s) = 0.95 Burn-in Total Lower bound Dependence (M) (N) (Nmin) factor (I) beta0 20 36380 3746 9.71 beta[1] 20 37020 3746 9.88 beta[2] 10 37660 3746 10.10 beta[3] 20 38330 3746 10.20 deviance 20 38330 3746 10.20 sigma 10 37660 3746 10.10
autocorr.diag(paruelo.mcmc)
beta0 beta[1] beta[2] beta[3] deviance sigma Lag 0 1.000000000 1.000000000 1.000000000 1.000000000 1.000000000 1.000000000 Lag 10 0.010786700 0.005332811 -0.009595787 0.002435309 0.009380278 0.006592760 Lag 50 0.004340032 -0.002413859 -0.008135064 -0.005542328 -0.003773973 -0.002048602 Lag 100 0.003071248 -0.003459770 -0.005733956 0.007969559 0.004984708 0.006517454 Lag 500 -0.016464095 -0.017355263 0.004811424 -0.011833396 -0.004070388 0.017952813
library(rstan) library(coda) s = as.array(paruelo.rstan) paruelo.mcmc <- do.call(mcmc.list, plyr:::alply(s[, , c("beta0", "beta[1]", "beta[2]", "sigma")], 2, as.mcmc)) plot(paruelo.mcmc)
raftery.diag(paruelo.mcmc)
$`1` Quantile (q) = 0.025 Accuracy (r) = +/- 0.005 Probability (s) = 0.95 You need a sample size of at least 3746 with these values of q, r and s $`2` Quantile (q) = 0.025 Accuracy (r) = +/- 0.005 Probability (s) = 0.95 You need a sample size of at least 3746 with these values of q, r and s $`3` Quantile (q) = 0.025 Accuracy (r) = +/- 0.005 Probability (s) = 0.95 You need a sample size of at least 3746 with these values of q, r and s
autocorr.diag(paruelo.mcmc)
beta0 beta[1] beta[2] sigma Lag 0 1.000000000 1.000000000 1.000000000 1.000000000 Lag 1 0.115904479 0.006334995 0.007386692 0.112083619 Lag 5 -0.039772609 0.006114161 -0.012930676 -0.015401901 Lag 10 0.009567958 -0.004416943 -0.001078328 0.001545453 Lag 50 0.024857904 0.027040761 -0.004307904 -0.026240662
library(rstan) library(coda) stan_ac(paruelo.rstan, pars = c("beta", "sigma"))
stan_rhat(paruelo.rstan, pars = c("beta", "sigma"))
stan_ess(paruelo.rstan, pars = c("beta", "sigma"))
# using Bayeseplot library(bayesplot) detach("package:reshape") mcmc_trace(as.array(paruelo.rstan), regex_par = "beta|sigma")
mcmc_trace(as.array(paruelo.rstan), regex_pars = "beta|sigma")
mcmc_dens(as.array(paruelo.rstan), regex_par = "beta|sigma")
detach("package:reshape") library(bayesplot) mcmc_combo(as.array(paruelo.rstan), regex_par = "beta|sigma")
library(rstanarm) library(coda) s = as.array(paruelo.rstanarm) paruelo.mcmc <- do.call(mcmc.list, plyr:::alply(s[, , c("(Intercept)", "cLAT", "cLONG", "cLAT:cLONG", "sigma")], 2, as.mcmc)) plot(paruelo.mcmc)
raftery.diag(paruelo.mcmc)
$`1` Quantile (q) = 0.025 Accuracy (r) = +/- 0.005 Probability (s) = 0.95 You need a sample size of at least 3746 with these values of q, r and s $`2` Quantile (q) = 0.025 Accuracy (r) = +/- 0.005 Probability (s) = 0.95 You need a sample size of at least 3746 with these values of q, r and s $`3` Quantile (q) = 0.025 Accuracy (r) = +/- 0.005 Probability (s) = 0.95 You need a sample size of at least 3746 with these values of q, r and s
autocorr.diag(paruelo.mcmc)
(Intercept) cLAT cLONG cLAT:cLONG sigma Lag 0 1.000000000 1.00000000 1.000000000 1.00000000 1.000000000 Lag 1 0.017538076 0.03784572 0.039597169 0.01947417 0.013387926 Lag 5 -0.022630503 -0.01672235 -0.006182728 -0.03032264 -0.002027127 Lag 10 0.031736657 0.01612416 0.015318241 -0.01570669 -0.022527541 Lag 50 -0.005001375 0.02067267 -0.015023796 -0.02282986 0.008570300
library(rstanarm) library(coda) stan_ac(paruelo.rstanarm, pars = c("Intercept", "cL", "sigma"))
Error in data.frame(value = unlist(x[[i]], use.names = FALSE), parameter = rep(names(x[[i]]), : arguments imply differing number of rows: 2249, 0, 1
stan_rhat(paruelo.rstanarm, pars = c("Intercept", "cL", "sigma"))
Error in check_pars(allpars, pars): no parameter Intercept, cL
stan_ess(paruelo.rstanarm, pars = c("Intercept", "cL", "sigma"))
Error in check_pars(allpars, pars): no parameter Intercept, cL
# using Bayeseplot library(bayesplot) detach("package:reshape") mcmc_trace(as.array(paruelo.rstanarm), regex_par = "Intercept|cL|sigma")
mcmc_trace(as.array(paruelo.rstanarm), regex_pars = "Intercept|cL|sigma")
mcmc_dens(as.array(paruelo.rstanarm), regex_pars = "Intercept|cL|sigma")
detach("package:reshape") library(bayesplot) mcmc_combo(as.array(paruelo.rstanarm), regex_par = "Intercept|cL|sigma")
library(coda) library(brms) paruelo.mcmc = as.mcmc(paruelo.brm) plot(paruelo.mcmc)
raftery.diag(paruelo.mcmc)
[[1]] Quantile (q) = 0.025 Accuracy (r) = +/- 0.005 Probability (s) = 0.95 You need a sample size of at least 3746 with these values of q, r and s [[2]] Quantile (q) = 0.025 Accuracy (r) = +/- 0.005 Probability (s) = 0.95 You need a sample size of at least 3746 with these values of q, r and s [[3]] Quantile (q) = 0.025 Accuracy (r) = +/- 0.005 Probability (s) = 0.95 You need a sample size of at least 3746 with these values of q, r and s
autocorr.diag(paruelo.mcmc)
Error in ts(x, start = start(x), end = end(x), deltat = thin(x)): invalid time series parameters specified
library(coda) stan_ac(paruelo.brm$fit)
stan_rhat(paruelo.brm$fit)
stan_ess(paruelo.brm$fit)
- Perform model validation
library(MCMCpack) paruelo.mcmc = as.data.frame(paruelo.mcmcpack) # generate a model matrix newdata = paruelo Xmat = model.matrix(~cLAT * cLONG, newdata) ## get median parameter estimates coefs = apply(paruelo.mcmc[, c("(Intercept)", "cLAT", "cLONG", "cLAT:cLONG")], 2, median) fit = as.vector(coefs %*% t(Xmat)) resid = sqrt(paruelo$C3) - fit ggplot() + geom_point(data = NULL, aes(y = resid, x = fit))
library(MCMCpack) paruelo.mcmc = as.data.frame(paruelo.mcmcpack) # generate a model matrix newdata = paruelo Xmat = model.matrix(~cLAT * cLONG, newdata) ## get median parameter estimates coefs = apply(paruelo.mcmc[, c("(Intercept)", "cLAT", "cLONG", "cLAT:cLONG")], 2, median) fit = as.vector(coefs %*% t(Xmat)) resid = sqrt(paruelo$C3) - fit newdata = newdata %>% cbind(fit, resid) newdata.melt = newdata %>% gather(key = Pred, value = Value, cLAT:cLONG) ggplot(newdata.melt) + geom_point(aes(y = resid, x = Value)) + facet_wrap(~Pred)
library(MCMCpack) paruelo.mcmc = as.data.frame(paruelo.mcmcpack) # generate a model matrix newdata = paruelo Xmat = model.matrix(~cLAT * cLONG, newdata) ## get median parameter estimates coefs = apply(paruelo.mcmc[, c("(Intercept)", "cLAT", "cLONG", "cLAT:cLONG")], 2, median) fit = as.vector(coefs %*% t(Xmat)) resid = sqrt(paruelo$C3) - fit sresid = resid/sd(resid) ggplot() + geom_point(data = NULL, aes(y = sresid, x = fit))
library(MCMCpack) paruelo.mcmc = as.data.frame(paruelo.mcmcpack) # generate a model matrix newdata = paruelo Xmat = model.matrix(~cLAT * cLONG, newdata) ## get median parameter estimates coefs = as.matrix(paruelo.mcmc[, c("(Intercept)", "cLAT", "cLONG", "cLAT:cLONG")]) fit = coefs %*% t(Xmat) ## draw samples from this model yRep = sapply(1:nrow(paruelo.mcmc), function(i) rnorm(nrow(paruelo), fit[i, ], sqrt(paruelo.mcmc[i, "sigma2"]))) ggplot() + geom_density(data = NULL, aes(x = as.vector(yRep), fill = "Model"), alpha = 0.5) + geom_density(data = paruelo, aes(x = sqrt(C3), fill = "Obs"), alpha = 0.5)
We can also explore the posteriors of each parameter.
library(bayesplot) mcmc_intervals(as.matrix(paruelo.mcmcpack), regex_pars = "cL")
mcmc_areas(as.matrix(paruelo.mcmcpack), regex_pars = "cL")
library(R2jags) paruelo.mcmc = paruelo.r2jags$BUGSoutput$sims.matrix # generate a model matrix newdata = paruelo Xmat = model.matrix(~cLAT * cLONG, newdata) ## get median parameter estimates coefs = apply(paruelo.mcmc[, c("beta0", "beta[1]", "beta[2]", "beta[3]")], 2, median) fit = as.vector(coefs %*% t(Xmat)) resid = sqrt(paruelo$C3) - fit ggplot() + geom_point(data = NULL, aes(y = resid, x = fit))
paruelo.mcmc = paruelo.r2jags$BUGSoutput$sims.matrix # generate a model matrix newdata = paruelo Xmat = model.matrix(~cLAT * cLONG, newdata) ## get median parameter estimates coefs = apply(paruelo.mcmc[, c("beta0", "beta[1]", "beta[2]", "beta[3]")], 2, median) fit = as.vector(coefs %*% t(Xmat)) resid = sqrt(paruelo$C3) - fit newdata = newdata %>% cbind(fit, resid) newdata.melt = newdata %>% gather(key = Pred, value = Value, cLAT:cLONG) ggplot(newdata.melt) + geom_point(aes(y = resid, x = Value)) + facet_wrap(~Pred)
paruelo.mcmc = paruelo.r2jags$BUGSoutput$sims.matrix # generate a model matrix newdata = paruelo Xmat = model.matrix(~cLAT * cLONG, newdata) ## get median parameter estimates coefs = apply(paruelo.mcmc[, c("beta0", "beta[1]", "beta[2]", "beta[3]")], 2, median) fit = as.vector(coefs %*% t(Xmat)) resid = sqrt(paruelo$C3) - fit sresid = resid/sd(resid) ggplot() + geom_point(data = NULL, aes(y = sresid, x = fit))
paruelo.mcmc = paruelo.r2jags$BUGSoutput$sims.matrix # generate a model matrix newdata = paruelo Xmat = model.matrix(~cLAT * cLONG, newdata) ## get median parameter estimates coefs = as.matrix(paruelo.mcmc[, c("beta0", "beta[1]", "beta[2]", "beta[3]")]) fit = coefs %*% t(Xmat) ## draw samples from this model yRep = sapply(1:nrow(paruelo.mcmc), function(i) rnorm(nrow(paruelo), fit[i, ], paruelo.mcmc[i, "sigma"])) ggplot() + geom_density(data = NULL, aes(x = as.vector(yRep), fill = "Model"), alpha = 0.5) + geom_density(data = paruelo, aes(x = sqrt(C3), fill = "Obs"), alpha = 0.5)
Lets see how well data simulated from the model reflects the raw data
paruelo.mcmc = paruelo.r2jags$BUGSoutput$sims.matrix %>% as.data.frame %>% dplyr:::select(beta0, starts_with("beta"), sigma) %>% as.matrix coefs = paruelo.mcmc[, 1:4] newdata = with(paruelo, rbind(data.frame(cLAT = seq(min(cLAT), max(cLAT), len = 100), cLONG = 0), data.frame(cLAT = 0, cLONG = seq(min(cLONG), max(cLONG), len = 100)))) Xmat = model.matrix(~cLAT * cLONG, data = newdata) fit = coefs %*% t(Xmat) # add noise for prediction instead of confidence fit = t(sapply(1:nrow(paruelo.mcmc), function(i) rnorm(nrow(newdata), fit[i, ], paruelo.mcmc[i, "sigma"]))) newdata = newdata %>% cbind(tidyMCMC(as.mcmc(fit), conf.int = TRUE, conf.method = "HPDinterval")) newdata.melt = newdata %>% gather(key = "Pred", value = "Value", cLAT:cLONG) %>% filter(Value != 0) paruelo.melt = paruelo %>% gather(key = "Pred", value = "Value", cLAT:cLONG) ggplot(newdata.melt, aes(y = estimate, x = Value)) + geom_point(data = paruelo.melt, aes(y = sqrt(C3))) + geom_line() + geom_ribbon(aes(ymin = conf.low, ymax = conf.high), fill = "blue", alpha = 0.3) + scale_y_continuous("C3") + scale_x_continuous("") + theme_classic() + facet_wrap(~Pred)
We can also explore the posteriors of each parameter.
library(bayesplot) paruelo.mcmc = paruelo.r2jags$BUGSoutput$sims.matrix mcmc_intervals(paruelo.mcmc, regex_pars = "beta")
mcmc_areas(paruelo.mcmc, regex_pars = "beta")
paruelo.mcmc = as.data.frame(paruelo.rstan) %>% dplyr:::select(beta0, starts_with("beta"), sigma) %>% as.matrix # generate a model matrix newdata = paruelo Xmat = model.matrix(~cLAT * cLONG, newdata) ## get median parameter estimates coefs = apply(paruelo.mcmc[, c("beta0", "beta[1]", "beta[2]", "beta[3]")], 2, median) fit = as.vector(coefs %*% t(Xmat)) resid = sqrt(paruelo$C3) - fit ggplot() + geom_point(data = NULL, aes(y = resid, x = fit))
paruelo.mcmc = as.data.frame(paruelo.rstan) %>% dplyr:::select(beta0, starts_with("beta"), sigma) %>% as.matrix # generate a model matrix newdata = paruelo Xmat = model.matrix(~cLAT * cLONG, newdata) ## get median parameter estimates coefs = apply(paruelo.mcmc[, c("beta0", "beta[1]", "beta[2]", "beta[3]")], 2, median) fit = as.vector(coefs %*% t(Xmat)) resid = sqrt(paruelo$C3) - fit newdata = newdata %>% cbind(fit, resid) newdata.melt = newdata %>% gather(key = Pred, value = Value, cLAT:cLONG) ggplot(newdata.melt) + geom_point(aes(y = resid, x = Value)) + facet_wrap(~Pred)
paruelo.mcmc = as.data.frame(paruelo.rstan) %>% dplyr:::select(beta0, starts_with("beta"), sigma) %>% as.matrix # generate a model matrix newdata = paruelo Xmat = model.matrix(~cLAT * cLONG, newdata) ## get median parameter estimates coefs = apply(paruelo.mcmc[, c("beta0", "beta[1]", "beta[2]", "beta[3]")], 2, median) fit = as.vector(coefs %*% t(Xmat)) resid = sqrt(paruelo$C3) - fit sresid = resid/sd(resid) ggplot() + geom_point(data = NULL, aes(y = sresid, x = fit))
paruelo.mcmc = as.data.frame(paruelo.rstan) %>% dplyr:::select(beta0, starts_with("beta"), sigma) %>% as.matrix # generate a model matrix newdata = paruelo Xmat = model.matrix(~cLAT * cLONG, newdata) ## get median parameter estimates coefs = as.matrix(paruelo.mcmc[, c("beta0", "beta[1]", "beta[2]", "beta[3]")]) fit = coefs %*% t(Xmat) ## draw samples from this model yRep = sapply(1:nrow(paruelo.mcmc), function(i) rnorm(nrow(paruelo), fit[i, ], paruelo.mcmc[i, "sigma"])) ggplot() + geom_density(data = NULL, aes(x = as.vector(yRep), fill = "Model"), alpha = 0.5) + geom_density(data = paruelo, aes(x = sqrt(C3), fill = "Obs"), alpha = 0.5)
Lets see how well data simulated from the model reflects the raw data
paruelo.mcmc = as.data.frame(paruelo.rstan) %>% dplyr:::select(beta0, starts_with("beta"), sigma) %>% as.matrix coefs = paruelo.mcmc[, 1:4] newdata = with(paruelo, rbind(data.frame(cLAT = seq(min(cLAT), max(cLAT), len = 100), cLONG = 0), data.frame(cLAT = 0, cLONG = seq(min(cLONG), max(cLONG), len = 100)))) Xmat = model.matrix(~cLAT * cLONG, data = newdata) fit = coefs %*% t(Xmat) # add noise for prediction instead of confidence fit = t(sapply(1:nrow(paruelo.mcmc), function(i) rnorm(nrow(newdata), fit[i, ], paruelo.mcmc[i, "sigma"]))) newdata = newdata %>% cbind(tidyMCMC(as.mcmc(fit), conf.int = TRUE, conf.method = "HPDinterval")) newdata.melt = newdata %>% gather(key = "Pred", value = "Value", cLAT:cLONG) %>% filter(Value != 0) paruelo.melt = paruelo %>% gather(key = "Pred", value = "Value", cLAT:cLONG) ggplot(newdata.melt, aes(y = estimate, x = Value)) + geom_point(data = paruelo.melt, aes(y = sqrt(C3))) + geom_line() + geom_ribbon(aes(ymin = conf.low, ymax = conf.high), fill = "blue", alpha = 0.3) + scale_y_continuous("C3") + scale_x_continuous("") + theme_classic() + facet_wrap(~Pred)
And on a natural scale (back-transformed)
newdata = with(paruelo, rbind(data.frame(cLAT = seq(min(cLAT), max(cLAT), len = 100), cLONG = 0), data.frame(cLAT = 0, cLONG = seq(min(cLONG), max(cLONG), len = 100)))) fit = fit^2 newdata = newdata %>% cbind(tidyMCMC(as.mcmc(fit), conf.int = TRUE, conf.method = "HPDinterval")) newdata.melt = newdata %>% gather(key = "Pred", value = "Value", cLAT:cLONG) %>% filter(Value != 0) paruelo.melt = paruelo %>% gather(key = "Pred", value = "Value", cLAT:cLONG) ggplot(newdata.melt, aes(y = estimate, x = Value)) + geom_point(data = paruelo.melt, aes(y = sqrt(C3))) + geom_line() + geom_ribbon(aes(ymin = conf.low, ymax = conf.high), fill = "blue", alpha = 0.3) + scale_y_continuous("C3") + scale_x_continuous("") + theme_classic() + facet_wrap(~Pred)
We can also explore the posteriors of each parameter.
library(bayesplot) paruelo.mcmc = as.matrix(paruelo.rstan) mcmc_intervals(paruelo.mcmc, regex_pars = "beta\\[")
mcmc_areas(paruelo.mcmc, regex_pars = "beta\\[")
resid = resid(paruelo.rstanarm) fit = fitted(paruelo.rstanarm) ggplot() + geom_point(data = NULL, aes(y = resid, x = fit))
resid = resid(paruelo.rstanarm) paruelo.melt = paruelo %>% mutate(resid = resid) %>% gather(key = Pred, value = Value, cLAT:cLONG) ggplot(paruelo.melt) + geom_point(aes(y = resid, x = Value)) + facet_wrap(~Pred)
resid = resid(paruelo.rstanarm) sresid = resid/sd(resid) fit = fitted(paruelo.rstanarm) ggplot() + geom_point(data = NULL, aes(y = sresid, x = fit))
y_pred = posterior_predict(paruelo.rstanarm) newdata = paruelo %>% cbind(t(y_pred)) %>% gather(key = "Rep", value = "Value", -C3:-cLONG) newdata.melt = newdata %>% gather(key = "Pred", value = "Pred_val", cLAT:cLONG) paruelo.melt = paruelo %>% gather(key = "Pred", value = "Pred_val", cLAT:cLONG) ggplot(newdata.melt, aes(Value, x = Pred_val)) + geom_violin(color = "blue", fill = "blue", alpha = 0.5) + geom_violin(data = paruelo.melt, aes(y = sqrt(C3), x = Pred_val), fill = "red", color = "red", alpha = 0.5) + facet_wrap(~Pred)
paruelo.mcmc = as.data.frame(paruelo.rstanarm) %>% dplyr:::select(matches("Inter"), starts_with("cL"), sigma) %>% as.matrix coefs = paruelo.mcmc[, c("(Intercept)", "cLAT", "cLONG", "cLAT:cLONG")] # generate a model matrix newdata = paruelo Xmat = model.matrix(~cLAT * cLONG, newdata) ## get median parameter estimates fit = coefs %*% t(Xmat) ## draw samples from this model yRep = sapply(1:nrow(paruelo.mcmc), function(i) rnorm(nrow(paruelo), fit[i, ], paruelo.mcmc[i, "sigma"])) ggplot() + geom_density(data = NULL, aes(x = as.vector(yRep), fill = "Model"), alpha = 0.5) + geom_density(data = paruelo, aes(x = sqrt(C3), fill = "Obs"), alpha = 0.5)
Lets see how well data simulated from the model reflects the raw data
newdata = with(paruelo, rbind(data.frame(cLAT = seq(min(cLAT), max(cLAT), len = 100), cLONG = 0), data.frame(cLAT = 0, cLONG = seq(min(cLONG), max(cLONG), len = 100)))) fit = posterior_predict(paruelo.rstanarm, newdata = newdata) newdata = newdata %>% cbind(tidyMCMC(as.mcmc(fit), conf.int = TRUE, conf.method = "HPDinterval")) newdata.melt = newdata %>% gather(key = "Pred", value = "Value", cLAT:cLONG) %>% filter(Value != 0) paruelo.melt = paruelo %>% gather(key = "Pred", value = "Value", cLAT:cLONG) ggplot(newdata.melt, aes(y = estimate, x = Value)) + geom_point(data = paruelo.melt, aes(y = sqrt(C3))) + geom_line() + geom_ribbon(aes(ymin = conf.low, ymax = conf.high), fill = "blue", alpha = 0.3) + scale_y_continuous("C3") + scale_x_continuous("") + theme_classic() + facet_wrap(~Pred)
And on a natural scale (back-transformed)
newdata = with(paruelo, rbind(data.frame(cLAT = seq(min(cLAT), max(cLAT), len = 100), cLONG = 0), data.frame(cLAT = 0, cLONG = seq(min(cLONG), max(cLONG), len = 100)))) fit = posterior_predict(paruelo.rstanarm, newdata = newdata)^2 newdata = newdata %>% cbind(tidyMCMC(as.mcmc(fit), conf.int = TRUE, conf.method = "HPDinterval")) newdata.melt = newdata %>% gather(key = "Pred", value = "Value", cLAT:cLONG) %>% filter(Value != 0) paruelo.melt = paruelo %>% gather(key = "Pred", value = "Value", cLAT:cLONG) ggplot(newdata.melt, aes(y = estimate, x = Value)) + geom_point(data = paruelo.melt, aes(y = C3)) + geom_line() + geom_ribbon(aes(ymin = conf.low, ymax = conf.high), fill = "blue", alpha = 0.3) + scale_y_continuous("C3") + scale_x_continuous("") + theme_classic() + facet_wrap(~Pred)
We can also explore the posteriors of each parameter.
library(bayesplot) paruelo.mcmc = as.matrix(paruelo.rstanarm) mcmc_intervals(paruelo.mcmc, regex_pars = "cL")
mcmc_areas(paruelo.mcmc, regex_pars = "cL")
resid = resid(paruelo.brm)[, "Estimate"] fit = fitted(paruelo.brm)[, "Estimate"] ggplot() + geom_point(data = NULL, aes(y = resid, x = fit))
resid = resid(paruelo.brm)[, "Estimate"] paruelo.melt = paruelo %>% mutate(resid = resid) %>% gather(key = Pred, value = Value, cLAT:cLONG) ggplot(paruelo.melt) + geom_point(aes(y = resid, x = Value)) + facet_wrap(~Pred)
resid = resid(paruelo.brm)[, "Estimate"] sresid = resid/sd(resid) fit = fitted(paruelo.brm)[, "Estimate"] ggplot() + geom_point(data = NULL, aes(y = sresid, x = fit))
y_pred = posterior_predict(paruelo.brm) newdata = paruelo %>% cbind(t(y_pred)) %>% gather(key = "Rep", value = "Value", -C3:-cLONG) newdata.melt = newdata %>% gather(key = "Pred", value = "Pred_val", cLAT:cLONG) paruelo.melt = paruelo %>% gather(key = "Pred", value = "Pred_val", cLAT:cLONG) ggplot(newdata.melt, aes(Value, x = Pred_val)) + geom_violin(color = "blue", fill = "blue", alpha = 0.5) + geom_violin(data = paruelo.melt, aes(y = sqrt(C3), x = Pred_val), fill = "red", color = "red", alpha = 0.5) + facet_wrap(~Pred)
paruelo.mcmc = as.data.frame(paruelo.brm) %>% dplyr:::select(matches("Inter"), starts_with("b_"), sigma) %>% as.matrix coefs = paruelo.mcmc[, c("b_Intercept", "b_cLAT", "b_cLONG", "b_cLAT:cLONG")]
Error in paruelo.mcmc[, c("b_Intercept", "b_cLAT", "b_cLONG", "b_cLAT:cLONG")]: subscript out of bounds
# generate a model matrix newdata = paruelo Xmat = model.matrix(~cLAT * cLONG, newdata) ## get median parameter estimates fit = coefs %*% t(Xmat) ## draw samples from this model yRep = sapply(1:nrow(paruelo.mcmc), function(i) rnorm(nrow(paruelo), fit[i, ], paruelo.mcmc[i, "sigma"])) ggplot() + geom_density(data = NULL, aes(x = as.vector(yRep), fill = "Model"), alpha = 0.5) + geom_density(data = paruelo, aes(x = sqrt(C3), fill = "Obs"), alpha = 0.5)
Lets see how well data simulated from the model reflects the raw data
newdata = with(paruelo, rbind(data.frame(cLAT = seq(min(cLAT), max(cLAT), len = 100), cLONG = 0), data.frame(cLAT = 0, cLONG = seq(min(cLONG), max(cLONG), len = 100)))) fit = posterior_predict(paruelo.brm, newdata = newdata) newdata = newdata %>% cbind(tidyMCMC(as.mcmc(fit), conf.int = TRUE, conf.method = "HPDinterval")) newdata.melt = newdata %>% gather(key = "Pred", value = "Value", cLAT:cLONG) %>% filter(Value != 0) paruelo.melt = paruelo %>% gather(key = "Pred", value = "Value", cLAT:cLONG) ggplot(newdata.melt, aes(y = estimate, x = Value)) + geom_point(data = paruelo.melt, aes(y = sqrt(C3))) + geom_line() + geom_ribbon(aes(ymin = conf.low, ymax = conf.high), fill = "blue", alpha = 0.3) + scale_y_continuous("C3") + scale_x_continuous("") + theme_classic() + facet_wrap(~Pred)
And on a natural scale (back-transformed)
newdata = with(paruelo, rbind(data.frame(cLAT = seq(min(cLAT), max(cLAT), len = 100), cLONG = 0), data.frame(cLAT = 0, cLONG = seq(min(cLONG), max(cLONG), len = 100)))) fit = posterior_predict(paruelo.brm, newdata = newdata)^2 newdata = newdata %>% cbind(tidyMCMC(as.mcmc(fit), conf.int = TRUE, conf.method = "HPDinterval")) newdata.melt = newdata %>% gather(key = "Pred", value = "Value", cLAT:cLONG) %>% filter(Value != 0) paruelo.melt = paruelo %>% gather(key = "Pred", value = "Value", cLAT:cLONG) ggplot(newdata.melt, aes(y = estimate, x = Value)) + geom_point(data = paruelo.melt, aes(y = C3)) + geom_line() + geom_ribbon(aes(ymin = conf.low, ymax = conf.high), fill = "blue", alpha = 0.3) + scale_y_continuous("C3") + scale_x_continuous("") + theme_classic() + facet_wrap(~Pred)
We can also explore the posteriors of each parameter.
library(bayesplot) paruelo.mcmc = as.matrix(paruelo.brm) mcmc_intervals(paruelo.mcmc, regex_pars = "cL")
mcmc_areas(paruelo.mcmc, regex_pars = "cL")
- Explore parameter estimates
library(MCMCpack) summary(paruelo.mcmcpack)
Iterations = 1001:11000 Thinning interval = 1 Number of chains = 1 Sample size per chain = 10000 1. Empirical mean and standard deviation for each variable, plus standard error of the mean: Mean SD Naive SE Time-series SE (Intercept) 0.428262 0.0238363 2.384e-04 2.384e-04 cLAT 0.043690 0.0049880 4.988e-05 4.988e-05 cLONG -0.002911 0.0036879 3.688e-05 3.607e-05 cLAT:cLONG 0.002282 0.0007591 7.591e-06 7.591e-06 sigma2 0.040774 0.0071573 7.157e-05 7.416e-05 2. Quantiles for each variable: 2.5% 25% 50% 75% 97.5% (Intercept) 0.3823311 0.412255 0.428315 0.4438954 0.476045 cLAT 0.0339328 0.040366 0.043678 0.0469825 0.053461 cLONG -0.0101961 -0.005383 -0.002868 -0.0004278 0.004209 cLAT:cLONG 0.0008195 0.001773 0.002275 0.0027832 0.003791 sigma2 0.0291154 0.035663 0.039970 0.0450441 0.056792
library(broom) tidyMCMC(paruelo.mcmcpack, conf.int = TRUE, conf.method = "HPDinterval")
term estimate std.error conf.low conf.high 1 (Intercept) 0.428262064 0.0238363306 0.3812121953 0.474471951 2 cLAT 0.043690358 0.0049880391 0.0337928327 0.053291476 3 cLONG -0.002910852 0.0036879275 -0.0103440322 0.003966760 4 cLAT:cLONG 0.002282089 0.0007591007 0.0008117709 0.003776196 5 sigma2 0.040774221 0.0071572530 0.0278700035 0.054891530
mcmcpvalue(paruelo.mcmcpack[, "cLAT"])
[1] 0
mcmcpvalue(paruelo.mcmcpack[, "cLONG"])
[1] 0.4271
mcmcpvalue(paruelo.mcmcpack[, "cLAT:cLONG"])
[1] 0.0037
print(paruelo.r2jags)
Inference for Bugs model at "5", fit using jags, 3 chains, each with 50000 iterations (first 3000 discarded), n.thin = 10 n.sims = 14100 iterations saved mu.vect sd.vect 2.5% 25% 50% 75% 97.5% Rhat n.eff beta[1] 0.044 0.005 0.034 0.040 0.044 0.047 0.054 1.001 14000 beta[2] -0.003 0.004 -0.010 -0.005 -0.003 0.000 0.005 1.001 14000 beta[3] 0.002 0.001 0.001 0.002 0.002 0.003 0.004 1.001 5800 beta0 0.428 0.024 0.381 0.412 0.428 0.445 0.476 1.001 14000 sigma 0.203 0.018 0.172 0.190 0.201 0.214 0.241 1.001 8500 deviance -27.267 3.349 -31.694 -29.726 -27.961 -25.557 -19.074 1.001 13000 For each parameter, n.eff is a crude measure of effective sample size, and Rhat is the potential scale reduction factor (at convergence, Rhat=1). DIC info (using the rule, pD = var(deviance)/2) pD = 5.6 and DIC = -21.7 DIC is an estimate of expected predictive error (lower deviance is better).
library(broom) tidyMCMC(as.mcmc(paruelo.r2jags), conf.int = TRUE, conf.method = "HPDinterval")
term estimate std.error conf.low conf.high 1 beta0 0.428244156 0.0241248235 3.828510e-01 0.477842988 2 beta[1] 0.043754800 0.0049544131 3.357956e-02 0.053028339 3 beta[2] -0.002930712 0.0037809737 -1.045071e-02 0.004455156 4 beta[3] 0.002282551 0.0007658955 7.668375e-04 0.003760403 5 deviance -27.267374046 3.3494006696 -3.232140e+01 -20.820160276 6 sigma 0.202889499 0.0177803063 1.708334e-01 0.239513349
mcmcpvalue(paruelo.r2jags$BUGSoutput$sims.matrix[, "beta[1]"])
[1] 0
mcmcpvalue(paruelo.r2jags$BUGSoutput$sims.matrix[, "beta[2]"])
[1] 0.431844
mcmcpvalue(paruelo.r2jags$BUGSoutput$sims.matrix[, "beta[3]"])
[1] 0.003262411
print(paruelo.rstan, pars = c("beta0", "beta", "sigma"))
Inference for Stan model: d98dbf6a02725fc3fce11306b77873e9. 3 chains, each with iter=5000; warmup=500; thin=2; post-warmup draws per chain=2250, total post-warmup draws=6750. mean se_mean sd 2.5% 25% 50% 75% 97.5% n_eff Rhat beta0 0.43 0 0.02 0.38 0.41 0.43 0.44 0.48 5240 1 beta[1] 0.04 0 0.00 0.03 0.04 0.04 0.05 0.05 6527 1 beta[2] 0.00 0 0.00 -0.01 -0.01 0.00 0.00 0.00 6349 1 beta[3] 0.00 0 0.00 0.00 0.00 0.00 0.00 0.00 6288 1 sigma 0.20 0 0.02 0.17 0.19 0.20 0.21 0.24 5209 1 Samples were drawn using NUTS(diag_e) at Mon Aug 21 16:38:35 2017. For each parameter, n_eff is a crude measure of effective sample size, and Rhat is the potential scale reduction factor on split chains (at convergence, Rhat=1).
library(broom) tidyMCMC(paruelo.rstan, conf.int = TRUE, conf.method = "HPDinterval", pars = c("beta0", "beta", "sigma"), ess = TRUE, rhat = TRUE)
term estimate std.error conf.low conf.high rhat ess 1 beta0 0.427836523 0.0240303191 0.3819939833 0.475662944 1.0010406 5240 2 beta[1] 0.043632430 0.0049722257 0.0332021094 0.052911579 1.0001943 6527 3 beta[2] -0.002863611 0.0037983372 -0.0099487399 0.004980100 0.9997762 6349 4 beta[3] 0.002294422 0.0007688886 0.0008689837 0.003920697 0.9998934 6288 5 sigma 0.203118147 0.0177639850 0.1699683473 0.238532986 0.9998482 5209
mcmcpvalue(as.matrix(paruelo.rstan)[, "beta[1]"])
[1] 0
mcmcpvalue(as.matrix(paruelo.rstan)[, "beta[2]"])
[1] 0.4474074
mcmcpvalue(as.matrix(paruelo.rstan)[, "beta[3]"])
[1] 0.00237037
# lets explore the support for the interaction via loo library(loo) (full = loo(extract_log_lik(paruelo.rstan)))
Computed from 6750 by 73 log-likelihood matrix Estimate SE elpd_loo 11.7 5.2 p_loo 3.6 0.6 looic -23.4 10.4 All Pareto k estimates are good (k < 0.5) See help('pareto-k-diagnostic') for details.
X = model.matrix(~cLAT + cLONG, data = paruelo) paruelo.list <- with(paruelo, list(Y = sqrt(C3), X = X, nX = ncol(X), n = nrow(paruelo))) paruelo.rstan.red <- stan(data = paruelo.list, model_code = modelString, chains = 3, iter = 4000, warmup = 1000, thin = 3, save_dso = TRUE)
SAMPLING FOR MODEL 'd98dbf6a02725fc3fce11306b77873e9' NOW (CHAIN 1). Gradient evaluation took 2e-05 seconds 1000 transitions using 10 leapfrog steps per transition would take 0.2 seconds. Adjust your expectations accordingly! Iteration: 1 / 4000 [ 0%] (Warmup) Iteration: 400 / 4000 [ 10%] (Warmup) Iteration: 800 / 4000 [ 20%] (Warmup) Iteration: 1001 / 4000 [ 25%] (Sampling) Iteration: 1400 / 4000 [ 35%] (Sampling) Iteration: 1800 / 4000 [ 45%] (Sampling) Iteration: 2200 / 4000 [ 55%] (Sampling) Iteration: 2600 / 4000 [ 65%] (Sampling) Iteration: 3000 / 4000 [ 75%] (Sampling) Iteration: 3400 / 4000 [ 85%] (Sampling) Iteration: 3800 / 4000 [ 95%] (Sampling) Iteration: 4000 / 4000 [100%] (Sampling) Elapsed Time: 0.045721 seconds (Warm-up) 0.109229 seconds (Sampling) 0.15495 seconds (Total) SAMPLING FOR MODEL 'd98dbf6a02725fc3fce11306b77873e9' NOW (CHAIN 2). Gradient evaluation took 1.1e-05 seconds 1000 transitions using 10 leapfrog steps per transition would take 0.11 seconds. Adjust your expectations accordingly! Iteration: 1 / 4000 [ 0%] (Warmup) Iteration: 400 / 4000 [ 10%] (Warmup) Iteration: 800 / 4000 [ 20%] (Warmup) Iteration: 1001 / 4000 [ 25%] (Sampling) Iteration: 1400 / 4000 [ 35%] (Sampling) Iteration: 1800 / 4000 [ 45%] (Sampling) Iteration: 2200 / 4000 [ 55%] (Sampling) Iteration: 2600 / 4000 [ 65%] (Sampling) Iteration: 3000 / 4000 [ 75%] (Sampling) Iteration: 3400 / 4000 [ 85%] (Sampling) Iteration: 3800 / 4000 [ 95%] (Sampling) Iteration: 4000 / 4000 [100%] (Sampling) Elapsed Time: 0.04781 seconds (Warm-up) 0.124076 seconds (Sampling) 0.171886 seconds (Total) SAMPLING FOR MODEL 'd98dbf6a02725fc3fce11306b77873e9' NOW (CHAIN 3). Gradient evaluation took 1.1e-05 seconds 1000 transitions using 10 leapfrog steps per transition would take 0.11 seconds. Adjust your expectations accordingly! Iteration: 1 / 4000 [ 0%] (Warmup) Iteration: 400 / 4000 [ 10%] (Warmup) Iteration: 800 / 4000 [ 20%] (Warmup) Iteration: 1001 / 4000 [ 25%] (Sampling) Iteration: 1400 / 4000 [ 35%] (Sampling) Iteration: 1800 / 4000 [ 45%] (Sampling) Iteration: 2200 / 4000 [ 55%] (Sampling) Iteration: 2600 / 4000 [ 65%] (Sampling) Iteration: 3000 / 4000 [ 75%] (Sampling) Iteration: 3400 / 4000 [ 85%] (Sampling) Iteration: 3800 / 4000 [ 95%] (Sampling) Iteration: 4000 / 4000 [100%] (Sampling) Elapsed Time: 0.055341 seconds (Warm-up) 0.117843 seconds (Sampling) 0.173184 seconds (Total)
(reduced = loo(extract_log_lik(paruelo.rstan.red)))
Computed from 3000 by 73 log-likelihood matrix Estimate SE elpd_loo 7.8 4.3 p_loo 3.2 0.5 looic -15.7 8.5 All Pareto k estimates are good (k < 0.5) See help('pareto-k-diagnostic') for details.
par(mfrow = 1:2, mar = c(5, 3.8, 1, 0) + 0.1, las = 3) plot(full, label_points = TRUE) plot(reduced, label_points = TRUE)
summary(paruelo.rstanarm)
Model Info: function: stan_glm family: gaussian [identity] formula: sqrt(C3) ~ cLAT * cLONG algorithm: sampling priors: see help('prior_summary') sample: 6750 (posterior sample size) num obs: 73 Estimates: mean sd 2.5% 25% 50% 75% 97.5% (Intercept) 0.4 0.0 0.4 0.4 0.4 0.4 0.5 cLAT 0.0 0.0 0.0 0.0 0.0 0.0 0.1 cLONG 0.0 0.0 0.0 0.0 0.0 0.0 0.0 cLAT:cLONG 0.0 0.0 0.0 0.0 0.0 0.0 0.0 sigma 0.2 0.0 0.2 0.2 0.2 0.2 0.2 mean_PPD 0.4 0.0 0.4 0.4 0.4 0.5 0.5 log-posterior 5.8 1.6 1.7 5.0 6.1 7.0 8.0 Diagnostics: mcse Rhat n_eff (Intercept) 0.0 1.0 6520 cLAT 0.0 1.0 6276 cLONG 0.0 1.0 6208 cLAT:cLONG 0.0 1.0 6187 sigma 0.0 1.0 6511 mean_PPD 0.0 1.0 6514 log-posterior 0.0 1.0 5064 For each parameter, mcse is Monte Carlo standard error, n_eff is a crude measure of effective sample size, and Rhat is the potential scale reduction factor on split chains (at convergence Rhat=1).
library(broom) tidyMCMC(paruelo.rstanarm$stanfit, conf.int = TRUE, conf.method = "HPDinterval", ess = TRUE, rhat = TRUE)
term estimate std.error conf.low conf.high rhat ess 1 (Intercept) 0.428725348 0.0237030588 0.3817818425 0.474550972 1.0001020 6520 2 cLAT 0.043720845 0.0049584396 0.0339531664 0.053464776 0.9999700 6276 3 cLONG -0.002864137 0.0037665520 -0.0101579021 0.004520961 0.9999069 6208 4 cLAT:cLONG 0.002286517 0.0007621962 0.0007272455 0.003698621 1.0002028 6187 5 sigma 0.202789016 0.0174313569 0.1701918889 0.236954391 1.0001987 6511 6 mean_PPD 0.436065205 0.0334535901 0.3707492034 0.502129845 0.9998652 6514 7 log-posterior 5.799906303 1.6257986198 2.5983796407 8.199551919 1.0001926 5064
mcmcpvalue(as.matrix(paruelo.rstanarm)[, "cLAT"])
[1] 0
mcmcpvalue(as.matrix(paruelo.rstanarm)[, "cLONG"])
[1] 0.4432593
mcmcpvalue(as.matrix(paruelo.rstanarm)[, "cLAT:cLONG"])
[1] 0.004444444
# lets explore the support for the interaction via loo library(loo) (full = loo(paruelo.rstanarm))
Computed from 6750 by 73 log-likelihood matrix Estimate SE elpd_loo 11.8 5.2 p_loo 3.6 0.6 looic -23.5 10.4 All Pareto k estimates are good (k < 0.5) See help('pareto-k-diagnostic') for details.
paruelo.rstanarm.red = update(paruelo.rstanarm, . ~ cLAT + cLONG)
Gradient evaluation took 3.3e-05 seconds 1000 transitions using 10 leapfrog steps per transition would take 0.33 seconds. Adjust your expectations accordingly! Elapsed Time: 0.036207 seconds (Warm-up) 0.244547 seconds (Sampling) 0.280754 seconds (Total) Gradient evaluation took 1.6e-05 seconds 1000 transitions using 10 leapfrog steps per transition would take 0.16 seconds. Adjust your expectations accordingly! Elapsed Time: 0.037839 seconds (Warm-up) 0.244601 seconds (Sampling) 0.28244 seconds (Total) Gradient evaluation took 1.7e-05 seconds 1000 transitions using 10 leapfrog steps per transition would take 0.17 seconds. Adjust your expectations accordingly! Elapsed Time: 0.038992 seconds (Warm-up) 0.24547 seconds (Sampling) 0.284462 seconds (Total)
(reduced = loo(paruelo.rstanarm.red))
Computed from 6750 by 73 log-likelihood matrix Estimate SE elpd_loo 7.9 4.3 p_loo 3.2 0.5 looic -15.9 8.6 All Pareto k estimates are good (k < 0.5) See help('pareto-k-diagnostic') for details.
par(mfrow = 1:2, mar = c(5, 3.8, 1, 0) + 0.1, las = 3) plot(full, label_points = TRUE) plot(reduced, label_points = TRUE)
compare_models(full, reduced)
elpd_diff se -3.8 2.4
summary(paruelo.brm)
Family: gaussian(identity) Formula: sqrt(C3) ~ cLAT * cLONG Data: paruelo (Number of observations: 73) Samples: 3 chains, each with iter = 5000; warmup = 500; thin = 2; total post-warmup samples = 6750 ICs: LOO = NA; WAIC = NA; R2 = NA Population-Level Effects: Estimate Est.Error l-95% CI u-95% CI Eff.Sample Rhat Intercept 0.43 0.02 0.38 0.47 5783 1 cLAT 0.04 0.00 0.03 0.05 6366 1 cLONG 0.00 0.00 -0.01 0.00 6661 1 cLAT:cLONG 0.00 0.00 0.00 0.00 6750 1 Family Specific Parameters: Estimate Est.Error l-95% CI u-95% CI Eff.Sample Rhat sigma 0.2 0.02 0.17 0.24 5323 1 Samples were drawn using sampling(NUTS). For each parameter, Eff.Sample is a crude measure of effective sample size, and Rhat is the potential scale reduction factor on split chains (at convergence, Rhat = 1).
library(broom) tidyMCMC(paruelo.brm$fit, conf.int = TRUE, conf.method = "HPDinterval", ess = TRUE, rhat = TRUE)
term estimate std.error conf.low conf.high rhat ess 1 b_Intercept 0.428165324 0.0240007939 0.3796997054 0.473389433 1.0000025 5783 2 b_cLAT 0.043616675 0.0049670799 0.0338118263 0.053212235 0.9997456 6366 3 b_cLONG -0.002874484 0.0037617204 -0.0102170990 0.004536065 0.9999475 6661 4 b_cLAT:cLONG 0.002266867 0.0007727372 0.0007333648 0.003811004 0.9997885 6750 5 sigma 0.202951074 0.0174078684 0.1699554940 0.237294437 1.0002295 5323
mcmcpvalue(as.matrix(paruelo.brm)[, "b_cLAT"])
[1] 0
mcmcpvalue(as.matrix(paruelo.brm)[, "b_cLONG"])
[1] 0.4368889
mcmcpvalue(as.matrix(paruelo.brm)[, "b_cLAT:cLONG"])
[1] 0.004296296
# lets explore the support for the interaction via loo library(loo) (full = loo(paruelo.brm))
LOOIC SE -23.5 10.4
paruelo.brm.red = update(paruelo.brm, . ~ cLAT + cLONG)
SAMPLING FOR MODEL 'gaussian(identity) brms-model' NOW (CHAIN 1). Gradient evaluation took 1.7e-05 seconds 1000 transitions using 10 leapfrog steps per transition would take 0.17 seconds. Adjust your expectations accordingly! Iteration: 1 / 5000 [ 0%] (Warmup) Iteration: 500 / 5000 [ 10%] (Warmup) Iteration: 1000 / 5000 [ 20%] (Warmup) Iteration: 1500 / 5000 [ 30%] (Warmup) Iteration: 2000 / 5000 [ 40%] (Warmup) Iteration: 2500 / 5000 [ 50%] (Warmup) Iteration: 2501 / 5000 [ 50%] (Sampling) Iteration: 3000 / 5000 [ 60%] (Sampling) Iteration: 3500 / 5000 [ 70%] (Sampling) Iteration: 4000 / 5000 [ 80%] (Sampling) Iteration: 4500 / 5000 [ 90%] (Sampling) Iteration: 5000 / 5000 [100%] (Sampling) Elapsed Time: 0.087868 seconds (Warm-up) 0.083064 seconds (Sampling) 0.170932 seconds (Total) SAMPLING FOR MODEL 'gaussian(identity) brms-model' NOW (CHAIN 2). Gradient evaluation took 1e-05 seconds 1000 transitions using 10 leapfrog steps per transition would take 0.1 seconds. Adjust your expectations accordingly! Iteration: 1 / 5000 [ 0%] (Warmup) Iteration: 500 / 5000 [ 10%] (Warmup) Iteration: 1000 / 5000 [ 20%] (Warmup) Iteration: 1500 / 5000 [ 30%] (Warmup) Iteration: 2000 / 5000 [ 40%] (Warmup) Iteration: 2500 / 5000 [ 50%] (Warmup) Iteration: 2501 / 5000 [ 50%] (Sampling) Iteration: 3000 / 5000 [ 60%] (Sampling) Iteration: 3500 / 5000 [ 70%] (Sampling) Iteration: 4000 / 5000 [ 80%] (Sampling) Iteration: 4500 / 5000 [ 90%] (Sampling) Iteration: 5000 / 5000 [100%] (Sampling) Elapsed Time: 0.083951 seconds (Warm-up) 0.080144 seconds (Sampling) 0.164095 seconds (Total) SAMPLING FOR MODEL 'gaussian(identity) brms-model' NOW (CHAIN 3). Gradient evaluation took 9e-06 seconds 1000 transitions using 10 leapfrog steps per transition would take 0.09 seconds. Adjust your expectations accordingly! Iteration: 1 / 5000 [ 0%] (Warmup) Iteration: 500 / 5000 [ 10%] (Warmup) Iteration: 1000 / 5000 [ 20%] (Warmup) Iteration: 1500 / 5000 [ 30%] (Warmup) Iteration: 2000 / 5000 [ 40%] (Warmup) Iteration: 2500 / 5000 [ 50%] (Warmup) Iteration: 2501 / 5000 [ 50%] (Sampling) Iteration: 3000 / 5000 [ 60%] (Sampling) Iteration: 3500 / 5000 [ 70%] (Sampling) Iteration: 4000 / 5000 [ 80%] (Sampling) Iteration: 4500 / 5000 [ 90%] (Sampling) Iteration: 5000 / 5000 [100%] (Sampling) Elapsed Time: 0.10254 seconds (Warm-up) 0.092983 seconds (Sampling) 0.195523 seconds (Total)
(reduced = loo(paruelo.brm.red))
LOOIC SE -15.73 8.56
par(mfrow = 1:2, mar = c(5, 3.8, 1, 0) + 0.1, las = 3) plot(full, label_points = TRUE) plot(reduced, label_points = TRUE)
- Generate graphical summaries
library(MCMCpack) paruelo.mcmc = paruelo.mcmcpack ## Calculate the fitted values newdata = with(paruelo, expand.grid(cLAT = seq(min(cLAT, na.rm = TRUE), max(cLONG, na.rm = TRUE), len = 100), cLONG = mean(cLONG) + sd(cLONG) %*% -2:2)) Xmat = model.matrix(~cLAT * cLONG, newdata) coefs = paruelo.mcmc[, c("(Intercept)", "cLAT", "cLONG", "cLAT:cLONG")] fit = coefs %*% t(Xmat) newdata = newdata %>% mutate(LAT = cLAT + mean.LAT, LONG = cLONG + mean.LONG) %>% cbind(tidyMCMC(fit^2, conf.int = TRUE, conf.method = "HPDinterval")) %>% mutate(LONG = factor(LONG, labels = paste("LONG:~", c(-2, -1, 0, 1, 2), "*sigma"))) ## Partial residuals fdata = rdata = with(paruelo, expand.grid(cLAT = cLAT, cLONG = mean(cLONG) + sd(cLONG) * -2:2)) fMat = rMat = model.matrix(~cLAT * cLONG, fdata) fit = as.vector(apply(coefs, 2, median) %*% t(fMat)) resid = as.vector(paruelo$C3 - (apply(coefs, 2, median) %*% t(rMat))^2) rdata = rdata %>% mutate(partial.resid = resid + fit^2) %>% mutate(LAT = cLAT + mean.LAT, LONG = cLONG + mean.LONG) ## Partition the partial residuals such that each x1 trend only includes ## x2 data that is within that range in the observed data findNearest = function(x, y) { ff = fields:::rdist(x, y) apply(ff, 1, function(x) which(x == min(x))) } fn = findNearest(x = paruelo[, c("LAT", "LONG")], y = rdata[, c("LAT", "LONG")]) rdata = rdata[unlist(fn), ] %>% mutate(LONG = factor(LONG, labels = paste("LONG:~", c(-2, -1, 0, 1, 2), "*sigma"))) ggplot(newdata, aes(y = estimate, x = LAT)) + geom_line() + # geom_blank(aes(y=9)) + geom_point(data = rdata, aes(y = partial.resid), color = "grey") + geom_ribbon(aes(ymin = conf.low, ymax = conf.high), fill = "blue", alpha = 0.3) + scale_y_continuous("C3") + scale_x_continuous("Latitude") + facet_wrap(~LONG, labeller = label_parsed, nrow = 1, scales = "fixed") + theme_classic() + theme(strip.background = element_blank())
# Note, the curvature is purely an artifact of the transformation # applied.
paruelo.mcmc = paruelo.mcmcpack ## Calculate the fitted values newdata = with(paruelo, expand.grid(cLAT = seq(min(cLAT, na.rm = TRUE), max(cLAT, na.rm = TRUE), len = 100), cLONG = seq(min(cLONG, na.rm = TRUE), max(cLONG, na.rm = TRUE), len = 100))) Xmat = model.matrix(~cLAT * cLONG, newdata) coefs = paruelo.mcmc[, c("(Intercept)", "cLAT", "cLONG", "cLAT:cLONG")] fit = coefs %*% t(Xmat) newdata = newdata %>% mutate(LAT = cLAT + mean.LAT, LONG = cLONG + mean.LONG) %>% cbind(tidyMCMC(fit^2, conf.int = TRUE, conf.method = "HPDinterval")) ## Partial residuals fdata = rdata = with(paruelo, expand.grid(cLAT = cLAT, cLONG = mean(cLONG) + sd(cLONG) * -2:2)) fMat = rMat = model.matrix(~cLAT * cLONG, fdata) fit = as.vector(apply(coefs, 2, median) %*% t(fMat)) resid = as.vector(paruelo$C3 - (apply(coefs, 2, median) %*% t(rMat))^2) rdata = rdata %>% mutate(partial.resid = resid + fit^2) %>% mutate(LAT = cLAT + mean.LAT, LONG = cLONG + mean.LONG) ggplot(newdata, aes(y = LAT, x = LONG)) + geom_tile(aes(fill = estimate)) + geom_contour(aes(z = conf.high - conf.low)) + scale_fill_gradientn("C3", colors = heat.colors(10)) + geom_point(data = paruelo, aes(size = C3)) + scale_y_continuous("Latitude") + scale_x_continuous("Longitude") + theme_classic()
paruelo.mcmc = paruelo.r2jags$BUGSoutput$sims.matrix ## Calculate the fitted values newdata = with(paruelo, expand.grid(cLAT = seq(min(cLAT, na.rm = TRUE), max(cLONG, na.rm = TRUE), len = 100), cLONG = mean(cLONG) + sd(cLONG) %*% -2:2)) Xmat = model.matrix(~cLAT * cLONG, newdata) coefs = paruelo.mcmc[, c("beta0", "beta[1]", "beta[2]", "beta[3]")] fit = coefs %*% t(Xmat) newdata = newdata %>% mutate(LAT = cLAT + mean.LAT, LONG = cLONG + mean.LONG) %>% cbind(tidyMCMC(fit^2, conf.int = TRUE, conf.method = "HPDinterval")) %>% mutate(LONG = factor(LONG, labels = paste("LONG:~", c(-2, -1, 0, 1, 2), "*sigma"))) ## Partial residuals fdata = rdata = with(paruelo, expand.grid(cLAT = cLAT, cLONG = mean(cLONG) + sd(cLONG) * -2:2)) fMat = rMat = model.matrix(~cLAT * cLONG, fdata) fit = as.vector(apply(coefs, 2, median) %*% t(fMat)) resid = as.vector(paruelo$C3 - (apply(coefs, 2, median) %*% t(rMat))^2) rdata = rdata %>% mutate(partial.resid = resid + fit^2) %>% mutate(LAT = cLAT + mean.LAT, LONG = cLONG + mean.LONG) ## Partition the partial residuals such that each x1 trend only includes ## x2 data that is within that range in the observed data findNearest = function(x, y) { ff = fields:::rdist(x, y) apply(ff, 1, function(x) which(x == min(x))) } fn = findNearest(x = paruelo[, c("LAT", "LONG")], y = rdata[, c("LAT", "LONG")]) rdata = rdata[unlist(fn), ] %>% mutate(LONG = factor(LONG, labels = paste("LONG:~", c(-2, -1, 0, 1, 2), "*sigma"))) ggplot(newdata, aes(y = estimate, x = LAT)) + geom_line() + # geom_blank(aes(y=9)) + geom_point(data = rdata, aes(y = partial.resid), color = "grey") + geom_ribbon(aes(ymin = conf.low, ymax = conf.high), fill = "blue", alpha = 0.3) + scale_y_continuous("C3") + scale_x_continuous("Latitude") + facet_wrap(~LONG, labeller = label_parsed, nrow = 1, scales = "fixed") + theme_classic() + theme(strip.background = element_blank())
# Note, the curvature is purely an artifact of the transformation # applied.
paruelo.mcmc = paruelo.r2jags$BUGSoutput$sims.matrix ## Calculate the fitted values newdata = with(paruelo, expand.grid(cLAT = seq(min(cLAT, na.rm = TRUE), max(cLAT, na.rm = TRUE), len = 100), cLONG = seq(min(cLONG, na.rm = TRUE), max(cLONG, na.rm = TRUE), len = 100))) Xmat = model.matrix(~cLAT * cLONG, newdata) coefs = paruelo.mcmc[, c("beta0", "beta[1]", "beta[2]", "beta[3]")] fit = coefs %*% t(Xmat) newdata = newdata %>% mutate(LAT = cLAT + mean.LAT, LONG = cLONG + mean.LONG) %>% cbind(tidyMCMC(fit^2, conf.int = TRUE, conf.method = "HPDinterval")) ## Partial residuals fdata = rdata = with(paruelo, expand.grid(cLAT = cLAT, cLONG = mean(cLONG) + sd(cLONG) * -2:2)) fMat = rMat = model.matrix(~cLAT * cLONG, fdata) fit = as.vector(apply(coefs, 2, median) %*% t(fMat)) resid = as.vector(paruelo$C3 - (apply(coefs, 2, median) %*% t(rMat))^2) rdata = rdata %>% mutate(partial.resid = resid + fit^2) %>% mutate(LAT = cLAT + mean.LAT, LONG = cLONG + mean.LONG) ggplot(newdata, aes(y = LAT, x = LONG)) + geom_tile(aes(fill = estimate)) + geom_contour(aes(z = conf.high - conf.low)) + scale_fill_gradientn("C3", colors = heat.colors(10)) + geom_point(data = paruelo, aes(size = C3)) + scale_y_continuous("Latitude") + scale_x_continuous("Longitude") + theme_classic()
paruelo.mcmc = as.matrix(paruelo.rstan) ## Calculate the fitted values newdata = with(paruelo, expand.grid(cLAT = seq(min(cLAT, na.rm = TRUE), max(cLONG, na.rm = TRUE), len = 100), cLONG = mean(cLONG) + sd(cLONG) %*% -2:2)) Xmat = model.matrix(~cLAT * cLONG, newdata) coefs = paruelo.mcmc[, c("beta0", "beta[1]", "beta[2]", "beta[3]")] fit = coefs %*% t(Xmat) newdata = newdata %>% mutate(LAT = cLAT + mean.LAT, LONG = cLONG + mean.LONG) %>% cbind(tidyMCMC(fit^2, conf.int = TRUE, conf.method = "HPDinterval")) %>% mutate(LONG = factor(LONG, labels = paste("LONG:~", c(-2, -1, 0, 1, 2), "*sigma"))) ## Partial residuals fdata = rdata = with(paruelo, expand.grid(cLAT = cLAT, cLONG = mean(cLONG) + sd(cLONG) * -2:2)) fMat = rMat = model.matrix(~cLAT * cLONG, fdata) fit = as.vector(apply(coefs, 2, median) %*% t(fMat)) resid = as.vector(paruelo$C3 - (apply(coefs, 2, median) %*% t(rMat))^2) rdata = rdata %>% mutate(partial.resid = resid + fit^2) %>% mutate(LAT = cLAT + mean.LAT, LONG = cLONG + mean.LONG) ## Partition the partial residuals such that each x1 trend only includes ## x2 data that is within that range in the observed data findNearest = function(x, y) { ff = fields:::rdist(x, y) apply(ff, 1, function(x) which(x == min(x))) } fn = findNearest(x = paruelo[, c("LAT", "LONG")], y = rdata[, c("LAT", "LONG")]) rdata = rdata[unlist(fn), ] %>% mutate(LONG = factor(LONG, labels = paste("LONG:~", c(-2, -1, 0, 1, 2), "*sigma"))) ggplot(newdata, aes(y = estimate, x = LAT)) + geom_line() + # geom_blank(aes(y=9)) + geom_point(data = rdata, aes(y = partial.resid), color = "grey") + geom_ribbon(aes(ymin = conf.low, ymax = conf.high), fill = "blue", alpha = 0.3) + scale_y_continuous("C3") + scale_x_continuous("Latitude") + facet_wrap(~LONG, labeller = label_parsed, nrow = 1, scales = "fixed") + theme_classic() + theme(strip.background = element_blank())
# Note, the curvature is purely an artifact of the transformation # applied.
paruelo.mcmc = as.matrix(paruelo.rstan) ## Calculate the fitted values newdata = with(paruelo, expand.grid(cLAT = seq(min(cLAT, na.rm = TRUE), max(cLAT, na.rm = TRUE), len = 100), cLONG = seq(min(cLONG, na.rm = TRUE), max(cLONG, na.rm = TRUE), len = 100))) Xmat = model.matrix(~cLAT * cLONG, newdata) coefs = paruelo.mcmc[, c("beta0", "beta[1]", "beta[2]", "beta[3]")] fit = coefs %*% t(Xmat) newdata = newdata %>% mutate(LAT = cLAT + mean.LAT, LONG = cLONG + mean.LONG) %>% cbind(tidyMCMC(fit^2, conf.int = TRUE, conf.method = "HPDinterval")) ## Partial residuals fdata = rdata = with(paruelo, expand.grid(cLAT = cLAT, cLONG = mean(cLONG) + sd(cLONG) * -2:2)) fMat = rMat = model.matrix(~cLAT * cLONG, fdata) fit = as.vector(apply(coefs, 2, median) %*% t(fMat)) resid = as.vector(paruelo$C3 - (apply(coefs, 2, median) %*% t(rMat))^2) rdata = rdata %>% mutate(partial.resid = resid + fit^2) %>% mutate(LAT = cLAT + mean.LAT, LONG = cLONG + mean.LONG) ggplot(newdata, aes(y = LAT, x = LONG)) + geom_tile(aes(fill = estimate)) + geom_contour(aes(z = conf.high - conf.low)) + scale_fill_gradientn("C3", colors = heat.colors(10)) + geom_point(data = paruelo, aes(size = C3)) + scale_y_continuous("Latitude") + scale_x_continuous("Longitude") + theme_classic()
paruelo.mcmc = as.matrix(paruelo.rstanarm) ## Calculate the fitted values newdata = with(paruelo, expand.grid(cLAT = seq(min(cLAT, na.rm = TRUE), max(cLONG, na.rm = TRUE), len = 100), cLONG = mean(cLONG) + sd(cLONG) %*% -2:2)) Xmat = model.matrix(~cLAT * cLONG, newdata) coefs = paruelo.mcmc[, c("(Intercept)", "cLAT", "cLONG", "cLAT:cLONG")] fit = coefs %*% t(Xmat) newdata = newdata %>% mutate(LAT = cLAT + mean.LAT, LONG = cLONG + mean.LONG) %>% cbind(tidyMCMC(fit^2, conf.int = TRUE, conf.method = "HPDinterval")) %>% mutate(LONG = factor(LONG, labels = paste("LONG:~", c(-2, -1, 0, 1, 2), "*sigma"))) ## Partial residuals fdata = rdata = with(paruelo, expand.grid(cLAT = cLAT, cLONG = mean(cLONG) + sd(cLONG) * -2:2)) fMat = rMat = model.matrix(~cLAT * cLONG, fdata) fit = as.vector(apply(coefs, 2, median) %*% t(fMat)) resid = as.vector(paruelo$C3 - (apply(coefs, 2, median) %*% t(rMat))^2) rdata = rdata %>% mutate(partial.resid = resid + fit^2) %>% mutate(LAT = cLAT + mean.LAT, LONG = cLONG + mean.LONG) ## Partition the partial residuals such that each x1 trend only includes ## x2 data that is within that range in the observed data findNearest = function(x, y) { ff = fields:::rdist(x, y) apply(ff, 1, function(x) which(x == min(x))) } fn = findNearest(x = paruelo[, c("LAT", "LONG")], y = rdata[, c("LAT", "LONG")]) rdata = rdata[unlist(fn), ] %>% mutate(LONG = factor(LONG, labels = paste("LONG:~", c(-2, -1, 0, 1, 2), "*sigma"))) ggplot(newdata, aes(y = estimate, x = LAT)) + geom_line() + # geom_blank(aes(y=9)) + geom_point(data = rdata, aes(y = partial.resid), color = "grey") + geom_ribbon(aes(ymin = conf.low, ymax = conf.high), fill = "blue", alpha = 0.3) + scale_y_continuous("C3") + scale_x_continuous("Latitude") + facet_wrap(~LONG, labeller = label_parsed, nrow = 1, scales = "fixed") + theme_classic() + theme(strip.background = element_blank())
# Note, the curvature is purely an artifact of the transformation # applied.
paruelo.mcmc = as.matrix(paruelo.rstanarm) ## Calculate the fitted values newdata = with(paruelo, expand.grid(cLAT = seq(min(cLAT, na.rm = TRUE), max(cLAT, na.rm = TRUE), len = 100), cLONG = seq(min(cLONG, na.rm = TRUE), max(cLONG, na.rm = TRUE), len = 100))) Xmat = model.matrix(~cLAT * cLONG, newdata) coefs = paruelo.mcmc[, c("(Intercept)", "cLAT", "cLONG", "cLAT:cLONG")] fit = coefs %*% t(Xmat) newdata = newdata %>% mutate(LAT = cLAT + mean.LAT, LONG = cLONG + mean.LONG) %>% cbind(tidyMCMC(fit^2, conf.int = TRUE, conf.method = "HPDinterval")) ## Partial residuals fdata = rdata = with(paruelo, expand.grid(cLAT = cLAT, cLONG = mean(cLONG) + sd(cLONG) * -2:2)) fMat = rMat = model.matrix(~cLAT * cLONG, fdata) fit = as.vector(apply(coefs, 2, median) %*% t(fMat)) resid = as.vector(paruelo$C3 - (apply(coefs, 2, median) %*% t(rMat))^2) rdata = rdata %>% mutate(partial.resid = resid + fit^2) %>% mutate(LAT = cLAT + mean.LAT, LONG = cLONG + mean.LONG) ggplot(newdata, aes(y = LAT, x = LONG)) + geom_tile(aes(fill = estimate)) + geom_contour(aes(z = conf.high - conf.low)) + scale_fill_gradientn("C3", colors = heat.colors(10)) + geom_point(data = paruelo, aes(size = C3)) + scale_y_continuous("Latitude") + scale_x_continuous("Longitude") + theme_classic()
plot(marginal_effects(paruelo.brm, effects = "cLAT:cLONG"), points = TRUE)
# OR Define a function that will calculate mean plus or minus 2 and 1 # standard deviations msd2 = function(x) { means = mean(x, na.rm = TRUE) sd = sd(x, na.rm = TRUE) means + (-2:2) * sd } plot(marginal_effects(paruelo.brm, effects = "cLAT:cLONG", int_conditions = list(cLONG = msd2)), points = TRUE)
# OR we could arrange the effect of cLAT separately for different # values of cLONG (mean plus or minus 1 and 2 standard deviations) cond = data.frame(cLONG = msd2(paruelo$cLONG), row.names = paste0("cLONG: mean ", -2:2, "*sd")) plot(marginal_effects(paruelo.brm, effects = "cLAT", conditions = cond, select_points = 0.1), points = TRUE)
## Yet another way would be as a 2D surface plot(marginal_effects(paruelo.brm, effects = "cLAT:cLONG", surface = TRUE), points = TRUE, stype = "raster")
paruelo.mcmc = as.matrix(paruelo.brm) ## Calculate the fitted values newdata = with(paruelo, expand.grid(cLAT = seq(min(cLAT, na.rm = TRUE), max(cLONG, na.rm = TRUE), len = 100), cLONG = mean(cLONG) + sd(cLONG) %*% -2:2)) Xmat = model.matrix(~cLAT * cLONG, newdata) coefs = paruelo.mcmc[, c("b_Intercept)", "b_cLAT", "b_cLONG", "b_cLAT:cLONG")]
Error in paruelo.mcmc[, c("b_Intercept)", "b_cLAT", "b_cLONG", "b_cLAT:cLONG")]: subscript out of bounds
fit = coefs %*% t(Xmat) newdata = newdata %>% mutate(LAT = cLAT + mean.LAT, LONG = cLONG + mean.LONG) %>% cbind(tidyMCMC(fit^2, conf.int = TRUE, conf.method = "HPDinterval")) %>% mutate(LONG = factor(LONG, labels = paste("LONG:~", c(-2, -1, 0, 1, 2), "*sigma"))) ## Partial residuals fdata = rdata = with(paruelo, expand.grid(cLAT = cLAT, cLONG = mean(cLONG) + sd(cLONG) * -2:2)) fMat = rMat = model.matrix(~cLAT * cLONG, fdata) fit = as.vector(apply(coefs, 2, median) %*% t(fMat)) resid = as.vector(paruelo$C3 - (apply(coefs, 2, median) %*% t(rMat))^2) rdata = rdata %>% mutate(partial.resid = resid + fit^2) %>% mutate(LAT = cLAT + mean.LAT, LONG = cLONG + mean.LONG) ## Partition the partial residuals such that each x1 trend only includes ## x2 data that is within that range in the observed data findNearest = function(x, y) { ff = fields:::rdist(x, y) apply(ff, 1, function(x) which(x == min(x))) } fn = findNearest(x = paruelo[, c("LAT", "LONG")], y = rdata[, c("LAT", "LONG")]) rdata = rdata[unlist(fn), ] %>% mutate(LONG = factor(LONG, labels = paste("LONG:~", c(-2, -1, 0, 1, 2), "*sigma"))) ggplot(newdata, aes(y = estimate, x = LAT)) + geom_line() + # geom_blank(aes(y=9)) + geom_point(data = rdata, aes(y = partial.resid), color = "grey") + geom_ribbon(aes(ymin = conf.low, ymax = conf.high), fill = "blue", alpha = 0.3) + scale_y_continuous("C3") + scale_x_continuous("Latitude") + facet_wrap(~LONG, labeller = label_parsed, nrow = 1, scales = "fixed") + theme_classic() + theme(strip.background = element_blank())
# Note, the curvature is purely an artifact of the transformation # applied.
paruelo.mcmc = as.matrix(paruelo.brm) ## Calculate the fitted values newdata = with(paruelo, expand.grid(cLAT = seq(min(cLAT, na.rm = TRUE), max(cLAT, na.rm = TRUE), len = 100), cLONG = seq(min(cLONG, na.rm = TRUE), max(cLONG, na.rm = TRUE), len = 100))) Xmat = model.matrix(~cLAT * cLONG, newdata) coefs = paruelo.mcmc[, c("b_Intercept", "b_cLAT", "b_cLONG", "b_cLAT:cLONG")] fit = coefs %*% t(Xmat) newdata = newdata %>% mutate(LAT = cLAT + mean.LAT, LONG = cLONG + mean.LONG) %>% cbind(tidyMCMC(fit^2, conf.int = TRUE, conf.method = "HPDinterval")) ## Partial residuals fdata = rdata = with(paruelo, expand.grid(cLAT = cLAT, cLONG = mean(cLONG) + sd(cLONG) * -2:2)) fMat = rMat = model.matrix(~cLAT * cLONG, fdata) fit = as.vector(apply(coefs, 2, median) %*% t(fMat)) resid = as.vector(paruelo$C3 - (apply(coefs, 2, median) %*% t(rMat))^2) rdata = rdata %>% mutate(partial.resid = resid + fit^2) %>% mutate(LAT = cLAT + mean.LAT, LONG = cLONG + mean.LONG) ggplot(newdata, aes(y = LAT, x = LONG)) + geom_tile(aes(fill = estimate)) + geom_contour(aes(z = conf.high - conf.low)) + scale_fill_gradientn("C3", colors = heat.colors(10)) + geom_point(data = paruelo, aes(size = C3)) + scale_y_continuous("Latitude") + scale_x_continuous("Longitude") + theme_classic()
- Explore effect sizes - change in C3 associated with a change in Latitude from 35 to 45 at various levels of Longitude.
library(MCMCpack) paruelo.mcmc = paruelo.mcmcpack newdata = with(paruelo, expand.grid(cLAT = c(35 - mean.LAT, 45 - mean.LAT), cLONG = (-2:2) * sd(cLONG))) Xmat = model.matrix(~cLAT * cLONG, newdata) coefs = paruelo.mcmc[, c("(Intercept)", "cLAT", "cLONG", "cLAT:cLONG")] fit = (coefs %*% t(Xmat))^2 s1 = seq(1, 9, b = 2) s2 = seq(2, 10, b = 2) ## Raw effect size (RES = tidyMCMC(as.mcmc(fit[, s2] - fit[, s1]), conf.int = TRUE, conf.method = "HPDinterval"))
term estimate std.error conf.low conf.high 1 2 0.1342806 0.08713327 -0.03361161 0.3080649 2 4 0.2579431 0.05349590 0.15235971 0.3620157 3 6 0.3700630 0.04574632 0.27681216 0.4559970 4 8 0.4706405 0.07515740 0.32157620 0.6152214 5 10 0.5596755 0.12343050 0.32232714 0.8075261
## Cohen's D cohenD = (fit[, s2] - fit[, s1])/sqrt(paruelo.mcmc[, "sigma2"]) (cohenDES = tidyMCMC(as.mcmc(cohenD), conf.int = TRUE, conf.method = "HPDinterval"))
term estimate std.error conf.low conf.high 1 2 0.672703 0.4352213 -0.1968762 1.500608 2 4 1.291933 0.2878937 0.7507301 1.880575 3 6 1.853297 0.2771987 1.2898298 2.375413 4 8 2.356796 0.4226718 1.5261221 3.175460 5 10 2.802429 0.6547949 1.5759322 4.114309
# Percentage change ESp = 100 * (fit[, s2] - fit[, s1])/fit[, s1] (PES = tidyMCMC(as.mcmc(ESp), conf.int = TRUE, conf.method = "HPDinterval"))
term estimate std.error conf.low conf.high 1 2 1.040822e+02 9.068098e+01 -36.33409 270.4147 2 4 3.100988e+02 1.254237e+02 107.26130 558.3306 3 6 9.916857e+02 5.155349e+02 313.55369 1894.5857 4 8 4.620501e+06 2.972336e+08 240.87719 78225.0598 5 10 1.294870e+10 1.289270e+12 174.88995 2118666.8085
# Probability that the effect is greater than 50% (an increase of >50%) (p50 = apply(ESp, 2, function(x) sum(x > 50)/length(x)))
2 4 6 8 10 0.7152 0.9999 1.0000 1.0000 1.0000
## fractional change (FES = tidyMCMC(as.mcmc(fit[, s2]/fit[, s1]), conf.int = TRUE, conf.method = "HPDinterval"))
term estimate std.error conf.low conf.high 1 2 2.040822e+00 9.068098e-01 0.6366591 3.704147 2 4 4.100988e+00 1.254237e+00 2.0726130 6.583306 3 6 1.091686e+01 5.155349e+00 4.1355369 19.945857 4 8 4.620601e+04 2.972336e+06 3.4087719 783.250598 5 10 1.294870e+08 1.289270e+10 2.7488995 21187.668085
paruelo.mcmc = paruelo.r2jags$BUGSoutput$sims.matrix newdata = with(paruelo, expand.grid(cLAT = c(35 - mean.LAT, 45 - mean.LAT), cLONG = (-2:2) * sd(cLONG))) Xmat = model.matrix(~cLAT * cLONG, newdata) coefs = paruelo.mcmc[, c("beta0", "beta[1]", "beta[2]", "beta[3]")] fit = (coefs %*% t(Xmat))^2 s1 = seq(1, 9, b = 2) s2 = seq(2, 10, b = 2) ## Raw effect size (RES = tidyMCMC(as.mcmc(fit[, s2] - fit[, s1]), conf.int = TRUE, conf.method = "HPDinterval"))
term estimate std.error conf.low conf.high 1 2 0.1349178 0.08794489 -0.04041036 0.3059156 2 4 0.2585383 0.05370009 0.15709832 0.3672346 3 6 0.3705840 0.04565108 0.28155083 0.4597610 4 8 0.4710550 0.07610010 0.32434150 0.6226791 5 10 0.5599513 0.12596962 0.32306584 0.8133294
## Cohen's D cohenD = (fit[, s2] - fit[, s1])/sqrt(paruelo.mcmc[, "sigma"]) (cohenDES = tidyMCMC(as.mcmc(cohenD), conf.int = TRUE, conf.method = "HPDinterval"))
term estimate std.error conf.low conf.high 1 2 0.3003534 0.1948831 -0.07593415 0.6900725 2 4 0.5755871 0.1213214 0.33833739 0.8140084 3 6 0.8250532 0.1070657 0.60836234 1.0276555 4 8 1.0487518 0.1743212 0.70416978 1.3896697 5 10 1.2466827 0.2838059 0.70502259 1.8109759
# Percentage change ESp = 100 * (fit[, s2] - fit[, s1])/fit[, s1] (PES = tidyMCMC(as.mcmc(ESp), conf.int = TRUE, conf.method = "HPDinterval"))
term estimate std.error conf.low conf.high 1 2 1.053393e+02 9.348612e+01 -28.95269 289.1523 2 4 3.119978e+02 1.294950e+02 103.90816 566.0258 3 6 9.959446e+02 5.039227e+02 323.90140 1919.1223 4 8 8.040494e+08 9.437749e+10 228.89724 82953.0847 5 10 3.366204e+07 9.964292e+08 243.68269 2220207.1191
# Probability that the effect is greater than 50% (an increase of >50%) (p50 = apply(ESp, 2, function(x) sum(x > 50)/length(x)))
2 4 6 8 10 0.7144681 0.9998582 1.0000000 1.0000000 1.0000000
## fractional change (FES = tidyMCMC(as.mcmc(fit[, s2]/fit[, s1]), conf.int = TRUE, conf.method = "HPDinterval"))
term estimate std.error conf.low conf.high 1 2 2.053393e+00 9.348612e-01 0.7104731 3.891523 2 4 4.119978e+00 1.294950e+00 2.0390816 6.660258 3 6 1.095945e+01 5.039227e+00 4.2390140 20.191223 4 8 8.040495e+06 9.437749e+08 3.2889724 830.530847 5 10 3.366214e+05 9.964292e+06 3.4368269 22203.071191
paruelo.mcmc = as.matrix(paruelo.rstan) newdata = with(paruelo, expand.grid(cLAT = c(35 - mean.LAT, 45 - mean.LAT), cLONG = (-2:2) * sd(cLONG))) Xmat = model.matrix(~cLAT * cLONG, newdata) coefs = paruelo.mcmc[, c("beta0", "beta[1]", "beta[2]", "beta[3]")] fit = (coefs %*% t(Xmat))^2 s1 = seq(1, 9, b = 2) s2 = seq(2, 10, b = 2) ## Raw effect size (RES = tidyMCMC(as.mcmc(fit[, s2] - fit[, s1]), conf.int = TRUE, conf.method = "HPDinterval"))
term estimate std.error conf.low conf.high 1 2 0.1319641 0.08789611 -0.04389211 0.3016103 2 4 0.2563267 0.05386641 0.15163208 0.3590931 3 6 0.3692837 0.04628573 0.28427916 0.4645861 4 8 0.4708352 0.07698113 0.31926392 0.6211574 5 10 0.5609811 0.12716248 0.31719799 0.8079195
## Cohen's D cohenD = (fit[, s2] - fit[, s1])/sqrt(paruelo.mcmc[, "sigma"]) (cohenDES = tidyMCMC(as.mcmc(cohenD), conf.int = TRUE, conf.method = "HPDinterval"))
term estimate std.error conf.low conf.high 1 2 0.2935319 0.1942803 -0.09896972 0.6595833 2 4 0.5702913 0.1212148 0.34901086 0.8156394 3 6 0.8216624 0.1080325 0.61471403 1.0312566 4 8 1.0476449 0.1758745 0.69492968 1.3808014 5 10 1.2482390 0.2859291 0.67872851 1.7890767
# Percentage change ESp = 100 * (fit[, s2] - fit[, s1])/fit[, s1] (PES = tidyMCMC(as.mcmc(ESp), conf.int = TRUE, conf.method = "HPDinterval"))
term estimate std.error conf.low conf.high 1 2 1.042273e+02 1.170636e+02 -38.27722 282.6902 2 4 3.092962e+02 1.348801e+02 109.35838 565.3589 3 6 9.873882e+02 4.955407e+02 323.40689 1873.6455 4 8 1.600200e+06 1.080694e+08 320.12706 78699.1297 5 10 7.245863e+08 4.756900e+10 24.83881 1632540.3890
# Probability that the effect is greater than 50% (an increase of >50%) (p50 = apply(ESp, 2, function(x) sum(x > 50)/length(x)))
2 4 6 8 10 0.6991111 1.0000000 1.0000000 1.0000000 0.9998519
## fractional change (FES = tidyMCMC(as.mcmc(fit[, s2]/fit[, s1]), conf.int = TRUE, conf.method = "HPDinterval"))
term estimate std.error conf.low conf.high 1 2 2.042273e+00 1.170636e+00 0.6172278 3.826902 2 4 4.092962e+00 1.348801e+00 2.0935838 6.653589 3 6 1.087388e+01 4.955407e+00 4.2340689 19.736455 4 8 1.600300e+04 1.080694e+06 4.2012706 787.991297 5 10 7.245864e+06 4.756900e+08 1.2483881 16326.403890
paruelo.mcmc = as.matrix(paruelo.rstanarm) newdata = with(paruelo, expand.grid(cLAT = c(35 - mean.LAT, 45 - mean.LAT), cLONG = (-2:2) * sd(cLONG))) Xmat = model.matrix(~cLAT * cLONG, newdata) coefs = paruelo.mcmc[, c("(Intercept)", "cLAT", "cLONG", "cLAT:cLONG")] fit = (coefs %*% t(Xmat))^2 s1 = seq(1, 9, b = 2) s2 = seq(2, 10, b = 2) ## Raw effect size (RES = tidyMCMC(as.mcmc(fit[, s2] - fit[, s1]), conf.int = TRUE, conf.method = "HPDinterval"))
term estimate std.error conf.low conf.high 1 2 0.1341501 0.08848599 -0.03243586 0.3107923 2 4 0.2581415 0.05442864 0.15686023 0.3675871 3 6 0.3707618 0.04583440 0.28341930 0.4623395 4 8 0.4720110 0.07541079 0.32685749 0.6220711 5 10 0.5618890 0.12473091 0.32547913 0.8094347
## Cohen's D cohenD = (fit[, s2] - fit[, s1])/sqrt(paruelo.mcmc[, "sigma"]) (cohenDES = tidyMCMC(as.mcmc(cohenD), conf.int = TRUE, conf.method = "HPDinterval"))
term estimate std.error conf.low conf.high 1 2 0.2986746 0.1958789 -0.07998586 0.6760735 2 4 0.5747869 0.1226097 0.33480854 0.8135894 3 6 0.8255451 0.1070300 0.61139499 1.0319382 4 8 1.0509493 0.1721461 0.69999432 1.3776946 5 10 1.2509993 0.2800144 0.71471192 1.8024770
# Percentage change ESp = 100 * (fit[, s2] - fit[, s1])/fit[, s1] (PES = tidyMCMC(as.mcmc(ESp), conf.int = TRUE, conf.method = "HPDinterval"))
term estimate std.error conf.low conf.high 1 2 1.056659e+02 1.257457e+02 -30.97074 287.1470 2 4 3.105262e+02 1.329256e+02 108.88708 558.6118 3 6 9.851546e+02 4.843524e+02 318.35471 1857.7700 4 8 2.244783e+07 1.687360e+09 254.92501 80921.6730 5 10 3.250875e+07 1.501876e+09 209.17051 1786965.3351
# Probability that the effect is greater than 50% (an increase of >50%) (p50 = apply(ESp, 2, function(x) sum(x > 50)/length(x)))
2 4 6 8 10 0.7134815 0.9998519 1.0000000 1.0000000 1.0000000
## fractional change (FES = tidyMCMC(as.mcmc(fit[, s2]/fit[, s1]), conf.int = TRUE, conf.method = "HPDinterval"))
term estimate std.error conf.low conf.high 1 2 2.056659e+00 1.257457e+00 0.6902926 3.871470 2 4 4.105262e+00 1.329256e+00 2.0888708 6.586118 3 6 1.085155e+01 4.843524e+00 4.1835471 19.577700 4 8 2.244793e+05 1.687360e+07 3.5492501 810.216730 5 10 3.250885e+05 1.501876e+07 3.0917051 17870.653351
paruelo.mcmc = as.matrix(paruelo.brm) newdata = with(paruelo, expand.grid(cLAT = c(35 - mean.LAT, 45 - mean.LAT), cLONG = (-2:2) * sd(cLONG))) Xmat = model.matrix(~cLAT * cLONG, newdata) coefs = paruelo.mcmc[, c("b_Intercept", "b_cLAT", "b_cLONG", "b_cLAT:cLONG")] fit = (coefs %*% t(Xmat))^2 s1 = seq(1, 9, b = 2) s2 = seq(2, 10, b = 2) ## Raw effect size (RES = tidyMCMC(as.mcmc(fit[, s2] - fit[, s1]), conf.int = TRUE, conf.method = "HPDinterval"))
term estimate std.error conf.low conf.high 1 2 0.1352462 0.08867666 -0.04429386 0.3055298 2 4 0.2579365 0.05387510 0.15274263 0.3620875 3 6 0.3693866 0.04588539 0.27546608 0.4539318 4 8 0.4695965 0.07691804 0.32720996 0.6250321 5 10 0.5585662 0.12721779 0.31538627 0.8060951
## Cohen's D cohenD = (fit[, s2] - fit[, s1])/sqrt(paruelo.mcmc[, "sigma"]) (cohenDES = tidyMCMC(as.mcmc(cohenD), conf.int = TRUE, conf.method = "HPDinterval"))
term estimate std.error conf.low conf.high 1 2 0.3010112 0.1962995 -0.0918150 0.6830484 2 4 0.5740773 0.1213968 0.3380522 0.8117176 3 6 0.8221476 0.1071816 0.6197105 1.0355653 4 8 1.0452221 0.1758657 0.7327157 1.4183406 5 10 1.2433007 0.2864037 0.7091841 1.8161433
# Percentage change ESp = 100 * (fit[, s2] - fit[, s1])/fit[, s1] (PES = tidyMCMC(as.mcmc(ESp), conf.int = TRUE, conf.method = "HPDinterval"))
term estimate std.error conf.low conf.high 1 2 1.065653e+02 9.500259e+01 -29.80102 292.8927 2 4 3.122440e+02 1.301541e+02 107.32351 566.7632 3 6 9.873246e+02 5.411930e+02 304.56631 1864.4427 4 8 9.556796e+06 6.550140e+08 236.86783 79799.0072 5 10 1.402617e+07 3.718325e+08 167.62419 1638648.6131
# Probability that the effect is greater than 50% (an increase of >50%) (p50 = apply(ESp, 2, function(x) sum(x > 50)/length(x)))
2 4 6 8 10 0.7179259 0.9997037 1.0000000 1.0000000 1.0000000
## fractional change (FES = tidyMCMC(as.mcmc(fit[, s2]/fit[, s1]), conf.int = TRUE, conf.method = "HPDinterval"))
term estimate std.error conf.low conf.high 1 2 2.065653e+00 9.500259e-01 0.7019898 3.928927 2 4 4.122440e+00 1.301541e+00 2.0732351 6.667632 3 6 1.087325e+01 5.411930e+00 4.0456631 19.644427 4 8 9.556896e+04 6.550140e+06 3.3686783 798.990072 5 10 1.402627e+05 3.718325e+06 2.6762419 16387.486131
- Explore finite-population standard deviations
library(MCMCpack) library(broom) paruelo.mcmc = paruelo.mcmcpack Xmat = model.matrix(~cLAT * cLONG, data = paruelo) sd.LAT = abs(paruelo.mcmc[, "cLAT"]) * sd(Xmat[, "cLAT"]) sd.LONG = abs(paruelo.mcmc[, "cLONG"]) * sd(Xmat[, "cLONG"]) sd.LATLONG = abs(paruelo.mcmc[, "cLAT:cLONG"]) * sd(Xmat[, "cLAT:cLONG"]) sd.x = sd.LAT + sd.LONG + sd.LATLONG # generate a model matrix newdata = paruelo Xmat = model.matrix(~cLAT * cLONG, newdata) ## get median parameter estimates coefs = paruelo.mcmc[, c("(Intercept)", "cLAT", "cLONG", "cLAT:cLONG")] fit = coefs %*% t(Xmat) resid = sweep(fit^2, 2, sqrt(paruelo$C3), "-") sd.resid = apply(resid, 1, sd) sd.all = cbind(sd.LAT, sd.LONG, sd.LATLONG, sd.resid) (fpsd = tidyMCMC(sd.all, conf.int = TRUE, conf.method = "HPDinterval"))
term estimate std.error conf.low conf.high 1 sd.LAT 0.23169015 0.026451592 1.792035e-01 0.28260491 2 sd.LONG 0.02439035 0.017868035 4.028548e-06 0.05853145 3 sd.LATLONG 0.07925537 0.026142083 2.817931e-02 0.13091661 4 sd.resid 0.21542613 0.006389563 2.071466e-01 0.22865064
# OR expressed as a percentage (fpsd.p = tidyMCMC(100 * sd.all/rowSums(sd.all), estimate.method = "median", conf.int = TRUE, conf.method = "HPDinterval"))
term estimate std.error conf.low conf.high 1 sd.LAT 42.182310 2.898362 36.118457692 47.42229 2 sd.LONG 3.846411 3.084873 0.000724435 10.19369 3 sd.LATLONG 14.411731 3.858801 6.333041215 21.35791 4 sd.resid 39.018325 3.094226 33.791482239 45.61601
## we can even plot this as a Bayesian ANOVA table ggplot(fpsd, aes(y = estimate, x = term)) + geom_pointrange(aes(ymin = conf.low, ymax = conf.high)) + geom_text(aes(label = sprintf("%.2f%%", fpsd.p$estimate), vjust = -1)) + scale_y_continuous("Finite population standard deviation") + scale_x_discrete() + coord_flip() + theme_classic()
paruelo.mcmc = paruelo.r2jags$BUGSoutput$sims.matrix Xmat = model.matrix(~cLAT * cLONG, data = paruelo) sd.LAT = abs(paruelo.mcmc[, "beta[1]"]) * sd(Xmat[, "cLAT"]) sd.LONG = abs(paruelo.mcmc[, "beta[2]"]) * sd(Xmat[, "cLONG"]) sd.LATLONG = abs(paruelo.mcmc[, "beta[3]"]) * sd(Xmat[, "cLAT:cLONG"]) sd.x = sd.LAT + sd.LONG + sd.LATLONG # generate a model matrix newdata = paruelo Xmat = model.matrix(~cLAT * cLONG, newdata) ## get median parameter estimates coefs = paruelo.mcmc[, c("beta0", "beta[1]", "beta[2]", "beta[3]")] fit = coefs %*% t(Xmat) resid = sweep(fit^2, 2, sqrt(paruelo$C3), "-") sd.resid = apply(resid, 1, sd) sd.all = cbind(sd.LAT, sd.LONG, sd.LATLONG, sd.resid) (fpsd = tidyMCMC(sd.all, conf.int = TRUE, conf.method = "HPDinterval"))
term estimate std.error conf.low conf.high 1 sd.LAT 0.23203188 0.02627327 1.780725e-01 0.28120950 2 sd.LONG 0.02490470 0.01809705 3.322881e-08 0.05925782 3 sd.LATLONG 0.07922763 0.02651085 2.660959e-02 0.13048759 4 sd.resid 0.21553417 0.00653920 2.067978e-01 0.22860010
# OR expressed as a percentage (fpsd.p = tidyMCMC(100 * sd.all/rowSums(sd.all), estimate.method = "median", conf.int = TRUE, conf.method = "HPDinterval"))
term estimate std.error conf.low conf.high 1 sd.LAT 42.126265 2.898268 3.633944e+01 47.62775 2 sd.LONG 3.960719 3.115583 6.436877e-06 10.32277 3 sd.LATLONG 14.415918 3.921531 6.384791e+00 21.62824 4 sd.resid 38.967055 3.080186 3.369378e+01 45.44767
## we can even plot this as a Bayesian ANOVA table ggplot(fpsd, aes(y = estimate, x = term)) + geom_pointrange(aes(ymin = conf.low, ymax = conf.high)) + geom_text(aes(label = sprintf("%.2f%%", fpsd.p$estimate), vjust = -1)) + scale_y_continuous("Finite population standard deviation") + scale_x_discrete() + coord_flip() + theme_classic()
paruelo.mcmc = as.matrix(paruelo.rstan) Xmat = model.matrix(~cLAT * cLONG, data = paruelo) sd.LAT = abs(paruelo.mcmc[, "beta[1]"]) * sd(Xmat[, "cLAT"]) sd.LONG = abs(paruelo.mcmc[, "beta[2]"]) * sd(Xmat[, "cLONG"]) sd.LATLONG = abs(paruelo.mcmc[, "beta[3]"]) * sd(Xmat[, "cLAT:cLONG"]) sd.x = sd.LAT + sd.LONG + sd.LATLONG # generate a model matrix newdata = paruelo Xmat = model.matrix(~cLAT * cLONG, newdata) ## get median parameter estimates coefs = paruelo.mcmc[, c("beta0", "beta[1]", "beta[2]", "beta[3]")] fit = coefs %*% t(Xmat) resid = sweep(fit^2, 2, sqrt(paruelo$C3), "-") sd.resid = apply(resid, 1, sd) sd.all = cbind(sd.LAT, sd.LONG, sd.LATLONG, sd.resid) (fpsd = tidyMCMC(sd.all, conf.int = TRUE, conf.method = "HPDinterval"))
term estimate std.error conf.low conf.high 1 sd.LAT 0.23138295 0.02636773 1.760709e-01 0.2805903 2 sd.LONG 0.02485541 0.01786845 2.194655e-05 0.0588445 3 sd.LATLONG 0.07965332 0.02657340 3.020573e-02 0.1360499 4 sd.resid 0.21560997 0.00654553 2.071048e-01 0.2286751
# OR expressed as a percentage (fpsd.p = tidyMCMC(100 * sd.all/rowSums(sd.all), estimate.method = "median", conf.int = TRUE, conf.method = "HPDinterval"))
term estimate std.error conf.low conf.high 1 sd.LAT 42.049248 2.938491 36.18841075 47.75280 2 sd.LONG 3.921904 3.081730 0.00064015 10.26762 3 sd.LATLONG 14.478042 3.927237 6.46152709 21.81337 4 sd.resid 38.961952 3.125724 33.92858577 46.03353
## we can even plot this as a Bayesian ANOVA table ggplot(fpsd, aes(y = estimate, x = term)) + geom_pointrange(aes(ymin = conf.low, ymax = conf.high)) + geom_text(aes(label = sprintf("%.2f%%", fpsd.p$estimate), vjust = -1)) + scale_y_continuous("Finite population standard deviation") + scale_x_discrete() + coord_flip() + theme_classic()
paruelo.mcmc = as.matrix(paruelo.rstanarm) Xmat = model.matrix(~cLAT * cLONG, data = paruelo) sd.LAT = abs(paruelo.mcmc[, "cLAT"]) * sd(Xmat[, "cLAT"]) sd.LONG = abs(paruelo.mcmc[, "cLONG"]) * sd(Xmat[, "cLONG"]) sd.LATLONG = abs(paruelo.mcmc[, "cLAT:cLONG"]) * sd(Xmat[, "cLAT:cLONG"]) sd.x = sd.LAT + sd.LONG + sd.LATLONG # generate a model matrix newdata = paruelo Xmat = model.matrix(~cLAT * cLONG, newdata) ## get median parameter estimates coefs = paruelo.mcmc[, c("(Intercept)", "cLAT", "cLONG", "cLAT:cLONG")] fit = coefs %*% t(Xmat) resid = sweep(fit^2, 2, sqrt(paruelo$C3), "-") sd.resid = apply(resid, 1, sd) sd.all = cbind(sd.LAT, sd.LONG, sd.LATLONG, sd.resid) (fpsd = tidyMCMC(sd.all, conf.int = TRUE, conf.method = "HPDinterval"))
term estimate std.error conf.low conf.high 1 sd.LAT 0.23185182 0.026294626 1.800538e-01 0.28352393 2 sd.LONG 0.02468265 0.017832763 1.694408e-06 0.05882593 3 sd.LATLONG 0.07937612 0.026349295 2.523573e-02 0.12834373 4 sd.resid 0.21552224 0.006501481 2.069207e-01 0.22867590
# OR expressed as a percentage (fpsd.p = tidyMCMC(100 * sd.all/rowSums(sd.all), estimate.method = "median", conf.int = TRUE, conf.method = "HPDinterval"))
term estimate std.error conf.low conf.high 1 sd.LAT 42.131235 2.873188 3.623640e+01 47.60497 2 sd.LONG 3.918416 3.073268 3.293841e-04 10.13401 3 sd.LATLONG 14.463484 3.913939 6.280235e+00 21.41574 4 sd.resid 38.973321 3.089661 3.408651e+01 45.92105
## we can even plot this as a Bayesian ANOVA table ggplot(fpsd, aes(y = estimate, x = term)) + geom_pointrange(aes(ymin = conf.low, ymax = conf.high)) + geom_text(aes(label = sprintf("%.2f%%", fpsd.p$estimate), vjust = -1)) + scale_y_continuous("Finite population standard deviation") + scale_x_discrete() + coord_flip() + theme_classic()
paruelo.mcmc = as.matrix(paruelo.brm) Xmat = model.matrix(~cLAT * cLONG, data = paruelo) sd.LAT = abs(paruelo.mcmc[, "b_cLAT"]) * sd(Xmat[, "cLAT"]) sd.LONG = abs(paruelo.mcmc[, "b_cLONG"]) * sd(Xmat[, "cLONG"]) sd.LATLONG = abs(paruelo.mcmc[, "b_cLAT:cLONG"]) * sd(Xmat[, "cLAT:cLONG"]) sd.x = sd.LAT + sd.LONG + sd.LATLONG # generate a model matrix newdata = paruelo Xmat = model.matrix(~cLAT * cLONG, newdata) ## get median parameter estimates coefs = paruelo.mcmc[, c("b_Intercept", "b_cLAT", "b_cLONG", "b_cLAT:cLONG")] fit = coefs %*% t(Xmat) resid = sweep(fit^2, 2, sqrt(paruelo$C3), "-") sd.resid = apply(resid, 1, sd) sd.all = cbind(sd.LAT, sd.LONG, sd.LATLONG, sd.resid) (fpsd = tidyMCMC(sd.all, conf.int = TRUE, conf.method = "HPDinterval"))
term estimate std.error conf.low conf.high 1 sd.LAT 0.23129941 0.026340445 1.793043e-01 0.28218470 2 sd.LONG 0.02462705 0.017936072 2.508594e-05 0.05878528 3 sd.LATLONG 0.07870679 0.026680297 2.537974e-02 0.13204085 4 sd.resid 0.21548437 0.006463628 2.071177e-01 0.22864105
# OR expressed as a percentage (fpsd.p = tidyMCMC(100 * sd.all/rowSums(sd.all), estimate.method = "median", conf.int = TRUE, conf.method = "HPDinterval"))
term estimate std.error conf.low conf.high 1 sd.LAT 42.161471 2.925826 3.627886e+01 47.58413 2 sd.LONG 3.914417 3.097391 1.754179e-04 10.13780 3 sd.LATLONG 14.349954 3.967145 5.640723e+00 21.34444 4 sd.resid 39.046627 3.137516 3.387698e+01 45.82190
## we can even plot this as a Bayesian ANOVA table ggplot(fpsd, aes(y = estimate, x = term)) + geom_pointrange(aes(ymin = conf.low, ymax = conf.high)) + geom_text(aes(label = sprintf("%.2f%%", fpsd.p$estimate), vjust = -1)) + scale_y_continuous("Finite population standard deviation") + scale_x_discrete() + coord_flip() + theme_classic()
- Explore $R^2$
library(MCMCpack) library(broom) paruelo.mcmc <- paruelo.mcmcpack Xmat = model.matrix(~cLAT * cLONG, data = paruelo) coefs = paruelo.mcmc[, c("(Intercept)", "cLAT", "cLONG", "cLAT:cLONG")] fit = coefs %*% t(Xmat) resid = sweep(fit, 2, sqrt(paruelo$C3), "-") var_f = apply(fit, 1, var) var_e = apply(resid, 1, var) R2 = var_f/(var_f + var_e) tidyMCMC(as.mcmc(R2), conf.int = TRUE, conf.method = "HPDinterval")
term estimate std.error conf.low conf.high 1 var1 0.5324033 0.05603256 0.4218392 0.6325557
# for comparison with frequentist summary(lm(sqrt(C3) ~ cLAT * cLONG, data = paruelo))
Call: lm(formula = sqrt(C3) ~ cLAT * cLONG, data = paruelo) Residuals: Min 1Q Median 3Q Max -0.51312 -0.13427 -0.01134 0.14086 0.38940 Coefficients: Estimate Std. Error t value Pr(>|t|) (Intercept) 0.4282658 0.0234347 18.275 < 2e-16 *** cLAT 0.0436937 0.0048670 8.977 3.28e-13 *** cLONG -0.0028773 0.0036842 -0.781 0.4375 cLAT:cLONG 0.0022824 0.0007471 3.055 0.0032 ** --- Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 Residual standard error: 0.1991 on 69 degrees of freedom Multiple R-squared: 0.5403, Adjusted R-squared: 0.5203 F-statistic: 27.03 on 3 and 69 DF, p-value: 1.128e-11
paruelo.mcmc = paruelo.r2jags$BUGSoutput$sims.matrix Xmat = model.matrix(~cLAT * cLONG, data = paruelo) coefs = paruelo.mcmc[, c("beta0", "beta[1]", "beta[2]", "beta[3]")] fit = coefs %*% t(Xmat) resid = sweep(fit, 2, sqrt(paruelo$C3), "-") var_f = apply(fit, 1, var) var_e = apply(resid, 1, var) R2 = var_f/(var_f + var_e) tidyMCMC(as.mcmc(R2), conf.int = TRUE, conf.method = "HPDinterval")
term estimate std.error conf.low conf.high 1 var1 0.5331754 0.05548993 0.4212902 0.6298881
# for comparison with frequentist summary(lm(sqrt(C3) ~ cLAT * cLONG, data = paruelo))
Call: lm(formula = sqrt(C3) ~ cLAT * cLONG, data = paruelo) Residuals: Min 1Q Median 3Q Max -0.51312 -0.13427 -0.01134 0.14086 0.38940 Coefficients: Estimate Std. Error t value Pr(>|t|) (Intercept) 0.4282658 0.0234347 18.275 < 2e-16 *** cLAT 0.0436937 0.0048670 8.977 3.28e-13 *** cLONG -0.0028773 0.0036842 -0.781 0.4375 cLAT:cLONG 0.0022824 0.0007471 3.055 0.0032 ** --- Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 Residual standard error: 0.1991 on 69 degrees of freedom Multiple R-squared: 0.5403, Adjusted R-squared: 0.5203 F-statistic: 27.03 on 3 and 69 DF, p-value: 1.128e-11
paruelo.mcmc = as.matrix(paruelo.rstan) Xmat = model.matrix(~cLAT * cLONG, data = paruelo) coefs = paruelo.mcmc[, c("beta0", "beta[1]", "beta[2]", "beta[3]")] fit = coefs %*% t(Xmat) resid = sweep(fit, 2, sqrt(paruelo$C3), "-") var_f = apply(fit, 1, var) var_e = apply(resid, 1, var) R2 = var_f/(var_f + var_e) tidyMCMC(as.mcmc(R2), conf.int = TRUE, conf.method = "HPDinterval")
term estimate std.error conf.low conf.high 1 var1 0.5316571 0.0559445 0.4233242 0.6339557
# for comparison with frequentist summary(lm(sqrt(C3) ~ cLAT * cLONG, data = paruelo))
Call: lm(formula = sqrt(C3) ~ cLAT * cLONG, data = paruelo) Residuals: Min 1Q Median 3Q Max -0.51312 -0.13427 -0.01134 0.14086 0.38940 Coefficients: Estimate Std. Error t value Pr(>|t|) (Intercept) 0.4282658 0.0234347 18.275 < 2e-16 *** cLAT 0.0436937 0.0048670 8.977 3.28e-13 *** cLONG -0.0028773 0.0036842 -0.781 0.4375 cLAT:cLONG 0.0022824 0.0007471 3.055 0.0032 ** --- Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 Residual standard error: 0.1991 on 69 degrees of freedom Multiple R-squared: 0.5403, Adjusted R-squared: 0.5203 F-statistic: 27.03 on 3 and 69 DF, p-value: 1.128e-11
paruelo.mcmc = as.matrix(paruelo.rstanarm) Xmat = model.matrix(~cLAT * cLONG, data = paruelo) coefs = paruelo.mcmc[, c("(Intercept)", "cLAT", "cLONG", "cLAT:cLONG")] fit = coefs %*% t(Xmat) resid = sweep(fit, 2, sqrt(paruelo$C3), "-") var_f = apply(fit, 1, var) var_e = apply(resid, 1, var) R2 = var_f/(var_f + var_e) tidyMCMC(as.mcmc(R2), conf.int = TRUE, conf.method = "HPDinterval")
term estimate std.error conf.low conf.high 1 var1 0.5327 0.05578058 0.4222702 0.6314585
# for comparison with frequentist summary(lm(sqrt(C3) ~ cLAT * cLONG, data = paruelo))
Call: lm(formula = sqrt(C3) ~ cLAT * cLONG, data = paruelo) Residuals: Min 1Q Median 3Q Max -0.51312 -0.13427 -0.01134 0.14086 0.38940 Coefficients: Estimate Std. Error t value Pr(>|t|) (Intercept) 0.4282658 0.0234347 18.275 < 2e-16 *** cLAT 0.0436937 0.0048670 8.977 3.28e-13 *** cLONG -0.0028773 0.0036842 -0.781 0.4375 cLAT:cLONG 0.0022824 0.0007471 3.055 0.0032 ** --- Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 Residual standard error: 0.1991 on 69 degrees of freedom Multiple R-squared: 0.5403, Adjusted R-squared: 0.5203 F-statistic: 27.03 on 3 and 69 DF, p-value: 1.128e-11
paruelo.mcmc = as.matrix(paruelo.brm) Xmat = model.matrix(~cLAT * cLONG, data = paruelo) coefs = paruelo.mcmc[, c("b_Intercept", "b_cLAT", "b_cLONG", "b_cLAT:cLONG")] fit = coefs %*% t(Xmat) resid = sweep(fit, 2, sqrt(paruelo$C3), "-") var_f = apply(fit, 1, var) var_e = apply(resid, 1, var) R2 = var_f/(var_f + var_e) tidyMCMC(as.mcmc(R2), conf.int = TRUE, conf.method = "HPDinterval")
term estimate std.error conf.low conf.high 1 var1 0.5316392 0.05603751 0.4212598 0.6302732
# for comparison with frequentist summary(lm(sqrt(C3) ~ cLAT * cLONG, data = paruelo))
Call: lm(formula = sqrt(C3) ~ cLAT * cLONG, data = paruelo) Residuals: Min 1Q Median 3Q Max -0.51312 -0.13427 -0.01134 0.14086 0.38940 Coefficients: Estimate Std. Error t value Pr(>|t|) (Intercept) 0.4282658 0.0234347 18.275 < 2e-16 *** cLAT 0.0436937 0.0048670 8.977 3.28e-13 *** cLONG -0.0028773 0.0036842 -0.781 0.4375 cLAT:cLONG 0.0022824 0.0007471 3.055 0.0032 ** --- Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 Residual standard error: 0.1991 on 69 degrees of freedom Multiple R-squared: 0.5403, Adjusted R-squared: 0.5203 F-statistic: 27.03 on 3 and 69 DF, p-value: 1.128e-11
- Although not overly useful in the case of only two main effects and an interaction,
explore sparsity.
modelString=" data { int < lower =0 > n; # number of observations int < lower =0 > nX; # number of predictors vector [ n] Y; # outputs matrix [n ,nX] X; # inputs real < lower =0 > scale_icept ; # prior std for the intercept real < lower =0 > scale_global ; # scale for the half -t prior for tau real < lower =1 > nu_global ; # degrees of freedom for the half -t priors for tau real < lower =1 > nu_local ; # degrees of freedom for the half - t priors for lambdas real < lower =0 > slab_scale ; # slab scale for the regularized horseshoe real < lower =0 > slab_df ; # slab degrees of freedom for the regularized horseshoe } transformed data { matrix[n, nX - 1] Xc; // centered version of X vector[nX - 1] means_X; // column means of X before centering for (i in 2:nX) { means_X[i - 1] = mean(X[, i]); Xc[, i - 1] = X[, i] - means_X[i - 1]; } } parameters { real logsigma ; real cbeta0 ; vector [ nX-1] z; real < lower =0 > tau ; # global shrinkage parameter vector < lower =0 >[ nX-1] lambda ; # local shrinkage parameter real < lower =0 > caux ; } transformed parameters { real < lower =0 > sigma ; # noise std vector < lower =0 >[ nX-1] lambda_tilde ; # ’ truncated ’ local shrinkage parameter real < lower =0 > c; # slab scale vector [ nX-1] beta ; # regression coefficients vector [ n] mu; # latent function values sigma = exp ( logsigma ); c = slab_scale * sqrt ( caux ); lambda_tilde = sqrt ( c ^2 * square ( lambda ) ./ (c ^2 + tau ^2* square ( lambda )) ); beta = z .* lambda_tilde * tau ; mu = cbeta0 + Xc* beta ; } model { # half -t priors for lambdas and tau , and inverse - gamma for c ^2 z ~ normal (0 , 1); lambda ~ student_t ( nu_local , 0, 1); tau ~ student_t ( nu_global , 0 , scale_global * sigma ); caux ~ inv_gamma (0.5* slab_df , 0.5* slab_df ); cbeta0 ~ normal (0 , scale_icept ); Y ~ normal (mu , sigma ); } generated quantities { real beta0; // population-level intercept vector[n] log_lik; beta0 = cbeta0 - dot_product(means_X, beta); for (i in 1:n) { log_lik[i] = normal_lpdf(Y[i] | Xc[i] * beta + cbeta0, sigma); } }"
X = model.matrix(~cLAT * cLONG, data = paruelo) paruelo.list <- with(paruelo, list(Y = sqrt(C3), X = X, nX = ncol(X), n = nrow(paruelo), scale_icept = 100, scale_global = 1, nu_global = 1, nu_local = 1, slab_scale = 2, slab_df = 4)) paruelo.rstan.sparsity <- stan(data = paruelo.list, model_code = modelString, chains = 3, iter = 4000, warmup = 2000, thin = 3, save_dso = TRUE)
SAMPLING FOR MODEL '00bfb1e363378528725b0dadb922f0fc' NOW (CHAIN 1). Gradient evaluation took 6.3e-05 seconds 1000 transitions using 10 leapfrog steps per transition would take 0.63 seconds. Adjust your expectations accordingly! Iteration: 1 / 4000 [ 0%] (Warmup) Iteration: 400 / 4000 [ 10%] (Warmup) Iteration: 800 / 4000 [ 20%] (Warmup) Iteration: 1200 / 4000 [ 30%] (Warmup) Iteration: 1600 / 4000 [ 40%] (Warmup) Iteration: 2000 / 4000 [ 50%] (Warmup) Iteration: 2001 / 4000 [ 50%] (Sampling) Iteration: 2400 / 4000 [ 60%] (Sampling) Iteration: 2800 / 4000 [ 70%] (Sampling) Iteration: 3200 / 4000 [ 80%] (Sampling) Iteration: 3600 / 4000 [ 90%] (Sampling) Iteration: 4000 / 4000 [100%] (Sampling) Elapsed Time: 1.72439 seconds (Warm-up) 1.36468 seconds (Sampling) 3.08907 seconds (Total) SAMPLING FOR MODEL '00bfb1e363378528725b0dadb922f0fc' NOW (CHAIN 2). Gradient evaluation took 2.2e-05 seconds 1000 transitions using 10 leapfrog steps per transition would take 0.22 seconds. Adjust your expectations accordingly! Iteration: 1 / 4000 [ 0%] (Warmup) Iteration: 400 / 4000 [ 10%] (Warmup) Iteration: 800 / 4000 [ 20%] (Warmup) Iteration: 1200 / 4000 [ 30%] (Warmup) Iteration: 1600 / 4000 [ 40%] (Warmup) Iteration: 2000 / 4000 [ 50%] (Warmup) Iteration: 2001 / 4000 [ 50%] (Sampling) Iteration: 2400 / 4000 [ 60%] (Sampling) Iteration: 2800 / 4000 [ 70%] (Sampling) Iteration: 3200 / 4000 [ 80%] (Sampling) Iteration: 3600 / 4000 [ 90%] (Sampling) Iteration: 4000 / 4000 [100%] (Sampling) Elapsed Time: 2.13692 seconds (Warm-up) 1.76853 seconds (Sampling) 3.90545 seconds (Total) SAMPLING FOR MODEL '00bfb1e363378528725b0dadb922f0fc' NOW (CHAIN 3). Gradient evaluation took 2.1e-05 seconds 1000 transitions using 10 leapfrog steps per transition would take 0.21 seconds. Adjust your expectations accordingly! Iteration: 1 / 4000 [ 0%] (Warmup) Iteration: 400 / 4000 [ 10%] (Warmup) Iteration: 800 / 4000 [ 20%] (Warmup) Iteration: 1200 / 4000 [ 30%] (Warmup) Iteration: 1600 / 4000 [ 40%] (Warmup) Iteration: 2000 / 4000 [ 50%] (Warmup) Iteration: 2001 / 4000 [ 50%] (Sampling) Iteration: 2400 / 4000 [ 60%] (Sampling) Iteration: 2800 / 4000 [ 70%] (Sampling) Iteration: 3200 / 4000 [ 80%] (Sampling) Iteration: 3600 / 4000 [ 90%] (Sampling) Iteration: 4000 / 4000 [100%] (Sampling) Elapsed Time: 1.83401 seconds (Warm-up) 2.12364 seconds (Sampling) 3.95765 seconds (Total)
tidyMCMC(paruelo.rstan.sparsity, pars = c("beta[1]", "beta[2]", "beta[3]"), conf.int = TRUE, conf.type = "HPDinterval", rhat = TRUE, ess = TRUE)
term estimate std.error conf.low conf.high rhat ess 1 beta[1] 0.042527103 0.0050097190 0.0327966631 0.052135937 1.000417 2001 2 beta[2] -0.001716738 0.0031593104 -0.0087583583 0.004329824 1.007750 302 3 beta[3] 0.002134985 0.0007772265 0.0005952987 0.003651370 1.005599 1344
library(bayesplot) mcmc_areas(as.matrix(paruelo.rstan.sparsity), pars = c("beta[1]", "beta[2]", "beta[3]"))
n = nrow(paruelo) +\n X = 2 p0 = 1 global_scale = p0/(nX - p0)/sqrt(n) paruelo.rstanarm.sparsity = stan_glm(sqrt(C3) ~ cLAT * cLONG, data = paruelo, iter = 2000, warmup = 200, chains = 3, thin = 2, refresh = 0, prior_intercept = normal(0, 100), prior = hs(df = 1, global_df = 1, global_scale = global_scale), prior_aux = cauchy(0, 2))
Gradient evaluation took 6e-05 seconds 1000 transitions using 10 leapfrog steps per transition would take 0.6 seconds. Adjust your expectations accordingly! Elapsed Time: 1.0696 seconds (Warm-up) 3.66112 seconds (Sampling) 4.73072 seconds (Total) Gradient evaluation took 2.5e-05 seconds 1000 transitions using 10 leapfrog steps per transition would take 0.25 seconds. Adjust your expectations accordingly! Elapsed Time: 0.751543 seconds (Warm-up) 4.86999 seconds (Sampling) 5.62153 seconds (Total) Gradient evaluation took 1.7e-05 seconds 1000 transitions using 10 leapfrog steps per transition would take 0.17 seconds. Adjust your expectations accordingly! Elapsed Time: 0.915796 seconds (Warm-up) 4.15483 seconds (Sampling) 5.07063 seconds (Total)
print(paruelo.rstanarm.sparsity)
stan_glm family: gaussian [identity] formula: sqrt(C3) ~ cLAT * cLONG ------ Estimates: Median MAD_SD (Intercept) 0.4 0.0 cLAT 0.0 0.0 cLONG 0.0 0.0 cLAT:cLONG 0.0 0.0 sigma 0.2 0.0 Sample avg. posterior predictive distribution of y (X = xbar): Median MAD_SD mean_PPD 0.4 0.0 ------ For info on the priors used see help('prior_summary.stanreg').
tidyMCMC(paruelo.rstanarm.sparsity$stanfit, conf.int = TRUE, conf.method = "HPDinterval", rhat = TRUE, ess = TRUE)
term estimate std.error conf.low conf.high rhat ess 1 (Intercept) 0.427902575 0.0236414863 0.384058534 0.476529216 1.000818 2231 2 cLAT 0.042106547 0.0051212043 0.031640909 0.052108292 1.000424 2700 3 cLONG -0.001616742 0.0030434350 -0.008075258 0.004223560 1.001325 2422 4 cLAT:cLONG 0.002078546 0.0007825663 0.000561742 0.003633725 1.000547 2164 5 sigma 0.201559149 0.0171438985 0.169816432 0.235290681 1.000151 2307 6 mean_PPD 0.434643137 0.0335247480 0.370934109 0.501939647 1.000614 2414 7 log-posterior -13.229908688 2.9708421088 -18.809548038 -7.664415828 1.001692 1304
library(bayesplot) mcmc_areas(as.matrix(paruelo.rstanarm.sparsity), regex_par = "cL")
n = nrow(paruelo) +\n X = 2 p0 = 1 global_scale = p0/(nX - p0)/sqrt(n) paruelo.brms.sparsity = brm(sqrt(C3) ~ cLAT * cLONG, data = paruelo, iter = 2000, warmup = 200, chains = 3, thin = 2, refresh = 0, prior = c(prior(normal(0, 100), class = "Intercept"), prior(horseshoe(df = 1, par_ratio = par_ratio), class = "b"), prior(cauchy(0, 5), class = "sigma")))
Gradient evaluation took 4.2e-05 seconds 1000 transitions using 10 leapfrog steps per transition would take 0.42 seconds. Adjust your expectations accordingly! Elapsed Time: 0.241516 seconds (Warm-up) 0.957428 seconds (Sampling) 1.19894 seconds (Total) Gradient evaluation took 1.9e-05 seconds 1000 transitions using 10 leapfrog steps per transition would take 0.19 seconds. Adjust your expectations accordingly! Elapsed Time: 0.226307 seconds (Warm-up) 1.43232 seconds (Sampling) 1.65862 seconds (Total) Gradient evaluation took 1.4e-05 seconds 1000 transitions using 10 leapfrog steps per transition would take 0.14 seconds. Adjust your expectations accordingly! Elapsed Time: 0.250383 seconds (Warm-up) 1.40852 seconds (Sampling) 1.6589 seconds (Total)
print(paruelo.brms.sparsity)
Family: gaussian(identity) Formula: sqrt(C3) ~ cLAT * cLONG Data: paruelo (Number of observations: 73) Samples: 3 chains, each with iter = 2000; warmup = 200; thin = 2; total post-warmup samples = 2700 ICs: LOO = NA; WAIC = NA; R2 = NA Population-Level Effects: Estimate Est.Error l-95% CI u-95% CI Eff.Sample Rhat Intercept 0.43 0.02 0.38 0.48 2509 1 cLAT 0.04 0.01 0.03 0.05 1931 1 cLONG 0.00 0.00 -0.01 0.00 2543 1 cLAT:cLONG 0.00 0.00 0.00 0.00 2376 1 Family Specific Parameters: Estimate Est.Error l-95% CI u-95% CI Eff.Sample Rhat sigma 0.2 0.02 0.17 0.24 2269 1 Samples were drawn using sampling(NUTS). For each parameter, Eff.Sample is a crude measure of effective sample size, and Rhat is the potential scale reduction factor on split chains (at convergence, Rhat = 1).
tidyMCMC(paruelo.brms.sparsity$fit, conf.int = TRUE, conf.method = "HPDinterval", rhat = TRUE, ess = TRUE)
term estimate std.error conf.low conf.high rhat ess 1 b_Intercept 0.429311590 0.0240182589 0.382581777 0.476884717 0.9997970 2509 2 b_cLAT 0.042280287 0.0050651053 0.032019965 0.051715046 0.9992565 1931 3 b_cLONG -0.001707310 0.0029955918 -0.007777912 0.004060562 1.0005126 2543 4 b_cLAT:cLONG 0.002088576 0.0008262995 0.000429252 0.003678067 0.9998758 2376 5 sigma 0.202876614 0.0179247843 0.170190244 0.239294598 0.9998154 2269 6 hs_c2 1.907347192 2.9752114577 0.238132959 5.487135251 0.9994273 1930
library(bayesplot) mcmc_areas(as.matrix(paruelo.brms.sparsity), regex_par = "cL")
Whilst there are no real issues with the residuals, The violin plots and predicted trends should raise alarm bells. On the square-root scale, predicted values associated with low cLAT are less than 0. When a mixture of estimates above and below zero are back-transformed onto the natural scale, negative values will become positive. Hence the order of the data will not be preserved during the back-transform. This is not good.
Although root transformations to the inverse of an odd power will preserve the polarity of estimates, it is of course not logical to have a C3 abundance less than 0. We will ignore this issue for now, yet we will again note that the Gaussian approach is probably inappropriate.
There is some support for an interaction.
Multiple Linear Regression
Loyn (1987) modeled the abundance of forest birds with six predictor variables (patch area, distance to nearest patch, distance to nearest larger patch, grazing intensity, altitude and years since the patch had been isolated).
Download Loyn data setFormat of loyn.csv data file | |||||||||||||||||||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
|
loyn <- read.table("../downloads/data/loyn.csv", header = T, sep = ",", strip.white = T) head(loyn)
ABUND AREA YR.ISOL DIST LDIST GRAZE ALT 1 5.3 0.1 1968 39 39 2 160 2 2.0 0.5 1920 234 234 5 60 3 1.5 0.5 1900 104 311 5 140 4 17.1 1.0 1966 66 66 3 160 5 13.8 1.0 1918 246 246 5 140 6 14.1 1.0 1965 234 285 3 130
-
Perform exploratory data analysis to help guide what sort of analysis will be suitable
and whether the various assumptions are likely to be met.
# via car's scatterplotMatrix function library(car) scatterplotMatrix(~ABUND + DIST + LDIST + AREA + GRAZE + ALT + YR.ISOL, data = loyn, diagonal = "boxplot")
# via lattice library(lattice) splom.lat <- splom(loyn, type = c("p", "r")) print(splom.lat)
# via ggplot2 - warning these are slow! library(GGally) ggpairs(loyn, lower = list(continuous = "smooth"), diag = list(continuous = "density"), axisLabels = "none")
- Whilst in many model fitting and graphing
routines are able to perform transformation inline, for more complex examples, it is often advisable to also create transformed
versions of variables.
# via car's scatterplotMatrix function library(car) scatterplotMatrix(~ABUND + log10(DIST) + log10(LDIST) + log10(AREA) + GRAZE + ALT + YR.ISOL, data = loyn, diagonal = "boxplot")
ggpairs(with(loyn, data.frame(logDIST = log10(DIST), logLDIST = log(LDIST), logAREA = log10(AREA), GRAZE, ALT, YR.ISOL)), lower = list(continuous = "smooth"), diag = list(continuous = "density"), axisLabels = "none")
- Explore (multi)collinearity.
loyn.lm <- lm(ABUND ~ log10(DIST) + log10(LDIST) + log10(AREA) + GRAZE + ALT + YR.ISOL, data = loyn) vif(loyn.lm)
log10(DIST) log10(LDIST) log10(AREA) GRAZE ALT YR.ISOL 1.654553 2.009749 1.911514 2.524814 1.467937 1.804769
1/vif(loyn.lm)
log10(DIST) log10(LDIST) log10(AREA) GRAZE ALT YR.ISOL 0.6043930 0.4975746 0.5231454 0.3960688 0.6812282 0.5540876
-
In preparation for a Bayesian regression models, we should center each of the predictor variables.
Note, transformations must occur prior to centering...
mean.DIST = mean(log10(loyn$DIST)) mean.LDIST = mean(log10(loyn$LDIST)) mean.AREA = mean(log10(loyn$AREA)) mean.GRAZE = mean(loyn$GRAZE) mean.ALT = mean(loyn$ALT) mean.YR.ISOL = mean(loyn$YR.ISOL) loyn = loyn %>% dplyr:::mutate(cDIST = log10(DIST), cDIST = cDIST - mean(cDIST), cLDIST = log10(LDIST), cLDIST = cLDIST - mean(cLDIST), cAREA = log10(AREA), cAREA = cAREA - mean(cAREA), cGRAZE = GRAZE - mean(GRAZE), cALT = ALT - mean(ALT), cYR.ISOL = YR.ISOL - mean(YR.ISOL))
-
Fit the appropriate Bayesian model to explore the effect of the various predictors on the Abundance of forest birds.
library(MCMCpack) loyn.mcmcpack = MCMCregress(ABUND ~ cDIST + cLDIST + cAREA + cGRAZE + cALT + cYR.ISOL, data = loyn)
modelString = " model { #Likelihood for (i in 1:n) { y[i]~dnorm(mu[i],tau) mu[i] <- beta0 + inprod(beta[],X[i,]) } #Priors beta0 ~ dnorm(0.01,1.0E-6) for (j in 1:nX) { beta[j] ~ dnorm(0.01,1.0E-6) } tau <- 1 / (sigma * sigma) sigma~dunif(0,100) } " X = model.matrix(~cDIST + cLDIST + cAREA + cGRAZE + cALT + cYR.ISOL, data = loyn) loyn.list <- with(loyn, list(y = ABUND, X = X[, -1], nX = ncol(X) - 1, n = nrow(loyn))) params <- c("beta0", "beta", "sigma") burnInSteps = 3000 nChains = 3 numSavedSteps = 15000 thinSteps = 10 nIter = ceiling((numSavedSteps * thinSteps)/nChains) loyn.r2jags <- jags(data = loyn.list, inits = NULL, parameters.to.save = params, model.file = textConnection(modelString), n.chains = nChains, n.iter = nIter, n.burnin = burnInSteps, n.thin = thinSteps)
Compiling model graph Resolving undeclared variables Allocating nodes Graph information: Observed stochastic nodes: 56 Unobserved stochastic nodes: 8 Total graph size: 590 Initializing model
modelString=" data { int
n; // total number of observations vector[n] Y; // response variable int nX; // number of effects matrix[n, nX] X; // model matrix } transformed data { matrix[n, nX - 1] Xc; // centered version of X vector[nX - 1] means_X; // column means of X before centering for (i in 2:nX) { means_X[i - 1] = mean(X[, i]); Xc[, i - 1] = X[, i] - means_X[i - 1]; } } parameters { vector[nX-1] beta; // population-level effects real cbeta0; // center-scale intercept real sigma; // residual SD } transformed parameters { } model { vector[n] mu; mu = Xc * beta + cbeta0; // prior specifications beta ~ normal(0, 10); cbeta0 ~ normal(0, 10); sigma ~ cauchy(0, 5); // likelihood contribution Y ~ normal(mu, sigma); } generated quantities { real beta0; // population-level intercept vector[n] log_lik; beta0 = cbeta0 - dot_product(means_X, beta); for (i in 1:n) { log_lik[i] = normal_lpdf(Y[i] | Xc[i] * beta + cbeta0, sigma); } } " X = model.matrix(~cDIST + cLDIST + cAREA + cGRAZE + cALT + cYR.ISOL, data = loyn) loyn.list <- with(loyn, list(Y = ABUND, X = X, nX = ncol(X), n = nrow(loyn))) library(rstan) loyn.rstan <- stan(data = loyn.list, model_code = modelString, chains = 3, iter = 5000, warmup = 500, thin = 2)
SAMPLING FOR MODEL 'd98dbf6a02725fc3fce11306b77873e9' NOW (CHAIN 1). Gradient evaluation took 3.9e-05 seconds 1000 transitions using 10 leapfrog steps per transition would take 0.39 seconds. Adjust your expectations accordingly! Iteration: 1 / 5000 [ 0%] (Warmup) Iteration: 500 / 5000 [ 10%] (Warmup) Iteration: 501 / 5000 [ 10%] (Sampling) Iteration: 1000 / 5000 [ 20%] (Sampling) Iteration: 1500 / 5000 [ 30%] (Sampling) Iteration: 2000 / 5000 [ 40%] (Sampling) Iteration: 2500 / 5000 [ 50%] (Sampling) Iteration: 3000 / 5000 [ 60%] (Sampling) Iteration: 3500 / 5000 [ 70%] (Sampling) Iteration: 4000 / 5000 [ 80%] (Sampling) Iteration: 4500 / 5000 [ 90%] (Sampling) Iteration: 5000 / 5000 [100%] (Sampling) Elapsed Time: 0.138507 seconds (Warm-up) 0.270008 seconds (Sampling) 0.408515 seconds (Total) SAMPLING FOR MODEL 'd98dbf6a02725fc3fce11306b77873e9' NOW (CHAIN 2). Gradient evaluation took 9e-06 seconds 1000 transitions using 10 leapfrog steps per transition would take 0.09 seconds. Adjust your expectations accordingly! Iteration: 1 / 5000 [ 0%] (Warmup) Iteration: 500 / 5000 [ 10%] (Warmup) Iteration: 501 / 5000 [ 10%] (Sampling) Iteration: 1000 / 5000 [ 20%] (Sampling) Iteration: 1500 / 5000 [ 30%] (Sampling) Iteration: 2000 / 5000 [ 40%] (Sampling) Iteration: 2500 / 5000 [ 50%] (Sampling) Iteration: 3000 / 5000 [ 60%] (Sampling) Iteration: 3500 / 5000 [ 70%] (Sampling) Iteration: 4000 / 5000 [ 80%] (Sampling) Iteration: 4500 / 5000 [ 90%] (Sampling) Iteration: 5000 / 5000 [100%] (Sampling) Elapsed Time: 0.122783 seconds (Warm-up) 0.247383 seconds (Sampling) 0.370166 seconds (Total) SAMPLING FOR MODEL 'd98dbf6a02725fc3fce11306b77873e9' NOW (CHAIN 3). Gradient evaluation took 9e-06 seconds 1000 transitions using 10 leapfrog steps per transition would take 0.09 seconds. Adjust your expectations accordingly! Iteration: 1 / 5000 [ 0%] (Warmup) Iteration: 500 / 5000 [ 10%] (Warmup) Iteration: 501 / 5000 [ 10%] (Sampling) Iteration: 1000 / 5000 [ 20%] (Sampling) Iteration: 1500 / 5000 [ 30%] (Sampling) Iteration: 2000 / 5000 [ 40%] (Sampling) Iteration: 2500 / 5000 [ 50%] (Sampling) Iteration: 3000 / 5000 [ 60%] (Sampling) Iteration: 3500 / 5000 [ 70%] (Sampling) Iteration: 4000 / 5000 [ 80%] (Sampling) Iteration: 4500 / 5000 [ 90%] (Sampling) Iteration: 5000 / 5000 [100%] (Sampling) Elapsed Time: 0.141817 seconds (Warm-up) 0.28119 seconds (Sampling) 0.423007 seconds (Total)
loyn.rstanarm = stan_glm(ABUND ~ cDIST + cLDIST + cAREA + cGRAZE + cALT + cYR.ISOL, data = loyn, iter = 5000, warmup = 500, chains = 3, thin = 2, refresh = 0, prior_intercept = normal(0, 10), prior = normal(0, 10), prior_aux = cauchy(0, 5))
Gradient evaluation took 3.8e-05 seconds 1000 transitions using 10 leapfrog steps per transition would take 0.38 seconds. Adjust your expectations accordingly! Elapsed Time: 0.099257 seconds (Warm-up) 0.455713 seconds (Sampling) 0.55497 seconds (Total) Gradient evaluation took 1.6e-05 seconds 1000 transitions using 10 leapfrog steps per transition would take 0.16 seconds. Adjust your expectations accordingly! Elapsed Time: 0.105746 seconds (Warm-up) 0.410567 seconds (Sampling) 0.516313 seconds (Total) Gradient evaluation took 1.5e-05 seconds 1000 transitions using 10 leapfrog steps per transition would take 0.15 seconds. Adjust your expectations accordingly! Elapsed Time: 0.096972 seconds (Warm-up) 0.416244 seconds (Sampling) 0.513216 seconds (Total)
loyn.brm = brm(ABUND ~ cDIST + cLDIST + cAREA + cGRAZE + cALT + cYR.ISOL, data = loyn, iter = 5000, warmup = 500, chains = 3, thin = 2, refresh = 0, prior = c(prior(normal(0, 10), class = "Intercept"), prior(normal(0, 10), class = "b"), prior(cauchy(0, 5), class = "sigma")))
Gradient evaluation took 1.7e-05 seconds 1000 transitions using 10 leapfrog steps per transition would take 0.17 seconds. Adjust your expectations accordingly! Elapsed Time: 0.126585 seconds (Warm-up) 0.269009 seconds (Sampling) 0.395594 seconds (Total) Gradient evaluation took 9e-06 seconds 1000 transitions using 10 leapfrog steps per transition would take 0.09 seconds. Adjust your expectations accordingly! Elapsed Time: 0.125164 seconds (Warm-up) 0.245459 seconds (Sampling) 0.370623 seconds (Total) Gradient evaluation took 9e-06 seconds 1000 transitions using 10 leapfrog steps per transition would take 0.09 seconds. Adjust your expectations accordingly! Elapsed Time: 0.120374 seconds (Warm-up) 0.273265 seconds (Sampling) 0.393639 seconds (Total)
- Explore MCMC diagnostics
library(MCMCpack) plot(loyn.mcmcpack)
raftery.diag(loyn.mcmcpack)
Quantile (q) = 0.025 Accuracy (r) = +/- 0.005 Probability (s) = 0.95 Burn-in Total Lower bound Dependence (M) (N) (Nmin) factor (I) (Intercept) 2 3741 3746 0.999 cDIST 2 3802 3746 1.010 cLDIST 2 3771 3746 1.010 cAREA 2 3929 3746 1.050 cGRAZE 2 3771 3746 1.010 cALT 2 3802 3746 1.010 cYR.ISOL 2 3771 3746 1.010 sigma2 2 3929 3746 1.050
autocorr.diag(loyn.mcmcpack)
(Intercept) cDIST cLDIST cAREA cGRAZE cALT cYR.ISOL Lag 0 1.000000000 1.0000000000 1.000000000 1.000000000 1.000000000 1.0000000000 1.000000000 Lag 1 0.003859081 -0.0031673984 -0.019186754 -0.011659408 0.006409519 -0.0038454362 -0.006161199 Lag 5 0.001559142 0.0112382480 -0.002730253 -0.021259031 -0.016927056 -0.0144856450 -0.020155560 Lag 10 -0.005782004 -0.0008454841 0.002457735 0.015401555 0.006719719 -0.0064410769 0.002482767 Lag 50 0.014999125 0.0108067185 -0.005566923 -0.008437947 -0.007984261 -0.0005289131 0.002831566 sigma2 Lag 0 1.0000000000 Lag 1 0.1473528757 Lag 5 0.0002854281 Lag 10 -0.0046914506 Lag 50 -0.0058806412
library(R2jags) library(coda) loyn.mcmc = as.mcmc(loyn.r2jags) plot(loyn.mcmc)
raftery.diag(loyn.mcmc)
[[1]] Quantile (q) = 0.025 Accuracy (r) = +/- 0.005 Probability (s) = 0.95 Burn-in Total Lower bound Dependence (M) (N) (Nmin) factor (I) beta0 20 37020 3746 9.88 beta[1] 20 38330 3746 10.20 beta[2] 20 38330 3746 10.20 beta[3] 20 37020 3746 9.88 beta[4] 20 35750 3746 9.54 beta[5] 10 37660 3746 10.10 beta[6] 20 38330 3746 10.20 deviance 20 37020 3746 9.88 sigma 20 37020 3746 9.88 [[2]] Quantile (q) = 0.025 Accuracy (r) = +/- 0.005 Probability (s) = 0.95 Burn-in Total Lower bound Dependence (M) (N) (Nmin) factor (I) beta0 20 36380 3746 9.71 beta[1] 10 37660 3746 10.10 beta[2] 20 38330 3746 10.20 beta[3] 20 39680 3746 10.60 beta[4] 20 39000 3746 10.40 beta[5] 20 39000 3746 10.40 beta[6] 20 36380 3746 9.71 deviance 20 38330 3746 10.20 sigma 20 39000 3746 10.40 [[3]] Quantile (q) = 0.025 Accuracy (r) = +/- 0.005 Probability (s) = 0.95 Burn-in Total Lower bound Dependence (M) (N) (Nmin) factor (I) beta0 20 37020 3746 9.88 beta[1] 20 37020 3746 9.88 beta[2] 10 37660 3746 10.10 beta[3] 20 36380 3746 9.71 beta[4] 20 39000 3746 10.40 beta[5] 20 39730 3746 10.60 beta[6] 20 38330 3746 10.20 deviance 20 39000 3746 10.40 sigma 20 35750 3746 9.54
autocorr.diag(loyn.mcmc)
beta0 beta[1] beta[2] beta[3] beta[4] beta[5] Lag 0 1.0000000000 1.000000000 1.000000e+00 1.000000000 1.0000000000 1.0000000000 Lag 10 -0.0002701973 0.005792608 4.096609e-03 -0.002465089 0.0064268862 -0.0056341418 Lag 50 -0.0024666264 -0.003008229 5.990082e-03 -0.002866591 0.0170218192 -0.0052958457 Lag 100 -0.0055030580 0.007310639 6.781602e-05 -0.005953953 0.0147580101 0.0060628956 Lag 500 0.0012328414 -0.005326759 -6.548372e-03 0.000738738 0.0005113358 0.0001472076 beta[6] deviance sigma Lag 0 1.0000000000 1.000000000 1.000000000 Lag 10 -0.0011343808 0.006388500 0.012755663 Lag 50 0.0092845920 -0.014166264 0.009013642 Lag 100 -0.0065748440 0.001447733 -0.006410172 Lag 500 0.0005790938 -0.011828114 -0.021543097
library(rstan) library(coda) s = as.array(loyn.rstan) loyn.mcmc <- do.call(mcmc.list, plyr:::alply(s[, , c("beta0", "beta[1]", "beta[2]", "beta[3]", "sigma")], 2, as.mcmc)) plot(loyn.mcmc)
raftery.diag(loyn.mcmc)
$`1` Quantile (q) = 0.025 Accuracy (r) = +/- 0.005 Probability (s) = 0.95 You need a sample size of at least 3746 with these values of q, r and s $`2` Quantile (q) = 0.025 Accuracy (r) = +/- 0.005 Probability (s) = 0.95 You need a sample size of at least 3746 with these values of q, r and s $`3` Quantile (q) = 0.025 Accuracy (r) = +/- 0.005 Probability (s) = 0.95 You need a sample size of at least 3746 with these values of q, r and s
autocorr.diag(loyn.mcmc)
beta0 beta[1] beta[2] beta[3] sigma Lag 0 1.000000000 1.000000000 1.000000000 1.0000000000 1.0000000000 Lag 1 0.029133427 0.037125861 0.061232330 0.0361683227 0.0743461011 Lag 5 -0.011825015 0.002171057 -0.001297878 -0.0270706359 0.0264250864 Lag 10 -0.021725986 -0.003345725 0.022047199 -0.0101347535 0.0001568589 Lag 50 0.004095666 0.004849422 0.012866721 0.0006091856 -0.0001450329
library(rstan) library(coda) stan_ac(loyn.rstan, pars = c("beta", "sigma"))
stan_rhat(loyn.rstan, pars = c("beta", "sigma"))
stan_ess(loyn.rstan, pars = c("beta", "sigma"))
# using Bayeseplot library(bayesplot) detach("package:reshape") mcmc_trace(as.array(loyn.rstan), regex_par = "beta|sigma")
mcmc_trace(as.array(loyn.rstan), regex_pars = "beta|sigma")
mcmc_dens(as.array(loyn.rstan), regex_par = "beta|sigma")
detach("package:reshape") library(bayesplot) mcmc_combo(as.array(loyn.rstan), regex_par = "beta|sigma")
s = as.array(loyn.rstanarm) loyn.mcmc <- do.call(mcmc.list, plyr:::alply(s[, , c("(Intercept)", "cDIST", "cLDIST", "cAREA", "cGRAZE", "cALT", "YR.ISOL", "sigma")], 2, as.mcmc))
Error in s[, , c("(Intercept)", "cDIST", "cLDIST", "cAREA", "cGRAZE", : subscript out of bounds
plot(loyn.mcmc)
raftery.diag(loyn.mcmc)
$`1` Quantile (q) = 0.025 Accuracy (r) = +/- 0.005 Probability (s) = 0.95 You need a sample size of at least 3746 with these values of q, r and s $`2` Quantile (q) = 0.025 Accuracy (r) = +/- 0.005 Probability (s) = 0.95 You need a sample size of at least 3746 with these values of q, r and s $`3` Quantile (q) = 0.025 Accuracy (r) = +/- 0.005 Probability (s) = 0.95 You need a sample size of at least 3746 with these values of q, r and s
autocorr.diag(loyn.mcmc)
beta0 beta[1] beta[2] beta[3] sigma Lag 0 1.000000000 1.000000000 1.000000000 1.0000000000 1.0000000000 Lag 1 0.029133427 0.037125861 0.061232330 0.0361683227 0.0743461011 Lag 5 -0.011825015 0.002171057 -0.001297878 -0.0270706359 0.0264250864 Lag 10 -0.021725986 -0.003345725 0.022047199 -0.0101347535 0.0001568589 Lag 50 0.004095666 0.004849422 0.012866721 0.0006091856 -0.0001450329
library(rstanarm) library(coda) stan_ac(loyn.rstanarm, pars = c("(Intercept)", "cDIST", "cLDIST", "cAREA", "cGRAZE", "cALT", "cYR.ISOL", "sigma"))
stan_rhat(loyn.rstanarm, pars = c("(Intercept)", "cDIST", "cLDIST", "cAREA", "cGRAZE", "cALT", "cYR.ISOL", "sigma"))
stan_ess(loyn.rstanarm, pars = c("(Intercept)", "cDIST", "cLDIST", "cAREA", "cGRAZE", "cALT", "cYR.ISOL", "sigma"))
# using Bayeseplot library(bayesplot) detach("package:reshape") mcmc_trace(as.array(loyn.rstanarm), regex_par = "^c|sigma")
mcmc_trace(as.array(loyn.rstanarm), regex_pars = "^c|sigma")
mcmc_dens(as.array(loyn.rstanarm), regex_par = "^c|sigma")
detach("package:reshape") library(bayesplot) mcmc_combo(as.array(loyn.rstanarm), regex_par = "^c|sigma")
loyn.mcmc = as.mccm(loyn.brm)
Error in eval(expr, envir, enclos): could not find function "as.mccm"
plot(loyn.mcmc)
raftery.diag(loyn.mcmc)
$`1` Quantile (q) = 0.025 Accuracy (r) = +/- 0.005 Probability (s) = 0.95 You need a sample size of at least 3746 with these values of q, r and s $`2` Quantile (q) = 0.025 Accuracy (r) = +/- 0.005 Probability (s) = 0.95 You need a sample size of at least 3746 with these values of q, r and s $`3` Quantile (q) = 0.025 Accuracy (r) = +/- 0.005 Probability (s) = 0.95 You need a sample size of at least 3746 with these values of q, r and s
autocorr.diag(loyn.mcmc)
beta0 beta[1] beta[2] beta[3] sigma Lag 0 1.000000000 1.000000000 1.000000000 1.0000000000 1.0000000000 Lag 1 0.029133427 0.037125861 0.061232330 0.0361683227 0.0743461011 Lag 5 -0.011825015 0.002171057 -0.001297878 -0.0270706359 0.0264250864 Lag 10 -0.021725986 -0.003345725 0.022047199 -0.0101347535 0.0001568589 Lag 50 0.004095666 0.004849422 0.012866721 0.0006091856 -0.0001450329
library(coda) stan_ac(loyn.brm$fit)
stan_rhat(loyn.brm$fit)
stan_ess(loyn.brm$fit)
# using Bayeseplot library(bayesplot) detach("package:reshape") mcmc_trace(as.array(loyn.brm), regex_par = "^b|sigma")
mcmc_trace(as.array(loyn.rstanarm), regex_pars = "^c|sigma")
mcmc_dens(as.array(loyn.rstanarm), regex_par = "^c|sigma")
detach("package:reshape") library(bayesplot) mcmc_combo(as.array(loyn.brm), regex_par = "^b|sigma")
- Perform model validation
library(MCMCpack) loyn.mcmc = as.data.frame(loyn.mcmcpack) # generate a model matrix newdata = loyn Xmat = model.matrix(~cDIST + cLDIST + cAREA + cGRAZE + cALT + cYR.ISOL, newdata) ## get median parameter estimates coefs = apply(loyn.mcmc[, c("(Intercept)", "cDIST", "cLDIST", "cAREA", "cGRAZE", "cALT", "cYR.ISOL")], 2, median) fit = as.vector(coefs %*% t(Xmat)) resid = loyn$ABUND - fit ggplot() + geom_point(data = NULL, aes(y = resid, x = fit))
library(MCMCpack) loyn.mcmc = as.data.frame(loyn.mcmcpack) # generate a model matrix newdata = loyn Xmat = model.matrix(~cDIST + cLDIST + cAREA + cGRAZE + cALT + cYR.ISOL, newdata) ## get median parameter estimates coefs = apply(loyn.mcmc[, c("(Intercept)", "cDIST", "cLDIST", "cAREA", "cGRAZE", "cALT", "cYR.ISOL")], 2, median) fit = as.vector(coefs %*% t(Xmat)) resid = loyn$ABUND - fit newdata = newdata %>% cbind(fit, resid) newdata.melt = newdata %>% gather(key = Pred, value = Value, cDIST:cYR.ISOL) ggplot(newdata.melt) + geom_point(aes(y = resid, x = Value)) + facet_wrap(~Pred, scale = "free_x")
library(MCMCpack) loyn.mcmc = as.data.frame(loyn.mcmcpack) # generate a model matrix newdata = loyn Xmat = model.matrix(~cDIST + cLDIST + cAREA + cGRAZE + cALT + cYR.ISOL, newdata) ## get median parameter estimates coefs = apply(loyn.mcmc[, c("(Intercept)", "cDIST", "cLDIST", "cAREA", "cGRAZE", "cALT", "cYR.ISOL")], 2, median) fit = as.vector(coefs %*% t(Xmat)) resid = loyn$ABUND - fit sresid = resid/sd(resid) ggplot() + geom_point(data = NULL, aes(y = sresid, x = fit))
library(MCMCpack) loyn.mcmc = as.data.frame(loyn.mcmcpack) # generate a model matrix newdata = loyn Xmat = model.matrix(~cDIST + cLDIST + cAREA + cGRAZE + cALT + cYR.ISOL, newdata) ## get median parameter estimates coefs = loyn.mcmc[, c("(Intercept)", "cDIST", "cLDIST", "cAREA", "cGRAZE", "cALT", "cYR.ISOL")] fit = coefs %*% t(Xmat)
Error in coefs %*% t(Xmat): requires numeric/complex matrix/vector arguments
## draw samples from this model yRep = sapply(1:nrow(loyn.mcmc), function(i) rnorm(nrow(loyn), fit[i, ], sqrt(loyn.mcmc[i, "sigma2"])))
Error in fit[i, ]: incorrect number of dimensions
ggplot() + geom_density(data = NULL, aes(x = as.vector(yRep), fill = "Model"), alpha = 0.5) + geom_density(data = loyn, aes(x = ABUND, fill = "Obs"), alpha = 0.5)
library(bayesplot) mcmc_intervals(as.matrix(loyn.mcmcpack), regex_pars = "^c")
mcmc_areas(as.matrix(loyn.mcmcpack), regex_pars = "^c")
loyn.mcmc = loyn.r2jags$BUGSoutput$sims.matrix # generate a model matrix newdata = loyn Xmat = model.matrix(~cDIST + cLDIST + cAREA + cGRAZE + cALT + cYR.ISOL, newdata) ## get median parameter estimates coefs = apply(loyn.mcmc[, c("beta0", "beta[1]", "beta[2]", "beta[3]", "beta[4]", "beta[5]", "beta[6]")], 2, median) fit = as.vector(coefs %*% t(Xmat)) resid = loyn$ABUND - fit ggplot() + geom_point(data = NULL, aes(y = resid, x = fit))
loyn.mcmc = loyn.r2jags$BUGSoutput$sims.matrix # generate a model matrix newdata = loyn Xmat = model.matrix(~cDIST + cLDIST + cAREA + cGRAZE + cALT + cYR.ISOL, newdata) ## get median parameter estimates coefs = apply(loyn.mcmc[, c("beta0", "beta[1]", "beta[2]", "beta[3]", "beta[4]", "beta[5]", "beta[6]")], 2, median) fit = as.vector(coefs %*% t(Xmat)) resid = loyn$ABUND - fit newdata = newdata %>% cbind(fit, resid) newdata.melt = newdata %>% gather(key = Pred, value = Value, cDIST:cYR.ISOL) ggplot(newdata.melt) + geom_point(aes(y = resid, x = Value)) + facet_wrap(~Pred, scale = "free_x")
loyn.mcmc = loyn.r2jags$BUGSoutput$sims.matrix # generate a model matrix newdata = loyn Xmat = model.matrix(~cDIST + cLDIST + cAREA + cGRAZE + cALT + cYR.ISOL, newdata) ## get median parameter estimates coefs = apply(loyn.mcmc[, c("beta0", "beta[1]", "beta[2]", "beta[3]", "beta[4]", "beta[5]", "beta[6]")], 2, median) fit = as.vector(coefs %*% t(Xmat)) resid = loyn$ABUND - fit sresid = resid/sd(resid) ggplot() + geom_point(data = NULL, aes(y = sresid, x = fit))
loyn.mcmc = loyn.r2jags$BUGSoutput$sims.matrix # generate a model matrix newdata = loyn Xmat = model.matrix(~cDIST + cLDIST + cAREA + cGRAZE + cALT + cYR.ISOL, newdata) ## get median parameter estimates coefs = loyn.mcmc[, c("beta0", "beta[1]", "beta[2]", "beta[3]", "beta[4]", "beta[5]", "beta[6]")] fit = coefs %*% t(Xmat) ## draw samples from this model yRep = sapply(1:nrow(loyn.mcmc), function(i) rnorm(nrow(loyn), fit[i, ], loyn.mcmc[i, "sigma"])) ggplot() + geom_density(data = NULL, aes(x = as.vector(yRep), fill = "Model"), alpha = 0.5) + geom_density(data = loyn, aes(x = ABUND, fill = "Obs"), alpha = 0.5)
loyn.mcmc = loyn.r2jags$BUGSoutput$sims.matrix %>% as.data.frame %>% dplyr:::select(beta0, starts_with("beta"), sigma) %>% as.matrix coefs = loyn.mcmc[, 1:7] # generate prediction matrix Vars = c("cDIST", "cLDIST", "cAREA", "cGRAZE", "cALT", "cYR.ISOL") loyn.list = rep(list(loyn), length(Vars)) names(loyn.list) <- Vars library(newdata) newdata = do.call(rbind, lapply(Vars, function(x) new_data(loyn.list[[x]], seq = x, len = 100))) # OR newdata = rbind(new_data(loyn, seq='cDIST', len=100), # new_data(loyn, seq='cLDIST', len=100), new_data(loyn, # seq='cAREA', len=100), new_data(loyn, seq='cGRAZE', # len=100), new_data(loyn, seq='cALT', len=100), # new_data(loyn, seq='cYR.ISOL', len=100)) Xmat = model.matrix(~cDIST + cLDIST + cAREA + cGRAZE + cALT + cYR.ISOL, data = newdata) fit = coefs %*% t(Xmat) # add noise for prediction instead of confidence fit = t(sapply(1:nrow(loyn.mcmc), function(i) rnorm(nrow(newdata), fit[i, ], loyn.mcmc[i, "sigma"]))) newdata = newdata %>% cbind(tidyMCMC(as.mcmc(fit), conf.int = TRUE, conf.method = "HPDinterval")) newdata.melt = newdata %>% gather(key = "Pred", value = "Value", cDIST:cYR.ISOL) %>% filter(round(Value, 5) != 0) loyn.melt = loyn %>% gather(key = "Pred", value = "Value", cDIST:cYR.ISOL) ggplot(newdata.melt, aes(y = estimate, x = Value)) + geom_point(data = loyn.melt, aes(y = ABUND)) + geom_line() + geom_ribbon(aes(ymin = conf.low, ymax = conf.high), fill = "blue", alpha = 0.3) + scale_y_continuous("Abundance") + scale_x_continuous("") + theme_classic() + facet_wrap(~Pred, scale = "free_x")
library(bayesplot) mcmc_intervals(loyn.r2jags$BUGSoutput$sims.matrix, regex_pars = "^beta")
mcmc_areas(loyn.r2jags$BUGSoutput$sims.matrix, regex_pars = "^beta")
loyn.mcmc = as.data.frame(loyn.rstan) %>% dplyr:::select(beta0, starts_with("beta"), sigma) %>% as.matrix # generate a model matrix newdata = loyn Xmat = model.matrix(~cDIST + cLDIST + cAREA + cGRAZE + cALT + cYR.ISOL, newdata) ## get median parameter estimates coefs = apply(loyn.mcmc[, c("beta0", "beta[1]", "beta[2]", "beta[3]", "beta[4]", "beta[5]", "beta[6]")], 2, median) fit = as.vector(coefs %*% t(Xmat)) resid = loyn$ABUND - fit ggplot() + geom_point(data = NULL, aes(y = resid, x = fit))
loyn.mcmc = as.data.frame(loyn.rstan) %>% dplyr:::select(beta0, starts_with("beta"), sigma) %>% as.matrix # generate a model matrix newdata = loyn Xmat = model.matrix(~cDIST + cLDIST + cAREA + cGRAZE + cALT + cYR.ISOL, newdata) ## get median parameter estimates coefs = apply(loyn.mcmc[, c("beta0", "beta[1]", "beta[2]", "beta[3]", "beta[4]", "beta[5]", "beta[6]")], 2, median) fit = as.vector(coefs %*% t(Xmat)) resid = loyn$ABUND - fit newdata = newdata %>% cbind(fit, resid) newdata.melt = newdata %>% gather(key = Pred, value = Value, cDIST:cYR.ISOL) ggplot(newdata.melt) + geom_point(aes(y = resid, x = Value)) + facet_wrap(~Pred, scale = "free_x")
loyn.mcmc = as.data.frame(loyn.rstan) %>% dplyr:::select(beta0, starts_with("beta"), sigma) %>% as.matrix # generate a model matrix newdata = loyn Xmat = model.matrix(~cDIST + cLDIST + cAREA + cGRAZE + cALT + cYR.ISOL, newdata) ## get median parameter estimates coefs = apply(loyn.mcmc[, c("beta0", "beta[1]", "beta[2]", "beta[3]", "beta[4]", "beta[5]", "beta[6]")], 2, median) fit = as.vector(coefs %*% t(Xmat)) resid = loyn$ABUND - fit sresid = resid/sd(resid) ggplot() + geom_point(data = NULL, aes(y = sresid, x = fit))
loyn.mcmc = as.data.frame(loyn.rstan) %>% dplyr:::select(beta0, starts_with("beta"), sigma) %>% as.matrix # generate a model matrix newdata = loyn Xmat = model.matrix(~cDIST + cLDIST + cAREA + cGRAZE + cALT + cYR.ISOL, newdata) ## get median parameter estimates coefs = loyn.mcmc[, c("beta0", "beta[1]", "beta[2]", "beta[3]", "beta[4]", "beta[5]", "beta[6]")] fit = coefs %*% t(Xmat) ## draw samples from this model yRep = sapply(1:nrow(loyn.mcmc), function(i) rnorm(nrow(loyn), fit[i, ], loyn.mcmc[i, "sigma"])) ggplot() + geom_density(data = NULL, aes(x = as.vector(yRep), fill = "Model"), alpha = 0.5) + geom_density(data = loyn, aes(x = ABUND, fill = "Obs"), alpha = 0.5)
loyn.mcmc = as.data.frame(loyn.rstan) %>% dplyr:::select(beta0, starts_with("beta"), sigma) %>% as.matrix coefs = loyn.mcmc[, 1:7] # generate prediction matrix Vars = c("cDIST", "cLDIST", "cAREA", "cGRAZE", "cALT", "cYR.ISOL") loyn.list = rep(list(loyn), length(Vars)) names(loyn.list) <- Vars library(newdata) newdata = do.call(rbind, lapply(Vars, function(x) new_data(loyn.list[[x]], seq = x, len = 100))) # OR newdata = rbind(new_data(loyn, seq='cDIST', len=100), # new_data(loyn, seq='cLDIST', len=100), new_data(loyn, # seq='cAREA', len=100), new_data(loyn, seq='cGRAZE', # len=100), new_data(loyn, seq='cALT', len=100), # new_data(loyn, seq='cYR.ISOL', len=100)) Xmat = model.matrix(~cDIST + cLDIST + cAREA + cGRAZE + cALT + cYR.ISOL, data = newdata) fit = coefs %*% t(Xmat) # add noise for prediction instead of confidence fit = t(sapply(1:nrow(loyn.mcmc), function(i) rnorm(nrow(newdata), fit[i, ], loyn.mcmc[i, "sigma"]))) newdata = newdata %>% cbind(tidyMCMC(as.mcmc(fit), conf.int = TRUE, conf.method = "HPDinterval")) newdata.melt = newdata %>% gather(key = "Pred", value = "Value", cDIST:cYR.ISOL) %>% filter(round(Value, 5) != 0) loyn.melt = loyn %>% gather(key = "Pred", value = "Value", cDIST:cYR.ISOL) ggplot(newdata.melt, aes(y = estimate, x = Value)) + geom_point(data = loyn.melt, aes(y = ABUND)) + geom_line() + geom_ribbon(aes(ymin = conf.low, ymax = conf.high), fill = "blue", alpha = 0.3) + scale_y_continuous("Abundance") + scale_x_continuous("") + theme_classic() + facet_wrap(~Pred, scale = "free_x")
library(bayesplot) loyn.mcmc = as.matrix(loyn.rstan) mcmc_intervals(loyn.mcmc, regex_pars = "^beta")
mcmc_areas(loyn.mcmc, regex_pars = "^beta")
resid = resid(loyn.rstanarm) fit = fitted(loyn.rstanarm) ggplot() + geom_point(data = NULL, aes(y = resid, x = fit))
resid = resid(loyn.rstanarm) loyn.melt = loyn %>% mutate(resid = resid) %>% gather(key = Pred, value = Value, cDIST:cYR.ISOL) ggplot(loyn.melt) + geom_point(aes(y = resid, x = Value)) + facet_wrap(~Pred, scales = "free_x")
resid = resid(loyn.rstanarm) sresid = resid/sd(resid) fit = fitted(loyn.rstanarm) ggplot() + geom_point(data = NULL, aes(y = sresid, x = fit))
y_pred = posterior_predict(loyn.rstanarm) newdata = loyn %>% cbind(t(y_pred)) %>% gather(key = "Rep", value = "Value", -ABUND:-cYR.ISOL) newdata.melt = newdata %>% gather(key = "Pred", value = "Pred_val", cDIST:cYR.ISOL) loyn.melt = loyn %>% gather(key = "Pred", value = "Pred_val", cDIST:cYR.ISOL) ggplot(newdata.melt, aes(Value, x = Pred_val)) + geom_violin(color = "blue", fill = "blue", alpha = 0.5) + geom_violin(data = loyn.melt, aes(y = ABUND, x = Pred_val), fill = "red", color = "red", alpha = 0.5) + facet_wrap(~Pred, scales = "free_x")
loyn.mcmc = as.data.frame(loyn.rstanarm) %>% dplyr:::select(matches("Inter"), starts_with("c"), sigma) %>% as.matrix coefs = loyn.mcmc[, c("(Intercept)", "cDIST", "cLDIST", "cAREA", "cGRAZE", "cALT", "cYR.ISOL")] # generate a model matrix newdata = loyn Xmat = model.matrix(~cDIST + cLDIST + cAREA + cGRAZE + cALT + cYR.ISOL, newdata) ## get median parameter estimates fit = coefs %*% t(Xmat) ## draw samples from this model yRep = sapply(1:nrow(loyn.mcmc), function(i) rnorm(nrow(loyn), fit[i, ], loyn.mcmc[i, "sigma"])) ggplot() + geom_density(data = NULL, aes(x = as.vector(yRep), fill = "Model"), alpha = 0.5) + geom_density(data = loyn, aes(x = ABUND, fill = "Obs"), alpha = 0.5)
# generate prediction matrix Vars = c("cDIST", "cLDIST", "cAREA", "cGRAZE", "cALT", "cYR.ISOL") loyn.list = rep(list(loyn), length(Vars)) names(loyn.list) <- Vars library(newdata) newdata = do.call(rbind, lapply(Vars, function(x) new_data(loyn.list[[x]], seq = x, len = 100))) # OR newdata = rbind(new_data(loyn, seq='cDIST', len=100), # new_data(loyn, seq='cLDIST', len=100), new_data(loyn, # seq='cAREA', len=100), new_data(loyn, seq='cGRAZE', # len=100), new_data(loyn, seq='cALT', len=100), # new_data(loyn, seq='cYR.ISOL', len=100)) fit = posterior_predict(loyn.rstanarm, newdata = newdata) newdata = newdata %>% cbind(tidyMCMC(as.mcmc(fit), conf.int = TRUE, conf.method = "HPDinterval")) newdata.melt = newdata %>% gather(key = "Pred", value = "Value", cDIST:cYR.ISOL) %>% filter(round(Value, 5) != 0) loyn.melt = loyn %>% gather(key = "Pred", value = "Value", cDIST:cYR.ISOL) ggplot(newdata.melt, aes(y = estimate, x = Value)) + geom_point(data = loyn.melt, aes(y = ABUND)) + geom_line() + geom_ribbon(aes(ymin = conf.low, ymax = conf.high), fill = "blue", alpha = 0.3) + scale_y_continuous("Abundance") + scale_x_continuous("") + theme_classic() + facet_wrap(~Pred, scale = "free_x")
library(bayesplot) loyn.mcmc = as.matrix(loyn.rstanarm) mcmc_intervals(loyn.mcmc, regex_pars = "^c")
mcmc_areas(loyn.mcmc, regex_pars = "^c")
resid = resid(loyn.brm)[, "Estimate"] fit = fitted(loyn.brm)[, "Estimate"] ggplot() + geom_point(data = NULL, aes(y = resid, x = fit))
resid = resid(loyn.brm)[, "Estimate"] loyn.melt = loyn %>% mutate(resid = resid) %>% gather(key = Pred, value = Value, cDIST:cYR.ISOL) ggplot(loyn.melt) + geom_point(aes(y = resid, x = Value)) + facet_wrap(~Pred, scales = "free_x")
resid = resid(loyn.brm)[, "Estimate"] sresid = resid/sd(resid) fit = fitted(loyn.brm)[, "Estimate"] ggplot() + geom_point(data = NULL, aes(y = sresid, x = fit))
y_pred = posterior_predict(loyn.brm) newdata = loyn %>% cbind(t(y_pred)) %>% gather(key = "Rep", value = "Value", -ABUND:-cYR.ISOL) newdata.melt = newdata %>% gather(key = "Pred", value = "Pred_val", cDIST:cYR.ISOL) loyn.melt = loyn %>% gather(key = "Pred", value = "Pred_val", cDIST:cYR.ISOL) ggplot(newdata.melt, aes(Value, x = Pred_val)) + geom_violin(color = "blue", fill = "blue", alpha = 0.5) + geom_violin(data = loyn.melt, aes(y = ABUND, x = Pred_val), fill = "red", color = "red", alpha = 0.5) + facet_wrap(~Pred, scales = "free_x")
loyn.mcmc = as.data.frame(loyn.brm) %>% dplyr:::select(starts_with("b_"), sigma) %>% as.matrix coefs = loyn.mcmc[, c("(Intercept)", "cDIST", "cLDIST", "cAREA", "cGRAZE", "cALT", "cYR.ISOL")]
Error in loyn.mcmc[, c("(Intercept)", "cDIST", "cLDIST", "cAREA", "cGRAZE", : subscript out of bounds
# generate a model matrix newdata = loyn Xmat = model.matrix(~cDIST + cLDIST + cAREA + cGRAZE + cALT + cYR.ISOL, newdata) ## get median parameter estimates fit = coefs %*% t(Xmat) ## draw samples from this model yRep = sapply(1:nrow(loyn.mcmc), function(i) rnorm(nrow(loyn), fit[i, ], loyn.mcmc[i, "sigma"])) ggplot() + geom_density(data = NULL, aes(x = as.vector(yRep), fill = "Model"), alpha = 0.5) + geom_density(data = loyn, aes(x = ABUND, fill = "Obs"), alpha = 0.5)
# generate prediction matrix Vars = c("cDIST", "cLDIST", "cAREA", "cGRAZE", "cALT", "cYR.ISOL") loyn.list = rep(list(loyn), length(Vars)) names(loyn.list) <- Vars library(newdata) newdata = do.call(rbind, lapply(Vars, function(x) new_data(loyn.list[[x]], seq = x, len = 100))) # OR newdata = rbind(new_data(loyn, seq='cDIST', len=100), # new_data(loyn, seq='cLDIST', len=100), new_data(loyn, # seq='cAREA', len=100), new_data(loyn, seq='cGRAZE', # len=100), new_data(loyn, seq='cALT', len=100), # new_data(loyn, seq='cYR.ISOL', len=100)) fit = posterior_predict(loyn.brm, newdata = newdata) newdata = newdata %>% cbind(tidyMCMC(as.mcmc(fit), conf.int = TRUE, conf.method = "HPDinterval")) newdata.melt = newdata %>% gather(key = "Pred", value = "Value", cDIST:cYR.ISOL) %>% filter(round(Value, 5) != 0) loyn.melt = loyn %>% gather(key = "Pred", value = "Value", cDIST:cYR.ISOL) ggplot(newdata.melt, aes(y = estimate, x = Value)) + geom_point(data = loyn.melt, aes(y = ABUND)) + geom_line() + geom_ribbon(aes(ymin = conf.low, ymax = conf.high), fill = "blue", alpha = 0.3) + scale_y_continuous("Abundance") + scale_x_continuous("") + theme_classic() + facet_wrap(~Pred, scale = "free_x")
library(bayesplot) loyn.mcmc = as.matrix(loyn.brm) mcmc_intervals(loyn.mcmc, regex_pars = "^b_")
mcmc_areas(loyn.mcmc, regex_pars = "^b_")
- Explore parameter estimates
library(MCMCpack) summary(loyn.mcmcpack)
Iterations = 1001:11000 Thinning interval = 1 Number of chains = 1 Sample size per chain = 10000 1. Empirical mean and standard deviation for each variable, plus standard error of the mean: Mean SD Naive SE Time-series SE (Intercept) 19.52399 0.86649 0.0086649 0.0086649 cDIST -0.91832 2.74520 0.0274520 0.0255319 cLDIST -0.61941 2.17855 0.0217855 0.0213725 cAREA 7.45165 1.49424 0.0149424 0.0149424 cGRAZE -1.67804 0.95447 0.0095447 0.0095447 cALT 0.01962 0.02438 0.0002438 0.0002405 cYR.ISOL 0.07350 0.04690 0.0004690 0.0004690 sigma2 42.58137 8.92244 0.0892244 0.1035069 2. Quantiles for each variable: 2.5% 25% 50% 75% 97.5% (Intercept) 17.81405 18.959661 19.51913 20.09181 21.22761 cDIST -6.35916 -2.741141 -0.88385 0.89585 4.45512 cLDIST -4.88865 -2.063250 -0.59703 0.83050 3.63505 cAREA 4.52004 6.457833 7.43912 8.43842 10.40461 cGRAZE -3.54347 -2.308868 -1.68057 -1.04469 0.18256 cALT -0.02837 0.003564 0.01963 0.03575 0.06727 cYR.ISOL -0.01960 0.042925 0.07330 0.10428 0.16627 sigma2 28.49295 36.278224 41.47093 47.58073 63.37241
library(broom) tidyMCMC(loyn.mcmcpack, conf.int = TRUE, conf.method = "HPDinterval")
term estimate std.error conf.low conf.high 1 (Intercept) 19.52399358 0.86649050 17.84303378 21.25142949 2 cDIST -0.91831714 2.74520058 -6.46277160 4.33066565 3 cLDIST -0.61941346 2.17855377 -4.86177798 3.64754513 4 cAREA 7.45164638 1.49423574 4.48112045 10.35905122 5 cGRAZE -1.67804492 0.95446814 -3.57512530 0.13921553 6 cALT 0.01962116 0.02437745 -0.02696416 0.06833682 7 cYR.ISOL 0.07349636 0.04690323 -0.02117776 0.16423954 8 sigma2 42.58136989 8.92244191 26.73620812 60.24660624
mcmcpvalue(loyn.mcmcpack[, "cDIST"])
[1] 0.7335
mcmcpvalue(loyn.mcmcpack[, "cLDIST"])
[1] 0.7705
mcmcpvalue(loyn.mcmcpack[, "cAREA"])
[1] 0
mcmcpvalue(loyn.mcmcpack[, "cGRAZE"])
[1] 0.0764
mcmcpvalue(loyn.mcmcpack[, "cALT"])
[1] 0.4132
mcmcpvalue(loyn.mcmcpack[, "cYR.ISOL"])
[1] 0.1175
print(loyn.r2jags)
Inference for Bugs model at "5", fit using jags, 3 chains, each with 50000 iterations (first 3000 discarded), n.thin = 10 n.sims = 14100 iterations saved mu.vect sd.vect 2.5% 25% 50% 75% 97.5% Rhat n.eff beta[1] -0.925 2.769 -6.342 -2.786 -0.903 0.920 4.474 1.001 14000 beta[2] -0.645 2.192 -4.999 -2.106 -0.655 0.817 3.653 1.001 14000 beta[3] 7.470 1.509 4.480 6.458 7.459 8.478 10.441 1.001 14000 beta[4] -1.666 0.967 -3.560 -2.317 -1.668 -1.026 0.239 1.001 14000 beta[5] 0.020 0.025 -0.029 0.003 0.020 0.036 0.068 1.001 14000 beta[6] 0.074 0.047 -0.020 0.043 0.074 0.106 0.166 1.001 14000 beta0 19.517 0.887 17.802 18.920 19.519 20.100 21.273 1.001 14000 sigma 6.553 0.680 5.382 6.076 6.498 6.973 8.038 1.001 14000 deviance 367.930 4.422 361.502 364.691 367.249 370.363 378.574 1.001 11000 For each parameter, n.eff is a crude measure of effective sample size, and Rhat is the potential scale reduction factor (at convergence, Rhat=1). DIC info (using the rule, pD = var(deviance)/2) pD = 9.8 and DIC = 377.7 DIC is an estimate of expected predictive error (lower deviance is better).
library(broom) tidyMCMC(as.mcmc(loyn.r2jags), conf.int = TRUE, conf.method = "HPDinterval")
term estimate std.error conf.low conf.high 1 beta0 19.51671537 0.88653929 17.80044107 21.27151905 2 beta[1] -0.92516084 2.76914318 -6.31725094 4.47780930 3 beta[2] -0.64512795 2.19237959 -4.87230718 3.74680583 4 beta[3] 7.46983136 1.50921976 4.49154376 10.45213033 5 beta[4] -1.66558727 0.96744947 -3.51363528 0.28381275 6 beta[5] 0.01970088 0.02497929 -0.02931751 0.06843318 7 beta[6] 0.07398983 0.04714504 -0.01834409 0.16712090 8 deviance 367.92987558 4.42245372 360.47334022 376.40993885 9 sigma 6.55252461 0.68025713 5.29232571 7.90205315
mcmcpvalue(loyn.r2jags$BUGSoutput$sims.matrix[, "beta[1]"])
[1] 0.7329787
mcmcpvalue(loyn.r2jags$BUGSoutput$sims.matrix[, "beta[2]"])
[1] 0.7687234
mcmcpvalue(loyn.r2jags$BUGSoutput$sims.matrix[, "beta[3]"])
[1] 0
mcmcpvalue(loyn.r2jags$BUGSoutput$sims.matrix[, "beta[4]"])
[1] 0.08553191
mcmcpvalue(loyn.r2jags$BUGSoutput$sims.matrix[, "beta[5]"])
[1] 0.4195035
mcmcpvalue(loyn.r2jags$BUGSoutput$sims.matrix[, "beta[6]"])
[1] 0.1158156
print(loyn.rstan, pars = c("beta0", "beta", "sigma"))
Inference for Stan model: d98dbf6a02725fc3fce11306b77873e9. 3 chains, each with iter=5000; warmup=500; thin=2; post-warmup draws per chain=2250, total post-warmup draws=6750. mean se_mean sd 2.5% 25% 50% 75% 97.5% n_eff Rhat beta0 19.36 0.01 0.87 17.63 18.79 19.36 19.95 21.05 6389 1 beta[1] -0.87 0.03 2.64 -5.97 -2.67 -0.84 0.89 4.26 6167 1 beta[2] -0.54 0.03 2.10 -4.64 -1.95 -0.56 0.84 3.62 6014 1 beta[3] 7.33 0.02 1.47 4.40 6.36 7.32 8.33 10.23 6293 1 beta[4] -1.67 0.01 0.94 -3.48 -2.32 -1.70 -1.04 0.19 6149 1 beta[5] 0.02 0.00 0.02 -0.03 0.00 0.02 0.04 0.07 6240 1 beta[6] 0.07 0.00 0.05 -0.02 0.04 0.07 0.11 0.17 5908 1 sigma 6.46 0.01 0.66 5.34 5.99 6.40 6.87 7.90 5582 1 Samples were drawn using NUTS(diag_e) at Mon Aug 28 12:58:03 2017. For each parameter, n_eff is a crude measure of effective sample size, and Rhat is the potential scale reduction factor on split chains (at convergence, Rhat=1).
library(broom) tidyMCMC(loyn.rstan, conf.int = TRUE, conf.method = "HPDinterval", pars = c("beta0", "beta", "sigma"))
term estimate std.error conf.low conf.high 1 beta0 19.35781914 0.87419049 17.63200046 21.04667110 2 beta[1] -0.87296466 2.64087863 -6.00306466 4.21472882 3 beta[2] -0.53619405 2.10350367 -4.59917757 3.65478091 4 beta[3] 7.33464753 1.46590498 4.38357023 10.20416316 5 beta[4] -1.67449356 0.94137372 -3.47370216 0.19084411 6 beta[5] 0.02026565 0.02432515 -0.02633551 0.07000986 7 beta[6] 0.07487073 0.04626295 -0.01566025 0.16666197 8 sigma 6.46001116 0.66131687 5.23776603 7.75854853
mcmcpvalue(as.matrix(loyn.rstan)[, "beta[1]"])
[1] 0.7416296
mcmcpvalue(as.matrix(loyn.rstan)[, "beta[2]"])
[1] 0.7917037
mcmcpvalue(as.matrix(loyn.rstan)[, "beta[3]"])
[1] 0
mcmcpvalue(as.matrix(loyn.rstan)[, "beta[4]"])
[1] 0.07155556
mcmcpvalue(as.matrix(loyn.rstan)[, "beta[5]"])
[1] 0.390963
mcmcpvalue(as.matrix(loyn.rstan)[, "beta[6]"])
[1] 0.1060741
# lets explore the support for GRAZE via loo library(loo) (full = loo(extract_log_lik(loyn.rstan)))
Computed from 6750 by 56 log-likelihood matrix Estimate SE elpd_loo -188.6 6.2 p_loo 8.1 1.8 looic 377.1 12.5 Pareto k diagnostic values: Count Pct (-Inf, 0.5] (good) 54 96.4% (0.5, 0.7] (ok) 2 3.6% (0.7, 1] (bad) 0 0.0% (1, Inf) (very bad) 0 0.0% All Pareto k estimates are ok (k < 0.7) See help('pareto-k-diagnostic') for details.
X = model.matrix(~cDIST + cLDIST + cAREA + cALT + cYR.ISOL, data = loyn) loyn.list <- with(loyn, list(Y = ABUND, X = X, nX = ncol(X), n = nrow(loyn))) loyn.rstan.red <- stan(data = loyn.list, model_code = modelString, chains = 3, iter = 5000, warmup = 2500, thin = 3, save_dso = TRUE)
SAMPLING FOR MODEL 'd98dbf6a02725fc3fce11306b77873e9' NOW (CHAIN 1). Gradient evaluation took 2.1e-05 seconds 1000 transitions using 10 leapfrog steps per transition would take 0.21 seconds. Adjust your expectations accordingly! Iteration: 1 / 5000 [ 0%] (Warmup) Iteration: 500 / 5000 [ 10%] (Warmup) Iteration: 1000 / 5000 [ 20%] (Warmup) Iteration: 1500 / 5000 [ 30%] (Warmup) Iteration: 2000 / 5000 [ 40%] (Warmup) Iteration: 2500 / 5000 [ 50%] (Warmup) Iteration: 2501 / 5000 [ 50%] (Sampling) Iteration: 3000 / 5000 [ 60%] (Sampling) Iteration: 3500 / 5000 [ 70%] (Sampling) Iteration: 4000 / 5000 [ 80%] (Sampling) Iteration: 4500 / 5000 [ 90%] (Sampling) Iteration: 5000 / 5000 [100%] (Sampling) Elapsed Time: 0.210041 seconds (Warm-up) 0.099843 seconds (Sampling) 0.309884 seconds (Total) SAMPLING FOR MODEL 'd98dbf6a02725fc3fce11306b77873e9' NOW (CHAIN 2). Gradient evaluation took 9e-06 seconds 1000 transitions using 10 leapfrog steps per transition would take 0.09 seconds. Adjust your expectations accordingly! Iteration: 1 / 5000 [ 0%] (Warmup) Iteration: 500 / 5000 [ 10%] (Warmup) Iteration: 1000 / 5000 [ 20%] (Warmup) Iteration: 1500 / 5000 [ 30%] (Warmup) Iteration: 2000 / 5000 [ 40%] (Warmup) Iteration: 2500 / 5000 [ 50%] (Warmup) Iteration: 2501 / 5000 [ 50%] (Sampling) Iteration: 3000 / 5000 [ 60%] (Sampling) Iteration: 3500 / 5000 [ 70%] (Sampling) Iteration: 4000 / 5000 [ 80%] (Sampling) Iteration: 4500 / 5000 [ 90%] (Sampling) Iteration: 5000 / 5000 [100%] (Sampling) Elapsed Time: 0.212176 seconds (Warm-up) 0.101142 seconds (Sampling) 0.313318 seconds (Total) SAMPLING FOR MODEL 'd98dbf6a02725fc3fce11306b77873e9' NOW (CHAIN 3). Gradient evaluation took 1e-05 seconds 1000 transitions using 10 leapfrog steps per transition would take 0.1 seconds. Adjust your expectations accordingly! Iteration: 1 / 5000 [ 0%] (Warmup) Iteration: 500 / 5000 [ 10%] (Warmup) Iteration: 1000 / 5000 [ 20%] (Warmup) Iteration: 1500 / 5000 [ 30%] (Warmup) Iteration: 2000 / 5000 [ 40%] (Warmup) Iteration: 2500 / 5000 [ 50%] (Warmup) Iteration: 2501 / 5000 [ 50%] (Sampling) Iteration: 3000 / 5000 [ 60%] (Sampling) Iteration: 3500 / 5000 [ 70%] (Sampling) Iteration: 4000 / 5000 [ 80%] (Sampling) Iteration: 4500 / 5000 [ 90%] (Sampling) Iteration: 5000 / 5000 [100%] (Sampling) Elapsed Time: 0.201244 seconds (Warm-up) 0.098536 seconds (Sampling) 0.29978 seconds (Total)
(reduced = loo(extract_log_lik(loyn.rstan.red)))
Computed from 2502 by 56 log-likelihood matrix Estimate SE elpd_loo -189.5 6.3 p_loo 7.5 1.7 looic 378.9 12.5 Pareto k diagnostic values: Count Pct (-Inf, 0.5] (good) 54 96.4% (0.5, 0.7] (ok) 2 3.6% (0.7, 1] (bad) 0 0.0% (1, Inf) (very bad) 0 0.0% All Pareto k estimates are ok (k < 0.7) See help('pareto-k-diagnostic') for details.
par(mfrow = 1:2, mar = c(5, 3.8, 1, 0) + 0.1, las = 3) plot(full, label_points = TRUE) plot(reduced, label_points = TRUE)
summary(loyn.rstanarm)
Model Info: function: stan_glm family: gaussian [identity] formula: ABUND ~ cDIST + cLDIST + cAREA + cGRAZE + cALT + cYR.ISOL algorithm: sampling priors: see help('prior_summary') sample: 6750 (posterior sample size) num obs: 56 Estimates: mean sd 2.5% 25% 50% 75% 97.5% (Intercept) 19.5 0.9 17.8 18.9 19.5 20.1 21.3 cDIST -0.9 2.7 -6.3 -2.8 -1.0 0.9 4.4 cLDIST -0.6 2.2 -5.0 -2.0 -0.6 0.9 3.7 cAREA 7.5 1.5 4.6 6.5 7.5 8.5 10.4 cGRAZE -1.7 1.0 -3.5 -2.3 -1.7 -1.0 0.3 cALT 0.0 0.0 0.0 0.0 0.0 0.0 0.1 cYR.ISOL 0.1 0.0 0.0 0.0 0.1 0.1 0.2 sigma 6.5 0.7 5.4 6.1 6.5 6.9 8.0 mean_PPD 19.5 1.2 17.1 18.6 19.5 20.3 21.9 log-posterior -198.3 2.1 -203.4 -199.5 -198.0 -196.7 -195.1 Diagnostics: mcse Rhat n_eff (Intercept) 0.0 1.0 6455 cDIST 0.0 1.0 6241 cLDIST 0.0 1.0 6750 cAREA 0.0 1.0 6364 cGRAZE 0.0 1.0 5911 cALT 0.0 1.0 6505 cYR.ISOL 0.0 1.0 6436 sigma 0.0 1.0 6137 mean_PPD 0.0 1.0 6412 log-posterior 0.0 1.0 4445 For each parameter, mcse is Monte Carlo standard error, n_eff is a crude measure of effective sample size, and Rhat is the potential scale reduction factor on split chains (at convergence Rhat=1).
library(broom) tidyMCMC(loyn.rstanarm$stanfit, conf.int = TRUE, conf.method = "HPDinterval", ess = TRUE, rhat = TRUE)
term estimate std.error conf.low conf.high rhat ess 1 (Intercept) 19.51470858 0.87397606 17.80774674 21.28378216 0.9999715 6455 2 cDIST -0.93894131 2.74304984 -6.21251435 4.50342466 1.0001898 6241 3 cLDIST -0.61542432 2.18690279 -5.04889534 3.58168450 0.9999507 6750 4 cAREA 7.47880465 1.48748763 4.51755530 10.30705743 1.0003404 6364 5 cGRAZE -1.66200918 0.95323779 -3.54291988 0.15651234 0.9998401 5911 6 cALT 0.01927526 0.02446506 -0.02782742 0.06755766 1.0002757 6505 7 cYR.ISOL 0.07368280 0.04607753 -0.01021086 0.16906692 1.0001234 6436 8 sigma 6.53502840 0.68217042 5.26894439 7.88647953 0.9999148 6137 9 mean_PPD 19.48796549 1.24135073 17.19598615 21.98422036 0.9999042 6412 10 log-posterior -198.30715262 2.14292966 -202.55736971 -194.83677821 0.9999104 4445
mcmcpvalue(as.matrix(loyn.rstanarm)[, "cDIST"])
[1] 0.7238519
mcmcpvalue(as.matrix(loyn.rstanarm)[, "cLDIST"])
[1] 0.7751111
mcmcpvalue(as.matrix(loyn.rstanarm)[, "cAREA"])
[1] 0
mcmcpvalue(as.matrix(loyn.rstanarm)[, "cGRAZE"])
[1] 0.08014815
mcmcpvalue(as.matrix(loyn.rstanarm)[, "cALT"])
[1] 0.4260741
mcmcpvalue(as.matrix(loyn.rstanarm)[, "cYR.ISOL"])
[1] 0.1056296
# lets explore the support for GRAZE via loo library(loo) (full = loo(loyn.rstanarm))
Computed from 6750 by 56 log-likelihood matrix Estimate SE elpd_loo -188.6 6.1 p_loo 8.0 1.8 looic 377.3 12.2 Pareto k diagnostic values: Count Pct (-Inf, 0.5] (good) 55 98.2% (0.5, 0.7] (ok) 1 1.8% (0.7, 1] (bad) 0 0.0% (1, Inf) (very bad) 0 0.0% All Pareto k estimates are ok (k < 0.7) See help('pareto-k-diagnostic') for details.
loyn.rstanarm.red <- update(loyn.rstanarm, . ~ . - cGRAZE)
Gradient evaluation took 2.6e-05 seconds 1000 transitions using 10 leapfrog steps per transition would take 0.26 seconds. Adjust your expectations accordingly! Elapsed Time: 0.102453 seconds (Warm-up) 0.362637 seconds (Sampling) 0.46509 seconds (Total) Gradient evaluation took 2.8e-05 seconds 1000 transitions using 10 leapfrog steps per transition would take 0.28 seconds. Adjust your expectations accordingly! Elapsed Time: 0.079963 seconds (Warm-up) 0.342621 seconds (Sampling) 0.422584 seconds (Total) Gradient evaluation took 1.4e-05 seconds 1000 transitions using 10 leapfrog steps per transition would take 0.14 seconds. Adjust your expectations accordingly! Elapsed Time: 0.076559 seconds (Warm-up) 0.326624 seconds (Sampling) 0.403183 seconds (Total)
(reduced = loo(loyn.rstanarm.red))
Computed from 6750 by 56 log-likelihood matrix Estimate SE elpd_loo -189.6 6.2 p_loo 7.6 1.7 looic 379.1 12.4 Pareto k diagnostic values: Count Pct (-Inf, 0.5] (good) 55 98.2% (0.5, 0.7] (ok) 1 1.8% (0.7, 1] (bad) 0 0.0% (1, Inf) (very bad) 0 0.0% All Pareto k estimates are ok (k < 0.7) See help('pareto-k-diagnostic') for details.
par(mfrow = 1:2, mar = c(5, 3.8, 1, 0) + 0.1, las = 3) plot(full, label_points = TRUE) plot(reduced, label_points = TRUE)
summary(loyn.brm)
Family: gaussian(identity) Formula: ABUND ~ cDIST + cLDIST + cAREA + cGRAZE + cALT + cYR.ISOL Data: loyn (Number of observations: 56) Samples: 3 chains, each with iter = 5000; warmup = 500; thin = 2; total post-warmup samples = 6750 ICs: LOO = NA; WAIC = NA; R2 = NA Population-Level Effects: Estimate Est.Error l-95% CI u-95% CI Eff.Sample Rhat Intercept 19.38 0.87 17.61 21.09 6062 1 cDIST -0.84 2.59 -5.84 4.26 6718 1 cLDIST -0.56 2.08 -4.72 3.47 6305 1 cAREA 7.30 1.47 4.46 10.16 6451 1 cGRAZE -1.70 0.94 -3.55 0.14 5912 1 cALT 0.02 0.02 -0.03 0.07 6540 1 cYR.ISOL 0.07 0.05 -0.02 0.16 6378 1 Family Specific Parameters: Estimate Est.Error l-95% CI u-95% CI Eff.Sample Rhat sigma 6.47 0.66 5.34 7.92 5867 1 Samples were drawn using sampling(NUTS). For each parameter, Eff.Sample is a crude measure of effective sample size, and Rhat is the potential scale reduction factor on split chains (at convergence, Rhat = 1).
library(broom) tidyMCMC(loyn.brm$fit, conf.int = TRUE, conf.method = "HPDinterval", ess = TRUE, rhat = TRUE)
term estimate std.error conf.low conf.high rhat ess 1 b_Intercept 19.37880736 0.87350422 17.59486668 21.04523042 1.0002504 6062 2 b_cDIST -0.84118727 2.59215760 -5.93414106 4.14410963 0.9997645 6718 3 b_cLDIST -0.55827202 2.08479070 -4.75640558 3.42477995 1.0006374 6305 4 b_cAREA 7.30182513 1.46762773 4.44967625 10.14571910 1.0001524 6451 5 b_cGRAZE -1.69718041 0.93533654 -3.56234009 0.11421872 0.9997029 5912 6 b_cALT 0.02050336 0.02460275 -0.02782871 0.06897146 0.9996017 6540 7 b_cYR.ISOL 0.07385555 0.04541507 -0.01317674 0.16516260 1.0001454 6378 8 sigma 6.47151516 0.65910492 5.29794394 7.82266145 0.9998090 5867
mcmcpvalue(as.matrix(loyn.brm)[, "b_cDIST"])
[1] 0.7465185
mcmcpvalue(as.matrix(loyn.brm)[, "b_cLDIST"])
[1] 0.7779259
mcmcpvalue(as.matrix(loyn.brm)[, "b_cAREA"])
[1] 0
mcmcpvalue(as.matrix(loyn.brm)[, "b_cGRAZE"])
[1] 0.07037037
mcmcpvalue(as.matrix(loyn.brm)[, "b_cALT"])
[1] 0.3955556
mcmcpvalue(as.matrix(loyn.brm)[, "b_cYR.ISOL"])
[1] 0.1042963
# lets explore the support for GRAZE via loo library(loo) (full = loo(loyn.brm))
LOOIC SE 376.84 12.38
loyn.brm.red <- update(loyn.brm, . ~ . - cGRAZE)
SAMPLING FOR MODEL 'gaussian(identity) brms-model' NOW (CHAIN 1). Gradient evaluation took 1.8e-05 seconds 1000 transitions using 10 leapfrog steps per transition would take 0.18 seconds. Adjust your expectations accordingly! Iteration: 1 / 5000 [ 0%] (Warmup) Iteration: 500 / 5000 [ 10%] (Warmup) Iteration: 1000 / 5000 [ 20%] (Warmup) Iteration: 1500 / 5000 [ 30%] (Warmup) Iteration: 2000 / 5000 [ 40%] (Warmup) Iteration: 2500 / 5000 [ 50%] (Warmup) Iteration: 2501 / 5000 [ 50%] (Sampling) Iteration: 3000 / 5000 [ 60%] (Sampling) Iteration: 3500 / 5000 [ 70%] (Sampling) Iteration: 4000 / 5000 [ 80%] (Sampling) Iteration: 4500 / 5000 [ 90%] (Sampling) Iteration: 5000 / 5000 [100%] (Sampling) Elapsed Time: 0.194386 seconds (Warm-up) 0.098192 seconds (Sampling) 0.292578 seconds (Total) SAMPLING FOR MODEL 'gaussian(identity) brms-model' NOW (CHAIN 2). Gradient evaluation took 8e-06 seconds 1000 transitions using 10 leapfrog steps per transition would take 0.08 seconds. Adjust your expectations accordingly! Iteration: 1 / 5000 [ 0%] (Warmup) Iteration: 500 / 5000 [ 10%] (Warmup) Iteration: 1000 / 5000 [ 20%] (Warmup) Iteration: 1500 / 5000 [ 30%] (Warmup) Iteration: 2000 / 5000 [ 40%] (Warmup) Iteration: 2500 / 5000 [ 50%] (Warmup) Iteration: 2501 / 5000 [ 50%] (Sampling) Iteration: 3000 / 5000 [ 60%] (Sampling) Iteration: 3500 / 5000 [ 70%] (Sampling) Iteration: 4000 / 5000 [ 80%] (Sampling) Iteration: 4500 / 5000 [ 90%] (Sampling) Iteration: 5000 / 5000 [100%] (Sampling) Elapsed Time: 0.17107 seconds (Warm-up) 0.097613 seconds (Sampling) 0.268683 seconds (Total) SAMPLING FOR MODEL 'gaussian(identity) brms-model' NOW (CHAIN 3). Gradient evaluation took 1.9e-05 seconds 1000 transitions using 10 leapfrog steps per transition would take 0.19 seconds. Adjust your expectations accordingly! Iteration: 1 / 5000 [ 0%] (Warmup) Iteration: 500 / 5000 [ 10%] (Warmup) Iteration: 1000 / 5000 [ 20%] (Warmup) Iteration: 1500 / 5000 [ 30%] (Warmup) Iteration: 2000 / 5000 [ 40%] (Warmup) Iteration: 2500 / 5000 [ 50%] (Warmup) Iteration: 2501 / 5000 [ 50%] (Sampling) Iteration: 3000 / 5000 [ 60%] (Sampling) Iteration: 3500 / 5000 [ 70%] (Sampling) Iteration: 4000 / 5000 [ 80%] (Sampling) Iteration: 4500 / 5000 [ 90%] (Sampling) Iteration: 5000 / 5000 [100%] (Sampling) Elapsed Time: 0.176611 seconds (Warm-up) 0.099638 seconds (Sampling) 0.276249 seconds (Total)
(reduced = loo(loyn.brm.red))
LOOIC SE 379.13 12.66
par(mfrow = 1:2, mar = c(5, 3.8, 1, 0) + 0.1, las = 3) plot(full, label_points = TRUE) plot(reduced, label_points = TRUE)
There is not much (if any) support for GRAZE. We will explore this more thoroughly when we look at sparsity.
- Generate graphical summaries
library(MCMCpack) loyn.mcmc = loyn.mcmcpack ## Calculate the fitted values Vars = c("cDIST", "cLDIST", "cAREA", "cGRAZE", "cALT", "cYR.ISOL") loyn.list = rep(list(loyn), length(Vars)) names(loyn.list) <- Vars library(newdata) newdata = do.call(rbind, lapply(Vars, function(x) new_data(loyn.list[[x]], seq = x, len = 100))) Xmat = model.matrix(~cDIST + cLDIST + cAREA + cGRAZE + cALT + cYR.ISOL, newdata) coefs = loyn.mcmc[, c("(Intercept)", "cDIST", "cLDIST", "cAREA", "cGRAZE", "cALT", "cYR.ISOL")] fit = coefs %*% t(Xmat) newdata = newdata %>% mutate(DIST = exp(cDIST + mean.DIST), LDIST = exp(cLDIST + mean.LDIST), AREA = exp(cAREA + mean.AREA), GRAZE = cGRAZE + mean.GRAZE, ALT = cALT + mean.ALT, YR.ISOL = cYR.ISOL + mean.YR.ISOL) %>% cbind(tidyMCMC(fit, conf.int = TRUE, conf.method = "HPDinterval")) ## Partial residuals loyn.list = rep(list(loyn), length(Vars)) names(loyn.list) <- Vars rdata = fdata = do.call(rbind, lapply(Vars, function(x) loyn.list[[x]] %>% mutate_at(Vars[!Vars %in% x], mean))) fMat = rMat = model.matrix(~cDIST + cLDIST + cAREA + cGRAZE + cALT + cYR.ISOL, fdata) fit = as.vector(apply(coefs, 2, median) %*% t(fMat)) resid = as.vector(loyn$ABUND - (apply(coefs, 2, median) %*% t(rMat))) rdata = rdata %>% mutate(partial.resid = resid + fit) %>% mutate(DIST = exp(cDIST + mean.DIST), LDIST = exp(cLDIST + mean.LDIST), AREA = exp(cAREA + mean.AREA), GRAZE = cGRAZE + mean.GRAZE, ALT = cALT + mean.ALT, YR.ISOL = cYR.ISOL + mean.YR.ISOL) newdata.melt = newdata %>% gather(key = "Pred", value = "Value", cDIST:cYR.ISOL) %>% filter(round(Value, 5) != 0) rdata.melt = rdata %>% gather(key = "Pred", value = "Value", cDIST:cYR.ISOL) %>% filter(round(Value, 5) != 0) ggplot(newdata.melt, aes(y = estimate, x = Value)) + geom_line() + # geom_blank(aes(y=9)) + geom_point(data = rdata.melt, aes(y = partial.resid), color = "grey") + geom_ribbon(aes(ymin = conf.low, ymax = conf.high), fill = "blue", alpha = 0.3) + scale_y_continuous("Abundance") + scale_x_continuous("") + facet_wrap(~Pred, scales = "free_x", strip.position = "bottom") + theme_classic() + theme(strip.background = element_blank(), strip.placement = "outside")
library(MCMCpack) loyn.mcmc = loyn.mcmcpack ## Calculate the fitted values Vars = c("cDIST", "cLDIST", "cAREA", "cGRAZE", "cALT", "cYR.ISOL") loyn.list = rep(list(loyn), length(Vars)) names(loyn.list) <- Vars library(newdata) newdata = do.call(rbind, lapply(Vars, function(x) new_data(loyn.list[[x]], seq = x, len = 100))) Xmat = model.matrix(~cDIST + cLDIST + cAREA + cGRAZE + cALT + cYR.ISOL, newdata) coefs = loyn.mcmc[, c("(Intercept)", "cDIST", "cLDIST", "cAREA", "cGRAZE", "cALT", "cYR.ISOL")] fit = coefs %*% t(Xmat) newdata = newdata %>% mutate_all(funs(ifelse(round(., 12) == 0, NA, .))) %>% mutate(DIST = 10^(cDIST + mean.DIST), LDIST = 10^(cLDIST + mean.LDIST), AREA = 10^(cAREA + mean.AREA), GRAZE = cGRAZE + mean.GRAZE, ALT = cALT + mean.ALT, YR.ISOL = cYR.ISOL + mean.YR.ISOL) %>% cbind(tidyMCMC(fit, conf.int = TRUE, conf.method = "HPDinterval")) ## Partial residuals loyn.list = rep(list(loyn), length(Vars)) names(loyn.list) <- Vars rdata = fdata = do.call(rbind, lapply(Vars, function(x) loyn.list[[x]] %>% mutate_at(Vars[!Vars %in% x], mean))) fMat = rMat = model.matrix(~cDIST + cLDIST + cAREA + cGRAZE + cALT + cYR.ISOL, fdata) fit = as.vector(apply(coefs, 2, median) %*% t(fMat)) resid = as.vector(loyn$ABUND - (apply(coefs, 2, median) %*% t(rMat))) rdata = rdata %>% mutate(partial.resid = resid + fit) %>% mutate_all(funs(ifelse(round(., 12) == 0, NA, .))) %>% mutate(DIST = 10^(cDIST + mean.DIST), LDIST = 10^(cLDIST + mean.LDIST), AREA = 10^(cAREA + mean.AREA), GRAZE = cGRAZE + mean.GRAZE, ALT = cALT + mean.ALT, YR.ISOL = cYR.ISOL + mean.YR.ISOL) newdata.melt = newdata %>% gather(key = "Pred", value = "Value", DIST, LDIST, AREA, GRAZE, ALT, YR.ISOL) %>% filter(round(Value, 5) != 0) rdata.melt = rdata %>% gather(key = "Pred", value = "Value", DIST, LDIST, AREA, GRAZE, ALT, YR.ISOL) %>% filter(round(Value, 5) != 0) ggplot(newdata.melt, aes(y = estimate, x = Value)) + geom_line() + # geom_blank(aes(y=9)) + geom_point(data = rdata.melt, aes(y = partial.resid), color = "grey") + geom_ribbon(aes(ymin = conf.low, ymax = conf.high), fill = "blue", alpha = 0.3) + scale_y_continuous("Abundance") + scale_x_continuous("") + facet_wrap(~Pred, scales = "free_x", strip.position = "bottom") + theme_classic() + theme(strip.background = element_blank(), strip.placement = "outside")
loyn.mcmc = loyn.r2jags$BUGSoutput$sims.matrix ## Calculate the fitted values Vars = c("cDIST", "cLDIST", "cAREA", "cGRAZE", "cALT", "cYR.ISOL") loyn.list = rep(list(loyn), length(Vars)) names(loyn.list) <- Vars library(newdata) newdata = do.call(rbind, lapply(Vars, function(x) new_data(loyn.list[[x]], seq = x, len = 100))) Xmat = model.matrix(~cDIST + cLDIST + cAREA + cGRAZE + cALT + cYR.ISOL, newdata) coefs = loyn.mcmc[, c("beta0", "beta[1]", "beta[2]", "beta[3]", "beta[4]", "beta[5]", "beta[6]")] fit = coefs %*% t(Xmat) newdata = newdata %>% mutate(DIST = exp(cDIST + mean.DIST), LDIST = exp(cLDIST + mean.LDIST), AREA = exp(cAREA + mean.AREA), GRAZE = cGRAZE + mean.GRAZE, ALT = cALT + mean.ALT, YR.ISOL = cYR.ISOL + mean.YR.ISOL) %>% cbind(tidyMCMC(fit, conf.int = TRUE, conf.method = "HPDinterval")) ## Partial residuals loyn.list = rep(list(loyn), length(Vars)) names(loyn.list) <- Vars rdata = fdata = do.call(rbind, lapply(Vars, function(x) loyn.list[[x]] %>% mutate_at(Vars[!Vars %in% x], mean))) fMat = rMat = model.matrix(~cDIST + cLDIST + cAREA + cGRAZE + cALT + cYR.ISOL, fdata) fit = as.vector(apply(coefs, 2, median) %*% t(fMat)) resid = as.vector(loyn$ABUND - (apply(coefs, 2, median) %*% t(rMat))) rdata = rdata %>% mutate(partial.resid = resid + fit) %>% mutate(DIST = exp(cDIST + mean.DIST), LDIST = exp(cLDIST + mean.LDIST), AREA = exp(cAREA + mean.AREA), GRAZE = cGRAZE + mean.GRAZE, ALT = cALT + mean.ALT, YR.ISOL = cYR.ISOL + mean.YR.ISOL) newdata.melt = newdata %>% gather(key = "Pred", value = "Value", cDIST:cYR.ISOL) %>% filter(round(Value, 5) != 0) rdata.melt = rdata %>% gather(key = "Pred", value = "Value", cDIST:cYR.ISOL) %>% filter(round(Value, 5) != 0) ggplot(newdata.melt, aes(y = estimate, x = Value)) + geom_line() + # geom_blank(aes(y=9)) + geom_point(data = rdata.melt, aes(y = partial.resid), color = "grey") + geom_ribbon(aes(ymin = conf.low, ymax = conf.high), fill = "blue", alpha = 0.3) + scale_y_continuous("Abundance") + scale_x_continuous("") + facet_wrap(~Pred, scales = "free_x", strip.position = "bottom") + theme_classic() + theme(strip.background = element_blank(), strip.placement = "outside")
loyn.mcmc = loyn.r2jags$BUGSoutput$sims.matrix ## Calculate the fitted values Vars = c("cDIST", "cLDIST", "cAREA", "cGRAZE", "cALT", "cYR.ISOL") loyn.list = rep(list(loyn), length(Vars)) names(loyn.list) <- Vars library(newdata) newdata = do.call(rbind, lapply(Vars, function(x) new_data(loyn.list[[x]], seq = x, len = 100))) Xmat = model.matrix(~cDIST + cLDIST + cAREA + cGRAZE + cALT + cYR.ISOL, newdata) coefs = loyn.mcmc[, c("beta0", "beta[1]", "beta[2]", "beta[3]", "beta[4]", "beta[5]", "beta[6]")] fit = coefs %*% t(Xmat) newdata = newdata %>% mutate_all(funs(ifelse(round(., 12) == 0, NA, .))) %>% mutate(DIST = 10^(cDIST + mean.DIST), LDIST = 10^(cLDIST + mean.LDIST), AREA = 10^(cAREA + mean.AREA), GRAZE = cGRAZE + mean.GRAZE, ALT = cALT + mean.ALT, YR.ISOL = cYR.ISOL + mean.YR.ISOL) %>% cbind(tidyMCMC(fit, conf.int = TRUE, conf.method = "HPDinterval")) ## Partial residuals loyn.list = rep(list(loyn), length(Vars)) names(loyn.list) <- Vars rdata = fdata = do.call(rbind, lapply(Vars, function(x) loyn.list[[x]] %>% mutate_at(Vars[!Vars %in% x], mean))) fMat = rMat = model.matrix(~cDIST + cLDIST + cAREA + cGRAZE + cALT + cYR.ISOL, fdata) fit = as.vector(apply(coefs, 2, median) %*% t(fMat)) resid = as.vector(loyn$ABUND - (apply(coefs, 2, median) %*% t(rMat))) rdata = rdata %>% mutate(partial.resid = resid + fit) %>% mutate_all(funs(ifelse(round(., 12) == 0, NA, .))) %>% mutate(DIST = 10^(cDIST + mean.DIST), LDIST = 10^(cLDIST + mean.LDIST), AREA = 10^(cAREA + mean.AREA), GRAZE = cGRAZE + mean.GRAZE, ALT = cALT + mean.ALT, YR.ISOL = cYR.ISOL + mean.YR.ISOL) newdata.melt = newdata %>% gather(key = "Pred", value = "Value", DIST, LDIST, AREA, GRAZE, ALT, YR.ISOL) %>% filter(round(Value, 5) != 0) rdata.melt = rdata %>% gather(key = "Pred", value = "Value", DIST, LDIST, AREA, GRAZE, ALT, YR.ISOL) %>% filter(round(Value, 5) != 0) ggplot(newdata.melt, aes(y = estimate, x = Value)) + geom_line() + # geom_blank(aes(y=9)) + geom_point(data = rdata.melt, aes(y = partial.resid), color = "grey") + geom_ribbon(aes(ymin = conf.low, ymax = conf.high), fill = "blue", alpha = 0.3) + scale_y_continuous("Abundance") + scale_x_continuous("") + facet_wrap(~Pred, scales = "free_x", strip.position = "bottom") + theme_classic() + theme(strip.background = element_blank(), strip.placement = "outside")
loyn.mcmc = as.matrix(loyn.rstan) ## Calculate the fitted values Vars = c("cDIST", "cLDIST", "cAREA", "cGRAZE", "cALT", "cYR.ISOL") loyn.list = rep(list(loyn), length(Vars)) names(loyn.list) <- Vars library(newdata) newdata = do.call(rbind, lapply(Vars, function(x) new_data(loyn.list[[x]], seq = x, len = 100))) Xmat = model.matrix(~cDIST + cLDIST + cAREA + cGRAZE + cALT + cYR.ISOL, newdata) coefs = loyn.mcmc[, c("beta0", "beta[1]", "beta[2]", "beta[3]", "beta[4]", "beta[5]", "beta[6]")] fit = coefs %*% t(Xmat) newdata = newdata %>% mutate(DIST = exp(cDIST + mean.DIST), LDIST = exp(cLDIST + mean.LDIST), AREA = exp(cAREA + mean.AREA), GRAZE = cGRAZE + mean.GRAZE, ALT = cALT + mean.ALT, YR.ISOL = cYR.ISOL + mean.YR.ISOL) %>% cbind(tidyMCMC(fit, conf.int = TRUE, conf.method = "HPDinterval")) ## Partial residuals loyn.list = rep(list(loyn), length(Vars)) names(loyn.list) <- Vars rdata = fdata = do.call(rbind, lapply(Vars, function(x) loyn.list[[x]] %>% mutate_at(Vars[!Vars %in% x], mean))) fMat = rMat = model.matrix(~cDIST + cLDIST + cAREA + cGRAZE + cALT + cYR.ISOL, fdata) fit = as.vector(apply(coefs, 2, median) %*% t(fMat)) resid = as.vector(loyn$ABUND - (apply(coefs, 2, median) %*% t(rMat))) rdata = rdata %>% mutate(partial.resid = resid + fit) %>% mutate(DIST = exp(cDIST + mean.DIST), LDIST = exp(cLDIST + mean.LDIST), AREA = exp(cAREA + mean.AREA), GRAZE = cGRAZE + mean.GRAZE, ALT = cALT + mean.ALT, YR.ISOL = cYR.ISOL + mean.YR.ISOL) newdata.melt = newdata %>% gather(key = "Pred", value = "Value", cDIST:cYR.ISOL) %>% filter(round(Value, 5) != 0) rdata.melt = rdata %>% gather(key = "Pred", value = "Value", cDIST:cYR.ISOL) %>% filter(round(Value, 5) != 0) ggplot(newdata.melt, aes(y = estimate, x = Value)) + geom_line() + # geom_blank(aes(y=9)) + geom_point(data = rdata.melt, aes(y = partial.resid), color = "grey") + geom_ribbon(aes(ymin = conf.low, ymax = conf.high), fill = "blue", alpha = 0.3) + scale_y_continuous("Abundance") + scale_x_continuous("") + facet_wrap(~Pred, scales = "free_x", strip.position = "bottom") + theme_classic() + theme(strip.background = element_blank(), strip.placement = "outside")
loyn.mcmc = as.matrix(loyn.rstan) ## Calculate the fitted values Vars = c("cDIST", "cLDIST", "cAREA", "cGRAZE", "cALT", "cYR.ISOL") loyn.list = rep(list(loyn), length(Vars)) names(loyn.list) <- Vars library(newdata) newdata = do.call(rbind, lapply(Vars, function(x) new_data(loyn.list[[x]], seq = x, len = 100))) Xmat = model.matrix(~cDIST + cLDIST + cAREA + cGRAZE + cALT + cYR.ISOL, newdata) coefs = loyn.mcmc[, c("beta0", "beta[1]", "beta[2]", "beta[3]", "beta[4]", "beta[5]", "beta[6]")] fit = coefs %*% t(Xmat) newdata = newdata %>% mutate_all(funs(ifelse(round(., 12) == 0, NA, .))) %>% mutate(DIST = 10^(cDIST + mean.DIST), LDIST = 10^(cLDIST + mean.LDIST), AREA = 10^(cAREA + mean.AREA), GRAZE = cGRAZE + mean.GRAZE, ALT = cALT + mean.ALT, YR.ISOL = cYR.ISOL + mean.YR.ISOL) %>% cbind(tidyMCMC(fit, conf.int = TRUE, conf.method = "HPDinterval")) ## Partial residuals loyn.list = rep(list(loyn), length(Vars)) names(loyn.list) <- Vars rdata = fdata = do.call(rbind, lapply(Vars, function(x) loyn.list[[x]] %>% mutate_at(Vars[!Vars %in% x], mean))) fMat = rMat = model.matrix(~cDIST + cLDIST + cAREA + cGRAZE + cALT + cYR.ISOL, fdata) fit = as.vector(apply(coefs, 2, median) %*% t(fMat)) resid = as.vector(loyn$ABUND - (apply(coefs, 2, median) %*% t(rMat))) rdata = rdata %>% mutate(partial.resid = resid + fit) %>% mutate_all(funs(ifelse(round(., 12) == 0, NA, .))) %>% mutate(DIST = 10^(cDIST + mean.DIST), LDIST = 10^(cLDIST + mean.LDIST), AREA = 10^(cAREA + mean.AREA), GRAZE = cGRAZE + mean.GRAZE, ALT = cALT + mean.ALT, YR.ISOL = cYR.ISOL + mean.YR.ISOL) newdata.melt = newdata %>% gather(key = "Pred", value = "Value", DIST, LDIST, AREA, GRAZE, ALT, YR.ISOL) %>% filter(round(Value, 5) != 0) rdata.melt = rdata %>% gather(key = "Pred", value = "Value", DIST, LDIST, AREA, GRAZE, ALT, YR.ISOL) %>% filter(round(Value, 5) != 0) ggplot(newdata.melt, aes(y = estimate, x = Value)) + geom_line() + # geom_blank(aes(y=9)) + geom_point(data = rdata.melt, aes(y = partial.resid), color = "grey") + geom_ribbon(aes(ymin = conf.low, ymax = conf.high), fill = "blue", alpha = 0.3) + scale_y_continuous("Abundance") + scale_x_continuous("") + facet_wrap(~Pred, scales = "free_x", strip.position = "bottom") + theme_classic() + theme(strip.background = element_blank(), strip.placement = "outside")
loyn.mcmc = as.matrix(loyn.rstanarm) ## Calculate the fitted values Vars = c("cDIST", "cLDIST", "cAREA", "cGRAZE", "cALT", "cYR.ISOL") loyn.list = rep(list(loyn), length(Vars)) names(loyn.list) <- Vars library(newdata) newdata = do.call(rbind, lapply(Vars, function(x) new_data(loyn.list[[x]], seq = x, len = 100))) Xmat = model.matrix(~cDIST + cLDIST + cAREA + cGRAZE + cALT + cYR.ISOL, newdata) coefs = loyn.mcmc[, c("(Intercept)", "cDIST", "cLDIST", "cAREA", "cGRAZE", "cALT", "cYR.ISOL")] fit = coefs %*% t(Xmat) newdata = newdata %>% mutate(DIST = exp(cDIST + mean.DIST), LDIST = exp(cLDIST + mean.LDIST), AREA = exp(cAREA + mean.AREA), GRAZE = cGRAZE + mean.GRAZE, ALT = cALT + mean.ALT, YR.ISOL = cYR.ISOL + mean.YR.ISOL) %>% cbind(tidyMCMC(fit, conf.int = TRUE, conf.method = "HPDinterval")) ## Partial residuals loyn.list = rep(list(loyn), length(Vars)) names(loyn.list) <- Vars rdata = fdata = do.call(rbind, lapply(Vars, function(x) loyn.list[[x]] %>% mutate_at(Vars[!Vars %in% x], mean))) fMat = rMat = model.matrix(~cDIST + cLDIST + cAREA + cGRAZE + cALT + cYR.ISOL, fdata) fit = as.vector(apply(coefs, 2, median) %*% t(fMat)) resid = as.vector(loyn$ABUND - (apply(coefs, 2, median) %*% t(rMat))) rdata = rdata %>% mutate(partial.resid = resid + fit) %>% mutate(DIST = exp(cDIST + mean.DIST), LDIST = exp(cLDIST + mean.LDIST), AREA = exp(cAREA + mean.AREA), GRAZE = cGRAZE + mean.GRAZE, ALT = cALT + mean.ALT, YR.ISOL = cYR.ISOL + mean.YR.ISOL) newdata.melt = newdata %>% gather(key = "Pred", value = "Value", cDIST:cYR.ISOL) %>% filter(round(Value, 5) != 0) rdata.melt = rdata %>% gather(key = "Pred", value = "Value", cDIST:cYR.ISOL) %>% filter(round(Value, 5) != 0) ggplot(newdata.melt, aes(y = estimate, x = Value)) + geom_line() + # geom_blank(aes(y=9)) + geom_point(data = rdata.melt, aes(y = partial.resid), color = "grey") + geom_ribbon(aes(ymin = conf.low, ymax = conf.high), fill = "blue", alpha = 0.3) + scale_y_continuous("Abundance") + scale_x_continuous("") + facet_wrap(~Pred, scales = "free_x", strip.position = "bottom") + theme_classic() + theme(strip.background = element_blank(), strip.placement = "outside")
loyn.mcmc = as.matrix(loyn.rstanarm) ## Calculate the fitted values Vars = c("cDIST", "cLDIST", "cAREA", "cGRAZE", "cALT", "cYR.ISOL") loyn.list = rep(list(loyn), length(Vars)) names(loyn.list) <- Vars library(newdata) newdata = do.call(rbind, lapply(Vars, function(x) new_data(loyn.list[[x]], seq = x, len = 100))) Xmat = model.matrix(~cDIST + cLDIST + cAREA + cGRAZE + cALT + cYR.ISOL, newdata) coefs = loyn.mcmc[, c("(Intercept)", "cDIST", "cLDIST", "cAREA", "cGRAZE", "cALT", "cYR.ISOL")] fit = coefs %*% t(Xmat) newdata = newdata %>% mutate_all(funs(ifelse(round(., 12) == 0, NA, .))) %>% mutate(DIST = 10^(cDIST + mean.DIST), LDIST = 10^(cLDIST + mean.LDIST), AREA = 10^(cAREA + mean.AREA), GRAZE = cGRAZE + mean.GRAZE, ALT = cALT + mean.ALT, YR.ISOL = cYR.ISOL + mean.YR.ISOL) %>% cbind(tidyMCMC(fit, conf.int = TRUE, conf.method = "HPDinterval")) ## Partial residuals loyn.list = rep(list(loyn), length(Vars)) names(loyn.list) <- Vars rdata = fdata = do.call(rbind, lapply(Vars, function(x) loyn.list[[x]] %>% mutate_at(Vars[!Vars %in% x], mean))) fMat = rMat = model.matrix(~cDIST + cLDIST + cAREA + cGRAZE + cALT + cYR.ISOL, fdata) fit = as.vector(apply(coefs, 2, median) %*% t(fMat)) resid = as.vector(loyn$ABUND - (apply(coefs, 2, median) %*% t(rMat))) rdata = rdata %>% mutate(partial.resid = resid + fit) %>% mutate_all(funs(ifelse(round(., 12) == 0, NA, .))) %>% mutate(DIST = 10^(cDIST + mean.DIST), LDIST = 10^(cLDIST + mean.LDIST), AREA = 10^(cAREA + mean.AREA), GRAZE = cGRAZE + mean.GRAZE, ALT = cALT + mean.ALT, YR.ISOL = cYR.ISOL + mean.YR.ISOL) newdata.melt = newdata %>% gather(key = "Pred", value = "Value", DIST, LDIST, AREA, GRAZE, ALT, YR.ISOL) %>% filter(round(Value, 5) != 0) rdata.melt = rdata %>% gather(key = "Pred", value = "Value", DIST, LDIST, AREA, GRAZE, ALT, YR.ISOL) %>% filter(round(Value, 5) != 0) ggplot(newdata.melt, aes(y = estimate, x = Value)) + geom_line() + # geom_blank(aes(y=9)) + geom_point(data = rdata.melt, aes(y = partial.resid), color = "grey") + geom_ribbon(aes(ymin = conf.low, ymax = conf.high), fill = "blue", alpha = 0.3) + scale_y_continuous("Abundance") + scale_x_continuous("") + facet_wrap(~Pred, scales = "free_x", strip.position = "bottom") + theme_classic() + theme(strip.background = element_blank(), strip.placement = "outside")
plot(marginal_effects(loyn.brm), points = TRUE)
loyn.mcmc = as.matrix(loyn.brm) ## Calculate the fitted values Vars = c("cDIST", "cLDIST", "cAREA", "cGRAZE", "cALT", "cYR.ISOL") loyn.list = rep(list(loyn), length(Vars)) names(loyn.list) <- Vars library(newdata) newdata = do.call(rbind, lapply(Vars, function(x) new_data(loyn.list[[x]], seq = x, len = 100))) Xmat = model.matrix(~cDIST + cLDIST + cAREA + cGRAZE + cALT + cYR.ISOL, newdata) coefs = loyn.mcmc[, c("b_Intercept)", "b_cDIST", "b_cLDIST", "b_cAREA", "b_cGRAZE", "b_cALT", "b_cYR.ISOL")]
Error in loyn.mcmc[, c("b_Intercept)", "b_cDIST", "b_cLDIST", "b_cAREA", : subscript out of bounds
fit = coefs %*% t(Xmat) newdata = newdata %>% mutate(DIST = exp(cDIST + mean.DIST), LDIST = exp(cLDIST + mean.LDIST), AREA = exp(cAREA + mean.AREA), GRAZE = cGRAZE + mean.GRAZE, ALT = cALT + mean.ALT, YR.ISOL = cYR.ISOL + mean.YR.ISOL) %>% cbind(tidyMCMC(fit, conf.int = TRUE, conf.method = "HPDinterval")) ## Partial residuals loyn.list = rep(list(loyn), length(Vars)) names(loyn.list) <- Vars rdata = fdata = do.call(rbind, lapply(Vars, function(x) loyn.list[[x]] %>% mutate_at(Vars[!Vars %in% x], mean))) fMat = rMat = model.matrix(~cDIST + cLDIST + cAREA + cGRAZE + cALT + cYR.ISOL, fdata) fit = as.vector(apply(coefs, 2, median) %*% t(fMat)) resid = as.vector(loyn$ABUND - (apply(coefs, 2, median) %*% t(rMat))) rdata = rdata %>% mutate(partial.resid = resid + fit) %>% mutate(DIST = exp(cDIST + mean.DIST), LDIST = exp(cLDIST + mean.LDIST), AREA = exp(cAREA + mean.AREA), GRAZE = cGRAZE + mean.GRAZE, ALT = cALT + mean.ALT, YR.ISOL = cYR.ISOL + mean.YR.ISOL) newdata.melt = newdata %>% gather(key = "Pred", value = "Value", cDIST:cYR.ISOL) %>% filter(round(Value, 5) != 0) rdata.melt = rdata %>% gather(key = "Pred", value = "Value", cDIST:cYR.ISOL) %>% filter(round(Value, 5) != 0) ggplot(newdata.melt, aes(y = estimate, x = Value)) + geom_line() + # geom_blank(aes(y=9)) + geom_point(data = rdata.melt, aes(y = partial.resid), color = "grey") + geom_ribbon(aes(ymin = conf.low, ymax = conf.high), fill = "blue", alpha = 0.3) + scale_y_continuous("Abundance") + scale_x_continuous("") + facet_wrap(~Pred, scales = "free_x", strip.position = "bottom") + theme_classic() + theme(strip.background = element_blank(), strip.placement = "outside")
loyn.mcmc = as.matrix(loyn.brm) ## Calculate the fitted values Vars = c("cDIST", "cLDIST", "cAREA", "cGRAZE", "cALT", "cYR.ISOL") loyn.list = rep(list(loyn), length(Vars)) names(loyn.list) <- Vars library(newdata) newdata = do.call(rbind, lapply(Vars, function(x) new_data(loyn.list[[x]], seq = x, len = 100))) Xmat = model.matrix(~cDIST + cLDIST + cAREA + cGRAZE + cALT + cYR.ISOL, newdata) coefs = loyn.mcmc[, c("b_Intercept", "b_cDIST", "b_cLDIST", "b_cAREA", "b_cGRAZE", "b_cALT", "b_cYR.ISOL")] fit = coefs %*% t(Xmat) newdata = newdata %>% mutate_all(funs(ifelse(round(., 12) == 0, NA, .))) %>% mutate(DIST = 10^(cDIST + mean.DIST), LDIST = 10^(cLDIST + mean.LDIST), AREA = 10^(cAREA + mean.AREA), GRAZE = cGRAZE + mean.GRAZE, ALT = cALT + mean.ALT, YR.ISOL = cYR.ISOL + mean.YR.ISOL) %>% cbind(tidyMCMC(fit, conf.int = TRUE, conf.method = "HPDinterval")) ## Partial residuals loyn.list = rep(list(loyn), length(Vars)) names(loyn.list) <- Vars rdata = fdata = do.call(rbind, lapply(Vars, function(x) loyn.list[[x]] %>% mutate_at(Vars[!Vars %in% x], mean))) fMat = rMat = model.matrix(~cDIST + cLDIST + cAREA + cGRAZE + cALT + cYR.ISOL, fdata) fit = as.vector(apply(coefs, 2, median) %*% t(fMat)) resid = as.vector(loyn$ABUND - (apply(coefs, 2, median) %*% t(rMat))) rdata = rdata %>% mutate(partial.resid = resid + fit) %>% mutate_all(funs(ifelse(round(., 12) == 0, NA, .))) %>% mutate(DIST = 10^(cDIST + mean.DIST), LDIST = 10^(cLDIST + mean.LDIST), AREA = 10^(cAREA + mean.AREA), GRAZE = cGRAZE + mean.GRAZE, ALT = cALT + mean.ALT, YR.ISOL = cYR.ISOL + mean.YR.ISOL) newdata.melt = newdata %>% gather(key = "Pred", value = "Value", DIST, LDIST, AREA, GRAZE, ALT, YR.ISOL) %>% filter(round(Value, 5) != 0) rdata.melt = rdata %>% gather(key = "Pred", value = "Value", DIST, LDIST, AREA, GRAZE, ALT, YR.ISOL) %>% filter(round(Value, 5) != 0) ggplot(newdata.melt, aes(y = estimate, x = Value)) + geom_line() + # geom_blank(aes(y=9)) + geom_point(data = rdata.melt, aes(y = partial.resid), color = "grey") + geom_ribbon(aes(ymin = conf.low, ymax = conf.high), fill = "blue", alpha = 0.3) + scale_y_continuous("Abundance") + scale_x_continuous("") + facet_wrap(~Pred, scales = "free_x", strip.position = "bottom") + theme_classic() + theme(strip.background = element_blank(), strip.placement = "outside")
- Explore effect sizes - change in Abundance associated with a change equivalent to increasing from the 20th to 80th percentile of
each predictor holding the other predictors constant.
library(MCMCpack) loyn.mcmc = loyn.mcmcpack newdata = with(loyn, rbind(data.frame(cDIST = log10(quantile(DIST, p = c(0.2, 0.8))) - log10(mean.DIST), cLDIST = 0, cAREA = 0, cGRAZE = 0, cALT = 0, cYR.ISOL = 0), data.frame(cDIST = 0, cLDIST = log10(quantile(LDIST, p = c(0.2, 0.8))) - log10(mean.LDIST), cAREA = 0, cGRAZE = 0, cALT = 0, cYR.ISOL = 0), data.frame(cDIST = 0, cLDIST = 0, cAREA = log10(quantile(AREA, p = c(0.2, 0.8))) - log10(mean.AREA), cGRAZE = 0, cALT = 0, cYR.ISOL = 0), data.frame(cDIST = 0, cLDIST = 0, cAREA = 0, cGRAZE = quantile(GRAZE, p = c(0.2, 0.8)) - mean.GRAZE, cALT = 0, cYR.ISOL = 0), data.frame(cDIST = 0, cLDIST = 0, cAREA = 0, cGRAZE = 0, cALT = quantile(ALT, p = c(0.2, 0.8)) - mean.ALT, cYR.ISOL = 0), data.frame(cDIST = 0, cLDIST = 0, cAREA = 0, cGRAZE = 0, cALT = 0, cYR.ISOL = quantile(YR.ISOL, p = c(0.2, 0.8)) - mean.YR.ISOL))) Xmat = model.matrix(~cDIST + cLDIST + cAREA + cGRAZE + cALT + cYR.ISOL, newdata) coefs = loyn.mcmc[, c("(Intercept)", "cDIST", "cLDIST", "cAREA", "cGRAZE", "cALT", "cYR.ISOL")] fit = (coefs %*% t(Xmat)) s1 = seq(1, 12, b = 2) s2 = seq(2, 12, b = 2) ## Raw effect size RES = fit[, s2] - fit[, s1] colnames(RES) = c("DIST", "LDIST", "AREA", "GRAZE", "ALT", "YR.ISOL") mcmc_intervals(as.mcmc(RES))
(RES = tidyMCMC(as.mcmc(RES), conf.int = TRUE, conf.method = "HPDinterval"))
term estimate std.error conf.low conf.high 1 DIST -0.6653429 1.988964 -4.6824338 3.1376717 2 LDIST -0.6136897 2.158422 -4.8168520 3.6138394 3 AREA 9.6948155 1.944046 5.8300721 13.4774364 4 GRAZE -6.7121797 3.817873 -14.3005012 0.5568621 5 ALT 1.5696924 1.950196 -2.1571331 5.4669455 6 YR.ISOL 3.4543288 2.204452 -0.9953545 7.7192585
## Cohen's D cohenD = (fit[, s2] - fit[, s1])/sqrt(loyn.mcmc[, "sigma2"]) colnames(cohenD) = c("DIST", "LDIST", "AREA", "GRAZE", "ALT", "YR.ISOL") (cohenDES = tidyMCMC(as.mcmc(cohenD), conf.int = TRUE, conf.method = "HPDinterval"))
term estimate std.error conf.low conf.high 1 DIST -0.10369055 0.3047738 -0.7095491 0.47470403 2 LDIST -0.09515195 0.3312163 -0.7532847 0.53591205 3 AREA 1.50909802 0.3342992 0.8440978 2.14860590 4 GRAZE -1.04398623 0.5932122 -2.2274001 0.08168777 5 ALT 0.24466323 0.3007082 -0.3211053 0.85235582 6 YR.ISOL 0.53775976 0.3411211 -0.1620817 1.18748656
# Percentage change ESp = 100 * (fit[, s2] - fit[, s1])/fit[, s1] colnames(ESp) = c("DIST", "LDIST", "AREA", "GRAZE", "ALT", "YR.ISOL") mcmc_intervals(as.mcmc(ESp))
(PES = tidyMCMC(as.mcmc(ESp), conf.int = TRUE, conf.method = "HPDinterval"))
term estimate std.error conf.low conf.high 1 DIST -6.960431 17.227980 -36.350869 15.480668 2 LDIST -6.347939 16.321089 -35.412372 17.405606 3 AREA 43.964662 8.040105 28.102975 59.620387 4 GRAZE -28.207372 14.782815 -56.032146 0.454167 5 ALT 8.877497 10.924913 -11.728873 30.844127 6 YR.ISOL 21.144322 14.796056 -6.154603 51.715683
# Probability that the effect is greater than various percentages (p0 = apply(ESp, 2, function(x, f = 0) ifelse(mean(x) > 0, sum(x > f)/length(x), sum(-1 * x > f)/length(x))))
DIST LDIST AREA GRAZE ALT YR.ISOL 0.6321 0.6121 1.0000 0.9623 0.7931 0.9414
(p5 = apply(ESp, 2, function(x, f = 5) ifelse(mean(x) > 0, sum(x > f)/length(x), sum(-1 * x > f)/length(x))))
DIST LDIST AREA GRAZE ALT YR.ISOL 0.4529 0.4400 1.0000 0.9325 0.6244 0.8760
(p10 = apply(ESp, 2, function(x, f = 10) ifelse(mean(x) > 0, sum(x > f)/length(x), sum(-1 * x > f)/length(x))))
DIST LDIST AREA GRAZE ALT YR.ISOL 0.3096 0.3089 1.0000 0.8875 0.4376 0.7773
(p20 = apply(ESp, 2, function(x, f = 20) ifelse(mean(x) > 0, sum(x > f)/length(x), sum(-1 * x > f)/length(x))))
DIST LDIST AREA GRAZE ALT YR.ISOL 0.1451 0.1449 0.9972 0.7366 0.1511 0.4961
(p50 = apply(ESp, 2, function(x, f = 50) ifelse(mean(x) > 0, sum(x > f)/length(x), sum(-1 * x > f)/length(x))))
DIST LDIST AREA GRAZE ALT YR.ISOL 0.0187 0.0184 0.2223 0.0516 0.0008 0.0383
## fractional change FES = fit[, s2]/fit[, s1] colnames(FES) = c("DIST", "LDIST", "AREA", "GRAZE", "ALT", "YR.ISOL") (FES = tidyMCMC(as.mcmc(FES), conf.int = TRUE, conf.method = "HPDinterval"))
term estimate std.error conf.low conf.high 1 DIST 0.9303957 0.17227980 0.6364913 1.154807 2 LDIST 0.9365206 0.16321089 0.6458763 1.174056 3 AREA 1.4396466 0.08040105 1.2810298 1.596204 4 GRAZE 0.7179263 0.14782815 0.4396785 1.004542 5 ALT 1.0887750 0.10924913 0.8827113 1.308441 6 YR.ISOL 1.2114432 0.14796056 0.9384540 1.517157
- Explore finite-population standard deviations
library(MCMCpack) library(broom) loyn.mcmc = loyn.mcmcpack Xmat = model.matrix(~cDIST + cLDIST + cAREA + cGRAZE + cALT + cYR.ISOL, data = loyn) sd.DIST = abs(loyn.mcmc[, "cDIST"]) * sd(Xmat[, "cDIST"]) sd.LDIST = abs(loyn.mcmc[, "cLDIST"]) * sd(Xmat[, "cLDIST"]) sd.AREA = abs(loyn.mcmc[, "cAREA"]) * sd(Xmat[, "cAREA"]) sd.GRAZE = abs(loyn.mcmc[, "cGRAZE"]) * sd(Xmat[, "cGRAZE"]) sd.ALT = abs(loyn.mcmc[, "cALT"]) * sd(Xmat[, "cALT"]) sd.YR.ISOL = abs(loyn.mcmc[, "cYR.ISOL"]) * sd(Xmat[, "cYR.ISOL"]) sd.x = sd.DIST + sd.LDIST + sd.AREA + sd.GRAZE + sd.ALT + sd.YR.ISOL # generate a model matrix newdata = loyn Xmat = model.matrix(~cDIST + cLDIST + cAREA + cGRAZE + cALT + cYR.ISOL, newdata) ## get median parameter estimates coefs = loyn.mcmc[, c("(Intercept)", "cDIST", "cLDIST", "cAREA", "cGRAZE", "cALT", "cYR.ISOL")] fit = coefs %*% t(Xmat) resid = sweep(fit, 2, loyn$ABUND, "-") sd.resid = apply(resid, 1, sd) sd.all = cbind(sd.DIST, sd.LDIST, sd.AREA, sd.GRAZE, sd.ALT, sd.YR.ISOL, sd.resid) mcmc_intervals(sd.all)
(fpsd = tidyMCMC(sd.all, conf.int = TRUE, conf.method = "HPDinterval"))
term estimate std.error conf.low conf.high 1 sd.DIST 0.9516963 0.7273438 1.758422e-04 2.348979 2 sd.LDIST 1.0318149 0.7940909 3.821824e-04 2.560397 3 sd.AREA 6.0538065 1.2139350 3.640516e+00 8.415817 4 sd.GRAZE 2.5133971 1.3210367 1.124092e-03 4.760110 5 sd.ALT 1.1069382 0.7937846 1.267090e-06 2.609109 6 sd.YR.ISOL 1.9459572 1.0898622 2.102330e-04 3.873274 7 sd.resid 6.3977840 0.2256042 6.058299e+00 6.840074
# OR expressed as a percentage mcmc_intervals(100 * sd.all/rowSums(sd.all))
(fpsd.p = tidyMCMC(100 * sd.all/rowSums(sd.all), estimate.method = "median", conf.int = TRUE, conf.method = "HPDinterval"))
term estimate std.error conf.low conf.high 1 sd.DIST 4.040253 3.296537 0.003643070 10.88095 2 sd.LDIST 4.389845 3.575600 0.001932600 11.79132 3 sd.AREA 30.594181 5.619669 18.862557405 40.73962 4 sd.GRAZE 12.496277 6.574856 0.005641009 23.69286 5 sd.ALT 4.829754 3.866615 0.001394094 12.91534 6 sd.YR.ISOL 9.517571 5.286090 0.001119749 18.77406 7 sd.resid 32.019220 2.571167 27.477591094 37.42906
## we can even plot this as a Bayesian ANOVA table ggplot(fpsd, aes(y = estimate, x = term)) + geom_pointrange(aes(ymin = conf.low, ymax = conf.high)) + geom_text(aes(label = sprintf("%.2f%%", fpsd.p$estimate), vjust = -1)) + scale_y_continuous("Finite population standard deviation") + scale_x_discrete() + coord_flip() + theme_classic()
loyn.mcmc = loyn.r2jags$BUGSoutput$sims.matrix Xmat = model.matrix(~cDIST + cLDIST + cAREA + cGRAZE + cALT + cYR.ISOL, data = loyn) sd.DIST = abs(loyn.mcmc[, "beta[1]"]) * sd(Xmat[, "cDIST"]) sd.LDIST = abs(loyn.mcmc[, "beta[2]"]) * sd(Xmat[, "cLDIST"]) sd.AREA = abs(loyn.mcmc[, "beta[3]"]) * sd(Xmat[, "cAREA"]) sd.GRAZE = abs(loyn.mcmc[, "beta[4]"]) * sd(Xmat[, "cGRAZE"]) sd.ALT = abs(loyn.mcmc[, "beta[5]"]) * sd(Xmat[, "cALT"]) sd.YR.ISOL = abs(loyn.mcmc[, "beta[6]"]) * sd(Xmat[, "cYR.ISOL"]) sd.x = sd.DIST + sd.LDIST + sd.AREA + sd.GRAZE + sd.ALT + sd.YR.ISOL # generate a model matrix newdata = loyn Xmat = model.matrix(~cDIST + cLDIST + cAREA + cGRAZE + cALT + cYR.ISOL, newdata) ## get median parameter estimates coefs = loyn.mcmc[, c("beta0", "beta[1]", "beta[2]", "beta[3]", "beta[4]", "beta[5]", "beta[6]")] fit = coefs %*% t(Xmat) resid = sweep(fit, 2, loyn$ABUND, "-") sd.resid = apply(resid, 1, sd) sd.all = cbind(sd.DIST, sd.LDIST, sd.AREA, sd.GRAZE, sd.ALT, sd.YR.ISOL, sd.resid) mcmc_intervals(sd.all)
(fpsd = tidyMCMC(sd.all, conf.int = TRUE, conf.method = "HPDinterval"))
term estimate std.error conf.low conf.high 1 sd.DIST 0.9596003 0.7339694 6.403065e-07 2.370494 2 sd.LDIST 1.0462352 0.7945808 5.629702e-06 2.568776 3 sd.AREA 6.0685802 1.2261081 3.648984e+00 8.491435 4 sd.GRAZE 2.5039995 1.3252001 6.243016e-03 4.765300 5 sd.ALT 1.1249923 0.8074863 2.584672e-05 2.656941 6 sd.YR.ISOL 1.9570094 1.0987181 2.994709e-03 3.864309 7 sd.resid 6.4058753 0.2282916 6.057727e+00 6.852572
# OR expressed as a percentage mcmc_intervals(100 * sd.all/rowSums(sd.all))
(fpsd.p = tidyMCMC(100 * sd.all/rowSums(sd.all), estimate.method = "median", conf.int = TRUE, conf.method = "HPDinterval"))
term estimate std.error conf.low conf.high 1 sd.DIST 4.051554 3.319205 3.635510e-06 10.92768 2 sd.LDIST 4.451107 3.578625 3.084334e-05 11.83866 3 sd.AREA 30.556324 5.647921 1.910566e+01 41.03840 4 sd.GRAZE 12.324617 6.590258 2.584667e-02 23.67757 5 sd.ALT 4.916221 3.909358 8.387666e-04 12.93600 6 sd.YR.ISOL 9.535497 5.318853 2.178405e-02 18.74905 7 sd.resid 31.931299 2.560712 2.734591e+01 37.31862
## we can even plot this as a Bayesian ANOVA table ggplot(fpsd, aes(y = estimate, x = term)) + geom_pointrange(aes(ymin = conf.low, ymax = conf.high)) + geom_text(aes(label = sprintf("%.2f%%", fpsd.p$estimate), vjust = -1)) + scale_y_continuous("Finite population standard deviation") + scale_x_discrete() + coord_flip() + theme_classic()
loyn.mcmc = as.matrix(loyn.rstan) Xmat = model.matrix(~cDIST + cLDIST + cAREA + cGRAZE + cALT + cYR.ISOL, data = loyn) sd.DIST = abs(loyn.mcmc[, "beta[1]"]) * sd(Xmat[, "cDIST"]) sd.LDIST = abs(loyn.mcmc[, "beta[2]"]) * sd(Xmat[, "cLDIST"]) sd.AREA = abs(loyn.mcmc[, "beta[3]"]) * sd(Xmat[, "cAREA"]) sd.GRAZE = abs(loyn.mcmc[, "beta[4]"]) * sd(Xmat[, "cGRAZE"]) sd.ALT = abs(loyn.mcmc[, "beta[5]"]) * sd(Xmat[, "cALT"]) sd.YR.ISOL = abs(loyn.mcmc[, "beta[6]"]) * sd(Xmat[, "cYR.ISOL"]) sd.x = sd.DIST + sd.LDIST + sd.AREA + sd.GRAZE + sd.ALT + sd.YR.ISOL # generate a model matrix newdata = loyn Xmat = model.matrix(~cDIST + cLDIST + cAREA + cGRAZE + cALT + cYR.ISOL, newdata) ## get median parameter estimates coefs = loyn.mcmc[, c("beta0", "beta[1]", "beta[2]", "beta[3]", "beta[4]", "beta[5]", "beta[6]")] fit = coefs %*% t(Xmat) resid = sweep(fit, 2, loyn$ABUND, "-") sd.resid = apply(resid, 1, sd) sd.all = cbind(sd.DIST, sd.LDIST, sd.AREA, sd.GRAZE, sd.ALT, sd.YR.ISOL, sd.resid) mcmc_intervals(sd.all)
(fpsd = tidyMCMC(sd.all, conf.int = TRUE, conf.method = "HPDinterval"))
term estimate std.error conf.low conf.high 1 sd.DIST 0.9148496 0.6983213 8.785653e-04 2.256061 2 sd.LDIST 0.9907171 0.7587319 3.052998e-05 2.438256 3 sd.AREA 5.9587552 1.1909187 3.561265e+00 8.289984 4 sd.GRAZE 2.5103440 1.2965350 2.970500e-02 4.676791 5 sd.ALT 1.1171491 0.8069985 3.085684e-05 2.650259 6 sd.YR.ISOL 1.9678262 1.0938962 5.873276e-04 3.872703 7 sd.resid 6.3890485 0.2190489 6.054454e+00 6.823333
# OR expressed as a percentage mcmc_intervals(100 * sd.all/rowSums(sd.all))
(fpsd.p = tidyMCMC(100 * sd.all/rowSums(sd.all), estimate.method = "median", conf.int = TRUE, conf.method = "HPDinterval"))
term estimate std.error conf.low conf.high 1 sd.DIST 3.907519 3.212341 4.097890e-03 10.56099 2 sd.LDIST 4.273942 3.452200 1.731688e-04 11.44000 3 sd.AREA 30.236970 5.580443 1.926755e+01 40.74984 4 sd.GRAZE 12.657728 6.582046 7.654405e-02 23.72766 5 sd.ALT 4.918719 3.946802 1.728940e-04 13.04556 6 sd.YR.ISOL 9.710659 5.318334 8.498579e-03 18.89806 7 sd.resid 32.163828 2.610195 2.776468e+01 37.80383
## we can even plot this as a Bayesian ANOVA table ggplot(fpsd, aes(y = estimate, x = term)) + geom_pointrange(aes(ymin = conf.low, ymax = conf.high)) + geom_text(aes(label = sprintf("%.2f%%", fpsd.p$estimate), vjust = -1)) + scale_y_continuous("Finite population standard deviation") + scale_x_discrete() + coord_flip() + theme_classic()
loyn.mcmc = as.matrix(loyn.rstanarm) Xmat = model.matrix(~cDIST + cLDIST + cAREA + cGRAZE + cALT + cYR.ISOL, data = loyn) sd.DIST = abs(loyn.mcmc[, "cDIST"]) * sd(Xmat[, "cDIST"]) sd.LDIST = abs(loyn.mcmc[, "cLDIST"]) * sd(Xmat[, "cLDIST"]) sd.AREA = abs(loyn.mcmc[, "cAREA"]) * sd(Xmat[, "cAREA"]) sd.GRAZE = abs(loyn.mcmc[, "cGRAZE"]) * sd(Xmat[, "cGRAZE"]) sd.ALT = abs(loyn.mcmc[, "cALT"]) * sd(Xmat[, "cALT"]) sd.YR.ISOL = abs(loyn.mcmc[, "cYR.ISOL"]) * sd(Xmat[, "cYR.ISOL"]) sd.x = sd.DIST + sd.LDIST + sd.AREA + sd.GRAZE + sd.ALT + sd.YR.ISOL # generate a model matrix newdata = loyn Xmat = model.matrix(~cDIST + cLDIST + cAREA + cGRAZE + cALT + cYR.ISOL, newdata) ## get median parameter estimates coefs = loyn.mcmc[, c("(Intercept)", "cDIST", "cLDIST", "cAREA", "cGRAZE", "cALT", "cYR.ISOL")] fit = coefs %*% t(Xmat) resid = sweep(fit, 2, loyn$ABUND, "-") sd.resid = apply(resid, 1, sd) sd.all = cbind(sd.DIST, sd.LDIST, sd.AREA, sd.GRAZE, sd.ALT, sd.YR.ISOL, sd.resid) mcmc_intervals(sd.all)
(fpsd = tidyMCMC(sd.all, conf.int = TRUE, conf.method = "HPDinterval"))
term estimate std.error conf.low conf.high 1 sd.DIST 0.9571352 0.7232895 2.055808e-04 2.354083 2 sd.LDIST 1.0359987 0.7951729 8.080357e-05 2.614886 3 sd.AREA 6.0758702 1.2084527 3.670116e+00 8.373577 4 sd.GRAZE 2.4987310 1.3029521 2.308358e-02 4.733729 5 sd.ALT 1.1035495 0.7875110 5.217236e-04 2.587163 6 sd.YR.ISOL 1.9408230 1.0841879 5.716389e-04 3.812901 7 sd.resid 6.4001331 0.2267968 6.059646e+00 6.842554
# OR expressed as a percentage mcmc_intervals(100 * sd.all/rowSums(sd.all))
(fpsd.p = tidyMCMC(100 * sd.all/rowSums(sd.all), estimate.method = "median", conf.int = TRUE, conf.method = "HPDinterval"))
term estimate std.error conf.low conf.high 1 sd.DIST 4.069556 3.279479 1.849442e-03 10.96721 2 sd.LDIST 4.405925 3.579056 4.602495e-04 11.99038 3 sd.AREA 30.607918 5.599275 1.952106e+01 41.27510 4 sd.GRAZE 12.275375 6.481209 1.274328e-01 23.66851 5 sd.ALT 4.895469 3.812649 8.188199e-04 12.55481 6 sd.YR.ISOL 9.483868 5.262687 6.561997e-02 18.64995 7 sd.resid 31.999916 2.553375 2.742948e+01 37.42199
## we can even plot this as a Bayesian ANOVA table ggplot(fpsd, aes(y = estimate, x = term)) + geom_pointrange(aes(ymin = conf.low, ymax = conf.high)) + geom_text(aes(label = sprintf("%.2f%%", fpsd.p$estimate), vjust = -1)) + scale_y_continuous("Finite population standard deviation") + scale_x_discrete() + coord_flip() + theme_classic()
loyn.mcmc = as.matrix(loyn.brm) Xmat = model.matrix(~cDIST + cLDIST + cAREA + cGRAZE + cALT + cYR.ISOL, data = loyn) sd.DIST = abs(loyn.mcmc[, "b_cDIST"]) * sd(Xmat[, "cDIST"]) sd.LDIST = abs(loyn.mcmc[, "b_cLDIST"]) * sd(Xmat[, "cLDIST"]) sd.AREA = abs(loyn.mcmc[, "b_cAREA"]) * sd(Xmat[, "cAREA"]) sd.GRAZE = abs(loyn.mcmc[, "b_cGRAZE"]) * sd(Xmat[, "cGRAZE"]) sd.ALT = abs(loyn.mcmc[, "b_cALT"]) * sd(Xmat[, "cALT"]) sd.YR.ISOL = abs(loyn.mcmc[, "b_cYR.ISOL"]) * sd(Xmat[, "cYR.ISOL"]) sd.x = sd.DIST + sd.LDIST + sd.AREA + sd.GRAZE + sd.ALT + sd.YR.ISOL # generate a model matrix newdata = loyn Xmat = model.matrix(~cDIST + cLDIST + cAREA + cGRAZE + cALT + cYR.ISOL, newdata) ## get median parameter estimates coefs = loyn.mcmc[, c("b_Intercept", "b_cDIST", "b_cLDIST", "b_cAREA", "b_cGRAZE", "b_cALT", "b_cYR.ISOL")] fit = coefs %*% t(Xmat) resid = sweep(fit, 2, loyn$ABUND, "-") sd.resid = apply(resid, 1, sd) sd.all = cbind(sd.DIST, sd.LDIST, sd.AREA, sd.GRAZE, sd.ALT, sd.YR.ISOL, sd.resid) mcmc_intervals(sd.all)
(fpsd = tidyMCMC(sd.all, conf.int = TRUE, conf.method = "HPDinterval"))
term estimate std.error conf.low conf.high 1 sd.DIST 0.8991315 0.6805757 4.123211e-04 2.214328 2 sd.LDIST 0.9788072 0.7623775 2.009903e-04 2.452555 3 sd.AREA 5.9320899 1.1923183 3.614970e+00 8.242503 4 sd.GRAZE 2.5338467 1.3051533 2.385887e-03 4.772074 5 sd.ALT 1.1308774 0.8151568 3.568906e-05 2.674572 6 sd.YR.ISOL 1.9432391 1.0691568 3.139418e-04 3.786755 7 sd.resid 6.3850130 0.2137011 6.066638e+00 6.811799
# OR expressed as a percentage mcmc_intervals(100 * sd.all/rowSums(sd.all))
(fpsd.p = tidyMCMC(100 * sd.all/rowSums(sd.all), estimate.method = "median", conf.int = TRUE, conf.method = "HPDinterval"))
term estimate std.error conf.low conf.high 1 sd.DIST 3.876120 3.148269 4.860050e-03 10.31386 2 sd.LDIST 4.179393 3.472956 1.031670e-03 11.37728 3 sd.AREA 30.279843 5.607016 1.885392e+01 40.38044 4 sd.GRAZE 12.742637 6.568014 1.273447e-02 24.05760 5 sd.ALT 5.035547 4.001749 1.836134e-04 13.09358 6 sd.YR.ISOL 9.607205 5.239434 1.359604e-02 18.65769 7 sd.resid 32.247312 2.542148 2.769044e+01 37.58520
## we can even plot this as a Bayesian ANOVA table ggplot(fpsd, aes(y = estimate, x = term)) + geom_pointrange(aes(ymin = conf.low, ymax = conf.high)) + geom_text(aes(label = sprintf("%.2f%%", fpsd.p$estimate), vjust = -1)) + scale_y_continuous("Finite population standard deviation") + scale_x_discrete() + coord_flip() + theme_classic()
- Explore $R^2$
library(MCMCpack) library(broom) loyn.mcmc <- loyn.mcmcpack Xmat = model.matrix(~cDIST + cLDIST + cAREA + cGRAZE + cALT + cYR.ISOL, data = loyn) coefs = loyn.mcmc[, c("(Intercept)", "cDIST", "cLDIST", "cAREA", "cGRAZE", "cALT", "cYR.ISOL")] fit = coefs %*% t(Xmat) resid = sweep(fit, 2, loyn$ABUND, "-") var_f = apply(fit, 1, var) var_e = apply(resid, 1, var) R2 = var_f/(var_f + var_e) tidyMCMC(as.mcmc(R2), conf.int = TRUE, conf.method = "HPDinterval")
term estimate std.error conf.low conf.high 1 var1 0.665933 0.04516778 0.5803855 0.7428253
# for comparison with frequentist summary(lm(ABUND ~ log10(DIST) + log10(LDIST) + log10(AREA) + GRAZE + ALT + YR.ISOL, data = loyn))
Call: lm(formula = ABUND ~ log10(DIST) + log10(LDIST) + log10(AREA) + GRAZE + ALT + YR.ISOL, data = loyn) Residuals: Min 1Q Median 3Q Max -15.6506 -2.9390 0.5289 2.5353 15.2842 Coefficients: Estimate Std. Error t value Pr(>|t|) (Intercept) -125.69725 91.69228 -1.371 0.1767 log10(DIST) -0.90696 2.67572 -0.339 0.7361 log10(LDIST) -0.64842 2.12270 -0.305 0.7613 log10(AREA) 7.47023 1.46489 5.099 5.49e-06 *** GRAZE -1.66774 0.92993 -1.793 0.0791 . ALT 0.01951 0.02396 0.814 0.4195 YR.ISOL 0.07387 0.04520 1.634 0.1086 --- Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 Residual standard error: 6.384 on 49 degrees of freedom Multiple R-squared: 0.6849, Adjusted R-squared: 0.6464 F-statistic: 17.75 on 6 and 49 DF, p-value: 8.443e-11
loyn.mcmc <- loyn.r2jags$BUGSoutput$sims.matrix Xmat = model.matrix(~cDIST + cLDIST + cAREA + cGRAZE + cALT + cYR.ISOL, data = loyn) coefs = loyn.mcmc[, c("beta0", "beta[1]", "beta[2]", "beta[3]", "beta[4]", "beta[5]", "beta[6]")] fit = coefs %*% t(Xmat) resid = sweep(fit, 2, loyn$ABUND, "-") var_f = apply(fit, 1, var) var_e = apply(resid, 1, var) R2 = var_f/(var_f + var_e) tidyMCMC(as.mcmc(R2), conf.int = TRUE, conf.method = "HPDinterval")
term estimate std.error conf.low conf.high 1 var1 0.6659416 0.04498095 0.5786503 0.7411309
# for comparison with frequentist summary(lm(ABUND ~ log10(DIST) + log10(LDIST) + log10(AREA) + GRAZE + ALT + YR.ISOL, data = loyn))
Call: lm(formula = ABUND ~ log10(DIST) + log10(LDIST) + log10(AREA) + GRAZE + ALT + YR.ISOL, data = loyn) Residuals: Min 1Q Median 3Q Max -15.6506 -2.9390 0.5289 2.5353 15.2842 Coefficients: Estimate Std. Error t value Pr(>|t|) (Intercept) -125.69725 91.69228 -1.371 0.1767 log10(DIST) -0.90696 2.67572 -0.339 0.7361 log10(LDIST) -0.64842 2.12270 -0.305 0.7613 log10(AREA) 7.47023 1.46489 5.099 5.49e-06 *** GRAZE -1.66774 0.92993 -1.793 0.0791 . ALT 0.01951 0.02396 0.814 0.4195 YR.ISOL 0.07387 0.04520 1.634 0.1086 --- Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 Residual standard error: 6.384 on 49 degrees of freedom Multiple R-squared: 0.6849, Adjusted R-squared: 0.6464 F-statistic: 17.75 on 6 and 49 DF, p-value: 8.443e-11
loyn.mcmc <- as.matrix(loyn.rstan) Xmat = model.matrix(~cDIST + cLDIST + cAREA + cGRAZE + cALT + cYR.ISOL, data = loyn) coefs = loyn.mcmc[, c("beta0", "beta[1]", "beta[2]", "beta[3]", "beta[4]", "beta[5]", "beta[6]")] fit = coefs %*% t(Xmat) resid = sweep(fit, 2, loyn$ABUND, "-") var_f = apply(fit, 1, var) var_e = apply(resid, 1, var) R2 = var_f/(var_f + var_e) tidyMCMC(as.mcmc(R2), conf.int = TRUE, conf.method = "HPDinterval")
term estimate std.error conf.low conf.high 1 var1 0.6642038 0.04535031 0.575571 0.7392861
# for comparison with frequentist summary(lm(ABUND ~ log10(DIST) + log10(LDIST) + log10(AREA) + GRAZE + ALT + YR.ISOL, data = loyn))
Call: lm(formula = ABUND ~ log10(DIST) + log10(LDIST) + log10(AREA) + GRAZE + ALT + YR.ISOL, data = loyn) Residuals: Min 1Q Median 3Q Max -15.6506 -2.9390 0.5289 2.5353 15.2842 Coefficients: Estimate Std. Error t value Pr(>|t|) (Intercept) -125.69725 91.69228 -1.371 0.1767 log10(DIST) -0.90696 2.67572 -0.339 0.7361 log10(LDIST) -0.64842 2.12270 -0.305 0.7613 log10(AREA) 7.47023 1.46489 5.099 5.49e-06 *** GRAZE -1.66774 0.92993 -1.793 0.0791 . ALT 0.01951 0.02396 0.814 0.4195 YR.ISOL 0.07387 0.04520 1.634 0.1086 --- Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 Residual standard error: 6.384 on 49 degrees of freedom Multiple R-squared: 0.6849, Adjusted R-squared: 0.6464 F-statistic: 17.75 on 6 and 49 DF, p-value: 8.443e-11
loyn.mcmc <- as.matrix(loyn.rstanarm) Xmat = model.matrix(~cDIST + cLDIST + cAREA + cGRAZE + cALT + cYR.ISOL, data = loyn) coefs = loyn.mcmc[, c("(Intercept)", "cDIST", "cLDIST", "cAREA", "cGRAZE", "cALT", "cYR.ISOL")] fit = coefs %*% t(Xmat) resid = sweep(fit, 2, loyn$ABUND, "-") var_f = apply(fit, 1, var) var_e = apply(resid, 1, var) R2 = var_f/(var_f + var_e) tidyMCMC(as.mcmc(R2), conf.int = TRUE, conf.method = "HPDinterval")
term estimate std.error conf.low conf.high 1 var1 0.6657499 0.04470819 0.5755614 0.7379351
# for comparison with frequentist summary(lm(ABUND ~ log10(DIST) + log10(LDIST) + log10(AREA) + GRAZE + ALT + YR.ISOL, data = loyn))
Call: lm(formula = ABUND ~ log10(DIST) + log10(LDIST) + log10(AREA) + GRAZE + ALT + YR.ISOL, data = loyn) Residuals: Min 1Q Median 3Q Max -15.6506 -2.9390 0.5289 2.5353 15.2842 Coefficients: Estimate Std. Error t value Pr(>|t|) (Intercept) -125.69725 91.69228 -1.371 0.1767 log10(DIST) -0.90696 2.67572 -0.339 0.7361 log10(LDIST) -0.64842 2.12270 -0.305 0.7613 log10(AREA) 7.47023 1.46489 5.099 5.49e-06 *** GRAZE -1.66774 0.92993 -1.793 0.0791 . ALT 0.01951 0.02396 0.814 0.4195 YR.ISOL 0.07387 0.04520 1.634 0.1086 --- Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 Residual standard error: 6.384 on 49 degrees of freedom Multiple R-squared: 0.6849, Adjusted R-squared: 0.6464 F-statistic: 17.75 on 6 and 49 DF, p-value: 8.443e-11
loyn.mcmc <- as.matrix(loyn.brm) Xmat = model.matrix(~cDIST + cLDIST + cAREA + cGRAZE + cALT + cYR.ISOL, data = loyn) coefs = loyn.mcmc[, c("b_Intercept", "b_cDIST", "b_cLDIST", "b_cAREA", "b_cGRAZE", "b_cALT", "b_cYR.ISOL")] fit = coefs %*% t(Xmat) resid = sweep(fit, 2, loyn$ABUND, "-") var_f = apply(fit, 1, var) var_e = apply(resid, 1, var) R2 = var_f/(var_f + var_e) tidyMCMC(as.mcmc(R2), conf.int = TRUE, conf.method = "HPDinterval")
term estimate std.error conf.low conf.high 1 var1 0.6639469 0.04520351 0.5769229 0.7408735
# for comparison with frequentist summary(lm(ABUND ~ log10(DIST) + log10(LDIST) + log10(AREA) + GRAZE + ALT + YR.ISOL, data = loyn))
Call: lm(formula = ABUND ~ log10(DIST) + log10(LDIST) + log10(AREA) + GRAZE + ALT + YR.ISOL, data = loyn) Residuals: Min 1Q Median 3Q Max -15.6506 -2.9390 0.5289 2.5353 15.2842 Coefficients: Estimate Std. Error t value Pr(>|t|) (Intercept) -125.69725 91.69228 -1.371 0.1767 log10(DIST) -0.90696 2.67572 -0.339 0.7361 log10(LDIST) -0.64842 2.12270 -0.305 0.7613 log10(AREA) 7.47023 1.46489 5.099 5.49e-06 *** GRAZE -1.66774 0.92993 -1.793 0.0791 . ALT 0.01951 0.02396 0.814 0.4195 YR.ISOL 0.07387 0.04520 1.634 0.1086 --- Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 Residual standard error: 6.384 on 49 degrees of freedom Multiple R-squared: 0.6849, Adjusted R-squared: 0.6464 F-statistic: 17.75 on 6 and 49 DF, p-value: 8.443e-11
- We might expect that some of the predictors have no effect, so we could explore sparsity.
modelString=" data { int < lower =0 > n; # number of observations int < lower =0 > nX; # number of predictors vector [ n] Y; # outputs matrix [n ,nX] X; # inputs real < lower =0 > scale_icept ; # prior std for the intercept real < lower =0 > scale_global ; # scale for the half -t prior for tau real < lower =1 > nu_global ; # degrees of freedom for the half -t priors for tau real < lower =1 > nu_local ; # degrees of freedom for the half - t priors for lambdas real < lower =0 > slab_scale ; # slab scale for the regularized horseshoe real < lower =0 > slab_df ; # slab degrees of freedom for the regularized horseshoe } transformed data { matrix[n, nX - 1] Xc; // centered version of X vector[nX - 1] means_X; // column means of X before centering for (i in 2:nX) { means_X[i - 1] = mean(X[, i]); Xc[, i - 1] = X[, i] - means_X[i - 1]; } } parameters { real logsigma ; real cbeta0 ; vector [ nX-1] z; real < lower =0 > tau ; # global shrinkage parameter vector < lower =0 >[ nX-1] lambda ; # local shrinkage parameter real < lower =0 > caux ; } transformed parameters { real < lower =0 > sigma ; # noise std vector < lower =0 >[ nX-1] lambda_tilde ; # ’ truncated ’ local shrinkage parameter real < lower =0 > c; # slab scale vector [ nX-1] beta ; # regression coefficients vector [ n] mu; # latent function values sigma = exp ( logsigma ); c = slab_scale * sqrt ( caux ); lambda_tilde = sqrt ( c ^2 * square ( lambda ) ./ (c ^2 + tau ^2* square ( lambda )) ); beta = z .* lambda_tilde * tau ; mu = cbeta0 + Xc* beta ; } model { # half -t priors for lambdas and tau , and inverse - gamma for c ^2 z ~ normal (0 , 1); lambda ~ student_t ( nu_local , 0, 1); tau ~ student_t ( nu_global , 0 , scale_global * sigma ); caux ~ inv_gamma (0.5* slab_df , 0.5* slab_df ); cbeta0 ~ normal (0 , scale_icept ); Y ~ normal (mu , sigma ); } generated quantities { real beta0; // population-level intercept vector[n] log_lik; beta0 = cbeta0 - dot_product(means_X, beta); for (i in 1:n) { log_lik[i] = normal_lpdf(Y[i] | Xc[i] * beta + cbeta0, sigma); } }"
X = model.matrix(~cDIST + cLDIST + cAREA + cGRAZE + cALT + cYR.ISOL, data = loyn) loyn.list <- with(loyn, list(Y = ABUND, X = X, nX = ncol(X), n = nrow(loyn), scale_icept = 100, scale_global = 1, nu_global = 1, nu_local = 1, slab_scale = 2, slab_df = 4)) loyn.rstan.sparsity <- stan(data = loyn.list, model_code = modelString, chains = 3, iter = 5000, warmup = 2500, thin = 3, save_dso = TRUE)
SAMPLING FOR MODEL '00bfb1e363378528725b0dadb922f0fc' NOW (CHAIN 1). Gradient evaluation took 3.2e-05 seconds 1000 transitions using 10 leapfrog steps per transition would take 0.32 seconds. Adjust your expectations accordingly! Iteration: 1 / 5000 [ 0%] (Warmup) Iteration: 500 / 5000 [ 10%] (Warmup) Iteration: 1000 / 5000 [ 20%] (Warmup) Iteration: 1500 / 5000 [ 30%] (Warmup) Iteration: 2000 / 5000 [ 40%] (Warmup) Iteration: 2500 / 5000 [ 50%] (Warmup) Iteration: 2501 / 5000 [ 50%] (Sampling) Iteration: 3000 / 5000 [ 60%] (Sampling) Iteration: 3500 / 5000 [ 70%] (Sampling) Iteration: 4000 / 5000 [ 80%] (Sampling) Iteration: 4500 / 5000 [ 90%] (Sampling) Iteration: 5000 / 5000 [100%] (Sampling) Elapsed Time: 1.62127 seconds (Warm-up) 1.88662 seconds (Sampling) 3.50789 seconds (Total) SAMPLING FOR MODEL '00bfb1e363378528725b0dadb922f0fc' NOW (CHAIN 2). Gradient evaluation took 1.6e-05 seconds 1000 transitions using 10 leapfrog steps per transition would take 0.16 seconds. Adjust your expectations accordingly! Iteration: 1 / 5000 [ 0%] (Warmup) Iteration: 500 / 5000 [ 10%] (Warmup) Iteration: 1000 / 5000 [ 20%] (Warmup) Iteration: 1500 / 5000 [ 30%] (Warmup) Iteration: 2000 / 5000 [ 40%] (Warmup) Iteration: 2500 / 5000 [ 50%] (Warmup) Iteration: 2501 / 5000 [ 50%] (Sampling) Iteration: 3000 / 5000 [ 60%] (Sampling) Iteration: 3500 / 5000 [ 70%] (Sampling) Iteration: 4000 / 5000 [ 80%] (Sampling) Iteration: 4500 / 5000 [ 90%] (Sampling) Iteration: 5000 / 5000 [100%] (Sampling) Elapsed Time: 1.50273 seconds (Warm-up) 2.34926 seconds (Sampling) 3.85199 seconds (Total) SAMPLING FOR MODEL '00bfb1e363378528725b0dadb922f0fc' NOW (CHAIN 3). Gradient evaluation took 1.8e-05 seconds 1000 transitions using 10 leapfrog steps per transition would take 0.18 seconds. Adjust your expectations accordingly! Iteration: 1 / 5000 [ 0%] (Warmup) Iteration: 500 / 5000 [ 10%] (Warmup) Iteration: 1000 / 5000 [ 20%] (Warmup) Iteration: 1500 / 5000 [ 30%] (Warmup) Iteration: 2000 / 5000 [ 40%] (Warmup) Iteration: 2500 / 5000 [ 50%] (Warmup) Iteration: 2501 / 5000 [ 50%] (Sampling) Iteration: 3000 / 5000 [ 60%] (Sampling) Iteration: 3500 / 5000 [ 70%] (Sampling) Iteration: 4000 / 5000 [ 80%] (Sampling) Iteration: 4500 / 5000 [ 90%] (Sampling) Iteration: 5000 / 5000 [100%] (Sampling) Elapsed Time: 2.34557 seconds (Warm-up) 0.915766 seconds (Sampling) 3.26134 seconds (Total)
tidyMCMC(loyn.rstan.sparsity, pars = c("beta[1]", "beta[2]", "beta[3]", "beta[4]", "beta[5]", "beta[6]"), conf.int = TRUE, conf.type = "HPDinterval", rhat = TRUE, ess = TRUE)
term estimate std.error conf.low conf.high rhat ess 1 beta[1] -0.102450119 1.34718829 -3.26288544 2.53029434 1.003982 1076 2 beta[2] 0.005498916 1.13020409 -2.38645159 2.53589963 1.000971 1809 3 beta[3] 6.015072135 1.49222583 3.00689451 8.89352035 1.000410 912 4 beta[4] -1.627923126 0.98545534 -3.56946777 0.05998661 1.005908 248 5 beta[5] 0.026963370 0.02298719 -0.01398106 0.07309717 1.000758 1800 6 beta[6] 0.083831429 0.04710366 -0.00398308 0.17670472 1.003063 1396
library(bayesplot) mcmc_areas(as.matrix(loyn.rstan.sparsity), pars = c("beta[1]", "beta[2]", "beta[3]", "beta[4]", "beta[5]", "beta[6]"))
n = nrow(loyn) +\n X = 2 p0 = 1 global_scale = p0/(nX - p0)/sqrt(n) loyn.rstanarm.sparsity = stan_glm(ABUND ~ cDIST + cLDIST + cAREA + cGRAZE + cALT + cYR.ISOL, data = loyn, iter = 5000, warmup = 2500, chains = 3, thin = 2, refresh = 0, prior_intercept = normal(0, 100), prior = hs(df = 1, global_df = 1, global_scale = global_scale), prior_aux = cauchy(0, 2))
Gradient evaluation took 0.000103 seconds 1000 transitions using 10 leapfrog steps per transition would take 1.03 seconds. Adjust your expectations accordingly! Elapsed Time: 7.38882 seconds (Warm-up) 24.3185 seconds (Sampling) 31.7073 seconds (Total) Gradient evaluation took 1.8e-05 seconds 1000 transitions using 10 leapfrog steps per transition would take 0.18 seconds. Adjust your expectations accordingly! Elapsed Time: 8.66478 seconds (Warm-up) 5.28659 seconds (Sampling) 13.9514 seconds (Total) Gradient evaluation took 2.4e-05 seconds 1000 transitions using 10 leapfrog steps per transition would take 0.24 seconds. Adjust your expectations accordingly! Elapsed Time: 6.80974 seconds (Warm-up) 4.8305 seconds (Sampling) 11.6402 seconds (Total)
print(loyn.rstanarm.sparsity)
stan_glm family: gaussian [identity] formula: ABUND ~ cDIST + cLDIST + cAREA + cGRAZE + cALT + cYR.ISOL ------ Estimates: Median MAD_SD (Intercept) 19.6 0.9 cDIST 0.0 0.3 cLDIST 0.0 0.3 cAREA 7.4 1.4 cGRAZE -0.7 1.0 cALT 0.0 0.0 cYR.ISOL 0.1 0.0 sigma 6.4 0.6 Sample avg. posterior predictive distribution of y (X = xbar): Median MAD_SD mean_PPD 19.6 1.2 ------ For info on the priors used see help('prior_summary.stanreg').
tidyMCMC(loyn.rstanarm.sparsity$stanfit, conf.int = TRUE, conf.method = "HPDinterval", rhat = TRUE, ess = TRUE)
term estimate std.error conf.low conf.high rhat ess 1 (Intercept) 19.53351641 0.86432405 1.788227e+01 21.26985950 1.0001766 3542 2 cDIST -0.13838979 0.95322116 -2.455377e+00 1.73041549 0.9997868 3271 3 cLDIST -0.13903438 0.82075933 -2.276093e+00 1.38848409 1.0005991 3360 4 cAREA 7.40356992 1.40493339 4.760695e+00 10.27762748 0.9999890 3140 5 cGRAZE -0.90631856 0.93853532 -2.858119e+00 0.38289343 0.9993849 2692 6 cALT 0.02593069 0.02211978 -1.382799e-02 0.06813137 1.0000430 3138 7 cYR.ISOL 0.09427198 0.04620134 -4.486496e-04 0.17767418 0.9996034 3331 8 sigma 6.48209753 0.64324825 5.333988e+00 7.81244883 1.0006806 3274 9 mean_PPD 19.55320518 1.21915903 1.717871e+01 21.98696666 0.9996938 3578 10 log-posterior -232.08622059 4.07046967 -2.399648e+02 -224.28703690 1.0007980 1745
library(bayesplot) mcmc_areas(as.matrix(loyn.rstanarm.sparsity), regex_par = "^c")
n = nrow(loyn) +\n X = 2 p0 = 1 global_scale = p0/(nX - p0)/sqrt(n) loyn.brms.sparsity = brm(ABUND ~ cDIST + cLDIST + cAREA + cGRAZE + cALT + cYR.ISOL, data = loyn, iter = 2000, warmup = 200, chains = 3, thin = 2, refresh = 0, prior = c(prior(normal(0, 100), class = "Intercept"), prior(horseshoe(df = 1, par_ratio = par_ratio), class = "b"), prior(cauchy(0, 5), class = "sigma")))
Gradient evaluation took 3.4e-05 seconds 1000 transitions using 10 leapfrog steps per transition would take 0.34 seconds. Adjust your expectations accordingly! Elapsed Time: 0.220903 seconds (Warm-up) 0.853136 seconds (Sampling) 1.07404 seconds (Total) Gradient evaluation took 2.4e-05 seconds 1000 transitions using 10 leapfrog steps per transition would take 0.24 seconds. Adjust your expectations accordingly! Elapsed Time: 0.302515 seconds (Warm-up) 1.02843 seconds (Sampling) 1.33094 seconds (Total) Gradient evaluation took 2.4e-05 seconds 1000 transitions using 10 leapfrog steps per transition would take 0.24 seconds. Adjust your expectations accordingly! Elapsed Time: 0.236384 seconds (Warm-up) 1.42325 seconds (Sampling) 1.65963 seconds (Total)
print(loyn.brms.sparsity)
Family: gaussian(identity) Formula: ABUND ~ cDIST + cLDIST + cAREA + cGRAZE + cALT + cYR.ISOL Data: loyn (Number of observations: 56) Samples: 3 chains, each with iter = 2000; warmup = 200; thin = 2; total post-warmup samples = 2700 ICs: LOO = NA; WAIC = NA; R2 = NA Population-Level Effects: Estimate Est.Error l-95% CI u-95% CI Eff.Sample Rhat Intercept 19.52 0.82 17.88 21.19 1754 1.00 cDIST -0.07 0.96 -2.50 1.92 2079 1.00 cLDIST -0.05 0.83 -1.87 1.77 2173 1.00 cAREA 6.45 1.47 3.52 9.21 1560 1.01 cGRAZE -1.28 1.03 -3.31 0.18 315 1.01 cALT 0.03 0.02 -0.01 0.07 2094 1.00 cYR.ISOL 0.09 0.05 0.00 0.18 1197 1.00 Family Specific Parameters: Estimate Est.Error l-95% CI u-95% CI Eff.Sample Rhat sigma 6.41 0.66 5.3 7.85 506 1 Samples were drawn using sampling(NUTS). For each parameter, Eff.Sample is a crude measure of effective sample size, and Rhat is the potential scale reduction factor on split chains (at convergence, Rhat = 1).
tidyMCMC(loyn.brms.sparsity$fit, conf.int = TRUE, conf.method = "HPDinterval", rhat = TRUE, ess = TRUE)
term estimate std.error conf.low conf.high rhat ess 1 b_Intercept 19.52226633 0.82160697 17.981431835 21.26754566 1.0017875 1754 2 b_cDIST -0.07125559 0.96185771 -2.513628278 1.90341363 0.9995230 2079 3 b_cLDIST -0.04643137 0.82792388 -1.978043084 1.63693893 0.9994019 2173 4 b_cAREA 6.45028569 1.46750185 3.505470412 9.20117108 1.0059270 1560 5 b_cGRAZE -1.28136463 1.03167979 -3.220460816 0.22999487 1.0056027 315 6 b_cALT 0.02799159 0.02226846 -0.011726251 0.07226876 0.9994930 2094 7 b_cYR.ISOL 0.08941109 0.04761456 -0.002998981 0.17472440 1.0024133 1197 8 sigma 6.40947392 0.66387866 5.187831810 7.67092283 1.0017656 506 9 hs_c2 5.54353010 8.79605100 0.372670519 17.86665059 1.0025060 1249
library(bayesplot) mcmc_areas(as.matrix(loyn.brms.sparsity), regex_par = "^b_c")
Abund (bird abundance) seems reasonably normal, however, the same cannot be said for AREA DIST and LDIST Try applying temporary logarithmic (base 10) transformations to these variables (HINT). Does this improve some of these specific assumptions (y or n)?
Despite the apparent correlation between DIST and LDIST, this does not appear to manifest into a statistical issue.