Bayesian tutorial: Correlation
The third of a series of tutorial posts on Bayesian analyses. In this post I focus on using brms to model a correlation.
In my previous blog post, I showed how to use brms and tidybayes to run a simple regression, i.e., a regression with a single predictor. That analysis required us to set three priors: an intercept prior, a sigma prior, and a slope prior. We can simplify the analysis by turning it into a correlational analysis. This removes the intercept prior and lets us think about the prior for the slope as a standardized effect size, i.e., the correlation.
To run a correlational analysis we’ll need to standardize the outcome and predictor variables, so in the code below I run the setup code as usual and also standardize the two variables that we’ll be correlating.
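Here’s a minimal sketch of that setup, assuming the usual libraries and a data frame called `data` with `height` and `weight` columns:

```r
library(tidyverse)
library(brms)
library(tidybayes)

# Standardize both variables we'll be correlating
data <- data %>%
  mutate(
    height_z = (height - mean(height)) / sd(height),
    weight_z = (weight - mean(weight)) / sd(weight)
  )
```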
The formula for our model is slightly different from the formula of the previous single-predictor model because we can omit the intercept. By standardizing both the outcome and predictor variables, the intercept is guaranteed to be 0: the regression line always passes through the mean of the predictor and the mean of the outcome, both means are 0 because of the standardization, and the intercept is the value the outcome takes when the predictor is 0. We could still include a prior for the intercept and fix it to 0 (using `constant(0)`), but we can also simply tell brms not to estimate it. The formula syntax then becomes: `height_z ~ 0 + weight_z`.
Let’s confirm that this means we only need to set two priors.
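We can check with `get_prior()`, which lists the priors (and their defaults) the model would use; a call along these lines produces the table below:

```r
# Inspect which priors the model expects, given the formula and data
get_prior(height_z ~ 0 + weight_z, data = data)
```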
| prior | class | coef | group | resp | dpar | nlpar | lb | ub | source |
|---|---|---|---|---|---|---|---|---|---|
| | b | | | | | | | | default |
| | b | weight_z | | | | | | | default |
| student_t(3, 0, 2.5) | sigma | | | | | | 0 | | default |
Indeed, we’re left with a prior for $\sigma$ and one for `weight_z`, which we can specify either via class `b` or via the specific coefficient for `weight_z`.
Let’s also write down our model more explicitly, which is the same as the single-predictor regression but without the intercept ($\alpha$):

$$
\text{height}_i \sim \text{Normal}(\mu_i, \sigma) \\
\mu_i = \beta x_i
$$
Setting the priors
Let’s start with the prior for the slope ($\beta$). A correlation takes a value that ranges from -1 to 1. If you know absolutely nothing about what kind of correlation to expect, you could set a uniform prior that assigns equal probability to every value from -1 to 1. Alternatively, we could use a prior that describes a belief that no correlation is most likely, but with some probability that higher correlations are possible too. This could be done with a normal distribution centered around 0. In the case of this particular model, in which height is regressed onto weight, we can probably expect a sizeable positive correlation. So let’s use a skewed normal distribution that puts most of the probability on a positive correlation but is wide enough to allow for a range of correlations, including a negative one. brms has the `skew_normal()` function to specify a prior that’s a skewed normal distribution. I fiddled around with the numbers a bit and the distribution below is sort of what makes sense to me.
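Something like the following sketch shows the shape of such a prior; the exact numbers here are stand-ins for whatever the fiddling produced:

```r
# Hypothetical numbers for the skew-normal prior: skew_normal(mu, sigma, alpha),
# where a positive alpha skews the probability mass to the right
curve(
  brms::dskew_normal(x, mu = 0.3, sigma = 0.35, alpha = 3),
  from = -1, to = 1,
  xlab = "Correlation", ylab = "Density"
)
```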
What should the prior for $\sigma$ be? With the variables standardized, $\sigma$ is limited to range from 0 to 1. If the predictor explains all the variance of the outcome variable, the residuals will be 0, meaning $\sigma$ will be 0. If the predictor explains no variance, $\sigma$ is equal to 1 because it will be similar to the standard deviation of the outcome variable, which is 1 because we’ve standardized it. Interestingly, this also means that the prior for $\sigma$ is completely dependent on the prior for the slope, because the slope is what determines how much variance is explained in the outcome variable. I don’t know exactly how to deal with this dependency, except to fear it and make sure to carefully inspect the output so that we don’t have any problems due to incompatible priors. One way to avoid it entirely is to use a uniform prior that assigns equal plausibility to each value between 0 and 1, so let’s do that.
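Putting both together, the prior specification could look like this (the skew-normal numbers remain the stand-ins from above):

```r
# A right-skewed prior on the slope (the correlation) and a
# uniform prior on sigma between 0 and 1
priors <- c(
  prior(skew_normal(0.3, 0.35, 3), class = b, coef = weight_z),
  prior(uniform(0, 1), class = sigma, lb = 0, ub = 1)
)
```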
Running the model
With the priors ready, we can run the model.
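A sketch of the call, using the priors defined above; `sample_prior = TRUE` also stores draws from the priors, which we’ll use for plotting later:

```r
# Fit the correlation model
fit <- brm(
  height_z ~ 0 + weight_z,
  data = data,
  family = gaussian(),
  prior = priors,
  sample_prior = TRUE
)
summary(fit)
```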
Family: gaussian
Links: mu = identity; sigma = identity
Formula: height_z ~ 0 + weight_z
Data: data (Number of observations: 352)
Draws: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
total post-warmup draws = 4000
Regression Coefficients:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
weight_z 0.74 0.03 0.68 0.81 1.00 3242 2432
Further Distributional Parameters:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sigma 0.66 0.03 0.61 0.71 1.00 3189 2683
Draws were sampled using sample(hmc). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).
The output shows that the estimate for the slope, i.e., the correlation, is 0.74. This is just one number though. Let’s visualize the entire distribution, including the prior.
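One way to do that, assuming the model was fit with `sample_prior = TRUE` (the exact names of the prior draws can be checked with `variables(fit)`):

```r
# Plot posterior and prior draws of the slope side by side
as_draws_df(fit) %>%
  select(starts_with("b_"), starts_with("prior_b")) %>%
  pivot_longer(everything(), names_to = "parameter", values_to = "value") %>%
  ggplot(aes(x = value, y = parameter)) +
  stat_halfeye()
```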
It looks like we can update towards a higher correlation and also be more certain about it because the range of the posterior is much narrower than that of our prior.
What about sigma? We saw that the correlation between the predictor and outcome is 0.74. Squaring this number gives us the amount of variance explained (0.55), so if we subtract this from 1 we’re left with the variance that is unexplained (0.45). Taking the square root of this number to bring it back to a standard deviation gives us 0.67, which matches the estimate for sigma that we saw in the output of brms.
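In code, the arithmetic is:

```r
r <- 0.74
r^2            # variance explained: ~0.55
1 - r^2        # unexplained variance: ~0.45
sqrt(1 - r^2)  # back to a standard deviation: ~0.67
```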
Using a regularizing prior
In the previous section we used a personal and, hopefully, at least somewhat informed prior. What would happen if we instead used a generic weakly informative prior? This is a prior centered at 0 with a standard deviation of 1.
It’s a very broad prior. Does the fact that it’s centered at 0 push the final estimate closer to a null effect? Let’s see by running the model.
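A sketch of the refit; whether sigma keeps its uniform prior is my assumption:

```r
# Refit with a generic weakly informative prior on the slope
fit_generic <- brm(
  height_z ~ 0 + weight_z,
  data = data,
  family = gaussian(),
  prior = c(
    prior(normal(0, 1), class = b),
    prior(uniform(0, 1), class = sigma, lb = 0, ub = 1)
  )
)
summary(fit_generic)
```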
Family: gaussian
Links: mu = identity; sigma = identity
Formula: height_z ~ 0 + weight_z
Data: data (Number of observations: 352)
Draws: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
total post-warmup draws = 4000
Regression Coefficients:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
weight_z 0.75 0.04 0.68 0.82 1.00 2699 2338
Further Distributional Parameters:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sigma 0.66 0.03 0.61 0.71 1.00 2980 2453
Draws were sampled using sample(hmc). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).
The previous estimate of the correlation was 0.74 and now it’s 0.75. Apparently the prior did not influence the final estimate much. This hopefully alleviates some worries about priors always having a strong impact on the final results, and it also shows that you don’t always need to carefully construct a prior. Of course, in certain cases the prior will have a strong influence, for example when the prior is very strong or when there isn’t much data. The prior we used here was broad enough that it didn’t exert a strong influence on the final estimates.
Multiple correlations
What if you want to test multiple correlations? There are two ways to do this, as far as I know. The first is to simply run separate models, each testing a single correlation. The second is by creating a model that tests multiple correlations at once.
One-by-one solution
Running multiple models to test each correlation is a bit of a chore, but it’s made easier with the `update()` function in brms, which saves you from having to write as much code. In the code below I standardize `age` in addition to the two columns we already standardized and run three models in total, correlating `height` with `weight`, `height` with `age`, and `age` with `weight`. I’ll use the same generic prior from the last model and repeat the full code for that model, followed by two updates.
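A sketch of what that could look like (the names of the fit objects are mine):

```r
# Standardize age as well
data <- data %>%
  mutate(age_z = (age - mean(age)) / sd(age))

priors <- c(
  prior(normal(0, 1), class = b),
  prior(uniform(0, 1), class = sigma, lb = 0, ub = 1)
)

# Fit the first correlation in full, then reuse it for the other two pairs
fit_height_weight <- brm(
  height_z ~ 0 + weight_z,
  data = data,
  family = gaussian(),
  prior = priors
)

fit_height_age <- update(fit_height_weight, formula. = height_z ~ 0 + age_z, newdata = data)
fit_weight_age <- update(fit_height_weight, formula. = weight_z ~ 0 + age_z, newdata = data)
```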
Initially, the model correlating height with age produced a warning about divergent transitions. brms produces a helpful warning message with a link to more information about what exactly this means (in short, it means the sampler thinks it’s off in estimating the posterior). The message suggests increasing `adapt_delta` above 0.8, so I adjusted the code to set `adapt_delta` to 0.9. Re-running the model got rid of the warning messages.
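With `update()`, that’s a one-liner:

```r
# Re-run the problematic model with a higher adapt_delta, per the warning
fit_height_age <- update(fit_height_age, control = list(adapt_delta = 0.9))
```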
The correlations are in the table below (the `Estimate` column).
Pair | Estimate | Est.Error | Q2.5 | Q97.5 |
---|---|---|---|---|
height - weight | 0.7545178 | 0.0353659 | 0.6869373 | 0.8250272 |
height - age | -0.1027416 | 0.0520592 | -0.2060662 | 0.0006455 |
weight - age | -0.1722622 | 0.0513209 | -0.2722212 | -0.0697330 |
The graph below shows their entire posterior distributions.
Simultaneous solution
The simultaneous solution is trickier but thankfully there’s a very helpful blog post by Solomon Kurz to explain it, so I’ll mostly just focus on running the code here and showing the result.
Initially this approach put me off because I did not understand the prior, but then I realized we could simply sample the prior as well and visualize it to see what it looks like.
Modelling multiple correlations at once requires specifying the formula using multivariate syntax. You can take a look at the code below to see what this syntax looks like. Additionally, we need to append `set_rescor(TRUE)` to the formula to tell brms to calculate residual correlations, which will actually be the correlations we’re interested in.
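A sketch of the model, reconstructed from the output below; the `lkj(1)` prior on the residual correlation matrix is my assumption:

```r
# mvbind() bundles the three outcomes, `~ 0` drops the intercepts, and
# `sigma ~ 0` fixes each residual standard deviation to 1 (sigma uses a
# log link and exp(0) = 1), which the standardization justifies
fit_mv <- brm(
  bf(mvbind(height_z, weight_z, age_z) ~ 0, sigma ~ 0) + set_rescor(TRUE),
  data = data,
  prior = prior(lkj(1), class = rescor),
  sample_prior = TRUE
)
summary(fit_mv)
```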
Family: MV(gaussian, gaussian, gaussian)
Links: mu = identity; sigma = log
mu = identity; sigma = log
mu = identity; sigma = log
Formula: height_z ~ 0
sigma ~ 0
weight_z ~ 0
sigma ~ 0
age_z ~ 0
sigma ~ 0
Data: data (Number of observations: 352)
Draws: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
total post-warmup draws = 4000
Residual Correlations:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS
rescor(heightz,weightz) 0.75 0.02 0.71 0.78 1.00 3947
rescor(heightz,agez) -0.10 0.05 -0.20 0.00 1.00 3851
rescor(weightz,agez) -0.17 0.05 -0.26 -0.07 1.00 3997
Tail_ESS
rescor(heightz,weightz) 3079
rescor(heightz,agez) 3014
rescor(weightz,agez) 3153
Draws were sampled using sample(hmc). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).
The model output shows the correlations in the Residual Correlations section. You’ll see that the estimates match the ones we found when running the correlations one-by-one. The 95% CIs also largely match, with some small differences (for more comparisons, see the previously linked blog post by Solomon Kurz).
Let’s also plot the posteriors, including their prior.
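One possible way to do so, again relying on `sample_prior = TRUE` (the exact names of the prior draws can be checked with `variables(fit_mv)`; I assume they’re prefixed with `prior_rescor`):

```r
# Plot the residual-correlation posteriors together with their prior draws
as_draws_df(fit_mv) %>%
  select(starts_with("rescor"), starts_with("prior_rescor")) %>%
  pivot_longer(everything(), names_to = "parameter", values_to = "value") %>%
  ggplot(aes(x = value, y = parameter)) +
  stat_halfeye()
```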
It looks like we used a relatively wide prior centered around 0. That’s good to know because I had no idea what the `lkj()` prior was doing. Other than that, the results look similar to what we found previously.
Summary
Running a correlation in brms is the same as running a simple regression, except that the outcome and predictor are standardized. Because of the standardization, the intercept can be omitted, thus simplifying the model. The priors are also easier to set because the correlation must range from -1 to 1 and sigma from 0 to 1. You can also run multiple correlations by running separate models or by modelling them all at once using brms’ multivariate syntax.