Bayesian tutorial: Correlation

statistics
tutorial
Bayesian statistics
regression
The third of a series of tutorial posts on Bayesian analyses. In this post I focus on using brms to model a correlation.
Published

February 12, 2023

In my previous blog post, I showed how to use brms and tidybayes to run a simple regression, i.e., a regression with a single predictor. That analysis required us to set three priors: an intercept prior, a sigma prior, and a slope prior. We can simplify this analysis by turning it into a correlational analysis. This removes the intercept prior and lets us think about the prior for the slope as a standardized effect size, i.e., the correlation.

To run a correlational analysis we’ll need to standardize the outcome and predictor variables, so in the code below I run the setup code as usual and also standardize both variables.

Code
library(tidyverse)
library(brms)
library(tidybayes)

data <- read_csv("Howell1.csv") |>
  filter(age >= 18) |>
  mutate(
    height_z = (height - mean(height)) / sd(height),
    weight_z = (weight - mean(weight)) / sd(weight)
  )

theme_set(theme_minimal())

colors <- c("#93CFDB", "#1E466E")

The formula for our model is slightly different from the formula of the previous single-predictor model because we can omit the intercept. By standardizing both the outcome and the predictor, the intercept is guaranteed to be 0: the regression line always passes through the point given by the means of the predictor and the outcome, both of which are 0 after standardization, and the intercept is the value the outcome takes when the predictor is 0. We could still include a prior for the intercept and fix it to 0 (using constant(0)), but we can also simply tell brms not to estimate it. The formula syntax then becomes: height_z ~ 0 + weight_z.
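As a quick sanity check (an ordinary least-squares fit, not part of the Bayesian analysis), we can confirm that the intercept of the standardized regression is numerically zero:

Code
# With both variables standardized, the intercept is (numerically) 0
lm(height_z ~ weight_z, data = data) |>
  coef()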

Let’s confirm that this means we only need to set two priors.

Code
get_prior(height_z ~ 0 + weight_z, data = data)
                prior class     coef group resp dpar nlpar lb ub  source
                          b                                       default
                          b weight_z                              default
 student_t(3, 0, 2.5) sigma                                  0    default

Indeed, we’re left with a prior for \(\sigma\) and one for the slope of weight_z, which we can specify either via class b (covering all population-level effects) or via the specific coefficient weight_z.
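To make the two options concrete, here’s what both versions look like, using a normal(0, 1) placeholder distribution rather than the prior we’ll actually pick below:

Code
# Option 1: a prior on all population-level effects (class b)
prior(normal(0, 1), class = "b")

# Option 2: a prior on the weight_z coefficient specifically
prior(normal(0, 1), class = "b", coef = "weight_z")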

Let’s also write down our model more explicitly, which is the same as the single-predictor regression but without the intercept (\(\alpha\)). \[\displaylines{height_i \sim Normal(\mu_i, \sigma) \\ \mu_i = \beta x_i}\]

Setting the priors

Let’s start with the prior for the slope (\(\beta\)). A correlation ranges from -1 to 1. If you know absolutely nothing about what kind of correlation to expect, you could set a uniform prior that assigns equal probability to every value from -1 to 1. Alternatively, we could use a prior that describes a belief that no correlation is most likely, but with some probability that stronger correlations are possible too. This could be done with a normal distribution centered on 0. In the case of this particular model, in which height is regressed onto weight, we can probably expect a sizeable positive correlation. So let’s use a skewed normal distribution that puts most of the probability on a positive correlation but is wide enough to allow for a range of correlations, including a negative one. brms has the skew_normal() function to specify a prior that’s a skewed normal distribution. I fiddled around with the numbers a bit and the distribution below is sort of what makes sense to me.

Code
prior <- tibble(r = seq(-1, 1, .01)) %>%
  mutate(
    prob = dskew_normal(r, xi = .7, omega = .4, alpha = -3)
  )

ggplot(prior, aes(x = r, y = prob)) + 
  geom_line() +
  labs(x = "Slope", y = "")

Prior distribution for the correlation
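Because we’ll truncate this prior at -1 and 1 when fitting the model (via the lb and ub arguments), it’s worth checking how much of the untruncated distribution already falls inside that range. A quick Monte Carlo check (the number of draws is arbitrary):

Code
# Proportion of prior draws that fall within [-1, 1]
prior_draws <- rskew_normal(1e5, xi = .7, omega = .4, alpha = -3)
mean(prior_draws >= -1 & prior_draws <= 1)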

What should the prior for \(\sigma\) be? With the variables standardized, \(\sigma\) is limited to range from 0 to 1. If the predictor explains all the variance of the outcome variable, the residuals will be 0, meaning \(\sigma\) will be 0. If the predictor explains no variance, \(\sigma\) will be equal to 1 because it will match the standard deviation of the outcome variable, which is 1 because we’ve standardized it. Interestingly, this also means that the prior for \(\sigma\) depends on the prior for the slope, because the slope determines how much variance in the outcome is explained. I don’t know exactly how to deal with this dependency, except to fear it and carefully inspect the output so we don’t run into problems due to incompatible priors. One way to avoid the issue entirely is to use a uniform prior that assigns equal plausibility to each value between 0 and 1, so let’s do that.
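To make the dependency concrete: with both variables standardized, a slope (correlation) of \(r\) implies a residual standard deviation of \(\sqrt{1 - r^2}\). A few example values:

Code
# Implied residual SD for a few hypothetical correlations
r <- c(0, .25, .5, .75, 1)
sqrt(1 - r^2)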

Running the model

With the priors ready, we can run the model.

Code
model <- brm(
  height_z ~ 0 + weight_z,  
  data = data, 
  family = gaussian,
  prior = c(
      prior(uniform(0, 1), class = "sigma", ub = 1),
      prior(
        skew_normal(.7, .4, -3), 
        class = "b", lb = -1, ub = 1
      )
    ), 
  sample_prior = TRUE,
  cores = 4,
  seed = 4,
  file = "models/model.rds"
)

model
 Family: gaussian 
  Links: mu = identity; sigma = identity 
Formula: height_z ~ 0 + weight_z 
   Data: data (Number of observations: 352) 
  Draws: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
         total post-warmup draws = 4000

Population-Level Effects: 
         Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
weight_z     0.75      0.03     0.68     0.81 1.00     2816     1902

Family Specific Parameters: 
      Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sigma     0.66      0.02     0.61     0.71 1.00     2939     2107

Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).

The output shows that the estimate for the slope, i.e., the correlation, is 0.75. This is just one number though. Let’s visualize the entire distribution, including the prior.

Code
draws <- model %>%
  gather_draws(prior_b, b_weight_z) %>%
  ungroup() %>%
  mutate(
    distribution = if_else(
      str_detect(.variable, "prior"), "prior", "posterior"
    ),
    distribution = fct_relevel(distribution, "prior")
  )

ggplot(draws, aes(x = .value, fill = distribution)) +
  geom_histogram(position = "identity", alpha = .85) +
  labs(x = "Correlation", y = "", fill = "Distribution") +
  scale_fill_manual(values = colors)

It looks like we can update towards a higher correlation and also be more certain about it because the range of the posterior is much narrower than that of our prior.
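Besides visualizing the draws, we can summarize the posterior numerically with tidybayes, for example with the median and a 95% quantile interval:

Code
model %>%
  spread_draws(b_weight_z) %>%
  median_qi(b_weight_z)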

What about sigma? Sigma is the standard deviation of the residuals, i.e., the part of the outcome that the predictor leaves unexplained. We saw that the correlation between the predictor and outcome is 0.75. Squaring this number gives us the proportion of variance explained (0.56), so subtracting it from 1 leaves the proportion of variance unexplained (0.44). Taking the square root of this number to bring it back to a standard deviation gives us 0.66, which matches the estimate for sigma that we saw in the output of brms.
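The arithmetic, spelled out:

Code
r <- 0.75
r^2            # variance explained: 0.5625
1 - r^2        # variance unexplained: 0.4375
sqrt(1 - r^2)  # back to a standard deviation: 0.66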

Using a regularizing prior

In the previous section we used a personal and, to some degree at least, informed prior. What would happen if we instead used a generic weakly informative prior? This is a prior centered at 0 with a standard deviation of 1.

Code
prior <- tibble(r = seq(-1, 1, .01)) %>%
  mutate(prob = dnorm(r, mean = 0, sd = 1))

ggplot(prior, aes(x = r, y = prob)) + 
  geom_line() +
  labs(x = "Slope", y = "")

Generic weakly informative prior for the correlation

It’s a very broad prior, ranging from -1 to 1. It is, however, centered at 0 so wouldn’t that push the final estimate closer to a null effect? Let’s see by running the model.

Code
model_generic_prior <- brm(
  height_z ~ 0 + weight_z,  
  data = data, 
  family = gaussian,
  prior = c(
      prior(uniform(0, 1), class = "sigma", ub = 1),
      prior(
        normal(0, 1), 
        class = "b", lb = -1, ub = 1
      )
    ), 
  sample_prior = TRUE,
  cores = 4,
  seed = 4,
  file = "models/model_generic_prior_z.rds"
)

model_generic_prior
 Family: gaussian 
  Links: mu = identity; sigma = identity 
Formula: height_z ~ 0 + weight_z 
   Data: data (Number of observations: 352) 
  Draws: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
         total post-warmup draws = 4000

Population-Level Effects: 
         Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
weight_z     0.76      0.04     0.69     0.82 1.00     2952     2120

Family Specific Parameters: 
      Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sigma     0.66      0.03     0.61     0.71 1.00     2780     2000

Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).

The previous estimate of the correlation was 0.75 and now it’s 0.76. Apparently the prior barely influenced the final estimate. This hopefully alleviates some worries about priors always having a strong impact on the final results, and it also shows you don’t always need to carefully construct a prior. Of course, in some cases the prior will have a strong influence, for example when the prior is very strong or when there isn’t much data.
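We can put the two estimates side by side to see just how little changed:

Code
# Compare the slope estimate under the informed and generic priors
rbind(
  informed = fixef(model)["weight_z", ],
  generic  = fixef(model_generic_prior)["weight_z", ]
)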

Multiple correlations

What if you want to test multiple correlations? There are two ways to do this, as far as I know. The first is to simply run separate models, each testing a single correlation. The second is by creating a model that tests multiple correlations at once. The latter is possible but it’s more complicated and I don’t fully understand it yet. If you want to see how it is done, please see this blog post by Solomon Kurz.
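For reference, below is a minimal sketch of what that multivariate approach could look like, based on my reading of that post: every variable becomes an outcome and the correlations are estimated as residual correlations. I haven’t run this here, and the lkj(2) prior is an arbitrary, mildly regularizing choice for the correlation matrix.

Code
# Sketch: estimate all correlations at once as residual correlations
# (age_z is created here too; the next code block does the same)
data <- mutate(data, age_z = (age - mean(age)) / sd(age))

model_mv <- brm(
  bf(mvbind(height_z, weight_z, age_z) ~ 1) + set_rescor(TRUE),
  data = data,
  family = gaussian,
  prior = prior(lkj(2), class = "rescor"),
  cores = 4,
  seed = 4
)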

I’ll instead focus on simply running multiple models, but to make it easier I’ll show how to use the update() function in brms so we at least don’t have to write as much code. In the code below I standardize age and run three models in total, correlating height with weight, height with age, and age with weight. I’ll use the same generic prior from the last model and repeat the full code for that model, followed by two updates.

Code
data <- mutate(data, age_z = (age - mean(age)) / sd(age))

r_height_weight <- brm(
  height_z ~ 0 + weight_z,  
  data = data, 
  family = gaussian,
  prior = c(
      prior(uniform(0, 1), class = "sigma", ub = 1),
      prior(normal(0, 1), class = "b", lb = -1, ub = 1)
    ), 
  cores = 4,
  seed = 4,
  file = "models/r_height_weight.rds",
  control = list(adapt_delta = 0.9)
)

r_height_age <- update(
  r_height_weight, 
  height_z ~ 0 + age_z, 
  newdata = data,
  file = "models/r_height_age.rds"
)
r_weight_age <- update(
  r_height_weight, 
  weight_z ~ 0 + age_z, 
  newdata = data,
  file = "models/r_weight_age.rds"
)

When I first ran the code above, I received a warning that there were divergent transitions after warmup for the two extra correlations. That means the posterior may not have been sampled correctly, which is a problem. The solution is to increase the adapt_delta value, which raises the sampler’s target acceptance rate. This makes the underlying sampler take smaller, more conservative steps, improving the accuracy with which the posterior is approximated but potentially slowing down sampling.
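Because update() forwards additional arguments to brm(), you could also raise adapt_delta for a single problematic model instead of for all of them, for example:

Code
# Re-run one model with a more conservative sampler
r_weight_age <- update(
  r_weight_age,
  control = list(adapt_delta = 0.99)
)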

The correlations are in the table below.

Code
bind_rows(
  as_tibble(fixef(r_height_weight)),
  as_tibble(fixef(r_height_age)),
  as_tibble(fixef(r_weight_age))
) %>%
  mutate(
    pair = c("height - weight", "height - age", "weight - age"),
    .before = Estimate
  ) %>%
  select(-Est.Error) %>%
  rename(r = Estimate)
The three correlations and their 95% credible intervals.
pair                      r       Q2.5      Q97.5
height - weight   0.7536855  0.6853923  0.8219682
height - age     -0.1026678 -0.2071868 -0.0045065
weight - age     -0.1729065 -0.2760893 -0.0726302

And the graph below shows their entire posterior distributions.

Code
correlation_draws <- bind_rows(
  r_height_weight %>%
    spread_draws(b_weight_z) %>%
    rename(r = b_weight_z) %>%
    mutate(pair = "height - weight"),
  r_height_age %>%
    spread_draws(b_age_z) %>%
    rename(r = b_age_z) %>%
    mutate(pair = "height - age"),
  r_weight_age %>%
    spread_draws(b_age_z) %>%
    rename(r = b_age_z) %>%
    mutate(pair = "weight - age")
)

ggplot(correlation_draws, aes(x = r, y = pair)) +
  stat_halfeye() +
  labs(x = "Correlation", y = "")

The posterior distributions of the three correlations

Summary

Running a correlation in brms is the same as running a simple regression, except that the outcome and predictor are standardized. Because of the standardization, the intercept can be omitted, which simplifies the model. The priors are also easier to set, as the prior for the correlation ranges from -1 to 1 and the prior for sigma from 0 to 1. You can use regularizing priors to let the data do most of the talking.