BDA3 Chapter 5 Exercise 15

Here’s my solution to exercise 15, chapter 5, of Gelman’s Bayesian Data Analysis (BDA), 3rd edition. There are solutions to some of the exercises on the book’s webpage.

\(\DeclareMathOperator{\dbinomial}{Binomial} \DeclareMathOperator{\dbern}{Bernoulli} \DeclareMathOperator{\dpois}{Poisson} \DeclareMathOperator{\dnorm}{Normal} \DeclareMathOperator{\dt}{t} \DeclareMathOperator{\dcauchy}{Cauchy} \DeclareMathOperator{\dexponential}{Exp} \DeclareMathOperator{\duniform}{Uniform} \DeclareMathOperator{\dgamma}{Gamma} \DeclareMathOperator{\dinvgamma}{InvGamma} \DeclareMathOperator{\invlogit}{InvLogit} \DeclareMathOperator{\logit}{Logit} \DeclareMathOperator{\ddirichlet}{Dirichlet} \DeclareMathOperator{\dbeta}{Beta}\)

The data are provided in an awkward format, so I’ve made minor modifications to the downloaded file to make it easier to parse.

library(readr)  # read_delim
library(dplyr)  # %>%, transmute, select, and friends

df <- read_delim(
    'data/chapter_05_exercise_15_table_5.4.txt', 
    delim = ' ',
    skip = 4,
    trim_ws = TRUE
  ) %>% 
  transmute(
    study,
    # empirical log odds ratio for death, treated vs. control
    y = log(treated.deaths / (treated.total - treated.deaths)) - log(control.deaths / (control.total - control.deaths)),
    # approximate sampling variance of the log odds ratio
    sigma2 = (1 / treated.deaths) + (1 / (treated.total - treated.deaths)) + (1 / control.deaths) + (1 / (control.total - control.deaths)),
    sigma = sqrt(sigma2),
    n = treated.total + control.total
  ) %>% 
  select(-sigma2)
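
For each study \(j\), \(y\) is the empirical log odds ratio and \(\sigma\) its approximate standard error. Writing \(y_{1j}\) for the deaths out of \(n_{1j}\) patients in the treated group and \(y_{0j}\) out of \(n_{0j}\) in the control group, the code above computes

\[
y_j = \log \frac{y_{1j}}{n_{1j} - y_{1j}} - \log \frac{y_{0j}}{n_{0j} - y_{0j}},
\qquad
\sigma_j^2 = \frac{1}{y_{1j}} + \frac{1}{n_{1j} - y_{1j}} + \frac{1}{y_{0j}} + \frac{1}{n_{0j} - y_{0j}}.
\]
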
The meta-analysis data from table 5.4, page 124.
study y sigma n
1 0.0281709 0.8503034 77
2 -0.7410032 0.4831516 230
3 -0.5406212 0.5645611 162
4 -0.2461281 0.1381833 3053
5 0.0694534 0.2806564 720
6 -0.5841569 0.6757127 111
7 -0.5123855 0.1386878 1884
8 -0.0786233 0.2039910 1103
9 -0.4241734 0.2739730 560
10 -0.3348234 0.1170683 3837
11 -0.2133975 0.1948720 1456
12 -0.0389084 0.2294606 529
13 -0.5932537 0.4251674 584
14 0.2815459 0.2054455 1741
15 -0.3213336 0.2977091 301
16 -0.1353479 0.2609219 420
17 0.1406065 0.3641742 373
18 0.3220497 0.5526449 305
19 0.4443805 0.7166491 308
20 -0.2175097 0.2598417 427
21 -0.5910760 0.2572069 755
22 -0.6080991 0.2723787 1354

We’ll use the model described in the book. Note that by not explicitly giving priors for \(\mu\) and \(\tau\), Stan gives them improper uniform priors (with \(\tau\) constrained to be positive).
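
Written out (using standard deviations, to match Stan’s parameterisation), the model is

\[
y_j \mid \theta_j \sim \dnorm(\theta_j, \sigma_j),
\qquad
\theta_j \mid \mu, \tau \sim \dnorm(\mu, \tau),
\qquad
p(\mu, \tau) \propto 1.
\]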

model <- rstan::stan_model('src/ex_05_15.stan')
S4 class stanmodel 'ex_05_15' coded as follows:
data {
  int<lower = 1> n;
  int<lower = 1> study[n];
  vector[n] y;
  vector<lower = 0>[n] sigma;
}

parameters {
  real mu;
  real<lower = 0> tau;
  vector[n] theta;
}

model {
  theta ~ normal(mu, tau);
  y ~ normal(theta, sigma);
} 

Let’s fit the model.

set.seed(57197)

fit <- model %>% 
  sampling(
    data = tidybayes::compose_data(df),
    warmup = 1000,
    iter = 5000
  )

We’ll draw the posterior population parameters separately from the study parameters purely for convenience.

pop_params <- fit %>% 
  tidybayes::spread_draws(mu, tau)

draws <- fit %>% 
  tidybayes::spread_draws(mu, tau, theta[study])

The population standard deviation \(\tau\) has most of its mass below 0.4.

A histogram of the posterior draws for τ.
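
The plotting code isn’t shown in this post; here’s a minimal sketch of how such a histogram could be drawn, assuming ggplot2 and an arbitrary bin width.

library(ggplot2)

pop_params %>% 
  ggplot(aes(x = tau)) +
  # histogram of the posterior draws for tau
  geom_histogram(binwidth = 0.02) +
  labs(x = expression(tau), y = 'number of draws')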

As in figure 5.6, the effect estimates are almost identical for \(\tau \approx 0\) and spread out as \(\tau\) increases.

Conditional posterior means of treatment effects E(θ | τ, y), as functions of the between-study standard deviation τ.
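
The curves in that figure aren’t produced by the code in this post, but they can be computed directly from the conditional posterior formulas for the hierarchical normal model rather than from the posterior draws. Here’s a sketch, assuming tidyr is available and ggplot2 is loaded as above; the grid of \(\tau\) values is arbitrary.

# For each tau on a grid:
#   mu_hat      = precision-weighted mean of the crude estimates y_j,
#                 with weights 1 / (sigma_j^2 + tau^2)
#   theta_hat_j = precision-weighted average of y_j and mu_hat
conditional_means <- tidyr::crossing(df, tau_grid = seq(0.01, 0.6, by = 0.01)) %>% 
  group_by(tau_grid) %>% 
  mutate(
    mu_hat = sum(y / (sigma^2 + tau_grid^2)) / sum(1 / (sigma^2 + tau_grid^2)),
    theta_hat = (y / sigma^2 + mu_hat / tau_grid^2) / (1 / sigma^2 + 1 / tau_grid^2)
  ) %>% 
  ungroup()

conditional_means %>% 
  ggplot(aes(x = tau_grid, y = theta_hat, group = study)) +
  geom_line() +
  labs(x = expression(tau), y = 'conditional posterior mean of theta')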

Let’s get the median effect estimates with 95% posterior intervals. The estimates from the small studies (dark dots) are pulled in close to the population mean \(\mu\), whereas those from the larger studies (light dots) stay nearer their unpooled (crude) estimates (dotted line).

cis <- draws %>% 
  tidybayes::median_qi() %>% 
  inner_join(df, by = 'study')
Comparison of the crude effect estimates with the posterior median effects.
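
Here’s a sketch of how such a comparison could be plotted from cis with ggplot2, shading the points by (log) sample size; the original figure may have been drawn differently.

cis %>% 
  ggplot(aes(x = y, y = theta, colour = log(n))) +
  # the dotted identity line marks the unpooled (crude) estimates
  geom_abline(intercept = 0, slope = 1, linetype = 'dotted') +
  geom_pointrange(aes(ymin = theta.lower, ymax = theta.upper)) +
  labs(x = 'crude estimate', y = 'posterior median and 95% interval')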

To estimate the effect in a new study, we draw \(\theta \sim \dnorm(\mu, \tau)\) for each posterior draw of \((\mu, \tau)\).

new_theta <- pop_params %>% 
  # one simulated new-study effect per posterior draw of (mu, tau)
  mutate(theta = rnorm(n(), mu, tau)) 

prob_new_theta_positive <- new_theta %>% 
  summarise(sum(theta > 0) / n()) %>% 
  pull() 

This simulated effect has a 7.66% probability of being positive.

Posterior simulations of a new treatment effect.
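
This last figure can be sketched in the same way, with a reference line at zero to mark the positive region.

new_theta %>% 
  ggplot(aes(x = theta)) +
  geom_histogram(binwidth = 0.05) +
  geom_vline(xintercept = 0, linetype = 'dashed') +
  labs(x = 'simulated new-study effect', y = 'number of draws')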