BDA3 Chapter 5 Exercise 15

Here’s my solution to exercise 15, chapter 5, of Gelman’s Bayesian Data Analysis (BDA), 3rd edition. There are solutions to some of the exercises on the book’s webpage.

\(\DeclareMathOperator{\dbinomial}{Binomial} \DeclareMathOperator{\dbern}{Bernoulli} \DeclareMathOperator{\dpois}{Poisson} \DeclareMathOperator{\dnorm}{Normal} \DeclareMathOperator{\dt}{t} \DeclareMathOperator{\dcauchy}{Cauchy} \DeclareMathOperator{\dexponential}{Exp} \DeclareMathOperator{\duniform}{Uniform} \DeclareMathOperator{\dgamma}{Gamma} \DeclareMathOperator{\dinvgamma}{InvGamma} \DeclareMathOperator{\invlogit}{InvLogit} \DeclareMathOperator{\logit}{Logit} \DeclareMathOperator{\ddirichlet}{Dirichlet} \DeclareMathOperator{\dbeta}{Beta}\)

The data are provided in an awkward format, so I’ve made minor modifications to the downloaded file to make it easier to parse.

library(readr)  # read_delim
library(dplyr)  # %>%, transmute, select, and friends

df <- read_delim(
    'data/chapter_05_exercise_15_table_5.4.txt', 
    delim = ' ',
    skip = 4,
    trim_ws = TRUE
  ) %>% 
  transmute(
    study,
    # empirical log odds ratio for death, treated vs. control
    y = log(treated.deaths / (treated.total - treated.deaths)) - log(control.deaths / (control.total - control.deaths)),
    # approximate sampling variance of the log odds ratio
    sigma2 = (1 / treated.deaths) + (1 / (treated.total - treated.deaths)) + (1 / control.deaths) + (1 / (control.total - control.deaths)),
    sigma = sqrt(sigma2),
    n = treated.total + control.total
  ) %>% 
  select(-sigma2)
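
For each study \(j\), \(y\) is the empirical log odds ratio and \(\sigma\) its approximate standard error. Writing \(y_{1j}\) for the deaths out of \(n_{1j}\) patients in the treated group and \(y_{0j}\) out of \(n_{0j}\) in the control group, the code above computes

\[
y_j = \log \frac{y_{1j}}{n_{1j} - y_{1j}} - \log \frac{y_{0j}}{n_{0j} - y_{0j}},
\qquad
\sigma_j^2 = \frac{1}{y_{1j}} + \frac{1}{n_{1j} - y_{1j}} + \frac{1}{y_{0j}} + \frac{1}{n_{0j} - y_{0j}}.
\]
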
The meta-analysis data from table 5.4, page 124.
study y sigma n
1 0.0281709 0.8503034 77
2 -0.7410032 0.4831516 230
3 -0.5406212 0.5645611 162
4 -0.2461281 0.1381833 3053
5 0.0694534 0.2806564 720
6 -0.5841569 0.6757127 111
7 -0.5123855 0.1386878 1884
8 -0.0786233 0.2039910 1103
9 -0.4241734 0.2739730 560
10 -0.3348234 0.1170683 3837
11 -0.2133975 0.1948720 1456
12 -0.0389084 0.2294606 529
13 -0.5932537 0.4251674 584
14 0.2815459 0.2054455 1741
15 -0.3213336 0.2977091 301
16 -0.1353479 0.2609219 420
17 0.1406065 0.3641742 373
18 0.3220497 0.5526449 305
19 0.4443805 0.7166491 308
20 -0.2175097 0.2598417 427
21 -0.5910760 0.2572069 755
22 -0.6080991 0.2723787 1354

We’ll use the model described in the book. Note that by not explicitly giving priors for \(\mu\) and \(\tau\), Stan gives them improper uniform priors (with \(\tau\) constrained to be positive).
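
Written out (using standard deviations, to match Stan’s parameterisation), the model is

\[
y_j \mid \theta_j \sim \dnorm(\theta_j, \sigma_j),
\qquad
\theta_j \mid \mu, \tau \sim \dnorm(\mu, \tau),
\qquad
p(\mu, \tau) \propto 1.
\]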

model <- rstan::stan_model('src/ex_05_15.stan')
S4 class stanmodel 'ex_05_15' coded as follows:
data {
  int<lower = 1> n;
  int<lower = 1> study[n];
  vector[n] y;
  vector<lower = 0>[n] sigma;
}

parameters {
  real mu;
  real<lower = 0> tau;
  vector[n] theta;
}

model {
  theta ~ normal(mu, tau);
  y ~ normal(theta, sigma);
} 

Let’s fit the model.

set.seed(57197)

fit <- model %>% 
  sampling(
    data = tidybayes::compose_data(df),
    warmup = 1000,
    iter = 5000
  )

We’ll draw the posterior population parameters separately from the study parameters purely for convenience.

pop_params <- fit %>% 
  tidybayes::spread_draws(mu, tau)

draws <- fit %>% 
  tidybayes::spread_draws(mu, tau, theta[study])

The population standard deviation \(\tau\) has most of its mass below 0.4.

A histogram of the posterior draws for τ.
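
The plotting code isn’t shown in this post; here’s a minimal sketch of how such a histogram could be drawn, assuming ggplot2 and an arbitrary bin width.

library(ggplot2)

pop_params %>% 
  ggplot(aes(x = tau)) +
  # histogram of the posterior draws for tau
  geom_histogram(binwidth = 0.02) +
  labs(x = expression(tau), y = 'number of draws')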

As in figure 5.6, the effect estimates are almost identical for \(\tau \approx 0\) and spread out as \(\tau\) increases.

Conditional posterior means of treatment effects E(θ | τ, y), as functions of the between-study standard deviation τ.
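
The curves in that figure aren’t produced by the code in this post, but they can be computed directly from the conditional posterior formulas for the hierarchical normal model rather than from the posterior draws. Here’s a sketch, assuming tidyr is available and ggplot2 is loaded as above; the grid of \(\tau\) values is arbitrary.

# For each tau on a grid:
#   mu_hat      = precision-weighted mean of the crude estimates y_j,
#                 with weights 1 / (sigma_j^2 + tau^2)
#   theta_hat_j = precision-weighted average of y_j and mu_hat
conditional_means <- tidyr::crossing(df, tau_grid = seq(0.01, 0.6, by = 0.01)) %>% 
  group_by(tau_grid) %>% 
  mutate(
    mu_hat = sum(y / (sigma^2 + tau_grid^2)) / sum(1 / (sigma^2 + tau_grid^2)),
    theta_hat = (y / sigma^2 + mu_hat / tau_grid^2) / (1 / sigma^2 + 1 / tau_grid^2)
  ) %>% 
  ungroup()

conditional_means %>% 
  ggplot(aes(x = tau_grid, y = theta_hat, group = study)) +
  geom_line() +
  labs(x = expression(tau), y = 'conditional posterior mean of theta')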

Let’s get the median effect estimates with 95% posterior intervals. The estimates from the small studies (dark dots) are pulled in close to the population mean \(\mu\), whereas those from the larger studies (light dots) stay nearer their unpooled (crude) estimates (dotted line).

cis <- draws %>% 
  tidybayes::median_qi() %>% 
  inner_join(df, by = 'study')
Comparison of the crude effect estimates with the posterior median effects.
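
Here’s a sketch of how such a comparison could be plotted from cis with ggplot2, shading the points by (log) sample size; the original figure may have been drawn differently.

cis %>% 
  ggplot(aes(x = y, y = theta, colour = log(n))) +
  # the dotted identity line marks the unpooled (crude) estimates
  geom_abline(intercept = 0, slope = 1, linetype = 'dotted') +
  geom_pointrange(aes(ymin = theta.lower, ymax = theta.upper)) +
  labs(x = 'crude estimate', y = 'posterior median and 95% interval')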

To estimate the effect in a new study, we draw \(\theta \sim \dnorm(\mu, \tau)\) for each posterior draw of \((\mu, \tau)\).

new_theta <- pop_params %>% 
  # one simulated new-study effect per posterior draw of (mu, tau)
  mutate(theta = rnorm(n(), mu, tau)) 

prob_new_theta_positive <- new_theta %>% 
  summarise(sum(theta > 0) / n()) %>% 
  pull() 

This simulated effect has a 7.66% probability of being positive.

Posterior simulations of a new treatment effect.
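
This last figure can be sketched in the same way, with a reference line at zero to mark the positive region.

new_theta %>% 
  ggplot(aes(x = theta)) +
  geom_histogram(binwidth = 0.05) +
  geom_vline(xintercept = 0, linetype = 'dashed') +
  labs(x = 'simulated new-study effect', y = 'number of draws')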