BDA3 Chapter 3 Exercise 11

Here’s my solution to exercise 11, chapter 3, of Gelman’s Bayesian Data Analysis (BDA), 3rd edition. There are solutions to some of the exercises on the book’s webpage.

\(\DeclareMathOperator{\dbinomial}{Binomial} \DeclareMathOperator{\dbern}{Bernoulli} \DeclareMathOperator{\dpois}{Poisson} \DeclareMathOperator{\dnorm}{Normal} \DeclareMathOperator{\dt}{t} \DeclareMathOperator{\dcauchy}{Cauchy} \DeclareMathOperator{\dexponential}{Exp} \DeclareMathOperator{\duniform}{Uniform} \DeclareMathOperator{\dgamma}{Gamma} \DeclareMathOperator{\dinvgamma}{InvGamma} \DeclareMathOperator{\invlogit}{InvLogit} \DeclareMathOperator{\logit}{Logit} \DeclareMathOperator{\ddirichlet}{Dirichlet} \DeclareMathOperator{\dbeta}{Beta}\)

We'll reanalyse the bioassay data given in section 3.7, this time using informative priors.

library(tidyverse)  # read_csv(), dplyr verbs, ggplot2
library(rstanarm)   # stan_glm() and the prior helpers
library(scales)     # percent()

df <- read_csv('data/chapter_03_exercise_11.csv')

  dose_log_g_ml animals deaths
          -0.86       5      0
          -0.30       5      1
          -0.05       5      3
           0.73       5      5
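
Before fitting anything, it's worth eyeballing the raw mortality rates. A quick exploratory sketch (the labels and styling are my own, not from the book):

df %>% 
  mutate(death_rate = deaths / animals) %>% 
  ggplot() +
  aes(dose_log_g_ml, death_rate) +
  geom_point(size = 3) +
  labs(
    x = 'Log dose (g/ml)',
    y = 'Observed death rate',
    title = 'Observed mortality by dose'
  )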

Here is the model specification.

\[ \begin{align} y_i \mid \theta_i &\sim \dbinomial(n_i, \theta_i) \\ \logit(\theta_i) &= \alpha + \beta x_i \\ \alpha &\sim \dnorm(0, 2^2) \\ \beta &\sim \dnorm(10, 10^2) \end{align} \]
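
For intuition, the unnormalised log posterior implied by this specification can be written out directly. Here's a minimal sketch (log_post is an illustrative helper, not used anywhere below; it assumes df as loaded above):

log_post <- function(alpha, beta, df) {
  # Likelihood: independent binomials with a logit-linear death probability
  theta <- plogis(alpha + beta * df$dose_log_g_ml)  # plogis = inverse logit
  log_lik <- sum(dbinom(df$deaths, df$animals, theta, log = TRUE))
  # Priors on the standard-deviation scale, matching the spec above
  log_prior <- dnorm(alpha, mean = 0, sd = 2, log = TRUE) +
    dnorm(beta, mean = 10, sd = 10, log = TRUE)
  log_lik + log_prior
}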

We won't bother with a grid approximation to the posterior; fitting the model with Stan (via rstanarm) is a lot simpler.

m <- rstanarm::stan_glm(
  cbind(deaths, animals - deaths) ~ 1 + dose_log_g_ml,
  family = binomial(link = logit),
  data = df,
  prior_intercept = normal(0, 2),  # alpha
  prior = normal(10, 10),          # beta
  warmup = 500,
  iter = 4000  # 4 chains by default: 4 * (4000 - 500) = 14,000 draws
)

summary(m)
Model Info:

 function:     stan_glm
 family:       binomial [logit]
 formula:      cbind(deaths, animals - deaths) ~ 1 + dose_log_g_ml
 algorithm:    sampling
 priors:       see help('prior_summary')
 sample:       14000 (posterior sample size)
 observations: 4
 predictors:   2

Estimates:
                mean   sd   2.5%   25%   50%   75%   97.5%
(Intercept)    1.3    1.0 -0.5    0.6   1.2   1.9   3.4   
dose_log_g_ml 11.0    5.0  3.5    7.3  10.3  13.9  22.5   
mean_PPD       2.3    0.4  1.5    2.0   2.2   2.5   3.2   
log-posterior -5.6    1.1 -8.4   -6.0  -5.2  -4.8  -4.5   

Diagnostics:
              mcse Rhat n_eff
(Intercept)   0.0  1.0  6103 
dose_log_g_ml 0.1  1.0  4788 
mean_PPD      0.0  1.0  9575 
log-posterior 0.0  1.0  3826 

For each parameter, mcse is Monte Carlo standard error, n_eff is a crude measure of effective sample size, and Rhat is the potential scale reduction factor on split chains (at convergence Rhat=1).
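
One thing worth verifying: depending on the version and settings, rstanarm can rescale priors automatically (the autoscale option), so it's prudent to confirm that the priors actually used match the specification above:

prior_summary(m)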

The tidybayes package offers convenient functions for extracting posterior draws. We'll also add in our LD50 estimate: the LD50 is the dose at which the probability of death is 50%, i.e. the solution of \(\alpha + \beta x = 0\), which is \(x = -\alpha / \beta\).

draws <- m %>% 
  tidybayes::spread_draws(`(Intercept)`, dose_log_g_ml) %>% 
  rename(
    alpha = `(Intercept)`,
    beta = dose_log_g_ml
  ) %>% 
  mutate(LD50 = -alpha / beta)

The estimates look much the same under these more informative priors as under the uninformative priors of the book's own analysis in section 3.7. The posterior probability that \(\beta > 0\) is:

draws %>% 
  mutate(positive = beta > 0) %>% 
  summarise(mean(positive)) %>% 
  pull() %>% 
  percent()
[1] "100%"
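
For a numerical summary of the LD50 draws (conditional on \(\beta > 0\), as in the book), we can take the posterior median and a central 95% interval:

draws %>% 
  filter(beta > 0) %>% 
  summarise(
    median = median(LD50),
    lower_95 = quantile(LD50, 0.025),
    upper_95 = quantile(LD50, 0.975)
  )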

And here's the full posterior distribution of the LD50, again conditional on \(\beta > 0\):

draws %>% 
  filter(beta > 0) %>% 
  ggplot() + 
  aes(LD50) +
  geom_histogram(bins = 50) +
  geom_vline(xintercept = 0, linetype = 'dashed', colour = 'chocolate') +
  labs(
    y = 'Count',
    title = 'Histogram of posterior LD50 estimate',
    subtitle = 'Conditional on β > 0'
  )
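
As a final sanity check, rstanarm's pp_check() compares draws from the posterior predictive distribution to the observed outcomes. With only four observations this is crude, but it's cheap:

pp_check(m)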