BDA3 Chapter 3 Exercise 8

Posted on 21 October, 2018 by Brian
Tags: bda chapter 3, solutions, bayes, poisson
Category: bda3

Here’s my solution to exercise 8, chapter 3, of Gelman’s Bayesian Data Analysis (BDA), 3rd edition. There are solutions to some of the exercises on the book’s webpage.

You can download the full dataset shown in table 3.3. Let’s load it into a dataframe and select just the residential data, as suggested.

df0 <- read_csv('data/chapter_03_exercise_08.csv') %>% 
  mutate(
    type = as_factor(
      type, 
      levels = c('residential', 'fairly_busy', 'busy'), 
      ordered = TRUE
    ),
    bikes = as.integer(bikes),
    other = as.integer(other)
  )

df <- df0 %>% 
  filter(type == 'residential') %>% 
  mutate(
    total = bikes + other,
    bike_fraction = bikes / total,
    other_fraction = other / total
  )

Here are the first few rows with each value of bike_route.

type	bike_route	bikes	other	total	bike_fraction	other_fraction
residential	TRUE	16	58	74	0.2162162	0.7837838
residential	TRUE	9	90	99	0.0909091	0.9090909
residential	TRUE	10	48	58	0.1724138	0.8275862
residential	FALSE	12	113	125	0.0960000	0.9040000
residential	FALSE	1	18	19	0.0526316	0.9473684
residential	FALSE	2	14	16	0.1250000	0.8750000

We’ll use an uninformative gamma prior with a Poisson likelihood for the counts. The posterior can then be calculated as follows.

draws <- 10000

shape_prior <- 2
rate_prior <- 0

posterior <- function(data, draws = 10000) {
  
  bikes <- data %>% pull(bikes)
  other <- data %>% pull(other)
  n <- data %>% pull(n)
  
  tibble(draw = 1:draws) %>%
    mutate(
      theta_bike = rgamma(draws, bikes, n),
      theta_other = rgamma(draws, other, n),
      mu = rpois(draws, theta_bike),
      p = theta_bike / (theta_bike + theta_other)
    )
  
}

posterior_draws <- df %>% 
  group_by(bike_route) %>% 
  summarise(
    bikes = sum(bikes) + shape_prior,
    other = sum(other) + shape_prior,
    n = n() + rate_prior
  ) %>% 
  nest(-bike_route) %>% 
  mutate(draws = map(data, posterior, draws)) %>% 
  unnest(draws)

Plotting posterior predictive draws of \(\theta_y\) and \(\theta_z\), we can see that there seems to be quite a difference.

posterior_draws %>% 
  ggplot() +
  aes(mu, fill = bike_route) +
  geom_bar(position = 'identity', alpha = 0.75) +
  labs(
    x = 'Bike count',
    y = 'Count',
    fill = 'Has bike route?',
    title = 'Posterior expectation of bike count'
  )

To quantify this difference, we’ll have to match up our posterior draws for \(\theta_y\) and \(\theta_z\).

difference <- posterior_draws %>% 
  select(draw, bike_route, mu) %>% 
  spread(bike_route, mu) %>% 
  mutate(difference = `TRUE` - `FALSE`)

The difference \(\mu_y - \mu_z\) has the following 95% credible interval:

difference %>% 
  pull(difference) %>% 
  quantile(probs = c(0.05, 0.5, 0.95))

 5% 50% 95% 
  6  15  24