# BDA3 Chapter 3 Exercise 8

Here’s my solution to exercise 8, chapter 3, of Gelman’s *Bayesian Data Analysis* (BDA), 3rd edition. There are solutions to some of the exercises on the book’s webpage.

You can download the full dataset shown in table 3.3. Let’s load it into a dataframe and select just the residential data, as suggested.

```
df0 <- read_csv('data/chapter_03_exercise_08.csv') %>%
mutate(
type = as_factor(
type,
levels = c('residential', 'fairly_busy', 'busy'),
ordered = TRUE
),
bikes = as.integer(bikes),
other = as.integer(other)
)
df <- df0 %>%
filter(type == 'residential') %>%
mutate(
total = bikes + other,
bike_fraction = bikes / total,
other_fraction = other / total
)
```

Here are the first few rows with each value of `bike_route`

.

type | bike_route | bikes | other | total | bike_fraction | other_fraction |
---|---|---|---|---|---|---|

residential | TRUE | 16 | 58 | 74 | 0.2162162 | 0.7837838 |

residential | TRUE | 9 | 90 | 99 | 0.0909091 | 0.9090909 |

residential | TRUE | 10 | 48 | 58 | 0.1724138 | 0.8275862 |

residential | FALSE | 12 | 113 | 125 | 0.0960000 | 0.9040000 |

residential | FALSE | 1 | 18 | 19 | 0.0526316 | 0.9473684 |

residential | FALSE | 2 | 14 | 16 | 0.1250000 | 0.8750000 |

We’ll use an uninformative gamma prior with a Poisson likelihood for the counts. The posterior can then be calculated as follows.

```
draws <- 10000
shape_prior <- 2
rate_prior <- 0
posterior <- function(data, draws = 10000) {
bikes <- data %>% pull(bikes)
other <- data %>% pull(other)
n <- data %>% pull(n)
tibble(draw = 1:draws) %>%
mutate(
theta_bike = rgamma(draws, bikes, n),
theta_other = rgamma(draws, other, n),
mu = rpois(draws, theta_bike),
p = theta_bike / (theta_bike + theta_other)
)
}
posterior_draws <- df %>%
group_by(bike_route) %>%
summarise(
bikes = sum(bikes) + shape_prior,
other = sum(other) + shape_prior,
n = n() + rate_prior
) %>%
nest(-bike_route) %>%
mutate(draws = map(data, posterior, draws)) %>%
unnest(draws)
```

Plotting posterior predictive draws of \(\theta_y\) and \(\theta_z\), we can see that there seems to be quite a difference.

```
posterior_draws %>%
ggplot() +
aes(mu, fill = bike_route) +
geom_bar(position = 'identity', alpha = 0.75) +
labs(
x = 'Bike count',
y = 'Count',
fill = 'Has bike route?',
title = 'Posterior expectation of bike count'
)
```

To quantify this difference, we’ll have to match up our posterior draws for \(\theta_y\) and \(\theta_z\).

```
difference <- posterior_draws %>%
select(draw, bike_route, mu) %>%
spread(bike_route, mu) %>%
mutate(difference = `TRUE` - `FALSE`)
```

The difference \(\mu_y - \mu_z\) has the following 95% credible interval:

```
difference %>%
pull(difference) %>%
quantile(probs = c(0.05, 0.5, 0.95))
```

```
5% 50% 95%
6 15 24
```