BDA3 Chapter 2 Exercise 13

Posted on 2 September, 2018 by Brian
Tags: bda chapter 2, solutions, bayes, poisson, gamma, jeffrey prior, posterior predictive
Category: bda3

Here’s my solution to exercise 13, chapter 2, of Gelman’s Bayesian Data Analysis (BDA), 3rd edition. There are solutions to some of the exercises on the book’s webpage.

We are given data on airline deaths and asked to fit various models to that data.

The data

We are given the data shown below. The data didn’t seem to be available anywhere so I created the csv file myself.

year	fatal_accidents	passenger_deaths	death_rate
1976	24	734	0.19
1977	25	516	0.12
1978	31	754	0.15
1979	31	877	0.16
1980	22	814	0.14
1981	21	362	0.06
1982	26	764	0.13
1983	20	809	0.13
1984	16	223	0.03
1985	22	1066	0.15

Let’s get acquainted with the data by plotting it as a timeseries.

Part a

We model the number of fatal accidents as poisson \(y \mid \theta \sim \dpois(\theta)\), where we put a \(\theta \sim \dgamma(\alpha, \beta)\) prior on the parameter. I don’t really have any strong prior knowledge about the number of annual fatal flight accidents. I’ll use the gamma approximation to Jeffrey’s prior from the previous exercise, even though it places probability on very extreme values. We’ll stick with this prior throughout.

shape <- 0.5
rate <- .Machine$double.xmin

The posterior is \(\dgamma(0.5 + n\bar y, n) = \dgamma(0.5 + 238, 10)\).

To obtain a 95% posterior predictive interval, we draw \(\theta\) from its posterior, then draw \(y\) from the corresponding Poisson distribution. With these draws, we can obtain the necessary quantiles.

n_draws <- 50000

theta_posterior_a <- rgamma(n_draws, 
                            shape + sum_fatal_accidents, 
                            rate + n_observations
                           ) 
y_pp_a <- rpois(n_draws, theta_posterior_a) 

mu_a <- mean(y_pp_a)
ci_a <- quantile(y_pp_a, c(0.05, 0.95))

ci_a

 5% 95% 
 16  33

Part b

In part a, we ignored how many flights there are. We can incorporate this information into our model by using passenger_miles as a measure of exposure. The parameter \(\theta\) is now the rate of fatal accidents per year per 100 million passenger miles. Note that this rate is over an order of magnitude smaller than the death rate in the table because the number of fatal accidents is an order of magnitude smaller than the number of passenger deaths. The posterior is \(\theta \mid y \sim \dgamma(0.5 + 238, 57158.69)\).

The 95% posterior predictive interval seems to be shifted upwards compared to the interval in part a.

theta_posterior_b <- rgamma(n_draws, 
                            shape + sum_fatal_accidents, 
                            rate + sum_passenger_miles
                           )
y_pp_b <- rpois(n_draws, theta_posterior_b * 8000)

mu_b <- mean(y_pp_b)

ci_b <- quantile(y_pp_b, c(0.05, 0.95))

ci_b

 5% 95% 
 24  44

Part c

Here we use the same model as in part a but for the number of passenger deaths instead of fatal accidents.

Only 1 of the 10 observations in the dataset lie within the 95% posterior predictive interval.

theta_posterior_c <- rgamma(n_draws, 
                            shape + sum_passenger_deaths, 
                            rate + n_observations
                           )
y_pp_c <- rpois(n_draws, theta_posterior_c) 

mu_c <- mean(y_pp_c)
ci_c <- quantile(y_pp_c, c(0.05, 0.95))

ci_c

 5% 95% 
647 737

Part d

Now we use the same model as in part b but for passenger deaths instead of fatal accidents. The posterior is \(\dgamma(0.5 + 238, 57158.69)\).

None of the observed values falls into the 95% posterior predictive interval.

theta_posterior_d <- rgamma(n_draws, 
                            shape + sum_passenger_deaths, 
                            rate + sum_passenger_miles
                           )
y_pp_d <- rpois(n_draws, theta_posterior_d * 8000)

mu_d <- mean(y_pp_d)

ci_d <- quantile(y_pp_d, c(0.05, 0.95))

ci_d

  5%  95% 
 914 1023

Part e

There are a number of issues to consider that are not mentioned in the question or suggested by the data. The number of fatal accidents depends on the number of miles flown by airplanes: if there are more flights, there will likely be more accidents. However, the number of flights isn’t directly accounted for in the number of passenger miles since the number of passengers per flight can vary from year to year. In any case, the number of passenger deaths per year is not independent because passengers on the same flight will have more similar survival chances.