BDA3 Chapter 5 Exercise 8

Here’s my solution to exercise 8, chapter 5, of Gelman’s Bayesian Data Analysis (BDA), 3rd edition. There are solutions to some of the exercises on the book’s webpage.

\(\DeclareMathOperator{\dbinomial}{Binomial} \DeclareMathOperator{\dbern}{Bernoulli} \DeclareMathOperator{\dpois}{Poisson} \DeclareMathOperator{\dnorm}{Normal} \DeclareMathOperator{\dt}{t} \DeclareMathOperator{\dcauchy}{Cauchy} \DeclareMathOperator{\dexponential}{Exp} \DeclareMathOperator{\duniform}{Uniform} \DeclareMathOperator{\dgamma}{Gamma} \DeclareMathOperator{\dinvgamma}{InvGamma} \DeclareMathOperator{\invlogit}{InvLogit} \DeclareMathOperator{\logit}{Logit} \DeclareMathOperator{\ddirichlet}{Dirichlet} \DeclareMathOperator{\dbeta}{Beta}\)

Let \(p_m(\theta)\), \(m = 1, \dotsc, M\), be conjugate prior densities for the likelihood \(y \mid \theta\), and \(\lambda_m \in [0, 1]\) such that \(\sum_1^M \lambda_m = 1\). We show that \(p(\theta) := \sum_{m = 1}^M \lambda_m p_m(\theta)\) is also a conjugate prior. This follows from the calculation of the posterior:

\[ \begin{align} p (\theta \mid y) &\propto p (y \mid \theta) p(\theta) \\ &= p (y \mid \theta) \sum_1^M \lambda_m p_m(\theta) \\ &= \sum_1^M \lambda_m p(y \mid \theta) p_m(\theta) \\ &= \sum_1^M \lambda_m p_m(y) p_m (\theta \mid y) \\ &= \sum_1^M \lambda_m' p_m (\theta \mid y) , \end{align} \]

where \(\lambda_m' := \lambda_m p_m(y)\) and \(p_m(y)\) is the prior predictive density of \(y\) under the \(m\)th component; after normalisation, the posterior mixture weights are \(\lambda_m p_m(y) / \sum_{k=1}^M \lambda_k p_k(y)\). Since each term \(p_m(\theta \mid y)\) has the same parametric form as \(p_m(\theta)\), the posterior is again a mixture with the same parametric form as the prior, \(\sum_{m=1}^M \lambda_m p_m(\theta)\).

To apply this to a concrete example, suppose \(y \mid \theta \sim \dnorm(\theta, 1)\), where \(\theta\) is most likely near \(1\) with a standard deviation of 0.5, but has some probability of being near \(-1\), still with a standard deviation of 0.5. Let’s calculate the posterior after 10 observations with mean \(-0.25\).
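With mixture weights of 0.8 and 0.2 (as set in the code below), the prior density is

\[ \begin{align} p(\theta) &= 0.8 \, \dnorm(\theta \mid 1, 0.5) + 0.2 \, \dnorm(\theta \mid -1, 0.5) . \end{align} \]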

library(tibble)

# component centred near +1, with prior weight 0.8
mu_pos <- 1
sd_pos <- 0.5
lambda_pos <- 0.8

# component centred near -1, with prior weight 0.2
mu_neg <- -1
sd_neg <- 0.5
lambda_neg <- 1 - lambda_pos

# mixture prior density evaluated on a grid over [-3, 3]
prior <- tibble(
  theta = seq(-3, 3, 0.01),
  density_pos = dnorm(theta, mu_pos, sd_pos),
  density_neg = dnorm(theta, mu_neg, sd_neg),
  density = lambda_pos * density_pos + lambda_neg * density_neg
)
[Figure: conjugate mixture prior]
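The original plotting code isn’t shown; a minimal ggplot2 sketch of the prior figure (assuming ggplot2, which the original may or may not have used) would be:

library(ggplot2)

ggplot(prior, aes(theta, density)) +
  geom_line() +
  labs(title = "Conjugate mixture prior", x = expression(theta), y = "density")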

The posterior for each mixture component can be calculated using equation 2.12 (page 42). In particular,

\[ \begin{align} p^\pm (\theta \mid \bar y) &= \dnorm( \theta \mid \mu_{10}^\pm, \sigma_{10}^\pm) , \end{align} \]

where, writing \(\sigma_0 = 0.5\) for the common prior standard deviation of the mixture components and \(\sigma = 1\) for the known sampling standard deviation,
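\[ \begin{align} \frac{1}{(\sigma_{10}^\pm)^2} &= \frac{1}{\sigma_0^2} + \frac{n}{\sigma^2} , & \mu_{10}^\pm &= \left( \frac{\mu^\pm}{\sigma_0^2} + \frac{n \bar y}{\sigma^2} \right) (\sigma_{10}^\pm)^2 , \end{align} \]

with \(\mu^+ = 1\) and \(\mu^- = -1\) the prior component means. In R: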

n <- 10
ybar <- -0.25

sigma <- 1  # known sampling standard deviation

# posterior sd and mean for the positive component (equation 2.12)
sd_10_pos <- 1 / sqrt(1 / sd_pos^2 + n / sigma^2)
mu_10_pos <- (mu_pos / sd_pos^2 + n * ybar / sigma^2) / (1 / sd_10_pos^2)

# posterior sd and mean for the negative component
sd_10_neg <- 1 / sqrt(1 / sd_neg^2 + n / sigma^2)
mu_10_neg <- (mu_neg / sd_neg^2 + n * ybar / sigma^2) / (1 / sd_10_neg^2)

This gives us positive and negative means of 0.107 and -0.464, respectively, and standard deviations both equal to 0.267.

In order to find the full posterior distribution, we also need the posterior mixture proportions. These are given by
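\[ \begin{align} \lambda_\pm' &\propto \lambda_\pm \, p_\pm(\bar y) , & p_\pm(\bar y) &= \dnorm \left( \bar y \mid \mu^\pm, \sqrt{\sigma_0^2 + \sigma^2 / n} \right) , \end{align} \]

where \(p_\pm(\bar y)\) is the prior predictive density of \(\bar y\) under the corresponding component: a normal whose variance is the prior variance plus the sampling variance of the mean. In R: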

py_pos <- dnorm(ybar, mu_pos, sqrt(sd_pos^2 + sigma^2 / n))
lambda_prime_pos0 <- lambda_pos * py_pos

py_neg <- dnorm(ybar, mu_neg, sqrt(sd_neg^2 + sigma^2 / n))
lambda_prime_neg0 <- lambda_neg * py_neg

# normalise so the posterior mixture weights sum to 1
normaliser <- lambda_prime_pos0 + lambda_prime_neg0
lambda_prime_pos <- lambda_prime_pos0 / normaliser
lambda_prime_neg <- lambda_prime_neg0 / normaliser

This gives positive and negative weights of 48.94% and 51.06%, respectively.

We’ll calculate the normalised posterior density on a grid over \([-3, 3]\).

granularity <- 0.01

posterior <- tibble(
  theta = seq(-3, 3, granularity),
  density_pos = dnorm(theta, mu_10_pos, sd_10_pos),
  density_neg = dnorm(theta, mu_10_neg, sd_10_neg),
  density_unnormalised = lambda_prime_pos * density_pos + lambda_prime_neg * density_neg,
  # renormalise numerically so the density integrates to 1 over the grid
  density = density_unnormalised / sum(density_unnormalised * granularity)
)

The posterior looks like a ‘hunchback normal’ distribution, with the two modes much less distinct than in the prior.
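As a sanity check, we can compute the posterior directly on the same grid as prior times likelihood and compare it with the mixture form above; a minimal sketch, using the fact that the likelihood depends on the data only through \(\bar y \mid \theta \sim \dnorm(\theta, \sigma / \sqrt{n})\):

# brute-force posterior: prior * likelihood, renormalised on the grid
check <- tibble(
  theta = seq(-3, 3, granularity),
  prior_density = lambda_pos * dnorm(theta, mu_pos, sd_pos) +
    lambda_neg * dnorm(theta, mu_neg, sd_neg),
  likelihood = dnorm(ybar, theta, sigma / sqrt(n)),
  density = prior_density * likelihood / sum(prior_density * likelihood * granularity)
)

max(abs(check$density - posterior$density))  # should be close to zero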