# BDA3 Chapter 2 Exercise 10

Here’s my solution to exercise 10, chapter 2, of Gelman’s Bayesian Data Analysis (BDA), 3rd edition. There are solutions to some of the exercises on the book’s webpage.

## The posterior density

There an N cars labelled 1 to N and we observe a random car labelled 203. With a geometric prior with mean 100, the unnormalised posterior is

$p(N \mid y = 203) \propto p(y = 203 \mid N) \cdot p (N) = \left. \begin{cases} \frac{1}{N} \cdot \frac{1}{100} \cdot \left( \frac{99}{100} \right)^{N - 1} & \text{for } 203 \le N \\ 0 & \text{otherwise} \end{cases} \right\}$

We find the normalising constant in two steps. First set $$x := \frac{99}{100}$$ and use the Taylor series of $$\log(1 - x)$$ to show

\begin{align} \sum_1^\infty \frac{1}{N} \cdot \frac{1}{100}\cdot \left( \frac{99}{100} \right)^{N - 1} &= \frac{1}{100} \left( 1 + \frac{x}{2} + \frac{x^2}{3} + \dotsc + \frac{x^k}{k + 1} + \dotsc \right) \\ &= \frac{1}{100} \frac{1}{x} \left( x + \frac{x^2}{2} + \frac{x^3}{3} + \dotsc + \frac{x^k}{k} + \dotsc \right) \\ &= -\frac{1}{100} \frac{1}{x} \log (1 - x) \\ &= \frac{\log 100}{99} . \end{align}

The normalising constant $$c$$ is then

$c = \frac{\log 100}{99} - \sum_1^{202} \frac{1}{N} \cdot \frac{1}{100}\cdot \left( \frac{99}{100} \right)^{N - 1}$

which we approximate with the following computation.

# value of the Nth term in the sum
term <- function(N)
(1 / N) * (1 / 100) * (99 / 100)^(N - 1)

# left hand side
c0 <- log(100) / 99

# right hand side (the sum)
c1 <- 1:202 %>%
map(term) %>%
reduce(sum)

c <- c0 - c1

c
[1] 0.0004705084

## The posterior moments

The posterior mean is

\begin{align} \mathbb E(N \mid y = 203) &= \frac{1}{c}\sum_{203}^\infty \frac{N}{N} \cdot \frac{1}{100} \cdot \left( \frac{99}{100} \right)^{N - 1} \\ &= \frac{1}{100c} \left( x^{202} + x^{203} + \dotsc \right), \qquad x := \frac{99}{100} \\ &= \frac{1}{100c} \left( \frac{1}{1 - x} - (1 + x + x^2 + \dotsc + x^{201}) \right) \\ &= \frac{1}{100c} \left( \frac{1}{1 - x} - \frac{1 - x^{202}}{1 - x} \right) \\ &= \frac{1}{100c} \frac{x^{202}}{1 - x} \\ &= \frac{1}{c}\left( \frac{99}{100} \right)^{202} \end{align} ,

which is approximately

mu <- (1 / c) * (99 / 100)^202
mu
[1] 279.0885

This is larger than the prior mean of 100.

The second moment is

\begin{align} \mathbb E(N^2 \mid y = 203) &= \frac{1}{c}\sum_{203}^\infty \frac{N^2}{N} \cdot \frac{1}{100} \cdot \left( \frac{99}{100} \right)^{N - 1} \\ &= \frac{1}{100c} \left( 203x^{202} + 204x^{203} + \dotsc \right), \qquad x := \frac{99}{100} \\ &= \frac{1}{100c} \left( \frac{1}{(1 - x)^2} - \left(1 + 2x + 3x^2 + \dotsc + 202 x^{201} \right) \right) \\ &= \frac{1}{100c} \left( 100^2 - \left(1 + 2x + 3x^2 + \dotsc + 202 x^{201} \right) \right) \\ &= \frac{100}{c} - \frac{1 + 2x + 3x^2 + \dotsc + 202 x^{201} }{100c} \end{align} ,

which we approximate with the following code.

EN2_left <- 100 / c

EN2_right <- 1:201 %>%
map(function(N) N * (99 / 100)^(N - 1)) %>%
reduce(sum) %>%
/(100 * c)

EN2 <- EN2_left - EN2_right

EN2
[1] 84854.18

It follows that the posterior variance and standard deviation is approximately

v <- EN2 - mu^2
sigma <- sqrt(v)
c(v, sigma)
[1] 6963.78665   83.44931

These are smaller than the prior variance and standard deviation, respectively:

v_prior <- (99 / 100) / (0.01^2)
sigma_prior <- sqrt(v_prior)
c(v_prior, sigma_prior)
[1] 9900.00000   99.49874

## A non-informative prior

There is no proper uniform density over the positive integers. A uniform prior also leaves us with an improper posterior.

Jeffrey’s prior is $$p(N) \propto \frac{1}{N}$$, which is also improper. However, it yields the following (unnormalised) posterior

$p(N \mid y = 203) \propto \left. \begin{cases} \frac{1}{N^2} & \text{for } 203 \le N \\ 0 & \text{otherwise} \end{cases} \right\}$

which is proper.

Using the Basel problem we can calculate the normalising constant

$c = \sum_1^\infty \frac{1}{N^2} - \sum_1^{202} \frac{1}{N^2} = \frac{\pi^2}{6} - \sum_1^{202} \frac{1}{N^2}$

which is approximately

c_left <- pi^2 / 6

c_right <- 1:202 %>%
map(function(N) 1 / N^2) %>%
reduce(sum)

c <- c_left - c_right

c
[1] 0.004938262

The posterior mean is not well-defined since

$\mathbb E(N \mid y = 203) = \frac{1}{c}\sum_{203}^\infty \frac{1}{N} = \infty .$

The posterior variance and standard deviation are also not well-defined.