Welcome

Have a gander at a recent post...

Speeding up Bayesian sampling with map_rect
9 August, 2019

Fitting a full Bayesian model can be slow, especially with a large dataset. For example, it’d be great to analyse the climate crisis questions in the European Social Survey (ESS), which typically has around 45,000 respondents from around Europe on a range of socio-political questions. There are two main ways of parallelising your Bayesian model in Stan: between-chain parallelisation and within-chain parallelisation. The first of these is very easy to implement (chains = 4, cores = 4) - it simply runs the algorithm once on each core and pools the posterior samples at the end. The second method is more complicated as it requires a non-trivial modification to the Stan model, but can bring with it large speedups if you have the cores available. In this post we’ll get a >5x speedup of ordinal regression using within-chain parallelisation.

Hierarchical Customer Lifetime Value
5 May, 2019

In a previous post, we described how a model of customer lifetime value (CLV) works, implemented it in Stan, and fit the model to simulated data. In this post, we’ll extend the model to use hierarchical priors in two different ways: centred and non-centred parameterisations. I’m not aware of any other HMC-based implementations of this hierarchical CLV model, so we’ll run some basic tests to check it’s doing the right thing. More specifically, we’ll fit it to a dataset drawn from the prior predictive distribution. The resulting fits pass the main diagnostic tests and the 90% posterior intervals capture about 91% of the true parameter values.

BDA3 Chapter 1 Exercise 9
13 April, 2019

Here’s my solution to exercise 9, chapter 1, of Gelman’s Bayesian Data Analysis (BDA), 3rd edition. There are solutions to some of the exercises on the book’s webpage.

Pareto-NBD Customer Lifetime Value
6 April, 2019

Suppose you have a bunch of customers who make repeat purchases - some more frequenty, some less. There are a few things you might like to know about these customers, such as

  • which customers are still active (i.e. not yet churned) and likely to continue purchasing from you?; and
  • how many purchases can you expect from each customer?

Modelling this directly is more difficult than it might seem at first. A customer that regularly makes purchases every day might be considered at risk of churning if they haven’t purchased anything in the past week, whereas a customer that regularly puchases once per month would not be considered at risk of churning. That is, churn and frequency of purchasing are closely related. The difficulty is that we don’t observe the moment of churn of any customer and have to model it probabilistically.

There are a number of established models for estimating this, the most well-known perhaps being the SMC model (a.k.a pareto-nbd model). There are already some implementations using maximum likelihood or Gibbs sampling. In this post, we’ll explain how the model works, make some prior predictive simulations, and fit a version implemented in Stan.

BDA3 Chapter 1 Exercise 3
31 March, 2019

Here’s my solution to exercise 3, chapter 1, of Gelman’s Bayesian Data Analysis (BDA), 3rd edition. There are solutions to some of the exercises on the book’s webpage.

…or you can find more in the archives.