Understanding the hazard function

Posted on 5 August, 2018 by Brian
Tags: hazard, censoring, survival, exponential, poisson
Category: survival_models

Suppose you have fit a distribution to your censored survival times, as in the previous post, and now want to quantify the intuition of an event being imminent. For example, being able to characterise precisely when your customers are about to churn can help identify problem areas to improve on. This notion is called the hazard. We’ll take a look at some of its main properties and how it relates to survival analysis.

Definition

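Start with a small interval of width $\delta$ beginning at $t$ and consider the average probability density of an event occurring in that interval, given that it occurs at or after $t$: $\frac{\mathbb{P}(t \le T < t + \delta \mid T \ge t)}{\delta}$, where $T$ is the event time and $\mathbb{P}$ is the probability function. To get rid of the arbitrary choice of $\delta$, we take the limit as $\delta$ goes to zero:

$$h(t) := \lim_{\delta \to 0} \frac{\mathbb{P}(t \le T < t + \delta \mid T \ge t)}{\delta}.$$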

This is well-defined whenever the CDF is differentiable. Although it is defined in terms of probabilities, we will show below that the hazard is not itself a probability density function.
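To see the limit at work, here is a minimal numerical sketch. The Weibull distribution, its parameters, and the function names are illustrative assumptions rather than anything from the previous post. The finite-interval quantity is computed directly from the CDF for shrinking interval widths and compared with the known closed-form Weibull hazard.

```python
import math


def weibull_cdf(t, shape=1.5, scale=2.0):
    """CDF of a Weibull distribution (an illustrative choice, not from the post)."""
    return 1.0 - math.exp(-((t / scale) ** shape))


def weibull_hazard(t, shape=1.5, scale=2.0):
    """Closed-form Weibull hazard, used only as a reference value."""
    return (shape / scale) * (t / scale) ** (shape - 1.0)


def finite_interval_hazard(cdf, t, delta):
    """P(t <= T < t + delta | T >= t) / delta, computed from a CDF."""
    prob_in_interval = cdf(t + delta) - cdf(t)
    prob_survive_to_t = 1.0 - cdf(t)
    return prob_in_interval / (delta * prob_survive_to_t)


t = 1.0
for delta in (1.0, 0.1, 0.01, 0.001):
    approx = finite_interval_hazard(weibull_cdf, t, delta)
    print(f"delta={delta:<6} approx={approx:.5f}")
print(f"closed form   h(t)={weibull_hazard(t):.5f}")
```

As the interval width shrinks, the printed approximations should settle on the closed-form value.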

Identities

There are a number of useful properties of the hazard function that make it convenient to work with in survival analysis.

Equivalent definition

The above definition helps us understand the intuition behind the hazard function but there’s an equivalent formulation that can be easier to work with. Using

the definition of conditional probabilities,
the definition of a derivative, and
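that $f(t) = \frac{dF}{dt}(t)$, where $F$ is the CDF and $f$ the probability function,

we can show that

$$h(t) = \frac{f(t)}{S(t)},$$

where $S(t) := 1 - F(t)$ is the survival function.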
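
Spelling out the steps, as a sketch: write $T$ for the event time, $\delta$ for the interval width, and $S(t) = 1 - F(t) = \mathbb{P}(T \ge t)$ for the survival function of a continuous $T$ (this notation is assumed here). The three ingredients are applied in order:

$$
\begin{aligned}
h(t) &= \lim_{\delta \to 0} \frac{\mathbb{P}(t \le T < t + \delta \mid T \ge t)}{\delta} \\
     &= \lim_{\delta \to 0} \frac{\mathbb{P}(t \le T < t + \delta)}{\delta \, \mathbb{P}(T \ge t)}
        && \text{(conditional probability)} \\
     &= \frac{1}{S(t)} \lim_{\delta \to 0} \frac{F(t + \delta) - F(t)}{\delta} \\
     &= \frac{1}{S(t)} \frac{dF}{dt}(t)
        && \text{(definition of a derivative)} \\
     &= \frac{f(t)}{S(t)}.
\end{aligned}
$$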

We will show below how to use this to simplify the likelihood in the case of censored observations for measuring time to an event of interest.

Relation with the survival function

Using the identity above, we can rewrite the hazard as a derivative of the survival function:

$$h(t) = \frac{f(t)}{S(t)} = -\frac{S'(t)}{S(t)} = -\frac{d}{dt} \log S(t).$$

It then follows from the fundamental theorem of calculus, together with $S(0) = 1$, that

$$S(t) = \exp\left(-\int_0^t h(u) \, du\right).$$

In other words, the hazard function completely determines the survival function (and therefore also the mass/density function).

Since the integral of the hazard appears in the above equation, we give it a name for easier reference. We define the cumulative hazard as

$$H(t) := \int_0^t h(u) \, du,$$

so that $S(t) = e^{-H(t)}$.

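Since $S(t) \to 0$ as $t \to \infty$, it follows that the cumulative hazard $H(t) = -\log S(t) \to \infty$. In particular, the hazard does not integrate to 1, which means that the hazard function is NOT a probability density function!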
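
A small numerical sketch of how the hazard determines the survival function through the cumulative hazard. The linear hazard $h(t) = 0.5\,t$ and all function names are assumptions chosen for illustration. For this hazard the cumulative hazard is $0.25\,t^2$, so the numerical integral can be checked by hand.

```python
import math


def linear_hazard(t, slope=0.5):
    """Illustrative hazard h(t) = slope * t (hypothetical, chosen for easy checking)."""
    return slope * t


def cumulative_hazard(hazard, t, steps=10_000):
    """Approximate the integral of the hazard from 0 to t with the trapezoid rule."""
    width = t / steps
    total = 0.5 * (hazard(0.0) + hazard(t))
    for i in range(1, steps):
        total += hazard(i * width)
    return total * width


def survival_from_hazard(hazard, t, steps=10_000):
    """Survival probability recovered from the hazard alone: exp(-H(t))."""
    return math.exp(-cumulative_hazard(hazard, t, steps))


for t in (1.0, 2.0, 10.0, 100.0):
    big_h = cumulative_hazard(linear_hazard, t)
    print(f"t={t:<6} H(t)={big_h:<10.3f} S(t)={math.exp(-big_h):.3e}")
```

The cumulative hazard keeps growing while the survival probability shrinks towards zero, which is why the hazard cannot integrate to 1.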

Example

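The exponential distribution has constant hazard. To see this, suppose $h(t) = \lambda$ for some constant $\lambda > 0$. Then $S(t) = e^{-\lambda t}$ so that $f(t) = h(t) S(t) = \lambda e^{-\lambda t}$, which is the probability function for the exponential distribution.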
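
The constant hazard can also be seen in simulation. The following is a rough sketch with an assumed rate of 0.5, an assumed interval width of 0.1, and a hypothetical helper `empirical_hazard`. Among simulated units still at risk at time t, it measures the fraction that fail in the next short interval, per unit time.

```python
import random

random.seed(0)  # fixed seed so the run is repeatable
rate = 0.5      # illustrative value of the exponential rate
delta = 0.1     # interval width used for the empirical estimate
samples = [random.expovariate(rate) for _ in range(400_000)]


def empirical_hazard(times, t, delta):
    """Share of units at risk at t that fail in [t, t + delta), per unit time."""
    at_risk = sum(1 for x in times if x >= t)
    failed = sum(1 for x in times if t <= x < t + delta)
    return failed / (at_risk * delta)


estimates = {t: empirical_hazard(samples, t, delta) for t in (0.0, 1.0, 2.0, 3.0)}
for t, estimate in estimates.items():
    print(f"t={t}: empirical hazard ~ {estimate:.3f}")
```

The estimates should be roughly the same at every t and close to the rate. They sit slightly below it because the interval width is finite rather than infinitesimal.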

Hazard in censored survival analysis

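In the previous post, we motivated the following likelihood in the case of censored survival times:

$$L(\theta) = \prod_{i = 1}^N f(t_i \mid \theta)^{o_i} \, S(t_i \mid \theta)^{1 - o_i},$$

where $o_i$ is 1 if the event is observed and 0 if it is censored, and $\theta$ is the vector of parameters of the distribution of survival times. Since $f(t) = h(t) S(t)$, we can rewrite this as

$$L(\theta) = \prod_{i = 1}^N h(t_i \mid \theta)^{o_i} \, S(t_i \mid \theta).$$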
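
As a sanity check on this rewrite, the sketch below evaluates the censored log-likelihood in two ways. The density form sums log f for observed events and log S for censored ones. The hazard form sums log h for observed events and subtracts the cumulative hazard for every observation. The Weibull model, the toy data, and the function names are all assumptions made for illustration.

```python
import math

# Hypothetical toy data: durations and event indicators (1 = observed, 0 = censored).
times = [2.0, 3.5, 1.2, 5.0, 4.1]
observed = [1, 0, 1, 1, 0]


def weibull_log_hazard(t, shape, scale):
    return math.log(shape / scale) + (shape - 1.0) * math.log(t / scale)


def weibull_cumulative_hazard(t, shape, scale):
    return (t / scale) ** shape


def loglik_density_form(times, observed, shape, scale):
    """Sum of o_i * log f(t_i) + (1 - o_i) * log S(t_i)."""
    total = 0.0
    for t, o in zip(times, observed):
        log_s = -weibull_cumulative_hazard(t, shape, scale)
        log_f = weibull_log_hazard(t, shape, scale) + log_s  # f = h * S
        total += o * log_f + (1 - o) * log_s
    return total


def loglik_hazard_form(times, observed, shape, scale):
    """Sum of o_i * log h(t_i) - H(t_i), using log S = -H."""
    return sum(
        o * weibull_log_hazard(t, shape, scale) - weibull_cumulative_hazard(t, shape, scale)
        for t, o in zip(times, observed)
    )


print(loglik_density_form(times, observed, shape=1.5, scale=4.0))
print(loglik_hazard_form(times, observed, shape=1.5, scale=4.0))
```

The two printed values should agree up to floating-point error. The hazard form needs only the hazard and its integral, which is often more convenient to code.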

Example

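Assuming that survival times follow an exponential distribution, the hazard is constant, $h(t) = \lambda$, and the likelihood is

$$L(\lambda) = \prod_{i = 1}^N \lambda^{o_i} e^{-\lambda t_i} = \lambda^k e^{-\lambda \tau},$$

where $k := \sum_i o_i$ is the total number of events observed and $\tau := \sum_i t_i$ is the total observation time. This expression has an interesting interpretation as a Poisson likelihood. To see this, first note that $k$ and $\tau$ can be considered constant in our likelihood because they don’t depend on our only parameter $\lambda$. We can consider $k$ as a Poisson variable with rate $\lambda$ and exposure $\tau$:

$$k \sim \text{Poisson}(\lambda \tau),$$

which gives the probability of observing $k$ events as

$$\mathbb{P}(k \mid \lambda) = \frac{(\lambda \tau)^k e^{-\lambda \tau}}{k!} = \frac{\tau^k}{k!} \, \lambda^k e^{-\lambda \tau}.$$

Likelihoods are equivalent up to a multiplicative constant. Since $\frac{\tau^k}{k!}$ is constant, we can treat our likelihood, $\lambda^k e^{-\lambda \tau}$, as Poisson.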
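
The equivalence can be checked numerically. This is a minimal sketch on hypothetical toy data, with illustrative function names. On the log scale, the censored exponential likelihood and the Poisson probability of the event count (with mean equal to rate times total observation time) should differ by the same amount for every value of the rate.

```python
import math

# Hypothetical toy data: durations and event indicators (1 = observed, 0 = censored).
times = [2.0, 3.5, 1.2, 5.0, 4.1]
observed = [1, 0, 1, 1, 0]

n_events = sum(observed)  # total number of observed events
exposure = sum(times)     # total observation time


def exponential_loglik(rate):
    """Censored exponential log-likelihood: n_events * log(rate) - rate * exposure."""
    return n_events * math.log(rate) - rate * exposure


def poisson_logpmf(rate):
    """Log-probability of n_events under a Poisson with mean rate * exposure."""
    mean = rate * exposure
    return n_events * math.log(mean) - mean - math.lgamma(n_events + 1)


rates = (0.05, 0.2, 1.0)
gaps = [poisson_logpmf(r) - exponential_loglik(r) for r in rates]
print(gaps)  # the same constant for every rate

# Both are maximised at the same rate: events divided by exposure.
mle = n_events / exposure
print(mle)
```

Because the gap does not depend on the rate, maximising or sampling from either expression gives the same inference for the rate.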