CIS Primer Question 3.2.1
Here are my solutions to question 3.2.1 of Causal Inference in Statistics: a Primer (CISP).
Here are the parameters we’ll use. Note that they are taken from the Simpson’s revesal example of question 1.5.2.
r <- 0.28 # fraction with syndrome
q0 <- 0.07 # P(X = 1 | Z = 0)
q1 <- 0.85 # P(X = 1 | Z = 1)
p00 <- 0.84 # P(Y = 1 | X = 0, Z = 0)
p10 <- 0.88 # P(Y = 1 | X = 1, Z = 0)
p01 <- 0.53 # P(Y = 1 | X = 0, Z = 1)
p11 <- 0.58 # P(Y = 1 | X = 1, Z = 1)Part a
We can simulate the intervention by generating values for X independently of Z.
N <- 10000 # number of individuals
set.seed(53201)
part_a <- tibble(z = rbinom(N, 1, r)) %>%
mutate(
x = rbinom(n(), 1, 0.5), # no Z-dependence
p_y_given_x_z = case_when(
x == 0 & z == 0 ~ p00,
x == 0 & z == 1 ~ p01,
x == 1 & z == 0 ~ p10,
x == 1 & z == 1 ~ p11
),
y = rbinom(n(), 1, p_y_given_x_z)
) %>%
group_by(x, y) %>%
summarise(n = n()) %>%
group_by(x) %>%
mutate(p_y_given_do_x = n / sum(n))| x | y | n | p_y_given_do_x |
|---|---|---|---|
| 0 | 0 | 1238 | 0.2470565 |
| 0 | 1 | 3773 | 0.7529435 |
| 1 | 0 | 964 | 0.1932251 |
| 1 | 1 | 4025 | 0.8067749 |
Part b
To simulate observational data, we need to include the dependence of X on Z.
N <- 100000 # number of individuals
set.seed(95400)
p_x_y_z <- tibble(
id = 1:N,
z = rbinom(N, 1, r),
x = rbinom(N, 1, if_else(z == 0, q0, q1)),
p_y_given_x_z = case_when(
x == 0 & z == 0 ~ p00,
x == 0 & z == 1 ~ p01,
x == 1 & z == 0 ~ p10,
x == 1 & z == 1 ~ p11
),
y = rbinom(N, 1, p_y_given_x_z)
) %>%
group_by(x, y, z) %>%
count() %>%
ungroup() %>%
mutate(p = n / sum(n))| x | y | z | n | p |
|---|---|---|---|---|
| 0 | 0 | 0 | 10723 | 0.10723 |
| 0 | 0 | 1 | 2020 | 0.02020 |
| 0 | 1 | 0 | 56015 | 0.56015 |
| 0 | 1 | 1 | 2205 | 0.02205 |
| 1 | 0 | 0 | 624 | 0.00624 |
| 1 | 0 | 1 | 10067 | 0.10067 |
| 1 | 1 | 0 | 4470 | 0.04470 |
| 1 | 1 | 1 | 13876 | 0.13876 |
In order to apply the causal effect rule, we’ll need P(x∣z).
p_x_given_z <- p_x_y_z %>%
group_by(x, z) %>%
summarise(n = sum(n)) %>%
group_by(z) %>%
mutate(p = n / sum(n)) %>%
ungroup()| x | z | n | p |
|---|---|---|---|
| 0 | 0 | 66738 | 0.9290845 |
| 0 | 1 | 4225 | 0.1499929 |
| 1 | 0 | 5094 | 0.0709155 |
| 1 | 1 | 23943 | 0.8500071 |
We can then add the conditional probabilities to the joint distribution table, then sum overal all the Z variables.
p_y_given_do_x <- p_x_y_z %>%
inner_join(
p_x_given_z,
by = c('x', 'z'),
suffix = c('_num', '_denom')
) %>%
mutate(p = p_num / p_denom) %>%
group_by(x, y) %>%
summarise(p = sum(p)) | x | y | p |
|---|---|---|
| 0 | 0 | 0.2500877 |
| 0 | 1 | 0.7499123 |
| 1 | 0 | 0.2064264 |
| 1 | 1 | 0.7935736 |
The causal effect estimates are very close to the simulated intervention.
Part c
We can calculate ACE simply by taking the difference of the causal effect estimates.
ace <- p_y_given_do_x %>%
spread(x, p) %>%
filter(y == 1) %>%
mutate(ace = `1` - `0`) %>%
pull(ace)
ace[1] 0.04366134
This is different from the overall probability differences.
p_y_given_x <- p_x_y_z %>%
group_by(x, y) %>%
summarise(n = sum(n)) %>%
group_by(x) %>%
mutate(p = n / sum(n)) %>%
select(-n)
risk_difference <- p_y_given_x %>%
spread(x, p) %>%
filter(y == 1) %>%
mutate(rd = `1` - `0`) %>%
pull(rd)
risk_difference[1] -0.188613
Making X independent of Z would minimise the disrepancy between ACE and RD, which would turn the adjustment formula into the formulat for P(y∣x. In other words, setting q0=q1=P(X=1) would do the trick.
Part d
Note that the desegregated causal effects
- p1,0−p0,0 is 0.04; and
- p1,1−p0,1 is 0.05,
are both consisent with our calculation for the overall causal effect, ACE = 4.37%. The generated data are an illustration of Simpson’s reversal because the risk difference, -18.9%, has the opposite sign.