CIS Primer Question 3.2.1

Posted on 10 February, 2019 by Brian
Tags: CISP chapter 3, solutions, ace, risk difference, simpson's reversal
Category: causal_inference_in_statistics_primer

Here are my solutions to question 3.2.1 of Causal Inference in Statistics: a Primer (CISP).

Here are the parameters we’ll use. Note that they are taken from the Simpson’s revesal example of question 1.5.2.

r   <- 0.28   # fraction with syndrome

q0 <- 0.07  # P(X = 1 | Z = 0)
q1 <- 0.85  # P(X = 1 | Z = 1)
             
p00 <- 0.84 # P(Y = 1 | X = 0, Z = 0)
p10 <- 0.88 # P(Y = 1 | X = 1, Z = 0)
p01 <- 0.53 # P(Y = 1 | X = 0, Z = 1)
p11 <- 0.58 # P(Y = 1 | X = 1, Z = 1)

Part a

We can simulate the intervention by generating values for independently of .

N <- 10000  # number of individuals

set.seed(53201)

part_a <- tibble(z = rbinom(N, 1, r)) %>% 
  mutate(
    x = rbinom(n(), 1, 0.5), # no Z-dependence
    p_y_given_x_z = case_when(
      x == 0 & z == 0 ~ p00,
      x == 0 & z == 1 ~ p01,
      x == 1 & z == 0 ~ p10,
      x == 1 & z == 1 ~ p11
    ),
    y = rbinom(n(), 1, p_y_given_x_z)
  ) %>% 
  group_by(x, y) %>% 
  summarise(n = n()) %>% 
  group_by(x) %>% 
  mutate(p_y_given_do_x = n / sum(n))

x	y	n	p_y_given_do_x
0	0	1238	0.2470565
0	1	3773	0.7529435
1	0	964	0.1932251
1	1	4025	0.8067749

Part b

To simulate observational data, we need to include the dependence of on .

N <- 100000  # number of individuals

set.seed(95400)

p_x_y_z <- tibble(
    id = 1:N,
    
    z = rbinom(N, 1, r),
    x = rbinom(N, 1, if_else(z == 0, q0, q1)),
    
    p_y_given_x_z = case_when(
      x == 0 & z == 0 ~ p00,
      x == 0 & z == 1 ~ p01,
      x == 1 & z == 0 ~ p10,
      x == 1 & z == 1 ~ p11
    ),
    
    y = rbinom(N, 1, p_y_given_x_z)
  ) %>% 
  group_by(x, y, z) %>% 
  count() %>% 
  ungroup() %>% 
  mutate(p = n / sum(n))

x	y	z	n	p
0	0	0	10723	0.10723
0	0	1	2020	0.02020
0	1	0	56015	0.56015
0	1	1	2205	0.02205
1	0	0	624	0.00624
1	0	1	10067	0.10067
1	1	0	4470	0.04470
1	1	1	13876	0.13876

In order to apply the causal effect rule, we’ll need .

p_x_given_z <- p_x_y_z %>% 
  group_by(x, z) %>% 
  summarise(n = sum(n)) %>% 
  group_by(z) %>% 
  mutate(p = n / sum(n)) %>% 
  ungroup()

x	z	n	p
0	0	66738	0.9290845
0	1	4225	0.1499929
1	0	5094	0.0709155
1	1	23943	0.8500071

We can then add the conditional probabilities to the joint distribution table, then sum overal all the variables.

p_y_given_do_x <- p_x_y_z %>% 
  inner_join(
    p_x_given_z, 
    by = c('x', 'z'), 
    suffix = c('_num', '_denom')
  ) %>% 
  mutate(p = p_num / p_denom) %>% 
  group_by(x, y) %>%
  summarise(p = sum(p))

x	y	p
0	0	0.2500877
0	1	0.7499123
1	0	0.2064264
1	1	0.7935736

The causal effect estimates are very close to the simulated intervention.

Part c

We can calculate ACE simply by taking the difference of the causal effect estimates.

ace <- p_y_given_do_x %>% 
  spread(x, p) %>% 
  filter(y == 1) %>% 
  mutate(ace = `1` - `0`) %>% 
  pull(ace)

ace

[1] 0.04366134

This is different from the overall probability differences.

p_y_given_x <- p_x_y_z %>% 
  group_by(x, y) %>% 
  summarise(n = sum(n)) %>% 
  group_by(x) %>% 
  mutate(p = n / sum(n)) %>% 
  select(-n) 

risk_difference <- p_y_given_x %>% 
  spread(x, p) %>% 
  filter(y == 1) %>% 
  mutate(rd = `1` - `0`) %>% 
  pull(rd)

risk_difference

[1] -0.188613

Making independent of would minimise the disrepancy between ACE and RD, which would turn the adjustment formula into the formulat for . In other words, setting would do the trick.

Part d

Note that the desegregated causal effects

is 0.04; and
is 0.05,

are both consisent with our calculation for the overall causal effect, ACE = 4.37%. The generated data are an illustration of Simpson’s reversal because the risk difference, -18.9%, has the opposite sign.