AI/ML News & Innovations Hub

I am estimating the effect of a continuous treatment on a continuous outcome using a two-way fixed effects model with individual and year fixed effects (fixest::feols) and cluster-robust standard errors. I want to estimate heterogeneous effects by sex, a binary, time-invariant grouping variable.

With OLS, I get exactly the behaviour I would expect.

m <- lm(outcome ~ treatment * sex, data = dat)

broom::tidy(m)

marginaleffects::avg_comparisons(
  m,
  variables = "treatment",
  by = "sex"
)

lm(outcome ~ treatment, data = subset(dat, sex == "Female"))
lm(outcome ~ treatment, data = subset(dat, sex == "Male"))

The female marginal effect returned by avg_comparisons() is exactly the same as the coefficient from the female-only regression (-0.00206), and likewise the male marginal effect is exactly the same as the male-only regression (-0.00009).

However, the same is not true with fixed effects.

I estimate

m <- feols(
  outcome ~ treatment * sex | id + year,
  data = dat,
  cluster = ~neighbourhood
)

The fitted coefficients are

treatment                 =  0.00279
treatment:sexFemale       = -0.00932

and therefore

marginaleffects::avg_comparisons(
  m,
  variables = "treatment",
  by = "sex"
)

returns

Female = -0.00652
Male   =  0.00279

However, estimating separate fixed-effects models gives

feols(
  outcome ~ treatment | id + year,
  data = subset(dat, sex == "Female"),
  cluster = ~neighbourhood
)

feols(
  outcome ~ treatment | id + year,
  data = subset(dat, sex == "Male"),
  cluster = ~neighbourhood
)

with estimates

Female = -0.00379
Male   = -0.00034

The subgroup samples are additive:

Female observations + Male observations = pooled observations.
Female individuals + Male individuals = pooled individuals.

I also verified that this is not an issue with avg_comparisons(); without fixed effects in fixest::feols(), the marginal effects and subgroup regressions coincide exactly.

My questions are:

Is it expected that the sex-specific marginal effects from a pooled two-way fixed effects model differ from the coefficients obtained by estimating the same fixed effects model separately by subgroup?
If so, what property of the within estimator causes this difference?
If the substantive question is whether the treatment effect differs by sex, is the pooled interaction model generally preferred over estimating separate subgroup fixed-effects regressions?

And yes, I'm aware the effects are tiny.

Edit: here's a reproducible example

library(tidyverse)
library(fixest)

set.seed(123)

n_id <- 60
years <- 2001:2008

dat <- expand_grid(id = 1:n_id, year = years) |>
    mutate(
        sex = factor(if_else(id <= n_id / 2, "Female", "Male")),
        id_fe = rnorm(n_id)[id],
        t = year - min(years),

        # common year effect
        year_fe = c(-1.2, -0.8, -0.4, 0, 0.4, 0.8, 1.2, 1.6)[t + 1],

        # treatment varies within id over time and differs slightly by sex
        treat = 0.5 * t + if_else(sex == "Female", 0.2 * t, -0.1 * t) + rnorm(n(), sd = 0.5),

        # outcome depends on treatment, id FE, common year FE, and a female-specific year shock
        y = 1 + id_fe + year_fe +
            0.4 * treat - 0.3 * (sex == "Female") * treat +
            rnorm(n(), sd = 0.5)
    )

# pooled interaction model
m_pool <- feols(y ~ treat * sex | id + year, data = dat)

broom::tidy(m_pool)

marginaleffects::avg_comparisons(
    m_pool,
    variables = "treat",
    by = "sex"
)

# subgroup FE regressions
m_female <- feols(y ~ treat | id + year, data = subset(dat, sex == "Female"))
m_male   <- feols(y ~ treat | id + year, data = subset(dat, sex == "Male"))

broom::tidy(m_female)
broom::tidy(m_male)

The variable 'sexMale' has been removed because of collinearity (see $collin.var).

# A tibble: 2 × 5
  term          estimate std.error statistic  p.value
  <chr>            <dbl>     <dbl>     <dbl>    <dbl>
1 treat           0.0453    0.0426      1.06 2.89e-01
2 treat:sexMale   0.348     0.0448      7.76 6.76e-14

    sex Estimate Std. Error    z Pr(>|z|)    S   2.5 % 97.5 %
 Female   0.0453     0.0426 1.06    0.288  1.8 -0.0382  0.129
 Male     0.3929     0.0664 5.92   <0.001 28.2  0.2628  0.523

Term: treat
Type: response
Comparison: +1

# A tibble: 1 × 5
  term  estimate std.error statistic p.value
  <chr>    <dbl>     <dbl>     <dbl>   <dbl>
1 treat   0.0605    0.0747     0.810   0.419

# A tibble: 1 × 5
  term  estimate std.error statistic     p.value
  <chr>    <dbl>     <dbl>     <dbl>       <dbl>
1 treat    0.387    0.0765      5.06 0.000000939
```

Why do subgroup TWFE regressions and an interaction TWFE model give different coefficients, when they agree in plain OLS?