Key Concepts
The proposed C-XGBoost model combines the strong predictive ability of the XGBoost algorithm with the ability of causal inference neural networks to learn representations useful for estimating outcomes in both the treatment and control groups, yielding an effective tree-based ensemble model for causal effect estimation.
Summary
The paper proposes a new causal inference model called C-XGBoost that combines the strengths of tree-based models and neural network-based approaches for estimating causal effects from observational data.
Key highlights:
- C-XGBoost exploits the superior prediction capabilities of the XGBoost algorithm along with the ability of causal inference neural networks to learn representations useful for estimating outcomes in both treatment and control groups.
- The model can efficiently handle features with missing values and includes regularization techniques to avoid overfitting/bias.
- A new loss function is proposed to train the C-XGBoost model.
- Extensive experiments on synthetic and semi-synthetic datasets show that C-XGBoost outperforms state-of-the-art tree-based and neural network-based causal inference models, as measured by the error in the estimated average treatment effect (ATE) and the precision in estimation of heterogeneous effect (PEHE).
- Statistical analysis provides strong evidence of the effectiveness and superiority of the proposed C-XGBoost approach.
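The two evaluation metrics named above can be sketched as follows. This is a minimal illustration using the definitions common in the causal inference literature, not code from the paper: the function names `ate_error` and `pehe` are hypothetical, and whether the paper reports PEHE with or without the square root is an assumption exposed here as the `root` flag. Both metrics need the true potential outcomes for every unit, which is why they are only computable on synthetic or semi-synthetic data.

```python
from math import sqrt
from statistics import fmean


def ate_error(y1_true, y0_true, y1_pred, y0_pred):
    """Absolute difference between the true and the estimated ATE."""
    true_ate = fmean([a - b for a, b in zip(y1_true, y0_true)])
    pred_ate = fmean([a - b for a, b in zip(y1_pred, y0_pred)])
    return abs(true_ate - pred_ate)


def pehe(y1_true, y0_true, y1_pred, y0_pred, root=True):
    """Mean squared error of the per-unit treatment effect estimates.

    With root=True the square root of that mean is returned
    (assumption: the convention used by a given paper may differ).
    """
    true_effects = [a - b for a, b in zip(y1_true, y0_true)]
    pred_effects = [a - b for a, b in zip(y1_pred, y0_pred)]
    mse = fmean([(t - p) ** 2 for t, p in zip(true_effects, pred_effects)])
    return sqrt(mse) if root else mse


# Toy values chosen by hand for illustration only.
y1_true, y0_true = [3.0, 5.0], [1.0, 2.0]   # true effects: 2, 3
y1_pred, y0_pred = [3.0, 4.0], [1.0, 2.0]   # estimated effects: 2, 2
print(ate_error(y1_true, y0_true, y1_pred, y0_pred))
print(pehe(y1_true, y0_true, y1_pred, y0_pred, root=False))
```

A model can score well on ATE error while scoring poorly on PEHE, because per-unit errors of opposite sign cancel in the average but not in the squared term.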
Statistics
The paper evaluates the causal inference models on two collections of datasets, one synthetic and one semi-synthetic:
Synthetic dataset:
- 1000 covariates and 5000 samples
- Generated using a process involving a hidden confounder variable, treatment assignment, and outcome
ACIC dataset:
- Samples from distinct distributions generated with different treatment selection and outcome functions
- 5000 and 10000 samples per dataset
- 5 and 11 datasets randomly selected for the 5000 and 10000 sample sizes, respectively
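To make the synthetic setup more concrete, here is a toy sketch of a generating process with a hidden confounder, a treatment assignment, and an outcome. It does not reproduce the paper's generator: the function name `simulate`, the Gaussian and logistic forms, the constant effect `true_effect`, and the small default sizes are all assumptions chosen for illustration.

```python
import math
import random


def simulate(n_samples=200, n_covariates=5, true_effect=1.0, seed=0):
    """Toy data with a hidden confounder z (hypothetical process)."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n_samples):
        # Hidden confounder: drives covariates, treatment, and outcome,
        # but is never included among the observed features.
        z = rng.gauss(0.0, 1.0)
        # Observed covariates are noisy proxies of z.
        x = [z + rng.gauss(0.0, 1.0) for _ in range(n_covariates)]
        # Treatment assignment: units with larger z are treated more often.
        p_treat = 1.0 / (1.0 + math.exp(-z))
        t = 1 if rng.random() < p_treat else 0
        # Potential outcomes; only the factual one is observed as y.
        y0 = z + rng.gauss(0.0, 0.1)
        y1 = y0 + true_effect
        y = y1 if t == 1 else y0
        rows.append({"x": x, "t": t, "y": y, "y0": y0, "y1": y1})
    return rows


rows = simulate()
treated = [r["y"] for r in rows if r["t"] == 1]
control = [r["y"] for r in rows if r["t"] == 0]
naive = sum(treated) / len(treated) - sum(control) / len(control)
print("naive difference in means:", naive)
```

Because `z` raises both the probability of treatment and the outcome, the naive difference in means is biased away from `true_effect` in expectation, which is the kind of bias a causal inference model is meant to correct using the observed covariates.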