[scikit-learn] Highly cited paper - causal random forests

Sat May 25 06:21:01 EDT 2019

Causal forests are very nice work. However, they address causal
inference rather than prediction, so I am not sure how we could fit
them into the scikit-learn API. Do you have a suggestion?
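
To make the mismatch concrete, here is a rough sketch. scikit-learn
estimators follow fit(X, y) and predict(X), whereas a treatment-effect
estimator also needs the treatment assignment and returns an effect
rather than a prediction of y. The class below is a hypothetical toy
(a plain difference in means, not a causal forest), and the names
DifferenceInMeansEffect, w, and effect are assumptions made up for
illustration, not anything that exists in scikit-learn.

```python
from statistics import mean


class DifferenceInMeansEffect:
    """Hypothetical toy estimator: NOT a causal forest, NOT scikit-learn API."""

    def fit(self, X, y, w):
        # X: covariates (ignored by this toy), y: outcomes,
        # w: 0/1 treatment flags. The third argument is what has no
        # natural place in the usual fit(X, y) signature.
        treated = [yi for yi, wi in zip(y, w) if wi == 1]
        control = [yi for yi, wi in zip(y, w) if wi == 0]
        self.effect_ = mean(treated) - mean(control)
        return self

    def effect(self, X):
        # Returns one estimated treatment effect per row, not a
        # prediction of y. This toy returns the same value for every
        # row; a causal forest would let it vary with X.
        return [self.effect_ for _ in X]


X = [[0.1], [0.4], [0.5], [0.9]]
y = [1.0, 3.0, 2.0, 4.0]
w = [0, 1, 0, 1]

est = DifferenceInMeansEffect().fit(X, y, w)
effects = est.effect(X)
```

The open question is where w would go and what predict should mean
for such an object.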

Cheers,

Gaël

On Fri, May 24, 2019 at 05:21:50PM -0400, Randy Ellis wrote:
> Would this be difficult for a moderate user to implement in sklearn by
> modifying the existing code base?

> Estimation and Inference of Heterogeneous Treatment Effects using Random
> Forests

> 342 citations in less than a year (Google Scholar):
> https://amstat.tandfonline.com/doi/full/10.1080/01621459.2017.1319839

> "In this article, we develop a nonparametric causal forest for estimating
> heterogeneous treatment effects that extends Breiman’s widely used random
> forest algorithm. In the potential outcomes framework with unconfoundedness, we
> show that causal forests are pointwise consistent for the true treatment effect
> and have an asymptotically Gaussian and centered sampling distribution. We also
> discuss a practical method for constructing asymptotic confidence intervals for
> the true treatment effect that are centered at the causal forest estimates. Our
> theoretical results rely on a generic Gaussian theory for a large family of
> random forest algorithms. To our knowledge, this is the first set of results
> that allows any type of random forest, including classification and regression
> forests, to be used for provably valid statistical inference. In experiments,
> we find causal forests to be substantially more powerful than classical methods
> based on nearest-neighbor matching, especially in the presence of irrelevant
> covariates."
-- 
    Gael Varoquaux
    Senior Researcher, INRIA 
    http://gael-varoquaux.info            http://twitter.com/GaelVaroquaux