[scikit-learn] Missing data and decision trees

Raphael C drraph at gmail.com
Thu Oct 13 14:33:20 EDT 2016


You can simply create a new binary feature (one per feature that might have a
missing value) that is 1 if the value is missing and 0 otherwise. The RF
can then work out what to do with this information.
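Below is a minimal sketch of this indicator-feature trick. It assumes NumPy is available. The function name `add_missing_indicators` and the `-999.0` placeholder are hypothetical choices made for illustration. At the time of this thread, scikit-learn's trees rejected NaN inputs, so the missing entries in the original columns still need some placeholder. A value outside the observed range lets a tree isolate those rows with a single split, and the indicator columns carry the "was missing" signal explicitly.

```python
import numpy as np


def add_missing_indicators(X, fill_value=-999.0):
    """Append one 0/1 "is missing" column per feature that has any NaN.

    Hypothetical helper: NaNs in the original columns are replaced by
    ``fill_value`` so the result is all-finite, and the appended indicator
    columns record where the gaps were.
    """
    X = np.asarray(X, dtype=float)
    missing = np.isnan(X)                      # True where a value is missing
    cols_with_nan = missing.any(axis=0)        # only features that have gaps
    indicators = missing[:, cols_with_nan].astype(float)
    filled = np.where(missing, fill_value, X)  # arbitrary placeholder, not an estimate
    return np.hstack([filled, indicators])


X = np.array([
    [1.0, np.nan, 3.0],
    [4.0, 5.0, np.nan],
    [7.0, 8.0, 9.0],
])
X_aug = add_missing_indicators(X)
print(X_aug.shape)  # (3, 5)
```

You can then pass the augmented matrix to `sklearn.ensemble.RandomForestClassifier` as usual. Later scikit-learn releases also provide `sklearn.impute.MissingIndicator` and `SimpleImputer(add_indicator=True)`, which build the same kind of indicator columns.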

I don't know how this compares in practice to more sophisticated approaches.

Raphael

On Thursday, October 13, 2016, Stuart Reynolds <stuart at stuartreynolds.net>
wrote:

> I'm looking for a decision tree and RF implementation that supports
> missing data (without imputation) -- ideally in Python, Java/Scala or C++.
>
> It seems that scikit's decision tree algorithm doesn't allow this --
> which is disappointing because it's one of the few methods that should be
> able to sensibly handle problems with high amounts of missingness.
>
> Are there plans to allow missing data in scikit's decision trees?
>
> Also, is there any particular reason why missing values weren't supported
> originally (e.g., because they integrate poorly with other features)?
>
> Regards
> - Stuart
>