[scikit-learn] Missing data and decision trees
drraph at gmail.com
Thu Oct 13 14:33:20 EDT 2016
You can simply add a binary indicator feature (one per feature that might have a
missing value) that is 1 if the value is missing and 0 otherwise. The RF
can then work out what to do with this information.
I don't know how this compares in practice to more sophisticated approaches.
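A minimal sketch of the indicator trick described above, using scikit-learn's modern `sklearn.impute` module (added to the library after this thread; names and the toy data are illustrative assumptions, not from the original post):

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.impute import SimpleImputer

# Toy data with missing values encoded as np.nan.
X = np.array([[1.0, np.nan],
              [2.0, 3.0],
              [np.nan, 4.0],
              [5.0, 6.0]])
y = np.array([0, 1, 0, 1])

# One binary indicator column per original feature:
# 1.0 where the value is missing, 0.0 otherwise.
indicators = np.isnan(X).astype(float)

# Fill the missing entries with a constant so the trees can still split
# on the original columns; the indicator columns carry the
# "was missing" signal.
X_filled = SimpleImputer(strategy="constant", fill_value=0.0).fit_transform(X)
X_aug = np.hstack([X_filled, indicators])

clf = RandomForestClassifier(n_estimators=10, random_state=0).fit(X_aug, y)
print(X_aug.shape)  # original 2 features + 2 indicator columns
```

`SimpleImputer(add_indicator=True)` in recent scikit-learn versions produces the same augmented matrix in one step.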
On Thursday, October 13, 2016, Stuart Reynolds <stuart at stuartreynolds.net> wrote:
> I'm looking for a decision tree and RF implementation that supports
> missing data (without imputation) -- ideally in Python, Java/Scala or C++.
> It seems that scikit's decision tree algorithm doesn't allow this --
> which is disappointing because it's one of the few methods that should be
> able to sensibly handle problems with high amounts of missingness.
> Are there plans to allow missing data in scikit's decision trees?
> Also, is there any particular reason why missing values weren't supported
> originally (e.g. integrates poorly with other features)?
> - Stuart