[scikit-learn] Missing data and decision trees
Jacob Schreiber
jmschreiber91 at gmail.com
Thu Oct 13 14:20:34 EDT 2016
I think Raghav is working on it in this PR:
https://github.com/scikit-learn/scikit-learn/pull/5974
Missing values likely weren't supported initially because handling them
appropriately involves a lot of work and many design choices, so the
discussion of the best approach was postponed until there was a working
estimator that could serve most people's needs.
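
To give a sense of the kind of design choice involved, here is a minimal
sketch of one possible strategy: at each candidate threshold, try sending
the samples with a missing value to the left child and to the right child,
and keep whichever gives the lower weighted Gini impurity. This is only an
illustration under my own assumptions, not necessarily what the PR above
does; the names gini and best_split_with_missing are made up for this
example, and missing values are assumed to be encoded as NaN.

```python
import math


def gini(labels):
    """Gini impurity of a list of class labels (0.0 for an empty list)."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = {}
    for label in labels:
        counts[label] = counts.get(label, 0) + 1
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


def best_split_with_missing(x, y):
    """Hypothetical helper: best split on a single feature with NaNs.

    Returns (threshold, missing_goes_left, weighted_gini), or None if the
    feature has fewer than two distinct observed values.
    """
    observed = [(xi, yi) for xi, yi in zip(x, y) if not math.isnan(xi)]
    missing = [yi for xi, yi in zip(x, y) if math.isnan(xi)]
    values = sorted({xi for xi, _ in observed})
    n = len(y)
    best = None
    # Candidate thresholds are midpoints between consecutive observed values.
    for lo, hi in zip(values, values[1:]):
        threshold = (lo + hi) / 2.0
        left = [yi for xi, yi in observed if xi <= threshold]
        right = [yi for xi, yi in observed if xi > threshold]
        # Design choice under test: which child receives the missing rows?
        for missing_left in (True, False):
            l = left + missing if missing_left else left
            r = right if missing_left else right + missing
            score = (len(l) * gini(l) + len(r) * gini(r)) / n
            if best is None or score < best[2]:
                best = (threshold, missing_left, score)
    return best


nan = float("nan")
x = [1.0, 2.0, nan, 3.0, 4.0, nan]
y = [0, 0, 1, 1, 1, 1]
print(best_split_with_missing(x, y))
```

Even this toy version leaves open questions (what to do at predict time
if a feature was never missing during training, how to break ties, how it
interacts with sample weights), which is roughly why the design
discussion takes a while.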
On Thu, Oct 13, 2016 at 11:14 AM, Stuart Reynolds <stuart at stuartreynolds.net
> wrote:
> I'm looking for a decision tree and RF implementation that supports
> missing data (without imputation) -- ideally in Python, Java/Scala or C++.
>
> It seems that scikit's decision tree algorithm doesn't allow this --
> which is disappointing because it's one of the few methods that should be
> able to sensibly handle problems with high amounts of missingness.
>
> Are there plans to allow missing data in scikit's decision trees?
>
> Also, is there any particular reason why missing values weren't supported
> originally (e.g. integrates poorly with other features)?
>
> Regards
> - Stuart
>