[scikit-learn] Label encoding for classifiers and soft targets
Javier López Peña
jlopez at ende.cc
Mon Mar 13 08:35:22 EDT 2017
Hi Giles,
thanks for the suggestion!
Training a regression tree would require sticking some kind of
probability normaliser at the end to ensure proper probabilities,
which might hurt sharpness or calibration.
Unfortunately, one of the things I am trying to do
with this is moving away from RF and their humongous
memory requirements…
Anyway, I think I have a fairly good idea of how to modify
the MLPClassifier to get what I need.
When I get around to doing it I’ll drop a line to see if there might be
any interest in pushing the code upstream.
Cheers,
J
> On 13 Mar 2017, at 07:43, Gilles Louppe <g.louppe at gmail.com> wrote:
>
> Hi Javier,
>
> In the particular case of tree-based models, you can use the soft
> labels to create a multi-output regression problem, which would yield
> an equivalent classifier (one can show that reduction of variance and
> the Gini index would yield the same trees).
>
> So basically,
>
> reg = RandomForestRegressor()
> reg.fit(X, encoded_y)
>
> should work.
>
> Gilles
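For context, here is a minimal runnable sketch of the suggestion above. The soft labels here are a hypothetical smoothed one-hot encoding standing in for whatever soft targets you actually have; the dataset and all parameter values are made up for illustration.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestRegressor

# Toy 3-class problem standing in for real data.
X, y = make_classification(n_samples=200, n_classes=3,
                           n_informative=5, random_state=0)

# Hypothetical soft labels: a smoothed one-hot matrix where each
# row is a proper probability distribution (rows sum to 1).
n_classes = 3
encoded_y = np.full((len(y), n_classes), 0.05)
encoded_y[np.arange(len(y)), y] = 0.9

# Multi-output regression on the soft targets, as suggested.
reg = RandomForestRegressor(n_estimators=50, random_state=0)
reg.fit(X, encoded_y)

# Each prediction is an average of training targets in the matching
# leaves, so rows still sum to 1 when the targets themselves do --
# no extra normaliser needed in this case.
proba = reg.predict(X)
print(np.allclose(proba.sum(axis=1), 1.0))
```

Note that this row-sum property holds because averaging preserves the simplex; if the soft targets did not sum to 1, a renormalisation step would be required.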