[scikit-learn] Label encoding for classifiers and soft targets

Javier López Peña jlopez at ende.cc
Mon Mar 13 08:35:22 EDT 2017


Hi Giles,

thanks for the suggestion! 

Training a regression tree would require sticking some kind of
probability normaliser on the end to ensure the outputs are proper
probabilities, and that might hurt sharpness or calibration.

Unfortunately, one of the things I am trying to do
with this is moving away from RF and their humongous
memory requirements…

Anyway, I think I have a fairly good idea of how to modify
the MLPClassifier to get what I need.

When I get around to doing it I’ll drop a line to see if there might be
any interest in pushing the code upstream.

Cheers,
J

> On 13 Mar 2017, at 07:43, Gilles Louppe <g.louppe at gmail.com> wrote:
> 
> Hi Javier,
> 
> In the particular case of tree-based models, you can use the soft
> labels to create a multi-output regression problem, which would yield
> an equivalent classifier (one can show that variance reduction and
> the Gini index would yield the same trees).
> 
> So basically,
> 
> from sklearn.ensemble import RandomForestRegressor
> 
> reg = RandomForestRegressor()
> reg.fit(X, encoded_y)
> 
> should work.
> 
> Gilles
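
A minimal, self-contained sketch of the suggestion above: fit a
multi-output RandomForestRegressor on soft (probabilistic) targets,
one column per class, each row summing to 1. The data, smoothing
values, and hyperparameters below are made up purely for illustration.
Because every leaf prediction is an average of target rows that each
sum to 1, the regressor's outputs also sum to 1, so no extra
normaliser is needed.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Toy data: 200 samples, 5 features, 2 classes (illustrative only).
rng = np.random.RandomState(0)
X = rng.randn(200, 5)
hard_y = (X[:, 0] > 0).astype(int)

# Soft targets: one row per sample, one column per class, rows sum to 1.
# Here derived from hard labels plus label smoothing, for illustration.
encoded_y = np.full((200, 2), 0.1)
encoded_y[np.arange(200), hard_y] = 0.9

# Multi-output regression on the soft targets.
reg = RandomForestRegressor(n_estimators=50, random_state=0)
reg.fit(X, encoded_y)

# Predictions are probability vectors: shape (200, 2), rows sum to ~1.
proba = reg.predict(X)
print(proba.shape)
```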


