[scikit-learn] Label encoding for classifiers and soft targets

Gilles Louppe g.louppe at gmail.com
Mon Mar 13 03:43:29 EDT 2017


Hi Javier,

In the particular case of tree-based models, you can use the soft
labels to set up a multi-output regression problem, which yields an
equivalent classifier (one can show that the variance-reduction and
Gini-index splitting criteria produce the same trees in this setting).

So basically,

from sklearn.ensemble import RandomForestRegressor

reg = RandomForestRegressor()
reg.fit(X, encoded_y)  # encoded_y: (n_samples, n_classes) soft labels

should work.
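
For instance, a minimal end-to-end sketch (the smoothed one-hot targets
below are a synthetic stand-in for soft labels coming from a real model):

import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestRegressor

X, y = make_classification(n_samples=500, n_classes=3, n_informative=5,
                           random_state=0)

# Smoothed one-hot encoding as a stand-in for probabilistic targets.
soft_y = 0.9 * np.eye(3)[y] + 0.1 / 3

reg = RandomForestRegressor(random_state=0).fit(X, soft_y)

proba = reg.predict(X)         # plays the role of predict_proba
pred = proba.argmax(axis=1)    # hard class predictions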

Gilles


On 12 March 2017 at 20:11, Javier López Peña <jlopez at ende.cc> wrote:
>
> On 12 Mar 2017, at 18:38, Gael Varoquaux <gael.varoquaux at normalesup.org>
> wrote:
>
> You can use sample weights to go a bit in this direction (see the
> sketch after the quoted message). But in general, the mathematical
> meaning of your intuitions will depend on the classifier, so there
> will not be a general way of implementing them without a lot of
> tinkering.
>
>
> I see… To be honest, for my purposes it would be enough to bypass the
> target binarization in the MLP classifier, so maybe I will just fork
> my own copy of that class for this.
>
> The purpose is two-fold. On the one hand, to use the probabilities
> generated by a very complex model (e.g. a massive ensemble) to train a
> simpler one that achieves comparable performance at a fraction of the
> cost (sketched after the quoted message). Any universal classifier
> will do (neural networks are the prime example).
>
> The second purpose is to use class probabilities instead of observed
> classes at training time. In some problems this helps with model
> regularization (see Section 6 of [1]).
>
> Cheers,
> J
>
> [1] https://arxiv.org/pdf/1503.02531v1.pdf
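
Regarding Gael's sample-weight suggestion above: one way to make it
concrete is to repeat each sample once per class, give each copy that
class as a hard label, and weight it by the class's probability. A
minimal sketch with synthetic soft targets:

import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
logits = X @ rng.normal(size=(5, 3))
soft_y = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)

n_samples, n_classes = soft_y.shape

# One weighted copy of each sample per class; zero-weight copies are
# harmless.
X_rep = np.repeat(X, n_classes, axis=0)           # rows repeated per class
y_rep = np.tile(np.arange(n_classes), n_samples)  # 0,1,2,0,1,2,...
w_rep = soft_y.ravel()                            # matching probabilities

clf = LogisticRegression(max_iter=1000)
clf.fit(X_rep, y_rep, sample_weight=w_rep)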
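
For the distillation use case, the same regression trick gives a simple
recipe: fit the student on the teacher's predicted probabilities. A
rough sketch (an MLPRegressor student sidesteps MLPClassifier's target
binarization; its raw outputs are not guaranteed to be valid
probabilities, so clip and renormalize them if you need proper ones):

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.neural_network import MLPRegressor

X, y = make_classification(n_samples=1000, n_classes=3, n_informative=6,
                           random_state=0)

# Teacher: a big ensemble whose probabilities the student should mimic.
teacher = RandomForestClassifier(n_estimators=500, random_state=0).fit(X, y)
soft_y = teacher.predict_proba(X)

# Student: a small network trained as a multi-output regressor on the
# teacher's soft targets.
student = MLPRegressor(hidden_layer_sizes=(32,), max_iter=2000,
                       random_state=0).fit(X, soft_y)

pred = student.predict(X).argmax(axis=1)  # hard labels from the student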

