Label encoding for classifiers and soft targets
Hi there!

I have been experimenting recently with model regularization through the use of soft targets, and I'd like to be able to play with that from sklearn.

The main idea is as follows: imagine I want to fit a (probabilistic) classifier with three possible targets, 0, 1, 2. If I pass my training set (X, y) to a sklearn classifier, the target vector y gets encoded so that each target becomes an array: [1, 0, 0], [0, 1, 0], or [0, 0, 1].

What I would like to do is pass the targets directly in encoded form, and avoid any further encoding. This would allow me, for instance, to pass targets such as [0.9, 0.05, 0.05] if I want to prevent my classifier from becoming too opinionated in its predicted probabilities.

Ideally I would like to do something like this:

```
clf = SomeClassifier(*parameters, encode_targets=False)
```

and then call

```
clf.fit(X, encoded_y)
```

Would it be simple to modify sklearn code to do this, or would it require a lot of tinkering, such as modifying every single classifier under the sun?

Cheers,
J
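For reference, the encoding step described above is what sklearn performs internally (e.g. via LabelBinarizer) before fitting; a minimal sketch of that behaviour, with made-up targets:

```python
# One-hot encoding as sklearn does it internally for integer targets.
# The target values below are arbitrary illustrative data.
from sklearn.preprocessing import LabelBinarizer

y = [0, 1, 2, 1]
lb = LabelBinarizer()
encoded_y = lb.fit_transform(y)
print(encoded_y)
# [[1 0 0]
#  [0 1 0]
#  [0 0 1]
#  [0 1 0]]
```

The request in the email is essentially to be able to hand a classifier an `encoded_y`-shaped array of soft probabilities directly, skipping this step.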
Would it be simple to modify sklearn code to do this, or would it require a lot of tinkering such as modifying every single classifier under the sun?
You can use sample weights to go a bit in this direction. But in general, the mathematical meaning of your intuitions will depend on the classifier, so there will not be general ways of implementing them without a lot of tinkering.
On 12 Mar 2017, at 18:38, Gael Varoquaux <gael.varoquaux@normalesup.org> wrote:
You can use sample weights to go a bit in this direction. But in general, the mathematical meaning of your intuitions will depend on the classifier, so there will not be general ways of implementing them without a lot of tinkering.
I see… to be honest, for my purposes it would be enough to bypass the target binarization for the MLP classifier, so maybe I will just fork my own copy of that class for this.

The purpose is two-fold. On the one hand, use the probabilities generated by a very complex model (e.g. a massive ensemble) to train a simpler one that achieves comparable performance at a fraction of the cost. Any universal classifier will do (neural networks are the prime example).

The second purpose is to use class probabilities instead of observed classes at training time. In some problems this helps with model regularization (see section 6 of [1]).

Cheers,
J

[1] https://arxiv.org/pdf/1503.02531v1.pdf
Hi Javier,

In the particular case of tree-based models, you can use the soft labels to create a multi-output regression problem, which would yield an equivalent classifier (one can show that reduction of variance and the Gini index would yield the same trees).

So basically,

```
reg = RandomForestRegressor()
reg.fit(X, encoded_y)
```

should work.

Gilles

On 12 March 2017 at 20:11, Javier López Peña <jlopez@ende.cc> wrote:
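Gilles's suggestion made concrete, with synthetic stand-ins for X and the soft labels (the data and forest parameters below are made up for illustration):

```python
# Fit a multi-output regressor on soft labels, as suggested above.
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.RandomState(0)
X = rng.rand(100, 4)

# Soft targets: each row is a probability distribution over 3 classes.
raw = rng.rand(100, 3)
encoded_y = raw / raw.sum(axis=1, keepdims=True)

reg = RandomForestRegressor(n_estimators=10, random_state=0)
reg.fit(X, encoded_y)

# Each prediction is an average of normalized training rows,
# so the predicted rows still sum to 1 (up to floating point).
proba = reg.predict(X)
print(np.allclose(proba.sum(axis=1), 1.0))  # True
```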
On 12 Mar 2017, at 18:38, Gael Varoquaux <gael.varoquaux@normalesup.org> wrote:
You can use sample weights to go a bit in this direction. But in general, the mathematical meaning of your intuitions will depend on the classifier, so there will not be general ways of implementing them without a lot of tinkering.
I see… to be honest for my purposes it would be enough to bypass the target binarization for the MLP classifier, so maybe I will just fork my own copy of that class for this.
The purpose is two-fold. On the one hand, use the probabilities generated by a very complex model (e.g. a massive ensemble) to train a simpler one that achieves comparable performance at a fraction of the cost. Any universal classifier will do (neural networks are the prime example).
The second purpose is to use class probabilities instead of observed classes at training time. In some problems this helps with model regularization (see section 6 of [1]).
Cheers, J
[1] https://arxiv.org/pdf/1503.02531v1.pdf
Hi Gilles,

thanks for the suggestion! Training a regression tree would require sticking some kind of probability normaliser at the end to ensure proper probabilities; this might somehow hurt sharpness or calibration.

Unfortunately, one of the things I am trying to do with this is moving away from RF and their humongous memory requirements…

Anyway, I think I have a fairly good idea of how to modify the MLPClassifier to get what I need. When I get around to doing it I'll drop a line to see if there might be any interest in pushing the code upstream.

Cheers,
J
On 13 Mar 2017, at 07:43, Gilles Louppe <g.louppe@gmail.com> wrote:
Hi Javier,
In the particular case of tree-based models, you can use the soft labels to create a multi-output regression problem, which would yield an equivalent classifier (one can show that reduction of variance and the Gini index would yield the same trees).
So basically,
```
reg = RandomForestRegressor()
reg.fit(X, encoded_y)
```
should work.
Gilles
On 03/13/2017 08:35 AM, Javier López Peña wrote:
Training a regression tree would require sticking some kind of probability normaliser at the end to ensure proper probabilities, this might somehow hurt sharpness or calibration.

No, if all the samples are normalized and your aggregation function is sane (like the mean), the output will also be normalized.
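Andreas's point in miniature, on made-up probability rows: the mean of normalized rows is itself normalized, so averaging leaf or tree outputs keeps the probabilities valid without any extra normaliser.

```python
# Averaging rows that each sum to 1 yields a vector that sums to 1.
import numpy as np

probs = np.array([[0.7, 0.2, 0.1],
                  [0.1, 0.6, 0.3],
                  [0.2, 0.2, 0.6]])

aggregated = probs.mean(axis=0)
print(np.isclose(aggregated.sum(), 1.0))  # True
```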
On 13 Mar 2017, at 21:18, Andreas Mueller <t3kcit@gmail.com> wrote:
No, if all the samples are normalized and your aggregation function is sane (like the mean), the output will also be normalized.
You are completely right, I hadn’t checked this for random forests. Still, my purpose is to reduce model complexity, and RF require too much memory to be used in my production environment.
On 03/12/2017 03:11 PM, Javier López Peña wrote:
The purpose is two-fold. On the one hand, use the probabilities generated by a very complex model (e.g. a massive ensemble) to train a simpler one that achieves comparable performance at a fraction of the cost. Any universal classifier will do (neural networks are the prime example).

You could use a regression model with a logistic sigmoid in the output layer.
On 03/13/2017 05:54 PM, Javier López Peña wrote:
You could use a regression model with a logistic sigmoid in the output layer.

By training a regression network with logistic activation the outputs do not add to 1. I just checked on a minimal example on the iris dataset.

Sorry, meant softmax ;)
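The exchange above in one toy computation (the logits are arbitrary illustrative values): per-output sigmoids are independent and need not sum to 1, whereas a softmax over the same logits always does.

```python
# Sigmoid vs softmax on the same output logits.
import numpy as np

logits = np.array([2.0, 1.0, 0.1])

sigmoid = 1.0 / (1.0 + np.exp(-logits))
softmax = np.exp(logits) / np.exp(logits).sum()

print(sigmoid.sum() > 1.0)                 # True: not a distribution
print(np.isclose(softmax.sum(), 1.0))      # True: proper distribution
```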
participants (4)
- Andreas Mueller
- Gael Varoquaux
- Gilles Louppe
- Javier López Peña