Fwd: sample_weight parameter is not split when used in GridSearchCV
Dear all,
I posted the full question on StackOverflow and, as it contains some figures, I refer you to that post:
https://stackoverflow.com/questions/44661926/sample-weight-parameter-shape-error-in-scikit-learn-gridsearchcv/44662285#44662285
I currently believe that this issue is a bug and I have opened an issue on GitHub.
To sum up, the issue is that GridSearchCV does not handle the splitting of the sample_weight vector during cross-validation.
Nota bene: cross_val_score seems to handle the splitting correctly; the issue seems to occur only in GridSearchCV.
Any comments enlightening me and showing me how wrong I am are most welcome.
Hi Manuel, Are you sure that you are using the latest version (or at least >0.17)? The code for splitting the sample weights in GridSearchCV has been there for a while now... -- Julio
_______________________________________________ scikit-learn mailing list scikit-learn@python.org https://mail.python.org/mailman/listinfo/scikit-learn
why are you passing [my_sample_weights] rather than just my_sample_weights?
Hello Antonio,
Sure:
    import sklearn
    print(sklearn.__version__)
    0.18.1
The error suggests that the fit function expects a split vector of size 2/3*1000, but the whole vector (size 1000) is passed.
...
ValueError: Found a sample_weight array with shape (1000,) for an input with shape (666, 1). sample_weight cannot be broadcast.
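The shapes in the traceback are what one would expect if each training fold of a 3-fold split is paired with the unsplit weight vector. The following is only a sketch of that bookkeeping, assuming a plain KFold with 3 splits (what GridSearchCV would, as far as I can tell, use by default for a regressor in 0.18.x); the arrays are synthetic placeholders, not the data from the StackOverflow post.

```python
import numpy as np
from sklearn.model_selection import KFold

n_samples = 1000
X = np.arange(n_samples, dtype=float).reshape(-1, 1)  # placeholder inputs
sample_weight = np.ones(n_samples)                     # placeholder weights

train_shapes = []
weight_shapes = []
for train_idx, test_idx in KFold(n_splits=3).split(X):
    # The weights must be indexed with the same training indices as X;
    # otherwise a (1000,) vector meets a (666, 1) or (667, 1) input.
    X_train = X[train_idx]
    w_train = sample_weight[train_idx]
    train_shapes.append(X_train.shape)
    weight_shapes.append(w_train.shape)

print(train_shapes)   # [(666, 1), (667, 1), (667, 1)]
print(weight_shapes)  # [(666,), (667,), (667,)]
```

The first fold reproduces the (666, 1) input shape from the error message; a fit call on that fold only succeeds if the weights have been cut down to 666 entries as well.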
Dear Joel,
I'm just passing an iterable, as I would with any other sequence of parameters to tune. In this case the list has only one element, but in general I ought to be able to pass a collection of vectors. Anyway, I guess that this is not the cause of the problem.
Dear Joel,
I removed the square brackets and now it works as expected *for a single* sample_weight vector:
    validator = GridSearchCV(my_Regressor,
                             param_grid={'number_of_hidden_neurons': range(4, 5),
                                         'epochs': [50],
                                         },
                             fit_params={'sample_weight': my_sample_weights},
                             n_jobs=1,
                             )
    validator.fit(x, y)
The problem now is that I want to try multiple trainings with multiple sample_weight vectors, in the following fashion:
    validator = GridSearchCV(my_Regressor,
                             param_grid={'number_of_hidden_neurons': range(4, 5),
                                         'epochs': [50],
                                         'sample_weight': [my_sample_weights, my_sample_weights**2],
                                         },
                             fit_params={},
                             n_jobs=1,
                             )
    validator.fit(x, y)
But unfortunately it produces the same error again:
ValueError: Found a sample_weight array with shape (1000,) for an input with shape (666, 1). sample_weight cannot be broadcast.
I guess the issue is that the sample_weight parameter was not meant to be varied during tuning, was it?
Thank you all for your patience and support.
Best
Manolo
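For readers on a newer release than the 0.18.1 used in this thread: to my knowledge, the fit_params constructor argument was later deprecated and fit parameters are now passed to fit() directly. A minimal sketch of the single-vector case under that assumption follows; Ridge and its alpha grid are stand-ins for my_Regressor and its hyperparameters, and the data are synthetic.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV

rng = np.random.default_rng(0)
x = rng.normal(size=(1000, 1))
y = 3.0 * x.ravel() + rng.normal(scale=0.1, size=1000)
my_sample_weights = rng.uniform(0.5, 1.5, size=1000)  # one weight per sample

# Ridge stands in for my_Regressor; any estimator whose fit() accepts
# sample_weight should behave the same way here.
validator = GridSearchCV(Ridge(), param_grid={"alpha": [0.1, 1.0, 10.0]}, cv=3)

# The bare vector (not a list wrapping it) is passed as a fit parameter,
# so the search can slice it alongside x and y for every fold.
validator.fit(x, y, sample_weight=my_sample_weights)
print(validator.best_params_)
```

As in the thread, only a single weight vector can be supplied this way; it is a fit parameter, not an entry of param_grid.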
yes, trying multiple sample weightings is not supported by grid search directly.
Joel is right.
In fact, you usually don't want to tune the sample weights much: you may leave them at their defaults, set them to balance classes, or fix them according to some business rule.
That said, you can always run a couple of grid searches with different sample weights and compare the results afterwards.
-- Julio
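The "run a couple of grid searches and compare" suggestion could look roughly like the sketch below. It assumes a release where fit parameters go to fit(), uses Ridge with an alpha grid as a stand-in estimator, and works on synthetic data; the candidate_weights dictionary and its labels are made up for illustration.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV

rng = np.random.default_rng(0)
x = rng.normal(size=(300, 1))
y = 2.0 * x.ravel() + rng.normal(scale=0.1, size=300)
my_sample_weights = rng.uniform(0.5, 1.5, size=300)

# Hypothetical candidate weightings, keyed by a label for reporting.
candidate_weights = {
    "w": my_sample_weights,
    "w**2": my_sample_weights ** 2,
}

results = {}
for name, weights in candidate_weights.items():
    # One independent grid search per weighting, same grid and same folds.
    search = GridSearchCV(Ridge(), param_grid={"alpha": [0.1, 1.0, 10.0]}, cv=3)
    search.fit(x, y, sample_weight=weights)
    results[name] = (search.best_score_, search.best_params_)

# Pick the weighting whose best cross-validated score is highest.
best_name = max(results, key=lambda k: results[k][0])
print(best_name, results[best_name])
```

With default settings the validation score itself is, as far as I know, computed without weights, so the scores of the different runs are measured on the same footing.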
Yes, I guess most users will be happy without using weights. Some will need to use a single vector, but I am currently researching a weighting method, hence my need to evaluate multiple weight vectors.
I understand that this seems to be a very specific issue with a simple workaround, most likely not worth any programming effort yet, as there are more important issues to address.
I guess that adding a note on this behaviour to the documentation would be great. If some parameters can be iterated over and others are not supported, knowing that gives the user base more solid ground.
I'm committed to spending a few hours studying the code. Should I be successful, I will come back with a pull request. I'll cross my fingers :-)
Best
Manolo
I don't think we'll be accepting a pull request adding this feature to scikit-learn. It is too niche. But you should go ahead and modify the search to operate over weightings for your own research. If you feel the documentation can be clarified, a pull request there is welcome.
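One way to search over weightings without touching the search code itself, offered here only as a sketch and not as something proposed in the thread, is to turn the weighting scheme into a hyperparameter of a small wrapper estimator. The PowerWeightedRegressor class and its weight_power parameter below are hypothetical; Ridge is a stand-in base model, and the example assumes a release where fit parameters are passed to fit().

```python
import numpy as np
from sklearn.base import BaseEstimator, RegressorMixin, clone
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV


class PowerWeightedRegressor(RegressorMixin, BaseEstimator):
    """Hypothetical wrapper: raises the incoming sample_weight to a tunable power."""

    def __init__(self, base_estimator=None, weight_power=1.0):
        self.base_estimator = base_estimator
        self.weight_power = weight_power

    def fit(self, X, y, sample_weight=None):
        base = Ridge() if self.base_estimator is None else self.base_estimator
        self.estimator_ = clone(base)
        if sample_weight is not None:
            # sample_weight arrives already split for the current fold;
            # only the transformation is controlled by the hyperparameter.
            sample_weight = np.asarray(sample_weight) ** self.weight_power
        self.estimator_.fit(X, y, sample_weight=sample_weight)
        return self

    def predict(self, X):
        return self.estimator_.predict(X)


rng = np.random.default_rng(0)
x = rng.normal(size=(300, 1))
y = 2.0 * x.ravel() + rng.normal(scale=0.1, size=300)
my_sample_weights = rng.uniform(0.5, 1.5, size=300)

# weight_power=1 corresponds to my_sample_weights, 2 to my_sample_weights**2.
search = GridSearchCV(
    PowerWeightedRegressor(),
    param_grid={"weight_power": [1.0, 2.0]},
    cv=3,
)
search.fit(x, y, sample_weight=my_sample_weights)
print(search.best_params_)
```

This only covers weightings that can be derived from one base vector by a parameterised transformation; unrelated weight vectors would still need separate searches.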
We could clarify in the documentation that you can grid-search any (hyper) parameter of a model, but not parameters to fit? Only the values returned by get_params() can be tuned. Only "param_grid" will be searched, not "fit_params". "fit_params" can contain only a single setting.
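The distinction between tunable parameters and fit parameters can be checked directly on an estimator. The snippet below is a small illustration using Ridge as an arbitrary example estimator (not the regressor from the thread): sample_weight is an argument of fit(), so it does not show up in get_params() and cannot be set through set_params(), which is the mechanism a grid search relies on.

```python
import numpy as np
from sklearn.linear_model import Ridge

est = Ridge()

# Names that a grid search can put in param_grid for this estimator.
tunable = sorted(est.get_params())
print(tunable)
print("sample_weight" in tunable)  # False

# Trying to set it as if it were a hyperparameter is rejected.
raised = False
try:
    est.set_params(sample_weight=np.ones(10))
except ValueError as exc:
    raised = True
    print("rejected:", type(exc).__name__)
```

Estimators that override set_params() to accept arbitrary keys may not raise here, which could be why the failure reported in this thread surfaced later as a shape error.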
participants (5)
- Andreas Mueller
- Joel Nothman
- Julio Antonio Soto de Vicente
- Manuel Castejón Limas
- Manuel CASTEJÓN LIMAS