logistic regression results are not stable between solvers
Dear scikit-learn users,

I am using logistic regression to make some predictions. On my own data, I do not get the same results between solvers. I managed to reproduce this issue on synthetic data (see the code below). All solvers seem to converge (n_iter_ < max_iter), so why do I get different results? If the results are not stable between solvers, which one should I choose?

Best regards,
Ben

------------------------------------------
Here is the code I used to generate the synthetic data:

from sklearn.datasets import make_classification
from sklearn.model_selection import StratifiedShuffleSplit
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression

RANDOM_SEED = 2

X_sim, y_sim = make_classification(n_samples=200, n_features=45,
                                   n_informative=10, n_redundant=0,
                                   n_repeated=0, n_classes=2,
                                   n_clusters_per_class=1,
                                   random_state=RANDOM_SEED, shuffle=False)

sss = StratifiedShuffleSplit(n_splits=10, test_size=0.2,
                             random_state=RANDOM_SEED)

for train_index_split, test_index_split in sss.split(X_sim, y_sim):
    X_split_train, X_split_test = X_sim[train_index_split], X_sim[test_index_split]
    y_split_train, y_split_test = y_sim[train_index_split], y_sim[test_index_split]

    ss = StandardScaler()
    X_split_train = ss.fit_transform(X_split_train)
    X_split_test = ss.transform(X_split_test)

    classifier_lbfgs = LogisticRegression(fit_intercept=True, max_iter=20000000,
                                          verbose=1, random_state=RANDOM_SEED,
                                          C=1e9, solver='lbfgs')
    classifier_lbfgs.fit(X_split_train, y_split_train)
    print('classifier lbfgs iter:', classifier_lbfgs.n_iter_)

    classifier_saga = LogisticRegression(fit_intercept=True, max_iter=20000000,
                                         verbose=1, random_state=RANDOM_SEED,
                                         C=1e9, solver='saga')
    classifier_saga.fit(X_split_train, y_split_train)
    print('classifier saga iter:', classifier_saga.n_iter_)

    y_pred_lbfgs = classifier_lbfgs.predict(X_split_test)
    y_pred_saga = classifier_saga.predict(X_split_test)

    if not (y_pred_lbfgs == y_pred_saga).all():
        print('lbfgs does not give the same results as saga :-( !')
        exit()
I'm pretty sure SAGA is not converging. Unless you scale the data, SAGA is very slow to converge. On 10/8/19 7:19 PM, Benoît Presles wrote:
_______________________________________________ scikit-learn mailing list scikit-learn@python.org https://mail.python.org/mailman/listinfo/scikit-learn
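As a side note, a minimal sketch of the scaling advice (editor's illustration, not code from the thread; names and parameter values are my own): putting the scaler and the model in a Pipeline guarantees that SAGA always sees standardized features, fit on the training data only.

```python
# Illustrative sketch: scaling inside a Pipeline so SAGA sees standardized
# features; with scaled data SAGA converges in far fewer epochs.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = make_classification(n_samples=200, n_features=45, n_informative=10,
                           n_redundant=0, random_state=2)

clf = make_pipeline(
    StandardScaler(),
    LogisticRegression(solver='saga', max_iter=10000, random_state=2),
)
clf.fit(X, y)
# n_iter_ reports the epochs SAGA actually needed
print('saga epochs:', clf.named_steps['logisticregression'].n_iter_)
```

With the default (moderate) regularization and scaled features, the fit finishes well below max_iter.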
As you can notice in the code below, I do scale the data. I do not get any convergence warning and moreover I always have n_iter_ < max_iter.
On 8 Oct 2019 at 19:51, Andreas Mueller <t3kcit@gmail.com> wrote:

I'm pretty sure SAGA is not converging. Unless you scale the data, SAGA is very slow to converge.
Dear scikit-learn users,

Do you think it is a bug in scikit-learn?

Best regards,
Ben

On 08/10/2019 at 20:19, Benoît Presles wrote:
As you can notice in the code below, I do scale the data. I do not get any convergence warning and moreover I always have n_iter_ < max_iter.
Could you generate more samples, set the penalty to none, reduce the tolerance, and check the coefficients instead of the predictions? This is just to be sure that it is not only a numerical error.

Sent from my phone - sorry for being brief and for potential misspellings.

Original Message
From: benoit.presles@u-bourgogne.fr
Sent: 8 October 2019 20:27
To: scikit-learn@python.org
Reply to: scikit-learn@python.org
Subject: [scikit-learn] logistic regression results are not stable between solvers
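A rough sketch of the check Guillaume suggests (editor's illustration with my own sample sizes; here with the default l2 penalty, where the objective is strongly convex): fit two solvers on the same scaled data and compare the coefficients directly, with an explicit tolerance, rather than comparing hard predictions.

```python
# Compare estimated coefficients (not predictions) across two solvers.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

X, y = make_classification(n_samples=1000, n_features=45, n_informative=10,
                           n_redundant=0, random_state=2)
X = StandardScaler().fit_transform(X)

coefs = {}
for solver in ('lbfgs', 'saga'):
    clf = LogisticRegression(solver=solver, C=1.0, tol=1e-6,
                             max_iter=100000, random_state=2)
    clf.fit(X, y)
    coefs[solver] = clf.coef_.ravel()

# with a tight tolerance both solvers should land on (nearly) the same optimum
max_abs_diff = np.max(np.abs(coefs['lbfgs'] - coefs['saga']))
print('max |coef difference|:', max_abs_diff)
```

Any remaining difference of this size is numerical, not a modeling disagreement.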
Dear scikit-learn users,

I did what you suggested (see the code below) and I still do not get the same results between solvers: I do not get the same predictions and I do not get the same coefficients.

Best regards,
Ben

Here is the new source code:

from sklearn.datasets import make_classification
from sklearn.model_selection import StratifiedShuffleSplit
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression

RANDOM_SEED = 2

X_sim, y_sim = make_classification(n_samples=400, n_features=45,
                                   n_informative=10, n_redundant=0,
                                   n_repeated=0, n_classes=2,
                                   n_clusters_per_class=1,
                                   random_state=RANDOM_SEED, shuffle=False)

sss = StratifiedShuffleSplit(n_splits=10, test_size=0.2,
                             random_state=RANDOM_SEED)

for train_index_split, test_index_split in sss.split(X_sim, y_sim):
    X_split_train, X_split_test = X_sim[train_index_split], X_sim[test_index_split]
    y_split_train, y_split_test = y_sim[train_index_split], y_sim[test_index_split]

    ss = StandardScaler()
    X_split_train = ss.fit_transform(X_split_train)
    X_split_test = ss.transform(X_split_test)

    classifier_lbfgs = LogisticRegression(fit_intercept=True, max_iter=20000000,
                                          verbose=0, random_state=RANDOM_SEED,
                                          C=1e9, solver='lbfgs',
                                          penalty='none', tol=1e-6)
    classifier_lbfgs.fit(X_split_train, y_split_train)
    print('classifier lbfgs iter:', classifier_lbfgs.n_iter_)
    print(classifier_lbfgs.coef_)

    classifier_saga = LogisticRegression(fit_intercept=True, max_iter=20000000,
                                         verbose=0, random_state=RANDOM_SEED,
                                         C=1e9, solver='saga',
                                         penalty='none', tol=1e-6)
    classifier_saga.fit(X_split_train, y_split_train)
    print('classifier saga iter:', classifier_saga.n_iter_)
    print(classifier_saga.coef_)

    y_pred_lbfgs = classifier_lbfgs.predict(X_split_test)
    y_pred_saga = classifier_saga.predict(X_split_test)

    if not (y_pred_lbfgs == y_pred_saga).all():
        print('lbfgs does not give the same results as saga :-( !')
        exit(1)

On 09/10/2019 at 20:25, Guillaume Lemaître wrote:
Could you generate more samples, set the penalty to none, reduce the tolerance, and check the coefficients instead of the predictions? This is just to be sure that it is not only a numerical error.
The predictions across solvers are exactly the same when I run the code. I am using version 0.21.3. What is yours?

In [13]: import sklearn

In [14]: sklearn.__version__
Out[14]: '0.21.3'

Serafeim

On 9 Oct 2019, at 21:44, Benoît Presles <benoit.presles@u-bourgogne.fr> wrote:

(y_pred_lbfgs==y_pred_saga).all() == False
Ben,

I can confirm your results with penalty='none' and C=1e9. In both cases, you are running a mostly unpenalized logistic regression. That is usually less numerically stable than with a small regularization, depending on the collinearity of the data.

Running the same code with
- a larger penalty (smaller C values), or
- a larger number of samples
yields the same coefficients for me (up to some tolerance).

You can also see that SAGA's convergence is not good from the fact that it needs 196000 epochs/iterations to converge. Actually, I have often seen convergence issues with SAG on small datasets (in unit tests); I am not fully sure why.

--
Roman

On 09/10/2019 22:10, serafim loukas wrote:
I slightly changed the benchmark so that it uses a pipeline, and I plotted the coefficients: https://gist.github.com/glemaitre/8fcc24bdfc7dc38ca0c09c56e26b9386

I only see one of the 10 splits where SAGA is not converging; otherwise, the coefficients look very close (I don't attach the figure here, but it can be plotted using the snippet). So apart from this second split, the other differences seem to be numerical instability. Where I do have some concern is the convergence rate of SAGA, but I have no intuition as to whether this is normal or not.

On Wed, 9 Oct 2019 at 23:22, Roman Yurchak <rth.yurchak@gmail.com> wrote:
-- Guillaume Lemaitre Scikit-learn @ Inria Foundation https://glemaitre.github.io/
Hmm, actually increasing to 10000 samples solves the convergence issue. Most probably, SAGA is simply not designed to work with such a small sample size.

On Wed, 9 Oct 2019 at 23:36, Guillaume Lemaître <g.lemaitre58@gmail.com> wrote:
Oops, I did not see Roman's answer, sorry about that. It comes back to the same conclusion :)

On Wed, 9 Oct 2019 at 23:37, Guillaume Lemaître <g.lemaitre58@gmail.com> wrote:
On 10/10/19 1:14 PM, Benoît Presles wrote:
Thanks for your answers.
On my real data I do not have that many samples (a bit more than 200 in total), and I would also like to get results with unpenalized logistic regression. What do you suggest? Should I switch to the lbfgs solver?
Yes.
Am I sure that with this solver I will not have any convergence issues and will always get the correct result? Indeed, I did not get any convergence warning with saga, so I thought everything was fine. I noticed some issues only when I decided to test several solvers. Without comparing the results across solvers, how can I be sure that the optimisation goes well? Shouldn't scikit-learn warn the user somehow if that is not the case?

We should attempt to warn in the SAGA solver if it doesn't converge. That it doesn't raise a convergence warning should probably be considered a bug. It uses the maximum weight change as a stopping criterion right now. We could probably compute the dual objective once at the end to see whether we converged, right? Or is that not possible with SAGA? If not, we might want to caution that no convergence warning will be raised.
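For context, the lbfgs solver does raise a ConvergenceWarning when it hits max_iter. A small sketch (editor's illustration, not code from the thread) of escalating that warning to an error so a silent non-convergence cannot be missed:

```python
# Turn ConvergenceWarning into a hard error inside a controlled scope.
import warnings
from sklearn.datasets import make_classification
from sklearn.exceptions import ConvergenceWarning
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=200, n_features=45, random_state=2)

warned = False
with warnings.catch_warnings():
    warnings.simplefilter('error', ConvergenceWarning)
    try:
        # deliberately tiny max_iter so lbfgs cannot finish
        LogisticRegression(solver='lbfgs', max_iter=2).fit(X, y)
    except ConvergenceWarning:
        warned = True
print('ConvergenceWarning raised:', warned)
```

The complaint in the thread is precisely that saga's stopping criterion does not trigger such a warning, so this guard would not have caught Ben's case.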
Lastly, I was using saga because I also wanted to do some feature selection using an l1 penalty, which is not supported by lbfgs...
You can use liblinear then.
Best regards, Ben
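A hedged sketch of what that could look like (editor's illustration; the parameter values are my own, and saga also supports the l1 penalty): with solver='liblinear' and penalty='l1', many coefficients are driven exactly to zero, which gives the feature selection Ben is after.

```python
# l1-penalized logistic regression with liblinear: sparse coefficients.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

X, y = make_classification(n_samples=200, n_features=45, n_informative=10,
                           n_redundant=0, random_state=2)
X = StandardScaler().fit_transform(X)

clf = LogisticRegression(penalty='l1', solver='liblinear', C=0.1,
                         random_state=2)
clf.fit(X, y)

# features with a non-zero coefficient are the ones the l1 penalty kept
n_selected = int(np.sum(clf.coef_ != 0))
print('features kept by l1:', n_selected, 'of', X.shape[1])
```

Smaller C means stronger regularization and fewer surviving features.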
Dear sklearn users,

I still have some issues with logistic regression. I compared, on the same simulated data, sklearn with three different solvers (lbfgs, saga, liblinear) and statsmodels. When everything goes well, I get the same results from lbfgs, saga, liblinear and statsmodels. When everything goes wrong, all the results are different.

In fact, when everything goes wrong, statsmodels gives me a convergence warning (Warning: Maximum number of iterations has been exceeded. Current function value: inf Iterations: 20000) plus an error (numpy.linalg.LinAlgError: Singular matrix). Why does sklearn not tell me anything? How can I know that I have convergence issues with sklearn?

Thanks for your help,
Best regards,
Ben

--------------------------------------------
Here is the code I used to generate the synthetic data:

from sklearn.datasets import make_classification
from sklearn.model_selection import StratifiedShuffleSplit
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
import statsmodels.api as sm

RANDOM_SEED = 2

X_sim, y_sim = make_classification(n_samples=200, n_features=20,
                                   n_informative=10, n_redundant=0,
                                   n_repeated=0, n_classes=2,
                                   n_clusters_per_class=1,
                                   random_state=RANDOM_SEED, shuffle=False)

sss = StratifiedShuffleSplit(n_splits=10, test_size=0.2,
                             random_state=RANDOM_SEED)

for train_index_split, test_index_split in sss.split(X_sim, y_sim):
    X_split_train, X_split_test = X_sim[train_index_split], X_sim[test_index_split]
    y_split_train, y_split_test = y_sim[train_index_split], y_sim[test_index_split]

    ss = StandardScaler()
    X_split_train = ss.fit_transform(X_split_train)
    X_split_test = ss.transform(X_split_test)

    classifier_lbfgs = LogisticRegression(fit_intercept=True, max_iter=20000000,
                                          verbose=0, random_state=RANDOM_SEED,
                                          C=1e9, solver='lbfgs',
                                          penalty='none', tol=1e-6)
    classifier_lbfgs.fit(X_split_train, y_split_train)
    print('classifier lbfgs iter:', classifier_lbfgs.n_iter_)
    print(classifier_lbfgs.intercept_)
    print(classifier_lbfgs.coef_)

    classifier_saga = LogisticRegression(fit_intercept=True, max_iter=20000000,
                                         verbose=0, random_state=RANDOM_SEED,
                                         C=1e9, solver='saga',
                                         penalty='none', tol=1e-6)
    classifier_saga.fit(X_split_train, y_split_train)
    print('classifier saga iter:', classifier_saga.n_iter_)
    print(classifier_saga.intercept_)
    print(classifier_saga.coef_)

    classifier_liblinear = LogisticRegression(fit_intercept=True, max_iter=20000000,
                                              verbose=0, random_state=RANDOM_SEED,
                                              C=1e9, solver='liblinear',
                                              penalty='l2', tol=1e-6)
    classifier_liblinear.fit(X_split_train, y_split_train)
    print('classifier liblinear iter:', classifier_liblinear.n_iter_)
    print(classifier_liblinear.intercept_)
    print(classifier_liblinear.coef_)

    # statsmodels
    logit = sm.Logit(y_split_train, sm.tools.add_constant(X_split_train))
    logit_res = logit.fit(maxiter=20000)
    print("Coef statsmodels")
    print(logit_res.params)

On 11/10/2019 15:42, Andreas Mueller wrote:
On 10/10/19 1:14 PM, Benoît Presles wrote:
Thanks for your answers.
On my real data, I do not have that many samples. I have a bit more than 200 samples in total, and I would also like to get results with unpenalized logistic regression. What do you suggest? Should I switch to the lbfgs solver?
Yes.
Am I sure that with this solver I will not have any convergence issues and will always get the right result? Indeed, I did not get any convergence warning with saga, so I thought everything was fine. I only noticed the issues when I decided to test several solvers. Without comparing the results across solvers, how can I be sure that the optimisation went well? Shouldn't scikit-learn warn the user somehow if it did not?

We should attempt to warn in the SAGA solver if it doesn't converge. That it doesn't raise a convergence warning should probably be considered a bug. It uses the maximum weight change as a stopping criterion right now. We could probably compute the dual objective once at the end to see whether we converged, right? Or is that not possible with SAGA? If not, we might want to caution that no convergence warning will be raised.
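One practical way to make a silent non-convergence visible, without relying on the console, is to record warnings around fit and check for ConvergenceWarning afterwards. A minimal sketch (the data and the deliberately tiny max_iter are illustrative, not the thread's benchmark):

```python
import warnings

from sklearn.datasets import make_classification
from sklearn.exceptions import ConvergenceWarning
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=200, n_features=20, n_informative=10,
                           random_state=2)

# Record every warning raised during fit; max_iter=2 is far too small,
# so lbfgs has to give up before reaching the tolerance.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    clf = LogisticRegression(solver="lbfgs", max_iter=2).fit(X, y)

# True only if no ConvergenceWarning was emitted during fit.
converged = not any(issubclass(w.category, ConvergenceWarning) for w in caught)
print("converged:", converged)
```

With `warnings.simplefilter("error", ConvergenceWarning)` instead, the warning would be raised as an exception, which is handy in automated pipelines.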
Lastly, I was using saga because I also wanted to do some feature selection using the l1 penalty, which is not supported by lbfgs...
You can use liblinear then.
Best regards, Ben
Le 09/10/2019 à 23:39, Guillaume Lemaître a écrit :
Oops, I did not see Roman's answer. Sorry about that. It comes back to the same conclusion :)
On Wed, 9 Oct 2019 at 23:37, Guillaume Lemaître <g.lemaitre58@gmail.com <mailto:g.lemaitre58@gmail.com>> wrote:
Uhm, actually, increasing to 10000 samples solves the convergence issue. SAGA is most probably not designed to work with such a small sample size.
On Wed, 9 Oct 2019 at 23:36, Guillaume Lemaître <g.lemaitre58@gmail.com <mailto:g.lemaitre58@gmail.com>> wrote:
I slightly changed the benchmark so that it uses a pipeline, and plotted the coefficients:
https://gist.github.com/glemaitre/8fcc24bdfc7dc38ca0c09c56e26b9386
I only see one of the 10 splits where SAGA is not converging; otherwise, the coefficients look very close (I don't attach the figure here, but it can be plotted using the snippet). So apart from that one split, the other differences seem to be numerical instability.
Where I do have some concern is the convergence rate of SAGA, but I have no intuition as to whether this is normal or not.
On Wed, 9 Oct 2019 at 23:22, Roman Yurchak <rth.yurchak@gmail.com <mailto:rth.yurchak@gmail.com>> wrote:
Ben,
I can confirm your results with penalty='none' and C=1e9. In both cases, you are running a mostly unpenalized logistic regression. Usually that's less numerically stable than with a small regularization, depending on the collinearity of the data.
Running that same code with a larger penalty (smaller C values) or a larger number of samples yields the same coefficients for me (up to some tolerance).
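That effect is easy to check directly. As a sketch (the C and tol values below are illustrative, not from the thread's benchmark): with a moderate l2 penalty the objective is strongly convex, so lbfgs and saga should land on essentially the same optimum.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

X, y = make_classification(n_samples=200, n_features=20, n_informative=10,
                           n_redundant=0, n_repeated=0, n_classes=2,
                           n_clusters_per_class=1, random_state=2)
X = StandardScaler().fit_transform(X)

# With C=1.0 the l2 penalty makes the objective strongly convex, so the
# optimum is unique and both solvers converge to (nearly) the same point.
lbfgs = LogisticRegression(solver="lbfgs", C=1.0, tol=1e-8,
                           max_iter=10000).fit(X, y)
saga = LogisticRegression(solver="saga", C=1.0, tol=1e-8,
                          max_iter=100000).fit(X, y)

gap = np.abs(lbfgs.coef_ - saga.coef_).max()
print("max coefficient difference:", gap)
```

Rerunning the same comparison with C=1e9 instead shows the gap widening, which is exactly the instability discussed above.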
You can also see that SAGA's convergence is poor from the fact that it needs 196000 epochs/iterations to converge.
Actually, I have often seen convergence issues with SAG on small datasets (in unit tests), not fully sure why.
-- Roman
On 09/10/2019 22:10, serafim loukas wrote:

The predictions across solvers are exactly the same when I run the code. I am using version 0.21.3. What is yours?

In [13]: import sklearn
In [14]: sklearn.__version__
Out[14]: '0.21.3'

Serafeim

On 9 Oct 2019, at 21:44, Benoît Presles <benoit.presles@u-bourgogne.fr> wrote:

(y_pred_lbfgs==y_pred_saga).all() == False
-- Guillaume Lemaitre Scikit-learn @ Inria Foundation https://glemaitre.github.io/
We do issue a ConvergenceWarning. Can you check n_iter_ to be sure that the solvers actually converged rather than stopping at max_iter?

On Wed, 8 Jan 2020 at 20:53, Benoît Presles <benoit.presles@u-bourgogne.fr> wrote:
-- Guillaume Lemaitre Scikit-learn @ Inria Foundation https://glemaitre.github.io/
With lbfgs, n_iter_ = 48; with saga, n_iter_ = 326581; with liblinear, n_iter_ = 64.

On 08/01/2020 21:18, Guillaume Lemaître wrote:
Hi Ben.

Liblinear and l-bfgs might both converge, but to different solutions, given that liblinear penalizes the intercept. There are also ill-conditioned problems that are hard to detect. My impression of SAGA was that the convergence checks are too loose and we should improve them.

Have you checked the objective of the l-bfgs and liblinear solvers? With ill-conditioned data the objectives could be similar with different solutions.

It's not intended for scikit-learn to warn about ill-conditioned problems, I think, only convergence issues.

Hth,
Andy

On 1/8/20 3:31 PM, Benoît Presles wrote:
Hi Andy,

As you can see in the code, I fixed C=1e9, so the intercept with liblinear is effectively not penalised, and therefore I get the same solutions with these solvers when everything goes well.

How can I check the objective of the l-bfgs and liblinear solvers with sklearn?

Best regards,
Ben

On 08/01/2020 21:53, Andreas Mueller wrote:
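sklearn does not store the final objective value, but for a binary problem it can be recomputed from coef_ and intercept_. A sketch (the logreg_objective helper is hypothetical, not part of sklearn; note also that liblinear additionally penalises the intercept, so its objective differs slightly from the formula below):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

def logreg_objective(clf, X, y, C):
    """C * sum_i log(1 + exp(-t_i * (x_i @ w + b))) + 0.5 * ||w||^2,
    the l2-penalised objective LogisticRegression minimises for binary
    problems, with labels t_i mapped to {-1, +1}."""
    w = clf.coef_.ravel()
    b = clf.intercept_[0]
    t = np.where(y == clf.classes_[1], 1.0, -1.0)
    # log(1 + exp(-m)) computed stably as logaddexp(0, -m)
    losses = np.logaddexp(0.0, -t * (X @ w + b))
    return C * losses.sum() + 0.5 * w @ w

X, y = make_classification(n_samples=200, n_features=20, n_informative=10,
                           random_state=2)
X = StandardScaler().fit_transform(X)

C = 1.0
objectives = {}
for solver in ("lbfgs", "liblinear"):
    clf = LogisticRegression(solver=solver, C=C, tol=1e-8,
                             max_iter=10000).fit(X, y)
    objectives[solver] = logreg_objective(clf, X, y, C)
print(objectives)
```

If two solvers report nearly equal objectives but very different coefficients, that points to an ill-conditioned (nearly flat) problem rather than a solver bug.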
Hi Ben.
Liblinear and l-bfgs might both converge but to different solutions, given that the intercept is penalized. There is also problems with ill-conditioned problems that are hard to detect. My impression of SAGA was that the convergence checks are too loose and we should improve them. Have you checked the objective of the l-bfgs and liblinear solvers? With ill-conditioned data the objectives could be similar with different solutions.
I don't think scikit-learn is intended to warn about ill-conditioned problems, only about convergence issues.
Hth, Andy
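To make Andy's suggestion concrete, here is a sketch of how one could compare the objective value reached by two solvers. The helper name `logreg_objective` and the data are illustrative, not from the thread; sklearn's LogisticRegression minimizes sum_i log(1 + exp(-t_i)) + ||w||^2 / (2*C), with the intercept left unpenalized (except in liblinear).

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

def logreg_objective(clf, X, y, C):
    # Penalized negative log-likelihood at the fitted coefficients.
    z = X @ clf.coef_.ravel() + clf.intercept_[0]
    t = np.where(y == 1, z, -z)           # margins for y in {0, 1}
    loss = np.logaddexp(0.0, -t).sum()    # stable log(1 + exp(-t))
    return loss + np.sum(clf.coef_ ** 2) / (2.0 * C)

X, y = make_classification(n_samples=200, n_features=20, random_state=0)
objs = {}
for solver in ("lbfgs", "liblinear"):
    clf = LogisticRegression(C=1.0, solver=solver, max_iter=10000).fit(X, y)
    objs[solver] = logreg_objective(clf, X, y, C=1.0)
print(objs)
```

If two solvers reach nearly the same objective value with visibly different coefficients, that points to ill-conditioning rather than a solver bug.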
On 1/8/20 3:31 PM, Benoît Presles wrote:
With lbfgs n_iter_ = 48, with saga n_iter_ = 326581, with liblinear n_iter_ = 64.
On 08/01/2020 21:18, Guillaume Lemaître wrote:
We issue a convergence warning. Can you check n_iter_ to be sure that you actually converged rather than stopping at max_iter?
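A minimal sketch of the check Guillaume suggests: compare n_iter_ against max_iter after fitting (the data and settings here are illustrative).

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=200, n_features=20, random_state=0)
clf = LogisticRegression(solver="saga", max_iter=5000).fit(X, y)
# n_iter_ is an array (one entry per class/fit); if its max reaches
# max_iter, the solver stopped on the iteration cap, not on tolerance.
hit_cap = clf.n_iter_.max() >= clf.max_iter
print("n_iter_:", clf.n_iter_, "hit max_iter:", hit_cap)
```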
On Wed, 8 Jan 2020 at 20:53, Benoît Presles <benoit.presles@u-bourgogne.fr <mailto:benoit.presles@u-bourgogne.fr>> wrote:
Dear sklearn users,
I still have some issues concerning logistic regression. I compared, on the same simulated data, sklearn with three different solvers (lbfgs, saga, liblinear) and statsmodels.
When everything goes well, I get the same results between lbfgs, saga, liblinear and statsmodels. When everything goes wrong, all the results are different.
In fact, when everything goes wrong, statsmodels gives me a convergence warning (Warning: Maximum number of iterations has been exceeded. Current function value: inf Iterations: 20000) + an error (numpy.linalg.LinAlgError: Singular matrix).
Why doesn't sklearn tell me anything? How can I know that I have convergence issues with sklearn?
Thanks for your help, Best regards, Ben
--------------------------------------------
Here is the code I used to generate synthetic data:
from sklearn.datasets import make_classification
from sklearn.model_selection import StratifiedShuffleSplit
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
import statsmodels.api as sm
#
RANDOM_SEED = 2
#
X_sim, y_sim = make_classification(n_samples=200, n_features=20, n_informative=10,
                                   n_redundant=0, n_repeated=0, n_classes=2,
                                   n_clusters_per_class=1, random_state=RANDOM_SEED,
                                   shuffle=False)
#
sss = StratifiedShuffleSplit(n_splits=10, test_size=0.2, random_state=RANDOM_SEED)
for train_index_split, test_index_split in sss.split(X_sim, y_sim):
    X_split_train, X_split_test = X_sim[train_index_split], X_sim[test_index_split]
    y_split_train, y_split_test = y_sim[train_index_split], y_sim[test_index_split]
    ss = StandardScaler()
    X_split_train = ss.fit_transform(X_split_train)
    X_split_test = ss.transform(X_split_test)
    #
    classifier_lbfgs = LogisticRegression(fit_intercept=True, max_iter=20000000,
                                          verbose=0, random_state=RANDOM_SEED, C=1e9,
                                          solver='lbfgs', penalty='none', tol=1e-6)
    classifier_lbfgs.fit(X_split_train, y_split_train)
    print('classifier lbfgs iter:', classifier_lbfgs.n_iter_)
    print(classifier_lbfgs.intercept_)
    print(classifier_lbfgs.coef_)
    #
    classifier_saga = LogisticRegression(fit_intercept=True, max_iter=20000000,
                                         verbose=0, random_state=RANDOM_SEED, C=1e9,
                                         solver='saga', penalty='none', tol=1e-6)
    classifier_saga.fit(X_split_train, y_split_train)
    print('classifier saga iter:', classifier_saga.n_iter_)
    print(classifier_saga.intercept_)
    print(classifier_saga.coef_)
    #
    classifier_liblinear = LogisticRegression(fit_intercept=True, max_iter=20000000,
                                              verbose=0, random_state=RANDOM_SEED, C=1e9,
                                              solver='liblinear', penalty='l2', tol=1e-6)
    classifier_liblinear.fit(X_split_train, y_split_train)
    print('classifier liblinear iter:', classifier_liblinear.n_iter_)
    print(classifier_liblinear.intercept_)
    print(classifier_liblinear.coef_)
    # statsmodels
    logit = sm.Logit(y_split_train, sm.tools.add_constant(X_split_train))
    logit_res = logit.fit(maxiter=20000)
    print("Coef statsmodels")
    print(logit_res.params)
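Regarding Ben's question of how to know about convergence issues: one way to make them impossible to miss (a sketch, not an official sklearn recommendation) is to promote ConvergenceWarning to a hard error. The data and the deliberately tiny max_iter below are illustrative.

```python
import warnings
from sklearn.datasets import make_classification
from sklearn.exceptions import ConvergenceWarning
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=200, n_features=20, random_state=0)
failed = False
with warnings.catch_warnings():
    # Turn any ConvergenceWarning raised during fit into an exception.
    warnings.simplefilter("error", ConvergenceWarning)
    try:
        # max_iter=2 is far too small, so lbfgs cannot converge here.
        LogisticRegression(solver="lbfgs", max_iter=2).fit(X, y)
    except ConvergenceWarning:
        failed = True
print("converged:", not failed)
```

Note this only catches cases where a solver actually emits the warning; as discussed below, SAGA's check may be too loose to trigger it.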
On 11/10/2019 15:42, Andreas Mueller wrote:
On 10/10/19 1:14 PM, Benoît Presles wrote:
Thanks for your answers.
On my real data, I do not have so many samples. I have a bit more than 200 samples in total, and I would also like to get some results with unpenalized logistic regression. What do you suggest? Should I switch to the lbfgs solver?
Yes.
Am I sure that with this solver I will not have any convergence issues and will always get the correct result? Indeed, I did not get any convergence warning with saga, so I thought everything was fine. I only noticed issues when I decided to test several solvers. Without comparing the results across solvers, how can I be sure that the optimisation goes well? Shouldn't scikit-learn warn the user somehow if that is not the case?
We should attempt to warn in the SAGA solver if it doesn't converge. That it doesn't raise a convergence warning should probably be considered a bug. It uses the maximum weight change as a stopping criterion right now. We could probably compute the dual objective once in the end to see if we converged, right? Or is that not possible with SAGA? If not, we might want to caution that no convergence warning will be raised.
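Independently of the solver's own stopping criterion, an a-posteriori convergence check is possible: at an optimum of the penalized objective, its gradient should be numerically zero. This is a hedged sketch with illustrative data, for the L2-penalized objective sum(log-loss) + ||w||^2/(2C) with an unpenalized intercept.

```python
import numpy as np
from scipy.special import expit
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=500, n_features=10, random_state=0)
C = 1.0
clf = LogisticRegression(C=C, solver="lbfgs", max_iter=1000).fit(X, y)

w, b = clf.coef_.ravel(), clf.intercept_[0]
p = expit(X @ w + b)                 # predicted P(y = 1), numerically stable
grad_w = X.T @ (p - y) + w / C       # gradient w.r.t. the coefficients
grad_b = np.sum(p - y)               # gradient w.r.t. the (unpenalized) intercept
print("gradient norm:", np.linalg.norm(np.r_[grad_w, grad_b]))
```

A gradient norm far from zero at the returned solution would indicate that the solver stopped prematurely even if no warning was raised.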
Lastly, I was using saga because I also wanted to do some feature selection using the l1 penalty, which is not supported by lbfgs...
You can use liblinear then.
Best regards, Ben
Le 09/10/2019 à 23:39, Guillaume Lemaître a écrit :
Oops, I did not see Roman's answer. Sorry about that. It comes back to the same conclusion :)
On Wed, 9 Oct 2019 at 23:37, Guillaume Lemaître <g.lemaitre58@gmail.com <mailto:g.lemaitre58@gmail.com>> wrote:
Uhm, actually increasing to 10000 samples solves the convergence issue. SAGA is most probably not designed to work with such a small sample size.
On Wed, 9 Oct 2019 at 23:36, Guillaume Lemaître <g.lemaitre58@gmail.com <mailto:g.lemaitre58@gmail.com>> wrote:
I slightly change the bench such that it uses pipeline and plotted the coefficient:
https://gist.github.com/glemaitre/8fcc24bdfc7dc38ca0c09c56e26b9386
I only see one of the 10 splits where SAGA is not converging; otherwise the coefficients look very close (I don't attach the figure here but they can be plotted using the snippet). So apart from this second split, the other differences seem to be numerical instability.
Where I have some concern is the convergence rate of SAGA, but I have no intuition as to whether this is normal or not.
On Wed, 9 Oct 2019 at 23:22, Roman Yurchak <rth.yurchak@gmail.com <mailto:rth.yurchak@gmail.com>> wrote:
Ben,
I can confirm your results with penalty='none' and C=1e9. In both cases, you are running a mostly unpenalized logistic regression. Usually that's less numerically stable than with a small regularization, depending on the data collinearity.
Running that same code with a larger penalty (smaller C values) or a larger number of samples yields the same coefficients for me (up to some tolerance).
You can also see that SAGA convergence is not good by the fact that it needs 196000 epochs/iterations to converge.
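A sketch of Roman's point that moderate regularization makes the solvers agree; the data and settings are illustrative, not the thread's exact setup.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=200, n_features=20, n_informative=10,
                           n_redundant=0, random_state=2)
X = StandardScaler().fit_transform(X)

# Fit the same moderately regularized problem (C=1.0) with two solvers
# and compare the coefficients they reach.
coefs = {}
for solver in ("lbfgs", "saga"):
    clf = LogisticRegression(C=1.0, solver=solver, tol=1e-6,
                             max_iter=100000, random_state=2).fit(X, y)
    coefs[solver] = clf.coef_.ravel()
print("max coef difference:", np.max(np.abs(coefs["lbfgs"] - coefs["saga"])))
```

Rerunning the same comparison with C=1e9 is how one can reproduce the disagreement discussed in the thread.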
Actually, I have often seen convergence issues with SAG on small datasets (in unit tests), not fully sure why.
-- Roman
On 09/10/2019 22:10, serafim loukas wrote:
> The predictions across solvers are exactly the same when I run the code. I am using version 0.21.3. What is yours?
>
> In [13]: import sklearn
> In [14]: sklearn.__version__
> Out[14]: '0.21.3'
>
> Serafeim
>
>> On 9 Oct 2019, at 21:44, Benoît Presles <benoit.presles@u-bourgogne.fr> wrote:
>>
>> (y_pred_lbfgs==y_pred_saga).all() == False
participants (5)
- Andreas Mueller
- Benoît Presles
- Guillaume Lemaître
- Roman Yurchak
- serafim loukas