[scikit-learn] logistic regression results are not stable between solvers
Guillaume Lemaître
g.lemaitre58 at gmail.com
Wed Oct 9 14:25:11 EDT 2019
Could you generate more samples, set the penalty to 'none', reduce the tolerance, and check the coefficients instead of the predictions? This is just to be sure that the difference is not merely a numerical error.
Sent from my phone - sorry for being brief and for any misspellings.
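For concreteness, a minimal sketch of such a check could look like the code below; the sample size, tol and comparison tolerances are illustrative choices, and penalty='none' assumes scikit-learn >= 0.21:
import numpy as np
from sklearn.datasets import make_classification
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
#
# More samples than in the original script, no penalty, tighter tolerance.
X, y = make_classification(n_samples=2000, n_features=45, n_informative=10,
                           n_redundant=0, n_repeated=0, n_classes=2,
                           n_clusters_per_class=1, random_state=2,
                           shuffle=False)
X = StandardScaler().fit_transform(X)
#
coefs = {}
for solver in ('lbfgs', 'saga'):
    clf = LogisticRegression(penalty='none', solver=solver, tol=1e-8,
                             max_iter=50000, random_state=2)
    clf.fit(X, y)
    coefs[solver] = clf.coef_.ravel()
    print(solver, 'n_iter_:', clf.n_iter_)
#
# Compare the fitted coefficients rather than the hard predictions.
print('max abs coef difference:',
      np.abs(coefs['lbfgs'] - coefs['saga']).max())
print('coefficients close:',
      np.allclose(coefs['lbfgs'], coefs['saga'], rtol=1e-3, atol=1e-5))
If the coefficients then agree to within a reasonable tolerance, the differing predictions on the small data set are likely just numerical noise near the decision boundary.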
Original Message
From: benoit.presles at u-bourgogne.fr
Sent: 8 October 2019 20:27
To: scikit-learn at python.org
Reply to: scikit-learn at python.org
Subject: [scikit-learn] logistic regression results are not stable between solvers
Dear scikit-learn users,
I am using logistic regression to make some predictions. On my own data,
I do not get the same results between solvers. I managed to reproduce
this issue on synthetic data (see the code below).
All solvers seem to converge (n_iter_ < max_iter), so why do I get
different results?
If the results are not stable between solvers, which one should I choose?
Best regards,
Ben
------------------------------------------
Here is the code I used to generate synthetic data:
from sklearn.datasets import make_classification
from sklearn.model_selection import StratifiedShuffleSplit
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
#
RANDOM_SEED = 2
#
# Synthetic binary classification problem: 200 samples, 45 features,
# 10 of them informative.
X_sim, y_sim = make_classification(n_samples=200,
                                   n_features=45,
                                   n_informative=10,
                                   n_redundant=0,
                                   n_repeated=0,
                                   n_classes=2,
                                   n_clusters_per_class=1,
                                   random_state=RANDOM_SEED,
                                   shuffle=False)
#
sss = StratifiedShuffleSplit(n_splits=10, test_size=0.2,
                             random_state=RANDOM_SEED)
for train_index_split, test_index_split in sss.split(X_sim, y_sim):
    X_split_train, X_split_test = (X_sim[train_index_split],
                                   X_sim[test_index_split])
    y_split_train, y_split_test = (y_sim[train_index_split],
                                   y_sim[test_index_split])
    # Standardise the features using the training fold only.
    ss = StandardScaler()
    X_split_train = ss.fit_transform(X_split_train)
    X_split_test = ss.transform(X_split_test)
    #
    # Effectively unregularised logistic regression (C=1e9), lbfgs solver.
    classifier_lbfgs = LogisticRegression(fit_intercept=True,
                                          max_iter=20000000, verbose=1,
                                          random_state=RANDOM_SEED, C=1e9,
                                          solver='lbfgs')
    classifier_lbfgs.fit(X_split_train, y_split_train)
    print('classifier lbfgs iter:', classifier_lbfgs.n_iter_)
    # Same model, saga solver.
    classifier_saga = LogisticRegression(fit_intercept=True,
                                         max_iter=20000000, verbose=1,
                                         random_state=RANDOM_SEED, C=1e9,
                                         solver='saga')
    classifier_saga.fit(X_split_train, y_split_train)
    print('classifier saga iter:', classifier_saga.n_iter_)
    #
    y_pred_lbfgs = classifier_lbfgs.predict(X_split_test)
    y_pred_saga = classifier_saga.predict(X_split_test)
    #
    # Stop at the first fold where the two solvers disagree on a prediction.
    if not (y_pred_lbfgs == y_pred_saga).all():
        print('lbfgs does not give the same results as saga :-( !')
        exit()
_______________________________________________
scikit-learn mailing list
scikit-learn at python.org
https://mail.python.org/mailman/listinfo/scikit-learn