[scikit-learn] logistic regression results are not stable between solvers
Andreas Mueller
t3kcit at gmail.com
Tue Oct 8 13:51:22 EDT 2019
I'm pretty sure SAGA is not converging. Unless you scale the data, SAGA
is very slow to converge.
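
A minimal sketch of one way to check this, not a definitive diagnosis: fit
both solvers on the same standardized data, record whether a
ConvergenceWarning was raised, and compare the fitted coefficients directly
instead of relying on n_iter_ alone. The helper name fit_and_check and the
values C=1.0, tol=1e-6 and max_iter=10000 are assumptions chosen for
illustration; they do not come from the original post.

```python
import warnings

import numpy as np
from sklearn.datasets import make_classification
from sklearn.exceptions import ConvergenceWarning
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

# Same synthetic data as in the quoted message below.
X, y = make_classification(n_samples=200, n_features=45, n_informative=10,
                           n_redundant=0, n_repeated=0, n_classes=2,
                           n_clusters_per_class=1, random_state=2,
                           shuffle=False)
X = StandardScaler().fit_transform(X)


def fit_and_check(solver, C=1.0, max_iter=10000, tol=1e-6):
    """Hypothetical helper: fit one solver, report if it hit max_iter."""
    clf = LogisticRegression(C=C, solver=solver, max_iter=max_iter,
                             tol=tol, random_state=2)
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        clf.fit(X, y)
    warned = any(issubclass(w.category, ConvergenceWarning) for w in caught)
    return clf, warned


lbfgs, lbfgs_warned = fit_and_check("lbfgs")
saga, saga_warned = fit_and_check("saga")

# Largest absolute difference between the two coefficient vectors.
max_coef_gap = float(np.max(np.abs(lbfgs.coef_ - saga.coef_)))

print("lbfgs: n_iter =", lbfgs.n_iter_, "convergence warning:", lbfgs_warned)
print("saga:  n_iter =", saga.n_iter_, "convergence warning:", saga_warned)
print("max |coef_lbfgs - coef_saga| =", max_coef_gap)
```

A large coefficient gap together with a warning from one solver would suggest
that solver stopped before reaching the optimum. Note that with a very large C
(almost no regularization) the optimum itself may be poorly determined, so the
gap can stay large even after many iterations.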
On 10/8/19 7:19 PM, Benoît Presles wrote:
> Dear scikit-learn users,
>
> I am using logistic regression to make some predictions. On my own
> data, I do not get the same results between solvers. I managed to
> reproduce this issue on synthetic data (see the code below).
> All solvers seem to converge (n_iter_ < max_iter), so why do I get
> different results?
> If results between solvers are not stable, which one should I choose?
>
>
> Best regards,
> Ben
>
> ------------------------------------------
>
> Here is the code I used to generate synthetic data:
>
> from sklearn.datasets import make_classification
> from sklearn.model_selection import StratifiedShuffleSplit
> from sklearn.preprocessing import StandardScaler
> from sklearn.linear_model import LogisticRegression
> #
> RANDOM_SEED = 2
> #
> X_sim, y_sim = make_classification(n_samples=200,
>                                    n_features=45,
>                                    n_informative=10,
>                                    n_redundant=0,
>                                    n_repeated=0,
>                                    n_classes=2,
>                                    n_clusters_per_class=1,
>                                    random_state=RANDOM_SEED,
>                                    shuffle=False)
> #
> sss = StratifiedShuffleSplit(n_splits=10, test_size=0.2,
>                              random_state=RANDOM_SEED)
> for train_index_split, test_index_split in sss.split(X_sim, y_sim):
>     X_split_train, X_split_test = (X_sim[train_index_split],
>                                    X_sim[test_index_split])
>     y_split_train, y_split_test = (y_sim[train_index_split],
>                                    y_sim[test_index_split])
>     ss = StandardScaler()
>     X_split_train = ss.fit_transform(X_split_train)
>     X_split_test = ss.transform(X_split_test)
>     #
>     classifier_lbfgs = LogisticRegression(
>         fit_intercept=True, max_iter=20000000, verbose=1,
>         random_state=RANDOM_SEED, C=1e9, solver='lbfgs')
>     classifier_lbfgs.fit(X_split_train, y_split_train)
>     print('classifier lbfgs iter:', classifier_lbfgs.n_iter_)
>     classifier_saga = LogisticRegression(
>         fit_intercept=True, max_iter=20000000, verbose=1,
>         random_state=RANDOM_SEED, C=1e9, solver='saga')
>     classifier_saga.fit(X_split_train, y_split_train)
>     print('classifier saga iter:', classifier_saga.n_iter_)
>     #
>     y_pred_lbfgs = classifier_lbfgs.predict(X_split_test)
>     y_pred_saga = classifier_saga.predict(X_split_test)
>     #
>     if not (y_pred_lbfgs == y_pred_saga).all():
>         print('lbfgs does not give the same results as saga :-( !')
>         exit()
>