From adrin.jalali at gmail.com Thu Jan 2 10:40:35 2020
From: adrin.jalali at gmail.com (Adrin)
Date: Thu, 2 Jan 2020 16:40:35 +0100
Subject: [scikit-learn] Using a new random number generator in libsvm and
	liblinear
Message-ID:
Hi,
liblinear and libsvm use the C `rand()` function, which returns numbers
only up to 32767 on the Windows platform. This PR proposes the following
fix:

*Fixed a convergence issue in ``libsvm`` and ``liblinear`` on Windows
platforms, impacting all related classifiers and regressors. The random
number generator used to randomly select coordinates in the coordinate
descent algorithm was C ``rand()``, which is only able to generate numbers
up to ``32767`` on the Windows platform. It was replaced with C++11
``mt19937``, a Mersenne Twister that correctly generates 31-bit/63-bit
random numbers on all platforms. In addition, the crude "modulo"
postprocessor used to get a random number in a bounded interval was
replaced by the tweaked Lemire method, as suggested by `this blog post`.*

In order to keep the models consistent across platforms, we'd like to use
the same (new) RNG on all platforms, which means that after this change
the generated models may be slightly different from what they are now.
We'd like to hear any concerns on the matter from the community, here or
on the PR, before merging the fix.
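A minimal Python sketch of the bounded-generation scheme described above
(Lemire's multiply-shift method with a rejection step), assuming a 32-bit
source such as ``mt19937``; the PR itself implements this in C++, so the
function name here is purely illustrative:

```python
import random

def bounded_rand(s, rng=random):
    """Uniform integer in [0, s), built from 32-bit draws (Lemire's method).

    Multiply a 32-bit random word by s and keep the high 32 bits;
    a small rejection loop removes the residual modulo bias.
    """
    x = rng.getrandbits(32)          # stand-in for a 32-bit mt19937 draw
    m = x * s                        # 64-bit product
    low = m & 0xFFFFFFFF             # low 32 bits of the product
    if low < s:                      # only then can the result be biased
        threshold = (2**32 - s) % s  # 2^32 mod s
        while low < threshold:       # reject draws in the biased zone
            x = rng.getrandbits(32)
            m = x * s
            low = m & 0xFFFFFFFF
    return m >> 32                   # high 32 bits: uniform in [0, s)

print([bounded_rand(10) for _ in range(8)])  # eight values in [0, 10)
```

Unlike the plain ``rand() % s`` it replaces, the multiply-shift keeps the
full range of the generator and rejects only the few draws that would bias
the result.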
Best,
Adrin.
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
From ruchika.work at gmail.com Thu Jan 2 10:45:25 2020
From: ruchika.work at gmail.com (Ruchika Nayyar)
Date: Thu, 2 Jan 2020 10:45:25 -0500
Subject: [scikit-learn] Using a new random number generator in libsvm
	and liblinear
In-Reply-To:
References:
Message-ID:
OK
On Thu, Jan 2, 2020, 10:42 AM Adrin wrote:
> Hi,
>
> liblinear and libsvm use the C `rand()` function, which returns numbers
> only up to 32767 on the Windows platform. This PR proposes the following
> fix:
>
> *Fixed a convergence issue in ``libsvm`` and ``liblinear`` on Windows
> platforms, impacting all related classifiers and regressors. The random
> number generator used to randomly select coordinates in the coordinate
> descent algorithm was C ``rand()``, which is only able to generate
> numbers up to ``32767`` on the Windows platform. It was replaced with
> C++11 ``mt19937``, a Mersenne Twister that correctly generates
> 31-bit/63-bit random numbers on all platforms. In addition, the crude
> "modulo" postprocessor used to get a random number in a bounded interval
> was replaced by the tweaked Lemire method, as suggested by `this blog
> post`.*
>
> In order to keep the models consistent across platforms, we'd like to
> use the same (new) RNG on all platforms, which means that after this
> change the generated models may be slightly different from what they are
> now. We'd like to hear any concerns on the matter from the community,
> here or on the PR, before merging the fix.
>
> Best,
> Adrin.
> _______________________________________________
> scikit-learn mailing list
> scikit-learn at python.org
> https://mail.python.org/mailman/listinfo/scikit-learn
>
From olivier.grisel at ensta.org Thu Jan 2 12:57:38 2020
From: olivier.grisel at ensta.org (Olivier Grisel)
Date: Thu, 2 Jan 2020 18:57:38 +0100
Subject: [scikit-learn] scikit-learn 0.22.1 is out!
Message-ID:
This is a minor release that includes many bug fixes and solves a
number of packaging issues with Windows wheels in particular. Here is
the full changelog:
https://scikit-learn.org/stable/whats_new/v0.22.html#version-0-22-1
The conda package will follow soon (hopefully).
Thank you very much to all who contributed to this release!
Cheers and happy new year!

Olivier
http://twitter.com/ogrisel  http://github.com/ogrisel
From alexandre.gramfort at inria.fr Sat Jan 4 07:49:50 2020
From: alexandre.gramfort at inria.fr (Alexandre Gramfort)
Date: Sat, 4 Jan 2020 13:49:50 +0100
Subject: [scikit-learn] Using a new random number generator in libsvm
	and liblinear
In-Reply-To:
References:
Message-ID:
I don't foresee any issue with that.
Alex
From gael.varoquaux at normalesup.org Sat Jan 4 15:22:12 2020
From: gael.varoquaux at normalesup.org (Gael Varoquaux)
Date: Sat, 4 Jan 2020 15:22:12 -0500
Subject: [scikit-learn] Using a new random number generator in libsvm
	and liblinear
In-Reply-To:
References:
Message-ID: <20200104202212.gwqww6axlq7shmjt@phare.normalesup.org>
Me neither.
The only drawback that I see is that we have a codebase that is drifting
more and more from upstream. But I think that that ship has sailed.
G
On Sat, Jan 04, 2020 at 01:49:50PM +0100, Alexandre Gramfort wrote:
> I don't foresee any issue with that.
> Alex
> _______________________________________________
> scikit-learn mailing list
> scikit-learn at python.org
> https://mail.python.org/mailman/listinfo/scikit-learn

Gael Varoquaux
Research Director, INRIA Visiting professor, McGill
http://gaelvaroquaux.info http://twitter.com/GaelVaroquaux
From marmochiaskl at gmail.com Mon Jan 6 05:13:40 2020
From: marmochiaskl at gmail.com (Chiara Marmo)
Date: Mon, 6 Jan 2020 11:13:40 +0100
Subject: [scikit-learn] Issues for Berlin and Paris Sprints
Message-ID:
Dear core devs,
First let me wish a Happy New Year to you all!
There will be two scikit-learn sprints in January to start 2020 in a
busy way: one in Berlin [1] (Jan 25) and one in Paris [2] (Jan 28-31).
I feel we could benefit from some coordination in selecting the issues
for those two events. Reshama Shaikh and I are already in touch.
I've opened two projects [3][4] to follow up on the issue selection for
the sprints.
I will check for previous "Sprint" labels in the scikit-learn issues and
may ask for clarification on some of them... please, be patient.
The goal is to prepare the two sprints so as to make the review process
as efficient as possible: we don't want to waste reviewers' time, and we
hope to make the PR experience a learning opportunity on both sides.
In particular, I would like to ask a favour of all of you: I don't know if
this is always possible, but, IMO, it would be really useful to have a
list of two or three reviewers available to check on a specific issue. I
am, personally, a bit uncomfortable pinging core devs at random, under the
impression of crying wolf for attention... If the people in charge are
defined in advance, this could, I think, smooth the review process. What
do you think?
Please let us know if you have any suggestions or recommendations to
improve the sprint organization.
Thanks for listening,
Best,
Chiara
[1] https://github.com/WiMLDS/berlin-2020-scikit-sprint
[2]
https://github.com/scikit-learn/scikit-learn/wiki/Paris-scikit-learn-Sprint-of-the-Decade
[3] https://github.com/WiMLDS/berlin-2020-scikit-sprint/projects/1
[4]
https://github.com/scikit-learn-fondation/Paris-Sprint-January-2020/projects/1
From adrin.jalali at gmail.com Mon Jan 6 10:11:41 2020
From: adrin.jalali at gmail.com (Adrin)
Date: Mon, 6 Jan 2020 16:11:41 +0100
Subject: [scikit-learn] Vote on SLEP010: n_features_in_ attribute
In-Reply-To: <18c5d9630b7a45adbd6d0c9146be58b3@Canary>
References: <4600b19ac06a5ed50f14dbf5a0a7cd5b@gmail.com>
	<18c5d9630b7a45adbd6d0c9146be58b3@Canary>
Message-ID:
According to our governance model, this vote is now closed and accepted,
and the implementation
shall take the concerns mentioned here into account.
Thanks everybody for the attention and the discussion.
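As a rough illustration of what the SLEP proposes (the attribute shipped
in a release after 0.22, so details here reflect the later implementation
rather than this thread): a fitted estimator records the number of input
features it was trained on.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

# n_features_in_ is set during fit and records the width of X;
# calling predict/transform with a different width then raises an error.
X, y = load_iris(return_X_y=True)
est = LogisticRegression(max_iter=1000).fit(X, y)
print(est.n_features_in_)  # iris has 4 features
```

Estimators to which the notion does not apply (e.g. text vectorizers fed
raw strings) are the case the estimator-tag discussion above covers.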
On Sat, Dec 21, 2019 at 6:36 PM Thomas J Fan wrote:
> I am +1. I agree with Joel that we should look into making these methods
> (or maybe functions) usable by external developers.
>
> Thomas
>
> On Monday, Dec 16, 2019 at 4:20 PM, Alexandre Gramfort <
> alexandre.gramfort at inria.fr> wrote:
> +1 on SLEP + adding an estimator tag if it does not apply, e.g. text
> vectorizers etc.
>
> Alex
>
> _______________________________________________
> scikit-learn mailing list
> scikit-learn at python.org
> https://mail.python.org/mailman/listinfo/scikit-learn
>
> _______________________________________________
> scikit-learn mailing list
> scikit-learn at python.org
> https://mail.python.org/mailman/listinfo/scikit-learn
>
From siddharthgupta234 at gmail.com Tue Jan 7 05:25:01 2020
From: siddharthgupta234 at gmail.com (Siddharth Gupta)
Date: Tue, 7 Jan 2020 15:55:01 +0530
Subject: [scikit-learn] Time for Roadmap for the coming years?
Message-ID:
The last roadmap for scikit-learn available on the official website
was posted in 2018. With the onset of the 2020s and Python 2.7 no longer
receiving bug fixes or security support, I wish scikit-learn could come up
with a fresh roadmap for the upcoming years. What are everyone's takes and
suggestions?
Regards
Siddharth Gupta,
Website | LinkedIn | Twitter | Facebook
From adrin.jalali at gmail.com Tue Jan 7 05:33:25 2020
From: adrin.jalali at gmail.com (Adrin)
Date: Tue, 7 Jan 2020 11:33:25 +0100
Subject: [scikit-learn] Time for Roadmap for the coming years?
In-Reply-To:
References:
Message-ID:
Hi,
Although that roadmap was written in 2018, we recently updated it and it
still stands.
Other than that, we also have an issue discussing the version 1.0
milestones: https://github.com/scikit-learn/scikit-learn/issues/14386
Thanks,
Adrin.
On Tue, Jan 7, 2020 at 11:26 AM Siddharth Gupta
wrote:
> The last roadmap for scikit-learn available on the official website
> was posted in 2018. With
> the onset of the 2020s and Python 2.7 no longer receiving bug fixes or
> security support, I wish scikit-learn could come up with a fresh roadmap
> for the upcoming years. What are everyone's takes and suggestions?
>
> Regards
>
> Siddharth Gupta,
> Website | LinkedIn | Twitter | Facebook
>
> _______________________________________________
> scikit-learn mailing list
> scikit-learn at python.org
> https://mail.python.org/mailman/listinfo/scikit-learn
>
From niourf at gmail.com Tue Jan 7 05:35:19 2020
From: niourf at gmail.com (Nicolas Hug)
Date: Tue, 7 Jan 2020 05:35:19 -0500
Subject: [scikit-learn] Time for Roadmap for the coming years?
In-Reply-To:
References:
Message-ID: <51c133dfa1c6cb107663a8ab76266dc4@gmail.com>
The roadmap was updated not so long ago
(https://github.com/scikit-learn/scikit-learn/pull/15332).
On a related note, we recently discussed defining a roadmap for an
eventual 1.0 release:
https://github.com/scikit-learn/scikit-learn/issues/14386
On 1/7/20 5:25 AM, Siddharth Gupta wrote:
> The last roadmap for scikit-learn available on the official website
> was posted in 2018.
> With the onset of the 2020s and Python 2.7 no longer receiving bug fixes
> or security support, I wish scikit-learn could come up with a fresh
> roadmap for the upcoming years. What are everyone's takes and
> suggestions?
>
> Regards
>
> Siddharth Gupta,
> Website | LinkedIn | Twitter | Facebook
>
>
> _______________________________________________
> scikit-learn mailing list
> scikit-learn at python.org
> https://mail.python.org/mailman/listinfo/scikit-learn
From joel.nothman at gmail.com Tue Jan 7 16:33:37 2020
From: joel.nothman at gmail.com (Joel Nothman)
Date: Wed, 8 Jan 2020 08:33:37 +1100
Subject: [scikit-learn] Time for Roadmap for the coming years?
In-Reply-To: <51c133dfa1c6cb107663a8ab76266dc4@gmail.com>
References:
	<51c133dfa1c6cb107663a8ab76266dc4@gmail.com>
Message-ID:
The roadmap includes a statement of purpose as of 2018. I don't think the
core developers consider the roadmap itself very outdated. But thanks for
the reminder. Joel
From benoit.presles at u-bourgogne.fr Wed Jan 8 14:45:59 2020
From: benoit.presles at u-bourgogne.fr (Benoît Presles)
Date: Wed, 8 Jan 2020 20:45:59 +0100
Subject: [scikit-learn] logistic regression results are not stable
	between solvers
In-Reply-To:
References:
	<5591ab4c6a152910c5920c019b1a6600@u-bourgogne.fr>
	<44B72247308C42A4B4E1DFD1BDFC5058@hotmail.com>
	<586c60249bef3ab8513d547913808039@gmail.com>
	<4d4dc37ded57b512fcdf45693ff9e489@u-bourgogne.fr>
Message-ID: <9c18b18c37992da6ec05b9144aa2557a@u-bourgogne.fr>
Dear sklearn users,
I still have some issues concerning logistic regression.
I compared, on the same simulated data, sklearn with three
different solvers (lbfgs, saga, liblinear) and statsmodels.
When everything goes well, I get the same results between lbfgs, saga,
liblinear and statsmodels. When everything goes wrong, all the results
are different.
In fact, when everything goes wrong, statsmodels gives me a convergence
warning (Warning: Maximum number of iterations has been exceeded.
Current function value: inf Iterations: 20000) + an error
(numpy.linalg.LinAlgError: Singular matrix).
Why doesn't sklearn tell me anything? How can I know that I have
convergence issues with sklearn?
Thanks for your help,
Best regards,
Ben

Here is the code I used to generate synthetic data:
from sklearn.datasets import make_classification
from sklearn.model_selection import StratifiedShuffleSplit
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
import statsmodels.api as sm
#
RANDOM_SEED = 2
#
X_sim, y_sim = make_classification(n_samples=200,
                                   n_features=20,
                                   n_informative=10,
                                   n_redundant=0,
                                   n_repeated=0,
                                   n_classes=2,
                                   n_clusters_per_class=1,
                                   random_state=RANDOM_SEED,
                                   shuffle=False)
#
sss = StratifiedShuffleSplit(n_splits=10, test_size=0.2,
                             random_state=RANDOM_SEED)
for train_index_split, test_index_split in sss.split(X_sim, y_sim):
    X_split_train, X_split_test = X_sim[train_index_split], X_sim[test_index_split]
    y_split_train, y_split_test = y_sim[train_index_split], y_sim[test_index_split]
    ss = StandardScaler()
    X_split_train = ss.fit_transform(X_split_train)
    X_split_test = ss.transform(X_split_test)
    #
    classifier_lbfgs = LogisticRegression(fit_intercept=True, max_iter=20000000,
                                          verbose=0, random_state=RANDOM_SEED, C=1e9,
                                          solver='lbfgs', penalty='none', tol=1e-6)
    classifier_lbfgs.fit(X_split_train, y_split_train)
    print('classifier lbfgs iter:', classifier_lbfgs.n_iter_)
    print(classifier_lbfgs.intercept_)
    print(classifier_lbfgs.coef_)
    #
    classifier_saga = LogisticRegression(fit_intercept=True, max_iter=20000000,
                                         verbose=0, random_state=RANDOM_SEED, C=1e9,
                                         solver='saga', penalty='none', tol=1e-6)
    classifier_saga.fit(X_split_train, y_split_train)
    print('classifier saga iter:', classifier_saga.n_iter_)
    print(classifier_saga.intercept_)
    print(classifier_saga.coef_)
    #
    classifier_liblinear = LogisticRegression(fit_intercept=True, max_iter=20000000,
                                              verbose=0, random_state=RANDOM_SEED, C=1e9,
                                              solver='liblinear', penalty='l2', tol=1e-6)
    classifier_liblinear.fit(X_split_train, y_split_train)
    print('classifier liblinear iter:', classifier_liblinear.n_iter_)
    print(classifier_liblinear.intercept_)
    print(classifier_liblinear.coef_)
    # statsmodels
    logit = sm.Logit(y_split_train, sm.tools.add_constant(X_split_train))
    logit_res = logit.fit(maxiter=20000)
    print("Coef statsmodels")
    print(logit_res.params)
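One way to detect convergence problems programmatically, sketched here on
a small hypothetical setup (not the script above): capture the
ConvergenceWarning that scikit-learn emits when a solver hits its
iteration budget, assuming the solver does emit one.

```python
import warnings

from sklearn.datasets import make_classification
from sklearn.exceptions import ConvergenceWarning
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=200, n_features=20, random_state=2)

# deliberately tiny iteration budget so saga cannot converge
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    LogisticRegression(solver="saga", max_iter=5, tol=1e-6).fit(X, y)

hit = any(issubclass(w.category, ConvergenceWarning) for w in caught)
print("convergence warning raised:", hit)
```

`warnings.simplefilter("error", ConvergenceWarning)` would instead turn
the warning into an exception, which is handy in automated pipelines.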
On 11/10/2019 15:42, Andreas Mueller wrote:
>
>
> On 10/10/19 1:14 PM, Benoît Presles wrote:
>>
>> Thanks for your answers.
>>
>> On my real data, I do not have so many samples. I have a bit more
>> than 200 samples in total and I also would like to get some results
>> with unpenalized logistic regression.
>> What do you suggest? Should I switch to the lbfgs solver?
> Yes.
>> Am I sure that with this solver I will not have any convergence issue
>> and always get the good result? Indeed, I did not get any convergence
>> warning with saga, so I thought everything was fine. I noticed some
>> issues only when I decided to test several solvers. Without comparing
>> the results across solvers, how to be sure that the optimisation goes
>> well? Shouldn't scikitlearn warn the user somehow if it is not the case?
> We should attempt to warn in the SAGA solver if it doesn't converge.
> That it doesn't raise a convergence warning should probably be
> considered a bug.
> It uses the maximum weight change as a stopping criterion right now.
> We could probably compute the dual objective once in the end to see if
> we converged, right? Or is that not possible with SAGA? If not, we
> might want to caution that no convergence warning will be raised.
>
>>
>> At last, I was using saga because I also wanted to do some feature
>> selection by using l1 penalty which is not supported by lbfgs...
> You can use liblinear then.
>
>
>>
>> Best regards,
>> Ben
>>
>>
>> Le 09/10/2019 à 23:39, Guillaume Lemaître a écrit :
>>> Ups I did not see the answer of Roman. Sorry about that. It is
>>> coming back to the same conclusion :)
>>>
>>> On Wed, 9 Oct 2019 at 23:37, Guillaume Lemaître wrote:
>>>
>>> Uhm, actually increasing to 10000 samples solves the convergence
>>> issue.
>>> SAGA is most probably not designed to work with such a small
>>> sample size.
>>>
>>> On Wed, 9 Oct 2019 at 23:36, Guillaume Lemaître wrote:
>>>
>>> I slightly changed the benchmark so that it uses a pipeline, and
>>> plotted the coefficients:
>>>
>>> https://gist.github.com/glemaitre/8fcc24bdfc7dc38ca0c09c56e26b9386
>>>
>>> I only see one of the 10 splits where SAGA is not
>>> converging, otherwise the coefficients
>>> look very close (I don't attach the figure here but they can
>>> be plotted using the snippet).
>>> So apart from this second split, the other differences seem
>>> to be numerical instability.
>>>
>>> Where I have some concern is regarding the convergence rate
>>> of SAGA but I have no
>>> intuition to know if this is normal or not.
>>>
>>> On Wed, 9 Oct 2019 at 23:22, Roman Yurchak wrote:
>>>
>>> Ben,
>>>
>>> I can confirm your results with penalty='none' and
>>> C=1e9. In both cases,
>>> you are running a mostly unpenalized logistic
>>> regression. Usually
>>> that's less numerically stable than with a small
>>> regularization,
>>> depending on the data collinearity.
>>>
>>> Running that same code with
>>>  - larger penalty (i.e. smaller C values)
>>>  - or larger number of samples
>>> yields for me the same coefficients (up to some
>>> tolerance).
>>>
>>> You can also see that SAGA convergence is not good by
>>> the fact that it
>>> needs 196000 epochs/iterations to converge.
>>>
>>> Actually, I have often seen convergence issues with SAG
>>> on small
>>> datasets (in unit tests), not fully sure why.
>>>
>>> 
>>> Roman
>>>
>>> On 09/10/2019 22:10, serafim loukas wrote:
>>> > The predictions across solver are exactly the same
>>> when I run the code.
>>> > I am using 0.21.3 version. What is yours?
>>> >
>>> >
>>> > In [13]: import sklearn
>>> >
>>> > In [14]: sklearn.__version__
>>> > Out[14]: '0.21.3'
>>> >
>>> >
>>> > Serafeim
>>> >
>>> >
>>> >
>>> >> On 9 Oct 2019, at 21:44, Benoît Presles
>>> >> <benoit.presles at u-bourgogne.fr> wrote:
>>> >>
>>> >> (y_pred_lbfgs==y_pred_saga).all() == False
>>> >
>>> >
>>> > _______________________________________________
>>> > scikit-learn mailing list
>>> > scikit-learn at python.org
>>> > https://mail.python.org/mailman/listinfo/scikit-learn
>>> >
>>>
>>> _______________________________________________
>>> scikit-learn mailing list
>>> scikit-learn at python.org
>>> https://mail.python.org/mailman/listinfo/scikit-learn
>>>
>>>
>>>
>>> 
>>> Guillaume Lemaitre
>>> Scikit-learn @ Inria Foundation
>>> https://glemaitre.github.io/
>>>
>>>
>>>
>>> 
>>> Guillaume Lemaitre
>>> Scikit-learn @ Inria Foundation
>>> https://glemaitre.github.io/
>>>
>>>
>>>
>>> 
>>> Guillaume Lemaitre
>>> Scikit-learn @ Inria Foundation
>>> https://glemaitre.github.io/
>>>
>>> _______________________________________________
>>> scikit-learn mailing list
>>> scikit-learn at python.org
>>> https://mail.python.org/mailman/listinfo/scikit-learn
>>
>> _______________________________________________
>> scikit-learn mailing list
>> scikit-learn at python.org
>> https://mail.python.org/mailman/listinfo/scikit-learn
>
>
> _______________________________________________
> scikit-learn mailing list
> scikit-learn at python.org
> https://mail.python.org/mailman/listinfo/scikit-learn
From g.lemaitre58 at gmail.com Wed Jan 8 15:18:27 2020
From: g.lemaitre58 at gmail.com (Guillaume Lemaître)
Date: Wed, 8 Jan 2020 21:18:27 +0100
Subject: [scikit-learn] logistic regression results are not stable
	between solvers
In-Reply-To: <9c18b18c37992da6ec05b9144aa2557a@u-bourgogne.fr>
References:
	<5591ab4c6a152910c5920c019b1a6600@u-bourgogne.fr>
	<44B72247308C42A4B4E1DFD1BDFC5058@hotmail.com>
	<586c60249bef3ab8513d547913808039@gmail.com>
	<4d4dc37ded57b512fcdf45693ff9e489@u-bourgogne.fr>
	<9c18b18c37992da6ec05b9144aa2557a@u-bourgogne.fr>
Message-ID:
We do issue convergence warnings. Can you check n_iter_ to be sure that
the solver actually converged to the stated tolerance rather than
stopping at max_iter?
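A minimal sketch of that check, on an illustrative dataset rather than
the script under discussion: if the fitted ``n_iter_`` equals the
``max_iter`` budget, the solver stopped on the budget and not on the
tolerance.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=200, n_features=20, random_state=2)

clf = LogisticRegression(solver="lbfgs", max_iter=10000, tol=1e-6).fit(X, y)

# n_iter_ is an array with one entry per class (one entry for binary);
# hitting max_iter exactly is the telltale sign of non-convergence
n_iter = int(clf.n_iter_[0])
print("iterations:", n_iter, "converged:", n_iter < clf.max_iter)
```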
On Wed, 8 Jan 2020 at 20:53, Benoît Presles
wrote:
> Dear sklearn users,
>
> I still have some issues concerning logistic regression.
> I compared, on the same simulated data, sklearn with three
> different solvers (lbfgs, saga, liblinear) and statsmodels.
>
> When everything goes well, I get the same results between lbfgs, saga,
> liblinear and statsmodels. When everything goes wrong, all the results are
> different.
>
> In fact, when everything goes wrong, statsmodels gives me a convergence
> warning (Warning: Maximum number of iterations has been exceeded. Current
> function value: inf Iterations: 20000) + an error
> (numpy.linalg.LinAlgError: Singular matrix).
>
> Why doesn't sklearn tell me anything? How can I know that I have
> convergence issues with sklearn?
>
>
> Thanks for your help,
> Best regards,
> Ben
>
> 
>
> Here is the code I used to generate synthetic data:
>
> from sklearn.datasets import make_classification
> from sklearn.model_selection import StratifiedShuffleSplit
> from sklearn.preprocessing import StandardScaler
> from sklearn.linear_model import LogisticRegression
> import statsmodels.api as sm
> #
> RANDOM_SEED = 2
> #
> X_sim, y_sim = make_classification(n_samples=200,
>                                    n_features=20,
>                                    n_informative=10,
>                                    n_redundant=0,
>                                    n_repeated=0,
>                                    n_classes=2,
>                                    n_clusters_per_class=1,
>                                    random_state=RANDOM_SEED,
>                                    shuffle=False)
> #
> sss = StratifiedShuffleSplit(n_splits=10, test_size=0.2,
>                              random_state=RANDOM_SEED)
> for train_index_split, test_index_split in sss.split(X_sim, y_sim):
>     X_split_train, X_split_test = X_sim[train_index_split], X_sim[test_index_split]
>     y_split_train, y_split_test = y_sim[train_index_split], y_sim[test_index_split]
>     ss = StandardScaler()
>     X_split_train = ss.fit_transform(X_split_train)
>     X_split_test = ss.transform(X_split_test)
>     #
>     classifier_lbfgs = LogisticRegression(fit_intercept=True, max_iter=20000000,
>                                           verbose=0, random_state=RANDOM_SEED, C=1e9,
>                                           solver='lbfgs', penalty='none', tol=1e-6)
>     classifier_lbfgs.fit(X_split_train, y_split_train)
>     print('classifier lbfgs iter:', classifier_lbfgs.n_iter_)
>     print(classifier_lbfgs.intercept_)
>     print(classifier_lbfgs.coef_)
>     #
>     classifier_saga = LogisticRegression(fit_intercept=True, max_iter=20000000,
>                                          verbose=0, random_state=RANDOM_SEED, C=1e9,
>                                          solver='saga', penalty='none', tol=1e-6)
>     classifier_saga.fit(X_split_train, y_split_train)
>     print('classifier saga iter:', classifier_saga.n_iter_)
>     print(classifier_saga.intercept_)
>     print(classifier_saga.coef_)
>     #
>     classifier_liblinear = LogisticRegression(fit_intercept=True, max_iter=20000000,
>                                               verbose=0, random_state=RANDOM_SEED, C=1e9,
>                                               solver='liblinear', penalty='l2', tol=1e-6)
>     classifier_liblinear.fit(X_split_train, y_split_train)
>     print('classifier liblinear iter:', classifier_liblinear.n_iter_)
>     print(classifier_liblinear.intercept_)
>     print(classifier_liblinear.coef_)
>     # statsmodels
>     logit = sm.Logit(y_split_train, sm.tools.add_constant(X_split_train))
>     logit_res = logit.fit(maxiter=20000)
>     print("Coef statsmodels")
>     print(logit_res.params)
>
>
>
> On 11/10/2019 15:42, Andreas Mueller wrote:
>
>
>
> On 10/10/19 1:14 PM, Benoît Presles wrote:
>
> Thanks for your answers.
> On my real data, I do not have so many samples. I have a bit more than 200
> samples in total and I also would like to get some results with unpenalized
> logistic regression.
> What do you suggest? Should I switch to the lbfgs solver?
>
> Yes.
>
> Am I sure that with this solver I will not have any convergence issue and
> always get the good result? Indeed, I did not get any convergence warning
> with saga, so I thought everything was fine. I noticed some issues only
> when I decided to test several solvers. Without comparing the results
> across solvers, how to be sure that the optimisation goes well? Shouldn't
> scikitlearn warn the user somehow if it is not the case?
>
> We should attempt to warn in the SAGA solver if it doesn't converge. That
> it doesn't raise a convergence warning should probably be considered a bug.
> It uses the maximum weight change as a stopping criterion right now.
> We could probably compute the dual objective once in the end to see if we
> converged, right? Or is that not possible with SAGA? If not, we might want
> to caution that no convergence warning will be raised.
>
>
> At last, I was using saga because I also wanted to do some feature
> selection by using l1 penalty which is not supported by lbfgs...
>
> You can use liblinear then.
>
>
>
> Best regards,
> Ben
>
>
> Le 09/10/2019 à 23:39, Guillaume Lemaître a écrit :
>
> Ups I did not see the answer of Roman. Sorry about that. It is coming back
> to the same conclusion :)
>
> On Wed, 9 Oct 2019 at 23:37, Guillaume Lemaître
> wrote:
>
>> Uhm, actually increasing to 10000 samples solves the convergence issue.
>> SAGA is most probably not designed to work with such a small sample size.
>>
>> On Wed, 9 Oct 2019 at 23:36, Guillaume Lemaître
>> wrote:
>>
>>> I slightly changed the benchmark so that it uses a pipeline, and
>>> plotted the coefficients:
>>>
>>> https://gist.github.com/glemaitre/8fcc24bdfc7dc38ca0c09c56e26b9386
>>>
>>> I only see one of the 10 splits where SAGA is not converging, otherwise
>>> the coefficients
>>> look very close (I don't attach the figure here but they can be plotted
>>> using the snippet).
>>> So apart from this second split, the other differences seem to be
>>> numerical instability.
>>>
>>> Where I have some concern is regarding the convergence rate of SAGA but
>>> I have no
>>> intuition to know if this is normal or not.
>>>
>>> On Wed, 9 Oct 2019 at 23:22, Roman Yurchak
>>> wrote:
>>>
>>>> Ben,
>>>>
>>>> I can confirm your results with penalty='none' and C=1e9. In both
>>>> cases,
>>>> you are running a mostly unpenalized logistic regression. Usually
>>>> that's less numerically stable than with a small regularization,
>>>> depending on the data collinearity.
>>>>
>>>> Running that same code with
>>>>  - larger penalty (i.e. smaller C values)
>>>>  - or larger number of samples
>>>> yields for me the same coefficients (up to some tolerance).
>>>>
>>>> You can also see that SAGA convergence is not good by the fact that it
>>>> needs 196000 epochs/iterations to converge.
>>>>
>>>> Actually, I have often seen convergence issues with SAG on small
>>>> datasets (in unit tests), not fully sure why.
>>>>
>>>> 
>>>> Roman
>>>>
>>>> On 09/10/2019 22:10, serafim loukas wrote:
>>>> > The predictions across solver are exactly the same when I run the
>>>> code.
>>>> > I am using 0.21.3 version. What is yours?
>>>> >
>>>> >
>>>> > In [13]: import sklearn
>>>> >
>>>> > In [14]: sklearn.__version__
>>>> > Out[14]: '0.21.3'
>>>> >
>>>> >
>>>> > Serafeim
>>>> >
>>>> >
>>>> >
>>>> >> On 9 Oct 2019, at 21:44, Benoît Presles
>>>> >> <benoit.presles at u-bourgogne.fr> wrote:
>>>> >>
>>>> >> (y_pred_lbfgs==y_pred_saga).all() == False
>>>> >
>>>> >
>>>> > _______________________________________________
>>>> > scikit-learn mailing list
>>>> > scikit-learn at python.org
>>>> > https://mail.python.org/mailman/listinfo/scikit-learn
>>>> >
>>>>
>>>> _______________________________________________
>>>> scikit-learn mailing list
>>>> scikit-learn at python.org
>>>> https://mail.python.org/mailman/listinfo/scikit-learn
>>>>
>>>
>>>
>>> 
>>> Guillaume Lemaitre
>>> Scikit-learn @ Inria Foundation
>>> https://glemaitre.github.io/
>>>
>>
>>
>> 
>> Guillaume Lemaitre
>> Scikit-learn @ Inria Foundation
>> https://glemaitre.github.io/
>>
>
>
> 
> Guillaume Lemaitre
> Scikit-learn @ Inria Foundation
> https://glemaitre.github.io/
>
> _______________________________________________
> scikit-learn mailing list
> scikit-learn at python.org
> https://mail.python.org/mailman/listinfo/scikit-learn
>
>
> _______________________________________________
> scikit-learn mailing list
> scikit-learn at python.org
> https://mail.python.org/mailman/listinfo/scikit-learn
>
>
>
> _______________________________________________
> scikit-learn mailing list
> scikit-learn at python.org
> https://mail.python.org/mailman/listinfo/scikit-learn
>
> _______________________________________________
> scikit-learn mailing list
> scikit-learn at python.org
> https://mail.python.org/mailman/listinfo/scikit-learn
>

Guillaume Lemaitre
Scikitlearn @ Inria Foundation
https://glemaitre.github.io/
From benoit.presles at u-bourgogne.fr Wed Jan 8 15:31:47 2020
From: benoit.presles at u-bourgogne.fr (Benoît Presles)
Date: Wed, 8 Jan 2020 21:31:47 +0100
Subject: [scikit-learn] logistic regression results are not stable
	between solvers
In-Reply-To:
References:
	<5591ab4c6a152910c5920c019b1a6600@u-bourgogne.fr>
	<44B72247308C42A4B4E1DFD1BDFC5058@hotmail.com>
	<586c60249bef3ab8513d547913808039@gmail.com>
	<4d4dc37ded57b512fcdf45693ff9e489@u-bourgogne.fr>
	<9c18b18c37992da6ec05b9144aa2557a@u-bourgogne.fr>
Message-ID:
With lbfgs n_iter_ = 48, with saga n_iter_ = 326581, with liblinear
n_iter_ = 64.
On 08/01/2020 21:18, Guillaume Lemaître wrote:
> We do issue convergence warnings. Can you check n_iter_ to be sure that
> the solver actually converged to the stated tolerance rather than
> stopping at max_iter?
>
> On Wed, 8 Jan 2020 at 20:53, Benoît Presles
> <benoit.presles at u-bourgogne.fr> wrote:
>
> Dear sklearn users,
>
> I still have some issues concerning logistic regression.
> I compared, on the same simulated data, sklearn with three
> different solvers (lbfgs, saga, liblinear) and statsmodels.
>
> When everything goes well, I get the same results between lbfgs,
> saga, liblinear and statsmodels. When everything goes wrong, all
> the results are different.
>
> In fact, when everything goes wrong, statsmodels gives me a
> convergence warning (Warning: Maximum number of iterations has
> been exceeded. Current function value: inf Iterations: 20000) + an
> error (numpy.linalg.LinAlgError: Singular matrix).
>
> Why doesn't sklearn tell me anything? How can I know that I have
> convergence issues with sklearn?
>
>
> Thanks for your help,
> Best regards,
> Ben
>
> 
>
> Here is the code I used to generate synthetic data:
>
> from sklearn.datasets import make_classification
> from sklearn.model_selection import StratifiedShuffleSplit
> from sklearn.preprocessing import StandardScaler
> from sklearn.linear_model import LogisticRegression
> import statsmodels.api as sm
> #
> RANDOM_SEED = 2
> #
> X_sim, y_sim = make_classification(n_samples=200,
>                                    n_features=20,
>                                    n_informative=10,
>                                    n_redundant=0,
>                                    n_repeated=0,
>                                    n_classes=2,
>                                    n_clusters_per_class=1,
>                                    random_state=RANDOM_SEED,
>                                    shuffle=False)
> #
> sss = StratifiedShuffleSplit(n_splits=10, test_size=0.2,
>                              random_state=RANDOM_SEED)
> for train_index_split, test_index_split in sss.split(X_sim, y_sim):
>     X_split_train, X_split_test = X_sim[train_index_split], X_sim[test_index_split]
>     y_split_train, y_split_test = y_sim[train_index_split], y_sim[test_index_split]
>     ss = StandardScaler()
>     X_split_train = ss.fit_transform(X_split_train)
>     X_split_test = ss.transform(X_split_test)
>     #
>     classifier_lbfgs = LogisticRegression(fit_intercept=True, max_iter=20000000,
>                                           verbose=0, random_state=RANDOM_SEED, C=1e9,
>                                           solver='lbfgs', penalty='none', tol=1e-6)
>     classifier_lbfgs.fit(X_split_train, y_split_train)
>     print('classifier lbfgs iter:', classifier_lbfgs.n_iter_)
>     print(classifier_lbfgs.intercept_)
>     print(classifier_lbfgs.coef_)
>     #
>     classifier_saga = LogisticRegression(fit_intercept=True, max_iter=20000000,
>                                          verbose=0, random_state=RANDOM_SEED, C=1e9,
>                                          solver='saga', penalty='none', tol=1e-6)
>     classifier_saga.fit(X_split_train, y_split_train)
>     print('classifier saga iter:', classifier_saga.n_iter_)
>     print(classifier_saga.intercept_)
>     print(classifier_saga.coef_)
>     #
>     classifier_liblinear = LogisticRegression(fit_intercept=True, max_iter=20000000,
>                                               verbose=0, random_state=RANDOM_SEED, C=1e9,
>                                               solver='liblinear', penalty='l2', tol=1e-6)
>     classifier_liblinear.fit(X_split_train, y_split_train)
>     print('classifier liblinear iter:', classifier_liblinear.n_iter_)
>     print(classifier_liblinear.intercept_)
>     print(classifier_liblinear.coef_)
>     # statsmodels
>     logit = sm.Logit(y_split_train, sm.tools.add_constant(X_split_train))
>     logit_res = logit.fit(maxiter=20000)
>     print("Coef statsmodels")
>     print(logit_res.params)
>
>
>
> On 11/10/2019 15:42, Andreas Mueller wrote:
>>
>>
>> On 10/10/19 1:14 PM, Benoît Presles wrote:
>>>
>>> Thanks for your answers.
>>>
>>> On my real data, I do not have so many samples. I have a bit
>>> more than 200 samples in total and I also would like to get some
>>> results with unpenalized logistic regression.
>>> What do you suggest? Should I switch to the lbfgs solver?
>> Yes.
>>> Am I sure that with this solver I will not have any convergence
>>> issue and always get the good result? Indeed, I did not get any
>>> convergence warning with saga, so I thought everything was fine.
>>> I noticed some issues only when I decided to test several
>>> solvers. Without comparing the results across solvers, how to be
>>> sure that the optimisation goes well? Shouldn't scikit-learn
>>> warn the user somehow if it is not the case?
>> We should attempt to warn in the SAGA solver if it doesn't
>> converge. That it doesn't raise a convergence warning should
>> probably be considered a bug.
>> It uses the maximum weight change as a stopping criterion right now.
>> We could probably compute the dual objective once in the end to
>> see if we converged, right? Or is that not possible with SAGA? If
>> not, we might want to caution that no convergence warning will be
>> raised.
>>
>>>
>>> At last, I was using saga because I also wanted to do some
>>> feature selection by using l1 penalty which is not supported by
>>> lbfgs...
>> You can use liblinear then.
>>
>>
>>>
>>> Best regards,
>>> Ben
>>>
>>>
>>> On 09/10/2019 at 23:39, Guillaume Lemaître wrote:
>>>> Oops, I did not see Roman's answer. Sorry about that. It is
>>>> coming back to the same conclusion :)
>>>>
>>>> On Wed, 9 Oct 2019 at 23:37, Guillaume Lemaître
>>>> > wrote:
>>>>
>>>> Uhm, actually increasing to 10000 samples solves the
>>>> convergence issue.
>>>> SAGA is most probably not designed to work with such a small
>>>> sample size.
>>>>
>>>> On Wed, 9 Oct 2019 at 23:36, Guillaume Lemaître
>>>> > wrote:
>>>>
>>>> I slightly changed the benchmark so that it uses a pipeline
>>>> and plotted the coefficients:
>>>>
>>>> https://gist.github.com/glemaitre/8fcc24bdfc7dc38ca0c09c56e26b9386
>>>>
>>>> I only see one of the 10 splits where SAGA is not
>>>> converging, otherwise the coefficients
>>>> look very close (I don't attach the figure here but
>>>> they can be plotted using the snippet).
>>>> So apart from this second split, the other differences
>>>> seem to be numerical instability.
>>>>
>>>> Where I have some concern is regarding the convergence
>>>> rate of SAGA but I have no
>>>> intuition to know if this is normal or not.
>>>>
>>>> On Wed, 9 Oct 2019 at 23:22, Roman Yurchak
>>>> >
>>>> wrote:
>>>>
>>>> Ben,
>>>>
>>>> I can confirm your results with penalty='none' and
>>>> C=1e9. In both cases,
>>>> you are running a mostly unpenalized logistic
>>>> regression. Usually
>>>> that's less numerically stable than with a small
>>>> regularization,
>>>> depending on the data collinearity.
>>>>
>>>> Running that same code with
>>>>  - larger penalty (= smaller C values)
>>>>  - or larger number of samples
>>>> yields for me the same coefficients (up to some
>>>> tolerance).
>>>>
>>>> You can also see that SAGA convergence is not good
>>>> by the fact that it
>>>> needs 196000 epochs/iterations to converge.
>>>>
>>>> Actually, I have often seen convergence issues with
>>>> SAG on small
>>>> datasets (in unit tests), not fully sure why.
>>>>
>>>> 
>>>> Roman
>>>>
>>>> On 09/10/2019 22:10, serafim loukas wrote:
>>>> > The predictions across solver are exactly the
>>>> same when I run the code.
>>>> > I am using 0.21.3 version. What is yours?
>>>> >
>>>> >
>>>> > In [13]: import sklearn
>>>> >
>>>> > In [14]: sklearn.__version__
>>>> > Out[14]: '0.21.3'
>>>> >
>>>> >
>>>> > Serafeim
>>>> >
>>>> >
>>>> >
>>>> >> On 9 Oct 2019, at 21:44, Benoît Presles
>>>> >>>
>>>> >> >>> >> wrote:
>>>> >>
>>>> >> (y_pred_lbfgs==y_pred_saga).all() == False
>>>> >
>>>> >
>>>> > _______________________________________________
>>>> > scikitlearn mailing list
>>>> > scikitlearn at python.org
>>>>
>>>> > https://mail.python.org/mailman/listinfo/scikitlearn
>>>> >
>>>>
>>>> _______________________________________________
>>>> scikitlearn mailing list
>>>> scikitlearn at python.org
>>>>
>>>> https://mail.python.org/mailman/listinfo/scikitlearn
>>>>
>>>>
>>>>
>>>> 
>>>> Guillaume Lemaitre
>>>> Scikit-learn @ Inria Foundation
>>>> https://glemaitre.github.io/
>>>>
>>>>
>>>> _______________________________________________
>>>> scikitlearn mailing list
>>>> scikitlearn at python.org
>>>> https://mail.python.org/mailman/listinfo/scikitlearn
>>>
>>> _______________________________________________
>>> scikitlearn mailing list
>>> scikitlearn at python.org
>>> https://mail.python.org/mailman/listinfo/scikitlearn
>>
>>
>> _______________________________________________
>> scikitlearn mailing list
>> scikitlearn at python.org
>> https://mail.python.org/mailman/listinfo/scikitlearn
> _______________________________________________
> scikitlearn mailing list
> scikitlearn at python.org
> https://mail.python.org/mailman/listinfo/scikitlearn
>
>
>
> 
> Guillaume Lemaitre
> Scikitlearn @ Inria Foundation
> https://glemaitre.github.io/
>
> _______________________________________________
> scikitlearn mailing list
> scikitlearn at python.org
> https://mail.python.org/mailman/listinfo/scikitlearn
 next part 
An HTML attachment was scrubbed...
URL:
From t3kcit at gmail.com Wed Jan 8 15:53:47 2020
From: t3kcit at gmail.com (Andreas Mueller)
Date: Wed, 8 Jan 2020 15:53:47 -0500
Subject: [scikit-learn] logistic regression results are not stable
between solvers
In-Reply-To:
References:
<5591ab4c6a152910c5920c019b1a6600@u-bourgogne.fr>
<44B72247308C42A4B4E1DFD1BDFC5058@hotmail.com>
<586c60249bef3ab8513d547913808039@gmail.com>
<4d4dc37ded57b512fcdf45693ff9e489@u-bourgogne.fr>
<9c18b18c37992da6ec05b9144aa2557a@u-bourgogne.fr>
Message-ID: <63a991901a1e059dba16a8fcc25a1dc5@gmail.com>
Hi Ben.
Liblinear and lbfgs might both converge, but to different solutions,
given that the intercept is penalized in liblinear.
There are also ill-conditioned problems that are hard to
detect.
My impression of SAGA was that the convergence checks are too loose and
we should improve them.
Have you checked the objective of the lbfgs and liblinear solvers? With
ill-conditioned data the objectives could be similar with different
solutions.
It's not intended for scikit-learn to warn about ill-conditioned
problems, I think, only convergence issues.
Hth,
Andy
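[Editor's note: scikit-learn does not expose the objective value that Andy suggests comparing, but for an l2-penalized logistic regression it can be recomputed from the fitted coefficients. A minimal sketch, assuming a helper `l2_logreg_objective` and a moderate `C=1.0` (both illustrative choices, not from the thread):]

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

def l2_logreg_objective(clf, X, y, C):
    # scikit-learn's l2-penalized logistic objective:
    #   0.5 * ||w||^2 + C * sum_i log(1 + exp(-y_i * (x_i . w + b)))
    # (liblinear also penalizes the intercept, so its objective can
    #  differ slightly from this formula)
    w = clf.coef_.ravel()
    b = clf.intercept_[0]
    yy = np.where(y == 1, 1.0, -1.0)        # labels in {-1, +1}
    margins = yy * (X @ w + b)
    return 0.5 * (w @ w) + C * np.sum(np.log1p(np.exp(-margins)))

X, y = make_classification(n_samples=200, n_features=20, random_state=0)
X = StandardScaler().fit_transform(X)
C = 1.0
obj = {}
for solver in ("lbfgs", "liblinear"):
    clf = LogisticRegression(C=C, solver=solver, max_iter=10000).fit(X, y)
    obj[solver] = l2_logreg_objective(clf, X, y, C)
print(obj)
```

On well-conditioned data the two objective values should agree closely; a large gap between similar objectives with very different coefficients would point at the ill-conditioning Andy describes.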
On 1/8/20 3:31 PM, Benoît Presles wrote:
> With lbfgs n_iter_ = 48, with saga n_iter_ = 326581, with liblinear
> n_iter_ = 64.
>
>
> On 08/01/2020 21:18, Guillaume Lemaître wrote:
>> We issue a convergence warning. Can you check n_iter_ to be sure that
>> you did not stop before converging to the stated tolerance?
>>
>> On Wed, 8 Jan 2020 at 20:53, Benoît Presles
>> > > wrote:
>>
>> Dear sklearn users,
>>
>> I still have some issues concerning logistic regression.
>> I did compare on the same data (simulated data) sklearn with
>> three different solvers (lbfgs, saga, liblinear) and statsmodels.
>>
>> When everything goes well, I get the same results between lbfgs,
>> saga, liblinear and statsmodels. When everything goes wrong, all
>> the results are different.
>>
>> In fact, when everything goes wrong, statsmodels gives me a
>> convergence warning (Warning: Maximum number of iterations has
>> been exceeded. Current function value: inf Iterations: 20000) +
>> an error (numpy.linalg.LinAlgError: Singular matrix).
>>
>> Why sklearn does not tell me anything? How can I know that I have
>> convergence issues with sklearn?
>>
>>
>> Thanks for your help,
>> Best regards,
>> Ben
>>
>> 
>>
>> Here is the code I used to generate synthetic data:
>>
>> from sklearn.datasets import make_classification
>> from sklearn.model_selection import StratifiedShuffleSplit
>> from sklearn.preprocessing import StandardScaler
>> from sklearn.linear_model import LogisticRegression
>> import statsmodels.api as sm
>> #
>> RANDOM_SEED = 2
>> #
>> X_sim, y_sim = make_classification(n_samples=200,
>>                                    n_features=20,
>>                                    n_informative=10,
>>                                    n_redundant=0,
>>                                    n_repeated=0,
>>                                    n_classes=2,
>>                                    n_clusters_per_class=1,
>>                                    random_state=RANDOM_SEED,
>>                                    shuffle=False)
>> #
>> sss = StratifiedShuffleSplit(n_splits=10, test_size=0.2,
>>                              random_state=RANDOM_SEED)
>> for train_index_split, test_index_split in sss.split(X_sim, y_sim):
>>     X_split_train, X_split_test = X_sim[train_index_split], X_sim[test_index_split]
>>     y_split_train, y_split_test = y_sim[train_index_split], y_sim[test_index_split]
>>     ss = StandardScaler()
>>     X_split_train = ss.fit_transform(X_split_train)
>>     X_split_test = ss.transform(X_split_test)
>>     #
>>     classifier_lbfgs = LogisticRegression(fit_intercept=True, max_iter=20000000,
>>                                           verbose=0, random_state=RANDOM_SEED, C=1e9,
>>                                           solver='lbfgs', penalty='none', tol=1e-6)
>>     classifier_lbfgs.fit(X_split_train, y_split_train)
>>     print('classifier lbfgs iter:', classifier_lbfgs.n_iter_)
>>     print(classifier_lbfgs.intercept_)
>>     print(classifier_lbfgs.coef_)
>>     #
>>     classifier_saga = LogisticRegression(fit_intercept=True, max_iter=20000000,
>>                                          verbose=0, random_state=RANDOM_SEED, C=1e9,
>>                                          solver='saga', penalty='none', tol=1e-6)
>>     classifier_saga.fit(X_split_train, y_split_train)
>>     print('classifier saga iter:', classifier_saga.n_iter_)
>>     print(classifier_saga.intercept_)
>>     print(classifier_saga.coef_)
>>     #
>>     classifier_liblinear = LogisticRegression(fit_intercept=True, max_iter=20000000,
>>                                               verbose=0, random_state=RANDOM_SEED, C=1e9,
>>                                               solver='liblinear', penalty='l2', tol=1e-6)
>>     classifier_liblinear.fit(X_split_train, y_split_train)
>>     print('classifier liblinear iter:', classifier_liblinear.n_iter_)
>>     print(classifier_liblinear.intercept_)
>>     print(classifier_liblinear.coef_)
>>     # statsmodels
>>     logit = sm.Logit(y_split_train, sm.tools.add_constant(X_split_train))
>>     logit_res = logit.fit(maxiter=20000)
>>     print("Coef statsmodels")
>>     print(logit_res.params)
>>
>>
>>
>> On 11/10/2019 15:42, Andreas Mueller wrote:
>>>
>>>
>>> On 10/10/19 1:14 PM, Benoît Presles wrote:
>>>>
>>>> Thanks for your answers.
>>>>
>>>> On my real data, I do not have so many samples. I have a bit
>>>> more than 200 samples in total and I also would like to get
>>>> some results with unpenalized logistic regression.
>>>> What do you suggest? Should I switch to the lbfgs solver?
>>> Yes.
>>>> Am I sure that with this solver I will not have any convergence
>>>> issue and always get the good result? Indeed, I did not get any
>>>> convergence warning with saga, so I thought everything was
>>>> fine. I noticed some issues only when I decided to test several
>>>> solvers. Without comparing the results across solvers, how to
>>>> be sure that the optimisation goes well? Shouldn't scikit-learn
>>>> warn the user somehow if it is not the case?
>>> We should attempt to warn in the SAGA solver if it doesn't
>>> converge. That it doesn't raise a convergence warning should
>>> probably be considered a bug.
>>> It uses the maximum weight change as a stopping criterion right now.
>>> We could probably compute the dual objective once in the end to
>>> see if we converged, right? Or is that not possible with SAGA?
>>> If not, we might want to caution that no convergence warning
>>> will be raised.
>>>
>>>>
>>>> At last, I was using saga because I also wanted to do some
>>>> feature selection by using l1 penalty which is not supported by
>>>> lbfgs...
>>> You can use liblinear then.
>>>
>>>
>>>>
>>>> Best regards,
>>>> Ben
>>>>
>>>>
>>>> On 09/10/2019 at 23:39, Guillaume Lemaître wrote:
>>>>> Oops, I did not see Roman's answer. Sorry about that. It is
>>>>> coming back to the same conclusion :)
>>>>>
>>>>> On Wed, 9 Oct 2019 at 23:37, Guillaume Lemaître
>>>>> > wrote:
>>>>>
>>>>> Uhm, actually increasing to 10000 samples solves the
>>>>> convergence issue.
>>>>> SAGA is most probably not designed to work with such a small
>>>>> sample size.
>>>>>
>>>>> On Wed, 9 Oct 2019 at 23:36, Guillaume Lemaître
>>>>> >
>>>>> wrote:
>>>>>
>>>>> I slightly changed the benchmark so that it uses a pipeline
>>>>> and plotted the coefficients:
>>>>>
>>>>> https://gist.github.com/glemaitre/8fcc24bdfc7dc38ca0c09c56e26b9386
>>>>>
>>>>> I only see one of the 10 splits where SAGA is not
>>>>> converging, otherwise the coefficients
>>>>> look very close (I don't attach the figure here but
>>>>> they can be plotted using the snippet).
>>>>> So apart from this second split, the other differences
>>>>> seem to be numerical instability.
>>>>>
>>>>> Where I have some concern is regarding the convergence
>>>>> rate of SAGA but I have no
>>>>> intuition to know if this is normal or not.
>>>>>
>>>>> On Wed, 9 Oct 2019 at 23:22, Roman Yurchak
>>>>> >
>>>>> wrote:
>>>>>
>>>>> Ben,
>>>>>
>>>>> I can confirm your results with penalty='none' and
>>>>> C=1e9. In both cases,
>>>>> you are running a mostly unpenalized logistic
>>>>> regression. Usually
>>>>> that's less numerically stable than with a small
>>>>> regularization,
>>>>> depending on the data collinearity.
>>>>>
>>>>> Running that same code with
>>>>>  - larger penalty (= smaller C values)
>>>>>  - or larger number of samples
>>>>> yields for me the same coefficients (up to some
>>>>> tolerance).
>>>>>
>>>>> You can also see that SAGA convergence is not good
>>>>> by the fact that it
>>>>> needs 196000 epochs/iterations to converge.
>>>>>
>>>>> Actually, I have often seen convergence issues
>>>>> with SAG on small
>>>>> datasets (in unit tests), not fully sure why.
>>>>>
>>>>> 
>>>>> Roman
>>>>>
>>>>> On 09/10/2019 22:10, serafim loukas wrote:
>>>>> > The predictions across solver are exactly the
>>>>> same when I run the code.
>>>>> > I am using 0.21.3 version. What is yours?
>>>>> >
>>>>> >
>>>>> > In [13]: import sklearn
>>>>> >
>>>>> > In [14]: sklearn.__version__
>>>>> > Out[14]: '0.21.3'
>>>>> >
>>>>> >
>>>>> > Serafeim
>>>>> >
>>>>> >
>>>>> >
>>>>> >> On 9 Oct 2019, at 21:44, Benoît Presles
>>>>> >>>>
>>>>> >> >>>> >> wrote:
>>>>> >>
>>>>> >> (y_pred_lbfgs==y_pred_saga).all() == False
>>>>> >
>>>>> >
>>>>> > _______________________________________________
>>>>> > scikitlearn mailing list
>>>>> > scikitlearn at python.org
>>>>>
>>>>> >
>>>>> https://mail.python.org/mailman/listinfo/scikitlearn
>>>>> >
>>>>>
>>>>> _______________________________________________
>>>>> scikitlearn mailing list
>>>>> scikitlearn at python.org
>>>>>
>>>>> https://mail.python.org/mailman/listinfo/scikitlearn
>>>>>
>>>>>
>>>>>
>>>>> 
>>>>> Guillaume Lemaitre
>>>>> Scikit-learn @ Inria Foundation
>>>>> https://glemaitre.github.io/
>>>>>
>>>>>
>>>>> _______________________________________________
>>>>> scikitlearn mailing list
>>>>> scikitlearn at python.org
>>>>> https://mail.python.org/mailman/listinfo/scikitlearn
>>>>
>>>> _______________________________________________
>>>> scikitlearn mailing list
>>>> scikitlearn at python.org
>>>> https://mail.python.org/mailman/listinfo/scikitlearn
>>>
>>>
>>> _______________________________________________
>>> scikitlearn mailing list
>>> scikitlearn at python.org
>>> https://mail.python.org/mailman/listinfo/scikitlearn
>> _______________________________________________
>> scikitlearn mailing list
>> scikitlearn at python.org
>> https://mail.python.org/mailman/listinfo/scikitlearn
>>
>>
>>
>> 
>> Guillaume Lemaitre
>> Scikitlearn @ Inria Foundation
>> https://glemaitre.github.io/
>>
>> _______________________________________________
>> scikitlearn mailing list
>> scikitlearn at python.org
>> https://mail.python.org/mailman/listinfo/scikitlearn
>
> _______________________________________________
> scikitlearn mailing list
> scikitlearn at python.org
> https://mail.python.org/mailman/listinfo/scikitlearn
 next part 
An HTML attachment was scrubbed...
URL:
From pahome.chen at mirlab.org Wed Jan 8 21:22:38 2020
From: pahome.chen at mirlab.org (lampahome)
Date: Thu, 9 Jan 2020 10:22:38 +0800
Subject: [scikit-learn] Why ridge regression can solve multicollinearity?
Message-ID:
I have found many blogs saying that l2 regularization solves
multicollinearity, but they don't say how it works.
I know that LASSO can select features via l1 regularization, so maybe it
can also solve this.
Can anyone tell me why ridge handles multicollinearity well?
thx
 next part 
An HTML attachment was scrubbed...
URL:
From stuart at stuartreynolds.net Wed Jan 8 21:31:23 2020
From: stuart at stuartreynolds.net (Stuart Reynolds)
Date: Wed, 8 Jan 2020 18:31:23 -0800
Subject: [scikit-learn] Why ridge regression can solve multicollinearity?
In-Reply-To:
References:
Message-ID:
Correlated features typically have the property that they tend to be
similarly predictive of the outcome.
L1 and L2 both express a preference for low coefficients.
If a coefficient can be reduced while another coefficient keeps the loss
similar, then these regularization methods prefer that solution.
If you use L1 or L2, you should mean- and variance-normalize your features.
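[Editor's note: the normalization Stuart recommends is usually done inside a pipeline, so the scaling is learned on the training data only. A minimal sketch, assuming an illustrative `make_regression` setup and alpha values:]

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso, Ridge
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# A regression problem where one feature lives on a much larger scale.
X, y = make_regression(n_samples=100, n_features=5, noise=1.0, random_state=0)
X[:, 0] *= 1000.0   # badly scaled feature

# StandardScaler gives every feature zero mean and unit variance before
# the penalty is applied, so no feature is punished for its raw scale.
ridge = make_pipeline(StandardScaler(), Ridge(alpha=1.0)).fit(X, y)
lasso = make_pipeline(StandardScaler(), Lasso(alpha=1.0)).fit(X, y)
print(ridge.score(X, y), lasso.score(X, y))
```

Without the scaler, the l1/l2 penalty would shrink the large-scale feature's coefficient far more aggressively than the others, purely as an artifact of units.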
On Wed, Jan 8, 2020 at 6:24 PM lampahome wrote:
> I have found many blogs saying that l2 regularization solves
> multicollinearity, but they don't say how it works.
>
> I know that LASSO can select features via l1 regularization, so maybe it
> can also solve this.
>
> Can anyone tell me why ridge handles multicollinearity well?
>
> thx
> _______________________________________________
> scikitlearn mailing list
> scikitlearn at python.org
> https://mail.python.org/mailman/listinfo/scikitlearn
>
 next part 
An HTML attachment was scrubbed...
URL:
From pahome.chen at mirlab.org Wed Jan 8 21:38:02 2020
From: pahome.chen at mirlab.org (lampahome)
Date: Thu, 9 Jan 2020 10:38:02 +0800
Subject: [scikit-learn] Why ridge regression can solve multicollinearity?
In-Reply-To:
References:
Message-ID:
Stuart Reynolds wrote on Thu, 9 Jan 2020 at 10:33 AM:
> Correlated features typically have the property that they tend to be
> similarly predictive of the outcome.
>
> L1 and L2 both express a preference for low coefficients.
> If a coefficient can be reduced while another coefficient keeps the loss
> similar, then these regularization methods prefer that solution.
> If you use L1 or L2, you should mean- and variance-normalize your features.
>
>
You mean LASSO and Ridge both solve multicollinearity?
 next part 
An HTML attachment was scrubbed...
URL:
From josef.pktd at gmail.com Wed Jan 8 21:43:54 2020
From: josef.pktd at gmail.com (josef.pktd at gmail.com)
Date: Wed, 8 Jan 2020 21:43:54 -0500
Subject: [scikit-learn] Why ridge regression can solve multicollinearity?
In-Reply-To:
References:
Message-ID:
On Wed, Jan 8, 2020 at 9:38 PM lampahome wrote:
>
>
> Stuart Reynolds wrote on Thu, 9 Jan 2020 at 10:33 AM:
>
>> Correlated features typically have the property that they tend to be
>> similarly predictive of the outcome.
>>
>> L1 and L2 both express a preference for low coefficients.
>> If a coefficient can be reduced while another coefficient keeps the loss
>> similar, then these regularization methods prefer that solution.
>> If you use L1 or L2, you should mean- and variance-normalize your features.
>>
>>
> You mean LASSO and Ridge both solve multicollinearity?
>
LASSO has the reputation of not working well when there is multicollinearity;
that's why elastic net (L1 + L2) was introduced, AFAIK.
With multicollinearity the length of the parameter vector, beta' beta, is
too large and L2, Ridge shrinks it.
Josef
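[Editor's note: Josef's point, that Ridge shrinks an oversized parameter vector, can be seen on a toy collinear design where OLS coefficients blow up but the ridge solution stays small. A minimal sketch, with two nearly identical features and `alpha=1.0` as illustrative assumptions:]

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge

rng = np.random.RandomState(0)
n = 100
x = rng.randn(n)
# Two almost perfectly collinear features: X'X is nearly singular.
X = np.column_stack([x, x + 1e-3 * rng.randn(n)])
y = x + 0.1 * rng.randn(n)

ols = LinearRegression().fit(X, y)      # unpenalized least squares
ridge = Ridge(alpha=1.0).fit(X, y)      # l2-penalized

# OLS splits the shared signal into huge, opposite-signed coefficients;
# ridge shrinks beta' beta and spreads the signal between the twins.
print("||beta_ols||  =", np.linalg.norm(ols.coef_))
print("||beta_ridge||=", np.linalg.norm(ridge.coef_))
```

The fit quality of the two models is nearly identical; only the coefficient vector (and hence its variance across resamples) differs, which is exactly the variance-inflation phenomenon the Marquardt and Snee paper discusses.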
>
> _______________________________________________
> scikitlearn mailing list
> scikitlearn at python.org
> https://mail.python.org/mailman/listinfo/scikitlearn
>
 next part 
An HTML attachment was scrubbed...
URL:
From josef.pktd at gmail.com Wed Jan 8 21:47:01 2020
From: josef.pktd at gmail.com (josef.pktd at gmail.com)
Date: Wed, 8 Jan 2020 21:47:01 -0500
Subject: [scikit-learn] Why ridge regression can solve multicollinearity?
In-Reply-To:
References:
Message-ID:
On Wed, Jan 8, 2020 at 9:43 PM wrote:
>
>
> On Wed, Jan 8, 2020 at 9:38 PM lampahome wrote:
>
>>
>>
>> Stuart Reynolds wrote on Thu, 9 Jan 2020 at 10:33 AM:
>>
>>> Correlated features typically have the property that they tend to be
>>> similarly predictive of the outcome.
>>>
>>> L1 and L2 both express a preference for low coefficients.
>>> If a coefficient can be reduced while another coefficient keeps the
>>> loss similar, then these regularization methods prefer that solution.
>>> If you use L1 or L2, you should mean- and variance-normalize your
>>> features.
>>>
>>>
>> You mean LASSO and Ridge both solve multicollinearity?
>>
>
> LASSO has the reputation of not working well when there is multicollinearity;
> that's why elastic net (L1 + L2) was introduced, AFAIK.
>
> With multicollinearity the length of the parameter vector, beta' beta, is
> too large and L2, Ridge shrinks it.
>
e.g.
Marquardt, Donald W., and Ronald D. Snee. "Ridge regression in practice." *The
American Statistician* 29, no. 1 (1975): 3-20.
I just went through it last week because of an argument about the variance
inflation factor in Ridge.
>
> Josef
>
>
>
>>
>> _______________________________________________
>> scikitlearn mailing list
>> scikitlearn at python.org
>> https://mail.python.org/mailman/listinfo/scikitlearn
>>
>
 next part 
An HTML attachment was scrubbed...
URL:
From jbbrown at kuhp.kyoto-u.ac.jp Wed Jan 8 23:54:15 2020
From: jbbrown at kuhp.kyoto-u.ac.jp (Brown J.B.)
Date: Thu, 9 Jan 2020 13:54:15 +0900
Subject: [scikit-learn] Why ridge regression can solve multicollinearity?
In-Reply-To:
References:
Message-ID:
Just for convenience:
Marquardt, Donald W., and Ronald D. Snee. "Ridge regression in practice." *The
> American Statistician* 29, no. 1 (1975): 3-20.
>
https://amstat.tandfonline.com/doi/abs/10.1080/00031305.1975.10479105
 next part 
An HTML attachment was scrubbed...
URL:
From adityaselfefficient at gmail.com Thu Jan 9 01:22:06 2020
From: adityaselfefficient at gmail.com (aditya aggarwal)
Date: Thu, 9 Jan 2020 11:52:06 +0530
Subject: [scikit-learn] Changes in the classifier
Message-ID:
Hello
I'm trying to change, locally on my system, the entropy function that
sklearn uses for decision tree classification.
When I rerun the `pip install --editable .` command after updating the
Cython file, I receive the following error message:
Error compiling Cython file:

...
for k in range(self.n_outputs):
    for c in range(n_classes[k]):
        count_k = sum_total[c]
        if count_k > 0.0:
            count_k /= self.weighted_n_node_samples
            entropy -= count_k * np.log2(count_k)
^

sklearn/tree/_criterion.pyx:537:20: Coercion from Python not allowed
without the GIL
This error is persistent along with other errors such as:
Operation not allowed without gil
Converting to Python object not allowed without gil
Converting to Python object not allowed without gil
Calling gil-requiring function not allowed without gil
Accessing Python attribute not allowed without gil
Accessing Python global or builtin not allowed without gil
I've tried looking for a solution on various sites, but could not resolve
the issue.
Any help would be appreciated.
Thanks and regards
Aditya Aggarwal
 next part 
An HTML attachment was scrubbed...
URL:
From adrin.jalali at gmail.com Thu Jan 9 04:09:22 2020
From: adrin.jalali at gmail.com (Adrin)
Date: Thu, 9 Jan 2020 10:09:22 +0100
Subject: [scikit-learn] Changes in the classifier
In-Reply-To:
References:
Message-ID:
Outside the GIL, you can't really work with Python objects. You should
instead stick to local C variables and routines. For instance, instead of
numpy routines, you can use the C math routines.
The cython book (
https://www.amazon.com/CythonProgrammersKurtWSmith/dp/1491901551)
and Nicolas's post (http://nicolashug.com/blog/cython_notes) may give you
some hints.
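[Editor's note: to make the constraint concrete, here is a plain-Python mirror of the entropy loop (a hypothetical helper, not the actual `_criterion.pyx` code). The nogil-compatible fix is to keep everything as typed C scalars and replace `np.log2` with the C `log` from `libc.math`, e.g. `entropy -= p * log(p) / log(2.)`:]

```python
from math import log

def node_entropy(class_counts, weighted_n_node_samples):
    """Entropy (in bits) of one node, written with scalar math only.

    This mirrors what the Cython criterion loop does; in the .pyx file the
    same arithmetic runs under `nogil`, so it must use the cimported C
    `log` instead of any numpy call.
    """
    entropy = 0.0
    for count_k in class_counts:
        if count_k > 0.0:
            p = count_k / weighted_n_node_samples
            entropy -= p * log(p) / log(2.0)   # log2(p) via C-style log
    return entropy

print(node_entropy([5.0, 5.0], 10.0))  # a 50/50 node has entropy 1 bit
```

The general rule: anything that appears inside a `nogil` block must compile down to pure C, so every numpy call, Python attribute access, or untyped variable there triggers exactly the "not allowed without gil" errors quoted above.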
On Thu, Jan 9, 2020 at 7:24 AM aditya aggarwal <
adityaselfefficient at gmail.com> wrote:
> Hello
>
> I'm trying to change the entropy function which is used in sklearn for
> DecisionTreeClassification locally on my system.
> When I rerun the `pip install --editable .` command after updating the
> Cython file, I receive the following error message:
>
> Error compiling Cython file:
> 
> ...
> for k in range(self.n_outputs):
>     for c in range(n_classes[k]):
>         count_k = sum_total[c]
>         if count_k > 0.0:
>             count_k /= self.weighted_n_node_samples
>             entropy -= count_k * np.log2(count_k)
> ^
> 
>
> sklearn/tree/_criterion.pyx:537:20: Coercion from Python not allowed
> without the GIL
>
> This error is persistent along with other errors such as:
>
> Operation not allowed without gil
> Converting to Python object not allowed without gil
> Converting to Python object not allowed without gil
> Calling gil-requiring function not allowed without gil
> Accessing Python attribute not allowed without gil
> Accessing Python global or builtin not allowed without gil
>
>
> I've tried looking for a solution on various sites, but could not resolve
> the issue.
> Any help would be appreciated.
>
> Thanks and regards
> Aditya Aggarwal
> _______________________________________________
> scikitlearn mailing list
> scikitlearn at python.org
> https://mail.python.org/mailman/listinfo/scikitlearn
>
 next part 
An HTML attachment was scrubbed...
URL:
From benoit.presles at u-bourgogne.fr Thu Jan 9 09:22:37 2020
From: benoit.presles at u-bourgogne.fr (Benoît Presles)
Date: Thu, 9 Jan 2020 15:22:37 +0100
Subject: [scikit-learn] logistic regression results are not stable
between solvers
In-Reply-To: <63a991901a1e059dba16a8fcc25a1dc5@gmail.com>
References:
<5591ab4c6a152910c5920c019b1a6600@u-bourgogne.fr>
<44B72247308C42A4B4E1DFD1BDFC5058@hotmail.com>
<586c60249bef3ab8513d547913808039@gmail.com>
<4d4dc37ded57b512fcdf45693ff9e489@u-bourgogne.fr>
<9c18b18c37992da6ec05b9144aa2557a@u-bourgogne.fr>
<63a991901a1e059dba16a8fcc25a1dc5@gmail.com>
Message-ID: <10a6a7c14bf56e1ed352293bd1c9ea1d@u-bourgogne.fr>
Hi Andy,
As you can notice in the code, I fixed C=1e9, so the intercept with
liblinear is effectively not penalised, and therefore I get the same
solutions with these solvers when everything goes well.
How can I check the objective of the lbfgs and liblinear solvers with
sklearn?
Best regards,
Ben
On 08/01/2020 21:53, Andreas Mueller wrote:
> Hi Ben.
>
> Liblinear and lbfgs might both converge, but to different solutions,
> given that the intercept is penalized in liblinear.
> There are also ill-conditioned problems that are hard to
> detect.
> My impression of SAGA was that the convergence checks are too loose
> and we should improve them.
> Have you checked the objective of the lbfgs and liblinear solvers?
> With ill-conditioned data the objectives could be similar with
> different solutions.
>
> It's not intended for scikit-learn to warn about ill-conditioned
> problems, I think, only convergence issues.
>
> Hth,
> Andy
>
>
> On 1/8/20 3:31 PM, Benoît Presles wrote:
>> With lbfgs n_iter_ = 48, with saga n_iter_ = 326581, with liblinear
>> n_iter_ = 64.
>>
>>
>> On 08/01/2020 21:18, Guillaume Lemaître wrote:
>>> We issue a convergence warning. Can you check n_iter_ to be sure that
>>> you did not stop before converging to the stated tolerance?
>>>
>>> On Wed, 8 Jan 2020 at 20:53, Benoît Presles
>>> >> > wrote:
>>>
>>> Dear sklearn users,
>>>
>>> I still have some issues concerning logistic regression.
>>> I did compare on the same data (simulated data) sklearn with
>>> three different solvers (lbfgs, saga, liblinear) and statsmodels.
>>>
>>> When everything goes well, I get the same results between lbfgs,
>>> saga, liblinear and statsmodels. When everything goes wrong, all
>>> the results are different.
>>>
>>> In fact, when everything goes wrong, statsmodels gives me a
>>> convergence warning (Warning: Maximum number of iterations has
>>> been exceeded. Current function value: inf Iterations: 20000) +
>>> an error (numpy.linalg.LinAlgError: Singular matrix).
>>>
>>> Why sklearn does not tell me anything? How can I know that I
>>> have convergence issues with sklearn?
>>>
>>>
>>> Thanks for your help,
>>> Best regards,
>>> Ben
>>>
>>> 
>>>
>>> Here is the code I used to generate synthetic data:
>>>
>>> from sklearn.datasets import make_classification
>>> from sklearn.model_selection import StratifiedShuffleSplit
>>> from sklearn.preprocessing import StandardScaler
>>> from sklearn.linear_model import LogisticRegression
>>> import statsmodels.api as sm
>>> #
>>> RANDOM_SEED = 2
>>> #
>>> X_sim, y_sim = make_classification(n_samples=200,
>>>                                    n_features=20,
>>>                                    n_informative=10,
>>>                                    n_redundant=0,
>>>                                    n_repeated=0,
>>>                                    n_classes=2,
>>>                                    n_clusters_per_class=1,
>>>                                    random_state=RANDOM_SEED,
>>>                                    shuffle=False)
>>> #
>>> sss = StratifiedShuffleSplit(n_splits=10, test_size=0.2,
>>>                              random_state=RANDOM_SEED)
>>> for train_index_split, test_index_split in sss.split(X_sim, y_sim):
>>>     X_split_train, X_split_test = X_sim[train_index_split], X_sim[test_index_split]
>>>     y_split_train, y_split_test = y_sim[train_index_split], y_sim[test_index_split]
>>>     ss = StandardScaler()
>>>     X_split_train = ss.fit_transform(X_split_train)
>>>     X_split_test = ss.transform(X_split_test)
>>>     #
>>>     classifier_lbfgs = LogisticRegression(fit_intercept=True, max_iter=20000000,
>>>                                           verbose=0, random_state=RANDOM_SEED, C=1e9,
>>>                                           solver='lbfgs', penalty='none', tol=1e-6)
>>>     classifier_lbfgs.fit(X_split_train, y_split_train)
>>>     print('classifier lbfgs iter:', classifier_lbfgs.n_iter_)
>>>     print(classifier_lbfgs.intercept_)
>>>     print(classifier_lbfgs.coef_)
>>>     #
>>>     classifier_saga = LogisticRegression(fit_intercept=True, max_iter=20000000,
>>>                                          verbose=0, random_state=RANDOM_SEED, C=1e9,
>>>                                          solver='saga', penalty='none', tol=1e-6)
>>>     classifier_saga.fit(X_split_train, y_split_train)
>>>     print('classifier saga iter:', classifier_saga.n_iter_)
>>>     print(classifier_saga.intercept_)
>>>     print(classifier_saga.coef_)
>>>     #
>>>     classifier_liblinear = LogisticRegression(fit_intercept=True, max_iter=20000000,
>>>                                               verbose=0, random_state=RANDOM_SEED, C=1e9,
>>>                                               solver='liblinear', penalty='l2', tol=1e-6)
>>>     classifier_liblinear.fit(X_split_train, y_split_train)
>>>     print('classifier liblinear iter:', classifier_liblinear.n_iter_)
>>>     print(classifier_liblinear.intercept_)
>>>     print(classifier_liblinear.coef_)
>>>     # statsmodels
>>>     logit = sm.Logit(y_split_train, sm.tools.add_constant(X_split_train))
>>>     logit_res = logit.fit(maxiter=20000)
>>>     print("Coef statsmodels")
>>>     print(logit_res.params)
>>>
>>>
>>>
>>> On 11/10/2019 15:42, Andreas Mueller wrote:
>>>>
>>>>
>>>> On 10/10/19 1:14 PM, Benoît Presles wrote:
>>>>>
>>>>> Thanks for your answers.
>>>>>
>>>>> On my real data, I do not have so many samples. I have a bit
>>>>> more than 200 samples in total and I also would like to get
>>>>> some results with unpenalized logistic regression.
>>>>> What do you suggest? Should I switch to the lbfgs solver?
>>>> Yes.
>>>>> Am I sure that with this solver I will not have any
>>>>> convergence issue and will always get the correct result? Indeed,
>>>>> I did not get any convergence warning with saga, so I thought
>>>>> everything was fine. I noticed some issues only when I decided
>>>>> to test several solvers. Without comparing the results across
>>>>> solvers, how can I be sure that the optimisation goes well?
>>>>> Shouldn't scikit-learn warn the user somehow if that is not the
>>>>> case?
>>>> We should attempt to warn in the SAGA solver if it doesn't
>>>> converge. That it doesn't raise a convergence warning should
>>>> probably be considered a bug.
>>>> It uses the maximum weight change as a stopping criterion right
>>>> now.
>>>> We could probably compute the dual objective once in the end to
>>>> see if we converged, right? Or is that not possible with SAGA?
>>>> If not, we might want to caution that no convergence warning
>>>> will be raised.
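[In the meantime, a practical way to detect non-convergence from user code is to compare `n_iter_` against `max_iter` and to surface `ConvergenceWarning` explicitly. A minimal sketch; the parameter values are illustrative only:]

```python
import warnings

from sklearn.datasets import make_classification
from sklearn.exceptions import ConvergenceWarning
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=200, random_state=0)

# max_iter=2 is deliberately far too small, to force non-convergence.
clf = LogisticRegression(solver="lbfgs", max_iter=2)
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    clf.fit(X, y)

warned = any(issubclass(w.category, ConvergenceWarning) for w in caught)
# A solver that stopped exactly at max_iter almost certainly did not converge.
print(warned, clf.n_iter_[0])
```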
>>>>
>>>>>
>>>>> At last, I was using saga because I also wanted to do some
>>>>> feature selection by using l1 penalty which is not supported
>>>>> by lbfgs...
>>>> You can use liblinear then.
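[As an illustration of liblinear with l1 for feature selection, a sketch on synthetic data; the regularization strength C=0.1 is an arbitrary choice:]

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=200, n_features=20, n_informative=5,
                           random_state=0)

# The l1 penalty with the liblinear solver yields sparse coefficients:
# zeroed-out features are effectively deselected.
clf = LogisticRegression(penalty="l1", solver="liblinear", C=0.1).fit(X, y)
n_selected = int(np.count_nonzero(clf.coef_))
print(n_selected, "of", clf.coef_.size, "features kept")
```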
>>>>
>>>>
>>>>>
>>>>> Best regards,
>>>>> Ben
>>>>>
>>>>>
>>>>> Le 09/10/2019 à 23:39, Guillaume Lemaître a écrit :
>>>>>> Oops, I did not see Roman's answer. Sorry about that. It
>>>>>> comes back to the same conclusion :)
>>>>>>
>>>>>> On Wed, 9 Oct 2019 at 23:37, Guillaume Lemaître wrote:
>>>>>>
>>>>>> Uhm, actually increasing to 10000 samples solves the
>>>>>> convergence issue.
>>>>>> SAGA is most probably not designed to work with such a
>>>>>> small sample size.
>>>>>>
>>>>>> On Wed, 9 Oct 2019 at 23:36, Guillaume Lemaître wrote:
>>>>>>
>>>>>> I slightly changed the benchmark so that it uses a
>>>>>> pipeline, and plotted the coefficients:
>>>>>>
>>>>>> https://gist.github.com/glemaitre/8fcc24bdfc7dc38ca0c09c56e26b9386
>>>>>>
>>>>>> I only see one of the 10 splits where SAGA is not
>>>>>> converging; otherwise the coefficients
>>>>>> look very close (I don't attach the figure here but
>>>>>> it can be plotted using the snippet).
>>>>>> So apart from this second split, the other
>>>>>> differences seem to be numerical instability.
>>>>>>
>>>>>> Where I do have some concern is the convergence
>>>>>> rate of SAGA, but I have no
>>>>>> intuition about whether this is normal or not.
>>>>>>
>>>>>> On Wed, 9 Oct 2019 at 23:22, Roman Yurchak wrote:
>>>>>>
>>>>>> Ben,
>>>>>>
>>>>>> I can confirm your results with penalty='none'
>>>>>> and C=1e9. In both cases,
>>>>>> you are running a mostly unpenalized logistic
>>>>>> regression. Usually
>>>>>> that's less numerically stable than with a small
>>>>>> regularization,
>>>>>> depending on the data collinearity.
>>>>>>
>>>>>> Running that same code with
>>>>>>  - a larger penalty (smaller C values)
>>>>>>  - or a larger number of samples
>>>>>> yields for me the same coefficients (up to some
>>>>>> tolerance).
>>>>>>
>>>>>> You can also see that SAGA's convergence is not
>>>>>> good from the fact that it
>>>>>> needs 196000 epochs/iterations to converge.
>>>>>>
>>>>>> Actually, I have often seen convergence issues
>>>>>> with SAG on small
>>>>>> datasets (in unit tests), not fully sure why.
>>>>>>
>>>>>> 
>>>>>> Roman
>>>>>>
>>>>>> On 09/10/2019 22:10, serafim loukas wrote:
>>>>>> > The predictions across solvers are exactly the
>>>>>> > same when I run the code.
>>>>>> > I am using 0.21.3 version. What is yours?
>>>>>> >
>>>>>> >
>>>>>> > In [13]: import sklearn
>>>>>> >
>>>>>> > In [14]: sklearn.__version__
>>>>>> > Out[14]: '0.21.3'
>>>>>> >
>>>>>> >
>>>>>> > Serafeim
>>>>>> >
>>>>>> >
>>>>>> >
>>>>>> >> On 9 Oct 2019, at 21:44, Benoît Presles wrote:
>>>>>> >>
>>>>>> >> (y_pred_lbfgs==y_pred_saga).all() == False
>>>>>> >
>>>>>> >
>>>>>> > _______________________________________________
>>>>>> > scikitlearn mailing list
>>>>>> > scikitlearn at python.org
>>>>>>
>>>>>> >
>>>>>> https://mail.python.org/mailman/listinfo/scikitlearn
>>>>>> >
>>>>>>
>>>>>> _______________________________________________
>>>>>> scikitlearn mailing list
>>>>>> scikitlearn at python.org
>>>>>>
>>>>>> https://mail.python.org/mailman/listinfo/scikitlearn
>>>>>>
>>>>>>
>>>>>>
>>>>>> 
>>>>>> Guillaume Lemaitre
>>>>>> Scikit-learn @ Inria Foundation
>>>>>> https://glemaitre.github.io/
>>>>>> _______________________________________________
>>>>>> scikitlearn mailing list
>>>>>> scikitlearn at python.org
>>>>>> https://mail.python.org/mailman/listinfo/scikitlearn
>>>>>
>>>>> _______________________________________________
>>>>> scikitlearn mailing list
>>>>> scikitlearn at python.org
>>>>> https://mail.python.org/mailman/listinfo/scikitlearn
>>>>
>>>>
>>>> _______________________________________________
>>>> scikitlearn mailing list
>>>> scikitlearn at python.org
>>>> https://mail.python.org/mailman/listinfo/scikitlearn
>>> _______________________________________________
>>> scikitlearn mailing list
>>> scikitlearn at python.org
>>> https://mail.python.org/mailman/listinfo/scikitlearn
>>>
>>>
>>>
>>> 
>>> Guillaume Lemaitre
>>> Scikitlearn @ Inria Foundation
>>> https://glemaitre.github.io/
>>>
>>> _______________________________________________
>>> scikitlearn mailing list
>>> scikitlearn at python.org
>>> https://mail.python.org/mailman/listinfo/scikitlearn
>>
>> _______________________________________________
>> scikitlearn mailing list
>> scikitlearn at python.org
>> https://mail.python.org/mailman/listinfo/scikitlearn
>
>
> _______________________________________________
> scikitlearn mailing list
> scikitlearn at python.org
> https://mail.python.org/mailman/listinfo/scikitlearn
 next part 
An HTML attachment was scrubbed...
URL:
From adityaselfefficient at gmail.com Mon Jan 13 06:11:46 2020
From: adityaselfefficient at gmail.com (aditya aggarwal)
Date: Mon, 13 Jan 2020 16:41:46 +0530
Subject: [scikitlearn] Changes in the classifier
InReplyTo:
References:
MessageID:
I just want to change the log function used when calculating entropy. How can
I do this in the scikit-learn library?
On Thu, Jan 9, 2020 at 2:41 PM Adrin wrote:
> Without the GIL, you can't really work with Python objects. You should
> instead stick to local C variables and routines. For instance, instead of
> numpy routines, you can use the C math routines (`libc.math` in Cython).
>
> The Cython book (
> https://www.amazon.com/CythonProgrammersKurtWSmith/dp/1491901551)
> and Nicolas's post (http://nicolashug.com/blog/cython_notes) may give
> you some hints.
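[Adrin's advice amounts to swapping `np.log2` for a C-level log so the loop stays free of Python objects. A plain-Python sketch of the entropy loop for illustration; in the real `_criterion.pyx` you would `from libc.math cimport log2` so the loop can run `nogil`:]

```python
from math import log2  # in Cython: from libc.math cimport log2

def node_entropy(class_counts, weighted_n_node_samples):
    """Entropy of one node, mirroring the loop in sklearn's _criterion.pyx.

    np.log2 is a Python-level call and cannot be used in a nogil block;
    a C math routine can.
    """
    entropy = 0.0
    for count_k in class_counts:
        if count_k > 0.0:
            count_k /= weighted_n_node_samples
            entropy -= count_k * log2(count_k)
    return entropy

print(node_entropy([5.0, 5.0], 10.0))  # 50/50 split -> 1.0 bit
```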
>
> On Thu, Jan 9, 2020 at 7:24 AM aditya aggarwal <
> adityaselfefficient at gmail.com> wrote:
>
>> Hello
>>
>> I'm trying to change the entropy function which is used in sklearn for
>> DecisionTreeClassification locally on my system.
>> when I rerun the pip install editable . command after updating the
>> cython file, I receive the following error message:
>>
>> Error compiling Cython file:
>> ------------------------------------------------------------
>> ...
>>     for k in range(self.n_outputs):
>>         for c in range(n_classes[k]):
>>             count_k = sum_total[c]
>>             if count_k > 0.0:
>>                 count_k /= self.weighted_n_node_samples
>>                 entropy -= count_k * np.log2(count_k)
>>                                    ^
>> ------------------------------------------------------------
>>
>> sklearn/tree/_criterion.pyx:537:20: Coercion from Python not allowed
>> without the GIL
>>
>> This error is accompanied by other errors such as:
>>
>>     Operation not allowed without gil
>>     Converting to Python object not allowed without gil
>>     Converting to Python object not allowed without gil
>>     Calling gil-requiring function not allowed without gil
>>     Accessing Python attribute not allowed without gil
>>     Accessing Python global or builtin not allowed without gil
>>
>>
>> I've tried looking up a solution on various sites, but could not
>> resolve the issue.
>> Any help would be appreciated.
>>
>> Thanks and regards
>> Aditya Aggarwal
>> _______________________________________________
>> scikitlearn mailing list
>> scikitlearn at python.org
>> https://mail.python.org/mailman/listinfo/scikitlearn
>>
> _______________________________________________
> scikitlearn mailing list
> scikitlearn at python.org
> https://mail.python.org/mailman/listinfo/scikitlearn
>
 next part 
An HTML attachment was scrubbed...
URL:
From adityaselfefficient at gmail.com Tue Jan 14 06:00:06 2020
From: adityaselfefficient at gmail.com (aditya aggarwal)
Date: Tue, 14 Jan 2020 16:30:06 +0530
Subject: [scikitlearn] Decision tree call chronology
MessageID:
Hello
I am trying to understand the order of function calls for performing
classification using a decision tree in sklearn. I need to make and test some
changes in the algorithm used to calculate the best split for my dissertation.
I have looked at the available sklearn documentation and other sources
online but couldn't seem to crack it. Any help would be
appreciated.
Thanks
 next part 
An HTML attachment was scrubbed...
URL:
From niourf at gmail.com Tue Jan 14 06:35:24 2020
From: niourf at gmail.com (Nicolas Hug)
Date: Tue, 14 Jan 2020 06:35:24 0500
Subject: [scikitlearn] Decision tree call chronology
InReplyTo:
References:
MessageID:
Hi Aditya,
It's hard for us to answer without any specific question. Perhaps this
will help:
https://scikit-learn.org/stable/developers/contributing.html#reading-the-existing-code-base
The tree code is quite complex, because it is very generic and can
support many different settings (multi-output, sparse data, etc.) as well
as many different parameters like max_features, splitter, presort... I
would suggest becoming familiar with the different parameters before diving
into the code.
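[For orientation, the top of the call chain can be poked at from Python before reading the Cython internals. A rough sketch; the chain described in the comment is my understanding, not an official reference:]

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# fit() dispatches to a Cython tree builder, which repeatedly asks a
# Splitter for the best split; the Splitter scores candidate splits with
# a Criterion (here: entropy). The built tree lands in the tree_ attribute.
clf = DecisionTreeClassifier(criterion="entropy", random_state=0).fit(X, y)

t = clf.tree_
print(t.node_count, t.feature[0], t.threshold[0])
```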
Nicolas
On 1/14/20 6:00 AM, aditya aggarwal wrote:
> Hello
>
> I am trying to understand the order of functions call for performing
> classification using decision tree in sklearn. I need to make and test
> some changes in the algorithm used to calculate best split for my
> dissertation. I have looked up the documentation available of sklearn
> and other sources available online but couldn't seem to crack it. Any
> help would be appreciated.
>
> Thanks
>
>
> _______________________________________________
> scikitlearn mailing list
> scikitlearn at python.org
> https://mail.python.org/mailman/listinfo/scikitlearn
 next part 
An HTML attachment was scrubbed...
URL:
From fad469 at uregina.ca Tue Jan 14 10:28:48 2020
From: fad469 at uregina.ca (Farzana Anowar)
Date: Tue, 14 Jan 2020 09:28:48 0600
Subject: [scikitlearn] Attribute Incremental learning
MessageID: <48df7513ce06b5e21e112c83211c014f@uregina.ca>
Hello,
This is Farzana. I am trying to understand attribute-incremental
learning (or virtual concept drift): every time a new
feature becomes available in a real-time dataset (e.g. an online
auction dataset), a classifier should add that new feature to the
existing features and classify the new dataset (with
previous features and new features) incrementally. I know that we can
convert a static classifier to an incremental classifier in
scikit-learn. However, I could not find any library or function for
attribute-incremental learning, or any detailed information. It would be
great if anyone could give me some insight on this.
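[For the sample-incremental part (fixed feature set, growing data), scikit-learn's partial_fit API is the standard route; a minimal sketch on synthetic mini-batches. Note this does not cover feature-incremental learning, i.e. a growing feature set:]

```python
import numpy as np
from sklearn.linear_model import SGDClassifier

rng = np.random.RandomState(0)
clf = SGDClassifier(random_state=0)
classes = np.array([0, 1])  # all classes must be declared up front

# Feed mini-batches as they "arrive"; the model is updated incrementally.
for _ in range(20):
    X_batch = rng.randn(50, 5)
    y_batch = (X_batch[:, 0] > 0).astype(int)
    clf.partial_fit(X_batch, y_batch, classes=classes)

X_test = rng.randn(200, 5)
y_test = (X_test[:, 0] > 0).astype(int)
print("accuracy:", clf.score(X_test, y_test))
```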
Thanks!

Best Regards,
Farzana Anowar,
PhD Candidate
Department of Computer Science
University of Regina
From niourf at gmail.com Wed Jan 15 06:45:04 2020
From: niourf at gmail.com (Nicolas Hug)
Date: Wed, 15 Jan 2020 06:45:04 0500
Subject: [scikitlearn] Issues for Berlin and Paris Sprints
InReplyTo:
References:
MessageID:
Hi Chiara,
Thanks for taking care of this
> have a list of two/three reviewers available to check on a specific issue
That might not be tractable in practice because we have a bunch of
"bulk" issues involving many PRs, e.g. the issues about updating the
random_state docs everywhere. But assigning reviewers to PRs should be
feasible, let's try that.
> a bit uncomfortable in pinging coredevs randomly
FWIW, feel free to ping and assign me to PRs (not just sprint PRs)
Nicolas
On 1/6/20 5:13 AM, Chiara Marmo wrote:
> Dear coredevs,
>
> First let me wish a Happy New Year to you all!
>
> There will be two scikitlearn sprints in January to start this 2020
> in a busy way: one in Berlin [1] (Jan 25) and one in Paris [2] (Jan
> 2831).
> I feel like we could benefit of some coordination in selecting the
> issues for those two events.
> Reshama Shaikh and I, we are already in touch.
>
> I've opened two projects [3][4] to followup the issue selection for
> the sprints.
>
> I will check for previous "Sprint" labels in the skl issues and maybe
> ask for clarification on some of them... please, be patient.
> The goal is to prepare the two sprints in order to make the review
> process as efficient as possible: we don't want to waste the reviewer
> time and we hope to make the PR experience a learning opportunity on
> both sides.
>
> In particular, I would like to ask a favour to all of you: I don't
> know if this is even always possible, but, IMO, it would be really
> useful to have a list of two/three reviewers available to check on a
> specific issue. I am, personally, a bit uncomfortable in pinging
> coredevs randomly, under the impression of crying wolf lacking for
> attention... If people in charge are defined in advance this could, I
> think, smooth the review process. What do you think?
>
> Please, let us know if you have any suggestion or recommendation to
> improve the Sprint organization.
>
> Thanks for listening,
> Best,
> Chiara
>
> [1] https://github.com/WiMLDS/berlin2020scikitsprint
> [2]
> https://github.com/scikitlearn/scikitlearn/wiki/ParisscikitlearnSprintoftheDecade
> [3] https://github.com/WiMLDS/berlin2020scikitsprint/projects/1
> [4]
> https://github.com/scikitlearnfondation/ParisSprintJanuary2020/projects/1
>
> _______________________________________________
> scikitlearn mailing list
> scikitlearn at python.org
> https://mail.python.org/mailman/listinfo/scikitlearn
 next part 
An HTML attachment was scrubbed...
URL:
From maxhalford25 at gmail.com Thu Jan 16 09:36:33 2020
From: maxhalford25 at gmail.com (Max Halford)
Date: Thu, 16 Jan 2020 15:36:33 +0100
Subject: [scikitlearn] Attribute Incremental learning
InReplyTo: <48df7513ce06b5e21e112c83211c014f@uregina.ca>
References: <48df7513ce06b5e21e112c83211c014f@uregina.ca>
MessageID:
Hello Farzana,
You might want to check out scikit-multiflow
and creme
(I'm the author).
Kind regards.
On Tue, 14 Jan 2020 at 16:59, Farzana Anowar wrote:
> Hello,
>
> This is Farzana. I am trying to understand the attribute incremental
> learning ( or virtual concept drift) which is every time when a new
> feature will be available for a realtime dataset (i.e. any online
> auction dataset) a classifier will add that new feature with the
> existing features in a dataset and classify the new dataset (with
> previous features and new features) incrementally. I know that we can
> convert a static classifier to an incremental classifier in
> scikitlearn. However, I could not find any library or function for
> attribute incremental learning or any detail information. It would be
> great if anyone could give me some insight on this.
>
> Thanks!
> 
> Best Regards,
>
> Farzana Anowar,
> PhD Candidate
> Department of Computer Science
> University of Regina
> _______________________________________________
> scikitlearn mailing list
> scikitlearn at python.org
> https://mail.python.org/mailman/listinfo/scikitlearn
>

Max Halford
+336 28 25 13 38
 next part 
An HTML attachment was scrubbed...
URL:
From fad469 at uregina.ca Thu Jan 16 10:00:05 2020
From: fad469 at uregina.ca (Farzana Anowar)
Date: Thu, 16 Jan 2020 09:00:05 0600
Subject: [scikitlearn] Attribute Incremental learning
InReplyTo:
References: <48df7513ce06b5e21e112c83211c014f@uregina.ca>
MessageID: <0aa41d66b218f025a6ee904f6d866d72@uregina.ca>
On 20200116 08:36, Max Halford wrote:
> Hello Farzana,
>
> You might want to check out scikitmultiflow [1] and creme [2] (I'm
> the author).
>
> Kind regards.
>
> On Tue, 14 Jan 2020 at 16:59, Farzana Anowar
> wrote:
>
>> Hello,
>>
>> This is Farzana. I am trying to understand the attribute incremental
>>
>> learning ( or virtual concept drift) which is every time when a new
>> feature will be available for a realtime dataset (i.e. any online
>> auction dataset) a classifier will add that new feature with the
>> existing features in a dataset and classify the new dataset (with
>> previous features and new features) incrementally. I know that we
>> can
>> convert a static classifier to an incremental classifier in
>> scikitlearn. However, I could not find any library or function for
>> attribute incremental learning or any detail information. It would
>> be
>> great if anyone could give me some insight on this.
>>
>> Thanks!
>> 
>> Best Regards,
>>
>> Farzana Anowar,
>> PhD Candidate
>> Department of Computer Science
>> University of Regina
>> _______________________________________________
>> scikitlearn mailing list
>> scikitlearn at python.org
>> https://mail.python.org/mailman/listinfo/scikitlearn
>
> 
>
> Max Halford
>
> +336 28 25 13 38
>
> Links:
> 
> [1] https://scikitmultiflow.github.io/
> [2] https://crememl.github.io/
> _______________________________________________
> scikitlearn mailing list
> scikitlearn at python.org
> https://mail.python.org/mailman/listinfo/scikitlearn
Hello Max,
Thanks a lot.

Best Regards,
Farzana Anowar,
PhD Candidate
Department of Computer Science
University of Regina
From garyfallidis at gmail.com Wed Jan 15 13:12:32 2020
From: garyfallidis at gmail.com (Eleftherios Garyfallidis)
Date: Wed, 15 Jan 2020 13:12:32 0500
Subject: [scikitlearn] ANN: DIPY 1.1.1  a powerful release
MessageID:
We are excited to announce a new release of DIPY:
DIPY 1.1.1 is out! In addition:
a) A new 5-day workshop, available during March 16-20, to learn the theory
and applications of the hundreds of methods available in DIPY 1.1.1.
Intense!
See the exquisite program here.
b) Given the need for a myriad of new DIPY derivative projects, DIPY moved
to its own organization on GitHub. Long live DIPY!
Therefore, https://github.com/dipy/dipy supersedes
https://github.com/nipy/dipy. The old link will be available as a redirect
for the next 6 months.

c) Please support us by citing DIPY in your papers using the following
DOI: 10.3389/fninf.2014.00008
otherwise the DIPY citation police will find you. ;)
DIPY 1.1.1 (Friday, 10 January 2020)
This release received contributions from 11 developers (the full release
notes are at:
https://dipy.org/documentation/1.1.1./release_notes/release1.1/). Thank you
all for your contributions and feedback!
Please click here to
check API changes.
Highlights of this release include:

- New module for deep learning: DIPY.NN (uses TensorFlow 2.0).
- Improved DKI performance and increased utilities.
- Non-linear and RESTORE fits from DTI are now compatible with DKI.
- Numerical solutions for estimating axial, radial and mean kurtosis.
- Added Kurtosis Fractional Anisotropy by Glenn et al. 2015.
- Added Mean Kurtosis Tensor by Hansen et al. 2013.
- Nibabel minimum version is 3.0.0.
- Azure CI added and Appveyor CI removed.
- New command line interfaces for LPCA, MPPCA and Gibbs Unringing.
- New MTMS CSD tutorial added.
- Horizon refactored and updated to support StatefulTractograms.
- Sped up all Cython modules by using a smarter configuration setting.
- All tutorials updated to API changes and 2 new tutorials added.
- Large documentation update.
- Closed 126 issues and merged 50 pull requests.
Note:

Keep in mind that DIPY stopped supporting Python 2 after version 0.16.0.
All major Python projects have switched to Python 3. It is time that you
switch too.
To upgrade or install DIPY
Run the following command in your terminal:
pip install --upgrade dipy
or
conda install -c conda-forge dipy
This version of DIPY depends on nibabel (3.0.0+).
For visualization you need FURY (0.4.0+).
Questions or suggestions?
For any questions go to http://dipy.org, or send an email to
dipy at python.org
We also have an instant messaging service and chat room available at
https://gitter.im/nipy/dipy
On behalf of the DIPY developers,
Eleftherios Garyfallidis, Ariel Rokem, Serge Koudoro
https://dipy.org/contributors
 next part 
An HTML attachment was scrubbed...
URL:
From marmochiaskl at gmail.com Fri Jan 17 03:45:16 2020
From: marmochiaskl at gmail.com (Chiara Marmo)
Date: Fri, 17 Jan 2020 09:45:16 +0100
Subject: [scikitlearn] Issues for Berlin and Paris Sprints
InReplyTo:
References:
MessageID:
Hi Nicolas,
thanks for your answer.
have a list of two/three reviewers available to check on a specific issue
>
> That might not be tractable in practice because we have a bunch of "bulk"
> issues involving many PRs, e.g. the issues about updating the random_state
> docs everywhere. But assigning reviewers to PRs should be feasible, let's
> try that.
>
Is that why suggested reviewers are now added to PRs? I've googled and found
this:
https://github.community/t5/HowtouseGitandGitHub/UsecodeownerstosuggestreviewersNOTautomaticallyassign/tdp/11503
Is that the criterion used to populate the suggested reviewers?
Just trying to follow... :)
Thanks,
Chiara
 next part 
An HTML attachment was scrubbed...
URL:
From dstromberg at grokstream.com Fri Jan 17 15:38:50 2020
From: dstromberg at grokstream.com (Dan Stromberg)
Date: Fri, 17 Jan 2020 12:38:50 0800
Subject: [scikitlearn] Heisenbug?
InReplyTo:
References:
MessageID:
It's looking, at this point, like:
1) The NaNs are real
2) They're coming from some XGBoost native code, or perhaps a
Python<->native boundary, which is interfacing using ctypes.
The prints that didn't print were probably because of a misplaced flush.
The debugger that didn't debug was probably because of pytest capturing
stdout and async Python code.
Thanks.
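[When chasing this kind of bug, it can help to localize the offending rows before they ever reach fit(). A small sketch; the helper name is my own invention:]

```python
import numpy as np

def find_bad_rows(X):
    """Return indices of rows containing NaN or +/-inf, mirroring the check
    behind sklearn's "Input contains NaN, infinity..." error."""
    X = np.asarray(X, dtype=np.float64)
    return np.flatnonzero(~np.isfinite(X).all(axis=1))

X = np.array([[1.0, 2.0], [np.nan, 0.0], [3.0, np.inf]])
print(find_bad_rows(X))  # rows 1 and 2 are bad
```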
On Wed, Dec 18, 2019 at 4:09 PM Dan Stromberg
wrote:
>
> Any (further) suggestions folks?
>
> BTW, when I say pudb fails to start, I mean it's tracebacking trying to
> get None.fileno() In other pieces of (C)Python code I've tried it in,
> pudb.set_trace() worked nicely.
>
> On Tue, Dec 17, 2019 at 7:50 AM Dan Stromberg
> wrote:
>
>>
>> Hi.
>>
>> Overflow does sound kind of possible. We're sending semirandom values
>> to the test.
>>
>> I believe our systems are all x86_64, Linux. Some are Ubuntu 16.04, some
>> are Mint 19.2.
>>
>> I realized on the way to work this morning, that I left out some
>> important information; I suspect a heisenbug for 3 reasons:
>>
>> 1) If I try to look at it with print functions, I get a traceback after
>> the prints, but no print output. This happens both when writing to a
>> disk-based file and when printing to stdout.
>>
>> 2) If I try to look at it with pudb (a debugger) via pudb.set_trace(), I
>> get a failure to start pudb.
>>
>> 3) If I create a small test program that sends the same inputs to the
>> function in question, the function works fine.
>>
>> Thanks.
>>
>> On Mon, Dec 16, 2019 at 11:20 PM Joel Nothman
>> wrote:
>>
>>> Hi Dan, this kind of error can come from overflow. Are all of your test
>>> systems the same architecture?
>>>
>>> On Tue., 17 Dec. 2019, 12:03 pm Dan Stromberg, <
>>> dstromberg at grokstream.com> wrote:
>>>
>>>> Hi folks.
>>>>
>>>> I'm new to Scikitlearn.
>>>>
>>>> I have a very large Python project that seems to have a heisenbug which
>>>> is manifesting in scikitlearn code.
>>>>
>>>> Short of constructing an SSCCE, are there any magical techniques I
>>>> should try for pinning down the precise cause? Like valgrind or something?
>>>>
>>>> An SSCCE will most likely be pretty painful: the project has copious
>>>> shared, mutable state, and I've already tried a largish test program that
>>>> calls into the same code path with the error manifesting 0 times in 100.
>>>>
>>>> It's quite possible the root cause will turn out to be some other part
>>>> of the software stack.
>>>>
>>>> The traceback from pytest looks like:
>>>> sequential/test_training.py:101:
>>>> _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
>>>> _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
>>>> _ _ _ _ _ _ _ _ _ _ _ _ _
>>>> ../rt/classifier/coach.py:146: in train
>>>> **self.classifier_section
>>>> ../domain/classifier/factories/classifier_academy.py:115: in
>>>> create_classifier
>>>> **kwargs)
>>>> ../domain/classifier/factories/imp/xgb_factory.py:164: in create
>>>> clf_random.fit(X_train, y_train)
>>>> ../../../../.local/lib/python3.6/sitepackages/sklearn/model_selection/_search.py:722:
>>>> in fit
>>>> self._run_search(evaluate_candidates)
>>>> ../../../../.local/lib/python3.6/sitepackages/sklearn/model_selection/_search.py:1515:
>>>> in _run_search
>>>> random_state=self.random_state))
>>>> ../../../../.local/lib/python3.6/sitepackages/sklearn/model_selection/_search.py:711:
>>>> in evaluate_candidates
>>>> cv.split(X, y, groups)))
>>>> ../../../../.local/lib/python3.6/sitepackages/sklearn/externals/joblib/parallel.py:996:
>>>> in __call__
>>>> self.retrieve()
>>>> ../../../../.local/lib/python3.6/sitepackages/sklearn/externals/joblib/parallel.py:899:
>>>> in retrieve
>>>> self._output.extend(job.get(timeout=self.timeout))
>>>> ../../../../.local/lib/python3.6/sitepackages/sklearn/externals/joblib/_parallel_backends.py:517:
>>>> in wrap_future_result
>>>> return future.result(timeout=timeout)
>>>> /usr/lib/python3.6/concurrent/futures/_base.py:425: in result
>>>> return self.__get_result()
>>>> _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
>>>> _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
>>>> _ _ _ _ _ _ _ _ _ _ _ _ _
>>>>
>>>> self =
>>>>
>>>> def __get_result(self):
>>>> if self._exception:
>>>> > raise self._exception
>>>> E ValueError: Input contains NaN, infinity or a value too
>>>> large for dtype('float32').
>>>>
>>>> /usr/lib/python3.6/concurrent/futures/_base.py:384: ValueError
>>>>
>>>>
>>>> The above exception is raised about 12 to 14 times in 100 in full-blown
>>>> automated testing.
>>>>
>>>> Thanks for the cool software.
>>>> _______________________________________________
>>>> scikitlearn mailing list
>>>> scikitlearn at python.org
>>>> https://mail.python.org/mailman/listinfo/scikitlearn
>>>>
>>> _______________________________________________
>>> scikitlearn mailing list
>>> scikitlearn at python.org
>>> https://mail.python.org/mailman/listinfo/scikitlearn
>>>
>>
 next part 
An HTML attachment was scrubbed...
URL:
From joel.nothman at gmail.com Sat Jan 18 08:09:46 2020
From: joel.nothman at gmail.com (Joel Nothman)
Date: Sun, 19 Jan 2020 00:09:46 +1100
Subject: [scikitlearn] ANN: DIPY 1.1.1  a powerful release
InReplyTo:
References:
MessageID:
If the scikit-learn mailing list is going to include announcements of
related package releases, could we please get a line or two describing the
package? I expect most readers here don't know of DIPY, or of its relevance
to scikit-learn users. (I'm still not sure why it's generally relevant to
scikit-learn users.)
Thanks
On Fri, 17 Jan 2020 at 04:04, Eleftherios Garyfallidis <
garyfallidis at gmail.com> wrote:
> We are excited to announce a new release of DIPY:
>
>
> DIPY 1.1.1 is out! In addition:
>
>
> a) A new 5 day workshop available during March 1620 to learn the theory
> and applications of the hundreds of methods available in DIPY 1.1.1
> Intense!
>
> See the exquisite program here .
>
> *b) Given the need for a myriad of new DIPY derivative projects, DIPY
> moved to its own organization in GitHub. **Long live DIPY! *
> *And therefore, *https://github.com/dipy/dipy* supersedes https://github.com/nipy/dipy
> The old link will be available as a redirect
> link for the next 6 months.*
>
> c) Please support us by *citing** DIPY* in your papers using the
> following DOI: 10.3389/fninf.2014.00008
>