From Afarin.Famili at UTSouthwestern.edu Fri Feb 3 15:53:54 2017
From: Afarin.Famili at UTSouthwestern.edu (Afarin Famili)
Date: Fri, 3 Feb 2017 20:53:54 +0000
Subject: [scikit-learn] Calculate p-value,
the measure of statistical significance, in scikit-learn
Message-ID: <1486155234925.50514@UTSouthwestern.edu>
Hi all,
I am aiming to calculate the p-value of regression models using scikit-learn, in order to report their statistical significance. Aside from permutation_test_score in scikit-learn, do you have any suggestions for calculating the p-value of a model? Ultimately, I am interested in computing the coefficient of determination (r2) as well as the MSE to indicate the performance of those models that are statistically significant.
Thank you,
Afarin
________________________________
UT Southwestern
Medical Center
The future of medicine, today.
From jakevdp at cs.washington.edu Fri Feb 3 16:51:07 2017
From: jakevdp at cs.washington.edu (Jacob Vanderplas)
Date: Fri, 3 Feb 2017 13:51:07 -0800
Subject: [scikit-learn] Calculate p-value,
the measure of statistical significance, in scikit-learn
In-Reply-To: <1486155234925.50514@UTSouthwestern.edu>
References: <1486155234925.50514@UTSouthwestern.edu>
Message-ID:
Hi Afarin,
The short answer is no, you can't really compute p-values and related
statistics in Scikit-Learn.
This stems from a fundamental divide in statistics/AI between machine
learning on one hand, and statistical modeling on the other. A classic
treatment of this divide is "Statistical Modeling: the Two Cultures" by Leo
Breiman.
In short, statistical modeling is about *estimating parameters of models*,
and in that context things like significance, p-values, etc. are relevant.
Machine learning is about *predicting outputs*, and generally treats models
and their parameters as a black box, the contents of which are not of any
explicit interest. As such, p-values and related statistics concerning
model parameters are not a concern.
Scikit-learn is firmly in the latter camp of Machine learning. Of course,
there is plenty of overlap between the two cultures, and the divide is
somewhat fuzzy in practice, but it's a useful way to frame the issue. If
you're interested in statistical modeling rather than machine learning (and
it sounds like you are), scikit-learn is not really the right tool. You
might check out the statsmodels package.
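For readers landing here from the archives: the per-coefficient p-values that a statistical-modeling package like statsmodels reports for ordinary least squares can be sketched by hand with numpy and scipy. The data below are synthetic and the variable names are illustrative; this is a sketch of the standard OLS t-test, not any particular library's implementation:

```python
import numpy as np
from scipy import stats

rng = np.random.RandomState(0)
n = 100
X = np.column_stack([np.ones(n), rng.randn(n)])   # intercept column + one feature
y = 2.0 + 3.0 * X[:, 1] + rng.randn(n)            # true model with unit noise

# OLS fit
beta, _, _, _ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ beta
dof = n - X.shape[1]                              # residual degrees of freedom
sigma2 = resid @ resid / dof                      # residual variance estimate
cov = sigma2 * np.linalg.inv(X.T @ X)             # coefficient covariance matrix
se = np.sqrt(np.diag(cov))                        # standard errors
t_stats = beta / se
p_values = 2 * stats.t.sf(np.abs(t_stats), dof)   # two-sided p-values
print(p_values)
```

With a slope of 3 and unit noise, the slope's p-value comes out vanishingly small, which is the kind of per-parameter significance statement scikit-learn deliberately does not make.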
Jake
Jake VanderPlas
Senior Data Science Fellow
Director of Research in Physical Sciences
University of Washington eScience Institute
From michael.eickenberg at gmail.com Fri Feb 3 16:54:14 2017
From: michael.eickenberg at gmail.com (Michael Eickenberg)
Date: Fri, 3 Feb 2017 22:54:14 +0100
Subject: [scikit-learn] Calculate p-value,
the measure of statistical significance, in scikit-learn
In-Reply-To: <1486155234925.50514@UTSouthwestern.edu>
References: <1486155234925.50514@UTSouthwestern.edu>
Message-ID:
Dear Afarin,
scikit-learn is designed for predictive modelling, where evaluation is done
out of sample (using train and test sets).
You seem to be looking for a package with which you can do classical
in-sample statistics and their corresponding evaluations among which
p-values. You are probably better off using statsmodels for that or R
directly if you don't mind changing languages.
Hope that helps!
Michael
From stuart at stuartreynolds.net Fri Feb 3 17:47:47 2017
From: stuart at stuartreynolds.net (Stuart Reynolds)
Date: Fri, 3 Feb 2017 14:47:47 -0800
Subject: [scikit-learn] Calculate p-value,
the measure of statistical significance, in scikit-learn
In-Reply-To: <1486155234925.50514@UTSouthwestern.edu>
References: <1486155234925.50514@UTSouthwestern.edu>
Message-ID:
The statsmodels package may have more of this kind of thing.
http://statsmodels.sourceforge.net/devel/glm.html
http://statsmodels.sourceforge.net/devel/dev/generated/statsmodels.base.model.GenericLikelihoodModelResults.pvalues.html?highlight=pvalue
I assume you're talking about p-values for a model's parameters, not for the
model's performance.
For the latter, there are various basic stats functions.
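As a concrete illustration of those basic performance statistics computed by hand (the numbers below are made up, not from any real model):

```python
import numpy as np

# hypothetical true values and model predictions
y_true = np.array([3.0, -0.5, 2.0, 7.0])
y_pred = np.array([2.5, 0.0, 2.0, 8.0])

# mean squared error
mse = np.mean((y_true - y_pred) ** 2)

# coefficient of determination: 1 - SS_res / SS_tot
ss_res = np.sum((y_true - y_pred) ** 2)
ss_tot = np.sum((y_true - np.mean(y_true)) ** 2)
r2 = 1.0 - ss_res / ss_tot

print(mse)   # 0.375
print(r2)    # ~0.9486
```

These match what metrics.mean_squared_error and metrics.r2_score would report for the same arrays.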
From Afarin.Famili at UTSouthwestern.edu Fri Feb 3 18:32:23 2017
From: Afarin.Famili at UTSouthwestern.edu (Afarin Famili)
Date: Fri, 3 Feb 2017 23:32:23 +0000
Subject: [scikit-learn] Does permutation_test_score not output the p_value
for statistical significance of the model? Re: scikit-learn Digest, Vol 11,
Issue 2
In-Reply-To:
References:
Message-ID: <1486164743283.49517@UTSouthwestern.edu>
Thank you all for your answers. I am interested in the statistical significance of the model, not of its parameters. I thought "permutation_test_score" from scikit-learn, and the p_value it returns, would serve this purpose. Am I wrong? Is this function only meant for measuring the statistical significance of classifiers and not of regression models?
Kind regards,
Afarin
From raga.markely at gmail.com Fri Feb 3 23:18:39 2017
From: raga.markely at gmail.com (Raga Markely)
Date: Fri, 3 Feb 2017 23:18:39 -0500
Subject: [scikit-learn] Linear Discriminant Analysis - The priors do not sum
to 1. Renormalizing"
Message-ID:
Hello,
I ran LDA for dimensionality reduction, and got the following message on
the command prompt (not on the Jupyter Notebook):
"The priors do not sum to 1. Renormalizing", UserWarning
If I understand correctly, the priors = bincount(y) / len(y)? So does it
mean I am getting this message due to some rounding errors? How can I check
whether I made a mistake somewhere?
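For what it's worth, the warning can usually be reproduced with plain floating-point arithmetic. The sketch below is illustrative (not LDA's actual internals): it shows a prior vector whose entries are fine individually but whose sum fails an exact == 1 check, which is the situation "Renormalizing" fixes:

```python
import numpy as np

# explicitly supplied priors need not sum to exactly 1.0 in binary floating point
priors = [0.1, 0.2, 0.3, 0.4]
total = 0.1 + 0.2 + 0.3 + 0.4
print(total)           # 1.0000000000000002 -- close to, but not exactly, 1.0
print(total == 1.0)    # False: an exact check like this can trigger the warning

# "Renormalizing" just divides the priors by their sum
renormed = np.array(priors) / total
print(np.isclose(renormed.sum(), 1.0))   # True

# priors estimated from the labels themselves: class counts over n_samples
y = np.array([0, 0, 1, 1, 1, 2])
est = np.bincount(y) / len(y)
print(est)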
Thank you,
Raga
From raga.markely at gmail.com Fri Feb 3 23:36:50 2017
From: raga.markely at gmail.com (Raga Markely)
Date: Fri, 3 Feb 2017 23:36:50 -0500
Subject: [scikit-learn] PC Desktop requirement for Machine Learning
Message-ID:
Hello,
I am planning to buy an office PC desktop for machine learning work. I
wonder if you could provide some recommendations on the computer specs and
brand? I don't need cloud capacity, just a standalone but powerful
desktop.. to simplify, let's ignore the price.. I can scale down according
to budget as appropriate later..
Just to give a rough ballpark: I ran a repeated nested loop (50 outer
repeats x 50 inner repeats, ~35 data points, <10 features) with different
classification algorithms (Logistic Regression, KNN, SVC, Kernel SVC,
Random Forest) on a lightweight office laptop, and as expected, it took a
very long time to complete (it finished overnight). I would like to be able
to complete this in a few minutes or less maybe? :D.. so that I can quickly
assess and modify the code as necessary.. In the long run, I will also need
to do regressions and may use larger data sets (on the order of 10^4 data
points)...
I guess this is a very vague question, but I will take any tips and
suggestions.
Thank you!
Raga
From ahowe42 at gmail.com Sat Feb 4 03:23:33 2017
From: ahowe42 at gmail.com (Andrew Howe)
Date: Sat, 4 Feb 2017 11:23:33 +0300
Subject: [scikit-learn] Calculate p-value,
the measure of statistical significance, in scikit-learn
In-Reply-To: <1486155234925.50514@UTSouthwestern.edu>
References: <1486155234925.50514@UTSouthwestern.edu>
Message-ID:
I'm fairly certain that the scikit-learn regression result, plus what you
already know about the data, is enough for you to compute all those
statistical measures yourself. It should be rather trivial to do so.
Andrew
From alekhka at gmail.com Sat Feb 4 07:45:54 2017
From: alekhka at gmail.com (Alekh Karkada Ashok)
Date: Sat, 4 Feb 2017 18:15:54 +0530
Subject: [scikit-learn] 10 years of Scikit-learn
Message-ID:
Hi all!
2017 marks the 10th year of Scikit-learn (started as a GSoC project in
2007). Can we do anything to celebrate? Perhaps a sticker on the website?
or T-shirts commemorating this?
Thank you!
From nelle.varoquaux at gmail.com Sat Feb 4 14:52:05 2017
From: nelle.varoquaux at gmail.com (Nelle Varoquaux)
Date: Sat, 4 Feb 2017 11:52:05 -0800
Subject: [scikit-learn] Calculate p-value,
the measure of statistical significance, in scikit-learn
In-Reply-To:
References: <1486155234925.50514@UTSouthwestern.edu>
Message-ID:
> I'm fairly certain that the scikit-learn regression result, plus what you
> already have about the data is enough for you to compute all those
> statistical measures yourself. It should be rather trivial to do so.
>
That is highly dependent on the regression model you use. For example,
computing a p-value for a lasso regression parameter is not so trivial,
though a significance test has recently been proposed.
From gael.varoquaux at normalesup.org Sat Feb 4 16:39:47 2017
From: gael.varoquaux at normalesup.org (Gael Varoquaux)
Date: Sat, 4 Feb 2017 22:39:47 +0100
Subject: [scikit-learn] 10 years of Scikit-learn
In-Reply-To:
References:
Message-ID: <20170204213947.GE1858410@phare.normalesup.org>
Indeed, that's a good point.
We should mention it in our talks, and maybe in the release notes of the
next release.
Gaël
On Sat, Feb 04, 2017 at 06:15:54PM +0530, Alekh Karkada Ashok wrote:
> Hi all!
> 2017 marks the 10th year of Scikit-learn (started as a GSoC project in 2007).
> Can we do anything to celebrate? Perhaps a sticker on the website? or T-shirts
> commemorating this?
> Thank you!
> _______________________________________________
> scikit-learn mailing list
> scikit-learn at python.org
> https://mail.python.org/mailman/listinfo/scikit-learn
--
Gael Varoquaux
Researcher, INRIA Parietal
NeuroSpin/CEA Saclay , Bat 145, 91191 Gif-sur-Yvette France
Phone: ++ 33-1-69-08-79-68
http://gael-varoquaux.info http://twitter.com/GaelVaroquaux
From Afarin.Famili at UTSouthwestern.edu Sat Feb 4 18:43:36 2017
From: Afarin.Famili at UTSouthwestern.edu (Afarin Famili)
Date: Sat, 4 Feb 2017 23:43:36 +0000
Subject: [scikit-learn] Permutation-test-score
Message-ID: <1486251816290.82720@UTSouthwestern.edu>
Hi,
Can anyone please tell me what "permutation_test_score" (and the p_value it returns) does in scikit-learn? I am assuming it outputs the statistical significance of the performance of regression models. I am planning to compare the performance of various regression models when the performance measure they report is statistically significant. To this end, I want to output the p-value of the prediction first, and if it is smaller than a certain cut-off, I would then report the performance metrics, such as r2 and MSE.
Do the p_value and score outputs of "permutation_test_score" not provide me with what I want?
Afarin
________________________________
UT Southwestern
Medical Center
The future of medicine, today.
From ahowe42 at gmail.com Sun Feb 5 00:15:18 2017
From: ahowe42 at gmail.com (Andrew Howe)
Date: Sun, 5 Feb 2017 08:15:18 +0300
Subject: [scikit-learn] Calculate p-value,
the measure of statistical significance, in scikit-learn
In-Reply-To:
References: <1486155234925.50514@UTSouthwestern.edu>
Message-ID:
Yep - in which case the OP would have difficulty computing p-values (but
not the other usual stats) with any software tool that provided those
methods. But since the question was specifically about scikit-learn, my
main point is that the quantities are easy to compute (if they exist).
Andrew
<~~~~~~~~~~~~~~~~~~~~~~~~~~~>
J. Andrew Howe, PhD
www.andrewhowe.com
http://www.linkedin.com/in/ahowe42
https://www.researchgate.net/profile/John_Howe12/
I live to learn, so I can learn to live. - me
<~~~~~~~~~~~~~~~~~~~~~~~~~~~>
From olivier.grisel at ensta.org Sun Feb 5 04:44:01 2017
From: olivier.grisel at ensta.org (Olivier Grisel)
Date: Sun, 5 Feb 2017 10:44:01 +0100
Subject: [scikit-learn] Permutation-test-score
In-Reply-To: <1486251816290.82720@UTSouthwestern.edu>
References: <1486251816290.82720@UTSouthwestern.edu>
Message-ID:
This is a non-parametric (brute-force) way to check that a model has a
predictive performance significantly higher than chance. For a model with
90% accuracy this is useless, as we already know for sure that the model is
better than predicting at random. The method is only useful if you have
very little data or very noisy data and you are not even sure that your
predictive method is able to pick up anything predictive from the data,
e.g. a balanced binary classification problem with ~52% accuracy.
It proceeds as follows: it first does a single cross-validation round with
the true labels to compute a reference score. Then it does the same 100
times, each time with an independently, randomly permuted variant of the
labels (the y array). It then returns the fraction of the time the
reference CV score was higher than the CV scores of the models trained and
evaluated with permuted labels.
Here is an example:
http://scikit-learn.org/stable/auto_examples/feature_selection/plot_permutation_test_for_classification.html
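The procedure described above can also be sketched without scikit-learn itself. The toy version below uses a simple nearest-centroid scorer and synthetic data, so every name in it is illustrative rather than the actual permutation_test_score implementation:

```python
import numpy as np

rng = np.random.RandomState(0)

# toy binary problem with a weak signal in the first feature
n = 200
X = rng.randn(n, 5)
y = (X[:, 0] + 1.5 * rng.randn(n) > 0).astype(int)

def cv_score(X, y, n_splits=5):
    """Mean accuracy of a nearest-centroid rule over simple CV folds."""
    folds = np.array_split(rng.permutation(len(y)), n_splits)
    scores = []
    for test_idx in folds:
        train_mask = np.ones(len(y), dtype=bool)
        train_mask[test_idx] = False
        mu0 = X[train_mask & (y == 0)].mean(axis=0)
        mu1 = X[train_mask & (y == 1)].mean(axis=0)
        d0 = np.linalg.norm(X[test_idx] - mu0, axis=1)
        d1 = np.linalg.norm(X[test_idx] - mu1, axis=1)
        pred = (d1 < d0).astype(int)
        scores.append(np.mean(pred == y[test_idx]))
    return np.mean(scores)

# reference score with the true labels, then 100 rounds with permuted labels
ref_score = cv_score(X, y)
perm_scores = np.array([cv_score(X, rng.permutation(y)) for _ in range(100)])

# fraction of permutations scoring at least as well as the reference,
# with the +1 correction so the p-value is never exactly zero
p_value = (np.sum(perm_scores >= ref_score) + 1) / (len(perm_scores) + 1)
print(ref_score, p_value)
```

A small p-value here means the reference score is unlikely under the null hypothesis that the features and labels are independent.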
Note that you should not use that method to select the best model from a
collection of possible models and then report its permutation-test p-value
without correcting for multiple comparisons.
--
Olivier
From nixnmtm at gmail.com Tue Feb 7 09:26:09 2017
From: nixnmtm at gmail.com (Nixon Raj)
Date: Tue, 7 Feb 2017 22:26:09 +0800
Subject: [scikit-learn] Need Corresponding indices array of values in each
split of a DecisionTreeClassifier
Message-ID:
For example, in the decision tree dot file below, I have 223 samples, which
split into [174, 49] at the first split and [110, 1] at the second split.
I would like to get the arrays of indices for the values of each split, like:
*[174, 49] and their corresponding indices (idx), e.g. [[0, 1, 5,
7, ..., 200, 221], [3, 4, 6, ..., 199, 222, 223]]*
*[110, 1] and their corresponding indices (idx), e.g. [[0, 5, ..., 200, 221],
[7]]*
Please help me.
node [shape=box] ;
0 [label="X[0] <= 13.9191\nentropy = 0.7597\nsamples = 223\nvalue = [174,
49]"] ;
1 [label="X[1] <= 3.1973\nentropy = 0.0741\nsamples = 111\nvalue = [110,
1]"] ;
0 -> 1 [labeldistance=2.5, labelangle=45, headlabel="True"] ;
2 [label="entropy = 0.0\nsamples = 109\nvalue = [109, 0]"] ;
1 -> 2 ;
3 [label="entropy = 1.0\nsamples = 2\nvalue = [1, 1]"] ;
1 -> 3 ;
4 [label="X[1] <= 3.1266\nentropy = 0.9852\nsamples = 112\nvalue = [64,
48]"] ;
0 -> 4 [labeldistance=2.5, labelangle=-45, headlabel="False"] ;
5 [label="X[2] <= -0.4882\nentropy = 0.7919\nsamples = 63\nvalue = [48,
15]"] ;
4 -> 5 ;
6 [label="entropy = 0.684\nsamples = 11\nvalue = [2, 9]"] ;
5 -> 6 ;
7 [label="X[2] <= 0.5422\nentropy = 0.5159\nsamples = 52\nvalue = [46, 6]"]
;
5 -> 7 ;
8 [label="entropy = 0.0\nsamples = 18\nvalue = [18, 0]"] ;
7 -> 8 ;
9 [label="X[2] <= 0.6497\nentropy = 0.6723\nsamples = 34\nvalue = [28, 6]"]
;
7 -> 9 ;
10 [label="entropy = 0.0\nsamples = 1\nvalue = [0, 1]"] ;
9 -> 10 ;
11 [label="X[2] <= 1.887\nentropy = 0.6136\nsamples = 33\nvalue = [28, 5]"]
;
9 -> 11 ;
12 [label="entropy = 0.0\nsamples = 12\nvalue = [12, 0]"] ;
11 -> 12 ;
13 [label="X[2] <= 2.6691\nentropy = 0.7919\nsamples = 21\nvalue = [16,
5]"] ;
11 -> 13 ;
14 [label="entropy = 0.8113\nsamples = 4\nvalue = [1, 3]"] ;
13 -> 14 ;
15 [label="entropy = 0.5226\nsamples = 17\nvalue = [15, 2]"] ;
13 -> 15 ;
16 [label="X[0] <= 17.3284\nentropy = 0.9113\nsamples = 49\nvalue = [16,
33]"] ;
4 -> 16 ;
17 [label="entropy = 0.9183\nsamples = 6\nvalue = [4, 2]"] ;
16 -> 17 ;
18 [label="X[2] <= 19.7048\nentropy = 0.8542\nsamples = 43\nvalue = [12,
31]"] ;
16 -> 18 ;
19 [label="X[2] <= 5.8511\nentropy = 0.8296\nsamples = 42\nvalue = [11,
31]"] ;
18 -> 19 ;
20 [label="X[0] <= 31.8916\nentropy = 0.878\nsamples = 37\nvalue = [11,
26]"] ;
19 -> 20 ;
21 [label="X[1] <= 3.3612\nentropy = 0.6666\nsamples = 23\nvalue = [4,
19]"] ;
20 -> 21 ;
22 [label="entropy = 0.8905\nsamples = 13\nvalue = [4, 9]"] ;
21 -> 22 ;
23 [label="entropy = 0.0\nsamples = 10\nvalue = [0, 10]"] ;
21 -> 23 ;
24 [label="entropy = 1.0\nsamples = 14\nvalue = [7, 7]"] ;
20 -> 24 ;
25 [label="entropy = 0.0\nsamples = 5\nvalue = [0, 5]"] ;
19 -> 25 ;
26 [label="entropy = 0.0\nsamples = 1\nvalue = [1, 0]"] ;
18 -> 26 ;
}
From joel.nothman at gmail.com Tue Feb 7 18:21:16 2017
From: joel.nothman at gmail.com (Joel Nothman)
Date: Wed, 8 Feb 2017 10:21:16 +1100
Subject: [scikit-learn] Need Corresponding indices array of values in
each split of a DecisionTreeClassifier
In-Reply-To:
References:
Message-ID:
I don't think putting that array of indices in a visualisation is a great
idea!
If you use my_tree.apply(X) you will be able to determine which leaf each
instance in X lands up at, and potentially trace up the tree from there.
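A minimal sketch of the apply-based approach (the dataset and tree parameters here are arbitrary choices for illustration):

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
tree = DecisionTreeClassifier(max_depth=2, random_state=0).fit(X, y)

# node id of the leaf each training sample ends up in
leaf_ids = tree.apply(X)

# group sample indices by leaf
indices_per_leaf = {leaf: np.flatnonzero(leaf_ids == leaf)
                    for leaf in np.unique(leaf_ids)}
for leaf, idx in indices_per_leaf.items():
    print(leaf, len(idx))

# decision_path gives the same information for internal nodes too:
# node_indicator[i, j] is nonzero if sample i passes through node j
node_indicator = tree.decision_path(X)
samples_in_root = np.flatnonzero(node_indicator[:, 0].toarray().ravel())
print(len(samples_in_root))  # every sample passes through the root
```

The per-node index arrays recovered this way correspond to the `value` counts shown in the dot file, without having to embed the indices in the visualisation itself.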
On 8 February 2017 at 01:26, Nixon Raj wrote:
>
> For Example, In the below decision tree dot file, I have 223 samples which
> splits into [174, 49] in the first split and [110, 1] in the 2nd split
>
> I would like to get the array of indices for the values of each split, like:
>
> [174, 49] and their corresponding indices (idx), like [[0, 1, 5, 7, ..., 200, 221], [3, 4, 6, ..., 199, 222, 223]]
>
> [110, 1] and their corresponding indices (idx), like [[0, 5, ..., 200, 221], [7]]
>
> Please help me.
>
> [decision tree dot file snipped]
>
>
> _______________________________________________
> scikit-learn mailing list
> scikit-learn at python.org
> https://mail.python.org/mailman/listinfo/scikit-learn
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
From jblackburne at gmail.com Tue Feb 7 19:13:40 2017
From: jblackburne at gmail.com (Jeff Blackburne)
Date: Tue, 7 Feb 2017 16:13:40 -0800
Subject: [scikit-learn] Need Corresponding indices array of values in
each split of a DecisionTreeClassifier
In-Reply-To:
References:
Message-ID:
Nixon,
If you are using version 0.18 or later, you can reconstruct the information
you need using the `decision_path` method:
http://scikit-learn.org/stable/auto_examples/tree/plot_unveil_tree_structure.html
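A sketch of that method on synthetic stand-in data (the estimator and data below are not Nixon's; `decision_path` returns a sparse indicator matrix of the nodes each sample traverses, with nodes numbered depth-first so node 1 is the root's "True"/left child):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=223, n_features=3, n_informative=3,
                           n_redundant=0, random_state=0)
clf = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, y)

# Sparse (n_samples, n_nodes) matrix: entry (i, j) is nonzero iff
# sample i passes through node j on its way down to a leaf.
node_indicator = clf.decision_path(X)

# Sample indices that reach node 1 (the root's left child):
samples_at_node_1 = np.flatnonzero(node_indicator[:, 1].toarray().ravel())
```

Unlike `apply`, which only identifies the final leaf, `decision_path` gives membership at every internal node directly, which matches the per-split index arrays asked for.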
-Jeff
On Tue, Feb 7, 2017 at 3:21 PM, Joel Nothman wrote:
> I don't think putting that array of indices in a visualisation is a great
> idea!
>
> If you use my_tree.apply(X) you will be able to determine which leaf each
> instance in X lands up at, and potentially trace up the tree from there.
>
> On 8 February 2017 at 01:26, Nixon Raj wrote:
>
>> [quoted message and dot file snipped]
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
From joel.nothman at gmail.com Tue Feb 7 21:00:12 2017
From: joel.nothman at gmail.com (Joel Nothman)
Date: Tue, 7 Feb 2017 21:00:12 -0500
Subject: [scikit-learn] Preparing a scikit-learn 0.18.2 bugfix release
In-Reply-To: <20170111215115.GO1585067@phare.normalesup.org>
References:
<20170109151546.GM2802991@phare.normalesup.org>
<20170111215115.GO1585067@phare.normalesup.org>
Message-ID:
On 12 January 2017 at 08:51, Gael Varoquaux
wrote:
> On Thu, Jan 12, 2017 at 08:41:51AM +1100, Joel Nothman wrote:
> > When the two versions deprecation policy was instituted, releases were
> > much more frequent... Is that enough of an excuse?
>
> I'd rather say that we can here decide that we are giving a longer grace
> period.
>
> I think that slow deprecations are a good thing (see Titus's blog post
> here: http://ivory.idyll.org/blog/2017-pof-software-archivability.html )
>
Given that 0.18 was a very slow release, and the work for removing
deprecated material from 0.19 has already been done, I don't think we
should revert that. I agree that we can delay the deprecation deadline for
0.20 and 0.21.
In terms of release schedule, are we aiming for RC in early-mid March,
assuming Andy's above prognostications are correct and he is able to review
in a bigger way in a week or so?
J
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
From nixnmtm at gmail.com Wed Feb 8 04:43:17 2017
From: nixnmtm at gmail.com (Nixon Raj)
Date: Wed, 8 Feb 2017 17:43:17 +0800
Subject: [scikit-learn] Need Corresponding indices array of values in
each split of a DecisionTreeClassifier
In-Reply-To:
References:
Message-ID:
Hi Joel and Jeff,
Thanks for your valuable comments; I got it to work.
On 8 February 2017 at 08:13, Jeff Blackburne wrote:
> Nixon,
>
> If you are using version 0.18 or later, you can reconstruct the
> information you need using the `decision_path` method:
>
> http://scikit-learn.org/stable/auto_examples/tree/
> plot_unveil_tree_structure.html
>
> -Jeff
>
>
> On Tue, Feb 7, 2017 at 3:21 PM, Joel Nothman
> wrote:
>
>> I don't think putting that array of indices in a visualisation is a great
>> idea!
>>
>> If you use my_tree.apply(X) you will be able to determine which leaf each
>> instance in X lands up at, and potentially trace up the tree from there.
>>
>> On 8 February 2017 at 01:26, Nixon Raj wrote:
>>
>>> [quoted message and dot file snipped]
--
Regards
Nixon Raj N
Department of Biological Science and Technology
Institute of Bioinformatics and Systems Biology
National Chiao Tung University
208 Lab Building 1, 75 Bo-Ai St.
Dong District, Hsinchu, Taiwan 30062
(R.O.C.)
Mob: +886-989353921
Office ext: 56997
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
From ahowe42 at gmail.com Wed Feb 8 12:15:44 2017
From: ahowe42 at gmail.com (Andrew Howe)
Date: Wed, 8 Feb 2017 20:15:44 +0300
Subject: [scikit-learn] Preparing a scikit-learn 0.18.2 bugfix release
In-Reply-To:
References:
<20170109151546.GM2802991@phare.normalesup.org>
<20170111215115.GO1585067@phare.normalesup.org>
Message-ID:
How many current deprecations are expected in the next release?
Andrew
On Jan 12, 2017 00:53, "Gael Varoquaux"
wrote:
On Thu, Jan 12, 2017 at 08:41:51AM +1100, Joel Nothman wrote:
> When the two versions deprecation policy was instituted, releases were
> much more frequent... Is that enough of an excuse?
I'd rather say that we can here decide that we are giving a longer grace
period.
I think that slow deprecations are a good thing (see Titus's blog post
here: http://ivory.idyll.org/blog/2017-pof-software-archivability.html )
G
> On 12 January 2017 at 03:43, Andreas Mueller wrote:
> > On 01/09/2017 10:15 AM, Gael Varoquaux wrote:
> > > instead of setting up a roadmap I would rather just identify bugs
> > > that are blockers and fix only those and don't wait for any feature
> > > before cutting 0.19.X.
> > I agree with the sentiment, but this would mess with our deprecation cycle.
> > If we release now, and then release again soonish, that means people have
> > less calendar time to react to deprecations.
> > We could either accept this or change all deprecations and bump the
> > removal by a version?
--
Gael Varoquaux
Researcher, INRIA Parietal
NeuroSpin/CEA Saclay , Bat 145, 91191 Gif-sur-Yvette France
Phone: ++ 33-1-69-08-79-68
http://gael-varoquaux.info http://twitter.com/GaelVaroquaux
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
From joel.nothman at gmail.com Wed Feb 8 22:30:40 2017
From: joel.nothman at gmail.com (Joel Nothman)
Date: Thu, 9 Feb 2017 14:30:40 +1100
Subject: [scikit-learn] Preparing a scikit-learn 0.18.2 bugfix release
In-Reply-To:
References:
<20170109151546.GM2802991@phare.normalesup.org>
<20170111215115.GO1585067@phare.normalesup.org>
Message-ID:
Not sure that this quite gives you a number, but:
$ git checkout 0.18.1
$ git grep -pwB1 0.19 sklearn | grep -ve ^- -e .csv: -e /tests/ > /tmp/dep19.txt
etc.
edited results attached.
On 9 February 2017 at 04:15, Andrew Howe wrote:
> How many current deprecations are expected in the next release?
>
> Andrew
>
> [earlier quoted messages snipped]
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
-------------- next part --------------
sklearn/base.py=from . import __version__
sklearn/base.py- at deprecated("ChangedBehaviorWarning has been moved into the sklearn.exceptions"
sklearn/base.py: " module. It will not be available here from version 0.19")
sklearn/datasets/data/boston_house_prices.csv-1.62864,0,21.89,0,0.624,5.019,100,1.4394,4,437,21.2,396.9,34.41,14.4
sklearn/datasets/data/boston_house_prices.csv-0.40202,0,9.9,0,0.544,6.382,67.2,3.5325,4,304,18.4,395.21,10.36,23.1
sklearn/datasets/data/breast_cancer.csv-14.71,21.59,95.55,656.9,0.1137,0.1365,0.1293,0.08123,0.2027,0.06758,0.4226,1.15,2.735,40.09,0.003659,0.02855,0.02572,0.01272,0.01817,0.004108,17.87,30.7,115.7,985.5,0.1368,0.429,0.3587,0.1834,0.3698,0.1094,0
sklearn/datasets/data/breast_cancer.csv-20.26,23.03,132.4,1264,0.09078,0.1313,0.1465,0.08683,0.2095,0.05649,0.7576,1.509,4.554,87.87,0.006016,0.03482,0.04232,0.01269,0.02657,0.004411,24.22,31.59,156.1,1750,0.119,0.3539,0.4098,0.1573,0.3689,0.08368,0
sklearn/datasets/data/breast_cancer.csv-12.86,13.32,82.82,504.8,0.1134,0.08834,0.038,0.034,0.1543,0.06476,0.2212,1.042,1.614,16.57,0.00591,0.02016,0.01902,0.01011,0.01202,0.003107,14.04,21.08,92.8,599.5,0.1547,0.2231,0.1791,0.1155,0.2382,0.08553,1
sklearn/datasets/data/breast_cancer.csv-11.87,21.54,76.83,432,0.06613,0.1064,0.08777,0.02386,0.1349,0.06612,0.256,1.554,1.955,20.24,0.006854,0.06063,0.06663,0.01553,0.02354,0.008925,12.79,28.18,83.51,507.2,0.09457,0.3399,0.3218,0.0875,0.2305,0.09952,1
sklearn/datasets/data/breast_cancer.csv-13,25.13,82.61,520.2,0.08369,0.05073,0.01206,0.01762,0.1667,0.05449,0.2621,1.232,1.657,21.19,0.006054,0.008974,0.005681,0.006336,0.01215,0.001514,14.34,31.88,91.06,628.5,0.1218,0.1093,0.04462,0.05921,0.2306,0.06291,1
sklearn/datasets/lfw.py=def _fetch_lfw_pairs(index_file_path, data_folder_path, slice_=None,
sklearn/datasets/lfw.py- at deprecated("Function 'load_lfw_people' has been deprecated in 0.17 and will "
sklearn/datasets/lfw.py: "be removed in 0.19."
sklearn/datasets/lfw.py=def load_lfw_people(download_if_missing=False, **kwargs):
sklearn/datasets/lfw.py- .. deprecated:: 0.17
sklearn/datasets/lfw.py: This function will be removed in 0.19.
sklearn/datasets/lfw.py=def fetch_lfw_pairs(subset='train', data_home=None, funneled=True, resize=0.5,
sklearn/datasets/lfw.py- at deprecated("Function 'load_lfw_pairs' has been deprecated in 0.17 and will "
sklearn/datasets/lfw.py: "be removed in 0.19."
sklearn/datasets/lfw.py=def load_lfw_pairs(download_if_missing=False, **kwargs):
sklearn/datasets/lfw.py- .. deprecated:: 0.17
sklearn/datasets/lfw.py: This function will be removed in 0.19.
sklearn/decomposition/nmf.py=def non_negative_factorization(X, W=None, H=None, n_components=None,
sklearn/decomposition/nmf.py- if solver == 'pg':
sklearn/decomposition/nmf.py: warnings.warn("'pg' solver will be removed in release 0.19."
sklearn/decomposition/nmf.py=class NMF(BaseEstimator, TransformerMixin):
sklearn/decomposition/nmf.py- " for 'pg' solver, which will be removed"
sklearn/decomposition/nmf.py: " in release 0.19. Use another solver with L1 or L2"
sklearn/decomposition/nmf.py-
sklearn/decomposition/nmf.py:@deprecated("It will be removed in release 0.19. Use NMF instead."
sklearn/decomposition/nmf.py: "'pg' solver is still available until release 0.19.")
sklearn/discriminant_analysis.py=class LinearDiscriminantAnalysis(BaseEstimator, LinearClassifierMixin,
sklearn/discriminant_analysis.py- warnings.warn("The parameter 'store_covariance' is deprecated as "
sklearn/discriminant_analysis.py: "of version 0.17 and will be removed in 0.19. The "
sklearn/discriminant_analysis.py- warnings.warn("The parameter 'tol' is deprecated as of version "
sklearn/discriminant_analysis.py: "0.17 and will be removed in 0.19. The parameter is "
sklearn/discriminant_analysis.py=class QuadraticDiscriminantAnalysis(BaseEstimator, ClassifierMixin):
sklearn/discriminant_analysis.py- warnings.warn("The parameter 'store_covariances' is deprecated as "
sklearn/discriminant_analysis.py: "of version 0.17 and will be removed in 0.19. The "
sklearn/discriminant_analysis.py- warnings.warn("The parameter 'tol' is deprecated as of version "
sklearn/discriminant_analysis.py: "0.17 and will be removed in 0.19. The parameter is "
sklearn/ensemble/forest.py=class ForestClassifier(six.with_metaclass(ABCMeta, BaseForest,
sklearn/ensemble/forest.py- warn("class_weight='subsample' is deprecated in 0.17 and"
sklearn/ensemble/forest.py: "will be removed in 0.19. It was replaced by "
sklearn/ensemble/gradient_boosting.py=class BaseGradientBoosting(six.with_metaclass(ABCMeta, BaseEnsemble,
sklearn/ensemble/gradient_boosting.py-
sklearn/ensemble/gradient_boosting.py: @deprecated(" and will be removed in 0.19")
sklearn/ensemble/gradient_boosting.py-
sklearn/ensemble/gradient_boosting.py: @deprecated(" and will be removed in 0.19")
sklearn/feature_selection/from_model.py=class _LearntSelectorMixin(TransformerMixin):
sklearn/feature_selection/from_model.py- @deprecated('Support to use estimators as feature selectors will be '
sklearn/feature_selection/from_model.py: 'removed in version 0.19. Use SelectFromModel instead.')
sklearn/lda.py=warnings.warn("lda.LDA has been moved to "
sklearn/lda.py- "discriminant_analysis.LinearDiscriminantAnalysis "
sklearn/lda.py: "in 0.17 and will be removed in 0.19", DeprecationWarning)
sklearn/lda.py=class LDA(_LDA):
sklearn/lda.py- .. deprecated:: 0.17
sklearn/lda.py: This class will be removed in 0.19.
sklearn/linear_model/base.py=class LinearModel(six.with_metaclass(ABCMeta, BaseEstimator)):
sklearn/linear_model/base.py-
sklearn/linear_model/base.py: @deprecated(" and will be removed in 0.19.")
sklearn/linear_model/base.py=class LinearRegression(LinearModel, RegressorMixin):
sklearn/linear_model/base.py- @property
sklearn/linear_model/base.py: @deprecated("``residues_`` is deprecated and will be removed in 0.19")
sklearn/linear_model/coordinate_descent.py=class ElasticNet(LinearModel, RegressorMixin):
sklearn/linear_model/coordinate_descent.py-
sklearn/linear_model/coordinate_descent.py: @deprecated(" and will be removed in 0.19")
sklearn/linear_model/logistic.py=def logistic_regression_path(X, y, pos_class=None, Cs=10, fit_intercept=True,
sklearn/linear_model/logistic.py- Whether or not to produce a copy of the data. A copy is not required
sklearn/linear_model/logistic.py: anymore. This parameter is deprecated and will be removed in 0.19.
sklearn/linear_model/logistic.py- warnings.warn("A copy is not required anymore. The 'copy' parameter "
sklearn/linear_model/logistic.py: "is deprecated and will be removed in 0.19.",
sklearn/linear_model/logistic.py-
sklearn/linear_model/logistic.py: # 'auto' is deprecated and will be removed in 0.19
sklearn/linear_model/logistic.py=class LogisticRegressionCV(LogisticRegression, BaseEstimator,
sklearn/linear_model/logistic.py- class_weight in ['balanced', 'auto']):
sklearn/linear_model/logistic.py: # 'auto' is deprecated and will be removed in 0.19
sklearn/linear_model/stochastic_gradient.py=class BaseSGDRegressor(BaseSGD, RegressorMixin):
sklearn/linear_model/stochastic_gradient.py-
sklearn/linear_model/stochastic_gradient.py: @deprecated(" and will be removed in 0.19.")
sklearn/metrics/base.py=from ..utils import deprecated
sklearn/metrics/base.py- at deprecated("UndefinedMetricWarning has been moved into the sklearn.exceptions"
sklearn/metrics/base.py: " module. It will not be available here from version 0.19")
sklearn/metrics/regression.py=def r2_score(y_true, y_pred,
sklearn/metrics/regression.py- deprecated since version 0.17 and will be changed to 'uniform_average'
sklearn/metrics/regression.py: starting from 0.19.
sklearn/metrics/regression.py- "0.17, it will be changed to 'uniform_average' "
sklearn/metrics/regression.py: "starting from 0.19.",
sklearn/multioutput.py=class MultiOutputRegressor(MultiOutputEstimator, RegressorMixin):
sklearn/multioutput.py- """
sklearn/multioutput.py: # XXX remove in 0.19 when r2_score default for multioutput changes
sklearn/pipeline.py=class Pipeline(_BasePipeline):
sklearn/pipeline.py- if hasattr(X, 'ndim') and X.ndim == 1:
sklearn/pipeline.py: warn("From version 0.19, a 1d X will not be reshaped in"
sklearn/preprocessing/data.py=DEPRECATION_MSG_1D = (
sklearn/preprocessing/data.py- "Passing 1d arrays as data is deprecated in 0.17 and will "
sklearn/preprocessing/data.py: "raise ValueError in 0.19. Reshape your data either using "
sklearn/preprocessing/data.py=class MinMaxScaler(BaseEstimator, TransformerMixin):
sklearn/preprocessing/data.py- @deprecated("Attribute data_range will be removed in "
sklearn/preprocessing/data.py: "0.19. Use ``data_range_`` instead")
sklearn/preprocessing/data.py- @deprecated("Attribute data_min will be removed in "
sklearn/preprocessing/data.py: "0.19. Use ``data_min_`` instead")
sklearn/preprocessing/data.py=class StandardScaler(BaseEstimator, TransformerMixin):
sklearn/preprocessing/data.py- @property
sklearn/preprocessing/data.py: @deprecated("Attribute ``std_`` will be removed in 0.19. "
sklearn/qda.py=warnings.warn("qda.QDA has been moved to "
sklearn/qda.py- "discriminant_analysis.QuadraticDiscriminantAnalysis "
sklearn/qda.py: "in 0.17 and will be removed in 0.19.", DeprecationWarning)
sklearn/qda.py=class QDA(_QDA):
sklearn/qda.py- .. deprecated:: 0.17
sklearn/qda.py: This class will be removed in 0.19.
sklearn/svm/base.py=class BaseLibSVM(six.with_metaclass(ABCMeta, BaseEstimator)):
sklearn/svm/base.py-
sklearn/svm/base.py: @deprecated(" and will be removed in 0.19")
sklearn/svm/base.py=class BaseSVC(six.with_metaclass(ABCMeta, BaseLibSVM, ClassifierMixin)):
sklearn/svm/base.py- warnings.warn("The decision_function_shape default value will "
sklearn/svm/base.py: "change from 'ovo' to 'ovr' in 0.19. This will change "
sklearn/svm/classes.py=class SVC(BaseSVC):
sklearn/svm/classes.py- compatibility and raise a deprecation warning, but will change 'ovr'
sklearn/svm/classes.py: in 0.19.
sklearn/svm/classes.py=class NuSVC(BaseSVC):
sklearn/svm/classes.py- compatibility and raise a deprecation warning, but will change 'ovr'
sklearn/svm/classes.py: in 0.19.
sklearn/utils/__init__.py=from ..exceptions import DataConversionWarning
sklearn/utils/__init__.py- at deprecated("ConvergenceWarning has been moved into the sklearn.exceptions "
sklearn/utils/__init__.py: "module. It will not be available here from version 0.19")
sklearn/utils/class_weight.py=def compute_class_weight(class_weight, classes, y):
sklearn/utils/class_weight.py- "class_weight='balanced'. 'auto' will be removed in"
sklearn/utils/class_weight.py: " 0.19", DeprecationWarning)
sklearn/utils/estimator_checks.py=MULTI_OUTPUT = ['CCA', 'DecisionTreeRegressor', 'ElasticNet',
sklearn/utils/estimator_checks.py-
sklearn/utils/estimator_checks.py:# Estimators with deprecated transform methods. Should be removed in 0.19 when
sklearn/utils/testing.py=def if_not_mac_os(versions=('10.7', '10.8', '10.9'),
sklearn/utils/testing.py- warnings.warn("if_not_mac_os is deprecated in 0.17 and will be removed"
sklearn/utils/testing.py: " in 0.19: use the safer and more generic"
sklearn/utils/validation.py=from ..exceptions import NotFittedError as _NotFittedError
sklearn/utils/validation.py- at deprecated("DataConversionWarning has been moved into the sklearn.exceptions"
sklearn/utils/validation.py: " module. It will not be available here from version 0.19")
sklearn/utils/validation.py=class DataConversionWarning(_DataConversionWarning):
sklearn/utils/validation.py- at deprecated("NonBLASDotWarning has been moved into the sklearn.exceptions"
sklearn/utils/validation.py: " module. It will not be available here from version 0.19")
sklearn/utils/validation.py=class NonBLASDotWarning(_NonBLASDotWarning):
sklearn/utils/validation.py- at deprecated("NotFittedError has been moved into the sklearn.exceptions module."
sklearn/utils/validation.py: " It will not be available here from version 0.19")
sklearn/utils/validation.py=def check_array(array, accept_sparse=None, dtype="numeric", order=None,
sklearn/utils/validation.py- "Passing 1d arrays as data is deprecated in 0.17 and will "
sklearn/utils/validation.py: "raise ValueError in 0.19. Reshape your data either using "
sklearn/utils/validation.py=def check_is_fitted(estimator, attributes, msg=None, all_or_any=all):
sklearn/utils/validation.py- if not all_or_any([hasattr(estimator, attr) for attr in attributes]):
sklearn/utils/validation.py: # FIXME NotFittedError_ --> NotFittedError in 0.19
-------------- next part --------------
sklearn/base.py=def clone(estimator, safe=True):
sklearn/base.py- " This behavior is deprecated as of 0.18 and "
sklearn/base.py: "support for this behavior will be removed in 0.20."
sklearn/cross_validation.py=warnings.warn("This module was deprecated in version 0.18 in favor of the "
sklearn/cross_validation.py- "new CV iterators are different from that of this module. "
sklearn/cross_validation.py: "This module will be removed in 0.20.", DeprecationWarning)
sklearn/cross_validation.py=class LeaveOneOut(_PartitionIterator):
sklearn/cross_validation.py- .. deprecated:: 0.18
sklearn/cross_validation.py: This module will be removed in 0.20.
sklearn/cross_validation.py=class LeavePOut(_PartitionIterator):
sklearn/cross_validation.py- .. deprecated:: 0.18
sklearn/cross_validation.py: This module will be removed in 0.20.
sklearn/cross_validation.py=class KFold(_BaseKFold):
sklearn/cross_validation.py- .. deprecated:: 0.18
sklearn/cross_validation.py: This module will be removed in 0.20.
sklearn/cross_validation.py=class LabelKFold(_BaseKFold):
sklearn/cross_validation.py- .. deprecated:: 0.18
sklearn/cross_validation.py: This module will be removed in 0.20.
sklearn/cross_validation.py=class StratifiedKFold(_BaseKFold):
sklearn/cross_validation.py- .. deprecated:: 0.18
sklearn/cross_validation.py: This module will be removed in 0.20.
sklearn/cross_validation.py=class LeaveOneLabelOut(_PartitionIterator):
sklearn/cross_validation.py- .. deprecated:: 0.18
sklearn/cross_validation.py: This module will be removed in 0.20.
sklearn/cross_validation.py=class LeavePLabelOut(_PartitionIterator):
sklearn/cross_validation.py- .. deprecated:: 0.18
sklearn/cross_validation.py: This module will be removed in 0.20.
sklearn/cross_validation.py=class ShuffleSplit(BaseShuffleSplit):
sklearn/cross_validation.py- .. deprecated:: 0.18
sklearn/cross_validation.py: This module will be removed in 0.20.
sklearn/cross_validation.py=class StratifiedShuffleSplit(BaseShuffleSplit):
sklearn/cross_validation.py- .. deprecated:: 0.18
sklearn/cross_validation.py: This module will be removed in 0.20.
sklearn/cross_validation.py=class PredefinedSplit(_PartitionIterator):
sklearn/cross_validation.py- .. deprecated:: 0.18
sklearn/cross_validation.py: This module will be removed in 0.20.
sklearn/cross_validation.py=class LabelShuffleSplit(ShuffleSplit):
sklearn/cross_validation.py- .. deprecated:: 0.18
sklearn/cross_validation.py: This module will be removed in 0.20.
sklearn/cross_validation.py=def cross_val_predict(estimator, X, y=None, cv=None, n_jobs=1,
sklearn/cross_validation.py- .. deprecated:: 0.18
sklearn/cross_validation.py: This module will be removed in 0.20.
sklearn/cross_validation.py=def cross_val_score(estimator, X, y=None, scoring=None, cv=None, n_jobs=1,
sklearn/cross_validation.py- .. deprecated:: 0.18
sklearn/cross_validation.py: This module will be removed in 0.20.
sklearn/cross_validation.py=def check_cv(cv, X=None, y=None, classifier=False):
sklearn/cross_validation.py- .. deprecated:: 0.18
sklearn/cross_validation.py: This module will be removed in 0.20.
sklearn/cross_validation.py=def permutation_test_score(estimator, X, y, cv=None,
sklearn/cross_validation.py- .. deprecated:: 0.18
sklearn/cross_validation.py: This module will be removed in 0.20.
sklearn/cross_validation.py=def train_test_split(*arrays, **options):
sklearn/cross_validation.py- .. deprecated:: 0.18
sklearn/cross_validation.py: This module will be removed in 0.20.
sklearn/decomposition/online_lda.py=class LatentDirichletAllocation(BaseEstimator, TransformerMixin):
sklearn/decomposition/online_lda.py- faster than the batch update.
sklearn/decomposition/online_lda.py: The default learning method is going to be changed to 'batch' in the 0.20 release.
sklearn/decomposition/online_lda.py- warnings.warn("The default value for 'learning_method' will be "
sklearn/decomposition/online_lda.py: "changed from 'online' to 'batch' in the release 0.20. "
sklearn/decomposition/pca.py=class PCA(_BasePCA):
sklearn/decomposition/pca.py-
sklearn/decomposition/pca.py:@deprecated("RandomizedPCA was deprecated in 0.18 and will be removed in 0.20. "
sklearn/decomposition/pca.py=class RandomizedPCA(BaseEstimator, TransformerMixin):
sklearn/decomposition/pca.py- .. deprecated:: 0.18
sklearn/decomposition/pca.py: This class will be removed in 0.20.
sklearn/gaussian_process/gaussian_process.py=MACHINE_EPSILON = np.finfo(np.double).eps
sklearn/gaussian_process/gaussian_process.py-@deprecated("l1_cross_distances was deprecated in version 0.18 "
sklearn/gaussian_process/gaussian_process.py: "and will be removed in 0.20.")
sklearn/gaussian_process/gaussian_process.py=def l1_cross_distances(X):
sklearn/gaussian_process/gaussian_process.py-@deprecated("GaussianProcess was deprecated in version 0.18 and will be "
sklearn/gaussian_process/gaussian_process.py: "removed in 0.20. Use the GaussianProcessRegressor instead.")
sklearn/gaussian_process/gaussian_process.py=class GaussianProcess(BaseEstimator, RegressorMixin):
sklearn/gaussian_process/gaussian_process.py- .. deprecated:: 0.18
sklearn/gaussian_process/gaussian_process.py: This class will be removed in 0.20.
sklearn/grid_search.py=warnings.warn("This module was deprecated in version 0.18 in favor of the "
sklearn/grid_search.py- "model_selection module into which all the refactored classes "
sklearn/grid_search.py: "and functions are moved. This module will be removed in 0.20.",
sklearn/grid_search.py=class ParameterGrid(object):
sklearn/grid_search.py- .. deprecated:: 0.18
sklearn/grid_search.py: This module will be removed in 0.20.
sklearn/grid_search.py=class ParameterSampler(object):
sklearn/grid_search.py- .. deprecated:: 0.18
sklearn/grid_search.py: This module will be removed in 0.20.
sklearn/grid_search.py=def fit_grid_point(X, y, estimator, parameters, train, test, scorer,
sklearn/grid_search.py- .. deprecated:: 0.18
sklearn/grid_search.py: This module will be removed in 0.20.
sklearn/grid_search.py=class GridSearchCV(BaseSearchCV):
sklearn/grid_search.py- .. deprecated:: 0.18
sklearn/grid_search.py: This module will be removed in 0.20.
sklearn/grid_search.py=class RandomizedSearchCV(BaseSearchCV):
sklearn/grid_search.py- .. deprecated:: 0.18
sklearn/grid_search.py: This module will be removed in 0.20.
sklearn/isotonic.py=class IsotonicRegression(BaseEstimator, TransformerMixin, RegressorMixin):
sklearn/isotonic.py- @deprecated("Attribute ``X_`` is deprecated in version 0.18 and will be"
sklearn/isotonic.py: " removed in version 0.20.")
sklearn/isotonic.py- @deprecated("Attribute ``y_`` is deprecated in version 0.18 and will"
sklearn/isotonic.py: " be removed in version 0.20.")
sklearn/learning_curve.py=warnings.warn("This module was deprecated in version 0.18 in favor of the "
sklearn/learning_curve.py- "model_selection module into which all the functions are moved."
sklearn/learning_curve.py: " This module will be removed in 0.20",
sklearn/learning_curve.py=def learning_curve(estimator, X, y, train_sizes=np.linspace(0.1, 1.0, 5),
sklearn/learning_curve.py- .. deprecated:: 0.18
sklearn/learning_curve.py: This module will be removed in 0.20.
sklearn/learning_curve.py=def validation_curve(estimator, X, y, param_name, param_range, cv=None,
sklearn/learning_curve.py- .. deprecated:: 0.18
sklearn/learning_curve.py: This module will be removed in 0.20.
sklearn/linear_model/base.py=def make_dataset(X, y, sample_weight, random_state=None):
sklearn/linear_model/base.py-@deprecated("sparse_center_data was deprecated in version 0.18 and will be "
sklearn/linear_model/base.py: "removed in 0.20. Use utilities in preprocessing.data instead")
sklearn/linear_model/base.py=def sparse_center_data(X, y, fit_intercept, normalize=False):
sklearn/linear_model/base.py-@deprecated("center_data was deprecated in version 0.18 and will be removed in "
sklearn/linear_model/base.py: "0.20. Use utilities in preprocessing.data instead")
sklearn/linear_model/ransac.py=class RANSACRegressor(BaseEstimator, MetaEstimatorMixin, RegressorMixin):
sklearn/linear_model/ransac.py-
sklearn/linear_model/ransac.py: NOTE: residual_metric is deprecated from 0.18 and will be removed in 0.20
sklearn/linear_model/ransac.py- "'residual_metric' was deprecated in version 0.18 and "
sklearn/linear_model/ransac.py: "will be removed in version 0.20. Use 'loss' instead.",
sklearn/linear_model/ransac.py-
sklearn/linear_model/ransac.py: # XXX: Deprecation: Remove this if block in 0.20
sklearn/metrics/classification.py=def hamming_loss(y_true, y_pred, labels=None, sample_weight=None,
sklearn/metrics/classification.py- (deprecated) Integer array of labels. This parameter has been
sklearn/metrics/classification.py: renamed to ``labels`` in version 0.18 and will be removed in 0.20.
sklearn/metrics/classification.py- warnings.warn("'classes' was renamed to 'labels' in version 0.18 and "
sklearn/metrics/classification.py: "will be removed in 0.20.", DeprecationWarning)
sklearn/metrics/scorer.py=deprecation_msg = ('Scoring method mean_squared_error was renamed to '
sklearn/metrics/scorer.py- 'neg_mean_squared_error in version 0.18 and will '
sklearn/metrics/scorer.py: 'be removed in 0.20.')
sklearn/metrics/scorer.py=deprecation_msg = ('Scoring method mean_absolute_error was renamed to '
sklearn/metrics/scorer.py- 'neg_mean_absolute_error in version 0.18 and will '
sklearn/metrics/scorer.py: 'be removed in 0.20.')
sklearn/metrics/scorer.py=deprecation_msg = ('Scoring method median_absolute_error was renamed to '
sklearn/metrics/scorer.py- 'neg_median_absolute_error in version 0.18 and will '
sklearn/metrics/scorer.py: 'be removed in 0.20.')
sklearn/metrics/scorer.py=deprecation_msg = ('Scoring method log_loss was renamed to '
sklearn/metrics/scorer.py: 'neg_log_loss in version 0.18 and will be removed in 0.20.')
sklearn/mixture/dpgmm.py=from __future__ import print_function
sklearn/mixture/dpgmm.py-
sklearn/mixture/dpgmm.py:# Important note for the deprecation cleaning of 0.20 :
sklearn/mixture/dpgmm.py=from .gmm import _GMMBase
sklearn/mixture/dpgmm.py-@deprecated("The function digamma is deprecated in 0.18 and "
sklearn/mixture/dpgmm.py: "will be removed in 0.20. Use scipy.special.digamma instead.")
sklearn/mixture/dpgmm.py=def digamma(x):
sklearn/mixture/dpgmm.py-@deprecated("The function gammaln is deprecated in 0.18 and "
sklearn/mixture/dpgmm.py: "will be removed in 0.20. Use scipy.special.gammaln instead.")
sklearn/mixture/dpgmm.py=def gammaln(x):
sklearn/mixture/dpgmm.py-@deprecated("The function log_normalize is deprecated in 0.18 and "
sklearn/mixture/dpgmm.py: "will be removed in 0.20.")
sklearn/mixture/dpgmm.py=def log_normalize(v, axis=0):
sklearn/mixture/dpgmm.py-@deprecated("The function wishart_log_det is deprecated in 0.18 and "
sklearn/mixture/dpgmm.py: "will be removed in 0.20.")
sklearn/mixture/dpgmm.py=def wishart_log_det(a, b, detB, n_features):
sklearn/mixture/dpgmm.py-@deprecated("The function wishart_logz is deprecated in 0.18 and "
sklearn/mixture/dpgmm.py: "will be removed in 0.20.")
sklearn/mixture/dpgmm.py=class _DPGMMBase(_GMMBase):
sklearn/mixture/dpgmm.py- "instead. DPGMM is deprecated in 0.18 and will be "
sklearn/mixture/dpgmm.py: "removed in 0.20.")
sklearn/mixture/dpgmm.py=class DPGMM(_DPGMMBase):
sklearn/mixture/dpgmm.py- .. deprecated:: 0.18
sklearn/mixture/dpgmm.py: This class will be removed in 0.20.
sklearn/mixture/dpgmm.py- "'dirichlet_distribution'` instead. "
sklearn/mixture/dpgmm.py: "VBGMM is deprecated in 0.18 and will be removed in 0.20.")
sklearn/mixture/dpgmm.py=class VBGMM(_DPGMMBase):
sklearn/mixture/dpgmm.py- .. deprecated:: 0.18
sklearn/mixture/dpgmm.py: This class will be removed in 0.20.
sklearn/mixture/gmm.py=of Gaussian Mixture Models.
sklearn/mixture/gmm.py-
sklearn/mixture/gmm.py:# Important note for the deprecation cleaning of 0.20 :
sklearn/mixture/gmm.py=EPS = np.finfo(float).eps
sklearn/mixture/gmm.py-@deprecated("The function log_multivariate_normal_density is deprecated in 0.18"
sklearn/mixture/gmm.py: " and will be removed in 0.20.")
sklearn/mixture/gmm.py=def log_multivariate_normal_density(X, means, covars, covariance_type='diag'):
sklearn/mixture/gmm.py-@deprecated("The function sample_gaussian is deprecated in 0.18"
sklearn/mixture/gmm.py: " and will be removed in 0.20."
sklearn/mixture/gmm.py=class _GMMBase(BaseEstimator):
sklearn/mixture/gmm.py-@deprecated("The class GMM is deprecated in 0.18 and will be "
sklearn/mixture/gmm.py: " removed in 0.20. Use class GaussianMixture instead.")
sklearn/mixture/gmm.py=class GMM(_GMMBase):
sklearn/mixture/gmm.py- .. deprecated:: 0.18
sklearn/mixture/gmm.py: This class will be removed in 0.20.
sklearn/mixture/gmm.py=def _validate_covars(covars, covariance_type, n_components):
sklearn/mixture/gmm.py-@deprecated("The functon distribute_covar_matrix_to_match_covariance_type"
sklearn/mixture/gmm.py: "is deprecated in 0.18 and will be removed in 0.20.")
sklearn/model_selection/_search.py=def _check_param_grid(param_grid):
sklearn/model_selection/_search.py-
sklearn/model_selection/_search.py:# XXX Remove in 0.20
sklearn/model_selection/_search.py=class BaseSearchCV(six.with_metaclass(ABCMeta, BaseEstimator,
sklearn/model_selection/_search.py- " in favor of the more elaborate cv_results_ attribute."
sklearn/model_selection/_search.py: " The grid_scores_ attribute will not be available from 0.20",
sklearn/tree/_utils.pyx=cdef realloc_ptr safe_realloc(realloc_ptr* p, size_t nelems) except *:
sklearn/tree/_utils.pyx- # sizeof(realloc_ptr[0]) would be more like idiomatic C, but causes Cython
sklearn/tree/_utils.pyx: # 0.20.1 to crash.
sklearn/tree/export.py=def export_graphviz(decision_tree, out_file=SENTINEL, max_depth=None,
sklearn/tree/export.py- Handle or name of the output file. If ``None``, the result is
sklearn/tree/export.py: returned as a string. This will the default from version 0.20.
sklearn/tree/export.py- warnings.warn("out_file can be set to None starting from 0.18. "
sklearn/tree/export.py: "This will be the default in 0.20.",
sklearn/utils/fast_dict.pyx=cdef class IntFloatDict:
sklearn/utils/fast_dict.pyx-
sklearn/utils/fast_dict.pyx: # Cython 0.20 generates buggy code below. Commenting this out for now
-------------- next part --------------
sklearn/covariance/graph_lasso_.py=class GraphLassoCV(GraphLasso):
sklearn/covariance/graph_lasso_.py- @deprecated("Attribute grid_scores was deprecated in version 0.19 and "
sklearn/covariance/graph_lasso_.py: "will be removed in 0.21. Use 'grid_scores_' instead")
sklearn/decomposition/online_lda.py=class LatentDirichletAllocation(BaseEstimator, TransformerMixin):
sklearn/decomposition/online_lda.py- "be ignored as of 0.19. Support for this argument "
sklearn/decomposition/online_lda.py: "will be removed in 0.21.", DeprecationWarning)
sklearn/decomposition/sparse_pca.py=class SparsePCA(BaseEstimator, TransformerMixin):
sklearn/decomposition/sparse_pca.py- .. deprecated:: 0.19
sklearn/decomposition/sparse_pca.py: This parameter will be removed in 0.21.
sklearn/decomposition/sparse_pca.py- warnings.warn("The ridge_alpha parameter on transform() is "
sklearn/decomposition/sparse_pca.py: "deprecated since 0.19 and will be removed in 0.21. "
sklearn/ensemble/gradient_boosting.py=class BaseGradientBoosting(six.with_metaclass(ABCMeta, BaseEnsemble)):
sklearn/ensemble/gradient_boosting.py- @deprecated("Attribute n_features was deprecated in version 0.19 and "
sklearn/ensemble/gradient_boosting.py: "will be removed in 0.21.")
sklearn/gaussian_process/gpr.py=class GaussianProcessRegressor(BaseEstimator, RegressorMixin):
sklearn/gaussian_process/gpr.py- @deprecated("Attribute rng was deprecated in version 0.19 and "
sklearn/gaussian_process/gpr.py: "will be removed in 0.21.")
sklearn/gaussian_process/gpr.py- @deprecated("Attribute y_train_mean was deprecated in version 0.19 and "
sklearn/gaussian_process/gpr.py: "will be removed in 0.21.")
sklearn/linear_model/stochastic_gradient.py=class BaseSGDClassifier(six.with_metaclass(ABCMeta, BaseSGD,
sklearn/linear_model/stochastic_gradient.py- @deprecated("Attribute loss_function was deprecated in version 0.19 and "
sklearn/linear_model/stochastic_gradient.py: "will be removed in 0.21. Use 'loss_function_' instead")
sklearn/manifold/t_sne.py=class TSNE(BaseEstimator):
sklearn/manifold/t_sne.py- @deprecated("Attribute n_iter_final was deprecated in version 0.19 and "
sklearn/manifold/t_sne.py: "will be removed in 0.21. Use 'n_iter_' instead")
sklearn/utils/validation.py=def check_array(array, accept_sparse=False, dtype="numeric", order=None,
sklearn/utils/validation.py- "check_array and check_X_y is deprecated in version 0.19 "
sklearn/utils/validation.py: "and will be removed in 0.21. Use 'accept_sparse=False' "
From joel.nothman at gmail.com Wed Feb 8 22:39:20 2017
From: joel.nothman at gmail.com (Joel Nothman)
Date: Thu, 9 Feb 2017 14:39:20 +1100
Subject: [scikit-learn] Preparing a scikit-learn 0.18.2 bugfix release
In-Reply-To:
References:
<20170109151546.GM2802991@phare.normalesup.org>
<20170111215115.GO1585067@phare.normalesup.org>
Message-ID:
See also
http://scikit-learn.org/stable/modules/classes.html#recently-deprecated
On 9 February 2017 at 14:30, Joel Nothman wrote:
> Not sure that this quite gives you a number, but:
>
>
> $ git checkout 0.18.1
> $ git grep -pwB1 0.19 sklearn | grep -ve ^- -e .csv: -e /tests/ >
> /tmp/dep19.txt
>
> etc.
>
> edited results attached.
>
>
> On 9 February 2017 at 04:15, Andrew Howe wrote:
>
>> How many current deprecations are expected in the next release?
>>
>> Andrew
>>
>> On Jan 12, 2017 00:53, "Gael Varoquaux"
>> wrote:
>>
>> On Thu, Jan 12, 2017 at 08:41:51AM +1100, Joel Nothman wrote:
>> > When the two versions deprecation policy was instituted, releases were
>> much
>> > more frequent... Is that enough of an excuse?
>>
>> I'd rather say that we can decide here to give a longer grace period.
>>
>> I think that slow deprecations are a good thing (see Titus's blog post
>> here: http://ivory.idyll.org/blog/2017-pof-software-archivability.html )
>>
>> G
>>
>> > On 12 January 2017 at 03:43, Andreas Mueller wrote:
>>
>>
>>
>> > On 01/09/2017 10:15 AM, Gael Varoquaux wrote:
>>
>> > instead of setting up a roadmap I would rather just
>> identify bugs
>> > that
>> > are blockers and fix only those and don't wait for any
>> feature
>> > before
>> > cutting 0.19.X.
>>
>>
>>
>> > I agree with the sentiment, but this would mess with our
>> deprecation cycle.
>> > If we release now, and then release again soonish, that means
>> people have
>> > less calendar time
>> > to react to deprecations.
>>
>> > We could either accept this or change all deprecations and bump the
>> removal
>> > by a version?
>>
>> > _______________________________________________
>> > scikit-learn mailing list
>> > scikit-learn at python.org
>> > https://mail.python.org/mailman/listinfo/scikit-learn
>>
>>
>>
>>
>>
>> --
>> Gael Varoquaux
>> Researcher, INRIA Parietal
>> NeuroSpin/CEA Saclay , Bat 145, 91191 Gif-sur-Yvette France
>> Phone: ++ 33-1-69-08-79-68
>> http://gael-varoquaux.info http://twitter.com/GaelVaroqua
>> ux
>>
>>
>>
>>
>>
>
From mmahesh.chandra873 at gmail.com Sat Feb 11 09:18:50 2017
From: mmahesh.chandra873 at gmail.com (Mahesh Chandra)
Date: Sat, 11 Feb 2017 15:18:50 +0100
Subject: [scikit-learn] Logistic regression doesnt converge?
Message-ID:
from numpy import linalg as LA
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import log_loss

reg = 0.1
lr = LogisticRegression(C=1/reg, max_iter=100,
                        fit_intercept=True, solver='lbfgs').fit(X_train, y_train)
ytrain_hat = lr.predict_proba(X_train)
loss = log_loss(y_train, ytrain_hat)
print(loss)
print(loss + 0.5*reg*LA.norm(lr.coef_))
Maybe I am doing it wrong.
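For reference, the objective that the lbfgs solver minimizes is (up to a constant factor) the *summed* log loss plus half the *squared* L2 norm of the coefficients, so a mean log_loss plus an unsquared, unscaled norm will not match it. A minimal sketch on synthetic data (make_classification is only a stand-in for X_train/y_train):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import log_loss

# Synthetic stand-in for X_train / y_train, just for illustration.
X, y = make_classification(n_samples=200, n_features=10, random_state=0)

reg = 0.1
lr = LogisticRegression(C=1.0 / reg, max_iter=1000,
                        solver='lbfgs').fit(X, y)

# Dividing sklearn's objective  0.5*||w||^2 + C * sum_i logloss_i  by C gives
#   sum_i logloss_i + (reg / 2) * ||w||_2^2        (with reg = 1/C)
# i.e. the *summed* (not mean) log loss plus half the *squared* L2 norm.
n = X.shape[0]
objective = (n * log_loss(y, lr.predict_proba(X))
             + 0.5 * reg * np.sum(lr.coef_ ** 2))
print(objective)
```

Note that the intercept is not penalized, which is why only `coef_` enters the penalty term.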
From mmahesh.chandra873 at gmail.com Sat Feb 11 09:24:09 2017
From: mmahesh.chandra873 at gmail.com (Mahesh Chandra)
Date: Sat, 11 Feb 2017 15:24:09 +0100
Subject: [scikit-learn] Logistic regression doesnt converge?
In-Reply-To:
References:
Message-ID:
Sorry for the incomplete email.
Hi,
My question is that even after trying several solvers, I don't get
convergence for logistic regression. The loss value, as calculated in the
previous email, was lower for max_iter=10 than for max_iter=30. So, does
the optimization method diverge, and how can we monitor and store the loss
(or any other metric) after each iteration?
Thanks
Mahesh
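There is no per-iteration callback in LogisticRegression, but one way to approximate loss monitoring is warm_start=True combined with repeated one-iteration fits, since each fit() then resumes from the current coefficients. A sketch on synthetic data (the restarted L-BFGS may progress more slowly than a single long run):

```python
import warnings
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import log_loss

X, y = make_classification(n_samples=200, n_features=10, random_state=0)

# warm_start=True makes each fit() start from the previous coefficients,
# so repeated one-iteration fits expose the loss trajectory.
lr = LogisticRegression(C=10.0, solver='lbfgs', warm_start=True, max_iter=1)

history = []
with warnings.catch_warnings():
    warnings.simplefilter('ignore')   # each call raises a ConvergenceWarning
    for _ in range(50):
        lr.fit(X, y)
        history.append(log_loss(y, lr.predict_proba(X)))

print(history[0], history[-1])
```

The recorded losses should trend downward; if they do not, that points to a genuine optimization problem rather than a reporting artifact.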
From benjamin.merkt at bcf.uni-freiburg.de Mon Feb 13 04:55:55 2017
From: benjamin.merkt at bcf.uni-freiburg.de (Benjamin Merkt)
Date: Mon, 13 Feb 2017 10:55:55 +0100
Subject: [scikit-learn] OMP ended prematurely due to linear dependence in
the dictionary
Message-ID:
Hi everyone,
I'm using OrthogonalMatchingPursuit to get a sparse coding of a signal
using a dictionary learned by a KSVD algorithm (pyksvd). However, during
the fit I get the following RuntimeWarning:
/usr/local/lib/python2.7/dist-packages/sklearn/linear_model/omp.py:391:
RuntimeWarning: Orthogonal matching pursuit ended prematurely due to
linear dependence in the dictionary. The requested precision might not
have been met.
copy_X=copy_X, return_path=return_path)
In those cases the results are indeed not satisfactory. I don't see the
point of this warning, as it is common in sparse coding to have an
overcomplete dictionary and thus also linear dependence within it. That
should not be an issue for OMP. In fact, the warning is also raised when
the dictionary is a square matrix.
Might this warning also point to other issues in the application?
Thanks, Ben
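For comparison, a minimal sketch (sizes and seed are arbitrary) in which the dictionary is overcomplete but column-normalized and the target really is sparse in it; in this setting OMP typically runs its requested number of steps without the premature-stop warning:

```python
import numpy as np
from sklearn.linear_model import OrthogonalMatchingPursuit

rng = np.random.RandomState(0)

# Overcomplete, unit-norm dictionary: 64-dimensional signals, 256 atoms.
D = rng.randn(64, 256)
D /= np.linalg.norm(D, axis=0)

# A signal that really is 8-sparse in this dictionary.
code = np.zeros(256)
code[rng.choice(256, 8, replace=False)] = rng.randn(8)
signal = D @ code

omp = OrthogonalMatchingPursuit(n_nonzero_coefs=8, fit_intercept=False)
omp.fit(D, signal)

residual = np.linalg.norm(signal - D @ omp.coef_)
print(np.count_nonzero(omp.coef_), residual)
```

The warning itself fires when a newly selected atom is (numerically) linearly dependent on the already-selected ones, which is about the residual at that step, not about the dictionary being overcomplete overall.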
From zephyr14 at gmail.com Mon Feb 13 17:31:35 2017
From: zephyr14 at gmail.com (Vlad Niculae)
Date: Tue, 14 Feb 2017 07:31:35 +0900
Subject: [scikit-learn] OMP ended prematurely due to linear dependence
in the dictionary
In-Reply-To:
References:
Message-ID:
Hi,
Are the columns of your matrix normalized? Try setting `normalize=True`.
Yours,
Vlad
From benjamin.merkt at bcf.uni-freiburg.de Tue Feb 14 05:00:52 2017
From: benjamin.merkt at bcf.uni-freiburg.de (Benjamin Merkt)
Date: Tue, 14 Feb 2017 11:00:52 +0100
Subject: [scikit-learn] OMP ended prematurely due to linear dependence
in the dictionary
In-Reply-To:
References:
Message-ID: <80881741-0259-dbe2-0a63-f5125dd78671@bcf.uni-freiburg.de>
Hi,
I tried that with no effect. The fit still breaks after two iterations.
If I set precompute=True I get three coefficients instead of only two.
My dictionary is fairly large (currently 128x42000). Is it even feasible
to use OMP with such a big matrix (even with ~120 GB of RAM)?
-Ben
From pa at letnes.com Tue Feb 14 05:54:27 2017
From: pa at letnes.com (Paul Anton Letnes)
Date: Tue, 14 Feb 2017 11:54:27 +0100
Subject: [scikit-learn] cross validation scores seem off for PLSRegression
Message-ID: <1487069667072.47907.95300@webmail1>
Hi!
Versions:
sklearn 0.18.1
numpy 1.11.3
Anaconda python 3.5 on ubuntu 16.04
What range is the cross_val_score supposed to be in? I was under the impression from the documentation, although I cannot find it stated explicitly anywhere, that it should be a number in the range [0, 1]. However, it appears that one can get large negative values; see the ipython session below.
Cheers
Paul
In [2]: import numpy as np
In [3]: y = np.random.random((10, 3))
In [4]: x = np.random.random((10, 17))
In [5]: from sklearn.cross_decomposition import PLSRegression
In [6]: pls = PLSRegression(n_components=3)
In [7]: from sklearn.cross_validation import cross_val_score
In [8]: from sklearn.model_selection import cross_val_score
In [9]: cross_val_score(pls, x, y)
Out[9]: array([-32.52217837, -4.17228083, -5.88632365])
PS:
This happens even if I cheat by setting y to the predicted value, and cross validate on that.
In [29]: y = x @ pls.coef_
In [30]: cross_val_score(pls, x, y)
/home/paul/anaconda3/envs/wp3-paper/lib/python3.5/site-packages/sklearn/cross_decomposition/pls_.py:293: UserWarning: Y residual constant at iteration 5
warnings.warn('Y residual constant at iteration %s' % k)
/home/paul/anaconda3/envs/wp3-paper/lib/python3.5/site-packages/sklearn/cross_decomposition/pls_.py:293: UserWarning: Y residual constant at iteration 6
warnings.warn('Y residual constant at iteration %s' % k)
/home/paul/anaconda3/envs/wp3-paper/lib/python3.5/site-packages/sklearn/cross_decomposition/pls_.py:293: UserWarning: Y residual constant at iteration 6
warnings.warn('Y residual constant at iteration %s' % k)
Out[30]: array([-35.01267353, -4.94806383, -5.9619526 ])
In [34]: np.max(np.abs(y - x @ pls.coef_))
Out[34]: 0.0
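For context, the default scorer that cross_val_score uses for a regressor is the estimator's score method, i.e. R^2 = 1 - SS_res/SS_tot, which is 0 for a model that always predicts the mean of the test targets and unbounded below for anything worse. A small sketch:

```python
import numpy as np
from sklearn.metrics import r2_score

rng = np.random.RandomState(0)
y_true = rng.random_sample(20)

# Always predicting the mean of y_true gives R^2 == 0 ...
print(r2_score(y_true, np.full_like(y_true, y_true.mean())))

# ... and a predictor worse than the mean goes negative; R^2 has no
# lower bound, which is why cross_val_score can return large negatives.
y_bad = y_true.mean() + 10.0 * (rng.random_sample(20) - 0.5)
print(r2_score(y_true, y_bad))
```

On ten random samples with seventeen features, badly negative fold scores are therefore unsurprising rather than a bug.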
From abdalrahman.eweiwi at gmail.com Tue Feb 14 06:05:52 2017
From: abdalrahman.eweiwi at gmail.com (abdalrahman eweiwi)
Date: Tue, 14 Feb 2017 12:05:52 +0100
Subject: [scikit-learn] cross validation scores seem off for
PLSRegression
In-Reply-To: <1487069667072.47907.95300@webmail1>
References: <1487069667072.47907.95300@webmail1>
Message-ID:
Hi Paul,
PLSRegression in sklearn uses an iterative method to estimate the
eigenvectors and eigenvalues (I think it is the power method), whose
results can vary depending on the underlying library you use. I would
suggest using SVD instead if you want stable results and your dataset is
small. I have also written a kernel PLS, which you can find here:
https://gist.github.com/aeweiwi/7788156
Cheers,
From fabian.boehnlein at gmail.com Tue Feb 14 06:08:11 2017
From: fabian.boehnlein at gmail.com (=?UTF-8?Q?Fabian_B=C3=B6hnlein?=)
Date: Tue, 14 Feb 2017 11:08:11 +0000
Subject: [scikit-learn] cross validation scores seem off for
PLSRegression
In-Reply-To: <1487069667072.47907.95300@webmail1>
References: <1487069667072.47907.95300@webmail1>
Message-ID:
Hi Paul,
not sure what the @ syntax does in IPython (in Python 3.5+ it is matrix
multiplication), but it seems you're setting y to x times the model's
coefficients instead of y_hat = pls.predict(x).
Also see in the documentation why R^2 can be negative:
http://scikit-learn.org/stable/modules/generated/sklearn.cross_decomposition.PLSRegression.html#sklearn.cross_decomposition.PLSRegression.score
Best,
Fabian
From benjamin.merkt at bcf.uni-freiburg.de Tue Feb 14 06:19:42 2017
From: benjamin.merkt at bcf.uni-freiburg.de (Benjamin Merkt)
Date: Tue, 14 Feb 2017 12:19:42 +0100
Subject: [scikit-learn] OMP ended prematurely due to linear dependence
in the dictionary
In-Reply-To: <80881741-0259-dbe2-0a63-f5125dd78671@bcf.uni-freiburg.de>
References:
<80881741-0259-dbe2-0a63-f5125dd78671@bcf.uni-freiburg.de>
Message-ID: <7255cf2b-12da-8c3c-63ca-2189b4fd0a67@bcf.uni-freiburg.de>
OK, the issue is resolved. My dictionary was still in 32-bit float from
saving. When I converted it to float64 before calling fit, it worked fine.
Sorry to bother you.
On 14.02.2017 11:00, Benjamin Merkt wrote:
> Hi,
>
> I tried that with no effect. The fit still breaks after two iterations.
>
> If I set precompute=True I get three coefficients instead of only two.
> My Dictionary is fairly large (currently 128x42000). Is it even feasible
> to use OMP with such a big Matrix (even with ~120GB ram)?
>
> -Ben
>
>
>
> On 13.02.2017 23:31, Vlad Niculae wrote:
>> Hi,
>>
>> Are the columns of your matrix normalized? Try setting `normalize=True`.
>>
>> Yours,
>> Vlad
>>
>> On Mon, Feb 13, 2017 at 6:55 PM, Benjamin Merkt
>> wrote:
>>> Hi everyone,
>>>
>>> I'm using OrthogonalMatchingPursuit to get a sparse coding of a
>>> signal using
>>> a dictionary learned by a KSVD algorithm (pyksvd). However, during
>>> the fit I
>>> get the following RuntimeWarning:
>>>
>>> /usr/local/lib/python2.7/dist-packages/sklearn/linear_model/omp.py:391:
>>> RuntimeWarning: Orthogonal matching pursuit ended prematurely due to
>>> linear
>>> dependence in the dictionary. The requested precision might not have
>>> been
>>> met.
>>>
>>> copy_X=copy_X, return_path=return_path)
>>>
>>> In those cases the results are indeed not satisfactory. I don't get the
>>> point of this warning as it is common in sparse coding to have an
>>> overcomplete dictionary an thus also linear dependency within it. That
>>> should not be an issue for OMP. In fact, the warning is also raised
>>> if the
>>> dictionary is a square matrix.
>>>
>>> Might this Warning also point to other issues in the application?
>>>
>>>
>>> Thanks, Ben
>>>
>>> _______________________________________________
>>> scikit-learn mailing list
>>> scikit-learn at python.org
>>> https://mail.python.org/mailman/listinfo/scikit-learn
>> _______________________________________________
>> scikit-learn mailing list
>> scikit-learn at python.org
>> https://mail.python.org/mailman/listinfo/scikit-learn
>>
> _______________________________________________
> scikit-learn mailing list
> scikit-learn at python.org
> https://mail.python.org/mailman/listinfo/scikit-learn
From zephyr14 at gmail.com Tue Feb 14 06:26:07 2017
From: zephyr14 at gmail.com (Vlad Niculae)
Date: Tue, 14 Feb 2017 20:26:07 +0900
Subject: [scikit-learn] OMP ended prematurely due to linear dependence
in the dictionary
In-Reply-To: <7255cf2b-12da-8c3c-63ca-2189b4fd0a67@bcf.uni-freiburg.de>
References:
<80881741-0259-dbe2-0a63-f5125dd78671@bcf.uni-freiburg.de>
<7255cf2b-12da-8c3c-63ca-2189b4fd0a67@bcf.uni-freiburg.de>
Message-ID:
Hi Ben,
This actually sounds like a bug in this case! At a glance, the code
should use the correct BLAS calls for the data type you provide. Can
you reproduce this with a simple small example that gets different
results if the data is 32 vs 64 bit? Would you mind filing an issue?
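Something along these lines, perhaps, for the kind of minimal reproduction being asked for (the shapes and the 5-atom signal here are made up for illustration; the interesting outcome would be the float32 fit selecting different atoms or stopping early):

```python
import numpy as np
from sklearn.linear_model import OrthogonalMatchingPursuit

rng = np.random.RandomState(0)
D = rng.randn(100, 500)            # dictionary, atoms as columns
D /= np.linalg.norm(D, axis=0)     # unit-norm columns
y = D[:, :5] @ rng.randn(5)        # signal built from the first 5 atoms

omp = OrthogonalMatchingPursuit(n_nonzero_coefs=5)
coef64 = omp.fit(D, y).coef_
coef32 = omp.fit(D.astype(np.float32), y.astype(np.float32)).coef_

# Compare which atoms each precision selected.
print(np.flatnonzero(coef64), np.flatnonzero(coef32))
```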
Thanks,
Vlad
On Tue, Feb 14, 2017 at 8:19 PM, Benjamin Merkt
wrote:
> OK, the issue is resolved. My dictionary was still in 32bit float from
> saving. When I convert it to 64float before calling fit it works fine.
>
> Sorry to bother.
>
>
>
> On 14.02.2017 11:00, Benjamin Merkt wrote:
>>
>> Hi,
>>
>> I tried that with no effect. The fit still breaks after two iterations.
>>
>> If I set precompute=True I get three coefficients instead of only two.
>> My Dictionary is fairly large (currently 128x42000). Is it even feasible
>> to use OMP with such a big Matrix (even with ~120GB ram)?
>>
>> -Ben
>>
>>
>>
>> On 13.02.2017 23:31, Vlad Niculae wrote:
>>>
>>> Hi,
>>>
>>> Are the columns of your matrix normalized? Try setting `normalized=True`.
>>>
>>> Yours,
>>> Vlad
>>>
>>> On Mon, Feb 13, 2017 at 6:55 PM, Benjamin Merkt
>>> wrote:
>>>>
>>>> Hi everyone,
>>>>
>>>> I'm using OrthogonalMatchingPursuit to get a sparse coding of a
>>>> signal using
>>>> a dictionary learned by a KSVD algorithm (pyksvd). However, during
>>>> the fit I
>>>> get the following RuntimeWarning:
>>>>
>>>> /usr/local/lib/python2.7/dist-packages/sklearn/linear_model/omp.py:391:
>>>> RuntimeWarning: Orthogonal matching pursuit ended prematurely due to
>>>> linear
>>>> dependence in the dictionary. The requested precision might not have
>>>> been
>>>> met.
>>>>
>>>> copy_X=copy_X, return_path=return_path)
>>>>
>>>> In those cases the results are indeed not satisfactory. I don't get the
>>>> point of this warning as it is common in sparse coding to have an
>>>> overcomplete dictionary an thus also linear dependency within it. That
>>>> should not be an issue for OMP. In fact, the warning is also raised
>>>> if the
>>>> dictionary is a square matrix.
>>>>
>>>> Might this Warning also point to other issues in the application?
>>>>
>>>>
>>>> Thanks, Ben
>>>>
>>>> _______________________________________________
>>>> scikit-learn mailing list
>>>> scikit-learn at python.org
>>>> https://mail.python.org/mailman/listinfo/scikit-learn
>>>
>>> _______________________________________________
>>> scikit-learn mailing list
>>> scikit-learn at python.org
>>> https://mail.python.org/mailman/listinfo/scikit-learn
>>>
>> _______________________________________________
>> scikit-learn mailing list
>> scikit-learn at python.org
>> https://mail.python.org/mailman/listinfo/scikit-learn
>
> _______________________________________________
> scikit-learn mailing list
> scikit-learn at python.org
> https://mail.python.org/mailman/listinfo/scikit-learn
From pa at letnes.com Tue Feb 14 06:27:11 2017
From: pa at letnes.com (Paul Anton Letnes)
Date: Tue, 14 Feb 2017 12:27:11 +0100
Subject: [scikit-learn] cross validation scores seem off for
PLSRegression
In-Reply-To:
References: <1487069667072.47907.95300@webmail1>
Message-ID: <1487071631037.11717.96286@webmail8>
@ is the Python operator for matrix multiplication.
I was deliberately setting y to the prediction, to make sure that the PLS model would be able to recreate the values completely and give a sensible score.
Paul
On 14 February 2017 at 12:08:11 +01:00, Fabian Böhnlein wrote:
> Hi Paul,
>
> not sure what @ syntax does in ipython, but seems you're setting y to the coefficients of the model instead of y_hat = pls.predict(x).
>
> Also see in the documentation why R^2 can be negative:
>
> Best,
> Fabian
>
>
> On Tue, 14 Feb 2017 at 11:57 Paul Anton Letnes <> wrote:
>
> > Hi!
> >
> > Versions:
> > sklearn 0.18.1
> > numpy 1.11.3
> > Anaconda python 3.5 on ubuntu 16.04
> >
> > What range is the cross_val_score supposed to be in? I was under the impression from the documentation, although I cannot find it stated explicitly anywhere, that it should be a number in the range [0, 1]. However, it appears that one can get large negative values; see the ipython session below.
> >
> > Cheers
> > Paul
> >
> > In [2]: import numpy as np
> >
> > In [3]: y = np.random.random((10, 3))
> >
> > In [4]: x = np.random.random((10, 17))
> >
> > In [5]: from sklearn.cross_decomposition import PLSRegression
> >
> > In [6]: pls = PLSRegression(n_components=3)
> >
> > In [7]: from sklearn.cross_validation import cross_val_score
> >
> > In [8]: from sklearn.model_selection import cross_val_score
> >
> > In [9]: cross_val_score(pls, x, y)
> > Out[9]: array([-32.52217837, -4.17228083, -5.88632365])
> >
> >
> > PS:
> > This happens even if I cheat by setting y to the predicted value, and cross validate on that.
> >
> > In [29]: y = x @ pls.coef_
> >
> > In [30]: cross_val_score(pls, x, y)
> > /home/paul/anaconda3/envs/wp3-paper/lib/python3.5/site-packages/sklearn/cross_decomposition/pls_.py:293: UserWarning: Y residual constant at iteration 5
> > warnings.warn('Y residual constant at iteration %s' % k)
> > /home/paul/anaconda3/envs/wp3-paper/lib/python3.5/site-packages/sklearn/cross_decomposition/pls_.py:293: UserWarning: Y residual constant at iteration 6
> > warnings.warn('Y residual constant at iteration %s' % k)
> > /home/paul/anaconda3/envs/wp3-paper/lib/python3.5/site-packages/sklearn/cross_decomposition/pls_.py:293: UserWarning: Y residual constant at iteration 6
> > warnings.warn('Y residual constant at iteration %s' % k)
> > Out[30]: array([-35.01267353, -4.94806383, -5.9619526 ])
> >
> > In [34]: np.max(np.abs(y - x @ pls.coef_))
> > Out[34]: 0.0
> >
> >
> > _______________________________________________
> > scikit-learn mailing list
> >
> >
> >
> >
From zephyr14 at gmail.com Tue Feb 14 06:28:08 2017
From: zephyr14 at gmail.com (Vlad Niculae)
Date: Tue, 14 Feb 2017 20:28:08 +0900
Subject: [scikit-learn] OMP ended prematurely due to linear dependence
in the dictionary
In-Reply-To:
References:
<80881741-0259-dbe2-0a63-f5125dd78671@bcf.uni-freiburg.de>
<7255cf2b-12da-8c3c-63ca-2189b4fd0a67@bcf.uni-freiburg.de>
Message-ID:
One possible issue I can see causing this is if X and y have different
dtypes... was this the case for you?
On Tue, Feb 14, 2017 at 8:26 PM, Vlad Niculae wrote:
> Hi Ben,
>
> This actually sounds like a bug in this case! At a glance, the code
> should use the correct BLAS calls for the data type you provide. Can
> you reproduce this with a simple small example that gets different
> results if the data is 32 vs 64 bit? Would you mind filing an issue?
>
> Thanks,
> Vlad
>
>
> On Tue, Feb 14, 2017 at 8:19 PM, Benjamin Merkt
> wrote:
>> OK, the issue is resolved. My dictionary was still in 32bit float from
>> saving. When I convert it to 64float before calling fit it works fine.
>>
>> Sorry to bother.
>>
>>
>>
>> On 14.02.2017 11:00, Benjamin Merkt wrote:
>>>
>>> Hi,
>>>
>>> I tried that with no effect. The fit still breaks after two iterations.
>>>
>>> If I set precompute=True I get three coefficients instead of only two.
>>> My Dictionary is fairly large (currently 128x42000). Is it even feasible
>>> to use OMP with such a big Matrix (even with ~120GB ram)?
>>>
>>> -Ben
>>>
>>>
>>>
>>> On 13.02.2017 23:31, Vlad Niculae wrote:
>>>>
>>>> Hi,
>>>>
>>>> Are the columns of your matrix normalized? Try setting `normalized=True`.
>>>>
>>>> Yours,
>>>> Vlad
>>>>
>>>> On Mon, Feb 13, 2017 at 6:55 PM, Benjamin Merkt
>>>> wrote:
>>>>>
>>>>> Hi everyone,
>>>>>
>>>>> I'm using OrthogonalMatchingPursuit to get a sparse coding of a
>>>>> signal using
>>>>> a dictionary learned by a KSVD algorithm (pyksvd). However, during
>>>>> the fit I
>>>>> get the following RuntimeWarning:
>>>>>
>>>>> /usr/local/lib/python2.7/dist-packages/sklearn/linear_model/omp.py:391:
>>>>> RuntimeWarning: Orthogonal matching pursuit ended prematurely due to
>>>>> linear
>>>>> dependence in the dictionary. The requested precision might not have
>>>>> been
>>>>> met.
>>>>>
>>>>> copy_X=copy_X, return_path=return_path)
>>>>>
>>>>> In those cases the results are indeed not satisfactory. I don't get the
>>>>> point of this warning as it is common in sparse coding to have an
>>>>> overcomplete dictionary an thus also linear dependency within it. That
>>>>> should not be an issue for OMP. In fact, the warning is also raised
>>>>> if the
>>>>> dictionary is a square matrix.
>>>>>
>>>>> Might this Warning also point to other issues in the application?
>>>>>
>>>>>
>>>>> Thanks, Ben
>>>>>
>>>>> _______________________________________________
>>>>> scikit-learn mailing list
>>>>> scikit-learn at python.org
>>>>> https://mail.python.org/mailman/listinfo/scikit-learn
>>>>
>>>> _______________________________________________
>>>> scikit-learn mailing list
>>>> scikit-learn at python.org
>>>> https://mail.python.org/mailman/listinfo/scikit-learn
>>>>
>>> _______________________________________________
>>> scikit-learn mailing list
>>> scikit-learn at python.org
>>> https://mail.python.org/mailman/listinfo/scikit-learn
>>
>> _______________________________________________
>> scikit-learn mailing list
>> scikit-learn at python.org
>> https://mail.python.org/mailman/listinfo/scikit-learn
From emanuela.boros at gmail.com Tue Feb 14 06:52:48 2017
From: emanuela.boros at gmail.com (Emanuela Boros)
Date: Tue, 14 Feb 2017 12:52:48 +0100
Subject: [scikit-learn] OMP ended prematurely due to linear dependence
in the dictionary
In-Reply-To:
References:
Message-ID:
Just as a side point - which will not contribute to the purpose of this
discussion - you can also use pyksvd itself for sparse coding.
Emanuela Boros
LIMSI-CNRS
CDS/LAL-CNRS
Orsay, France
personal: 06 52 17 4595
work: 01 64 46 8954
emanuela.boros@{u-psud.fr,gmail.com}
boros@{limsi.fr,lal.in2p3.fr}
On Mon, Feb 13, 2017 at 10:55 AM, Benjamin Merkt <
benjamin.merkt at bcf.uni-freiburg.de> wrote:
> Hi everyone,
>
> I'm using OrthogonalMatchingPursuit to get a sparse coding of a signal
> using a dictionary learned by a KSVD algorithm (pyksvd). However, during
> the fit I get the following RuntimeWarning:
>
> /usr/local/lib/python2.7/dist-packages/sklearn/linear_model/omp.py:391:
> RuntimeWarning: Orthogonal matching pursuit ended prematurely due to
> linear dependence in the dictionary. The requested precision might not have
> been met.
>
> copy_X=copy_X, return_path=return_path)
>
> In those cases the results are indeed not satisfactory. I don't get the
> point of this warning as it is common in sparse coding to have an
> overcomplete dictionary an thus also linear dependency within it. That
> should not be an issue for OMP. In fact, the warning is also raised if the
> dictionary is a square matrix.
>
> Might this Warning also point to other issues in the application?
>
>
> Thanks, Ben
>
> _______________________________________________
> scikit-learn mailing list
> scikit-learn at python.org
> https://mail.python.org/mailman/listinfo/scikit-learn
>
From pa at letnes.com Tue Feb 14 06:58:19 2017
From: pa at letnes.com (Paul Anton Letnes)
Date: Tue, 14 Feb 2017 12:58:19 +0100
Subject: [scikit-learn] cross validation scores seem off for
PLSRegression
In-Reply-To:
References: <1487069667072.47907.95300@webmail1>
Message-ID: <1487073499094.130285.96242@webmail5>
Oh, and thanks for pointing out the bit about R^2 being negative - although it "feels off" in my head! Complex R?
-----------
Paul Anton
On 14 February 2017 at 12:08:11 +01:00, Fabian Böhnlein wrote:
> Hi Paul,
>
> not sure what @ syntax does in ipython, but seems you're setting y to the coefficients of the model instead of y_hat = pls.predict(x).
>
> Also see in the documentation why R^2 can be negative:
>
> Best,
> Fabian
>
>
> On Tue, 14 Feb 2017 at 11:57 Paul Anton Letnes <> wrote:
>
> > Hi!
> >
> > Versions:
> > sklearn 0.18.1
> > numpy 1.11.3
> > Anaconda python 3.5 on ubuntu 16.04
> >
> > What range is the cross_val_score supposed to be in? I was under the impression from the documentation, although I cannot find it stated explicitly anywhere, that it should be a number in the range [0, 1]. However, it appears that one can get large negative values; see the ipython session below.
> >
> > Cheers
> > Paul
> >
> > In [2]: import numpy as np
> >
> > In [3]: y = np.random.random((10, 3))
> >
> > In [4]: x = np.random.random((10, 17))
> >
> > In [5]: from sklearn.cross_decomposition import PLSRegression
> >
> > In [6]: pls = PLSRegression(n_components=3)
> >
> > In [7]: from sklearn.cross_validation import cross_val_score
> >
> > In [8]: from sklearn.model_selection import cross_val_score
> >
> > In [9]: cross_val_score(pls, x, y)
> > Out[9]: array([-32.52217837, -4.17228083, -5.88632365])
> >
> >
> > PS:
> > This happens even if I cheat by setting y to the predicted value, and cross validate on that.
> >
> > In [29]: y = x @ pls.coef_
> >
> > In [30]: cross_val_score(pls, x, y)
> > /home/paul/anaconda3/envs/wp3-paper/lib/python3.5/site-packages/sklearn/cross_decomposition/pls_.py:293: UserWarning: Y residual constant at iteration 5
> > warnings.warn('Y residual constant at iteration %s' % k)
> > /home/paul/anaconda3/envs/wp3-paper/lib/python3.5/site-packages/sklearn/cross_decomposition/pls_.py:293: UserWarning: Y residual constant at iteration 6
> > warnings.warn('Y residual constant at iteration %s' % k)
> > /home/paul/anaconda3/envs/wp3-paper/lib/python3.5/site-packages/sklearn/cross_decomposition/pls_.py:293: UserWarning: Y residual constant at iteration 6
> > warnings.warn('Y residual constant at iteration %s' % k)
> > Out[30]: array([-35.01267353, -4.94806383, -5.9619526 ])
> >
> > In [34]: np.max(np.abs(y - x @ pls.coef_))
> > Out[34]: 0.0
> >
> >
> > _______________________________________________
> > scikit-learn mailing list
> >
> >
> >
> >
From bertrand.thirion at inria.fr Tue Feb 14 07:04:34 2017
From: bertrand.thirion at inria.fr (Bertrand Thirion)
Date: Tue, 14 Feb 2017 13:04:34 +0100 (CET)
Subject: [scikit-learn] cross validation scores seem off for
PLSRegression
In-Reply-To: <1487073499094.130285.96242@webmail5>
References: <1487069667072.47907.95300@webmail1>
<1487073499094.130285.96242@webmail5>
Message-ID: <1841132902.24047782.1487073874871.JavaMail.zimbra@inria.fr>
https://en.wikipedia.org/wiki/Coefficient_of_determination
"Important cases where the computational definition of R 2 can yield negative values, depending on the definition used, arise where the predictions that are being compared to the corresponding outcomes have not been derived from a model-fitting procedure using those data, and where linear regression is conducted without including an intercept."
Best,
Bertrand
----- Original Message -----
> From: "Paul Anton Letnes"
> To: "Fabian Böhnlein"
> Cc: "Scikit-learn user and developer mailing list"
> Sent: Tuesday, 14 February 2017 12:58:19
> Subject: Re: [scikit-learn] cross validation scores seem off for PLSRegression
> Oh, and thanks for pointing out the bit about R^2 being negative - although
> it "feels off" in my head! Complex R?
> -----------
> Paul Anton
From benjamin.merkt at bcf.uni-freiburg.de Tue Feb 14 07:34:51 2017
From: benjamin.merkt at bcf.uni-freiburg.de (Benjamin Merkt)
Date: Tue, 14 Feb 2017 13:34:51 +0100
Subject: [scikit-learn] OMP ended prematurely due to linear dependence
in the dictionary
In-Reply-To:
References:
<80881741-0259-dbe2-0a63-f5125dd78671@bcf.uni-freiburg.de>
<7255cf2b-12da-8c3c-63ca-2189b4fd0a67@bcf.uni-freiburg.de>
Message-ID: <66717e36-5cc7-2ad4-a601-17efb75d7fc5@bcf.uni-freiburg.de>
Yes, the data array y was already float64.
On 14.02.2017 12:28, Vlad Niculae wrote:
> One possible issue I can see causing this is if X and y have different
> dtypes... was this the case for you?
>
> On Tue, Feb 14, 2017 at 8:26 PM, Vlad Niculae wrote:
>> Hi Ben,
>>
>> This actually sounds like a bug in this case! At a glance, the code
>> should use the correct BLAS calls for the data type you provide. Can
>> you reproduce this with a simple small example that gets different
>> results if the data is 32 vs 64 bit? Would you mind filing an issue?
>>
>> Thanks,
>> Vlad
>>
>>
>> On Tue, Feb 14, 2017 at 8:19 PM, Benjamin Merkt
>> wrote:
>>> OK, the issue is resolved. My dictionary was still in 32bit float from
>>> saving. When I convert it to 64float before calling fit it works fine.
>>>
>>> Sorry to bother.
>>>
>>>
>>>
>>> On 14.02.2017 11:00, Benjamin Merkt wrote:
>>>>
>>>> Hi,
>>>>
>>>> I tried that with no effect. The fit still breaks after two iterations.
>>>>
>>>> If I set precompute=True I get three coefficients instead of only two.
>>>> My Dictionary is fairly large (currently 128x42000). Is it even feasible
>>>> to use OMP with such a big Matrix (even with ~120GB ram)?
>>>>
>>>> -Ben
>>>>
>>>>
>>>>
>>>> On 13.02.2017 23:31, Vlad Niculae wrote:
>>>>>
>>>>> Hi,
>>>>>
>>>>> Are the columns of your matrix normalized? Try setting `normalized=True`.
>>>>>
>>>>> Yours,
>>>>> Vlad
>>>>>
>>>>> On Mon, Feb 13, 2017 at 6:55 PM, Benjamin Merkt
>>>>> wrote:
>>>>>>
>>>>>> Hi everyone,
>>>>>>
>>>>>> I'm using OrthogonalMatchingPursuit to get a sparse coding of a
>>>>>> signal using
>>>>>> a dictionary learned by a KSVD algorithm (pyksvd). However, during
>>>>>> the fit I
>>>>>> get the following RuntimeWarning:
>>>>>>
>>>>>> /usr/local/lib/python2.7/dist-packages/sklearn/linear_model/omp.py:391:
>>>>>> RuntimeWarning: Orthogonal matching pursuit ended prematurely due to
>>>>>> linear
>>>>>> dependence in the dictionary. The requested precision might not have
>>>>>> been
>>>>>> met.
>>>>>>
>>>>>> copy_X=copy_X, return_path=return_path)
>>>>>>
>>>>>> In those cases the results are indeed not satisfactory. I don't get the
>>>>>> point of this warning as it is common in sparse coding to have an
>>>>>> overcomplete dictionary an thus also linear dependency within it. That
>>>>>> should not be an issue for OMP. In fact, the warning is also raised
>>>>>> if the
>>>>>> dictionary is a square matrix.
>>>>>>
>>>>>> Might this Warning also point to other issues in the application?
>>>>>>
>>>>>>
>>>>>> Thanks, Ben
>>>>>>
>>>>>> _______________________________________________
>>>>>> scikit-learn mailing list
>>>>>> scikit-learn at python.org
>>>>>> https://mail.python.org/mailman/listinfo/scikit-learn
>>>>>
>>>>> _______________________________________________
>>>>> scikit-learn mailing list
>>>>> scikit-learn at python.org
>>>>> https://mail.python.org/mailman/listinfo/scikit-learn
>>>>>
>>>> _______________________________________________
>>>> scikit-learn mailing list
>>>> scikit-learn at python.org
>>>> https://mail.python.org/mailman/listinfo/scikit-learn
>>>
>>> _______________________________________________
>>> scikit-learn mailing list
>>> scikit-learn at python.org
>>> https://mail.python.org/mailman/listinfo/scikit-learn
> _______________________________________________
> scikit-learn mailing list
> scikit-learn at python.org
> https://mail.python.org/mailman/listinfo/scikit-learn
>
From pa at letnes.com Tue Feb 14 07:53:31 2017
From: pa at letnes.com (Paul Anton Letnes)
Date: Tue, 14 Feb 2017 13:53:31 +0100
Subject: [scikit-learn] PLSRegression cross validates poorly when scaling
Message-ID: <1487076811458.24654.96908@webmail3>
Hi!
I've noticed that PLSRegression seems to cross-validate incredibly poorly when scale=True. Could there be a bug here, or is there something I'm not getting this time, too? I first noticed the very small (i.e. large negative) cross-validation scores on a dataset that was far from unit variance; there, too, cross-validation was extremely poor: around 0.4 in score when scaling was disabled, but (for example) -54422617.41005663 when scaling was enabled!
In [1]: import numpy as np
In [2]: from sklearn import cross_decomposition
In [3]: x = np.random.random((10,17))
In [4]: y = np.random.random((10, 3))
In [5]: pls = cross_decomposition.PLSRegression(scale=True)
In [6]: pls.fit(x,y)
Out[6]: PLSRegression(copy=True, max_iter=500, n_components=2, scale=True, tol=1e-06)
In [7]: from sklearn import model_selection
In [8]: model_selection.cross_val_score(pls, x, y)
Out[8]: array([-10.1680294 , -12.94229352, -13.39506559])
In [9]: pls = cross_decomposition.PLSRegression(scale=False)
In [10]: model_selection.cross_val_score(pls, x, y)
Out[10]: array([-0.5904095 , -1.16551493, -1.71555855])
Cheers
Paul
From soumyodey at live.com Wed Feb 15 00:22:54 2017
From: soumyodey at live.com (Soumyo Dey)
Date: Wed, 15 Feb 2017 05:22:54 +0000
Subject: [scikit-learn] Need help to start contributing
Message-ID:
Hello,
I want to start contributing to the project; could you help me get started with an easy fix? I was able to set up the git repository, and now I would like to start contributing some code.
Thank you,
Soumyo Dey
Twitter : @SoumyoDey
Website: http://ace139.com/
From tom.duprelatour at orange.fr Wed Feb 15 09:47:30 2017
From: tom.duprelatour at orange.fr (Tom DLT)
Date: Wed, 15 Feb 2017 15:47:30 +0100
Subject: [scikit-learn] Need help to start contributing
In-Reply-To:
References:
Message-ID:
Welcome!
If you're looking to get started, you might try sorting issues by those
with "Needs contributor" and "easy" to begin with.
https://github.com/scikit-learn/scikit-learn/issues?q=is%3Aopen+is%3Aissue+label%3AEasy+label%3A%22Need+Contributor%22
You should also check out the contributor guidelines:
http://scikit-learn.org/dev/developers/index.html
We look forward to seeing your contributions.
Tom
2017-02-15 6:22 GMT+01:00 Soumyo Dey :
> Hello,
>
>
> I want to start contributing to the project, help me get started with an
> easyfix. I was able to setup git repository. Now I would like to start
> contributing with some code.
>
>
> Thank you,
>
> Soumyo Dey
>
> Twitter : @SoumyoDey
>
> Website: http://ace139.com/
>
>
>
>
> _______________________________________________
> scikit-learn mailing list
> scikit-learn at python.org
> https://mail.python.org/mailman/listinfo/scikit-learn
>
>
From Afarin.Famili at UTSouthwestern.edu Wed Feb 15 19:40:19 2017
From: Afarin.Famili at UTSouthwestern.edu (Afarin Famili)
Date: Thu, 16 Feb 2017 00:40:19 +0000
Subject: [scikit-learn] A quick question regarding permutation_test_score
Message-ID: <1487205619542.10167@UTSouthwestern.edu>
Hi folks,
I have a question regarding how to use permutation_test_score. Given data X (predictor) and Y (target), I hold aside 20% of my data for testing (Xtest and Ytest) and then perform hyperparameter tuning on the rest (Xtrain and Ytrain).
This way I can get the best parameters via RandomizedSearchCV. I now want to call permutation_test_score to compute the score, as well as the p-value, of the model prediction. But the question is: what X and Y should I send as input arguments to this function? I could send in X and Y, but my hyperparameters were already tuned on Xtrain and Ytrain, which are part of X and Y, and that would bias the output values. Any help would be greatly appreciated.
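One way around that bias, sketched below with stand-in data and a stand-in model (Ridge and its alpha grid are placeholders; your estimator and parameter grid would differ), is to pass the search object itself to permutation_test_score on the full X and Y, so the tuning is redone inside every fold and every permutation:

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.model_selection import RandomizedSearchCV, permutation_test_score

# Hypothetical stand-ins for the real X, Y and model.
X, Y = make_regression(n_samples=60, n_features=10, random_state=0)

search = RandomizedSearchCV(Ridge(), {"alpha": np.logspace(-3, 3, 20)},
                            n_iter=5, cv=3, random_state=0)

# Passing the *search object* as the estimator means hyperparameter
# tuning is repeated inside every CV fold and every permutation, so the
# p-value is not biased by parameters tuned on the same data.
score, perm_scores, pvalue = permutation_test_score(
    search, X, Y, cv=3, n_permutations=20, random_state=0)
print(score, pvalue)
```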
Thanks,
Afarin
________________________________
UT Southwestern
Medical Center
The future of medicine, today.
From soumyodey at live.com Thu Feb 16 14:15:25 2017
From: soumyodey at live.com (Soumyo Dey)
Date: Thu, 16 Feb 2017 19:15:25 +0000
Subject: [scikit-learn] Need help to start contributing
In-Reply-To:
References: ,
Message-ID:
Hello,
Thank you, Tom, for the welcome. I would like to know: is it okay to work on a bug that someone else is already working on, or do the core devs/mentors assign bugs to individuals?
Thank you,
Soumyo Dey
Twitter : @SoumyoDey
Website: http://ace139.com/
________________________________
From: scikit-learn on behalf of Tom DLT
Sent: Wednesday, February 15, 2017 8:17:30 PM
To: Scikit-learn user and developer mailing list
Subject: Re: [scikit-learn] Need help to start contributing
Welcome!
If you're looking to get started, you might try sorting issues by those with "Needs contributor" and "easy" to begin with.
https://github.com/scikit-learn/scikit-learn/issues?q=is%3Aopen+is%3Aissue+label%3AEasy+label%3A%22Need+Contributor%22
You should also check out the contributor guidelines:
http://scikit-learn.org/dev/developers/index.html
We look forward to seeing your contributions.
Tom
2017-02-15 6:22 GMT+01:00 Soumyo Dey >:
Hello,
I want to start contributing to the project, help me get started with an easyfix. I was able to setup git repository. Now I would like to start contributing with some code.
Thank you,
Soumyo Dey
Twitter : @SoumyoDey
Website: http://ace139.com/
_______________________________________________
scikit-learn mailing list
scikit-learn at python.org
https://mail.python.org/mailman/listinfo/scikit-learn
From olivier.grisel at ensta.org Thu Feb 16 15:58:01 2017
From: olivier.grisel at ensta.org (Olivier Grisel)
Date: Thu, 16 Feb 2017 21:58:01 +0100
Subject: [scikit-learn] Need help to start contributing
In-Reply-To:
References:
Message-ID:
It's ok to work on a bug if the original contributor has not replied
to the reviewers' comments in a while (e.g. a couple of weeks).
--
Olivier
From benjamin.merkt at bcf.uni-freiburg.de Thu Feb 16 17:25:37 2017
From: benjamin.merkt at bcf.uni-freiburg.de (Benjamin Merkt)
Date: Thu, 16 Feb 2017 23:25:37 +0100
Subject: [scikit-learn] OMP ended prematurely due to linear dependence
in the dictionary
In-Reply-To: <66717e36-5cc7-2ad4-a601-17efb75d7fc5@bcf.uni-freiburg.de>
References:
<80881741-0259-dbe2-0a63-f5125dd78671@bcf.uni-freiburg.de>
<7255cf2b-12da-8c3c-63ca-2189b4fd0a67@bcf.uni-freiburg.de>
<66717e36-5cc7-2ad4-a601-17efb75d7fc5@bcf.uni-freiburg.de>
Message-ID: <426e4241-4247-7a73-1527-34d68097f92f@bcf.uni-freiburg.de>
Is this still considered a bug and therefore worth an issue?
On 14.02.2017 13:34, Benjamin Merkt wrote:
> Yes, the data array y was already float64.
>
>
> On 14.02.2017 12:28, Vlad Niculae wrote:
>> One possible issue I can see causing this is if X and y have different
>> dtypes... was this the case for you?
>>
>> On Tue, Feb 14, 2017 at 8:26 PM, Vlad Niculae wrote:
>>> Hi Ben,
>>>
>>> This actually sounds like a bug in this case! At a glance, the code
>>> should use the correct BLAS calls for the data type you provide. Can
>>> you reproduce this with a simple small example that gets different
>>> results if the data is 32 vs 64 bit? Would you mind filing an issue?
>>>
>>> Thanks,
>>> Vlad
>>>
>>>
>>> On Tue, Feb 14, 2017 at 8:19 PM, Benjamin Merkt
>>> wrote:
>>>> OK, the issue is resolved. My dictionary was still in 32bit float from
>>>> saving. When I convert it to 64float before calling fit it works fine.
>>>>
>>>> Sorry to bother.
>>>>
>>>>
>>>>
>>>> On 14.02.2017 11:00, Benjamin Merkt wrote:
>>>>>
>>>>> Hi,
>>>>>
>>>>> I tried that with no effect. The fit still breaks after two
>>>>> iterations.
>>>>>
>>>>> If I set precompute=True I get three coefficients instead of only two.
>>>>> My dictionary is fairly large (currently 128x42000). Is it even
>>>>> feasible to use OMP with such a big matrix (even with ~120 GB RAM)?
>>>>>
>>>>> -Ben
>>>>>
>>>>>
>>>>>
>>>>> On 13.02.2017 23:31, Vlad Niculae wrote:
>>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> Are the columns of your matrix normalized? Try setting
>>>>>> `normalize=True`.
>>>>>>
>>>>>> Yours,
>>>>>> Vlad
>>>>>>
>>>>>> On Mon, Feb 13, 2017 at 6:55 PM, Benjamin Merkt
>>>>>> wrote:
>>>>>>>
>>>>>>> Hi everyone,
>>>>>>>
>>>>>>> I'm using OrthogonalMatchingPursuit to get a sparse coding of a
>>>>>>> signal using
>>>>>>> a dictionary learned by a KSVD algorithm (pyksvd). However, during
>>>>>>> the fit I
>>>>>>> get the following RuntimeWarning:
>>>>>>>
>>>>>>> /usr/local/lib/python2.7/dist-packages/sklearn/linear_model/omp.py:391:
>>>>>>> RuntimeWarning: Orthogonal matching pursuit ended prematurely due to
>>>>>>> linear dependence in the dictionary. The requested precision might
>>>>>>> not have been met.
>>>>>>>
>>>>>>> copy_X=copy_X, return_path=return_path)
>>>>>>>
>>>>>>> In those cases the results are indeed not satisfactory. I don't
>>>>>>> get the point of this warning, as it is common in sparse coding
>>>>>>> to have an overcomplete dictionary and thus also linear dependence
>>>>>>> within it. That should not be an issue for OMP. In fact, the
>>>>>>> warning is also raised if the dictionary is a square matrix.
>>>>>>>
>>>>>>> Might this Warning also point to other issues in the application?
>>>>>>>
>>>>>>>
>>>>>>> Thanks, Ben
>>>>>>>
>>>>>>> _______________________________________________
>>>>>>> scikit-learn mailing list
>>>>>>> scikit-learn at python.org
>>>>>>> https://mail.python.org/mailman/listinfo/scikit-learn
>>>>>>
>>>>>> _______________________________________________
>>>>>> scikit-learn mailing list
>>>>>> scikit-learn at python.org
>>>>>> https://mail.python.org/mailman/listinfo/scikit-learn
>>>>>>
>>>>> _______________________________________________
>>>>> scikit-learn mailing list
>>>>> scikit-learn at python.org
>>>>> https://mail.python.org/mailman/listinfo/scikit-learn
>>>>
>>>> _______________________________________________
>>>> scikit-learn mailing list
>>>> scikit-learn at python.org
>>>> https://mail.python.org/mailman/listinfo/scikit-learn
>> _______________________________________________
>> scikit-learn mailing list
>> scikit-learn at python.org
>> https://mail.python.org/mailman/listinfo/scikit-learn
>>
> _______________________________________________
> scikit-learn mailing list
> scikit-learn at python.org
> https://mail.python.org/mailman/listinfo/scikit-learn
From nelle.varoquaux at gmail.com Thu Feb 16 17:40:45 2017
From: nelle.varoquaux at gmail.com (Nelle Varoquaux)
Date: Thu, 16 Feb 2017 14:40:45 -0800
Subject: [scikit-learn] Announcing: Docathon, week of 6 March 2017
Message-ID:
Hi everyone,
I don't really think scikit-learn's documentation is lacking, but here is
an announcement for an event we are organizing called the "Docathon".
Several of us will be meeting up to sprint on documentation or
documentation-related projects at Berkeley, New York and Seattle.
If you are interested in joining us, either remotely or on campus, don't
hesitate to join!
Cheers,
Nelle
*What's a Docathon?*
It's a week-long sprint where we focus our efforts on improving the state
of documentation in the open-source and open-science world. This means
writing better documentation, building tools, and sharing skills.
*Who's this for?*
Anyone who is interested in improving the understandability, accessibility,
and clarity of software! This might mean developers with a particular
project, or individuals who would like to contribute to a project. You
don't need to use a specific language (though there will be many Python and
R developers) and you don't need to be a core developer in order to help
out.
*Where can I sign up?*
Check out the *Docathon website*. You can sign up as a *participant*,
*suggest a project* to work on, or sign up *to host your own* remote
Docathon wherever you like. You don't have to use a specific language -
we'll be as accommodating as possible!
*When is the Docathon?*
The Docathon will be held *March 6 through March 10*. For those coming to
BIDS at UC Berkeley, on the first day we'll have tutorials about
documentation and demos of documentation tools, followed by a few hours of
hacking. During the middle of the week, we'll set aside a few hours each
afternoon for hacking as a group at BIDS. On the last day, we'll have a
wrap-up event to show off what everybody worked on.
*Where will the Docathon take place?*
There are a *few docathons being held simultaneously*, each with their own
schedule. At Berkeley we'll have a physical presence at BIDS over the week,
and we encourage you to show up for the hours we set aside for doc hacking.
However, it is totally fine to work remotely; we will coordinate people via
email/GitHub, too.
*Where can I get more information?*
Check out an updated schedule, list of tutorials, and more information at
our website here: *bids.github.io/docathon*.
*Contact*
If you have any questions, open an issue on our *GitHub repo*.
We look forward to hearing from you!
Please feel free to forward this email to anyone who may be interested.
We'd love for other institutions/groups to get involved.
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
From zephyr14 at gmail.com Thu Feb 16 19:56:54 2017
From: zephyr14 at gmail.com (Vlad Niculae)
Date: Fri, 17 Feb 2017 09:56:54 +0900
Subject: [scikit-learn] OMP ended prematurely due to linear dependence
in the dictionary
In-Reply-To: <426e4241-4247-7a73-1527-34d68097f92f@bcf.uni-freiburg.de>
References:
<80881741-0259-dbe2-0a63-f5125dd78671@bcf.uni-freiburg.de>
<7255cf2b-12da-8c3c-63ca-2189b4fd0a67@bcf.uni-freiburg.de>