From Afarin.Famili at UTSouthwestern.edu Fri Feb 3 15:53:54 2017 From: Afarin.Famili at UTSouthwestern.edu (Afarin Famili) Date: Fri, 3 Feb 2017 20:53:54 +0000 Subject: [scikit-learn] Calculate p-value, the measure of statistical significance, in scikit-learn Message-ID: <1486155234925.50514@UTSouthwestern.edu> Hi all, I am aiming at calculating the p-value of regression models using scikit-learn, in order to report their statistical significance. Aside from permutation_test_score in scikit-learn, do you have any suggestions for calculating the p-value of the model? Ultimately, I am interested in computing the coefficient of determination, r2 as well as MSE to indicate the performance of the model for those models that were statistically significant. Thank you, Afarin? ? ________________________________ UT Southwestern Medical Center The future of medicine, today. -------------- next part -------------- An HTML attachment was scrubbed... URL: From jakevdp at cs.washington.edu Fri Feb 3 16:51:07 2017 From: jakevdp at cs.washington.edu (Jacob Vanderplas) Date: Fri, 3 Feb 2017 13:51:07 -0800 Subject: [scikit-learn] Calculate p-value, the measure of statistical significance, in scikit-learn In-Reply-To: <1486155234925.50514@UTSouthwestern.edu> References: <1486155234925.50514@UTSouthwestern.edu> Message-ID: Hi Afarin, The short answer is no, you can't really compute p-values and related statistics in Scikit-Learn. This stems from a fundamental divide in statistics/AI between machine learning on one hand, and statistical modeling on the other. A classic treatment of this divide is "Statistical Modeling: the Two Cultures" by Leo Breiman. In short, statistical modeling is about *estimating parameters of models*, and in that context things like significance, p-values, etc. are relevant. Machine learning is about *predicting outputs*, and generally treats models and their parameters as a black box, the contents of which are not of any explicit interest. As such, p-values and related statistics concerning model parameters are not a concern. Scikit-learn is firmly in the latter camp of Machine learning. Of course, there is plenty of overlap between the two cultures, and the divide is somewhat fuzzy in practice, but it's a useful way to frame the issue. If you're interested in statistical modeling rather than machine learning (and it sounds like you are), scikit-learn is not really the right tool. You might check out the statsmodels package, Jake Jake VanderPlas Senior Data Science Fellow Director of Research in Physical Sciences University of Washington eScience Institute On Fri, Feb 3, 2017 at 12:53 PM, Afarin Famili < Afarin.Famili at utsouthwestern.edu> wrote: > Hi all, > > I am aiming at calculating the p-value of regression models using > scikit-learn, in order to report their statistical significance. Aside from > permutation_test_score in scikit-learn, do you have any suggestions for > calculating the p-value of the model? Ultimately, I am interested in > computing the coefficient of determination, r2 as well as MSE to indicate > the performance of the model for those models that were statistically > significant. > > Thank you, > > Afarin? > > ? > > > > ------------------------------ > > UT Southwestern > > Medical Center > > The future of medicine, today. 
> > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From michael.eickenberg at gmail.com Fri Feb 3 16:54:14 2017 From: michael.eickenberg at gmail.com (Michael Eickenberg) Date: Fri, 3 Feb 2017 22:54:14 +0100 Subject: [scikit-learn] Calculate p-value, the measure of statistical significance, in scikit-learn In-Reply-To: <1486155234925.50514@UTSouthwestern.edu> References: <1486155234925.50514@UTSouthwestern.edu> Message-ID: Dear Afarin, scikit-learn is designed for predictive modelling, where evaluation is done out of sample (using train and test sets). You seem to be looking for a package with which you can do classical in-sample statistics and their corresponding evaluations among which p-values. You are probably better off using statsmodels for that or R directly if you don't mind changing languages. Hope that helps! Michael On Friday, 3 February 2017, Afarin Famili wrote: > Hi all, > > I am aiming at calculating the p-value of regression models using > scikit-learn, in order to report their statistical significance. Aside from > permutation_test_score in scikit-learn, do you have any suggestions for > calculating the p-value of the model? Ultimately, I am interested in > computing the coefficient of determination, r2 as well as MSE to indicate > the performance of the model for those models that were statistically > significant. > > Thank you, > > Afarin? > > ? > > > > ------------------------------ > > UT Southwestern > > Medical Center > > The future of medicine, today. > -------------- next part -------------- An HTML attachment was scrubbed... URL: From stuart at stuartreynolds.net Fri Feb 3 17:47:47 2017 From: stuart at stuartreynolds.net (Stuart Reynolds) Date: Fri, 3 Feb 2017 14:47:47 -0800 Subject: [scikit-learn] Calculate p-value, the measure of statistical significance, in scikit-learn In-Reply-To: <1486155234925.50514@UTSouthwestern.edu> References: <1486155234925.50514@UTSouthwestern.edu> Message-ID: The statsmodels package may have more of this kind of thing. http://statsmodels.sourceforge.net/devel/glm.html http://statsmodels.sourceforge.net/devel/dev/generated/statsmodels.base.model.GenericLikelihoodModelResults.pvalues.html?highlight=pvalue I assume you're talking about pvalues for a model's parameters, not on the models performance. For the latter, there's various basic stats functions. On Fri, Feb 3, 2017 at 12:53 PM, Afarin Famili < Afarin.Famili at utsouthwestern.edu> wrote: > Hi all, > > I am aiming at calculating the p-value of regression models using > scikit-learn, in order to report their statistical significance. Aside from > permutation_test_score in scikit-learn, do you have any suggestions for > calculating the p-value of the model? Ultimately, I am interested in > computing the coefficient of determination, r2 as well as MSE to indicate > the performance of the model for those models that were statistically > significant. > > Thank you, > > Afarin? > > ? > > > > ------------------------------ > > UT Southwestern > > Medical Center > > The future of medicine, today. > > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn > > -------------- next part -------------- An HTML attachment was scrubbed... 
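To make the statsmodels suggestion concrete, here is a minimal sketch, not from the original thread, of fitting an ordinary least squares model and reading off the classical statistics asked about; the data, variable names and numbers are invented for illustration:

# Minimal sketch, assuming statsmodels is installed; synthetic data for illustration only.
import numpy as np
import statsmodels.api as sm

rng = np.random.RandomState(0)
X = rng.randn(100, 3)                                   # 100 samples, 3 features (made up)
y = X.dot(np.array([1.5, 0.0, -2.0])) + rng.randn(100)  # only features 0 and 2 matter

X_design = sm.add_constant(X)            # statsmodels does not add an intercept by default
results = sm.OLS(y, X_design).fit()

print(results.pvalues)     # per-coefficient p-values (t-tests)
print(results.f_pvalue)    # p-value of the overall F-test for the regression
print(results.rsquared)    # coefficient of determination, r2
print(results.mse_resid)   # residual mean squared error
print(results.summary())   # the full classical regression table

For penalized or otherwise non-standard models this classical machinery does not carry over directly, which is where the permutation test discussed later in the thread comes in.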
From Afarin.Famili at UTSouthwestern.edu Fri Feb 3 18:32:23 2017
From: Afarin.Famili at UTSouthwestern.edu (Afarin Famili)
Date: Fri, 3 Feb 2017 23:32:23 +0000
Subject: [scikit-learn] Does permutation_test_score not output the p_value for statistical significance of the model? Re: scikit-learn Digest, Vol 11, Issue 2
In-Reply-To:
References:
Message-ID: <1486164743283.49517@UTSouthwestern.edu>

Thank you all for your answers. I am interested in the statistical significance of the model and not the parameters of the model. I thought "permutation_test_score" from scikit-learn, and the p_value it returns, work for the purpose of my work. Am I wrong though? Is this function only used for measuring the statistical significance of classifiers and not regression models?

Kind regards,
Afarin

From raga.markely at gmail.com Fri Feb 3 23:18:39 2017
From: raga.markely at gmail.com (Raga Markely)
Date: Fri, 3 Feb 2017 23:18:39 -0500
Subject: [scikit-learn] Linear Discriminant Analysis - "The priors do not sum to 1. Renormalizing"
Message-ID:

Hello,

I ran LDA for dimensionality reduction, and got the following message on the command prompt (not on the Jupyter Notebook): "The priors do not sum to 1. Renormalizing", UserWarning

If I understand correctly, the prior = sum of y bincount / len(y)? So, does it mean I am getting this message due to some rounding errors? I wonder how I can check if I make any mistake somewhere?

Thank you,
Raga
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From raga.markely at gmail.com Fri Feb 3 23:36:50 2017
From: raga.markely at gmail.com (Raga Markely)
Date: Fri, 3 Feb 2017 23:36:50 -0500
Subject: [scikit-learn] PC Desktop requirement for Machine Learning
Message-ID:

Hello,

I am planning to buy office PC desktop for machine learning work. I wonder if you could provide some recommendation on the computer specs and brand? I don't need cloud capacity, just a standalone, but powerful desktop.. to simplify, let's ignore the price.. i can scale down according to budget as appropriate later..
Just to give a rough ballpark, I ran repeated nested loop (50 outer repeats x 50 inner repeats, ~35 data points, <10 features) with different classification algorithms (Logistic Regressions, KNN, SVC, Kernel SVC, Random Forest) on lightweight office laptop, and as expected, it took a very long time to complete (it finished during the time I left overnight). I would like to be able to complete this in a few mins or less maybe? :D.. so that I can quickly assess and modify the code as necessary .. In the long run, I will also need to do regressions and may use larger data sets (up to 10^4 data points order of magnitude)... I guess this is a very vague question, but I will take any tips and suggestions. Thank you! Raga -------------- next part -------------- An HTML attachment was scrubbed... URL: From ahowe42 at gmail.com Sat Feb 4 03:23:33 2017 From: ahowe42 at gmail.com (Andrew Howe) Date: Sat, 4 Feb 2017 11:23:33 +0300 Subject: [scikit-learn] Calculate p-value, the measure of statistical significance, in scikit-learn In-Reply-To: <1486155234925.50514@UTSouthwestern.edu> References: <1486155234925.50514@UTSouthwestern.edu> Message-ID: I'm fairly certain that the scikit-learn regression result, plus what you already have about the data is enough for you to compute all those statistical measures yourself. It should be rather trivial to do so. Andrew On Feb 4, 2017 00:34, "Afarin Famili" wrote: > Hi all, > > I am aiming at calculating the p-value of regression models using > scikit-learn, in order to report their statistical significance. Aside from > permutation_test_score in scikit-learn, do you have any suggestions for > calculating the p-value of the model? Ultimately, I am interested in > computing the coefficient of determination, r2 as well as MSE to indicate > the performance of the model for those models that were statistically > significant. > > Thank you, > > Afarin? > > ? > > > > ------------------------------ > > UT Southwestern > > Medical Center > > The future of medicine, today. > > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From alekhka at gmail.com Sat Feb 4 07:45:54 2017 From: alekhka at gmail.com (Alekh Karkada Ashok) Date: Sat, 4 Feb 2017 18:15:54 +0530 Subject: [scikit-learn] 10 years of Scikit-learn Message-ID: Hi all! 2017 marks the 10th year of Scikit-learn (started as a GSoC project in 2007). Can we do anything to celebrate? Perhaps a sticker on the website? or T-shirts commemorating this? Thank you! -------------- next part -------------- An HTML attachment was scrubbed... URL: From nelle.varoquaux at gmail.com Sat Feb 4 14:52:05 2017 From: nelle.varoquaux at gmail.com (Nelle Varoquaux) Date: Sat, 4 Feb 2017 11:52:05 -0800 Subject: [scikit-learn] Calculate p-value, the measure of statistical significance, in scikit-learn In-Reply-To: References: <1486155234925.50514@UTSouthwestern.edu> Message-ID: > I'm fairly certain that the scikit-learn regression result, plus what you > already have about the data is enough for you to compute all those > statistical measures yourself. It should be rather trivial to do so. > That is highly dependent on the regression model you use. For example computing a p-value for a lasso regression parameter is not so trivial, though a significance test has recently been proposed. 
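To illustrate Andrew's point for the plain least-squares case, and keeping Nelle's caveat in mind (none of this carries over to penalized models such as the lasso), here is a rough sketch, not from the thread itself, of computing r2, MSE and classical t-test p-values around scikit-learn's LinearRegression; the data and names are invented for the example:

# Rough sketch: classical OLS statistics computed by hand around sklearn's
# LinearRegression. Valid for plain least squares only, NOT for penalized
# models such as Lasso or Ridge. Synthetic data for illustration.
import numpy as np
from scipy import stats
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score, mean_squared_error

rng = np.random.RandomState(0)
X = rng.randn(80, 2)                       # illustrative data
y = 3.0 * X[:, 0] + rng.randn(80)

model = LinearRegression().fit(X, y)
y_pred = model.predict(X)

r2 = r2_score(y, y_pred)
mse = mean_squared_error(y, y_pred)

# t-test p-values for the coefficients (intercept included in the design matrix)
n, p = X.shape
X_design = np.column_stack([np.ones(n), X])
dof = n - p - 1                                          # residual degrees of freedom
sigma2 = np.sum((y - y_pred) ** 2) / dof                 # unbiased residual variance
cov = sigma2 * np.linalg.inv(X_design.T.dot(X_design))   # covariance of [intercept, coefs]
se = np.sqrt(np.diag(cov))
t_stats = np.r_[model.intercept_, model.coef_] / se
p_values = 2 * stats.t.sf(np.abs(t_stats), dof)

print(r2, mse)
print(p_values)    # p-values for the intercept and each coefficient

The p-values come from the usual OLS coefficient covariance, sigma^2 * inv(X'X), so they are only meaningful for an unpenalized fit with more samples than features.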
> > Andrew
> >
> > On Feb 4, 2017 00:34, "Afarin Famili" > wrote:
>
>> Hi all,
>>
>> I am aiming at calculating the p-value of regression models using
>> scikit-learn, in order to report their statistical significance. Aside from
>> permutation_test_score in scikit-learn, do you have any suggestions for
>> calculating the p-value of the model? Ultimately, I am interested in
>> computing the coefficient of determination, r2 as well as MSE to indicate
>> the performance of the model for those models that were statistically
>> significant.
>>
>> Thank you,
>>
>> Afarin
>>
>> ------------------------------
>>
>> UT Southwestern
>>
>> Medical Center
>>
>> The future of medicine, today.
>>
>> _______________________________________________
>> scikit-learn mailing list
>> scikit-learn at python.org
>> https://mail.python.org/mailman/listinfo/scikit-learn
>
> _______________________________________________
> scikit-learn mailing list
> scikit-learn at python.org
> https://mail.python.org/mailman/listinfo/scikit-learn
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From gael.varoquaux at normalesup.org Sat Feb 4 16:39:47 2017
From: gael.varoquaux at normalesup.org (Gael Varoquaux)
Date: Sat, 4 Feb 2017 22:39:47 +0100
Subject: [scikit-learn] 10 years of Scikit-learn
In-Reply-To:
References:
Message-ID: <20170204213947.GE1858410@phare.normalesup.org>

Indeed, that's a good point. We should mention it in our talks, and maybe in the release notes of the next release.

Gaël

On Sat, Feb 04, 2017 at 06:15:54PM +0530, Alekh Karkada Ashok wrote:
> Hi all!
> 2017 marks the 10th year of Scikit-learn (started as a GSoC project in 2007).
> Can we do anything to celebrate? Perhaps a sticker on the website? or T-shirts
> commemorating this?
> Thank you!

> _______________________________________________
> scikit-learn mailing list
> scikit-learn at python.org
> https://mail.python.org/mailman/listinfo/scikit-learn

--
Gael Varoquaux
Researcher, INRIA Parietal
NeuroSpin/CEA Saclay, Bat 145, 91191 Gif-sur-Yvette France
Phone: ++ 33-1-69-08-79-68
http://gael-varoquaux.info http://twitter.com/GaelVaroquaux

From Afarin.Famili at UTSouthwestern.edu Sat Feb 4 18:43:36 2017
From: Afarin.Famili at UTSouthwestern.edu (Afarin Famili)
Date: Sat, 4 Feb 2017 23:43:36 +0000
Subject: [scikit-learn] Permutation-test-score
Message-ID: <1486251816290.82720@UTSouthwestern.edu>

Hi,

Can anyone please tell me what "permutation_test_score" (and the p_value it returns) does in scikit-learn? I am assuming it outputs the statistical significance of the performance of regression models.

I am planning to compare the performance of various regression models, but only where the reported performance measure is statistically significant. To this end, I want to output the p-value of the prediction first, and if it is smaller than a certain cut-off, I would then report the performance metrics, such as r2 and MSE. Do the p-value and score outputs from "permutation_test_score" not provide me with what I want?

Afarin

________________________________
UT Southwestern
Medical Center
The future of medicine, today.
-------------- next part --------------
An HTML attachment was scrubbed...
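A minimal sketch of the call in question, not from the thread itself (invented data and names, assuming the sklearn.model_selection version introduced in 0.18); it also shows that the function accepts a regressor and a regression scorer, not only classifiers:

# Minimal sketch; small, noisy synthetic data set for illustration only.
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import permutation_test_score

rng = np.random.RandomState(0)
X = rng.randn(60, 5)
y = 0.5 * X[:, 0] + rng.randn(60)

score, perm_scores, p_value = permutation_test_score(
    Ridge(), X, y,
    scoring="r2",          # any regression scorer works, not only classification ones
    cv=5,
    n_permutations=100,
    random_state=0)
print("cross-validated r2: %.3f, permutation p-value: %.3f" % (score, p_value))

Olivier's reply below explains what the returned p-value means and when it is actually informative.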
URL: From ahowe42 at gmail.com Sun Feb 5 00:15:18 2017 From: ahowe42 at gmail.com (Andrew Howe) Date: Sun, 5 Feb 2017 08:15:18 +0300 Subject: [scikit-learn] Calculate p-value, the measure of statistical significance, in scikit-learn In-Reply-To: References: <1486155234925.50514@UTSouthwestern.edu> Message-ID: Yep - in which case the OP would have difficulty computing p-values (but not the other usual stats) with any software tool that provided those methods. But since the question was specifically about scikit-learn, my main point is that the quantities are easy to compute (if they exist). Andrew <~~~~~~~~~~~~~~~~~~~~~~~~~~~> J. Andrew Howe, PhD www.andrewhowe.com http://www.linkedin.com/in/ahowe42 https://www.researchgate.net/profile/John_Howe12/ I live to learn, so I can learn to live. - me <~~~~~~~~~~~~~~~~~~~~~~~~~~~> On Sat, Feb 4, 2017 at 10:52 PM, Nelle Varoquaux wrote: > > I'm fairly certain that the scikit-learn regression result, plus what you >> already have about the data is enough for you to compute all those >> statistical measures yourself. It should be rather trivial to do so. >> > > That is highly dependent on the regression model you use. For example > computing a p-value for a lasso regression parameter is not so trivial, > though a significance test has recently been proposed. > > >> >> Andrew >> >> On Feb 4, 2017 00:34, "Afarin Famili" >> wrote: >> >>> Hi all, >>> >>> I am aiming at calculating the p-value of regression models using >>> scikit-learn, in order to report their statistical significance. Aside from >>> permutation_test_score in scikit-learn, do you have any suggestions for >>> calculating the p-value of the model? Ultimately, I am interested in >>> computing the coefficient of determination, r2 as well as MSE to indicate >>> the performance of the model for those models that were statistically >>> significant. >>> >>> Thank you, >>> >>> Afarin? >>> >>> ? >>> >>> >>> >>> ------------------------------ >>> >>> UT Southwestern >>> >>> Medical Center >>> >>> The future of medicine, today. >>> >>> _______________________________________________ >>> scikit-learn mailing list >>> scikit-learn at python.org >>> https://mail.python.org/mailman/listinfo/scikit-learn >>> >>> >> _______________________________________________ >> scikit-learn mailing list >> scikit-learn at python.org >> https://mail.python.org/mailman/listinfo/scikit-learn >> >> > > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From olivier.grisel at ensta.org Sun Feb 5 04:44:01 2017 From: olivier.grisel at ensta.org (Olivier Grisel) Date: Sun, 5 Feb 2017 10:44:01 +0100 Subject: [scikit-learn] Permutation-test-score In-Reply-To: <1486251816290.82720@UTSouthwestern.edu> References: <1486251816290.82720@UTSouthwestern.edu> Message-ID: This is non-parametric (aka brute force) way to check that a model has a predictive performance significantly higher than chance. For models with 90% accuracy this is useless as we already know for sure that the model is better than predicting at random. This method is only useful if you have very little data or very noisy data and you are not even sure that your predictive method is able to pick anything predictive from the data. E.g. you have a balanced binary classification problem with ~52% accuracy. 
It proceeds as follows: it first does a single cross-validation round with the true label to compute a reference score. Then it does the same 100 times but each time with independently randomly permuted variants of the labels (the y array). Then it returns the fraction of the time the reference CV score was higher than the CV scores of the models trained and evaluated with permuted labels. Here is an example: http://scikit-learn.org/stable/auto_examples/feature_selection/plot_permutation_test_for_classification.html Note that you should not use than method to select the best model from a collection of possible models and then report its permutation test p-value without correcting for multiple comparisons. -- Olivier -------------- next part -------------- An HTML attachment was scrubbed... URL: From nixnmtm at gmail.com Tue Feb 7 09:26:09 2017 From: nixnmtm at gmail.com (Nixon Raj) Date: Tue, 7 Feb 2017 22:26:09 +0800 Subject: [scikit-learn] Need Corresponding indices array of values in each split of a DesicisionTreeClassifier Message-ID: For Example, In the below decision tree dot file, I have 223 samples which splits into [174, 49] in the first split and [110, 1] in the 2nd split I would like to get the array of indices for the values of each split like *[174, 49] and their corresponding indices (idx) like [[0, 1 ,5, 7,....,200,221], [3, 4, 6, ....., 199,222,223]]* *[110, 1] and their corresponding indices (idx) like [[0,5,....200,221], [7]]* Please help me node [shape=box] ; 0 [label="X[0] <= 13.9191\nentropy = 0.7597\nsamples = 223\nvalue = [174, 49]"] ; 1 [label="X[1] <= 3.1973\nentropy = 0.0741\nsamples = 111\nvalue = [110, 1]"] ; 0 -> 1 [labeldistance=2.5, labelangle=45, headlabel="True"] ; 2 [label="entropy = 0.0\nsamples = 109\nvalue = [109, 0]"] ; 1 -> 2 ; 3 [label="entropy = 1.0\nsamples = 2\nvalue = [1, 1]"] ; 1 -> 3 ; 4 [label="X[1] <= 3.1266\nentropy = 0.9852\nsamples = 112\nvalue = [64, 48]"] ; 0 -> 4 [labeldistance=2.5, labelangle=-45, headlabel="False"] ; 5 [label="X[2] <= -0.4882\nentropy = 0.7919\nsamples = 63\nvalue = [48, 15]"] ; 4 -> 5 ; 6 [label="entropy = 0.684\nsamples = 11\nvalue = [2, 9]"] ; 5 -> 6 ; 7 [label="X[2] <= 0.5422\nentropy = 0.5159\nsamples = 52\nvalue = [46, 6]"] ; 5 -> 7 ; 8 [label="entropy = 0.0\nsamples = 18\nvalue = [18, 0]"] ; 7 -> 8 ; 9 [label="X[2] <= 0.6497\nentropy = 0.6723\nsamples = 34\nvalue = [28, 6]"] ; 7 -> 9 ; 10 [label="entropy = 0.0\nsamples = 1\nvalue = [0, 1]"] ; 9 -> 10 ; 11 [label="X[2] <= 1.887\nentropy = 0.6136\nsamples = 33\nvalue = [28, 5]"] ; 9 -> 11 ; 12 [label="entropy = 0.0\nsamples = 12\nvalue = [12, 0]"] ; 11 -> 12 ; 13 [label="X[2] <= 2.6691\nentropy = 0.7919\nsamples = 21\nvalue = [16, 5]"] ; 11 -> 13 ; 14 [label="entropy = 0.8113\nsamples = 4\nvalue = [1, 3]"] ; 13 -> 14 ; 15 [label="entropy = 0.5226\nsamples = 17\nvalue = [15, 2]"] ; 13 -> 15 ; 16 [label="X[0] <= 17.3284\nentropy = 0.9113\nsamples = 49\nvalue = [16, 33]"] ; 4 -> 16 ; 17 [label="entropy = 0.9183\nsamples = 6\nvalue = [4, 2]"] ; 16 -> 17 ; 18 [label="X[2] <= 19.7048\nentropy = 0.8542\nsamples = 43\nvalue = [12, 31]"] ; 16 -> 18 ; 19 [label="X[2] <= 5.8511\nentropy = 0.8296\nsamples = 42\nvalue = [11, 31]"] ; 18 -> 19 ; 20 [label="X[0] <= 31.8916\nentropy = 0.878\nsamples = 37\nvalue = [11, 26]"] ; 19 -> 20 ; 21 [label="X[1] <= 3.3612\nentropy = 0.6666\nsamples = 23\nvalue = [4, 19]"] ; 20 -> 21 ; 22 [label="entropy = 0.8905\nsamples = 13\nvalue = [4, 9]"] ; 21 -> 22 ; 23 [label="entropy = 0.0\nsamples = 10\nvalue = [0, 10]"] ; 21 -> 23 ; 24 
[label="entropy = 1.0\nsamples = 14\nvalue = [7, 7]"] ; 20 -> 24 ; 25 [label="entropy = 0.0\nsamples = 5\nvalue = [0, 5]"] ; 19 -> 25 ; 26 [label="entropy = 0.0\nsamples = 1\nvalue = [1, 0]"] ; 18 -> 26 ; } -------------- next part -------------- An HTML attachment was scrubbed... URL: From joel.nothman at gmail.com Tue Feb 7 18:21:16 2017 From: joel.nothman at gmail.com (Joel Nothman) Date: Wed, 8 Feb 2017 10:21:16 +1100 Subject: [scikit-learn] Need Corresponding indices array of values in each split of a DesicisionTreeClassifier In-Reply-To: References: Message-ID: I don't think putting that array of indices in a visualisation is a great idea! If you use my_tree.apply(X) you will be able to determine which leaf each instance in X lands up at, and potentially trace up the tree from there. On 8 February 2017 at 01:26, Nixon Raj wrote: > > For Example, In the below decision tree dot file, I have 223 samples which > splits into [174, 49] in the first split and [110, 1] in the 2nd split > > I would like to get the array of indices for the values of each split like > > *[174, 49] and their corresponding indices (idx) like [[0, 1 ,5, > 7,....,200,221], [3, 4, 6, ....., 199,222,223]]* > > *[110, 1] and their corresponding indices (idx) like [[0,5,....200,221], > [7]]* > > Please help me > > node [shape=box] ; > 0 [label="X[0] <= 13.9191\nentropy = 0.7597\nsamples = 223\nvalue = [174, > 49]"] ; > 1 [label="X[1] <= 3.1973\nentropy = 0.0741\nsamples = 111\nvalue = [110, > 1]"] ; > 0 -> 1 [labeldistance=2.5, labelangle=45, headlabel="True"] ; > 2 [label="entropy = 0.0\nsamples = 109\nvalue = [109, 0]"] ; > 1 -> 2 ; > 3 [label="entropy = 1.0\nsamples = 2\nvalue = [1, 1]"] ; > 1 -> 3 ; > 4 [label="X[1] <= 3.1266\nentropy = 0.9852\nsamples = 112\nvalue = [64, > 48]"] ; > 0 -> 4 [labeldistance=2.5, labelangle=-45, headlabel="False"] ; > 5 [label="X[2] <= -0.4882\nentropy = 0.7919\nsamples = 63\nvalue = [48, > 15]"] ; > 4 -> 5 ; > 6 [label="entropy = 0.684\nsamples = 11\nvalue = [2, 9]"] ; > 5 -> 6 ; > 7 [label="X[2] <= 0.5422\nentropy = 0.5159\nsamples = 52\nvalue = [46, > 6]"] ; > 5 -> 7 ; > 8 [label="entropy = 0.0\nsamples = 18\nvalue = [18, 0]"] ; > 7 -> 8 ; > 9 [label="X[2] <= 0.6497\nentropy = 0.6723\nsamples = 34\nvalue = [28, > 6]"] ; > 7 -> 9 ; > 10 [label="entropy = 0.0\nsamples = 1\nvalue = [0, 1]"] ; > 9 -> 10 ; > 11 [label="X[2] <= 1.887\nentropy = 0.6136\nsamples = 33\nvalue = [28, > 5]"] ; > 9 -> 11 ; > 12 [label="entropy = 0.0\nsamples = 12\nvalue = [12, 0]"] ; > 11 -> 12 ; > 13 [label="X[2] <= 2.6691\nentropy = 0.7919\nsamples = 21\nvalue = [16, > 5]"] ; > 11 -> 13 ; > 14 [label="entropy = 0.8113\nsamples = 4\nvalue = [1, 3]"] ; > 13 -> 14 ; > 15 [label="entropy = 0.5226\nsamples = 17\nvalue = [15, 2]"] ; > 13 -> 15 ; > 16 [label="X[0] <= 17.3284\nentropy = 0.9113\nsamples = 49\nvalue = [16, > 33]"] ; > 4 -> 16 ; > 17 [label="entropy = 0.9183\nsamples = 6\nvalue = [4, 2]"] ; > 16 -> 17 ; > 18 [label="X[2] <= 19.7048\nentropy = 0.8542\nsamples = 43\nvalue = [12, > 31]"] ; > 16 -> 18 ; > 19 [label="X[2] <= 5.8511\nentropy = 0.8296\nsamples = 42\nvalue = [11, > 31]"] ; > 18 -> 19 ; > 20 [label="X[0] <= 31.8916\nentropy = 0.878\nsamples = 37\nvalue = [11, > 26]"] ; > 19 -> 20 ; > 21 [label="X[1] <= 3.3612\nentropy = 0.6666\nsamples = 23\nvalue = [4, > 19]"] ; > 20 -> 21 ; > 22 [label="entropy = 0.8905\nsamples = 13\nvalue = [4, 9]"] ; > 21 -> 22 ; > 23 [label="entropy = 0.0\nsamples = 10\nvalue = [0, 10]"] ; > 21 -> 23 ; > 24 [label="entropy = 1.0\nsamples = 14\nvalue = [7, 7]"] ; > 20 -> 24 
; > 25 [label="entropy = 0.0\nsamples = 5\nvalue = [0, 5]"] ; > 19 -> 25 ; > 26 [label="entropy = 0.0\nsamples = 1\nvalue = [1, 0]"] ; > 18 -> 26 ; > } > > > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From jblackburne at gmail.com Tue Feb 7 19:13:40 2017 From: jblackburne at gmail.com (Jeff Blackburne) Date: Tue, 7 Feb 2017 16:13:40 -0800 Subject: [scikit-learn] Need Corresponding indices array of values in each split of a DesicisionTreeClassifier In-Reply-To: References: Message-ID: Nixon, If you are using version 0.18 or later, you can reconstruct the information you need using the `decision_path` method: http://scikit-learn.org/stable/auto_examples/tree/plot_unveil_tree_structure.html -Jeff On Tue, Feb 7, 2017 at 3:21 PM, Joel Nothman wrote: > I don't think putting that array of indices in a visualisation is a great > idea! > > If you use my_tree.apply(X) you will be able to determine which leaf each > instance in X lands up at, and potentially trace up the tree from there. > > On 8 February 2017 at 01:26, Nixon Raj wrote: > >> >> For Example, In the below decision tree dot file, I have 223 samples >> which splits into [174, 49] in the first split and [110, 1] in the 2nd split >> >> I would like to get the array of indices for the values of each split >> like >> >> *[174, 49] and their corresponding indices (idx) like [[0, 1 ,5, >> 7,....,200,221], [3, 4, 6, ....., 199,222,223]]* >> >> *[110, 1] and their corresponding indices (idx) like [[0,5,....200,221], >> [7]]* >> >> Please help me >> >> node [shape=box] ; >> 0 [label="X[0] <= 13.9191\nentropy = 0.7597\nsamples = 223\nvalue = [174, >> 49]"] ; >> 1 [label="X[1] <= 3.1973\nentropy = 0.0741\nsamples = 111\nvalue = [110, >> 1]"] ; >> 0 -> 1 [labeldistance=2.5, labelangle=45, headlabel="True"] ; >> 2 [label="entropy = 0.0\nsamples = 109\nvalue = [109, 0]"] ; >> 1 -> 2 ; >> 3 [label="entropy = 1.0\nsamples = 2\nvalue = [1, 1]"] ; >> 1 -> 3 ; >> 4 [label="X[1] <= 3.1266\nentropy = 0.9852\nsamples = 112\nvalue = [64, >> 48]"] ; >> 0 -> 4 [labeldistance=2.5, labelangle=-45, headlabel="False"] ; >> 5 [label="X[2] <= -0.4882\nentropy = 0.7919\nsamples = 63\nvalue = [48, >> 15]"] ; >> 4 -> 5 ; >> 6 [label="entropy = 0.684\nsamples = 11\nvalue = [2, 9]"] ; >> 5 -> 6 ; >> 7 [label="X[2] <= 0.5422\nentropy = 0.5159\nsamples = 52\nvalue = [46, >> 6]"] ; >> 5 -> 7 ; >> 8 [label="entropy = 0.0\nsamples = 18\nvalue = [18, 0]"] ; >> 7 -> 8 ; >> 9 [label="X[2] <= 0.6497\nentropy = 0.6723\nsamples = 34\nvalue = [28, >> 6]"] ; >> 7 -> 9 ; >> 10 [label="entropy = 0.0\nsamples = 1\nvalue = [0, 1]"] ; >> 9 -> 10 ; >> 11 [label="X[2] <= 1.887\nentropy = 0.6136\nsamples = 33\nvalue = [28, >> 5]"] ; >> 9 -> 11 ; >> 12 [label="entropy = 0.0\nsamples = 12\nvalue = [12, 0]"] ; >> 11 -> 12 ; >> 13 [label="X[2] <= 2.6691\nentropy = 0.7919\nsamples = 21\nvalue = [16, >> 5]"] ; >> 11 -> 13 ; >> 14 [label="entropy = 0.8113\nsamples = 4\nvalue = [1, 3]"] ; >> 13 -> 14 ; >> 15 [label="entropy = 0.5226\nsamples = 17\nvalue = [15, 2]"] ; >> 13 -> 15 ; >> 16 [label="X[0] <= 17.3284\nentropy = 0.9113\nsamples = 49\nvalue = [16, >> 33]"] ; >> 4 -> 16 ; >> 17 [label="entropy = 0.9183\nsamples = 6\nvalue = [4, 2]"] ; >> 16 -> 17 ; >> 18 [label="X[2] <= 19.7048\nentropy = 0.8542\nsamples = 43\nvalue = [12, >> 31]"] ; >> 16 -> 18 ; >> 19 [label="X[2] <= 5.8511\nentropy = 
0.8296\nsamples = 42\nvalue = [11, >> 31]"] ; >> 18 -> 19 ; >> 20 [label="X[0] <= 31.8916\nentropy = 0.878\nsamples = 37\nvalue = [11, >> 26]"] ; >> 19 -> 20 ; >> 21 [label="X[1] <= 3.3612\nentropy = 0.6666\nsamples = 23\nvalue = [4, >> 19]"] ; >> 20 -> 21 ; >> 22 [label="entropy = 0.8905\nsamples = 13\nvalue = [4, 9]"] ; >> 21 -> 22 ; >> 23 [label="entropy = 0.0\nsamples = 10\nvalue = [0, 10]"] ; >> 21 -> 23 ; >> 24 [label="entropy = 1.0\nsamples = 14\nvalue = [7, 7]"] ; >> 20 -> 24 ; >> 25 [label="entropy = 0.0\nsamples = 5\nvalue = [0, 5]"] ; >> 19 -> 25 ; >> 26 [label="entropy = 0.0\nsamples = 1\nvalue = [1, 0]"] ; >> 18 -> 26 ; >> } >> >> >> _______________________________________________ >> scikit-learn mailing list >> scikit-learn at python.org >> https://mail.python.org/mailman/listinfo/scikit-learn >> >> > > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From joel.nothman at gmail.com Tue Feb 7 21:00:12 2017 From: joel.nothman at gmail.com (Joel Nothman) Date: Tue, 7 Feb 2017 21:00:12 -0500 Subject: [scikit-learn] Preparing a scikit-learn 0.18.2 bugfix release In-Reply-To: <20170111215115.GO1585067@phare.normalesup.org> References: <20170109151546.GM2802991@phare.normalesup.org> <20170111215115.GO1585067@phare.normalesup.org> Message-ID: On 12 January 2017 at 08:51, Gael Varoquaux wrote: > On Thu, Jan 12, 2017 at 08:41:51AM +1100, Joel Nothman wrote: > > When the two versions deprecation policy was instituted, releases were > much > > more frequent... Is that enough of an excuse? > > I'd rather say that we can here decide that we are giving a longer grace > period. > > I think that slow deprecations are a good things (see titus's blog post > here: http://ivory.idyll.org/blog/2017-pof-software-archivability.html ) > Given that 0.18 was a very slow release, and the work for removing deprecated material from 0.19 has already been done, I don't think we should revert that. I agree that we can delay the deprecation deadline for 0.20 and 0.21. In terms of release schedule, are we aiming for RC in early-mid March, assuming Andy's above prognostications are correct and he is able to review in a bigger way in a week or so? J -------------- next part -------------- An HTML attachment was scrubbed... URL: From nixnmtm at gmail.com Wed Feb 8 04:43:17 2017 From: nixnmtm at gmail.com (Nixon Raj) Date: Wed, 8 Feb 2017 17:43:17 +0800 Subject: [scikit-learn] Need Corresponding indices array of values in each split of a DesicisionTreeClassifier In-Reply-To: References: Message-ID: Hi Joel andJeff Thanks for your valuable comment, i got that to work On 8 February 2017 at 08:13, Jeff Blackburne wrote: > Nixon, > > If you are using version 0.18 or later, you can reconstruct the > information you need using the `decision_path` method: > > http://scikit-learn.org/stable/auto_examples/tree/ > plot_unveil_tree_structure.html > > -Jeff > > > On Tue, Feb 7, 2017 at 3:21 PM, Joel Nothman > wrote: > >> I don't think putting that array of indices in a visualisation is a great >> idea! >> >> If you use my_tree.apply(X) you will be able to determine which leaf each >> instance in X lands up at, and potentially trace up the tree from there. 
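Putting Joel's tree.apply() suggestion and Jeff's decision_path() pointer together, here is a rough sketch, not from the thread, of collecting the sample indices that pass through each node; it uses the iris data purely as a stand-in and assumes scikit-learn >= 0.18 for decision_path():

# Rough sketch; iris is a placeholder for the real training data.
import numpy as np
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

iris = load_iris()
X, y = iris.data, iris.target
tree = DecisionTreeClassifier(random_state=0).fit(X, y)

# decision_path returns a sparse indicator matrix of shape (n_samples, n_nodes);
# entry (i, j) is 1 when sample i passes through node j.
node_indicator = tree.decision_path(X)

indices_per_node = {
    node_id: np.nonzero(node_indicator[:, node_id].toarray().ravel())[0]
    for node_id in range(tree.tree_.node_count)}

print(indices_per_node[0][:10])   # first few sample indices reaching the root node
leaf_ids = tree.apply(X)          # leaf id for each sample, as in Joel's suggestion

len(indices_per_node[node_id]) then matches the samples = ... count that graphviz prints for each node.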
>> >> On 8 February 2017 at 01:26, Nixon Raj wrote: >> >>> >>> For Example, In the below decision tree dot file, I have 223 samples >>> which splits into [174, 49] in the first split and [110, 1] in the 2nd split >>> >>> I would like to get the array of indices for the values of each split >>> like >>> >>> *[174, 49] and their corresponding indices (idx) like [[0, 1 ,5, >>> 7,....,200,221], [3, 4, 6, ....., 199,222,223]]* >>> >>> *[110, 1] and their corresponding indices (idx) like [[0,5,....200,221], >>> [7]]* >>> >>> Please help me >>> >>> node [shape=box] ; >>> 0 [label="X[0] <= 13.9191\nentropy = 0.7597\nsamples = 223\nvalue = >>> [174, 49]"] ; >>> 1 [label="X[1] <= 3.1973\nentropy = 0.0741\nsamples = 111\nvalue = [110, >>> 1]"] ; >>> 0 -> 1 [labeldistance=2.5, labelangle=45, headlabel="True"] ; >>> 2 [label="entropy = 0.0\nsamples = 109\nvalue = [109, 0]"] ; >>> 1 -> 2 ; >>> 3 [label="entropy = 1.0\nsamples = 2\nvalue = [1, 1]"] ; >>> 1 -> 3 ; >>> 4 [label="X[1] <= 3.1266\nentropy = 0.9852\nsamples = 112\nvalue = [64, >>> 48]"] ; >>> 0 -> 4 [labeldistance=2.5, labelangle=-45, headlabel="False"] ; >>> 5 [label="X[2] <= -0.4882\nentropy = 0.7919\nsamples = 63\nvalue = [48, >>> 15]"] ; >>> 4 -> 5 ; >>> 6 [label="entropy = 0.684\nsamples = 11\nvalue = [2, 9]"] ; >>> 5 -> 6 ; >>> 7 [label="X[2] <= 0.5422\nentropy = 0.5159\nsamples = 52\nvalue = [46, >>> 6]"] ; >>> 5 -> 7 ; >>> 8 [label="entropy = 0.0\nsamples = 18\nvalue = [18, 0]"] ; >>> 7 -> 8 ; >>> 9 [label="X[2] <= 0.6497\nentropy = 0.6723\nsamples = 34\nvalue = [28, >>> 6]"] ; >>> 7 -> 9 ; >>> 10 [label="entropy = 0.0\nsamples = 1\nvalue = [0, 1]"] ; >>> 9 -> 10 ; >>> 11 [label="X[2] <= 1.887\nentropy = 0.6136\nsamples = 33\nvalue = [28, >>> 5]"] ; >>> 9 -> 11 ; >>> 12 [label="entropy = 0.0\nsamples = 12\nvalue = [12, 0]"] ; >>> 11 -> 12 ; >>> 13 [label="X[2] <= 2.6691\nentropy = 0.7919\nsamples = 21\nvalue = [16, >>> 5]"] ; >>> 11 -> 13 ; >>> 14 [label="entropy = 0.8113\nsamples = 4\nvalue = [1, 3]"] ; >>> 13 -> 14 ; >>> 15 [label="entropy = 0.5226\nsamples = 17\nvalue = [15, 2]"] ; >>> 13 -> 15 ; >>> 16 [label="X[0] <= 17.3284\nentropy = 0.9113\nsamples = 49\nvalue = [16, >>> 33]"] ; >>> 4 -> 16 ; >>> 17 [label="entropy = 0.9183\nsamples = 6\nvalue = [4, 2]"] ; >>> 16 -> 17 ; >>> 18 [label="X[2] <= 19.7048\nentropy = 0.8542\nsamples = 43\nvalue = [12, >>> 31]"] ; >>> 16 -> 18 ; >>> 19 [label="X[2] <= 5.8511\nentropy = 0.8296\nsamples = 42\nvalue = [11, >>> 31]"] ; >>> 18 -> 19 ; >>> 20 [label="X[0] <= 31.8916\nentropy = 0.878\nsamples = 37\nvalue = [11, >>> 26]"] ; >>> 19 -> 20 ; >>> 21 [label="X[1] <= 3.3612\nentropy = 0.6666\nsamples = 23\nvalue = [4, >>> 19]"] ; >>> 20 -> 21 ; >>> 22 [label="entropy = 0.8905\nsamples = 13\nvalue = [4, 9]"] ; >>> 21 -> 22 ; >>> 23 [label="entropy = 0.0\nsamples = 10\nvalue = [0, 10]"] ; >>> 21 -> 23 ; >>> 24 [label="entropy = 1.0\nsamples = 14\nvalue = [7, 7]"] ; >>> 20 -> 24 ; >>> 25 [label="entropy = 0.0\nsamples = 5\nvalue = [0, 5]"] ; >>> 19 -> 25 ; >>> 26 [label="entropy = 0.0\nsamples = 1\nvalue = [1, 0]"] ; >>> 18 -> 26 ; >>> } >>> >>> >>> _______________________________________________ >>> scikit-learn mailing list >>> scikit-learn at python.org >>> https://mail.python.org/mailman/listinfo/scikit-learn >>> >>> >> >> _______________________________________________ >> scikit-learn mailing list >> scikit-learn at python.org >> https://mail.python.org/mailman/listinfo/scikit-learn >> >> > > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org 
> https://mail.python.org/mailman/listinfo/scikit-learn > > -- Regards Nixon Raj N Department of Biological Science and Technology Institute of Bioinformatics and Systems Biology National Chiao Tung University 208 Lab Building 1, 75 Bo-Ai St. Dong District, Hsinchu, Taiwan 30062 (R.O.C.) Mob:+886-989353921 0ffice ext: 56997 -------------- next part -------------- An HTML attachment was scrubbed... URL: From ahowe42 at gmail.com Wed Feb 8 12:15:44 2017 From: ahowe42 at gmail.com (Andrew Howe) Date: Wed, 8 Feb 2017 20:15:44 +0300 Subject: [scikit-learn] Preparing a scikit-learn 0.18.2 bugfix release In-Reply-To: References: <20170109151546.GM2802991@phare.normalesup.org> <20170111215115.GO1585067@phare.normalesup.org> Message-ID: How many current deprecations are expected in the next release? Andrew On Jan 12, 2017 00:53, "Gael Varoquaux" wrote: On Thu, Jan 12, 2017 at 08:41:51AM +1100, Joel Nothman wrote: > When the two versions deprecation policy was instituted, releases were much > more frequent... Is that enough of an excuse? I'd rather say that we can here decide that we are giving a longer grace period. I think that slow deprecations are a good things (see titus's blog post here: http://ivory.idyll.org/blog/2017-pof-software-archivability.html ) G > On 12 January 2017 at 03:43, Andreas Mueller wrote: > On 01/09/2017 10:15 AM, Gael Varoquaux wrote: > instead of setting up a roadmap I would rather just identify bugs > that > are blockers and fix only those and don't wait for any feature > before > cutting 0.19.X. > I agree with the sentiment, but this would mess with our deprecation cycle. > If we release now, and then release again soonish, that means people have > less calendar time > to react to deprecations. > We could either accept this or change all deprecations and bump the removal > by a version? > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn -- Gael Varoquaux Researcher, INRIA Parietal NeuroSpin/CEA Saclay , Bat 145, 91191 Gif-sur-Yvette France Phone: ++ 33-1-69-08-79-68 http://gael-varoquaux.info http://twitter.com/GaelVaroquaux _______________________________________________ scikit-learn mailing list scikit-learn at python.org https://mail.python.org/mailman/listinfo/scikit-learn -------------- next part -------------- An HTML attachment was scrubbed... URL: From joel.nothman at gmail.com Wed Feb 8 22:30:40 2017 From: joel.nothman at gmail.com (Joel Nothman) Date: Thu, 9 Feb 2017 14:30:40 +1100 Subject: [scikit-learn] Preparing a scikit-learn 0.18.2 bugfix release In-Reply-To: References: <20170109151546.GM2802991@phare.normalesup.org> <20170111215115.GO1585067@phare.normalesup.org> Message-ID: Not sure that this quite gives you a number, but: $git checkout 0.18.1 $ git grep -pwB1 0.19 sklearn | grep -ve ^- -e .csv: -e /tests/ > /tmp/dep19.txt etc. edited results attached. On 9 February 2017 at 04:15, Andrew Howe wrote: > How many current deprecations are expected in the next release? > > Andrew > > On Jan 12, 2017 00:53, "Gael Varoquaux" > wrote: > > On Thu, Jan 12, 2017 at 08:41:51AM +1100, Joel Nothman wrote: > > When the two versions deprecation policy was instituted, releases were > much > > more frequent... Is that enough of an excuse? 
> > I'd rather say that we can here decide that we are giving a longer grace > period. > > I think that slow deprecations are a good things (see titus's blog post > here: http://ivory.idyll.org/blog/2017-pof-software-archivability.html ) > > G > > > On 12 January 2017 at 03:43, Andreas Mueller wrote: > > > > > On 01/09/2017 10:15 AM, Gael Varoquaux wrote: > > > instead of setting up a roadmap I would rather just identify > bugs > > that > > are blockers and fix only those and don't wait for any > feature > > before > > cutting 0.19.X. > > > > > I agree with the sentiment, but this would mess with our deprecation > cycle. > > If we release now, and then release again soonish, that means people > have > > less calendar time > > to react to deprecations. > > > We could either accept this or change all deprecations and bump the > removal > > by a version? > > > _______________________________________________ > > scikit-learn mailing list > > scikit-learn at python.org > > https://mail.python.org/mailman/listinfo/scikit-learn > > > > > _______________________________________________ > > scikit-learn mailing list > > scikit-learn at python.org > > https://mail.python.org/mailman/listinfo/scikit-learn > > > -- > Gael Varoquaux > Researcher, INRIA Parietal > NeuroSpin/CEA Saclay , Bat 145, 91191 Gif-sur-Yvette France > Phone: ++ 33-1-69-08-79-68 > http://gael-varoquaux.info http://twitter.com/GaelVaroquaux > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn > > > > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn > > -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- sklearn/base.py=from . import __version__ sklearn/base.py- at deprecated("ChangedBehaviorWarning has been moved into the sklearn.exceptions" sklearn/base.py: " module. 
It will not be available here from version 0.19") sklearn/datasets/data/boston_house_prices.csv-1.62864,0,21.89,0,0.624,5.019,100,1.4394,4,437,21.2,396.9,34.41,14.4 sklearn/datasets/data/boston_house_prices.csv-0.40202,0,9.9,0,0.544,6.382,67.2,3.5325,4,304,18.4,395.21,10.36,23.1 sklearn/datasets/data/breast_cancer.csv-14.71,21.59,95.55,656.9,0.1137,0.1365,0.1293,0.08123,0.2027,0.06758,0.4226,1.15,2.735,40.09,0.003659,0.02855,0.02572,0.01272,0.01817,0.004108,17.87,30.7,115.7,985.5,0.1368,0.429,0.3587,0.1834,0.3698,0.1094,0 sklearn/datasets/data/breast_cancer.csv-20.26,23.03,132.4,1264,0.09078,0.1313,0.1465,0.08683,0.2095,0.05649,0.7576,1.509,4.554,87.87,0.006016,0.03482,0.04232,0.01269,0.02657,0.004411,24.22,31.59,156.1,1750,0.119,0.3539,0.4098,0.1573,0.3689,0.08368,0 sklearn/datasets/data/breast_cancer.csv-12.86,13.32,82.82,504.8,0.1134,0.08834,0.038,0.034,0.1543,0.06476,0.2212,1.042,1.614,16.57,0.00591,0.02016,0.01902,0.01011,0.01202,0.003107,14.04,21.08,92.8,599.5,0.1547,0.2231,0.1791,0.1155,0.2382,0.08553,1 sklearn/datasets/data/breast_cancer.csv-11.87,21.54,76.83,432,0.06613,0.1064,0.08777,0.02386,0.1349,0.06612,0.256,1.554,1.955,20.24,0.006854,0.06063,0.06663,0.01553,0.02354,0.008925,12.79,28.18,83.51,507.2,0.09457,0.3399,0.3218,0.0875,0.2305,0.09952,1 sklearn/datasets/data/breast_cancer.csv-13,25.13,82.61,520.2,0.08369,0.05073,0.01206,0.01762,0.1667,0.05449,0.2621,1.232,1.657,21.19,0.006054,0.008974,0.005681,0.006336,0.01215,0.001514,14.34,31.88,91.06,628.5,0.1218,0.1093,0.04462,0.05921,0.2306,0.06291,1 sklearn/datasets/lfw.py=def _fetch_lfw_pairs(index_file_path, data_folder_path, slice_=None, sklearn/datasets/lfw.py- at deprecated("Function 'load_lfw_people' has been deprecated in 0.17 and will " sklearn/datasets/lfw.py: "be removed in 0.19." sklearn/datasets/lfw.py=def load_lfw_people(download_if_missing=False, **kwargs): sklearn/datasets/lfw.py- .. deprecated:: 0.17 sklearn/datasets/lfw.py: This function will be removed in 0.19. sklearn/datasets/lfw.py=def fetch_lfw_pairs(subset='train', data_home=None, funneled=True, resize=0.5, sklearn/datasets/lfw.py- at deprecated("Function 'load_lfw_pairs' has been deprecated in 0.17 and will " sklearn/datasets/lfw.py: "be removed in 0.19." sklearn/datasets/lfw.py=def load_lfw_pairs(download_if_missing=False, **kwargs): sklearn/datasets/lfw.py- .. deprecated:: 0.17 sklearn/datasets/lfw.py: This function will be removed in 0.19. sklearn/decomposition/nmf.py=def non_negative_factorization(X, W=None, H=None, n_components=None, sklearn/decomposition/nmf.py- if solver == 'pg': sklearn/decomposition/nmf.py: warnings.warn("'pg' solver will be removed in release 0.19." sklearn/decomposition/nmf.py=class NMF(BaseEstimator, TransformerMixin): sklearn/decomposition/nmf.py- " for 'pg' solver, which will be removed" sklearn/decomposition/nmf.py: " in release 0.19. Use another solver with L1 or L2" sklearn/decomposition/nmf.py- sklearn/decomposition/nmf.py:@deprecated("It will be removed in release 0.19. Use NMF instead." sklearn/decomposition/nmf.py: "'pg' solver is still available until release 0.19.") sklearn/discriminant_analysis.py=class LinearDiscriminantAnalysis(BaseEstimator, LinearClassifierMixin, sklearn/discriminant_analysis.py- warnings.warn("The parameter 'store_covariance' is deprecated as " sklearn/discriminant_analysis.py: "of version 0.17 and will be removed in 0.19. The " sklearn/discriminant_analysis.py- warnings.warn("The parameter 'tol' is deprecated as of version " sklearn/discriminant_analysis.py: "0.17 and will be removed in 0.19. 
The parameter is " sklearn/discriminant_analysis.py=class QuadraticDiscriminantAnalysis(BaseEstimator, ClassifierMixin): sklearn/discriminant_analysis.py- warnings.warn("The parameter 'store_covariances' is deprecated as " sklearn/discriminant_analysis.py: "of version 0.17 and will be removed in 0.19. The " sklearn/discriminant_analysis.py- warnings.warn("The parameter 'tol' is deprecated as of version " sklearn/discriminant_analysis.py: "0.17 and will be removed in 0.19. The parameter is " sklearn/ensemble/forest.py=class ForestClassifier(six.with_metaclass(ABCMeta, BaseForest, sklearn/ensemble/forest.py- warn("class_weight='subsample' is deprecated in 0.17 and" sklearn/ensemble/forest.py: "will be removed in 0.19. It was replaced by " sklearn/ensemble/gradient_boosting.py=class BaseGradientBoosting(six.with_metaclass(ABCMeta, BaseEnsemble, sklearn/ensemble/gradient_boosting.py- sklearn/ensemble/gradient_boosting.py: @deprecated(" and will be removed in 0.19") sklearn/ensemble/gradient_boosting.py- sklearn/ensemble/gradient_boosting.py: @deprecated(" and will be removed in 0.19") sklearn/feature_selection/from_model.py=class _LearntSelectorMixin(TransformerMixin): sklearn/feature_selection/from_model.py- @deprecated('Support to use estimators as feature selectors will be ' sklearn/feature_selection/from_model.py: 'removed in version 0.19. Use SelectFromModel instead.') sklearn/lda.py=warnings.warn("lda.LDA has been moved to " sklearn/lda.py- "discriminant_analysis.LinearDiscriminantAnalysis " sklearn/lda.py: "in 0.17 and will be removed in 0.19", DeprecationWarning) sklearn/lda.py=class LDA(_LDA): sklearn/lda.py- .. deprecated:: 0.17 sklearn/lda.py: This class will be removed in 0.19. sklearn/linear_model/base.py=class LinearModel(six.with_metaclass(ABCMeta, BaseEstimator)): sklearn/linear_model/base.py- sklearn/linear_model/base.py: @deprecated(" and will be removed in 0.19.") sklearn/linear_model/base.py=class LinearRegression(LinearModel, RegressorMixin): sklearn/linear_model/base.py- @property sklearn/linear_model/base.py: @deprecated("``residues_`` is deprecated and will be removed in 0.19") sklearn/linear_model/coordinate_descent.py=class ElasticNet(LinearModel, RegressorMixin): sklearn/linear_model/coordinate_descent.py- sklearn/linear_model/coordinate_descent.py: @deprecated(" and will be removed in 0.19") sklearn/linear_model/logistic.py=def logistic_regression_path(X, y, pos_class=None, Cs=10, fit_intercept=True, sklearn/linear_model/logistic.py- Whether or not to produce a copy of the data. A copy is not required sklearn/linear_model/logistic.py: anymore. This parameter is deprecated and will be removed in 0.19. sklearn/linear_model/logistic.py- warnings.warn("A copy is not required anymore. 
The 'copy' parameter " sklearn/linear_model/logistic.py: "is deprecated and will be removed in 0.19.", sklearn/linear_model/logistic.py- sklearn/linear_model/logistic.py: # 'auto' is deprecated and will be removed in 0.19 sklearn/linear_model/logistic.py=class LogisticRegressionCV(LogisticRegression, BaseEstimator, sklearn/linear_model/logistic.py- class_weight in ['balanced', 'auto']): sklearn/linear_model/logistic.py: # 'auto' is deprecated and will be removed in 0.19 sklearn/linear_model/stochastic_gradient.py=class BaseSGDRegressor(BaseSGD, RegressorMixin): sklearn/linear_model/stochastic_gradient.py- sklearn/linear_model/stochastic_gradient.py: @deprecated(" and will be removed in 0.19.") sklearn/metrics/base.py=from ..utils import deprecated sklearn/metrics/base.py- at deprecated("UndefinedMetricWarning has been moved into the sklearn.exceptions" sklearn/metrics/base.py: " module. It will not be available here from version 0.19") sklearn/metrics/regression.py=def r2_score(y_true, y_pred, sklearn/metrics/regression.py- deprecated since version 0.17 and will be changed to 'uniform_average' sklearn/metrics/regression.py: starting from 0.19. sklearn/metrics/regression.py- "0.17, it will be changed to 'uniform_average' " sklearn/metrics/regression.py: "starting from 0.19.", sklearn/multioutput.py=class MultiOutputRegressor(MultiOutputEstimator, RegressorMixin): sklearn/multioutput.py- """ sklearn/multioutput.py: # XXX remove in 0.19 when r2_score default for multioutput changes sklearn/pipeline.py=class Pipeline(_BasePipeline): sklearn/pipeline.py- if hasattr(X, 'ndim') and X.ndim == 1: sklearn/pipeline.py: warn("From version 0.19, a 1d X will not be reshaped in" sklearn/preprocessing/data.py=DEPRECATION_MSG_1D = ( sklearn/preprocessing/data.py- "Passing 1d arrays as data is deprecated in 0.17 and will " sklearn/preprocessing/data.py: "raise ValueError in 0.19. Reshape your data either using " sklearn/preprocessing/data.py=class MinMaxScaler(BaseEstimator, TransformerMixin): sklearn/preprocessing/data.py- @deprecated("Attribute data_range will be removed in " sklearn/preprocessing/data.py: "0.19. Use ``data_range_`` instead") sklearn/preprocessing/data.py- @deprecated("Attribute data_min will be removed in " sklearn/preprocessing/data.py: "0.19. Use ``data_min_`` instead") sklearn/preprocessing/data.py=class StandardScaler(BaseEstimator, TransformerMixin): sklearn/preprocessing/data.py- @property sklearn/preprocessing/data.py: @deprecated("Attribute ``std_`` will be removed in 0.19. " sklearn/qda.py=warnings.warn("qda.QDA has been moved to " sklearn/qda.py- "discriminant_analysis.QuadraticDiscriminantAnalysis " sklearn/qda.py: "in 0.17 and will be removed in 0.19.", DeprecationWarning) sklearn/qda.py=class QDA(_QDA): sklearn/qda.py- .. deprecated:: 0.17 sklearn/qda.py: This class will be removed in 0.19. sklearn/svm/base.py=class BaseLibSVM(six.with_metaclass(ABCMeta, BaseEstimator)): sklearn/svm/base.py- sklearn/svm/base.py: @deprecated(" and will be removed in 0.19") sklearn/svm/base.py=class BaseSVC(six.with_metaclass(ABCMeta, BaseLibSVM, ClassifierMixin)): sklearn/svm/base.py- warnings.warn("The decision_function_shape default value will " sklearn/svm/base.py: "change from 'ovo' to 'ovr' in 0.19. This will change " sklearn/svm/classes.py=class SVC(BaseSVC): sklearn/svm/classes.py- compatibility and raise a deprecation warning, but will change 'ovr' sklearn/svm/classes.py: in 0.19. 
sklearn/svm/classes.py=class NuSVC(BaseSVC): sklearn/svm/classes.py- compatibility and raise a deprecation warning, but will change 'ovr' sklearn/svm/classes.py: in 0.19. sklearn/utils/__init__.py=from ..exceptions import DataConversionWarning sklearn/utils/__init__.py- at deprecated("ConvergenceWarning has been moved into the sklearn.exceptions " sklearn/utils/__init__.py: "module. It will not be available here from version 0.19") sklearn/utils/class_weight.py=def compute_class_weight(class_weight, classes, y): sklearn/utils/class_weight.py- "class_weight='balanced'. 'auto' will be removed in" sklearn/utils/class_weight.py: " 0.19", DeprecationWarning) sklearn/utils/estimator_checks.py=MULTI_OUTPUT = ['CCA', 'DecisionTreeRegressor', 'ElasticNet', sklearn/utils/estimator_checks.py- sklearn/utils/estimator_checks.py:# Estimators with deprecated transform methods. Should be removed in 0.19 when sklearn/utils/testing.py=def if_not_mac_os(versions=('10.7', '10.8', '10.9'), sklearn/utils/testing.py- warnings.warn("if_not_mac_os is deprecated in 0.17 and will be removed" sklearn/utils/testing.py: " in 0.19: use the safer and more generic" sklearn/utils/validation.py=from ..exceptions import NotFittedError as _NotFittedError sklearn/utils/validation.py- at deprecated("DataConversionWarning has been moved into the sklearn.exceptions" sklearn/utils/validation.py: " module. It will not be available here from version 0.19") sklearn/utils/validation.py=class DataConversionWarning(_DataConversionWarning): sklearn/utils/validation.py- at deprecated("NonBLASDotWarning has been moved into the sklearn.exceptions" sklearn/utils/validation.py: " module. It will not be available here from version 0.19") sklearn/utils/validation.py=class NonBLASDotWarning(_NonBLASDotWarning): sklearn/utils/validation.py- at deprecated("NotFittedError has been moved into the sklearn.exceptions module." sklearn/utils/validation.py: " It will not be available here from version 0.19") sklearn/utils/validation.py=def check_array(array, accept_sparse=None, dtype="numeric", order=None, sklearn/utils/validation.py- "Passing 1d arrays as data is deprecated in 0.17 and will " sklearn/utils/validation.py: "raise ValueError in 0.19. Reshape your data either using " sklearn/utils/validation.py=def check_is_fitted(estimator, attributes, msg=None, all_or_any=all): sklearn/utils/validation.py- if not all_or_any([hasattr(estimator, attr) for attr in attributes]): sklearn/utils/validation.py: # FIXME NotFittedError_ --> NotFittedError in 0.19 -------------- next part -------------- sklearn/base.py=def clone(estimator, safe=True): sklearn/base.py- " This behavior is deprecated as of 0.18 and " sklearn/base.py: "support for this behavior will be removed in 0.20." sklearn/cross_validation.py=warnings.warn("This module was deprecated in version 0.18 in favor of the " sklearn/cross_validation.py- "new CV iterators are different from that of this module. " sklearn/cross_validation.py: "This module will be removed in 0.20.", DeprecationWarning) sklearn/cross_validation.py=class LeaveOneOut(_PartitionIterator): sklearn/cross_validation.py- .. deprecated:: 0.18 sklearn/cross_validation.py: This module will be removed in 0.20. sklearn/cross_validation.py=class LeavePOut(_PartitionIterator): sklearn/cross_validation.py- .. deprecated:: 0.18 sklearn/cross_validation.py: This module will be removed in 0.20. sklearn/cross_validation.py=class KFold(_BaseKFold): sklearn/cross_validation.py- .. 
deprecated:: 0.18 sklearn/cross_validation.py: This module will be removed in 0.20. sklearn/cross_validation.py=class LabelKFold(_BaseKFold): sklearn/cross_validation.py- .. deprecated:: 0.18 sklearn/cross_validation.py: This module will be removed in 0.20. sklearn/cross_validation.py=class StratifiedKFold(_BaseKFold): sklearn/cross_validation.py- .. deprecated:: 0.18 sklearn/cross_validation.py: This module will be removed in 0.20. sklearn/cross_validation.py=class LeaveOneLabelOut(_PartitionIterator): sklearn/cross_validation.py- .. deprecated:: 0.18 sklearn/cross_validation.py: This module will be removed in 0.20. sklearn/cross_validation.py=class LeavePLabelOut(_PartitionIterator): sklearn/cross_validation.py- .. deprecated:: 0.18 sklearn/cross_validation.py: This module will be removed in 0.20. sklearn/cross_validation.py=class ShuffleSplit(BaseShuffleSplit): sklearn/cross_validation.py- .. deprecated:: 0.18 sklearn/cross_validation.py: This module will be removed in 0.20. sklearn/cross_validation.py=class StratifiedShuffleSplit(BaseShuffleSplit): sklearn/cross_validation.py- .. deprecated:: 0.18 sklearn/cross_validation.py: This module will be removed in 0.20. sklearn/cross_validation.py=class PredefinedSplit(_PartitionIterator): sklearn/cross_validation.py- .. deprecated:: 0.18 sklearn/cross_validation.py: This module will be removed in 0.20. sklearn/cross_validation.py=class LabelShuffleSplit(ShuffleSplit): sklearn/cross_validation.py- .. deprecated:: 0.18 sklearn/cross_validation.py: This module will be removed in 0.20. sklearn/cross_validation.py=def cross_val_predict(estimator, X, y=None, cv=None, n_jobs=1, sklearn/cross_validation.py- .. deprecated:: 0.18 sklearn/cross_validation.py: This module will be removed in 0.20. sklearn/cross_validation.py=def cross_val_score(estimator, X, y=None, scoring=None, cv=None, n_jobs=1, sklearn/cross_validation.py- .. deprecated:: 0.18 sklearn/cross_validation.py: This module will be removed in 0.20. sklearn/cross_validation.py=def check_cv(cv, X=None, y=None, classifier=False): sklearn/cross_validation.py- .. deprecated:: 0.18 sklearn/cross_validation.py: This module will be removed in 0.20. sklearn/cross_validation.py=def permutation_test_score(estimator, X, y, cv=None, sklearn/cross_validation.py- .. deprecated:: 0.18 sklearn/cross_validation.py: This module will be removed in 0.20. sklearn/cross_validation.py=def train_test_split(*arrays, **options): sklearn/cross_validation.py- .. deprecated:: 0.18 sklearn/cross_validation.py: This module will be removed in 0.20. sklearn/decomposition/online_lda.py=class LatentDirichletAllocation(BaseEstimator, TransformerMixin): sklearn/decomposition/online_lda.py- faster than the batch update. sklearn/decomposition/online_lda.py: The default learning method is going to be changed to 'batch' in the 0.20 release. sklearn/decomposition/online_lda.py- warnings.warn("The default value for 'learning_method' will be " sklearn/decomposition/online_lda.py: "changed from 'online' to 'batch' in the release 0.20. " sklearn/decomposition/pca.py=class PCA(_BasePCA): sklearn/decomposition/pca.py- sklearn/decomposition/pca.py:@deprecated("RandomizedPCA was deprecated in 0.18 and will be removed in 0.20. " sklearn/decomposition/pca.py=class RandomizedPCA(BaseEstimator, TransformerMixin): sklearn/decomposition/pca.py- .. deprecated:: 0.18 sklearn/decomposition/pca.py: This class will be removed in 0.20. 
sklearn/gaussian_process/gaussian_process.py=MACHINE_EPSILON = np.finfo(np.double).eps sklearn/gaussian_process/gaussian_process.py- at deprecated("l1_cross_distances was deprecated in version 0.18 " sklearn/gaussian_process/gaussian_process.py: "and will be removed in 0.20.") sklearn/gaussian_process/gaussian_process.py=def l1_cross_distances(X): sklearn/gaussian_process/gaussian_process.py- at deprecated("GaussianProcess was deprecated in version 0.18 and will be " sklearn/gaussian_process/gaussian_process.py: "removed in 0.20. Use the GaussianProcessRegressor instead.") sklearn/gaussian_process/gaussian_process.py=class GaussianProcess(BaseEstimator, RegressorMixin): sklearn/gaussian_process/gaussian_process.py- .. deprecated:: 0.18 sklearn/gaussian_process/gaussian_process.py: This class will be removed in 0.20. sklearn/grid_search.py=warnings.warn("This module was deprecated in version 0.18 in favor of the " sklearn/grid_search.py- "model_selection module into which all the refactored classes " sklearn/grid_search.py: "and functions are moved. This module will be removed in 0.20.", sklearn/grid_search.py=class ParameterGrid(object): sklearn/grid_search.py- .. deprecated:: 0.18 sklearn/grid_search.py: This module will be removed in 0.20. sklearn/grid_search.py=class ParameterSampler(object): sklearn/grid_search.py- .. deprecated:: 0.18 sklearn/grid_search.py: This module will be removed in 0.20. sklearn/grid_search.py=def fit_grid_point(X, y, estimator, parameters, train, test, scorer, sklearn/grid_search.py- .. deprecated:: 0.18 sklearn/grid_search.py: This module will be removed in 0.20. sklearn/grid_search.py=class GridSearchCV(BaseSearchCV): sklearn/grid_search.py- .. deprecated:: 0.18 sklearn/grid_search.py: This module will be removed in 0.20. sklearn/grid_search.py=class RandomizedSearchCV(BaseSearchCV): sklearn/grid_search.py- .. deprecated:: 0.18 sklearn/grid_search.py: This module will be removed in 0.20. sklearn/isotonic.py=class IsotonicRegression(BaseEstimator, TransformerMixin, RegressorMixin): sklearn/isotonic.py- @deprecated("Attribute ``X_`` is deprecated in version 0.18 and will be" sklearn/isotonic.py: " removed in version 0.20.") sklearn/isotonic.py- @deprecated("Attribute ``y_`` is deprecated in version 0.18 and will" sklearn/isotonic.py: " be removed in version 0.20.") sklearn/learning_curve.py=warnings.warn("This module was deprecated in version 0.18 in favor of the " sklearn/learning_curve.py- "model_selection module into which all the functions are moved." sklearn/learning_curve.py: " This module will be removed in 0.20", sklearn/learning_curve.py=def learning_curve(estimator, X, y, train_sizes=np.linspace(0.1, 1.0, 5), sklearn/learning_curve.py- .. deprecated:: 0.18 sklearn/learning_curve.py: This module will be removed in 0.20. sklearn/learning_curve.py=def validation_curve(estimator, X, y, param_name, param_range, cv=None, sklearn/learning_curve.py- .. deprecated:: 0.18 sklearn/learning_curve.py: This module will be removed in 0.20. sklearn/linear_model/base.py=def make_dataset(X, y, sample_weight, random_state=None): sklearn/linear_model/base.py- at deprecated("sparse_center_data was deprecated in version 0.18 and will be " sklearn/linear_model/base.py: "removed in 0.20. Use utilities in preprocessing.data instead") sklearn/linear_model/base.py=def sparse_center_data(X, y, fit_intercept, normalize=False): sklearn/linear_model/base.py- at deprecated("center_data was deprecated in version 0.18 and will be removed in " sklearn/linear_model/base.py: "0.20. 
Use utilities in preprocessing.data instead") sklearn/linear_model/ransac.py=class RANSACRegressor(BaseEstimator, MetaEstimatorMixin, RegressorMixin): sklearn/linear_model/ransac.py- sklearn/linear_model/ransac.py: NOTE: residual_metric is deprecated from 0.18 and will be removed in 0.20 sklearn/linear_model/ransac.py- "'residual_metric' was deprecated in version 0.18 and " sklearn/linear_model/ransac.py: "will be removed in version 0.20. Use 'loss' instead.", sklearn/linear_model/ransac.py- sklearn/linear_model/ransac.py: # XXX: Deprecation: Remove this if block in 0.20 sklearn/metrics/classification.py=def hamming_loss(y_true, y_pred, labels=None, sample_weight=None, sklearn/metrics/classification.py- (deprecated) Integer array of labels. This parameter has been sklearn/metrics/classification.py: renamed to ``labels`` in version 0.18 and will be removed in 0.20. sklearn/metrics/classification.py- warnings.warn("'classes' was renamed to 'labels' in version 0.18 and " sklearn/metrics/classification.py: "will be removed in 0.20.", DeprecationWarning) sklearn/metrics/scorer.py=deprecation_msg = ('Scoring method mean_squared_error was renamed to ' sklearn/metrics/scorer.py- 'neg_mean_squared_error in version 0.18 and will ' sklearn/metrics/scorer.py: 'be removed in 0.20.') sklearn/metrics/scorer.py=deprecation_msg = ('Scoring method mean_absolute_error was renamed to ' sklearn/metrics/scorer.py- 'neg_mean_absolute_error in version 0.18 and will ' sklearn/metrics/scorer.py: 'be removed in 0.20.') sklearn/metrics/scorer.py=deprecation_msg = ('Scoring method median_absolute_error was renamed to ' sklearn/metrics/scorer.py- 'neg_median_absolute_error in version 0.18 and will ' sklearn/metrics/scorer.py: 'be removed in 0.20.') sklearn/metrics/scorer.py=deprecation_msg = ('Scoring method log_loss was renamed to ' sklearn/metrics/scorer.py: 'neg_log_loss in version 0.18 and will be removed in 0.20.') sklearn/mixture/dpgmm.py=from __future__ import print_function sklearn/mixture/dpgmm.py- sklearn/mixture/dpgmm.py:# Important note for the deprecation cleaning of 0.20 : sklearn/mixture/dpgmm.py=from .gmm import _GMMBase sklearn/mixture/dpgmm.py- at deprecated("The function digamma is deprecated in 0.18 and " sklearn/mixture/dpgmm.py: "will be removed in 0.20. Use scipy.special.digamma instead.") sklearn/mixture/dpgmm.py=def digamma(x): sklearn/mixture/dpgmm.py- at deprecated("The function gammaln is deprecated in 0.18 and " sklearn/mixture/dpgmm.py: "will be removed in 0.20. Use scipy.special.gammaln instead.") sklearn/mixture/dpgmm.py=def gammaln(x): sklearn/mixture/dpgmm.py- at deprecated("The function log_normalize is deprecated in 0.18 and " sklearn/mixture/dpgmm.py: "will be removed in 0.20.") sklearn/mixture/dpgmm.py=def log_normalize(v, axis=0): sklearn/mixture/dpgmm.py- at deprecated("The function wishart_log_det is deprecated in 0.18 and " sklearn/mixture/dpgmm.py: "will be removed in 0.20.") sklearn/mixture/dpgmm.py=def wishart_log_det(a, b, detB, n_features): sklearn/mixture/dpgmm.py- at deprecated("The function wishart_logz is deprecated in 0.18 and " sklearn/mixture/dpgmm.py: "will be removed in 0.20.") sklearn/mixture/dpgmm.py=class _DPGMMBase(_GMMBase): sklearn/mixture/dpgmm.py- "instead. DPGMM is deprecated in 0.18 and will be " sklearn/mixture/dpgmm.py: "removed in 0.20.") sklearn/mixture/dpgmm.py=class DPGMM(_DPGMMBase): sklearn/mixture/dpgmm.py- .. deprecated:: 0.18 sklearn/mixture/dpgmm.py: This class will be removed in 0.20. 
sklearn/mixture/dpgmm.py- "'dirichlet_distribution'` instead. " sklearn/mixture/dpgmm.py: "VBGMM is deprecated in 0.18 and will be removed in 0.20.") sklearn/mixture/dpgmm.py=class VBGMM(_DPGMMBase): sklearn/mixture/dpgmm.py- .. deprecated:: 0.18 sklearn/mixture/dpgmm.py: This class will be removed in 0.20. sklearn/mixture/gmm.py=of Gaussian Mixture Models. sklearn/mixture/gmm.py- sklearn/mixture/gmm.py:# Important note for the deprecation cleaning of 0.20 : sklearn/mixture/gmm.py=EPS = np.finfo(float).eps sklearn/mixture/gmm.py- at deprecated("The function log_multivariate_normal_density is deprecated in 0.18" sklearn/mixture/gmm.py: " and will be removed in 0.20.") sklearn/mixture/gmm.py=def log_multivariate_normal_density(X, means, covars, covariance_type='diag'): sklearn/mixture/gmm.py- at deprecated("The function sample_gaussian is deprecated in 0.18" sklearn/mixture/gmm.py: " and will be removed in 0.20." sklearn/mixture/gmm.py=class _GMMBase(BaseEstimator): sklearn/mixture/gmm.py- at deprecated("The class GMM is deprecated in 0.18 and will be " sklearn/mixture/gmm.py: " removed in 0.20. Use class GaussianMixture instead.") sklearn/mixture/gmm.py=class GMM(_GMMBase): sklearn/mixture/gmm.py- .. deprecated:: 0.18 sklearn/mixture/gmm.py: This class will be removed in 0.20. sklearn/mixture/gmm.py=def _validate_covars(covars, covariance_type, n_components): sklearn/mixture/gmm.py- at deprecated("The functon distribute_covar_matrix_to_match_covariance_type" sklearn/mixture/gmm.py: "is deprecated in 0.18 and will be removed in 0.20.") sklearn/model_selection/_search.py=def _check_param_grid(param_grid): sklearn/model_selection/_search.py- sklearn/model_selection/_search.py:# XXX Remove in 0.20 sklearn/model_selection/_search.py=class BaseSearchCV(six.with_metaclass(ABCMeta, BaseEstimator, sklearn/model_selection/_search.py- " in favor of the more elaborate cv_results_ attribute." sklearn/model_selection/_search.py: " The grid_scores_ attribute will not be available from 0.20", sklearn/tree/_utils.pyx=cdef realloc_ptr safe_realloc(realloc_ptr* p, size_t nelems) except *: sklearn/tree/_utils.pyx- # sizeof(realloc_ptr[0]) would be more like idiomatic C, but causes Cython sklearn/tree/_utils.pyx: # 0.20.1 to crash. sklearn/tree/export.py=def export_graphviz(decision_tree, out_file=SENTINEL, max_depth=None, sklearn/tree/export.py- Handle or name of the output file. If ``None``, the result is sklearn/tree/export.py: returned as a string. This will the default from version 0.20. sklearn/tree/export.py- warnings.warn("out_file can be set to None starting from 0.18. " sklearn/tree/export.py: "This will be the default in 0.20.", sklearn/utils/fast_dict.pyx=cdef class IntFloatDict: sklearn/utils/fast_dict.pyx- sklearn/utils/fast_dict.pyx: # Cython 0.20 generates buggy code below. Commenting this out for now -------------- next part -------------- sklearn/covariance/graph_lasso_.py=class GraphLassoCV(GraphLasso): sklearn/covariance/graph_lasso_.py- @deprecated("Attribute grid_scores was deprecated in version 0.19 and " sklearn/covariance/graph_lasso_.py: "will be removed in 0.21. 
Use 'grid_scores_' instead") sklearn/datasets/data/boston_house_prices.csv-0.14455,12.5,7.87,0,0.524,6.172,96.1,5.9505,5,311,15.2,396.9,19.15,27.1 sklearn/datasets/data/boston_house_prices.csv-0.04684,0,3.41,0,0.489,6.417,66.1,3.0923,2,270,17.8,392.18,8.81,22.6 sklearn/datasets/data/boston_house_prices.csv-0.38735,0,25.65,0,0.581,5.613,95.6,1.7572,2,188,19.1,359.29,27.26,15.7 sklearn/datasets/data/breast_cancer.csv-15.12,16.68,98.78,716.6,0.08876,0.09588,0.0755,0.04079,0.1594,0.05986,0.2711,0.3621,1.974,26.44,0.005472,0.01919,0.02039,0.00826,0.01523,0.002881,17.77,20.24,117.7,989.5,0.1491,0.3331,0.3327,0.1252,0.3415,0.0974,0 sklearn/datasets/data/breast_cancer.csv-17.93,24.48,115.2,998.9,0.08855,0.07027,0.05699,0.04744,0.1538,0.0551,0.4212,1.433,2.765,45.81,0.005444,0.01169,0.01622,0.008522,0.01419,0.002751,20.92,34.69,135.1,1320,0.1315,0.1806,0.208,0.1136,0.2504,0.07948,0 sklearn/datasets/data/breast_cancer.csv-9,14.4,56.36,246.3,0.07005,0.03116,0.003681,0.003472,0.1788,0.06833,0.1746,1.305,1.144,9.789,0.007389,0.004883,0.003681,0.003472,0.02701,0.002153,9.699,20.07,60.9,285.5,0.09861,0.05232,0.01472,0.01389,0.2991,0.07804,1 sklearn/datasets/data/breast_cancer.csv-12.2,15.21,78.01,457.9,0.08673,0.06545,0.01994,0.01692,0.1638,0.06129,0.2575,0.8073,1.959,19.01,0.005403,0.01418,0.01051,0.005142,0.01333,0.002065,13.75,21.38,91.11,583.1,0.1256,0.1928,0.1167,0.05556,0.2661,0.07961,1 sklearn/decomposition/online_lda.py=class LatentDirichletAllocation(BaseEstimator, TransformerMixin): sklearn/decomposition/online_lda.py- "be ignored as of 0.19. Support for this argument " sklearn/decomposition/online_lda.py: "will be removed in 0.21.", DeprecationWarning) sklearn/decomposition/sparse_pca.py=class SparsePCA(BaseEstimator, TransformerMixin): sklearn/decomposition/sparse_pca.py- .. deprecated:: 0.19 sklearn/decomposition/sparse_pca.py: This parameter will be removed in 0.21. sklearn/decomposition/sparse_pca.py- warnings.warn("The ridge_alpha parameter on transform() is " sklearn/decomposition/sparse_pca.py: "deprecated since 0.19 and will be removed in 0.21. " sklearn/ensemble/gradient_boosting.py=class BaseGradientBoosting(six.with_metaclass(ABCMeta, BaseEnsemble)): sklearn/ensemble/gradient_boosting.py- @deprecated("Attribute n_features was deprecated in version 0.19 and " sklearn/ensemble/gradient_boosting.py: "will be removed in 0.21.") sklearn/gaussian_process/gpr.py=class GaussianProcessRegressor(BaseEstimator, RegressorMixin): sklearn/gaussian_process/gpr.py- @deprecated("Attribute rng was deprecated in version 0.19 and " sklearn/gaussian_process/gpr.py: "will be removed in 0.21.") sklearn/gaussian_process/gpr.py- @deprecated("Attribute y_train_mean was deprecated in version 0.19 and " sklearn/gaussian_process/gpr.py: "will be removed in 0.21.") sklearn/linear_model/stochastic_gradient.py=class BaseSGDClassifier(six.with_metaclass(ABCMeta, BaseSGD, sklearn/linear_model/stochastic_gradient.py- @deprecated("Attribute loss_function was deprecated in version 0.19 and " sklearn/linear_model/stochastic_gradient.py: "will be removed in 0.21. Use 'loss_function_' instead") sklearn/manifold/t_sne.py=class TSNE(BaseEstimator): sklearn/manifold/t_sne.py- @deprecated("Attribute n_iter_final was deprecated in version 0.19 and " sklearn/manifold/t_sne.py: "will be removed in 0.21. 
Use 'n_iter_' instead") sklearn/utils/validation.py=def check_array(array, accept_sparse=False, dtype="numeric", order=None, sklearn/utils/validation.py- "check_array and check_X_y is deprecated in version 0.19 " sklearn/utils/validation.py: "and will be removed in 0.21. Use 'accept_sparse=False' " From joel.nothman at gmail.com Wed Feb 8 22:39:20 2017 From: joel.nothman at gmail.com (Joel Nothman) Date: Thu, 9 Feb 2017 14:39:20 +1100 Subject: [scikit-learn] Preparing a scikit-learn 0.18.2 bugfix release In-Reply-To: References: <20170109151546.GM2802991@phare.normalesup.org> <20170111215115.GO1585067@phare.normalesup.org> Message-ID: See also http://scikit-learn.org/stable/modules/classes.html#recently-deprecated On 9 February 2017 at 14:30, Joel Nothman wrote: > Not sure that this quite gives you a number, but: > > > $git checkout 0.18.1 > $ git grep -pwB1 0.19 sklearn | grep -ve ^- -e .csv: -e /tests/ > > /tmp/dep19.txt > > etc. > > edited results attached. > > > On 9 February 2017 at 04:15, Andrew Howe wrote: > >> How many current deprecations are expected in the next release? >> >> Andrew >> >> On Jan 12, 2017 00:53, "Gael Varoquaux" >> wrote: >> >> On Thu, Jan 12, 2017 at 08:41:51AM +1100, Joel Nothman wrote: >> > When the two versions deprecation policy was instituted, releases were >> much >> > more frequent... Is that enough of an excuse? >> >> I'd rather say that we can here decide that we are giving a longer grace >> period. >> >> I think that slow deprecations are a good things (see titus's blog post >> here: http://ivory.idyll.org/blog/2017-pof-software-archivability.html ) >> >> G >> >> > On 12 January 2017 at 03:43, Andreas Mueller wrote: >> >> >> >> > On 01/09/2017 10:15 AM, Gael Varoquaux wrote: >> >> > instead of setting up a roadmap I would rather just >> identify bugs >> > that >> > are blockers and fix only those and don't wait for any >> feature >> > before >> > cutting 0.19.X. >> >> >> >> > I agree with the sentiment, but this would mess with our >> deprecation cycle. >> > If we release now, and then release again soonish, that means >> people have >> > less calendar time >> > to react to deprecations. >> >> > We could either accept this or change all deprecations and bump the >> removal >> > by a version? >> >> > _______________________________________________ >> > scikit-learn mailing list >> > scikit-learn at python.org >> > https://mail.python.org/mailman/listinfo/scikit-learn >> >> >> >> > _______________________________________________ >> > scikit-learn mailing list >> > scikit-learn at python.org >> > https://mail.python.org/mailman/listinfo/scikit-learn >> >> >> -- >> Gael Varoquaux >> Researcher, INRIA Parietal >> NeuroSpin/CEA Saclay , Bat 145, 91191 Gif-sur-Yvette France >> Phone: ++ 33-1-69-08-79-68 >> http://gael-varoquaux.info http://twitter.com/GaelVaroqua >> ux >> _______________________________________________ >> scikit-learn mailing list >> scikit-learn at python.org >> https://mail.python.org/mailman/listinfo/scikit-learn >> >> >> >> _______________________________________________ >> scikit-learn mailing list >> scikit-learn at python.org >> https://mail.python.org/mailman/listinfo/scikit-learn >> >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From mmahesh.chandra873 at gmail.com Sat Feb 11 09:18:50 2017 From: mmahesh.chandra873 at gmail.com (Mahesh Chandra) Date: Sat, 11 Feb 2017 15:18:50 +0100 Subject: [scikit-learn] Logistic regression doesnt converge? 
Message-ID: >reg = 0.1 lr = LogisticRegression(C=1/reg,max_iter=100, fit_intercept=True,solver='lbfgs').fit(X_train, y_train) ytrain_hat = lr.predict_proba(X_train) loss = log_loss(y_train,ytrain_hat) print loss print loss + 0.5*reg*LA.norm(lr.coef_) Maybe i am doing it wrong -------------- next part -------------- An HTML attachment was scrubbed... URL: From mmahesh.chandra873 at gmail.com Sat Feb 11 09:24:09 2017 From: mmahesh.chandra873 at gmail.com (Mahesh Chandra) Date: Sat, 11 Feb 2017 15:24:09 +0100 Subject: [scikit-learn] Logistic regression doesnt converge? In-Reply-To: References: Message-ID: Sorry for incomplete email. Hi, My question was that even after using many solvers, i dont get convergence for Logistic regression. The loss value as calculated in the previous email was less for maxiter=10 than when maxiter = 30. So, does the optimization method diverge and also how do we monitor and store the loss (or any metric) after each iteration? Thanks Mahesh On Sat, Feb 11, 2017 at 3:18 PM, Mahesh Chandra < mmahesh.chandra873 at gmail.com> wrote: > >reg = 0.1 > lr = LogisticRegression(C=1/reg,max_iter=100, fit_intercept=True,solver='lbfgs').fit(X_train, > y_train) > ytrain_hat = lr.predict_proba(X_train) > loss = log_loss(y_train,ytrain_hat) > print loss > print loss + 0.5*reg*LA.norm(lr.coef_) > > Maybe i am doing it wrong > -------------- next part -------------- An HTML attachment was scrubbed... URL: From benjamin.merkt at bcf.uni-freiburg.de Mon Feb 13 04:55:55 2017 From: benjamin.merkt at bcf.uni-freiburg.de (Benjamin Merkt) Date: Mon, 13 Feb 2017 10:55:55 +0100 Subject: [scikit-learn] OMP ended prematurely due to linear dependence in the dictionary Message-ID: Hi everyone, I'm using OrthogonalMatchingPursuit to get a sparse coding of a signal using a dictionary learned by a KSVD algorithm (pyksvd). However, during the fit I get the following RuntimeWarning: /usr/local/lib/python2.7/dist-packages/sklearn/linear_model/omp.py:391: RuntimeWarning: Orthogonal matching pursuit ended prematurely due to linear dependence in the dictionary. The requested precision might not have been met. copy_X=copy_X, return_path=return_path) In those cases the results are indeed not satisfactory. I don't get the point of this warning as it is common in sparse coding to have an overcomplete dictionary an thus also linear dependency within it. That should not be an issue for OMP. In fact, the warning is also raised if the dictionary is a square matrix. Might this Warning also point to other issues in the application? Thanks, Ben From zephyr14 at gmail.com Mon Feb 13 17:31:35 2017 From: zephyr14 at gmail.com (Vlad Niculae) Date: Tue, 14 Feb 2017 07:31:35 +0900 Subject: [scikit-learn] OMP ended prematurely due to linear dependence in the dictionary In-Reply-To: References: Message-ID: Hi, Are the columns of your matrix normalized? Try setting `normalized=True`. Yours, Vlad On Mon, Feb 13, 2017 at 6:55 PM, Benjamin Merkt wrote: > Hi everyone, > > I'm using OrthogonalMatchingPursuit to get a sparse coding of a signal using > a dictionary learned by a KSVD algorithm (pyksvd). However, during the fit I > get the following RuntimeWarning: > > /usr/local/lib/python2.7/dist-packages/sklearn/linear_model/omp.py:391: > RuntimeWarning: Orthogonal matching pursuit ended prematurely due to linear > dependence in the dictionary. The requested precision might not have been > met. > > copy_X=copy_X, return_path=return_path) > > In those cases the results are indeed not satisfactory. 
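On Mahesh's question earlier in this digest about monitoring and storing the loss after each iteration: LogisticRegression exposes no per-iteration callback, so one minimal sketch is to refit with an increasing max_iter and recompute the penalized objective each time. X_train and y_train below are hypothetical stand-ins for the data in the original post, and the loop assumes the lbfgs solver; note that scikit-learn minimizes C * sum(log-losses) + 0.5 * ||w||^2, so a hand-added penalty should use the squared norm scaled by 1 / (C * n_samples).

import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import log_loss

# Hypothetical stand-ins for the X_train / y_train of the original post.
X_train, y_train = make_classification(n_samples=500, n_features=20,
                                        random_state=0)

reg = 0.1  # regularization strength; scikit-learn's C is its inverse
for max_iter in (1, 5, 10, 30, 100, 300):
    lr = LogisticRegression(C=1.0 / reg, solver='lbfgs',
                            fit_intercept=True, max_iter=max_iter)
    lr.fit(X_train, y_train)
    data_loss = log_loss(y_train, lr.predict_proba(X_train))
    # LogisticRegression minimizes C * sum(log-losses) + 0.5 * ||w||^2, so the
    # comparable per-sample penalty is 0.5 * reg * ||w||^2 / n_samples.
    penalty = 0.5 * reg * np.sum(lr.coef_ ** 2) / len(y_train)
    print(max_iter, data_loss, data_loss + penalty)

With a deterministic solver started from the same initialization, the penalized objective should not increase as max_iter grows, even when the unpenalized log-loss alone does; warm_start=True can reuse the previous solution for the solvers that support it.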
I don't get the > point of this warning as it is common in sparse coding to have an > overcomplete dictionary an thus also linear dependency within it. That > should not be an issue for OMP. In fact, the warning is also raised if the > dictionary is a square matrix. > > Might this Warning also point to other issues in the application? > > > Thanks, Ben > > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn From benjamin.merkt at bcf.uni-freiburg.de Tue Feb 14 05:00:52 2017 From: benjamin.merkt at bcf.uni-freiburg.de (Benjamin Merkt) Date: Tue, 14 Feb 2017 11:00:52 +0100 Subject: [scikit-learn] OMP ended prematurely due to linear dependence in the dictionary In-Reply-To: References: Message-ID: <80881741-0259-dbe2-0a63-f5125dd78671@bcf.uni-freiburg.de> Hi, I tried that with no effect. The fit still breaks after two iterations. If I set precompute=True I get three coefficients instead of only two. My Dictionary is fairly large (currently 128x42000). Is it even feasible to use OMP with such a big Matrix (even with ~120GB ram)? -Ben On 13.02.2017 23:31, Vlad Niculae wrote: > Hi, > > Are the columns of your matrix normalized? Try setting `normalized=True`. > > Yours, > Vlad > > On Mon, Feb 13, 2017 at 6:55 PM, Benjamin Merkt > wrote: >> Hi everyone, >> >> I'm using OrthogonalMatchingPursuit to get a sparse coding of a signal using >> a dictionary learned by a KSVD algorithm (pyksvd). However, during the fit I >> get the following RuntimeWarning: >> >> /usr/local/lib/python2.7/dist-packages/sklearn/linear_model/omp.py:391: >> RuntimeWarning: Orthogonal matching pursuit ended prematurely due to linear >> dependence in the dictionary. The requested precision might not have been >> met. >> >> copy_X=copy_X, return_path=return_path) >> >> In those cases the results are indeed not satisfactory. I don't get the >> point of this warning as it is common in sparse coding to have an >> overcomplete dictionary an thus also linear dependency within it. That >> should not be an issue for OMP. In fact, the warning is also raised if the >> dictionary is a square matrix. >> >> Might this Warning also point to other issues in the application? >> >> >> Thanks, Ben >> >> _______________________________________________ >> scikit-learn mailing list >> scikit-learn at python.org >> https://mail.python.org/mailman/listinfo/scikit-learn > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn > From pa at letnes.com Tue Feb 14 05:54:27 2017 From: pa at letnes.com (Paul Anton Letnes) Date: Tue, 14 Feb 2017 11:54:27 +0100 Subject: [scikit-learn] cross validation scores seem off for PLSRegression Message-ID: <1487069667072.47907.95300@webmail1> Hi! Versions: sklearn 0.18.1 numpy 1.11.3 Anaconda python 3.5 on ubuntu 16.04 What range is the cross_val_score supposed to be in? I was under the impression from the documentation, although I cannot find it stated explicitly anywhere, that it should be a number in the range [0, 1]. However, it appears that one can get large negative values; see the ipython session below. 
Cheers Paul In [2]: import numpy as np In [3]: y = np.random.random((10, 3)) In [4]: x = np.random.random((10, 17)) In [5]: from sklearn.cross_decomposition import PLSRegression In [6]: pls = PLSRegression(n_components=3) In [7]: from sklearn.cross_validation import cross_val_score In [8]: from sklearn.model_selection import cross_val_score In [9]: cross_val_score(pls, x, y) Out[9]: array([-32.52217837, -4.17228083, -5.88632365]) PS: This happens even if I cheat by setting y to the predicted value, and cross validate on that. In [29]: y = x @ pls.coef_ In [30]: cross_val_score(pls, x, y) /home/paul/anaconda3/envs/wp3-paper/lib/python3.5/site-packages/sklearn/cross_decomposition/pls_.py:293: UserWarning: Y residual constant at iteration 5 warnings.warn('Y residual constant at iteration %s' % k) /home/paul/anaconda3/envs/wp3-paper/lib/python3.5/site-packages/sklearn/cross_decomposition/pls_.py:293: UserWarning: Y residual constant at iteration 6 warnings.warn('Y residual constant at iteration %s' % k) /home/paul/anaconda3/envs/wp3-paper/lib/python3.5/site-packages/sklearn/cross_decomposition/pls_.py:293: UserWarning: Y residual constant at iteration 6 warnings.warn('Y residual constant at iteration %s' % k) Out[30]: array([-35.01267353, -4.94806383, -5.9619526 ]) In [34]: np.max(np.abs(y - x @ pls.coef_)) Out[34]: 0.0 -------------- next part -------------- An HTML attachment was scrubbed... URL: From abdalrahman.eweiwi at gmail.com Tue Feb 14 06:05:52 2017 From: abdalrahman.eweiwi at gmail.com (abdalrahman eweiwi) Date: Tue, 14 Feb 2017 12:05:52 +0100 Subject: [scikit-learn] cross validation scores seem off for PLSRegression In-Reply-To: <1487069667072.47907.95300@webmail1> References: <1487069667072.47907.95300@webmail1> Message-ID: Hi Paul, PLSRegression in sklearn uses an iterative method to estimate the eigenvectors and eigenvalues (I think it is the power method), which mostly varies depending on the underlying library that you use. I would suggest using SVD instead if you want stable results and your dataset is small. I have also written a kernel PLS, which you can find here: https://gist.github.com/aeweiwi/7788156 Cheers, On Tue, Feb 14, 2017 at 11:54 AM, Paul Anton Letnes wrote: > Hi! > > Versions: > sklearn 0.18.1 > numpy 1.11.3 > Anaconda python 3.5 on ubuntu 16.04 > > What range is the cross_val_score supposed to be in? I was under the > impression from the documentation, although I cannot find it stated > explicitly anywhere, that it should be a number in the range [0, 1]. > However, it appears that one can get large negative values; see the ipython > session below. > > Cheers > Paul > > In [2]: import numpy as np > > In [3]: y = np.random.random((10, 3)) > > In [4]: x = np.random.random((10, 17)) > > In [5]: from sklearn.cross_decomposition import PLSRegression > > In [6]: pls = PLSRegression(n_components=3) > > In [7]: from sklearn.cross_validation import cross_val_score > > In [8]: from sklearn.model_selection import cross_val_score > > In [9]: cross_val_score(pls, x, y) > Out[9]: array([-32.52217837, -4.17228083, -5.88632365]) > > > PS: > This happens even if I cheat by setting y to the predicted value, and > cross validate on that.

> > In [29]: y = x @ pls.coef_ > > In [30]: cross_val_score(pls, x, y) > /home/paul/anaconda3/envs/wp3-paper/lib/python3.5/site- > packages/sklearn/cross_decomposition/pls_.py:293: UserWarning: Y residual > constant at iteration 5 > warnings.warn('Y residual constant at iteration %s' % k) > /home/paul/anaconda3/envs/wp3-paper/lib/python3.5/site- > packages/sklearn/cross_decomposition/pls_.py:293: UserWarning: Y residual > constant at iteration 6 > warnings.warn('Y residual constant at iteration %s' % k) > /home/paul/anaconda3/envs/wp3-paper/lib/python3.5/site- > packages/sklearn/cross_decomposition/pls_.py:293: UserWarning: Y residual > constant at iteration 6 > warnings.warn('Y residual constant at iteration %s' % k) > Out[30]: array([-35.01267353, -4.94806383, -5.9619526 ]) > > In [34]: np.max(np.abs(y - x @ pls.coef_)) > Out[34]: 0.0 > > > > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From fabian.boehnlein at gmail.com Tue Feb 14 06:08:11 2017 From: fabian.boehnlein at gmail.com (=?UTF-8?Q?Fabian_B=C3=B6hnlein?=) Date: Tue, 14 Feb 2017 11:08:11 +0000 Subject: [scikit-learn] cross validation scores seem off for PLSRegression In-Reply-To: <1487069667072.47907.95300@webmail1> References: <1487069667072.47907.95300@webmail1> Message-ID: Hi Paul, not sure what @ syntax does in ipython, but seems you're setting y to the coefficients of the model instead of y_hat = pls.predict(x). Also see in the documentation why R^2 can be negative: http://scikit-learn.org/stable/modules/generated/sklearn.cross_decomposition.PLSRegression.html#sklearn.cross_decomposition.PLSRegression.score Best, Fabian On Tue, 14 Feb 2017 at 11:57 Paul Anton Letnes wrote: > Hi! > > Versions: > sklearn 0.18.1 > numpy 1.11.3 > Anaconda python 3.5 on ubuntu 16.04 > > What range is the cross_val_score supposed to be in? I was under the > impression from the documentation, although I cannot find it stated > explicitly anywhere, that it should be a number in the range [0, 1]. > However, it appears that one can get large negative values; see the ipython > session below. > > Cheers > Paul > > In [2]: import numpy as np > > In [3]: y = np.random.random((10, 3)) > > In [4]: x = np.random.random((10, 17)) > > In [5]: from sklearn.cross_decomposition import PLSRegression > > In [6]: pls = PLSRegression(n_components=3) > > In [7]: from sklearn.cross_validation import cross_val_score > > In [8]: from sklearn.model_selection import cross_val_score > > In [9]: cross_val_score(pls, x, y) > Out[9]: array([-32.52217837, -4.17228083, -5.88632365]) > > > PS: > This happens even if I cheat by setting y to the predicted value, and > cross validate on that. 
> > In [29]: y = x @ pls.coef_ > > In [30]: cross_val_score(pls, x, y) > /home/paul/anaconda3/envs/wp3-paper/lib/python3.5/site-packages/sklearn/cross_decomposition/pls_.py:293: > UserWarning: Y residual constant at iteration 5 > warnings.warn('Y residual constant at iteration %s' % k) > /home/paul/anaconda3/envs/wp3-paper/lib/python3.5/site-packages/sklearn/cross_decomposition/pls_.py:293: > UserWarning: Y residual constant at iteration 6 > warnings.warn('Y residual constant at iteration %s' % k) > /home/paul/anaconda3/envs/wp3-paper/lib/python3.5/site-packages/sklearn/cross_decomposition/pls_.py:293: > UserWarning: Y residual constant at iteration 6 > warnings.warn('Y residual constant at iteration %s' % k) > Out[30]: array([-35.01267353, -4.94806383, -5.9619526 ]) > > In [34]: np.max(np.abs(y - x @ pls.coef_)) > Out[34]: 0.0 > > > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn > -------------- next part -------------- An HTML attachment was scrubbed... URL: From benjamin.merkt at bcf.uni-freiburg.de Tue Feb 14 06:19:42 2017 From: benjamin.merkt at bcf.uni-freiburg.de (Benjamin Merkt) Date: Tue, 14 Feb 2017 12:19:42 +0100 Subject: [scikit-learn] OMP ended prematurely due to linear dependence in the dictionary In-Reply-To: <80881741-0259-dbe2-0a63-f5125dd78671@bcf.uni-freiburg.de> References: <80881741-0259-dbe2-0a63-f5125dd78671@bcf.uni-freiburg.de> Message-ID: <7255cf2b-12da-8c3c-63ca-2189b4fd0a67@bcf.uni-freiburg.de> OK, the issue is resolved. My dictionary was still in 32bit float from saving. When I convert it to 64float before calling fit it works fine. Sorry to bother. On 14.02.2017 11:00, Benjamin Merkt wrote: > Hi, > > I tried that with no effect. The fit still breaks after two iterations. > > If I set precompute=True I get three coefficients instead of only two. > My Dictionary is fairly large (currently 128x42000). Is it even feasible > to use OMP with such a big Matrix (even with ~120GB ram)? > > -Ben > > > > On 13.02.2017 23:31, Vlad Niculae wrote: >> Hi, >> >> Are the columns of your matrix normalized? Try setting `normalized=True`. >> >> Yours, >> Vlad >> >> On Mon, Feb 13, 2017 at 6:55 PM, Benjamin Merkt >> wrote: >>> Hi everyone, >>> >>> I'm using OrthogonalMatchingPursuit to get a sparse coding of a >>> signal using >>> a dictionary learned by a KSVD algorithm (pyksvd). However, during >>> the fit I >>> get the following RuntimeWarning: >>> >>> /usr/local/lib/python2.7/dist-packages/sklearn/linear_model/omp.py:391: >>> RuntimeWarning: Orthogonal matching pursuit ended prematurely due to >>> linear >>> dependence in the dictionary. The requested precision might not have >>> been >>> met. >>> >>> copy_X=copy_X, return_path=return_path) >>> >>> In those cases the results are indeed not satisfactory. I don't get the >>> point of this warning as it is common in sparse coding to have an >>> overcomplete dictionary an thus also linear dependency within it. That >>> should not be an issue for OMP. In fact, the warning is also raised >>> if the >>> dictionary is a square matrix. >>> >>> Might this Warning also point to other issues in the application? 
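A minimal sketch of the float32 vs float64 effect Benjamin describes above, and of the small reproduction Vlad asks for in the next message; the random, unit-norm dictionary here is only a hypothetical stand-in for the KSVD one:

import numpy as np
from sklearn.linear_model import OrthogonalMatchingPursuit

rng = np.random.RandomState(0)
D = rng.randn(128, 4200)            # hypothetical overcomplete dictionary
D /= np.linalg.norm(D, axis=0)      # unit-norm atoms (columns)
w = np.zeros(4200)
w[rng.choice(4200, 10, replace=False)] = rng.randn(10)
y = D.dot(w)                        # synthetic signal with a 10-atom code

for dtype in (np.float32, np.float64):
    omp = OrthogonalMatchingPursuit(n_nonzero_coefs=10)
    omp.fit(D.astype(dtype), y.astype(dtype))
    # float32 input may trigger the "ended prematurely" warning and return
    # fewer non-zero coefficients than requested; float64 should not.
    print(dtype.__name__, np.count_nonzero(omp.coef_))

Casting the dictionary with np.asarray(D, dtype=np.float64) before calling fit is the workaround Benjamin ends up with in this thread.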
>>> >>> >>> Thanks, Ben >>> >>> _______________________________________________ >>> scikit-learn mailing list >>> scikit-learn at python.org >>> https://mail.python.org/mailman/listinfo/scikit-learn >> _______________________________________________ >> scikit-learn mailing list >> scikit-learn at python.org >> https://mail.python.org/mailman/listinfo/scikit-learn >> > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn From zephyr14 at gmail.com Tue Feb 14 06:26:07 2017 From: zephyr14 at gmail.com (Vlad Niculae) Date: Tue, 14 Feb 2017 20:26:07 +0900 Subject: [scikit-learn] OMP ended prematurely due to linear dependence in the dictionary In-Reply-To: <7255cf2b-12da-8c3c-63ca-2189b4fd0a67@bcf.uni-freiburg.de> References: <80881741-0259-dbe2-0a63-f5125dd78671@bcf.uni-freiburg.de> <7255cf2b-12da-8c3c-63ca-2189b4fd0a67@bcf.uni-freiburg.de> Message-ID: Hi Ben, This actually sounds like a bug in this case! At a glance, the code should use the correct BLAS calls for the data type you provide. Can you reproduce this with a simple small example that gets different results if the data is 32 vs 64 bit? Would you mind filing an issue? Thanks, Vlad On Tue, Feb 14, 2017 at 8:19 PM, Benjamin Merkt wrote: > OK, the issue is resolved. My dictionary was still in 32bit float from > saving. When I convert it to 64float before calling fit it works fine. > > Sorry to bother. > > > > On 14.02.2017 11:00, Benjamin Merkt wrote: >> >> Hi, >> >> I tried that with no effect. The fit still breaks after two iterations. >> >> If I set precompute=True I get three coefficients instead of only two. >> My Dictionary is fairly large (currently 128x42000). Is it even feasible >> to use OMP with such a big Matrix (even with ~120GB ram)? >> >> -Ben >> >> >> >> On 13.02.2017 23:31, Vlad Niculae wrote: >>> >>> Hi, >>> >>> Are the columns of your matrix normalized? Try setting `normalized=True`. >>> >>> Yours, >>> Vlad >>> >>> On Mon, Feb 13, 2017 at 6:55 PM, Benjamin Merkt >>> wrote: >>>> >>>> Hi everyone, >>>> >>>> I'm using OrthogonalMatchingPursuit to get a sparse coding of a >>>> signal using >>>> a dictionary learned by a KSVD algorithm (pyksvd). However, during >>>> the fit I >>>> get the following RuntimeWarning: >>>> >>>> /usr/local/lib/python2.7/dist-packages/sklearn/linear_model/omp.py:391: >>>> RuntimeWarning: Orthogonal matching pursuit ended prematurely due to >>>> linear >>>> dependence in the dictionary. The requested precision might not have >>>> been >>>> met. >>>> >>>> copy_X=copy_X, return_path=return_path) >>>> >>>> In those cases the results are indeed not satisfactory. I don't get the >>>> point of this warning as it is common in sparse coding to have an >>>> overcomplete dictionary an thus also linear dependency within it. That >>>> should not be an issue for OMP. In fact, the warning is also raised >>>> if the >>>> dictionary is a square matrix. >>>> >>>> Might this Warning also point to other issues in the application? 
>>>> >>>> >>>> Thanks, Ben >>>> >>>> _______________________________________________ >>>> scikit-learn mailing list >>>> scikit-learn at python.org >>>> https://mail.python.org/mailman/listinfo/scikit-learn >>> >>> _______________________________________________ >>> scikit-learn mailing list >>> scikit-learn at python.org >>> https://mail.python.org/mailman/listinfo/scikit-learn >>> >> _______________________________________________ >> scikit-learn mailing list >> scikit-learn at python.org >> https://mail.python.org/mailman/listinfo/scikit-learn > > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn From pa at letnes.com Tue Feb 14 06:27:11 2017 From: pa at letnes.com (Paul Anton Letnes) Date: Tue, 14 Feb 2017 12:27:11 +0100 Subject: [scikit-learn] cross validation scores seem off for PLSRegression In-Reply-To: References: <1487069667072.47907.95300@webmail1> Message-ID: <1487071631037.11717.96286@webmail8> @ is a python operator meaning "matrix multiplication". I was deliberately setting y to the prediction to make sure that the PLS model should be able to recreate the values completely and give a sensible score. Paul On 14 February 2017 at 12:08:11 +01:00, Fabian B?hnlein wrote: > Hi Paul, > > not sure what @ syntax does in ipython, but seems you're setting y to the coefficients of the model instead of y_hat = pls.predict(x). > > Also see in the documentation why R^2 can be negative: > > Best, > Fabian > > > On Tue, 14 Feb 2017 at 11:57 Paul Anton Letnes <> wrote: > > > Hi! > > > > Versions: > > sklearn 0.18.1 > > numpy 1.11.3 > > Anaconda python 3.5 on ubuntu 16.04 > > > > What range is the cross_val_score supposed to be in? I was under the impression from the documentation, although I cannot find it stated explicitly anywhere, that it should be a number in the range [0, 1]. However, it appears that one can get large negative values; see the ipython session below. > > > > Cheers > > Paul > > > > In [2]: import numpy as np > > > > In [3]: y = np.random.random((10, 3)) > > > > In [4]: x = np.random.random((10, 17)) > > > > In [5]: from sklearn.cross_decomposition import PLSRegression > > > > In [6]: pls = PLSRegression(n_components=3) > > > > In [7]: from sklearn.cross_validation import cross_val_score > > > > In [8]: from sklearn.model_selection import cross_val_score > > > > In [9]: cross_val_score(pls, x, y) > > Out[9]: array([-32.52217837, -4.17228083, -5.88632365]) > > > > > > PS: > > This happens even if I cheat by setting y to the predicted value, and cross validate on that. 
> > > > In [29]: y = x @ pls.coef_ > > > > In [30]: cross_val_score(pls, x, y) > > /home/paul/anaconda3/envs/wp3-paper/lib/python3.5/site-packages/sklearn/cross_decomposition/pls_.py:293: UserWarning: Y residual constant at iteration 5 > > warnings.warn('Y residual constant at iteration %s' % k) > > /home/paul/anaconda3/envs/wp3-paper/lib/python3.5/site-packages/sklearn/cross_decomposition/pls_.py:293: UserWarning: Y residual constant at iteration 6 > > warnings.warn('Y residual constant at iteration %s' % k) > > /home/paul/anaconda3/envs/wp3-paper/lib/python3.5/site-packages/sklearn/cross_decomposition/pls_.py:293: UserWarning: Y residual constant at iteration 6 > > warnings.warn('Y residual constant at iteration %s' % k) > > Out[30]: array([-35.01267353, -4.94806383, -5.9619526 ]) > > > > In [34]: np.max(np.abs(y - x @ pls.coef_)) > > Out[34]: 0.0 > > > > > > _______________________________________________ > > scikit-learn mailing list > > > > > > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From zephyr14 at gmail.com Tue Feb 14 06:28:08 2017 From: zephyr14 at gmail.com (Vlad Niculae) Date: Tue, 14 Feb 2017 20:28:08 +0900 Subject: [scikit-learn] OMP ended prematurely due to linear dependence in the dictionary In-Reply-To: References: <80881741-0259-dbe2-0a63-f5125dd78671@bcf.uni-freiburg.de> <7255cf2b-12da-8c3c-63ca-2189b4fd0a67@bcf.uni-freiburg.de> Message-ID: One possible issue I can see causing this is if X and y have different dtypes... was this the case for you? On Tue, Feb 14, 2017 at 8:26 PM, Vlad Niculae wrote: > Hi Ben, > > This actually sounds like a bug in this case! At a glance, the code > should use the correct BLAS calls for the data type you provide. Can > you reproduce this with a simple small example that gets different > results if the data is 32 vs 64 bit? Would you mind filing an issue? > > Thanks, > Vlad > > > On Tue, Feb 14, 2017 at 8:19 PM, Benjamin Merkt > wrote: >> OK, the issue is resolved. My dictionary was still in 32bit float from >> saving. When I convert it to 64float before calling fit it works fine. >> >> Sorry to bother. >> >> >> >> On 14.02.2017 11:00, Benjamin Merkt wrote: >>> >>> Hi, >>> >>> I tried that with no effect. The fit still breaks after two iterations. >>> >>> If I set precompute=True I get three coefficients instead of only two. >>> My Dictionary is fairly large (currently 128x42000). Is it even feasible >>> to use OMP with such a big Matrix (even with ~120GB ram)? >>> >>> -Ben >>> >>> >>> >>> On 13.02.2017 23:31, Vlad Niculae wrote: >>>> >>>> Hi, >>>> >>>> Are the columns of your matrix normalized? Try setting `normalized=True`. >>>> >>>> Yours, >>>> Vlad >>>> >>>> On Mon, Feb 13, 2017 at 6:55 PM, Benjamin Merkt >>>> wrote: >>>>> >>>>> Hi everyone, >>>>> >>>>> I'm using OrthogonalMatchingPursuit to get a sparse coding of a >>>>> signal using >>>>> a dictionary learned by a KSVD algorithm (pyksvd). However, during >>>>> the fit I >>>>> get the following RuntimeWarning: >>>>> >>>>> /usr/local/lib/python2.7/dist-packages/sklearn/linear_model/omp.py:391: >>>>> RuntimeWarning: Orthogonal matching pursuit ended prematurely due to >>>>> linear >>>>> dependence in the dictionary. The requested precision might not have >>>>> been >>>>> met. >>>>> >>>>> copy_X=copy_X, return_path=return_path) >>>>> >>>>> In those cases the results are indeed not satisfactory. 
I don't get the >>>>> point of this warning as it is common in sparse coding to have an >>>>> overcomplete dictionary an thus also linear dependency within it. That >>>>> should not be an issue for OMP. In fact, the warning is also raised >>>>> if the >>>>> dictionary is a square matrix. >>>>> >>>>> Might this Warning also point to other issues in the application? >>>>> >>>>> >>>>> Thanks, Ben >>>>> >>>>> _______________________________________________ >>>>> scikit-learn mailing list >>>>> scikit-learn at python.org >>>>> https://mail.python.org/mailman/listinfo/scikit-learn >>>> >>>> _______________________________________________ >>>> scikit-learn mailing list >>>> scikit-learn at python.org >>>> https://mail.python.org/mailman/listinfo/scikit-learn >>>> >>> _______________________________________________ >>> scikit-learn mailing list >>> scikit-learn at python.org >>> https://mail.python.org/mailman/listinfo/scikit-learn >> >> _______________________________________________ >> scikit-learn mailing list >> scikit-learn at python.org >> https://mail.python.org/mailman/listinfo/scikit-learn From emanuela.boros at gmail.com Tue Feb 14 06:52:48 2017 From: emanuela.boros at gmail.com (Emanuela Boros) Date: Tue, 14 Feb 2017 12:52:48 +0100 Subject: [scikit-learn] OMP ended prematurely due to linear dependence in the dictionary In-Reply-To: References: Message-ID: Just as a side point - which will not contribute to the purpose of this discussion - you can use pyksvd for sparse coding also. Emanuela Boros LIMSI-CNRS CDS/LAL-CNRS Orsay, France personal: 06 52 17 4595 work: 01 64 46 8954 emanuela.boros@{u-psud.fr,gmail.com} boros@{limsi.fr,lal.in2p3.fr} On Mon, Feb 13, 2017 at 10:55 AM, Benjamin Merkt < benjamin.merkt at bcf.uni-freiburg.de> wrote: > Hi everyone, > > I'm using OrthogonalMatchingPursuit to get a sparse coding of a signal > using a dictionary learned by a KSVD algorithm (pyksvd). However, during > the fit I get the following RuntimeWarning: > > /usr/local/lib/python2.7/dist-packages/sklearn/linear_model/omp.py:391: > RuntimeWarning: Orthogonal matching pursuit ended prematurely due to > linear dependence in the dictionary. The requested precision might not have > been met. > > copy_X=copy_X, return_path=return_path) > > In those cases the results are indeed not satisfactory. I don't get the > point of this warning as it is common in sparse coding to have an > overcomplete dictionary an thus also linear dependency within it. That > should not be an issue for OMP. In fact, the warning is also raised if the > dictionary is a square matrix. > > Might this Warning also point to other issues in the application? > > > Thanks, Ben > > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn > -------------- next part -------------- An HTML attachment was scrubbed... URL: From pa at letnes.com Tue Feb 14 06:58:19 2017 From: pa at letnes.com (Paul Anton Letnes) Date: Tue, 14 Feb 2017 12:58:19 +0100 Subject: [scikit-learn] cross validation scores seem off for PLSRegression In-Reply-To: References: <1487069667072.47907.95300@webmail1> Message-ID: <1487073499094.130285.96242@webmail5> Oh, and thanks for pointing out the bit about R^2 being negative - although it "feels off" in my head! Complex R? 
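On the exchange above about y = x @ pls.coef_: pls.predict(x) is generally not the same quantity, because predict first applies the centering (and, with the default scale=True, the per-column scaling) learned during fit, multiplies by coef_, and then adds the y mean back. A quick check, re-creating x, y and pls along the lines of the earlier session; the attribute names x_mean_, x_std_ and y_mean_ are taken from the 0.18 implementation:

import numpy as np
from sklearn.cross_decomposition import PLSRegression

rng = np.random.RandomState(0)
x = rng.random_sample((10, 17))
y = rng.random_sample((10, 3))

pls = PLSRegression(n_components=3).fit(x, y)
print(np.allclose(pls.predict(x), x @ pls.coef_))   # False in general
# predict() works on centered/scaled x and adds the y mean back:
print(np.allclose(pls.predict(x),
                  (x - pls.x_mean_) / pls.x_std_ @ pls.coef_ + pls.y_mean_))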
----------- Paul Anton On 14 February 2017 at 12:08:11 +01:00, Fabian B?hnlein wrote: > Hi Paul, > > not sure what @ syntax does in ipython, but seems you're setting y to the coefficients of the model instead of y_hat = pls.predict(x). > > Also see in the documentation why R^2 can be negative: > > Best, > Fabian > > > On Tue, 14 Feb 2017 at 11:57 Paul Anton Letnes <> wrote: > > > Hi! > > > > Versions: > > sklearn 0.18.1 > > numpy 1.11.3 > > Anaconda python 3.5 on ubuntu 16.04 > > > > What range is the cross_val_score supposed to be in? I was under the impression from the documentation, although I cannot find it stated explicitly anywhere, that it should be a number in the range [0, 1]. However, it appears that one can get large negative values; see the ipython session below. > > > > Cheers > > Paul > > > > In [2]: import numpy as np > > > > In [3]: y = np.random.random((10, 3)) > > > > In [4]: x = np.random.random((10, 17)) > > > > In [5]: from sklearn.cross_decomposition import PLSRegression > > > > In [6]: pls = PLSRegression(n_components=3) > > > > In [7]: from sklearn.cross_validation import cross_val_score > > > > In [8]: from sklearn.model_selection import cross_val_score > > > > In [9]: cross_val_score(pls, x, y) > > Out[9]: array([-32.52217837, -4.17228083, -5.88632365]) > > > > > > PS: > > This happens even if I cheat by setting y to the predicted value, and cross validate on that. > > > > In [29]: y = x @ pls.coef_ > > > > In [30]: cross_val_score(pls, x, y) > > /home/paul/anaconda3/envs/wp3-paper/lib/python3.5/site-packages/sklearn/cross_decomposition/pls_.py:293: UserWarning: Y residual constant at iteration 5 > > warnings.warn('Y residual constant at iteration %s' % k) > > /home/paul/anaconda3/envs/wp3-paper/lib/python3.5/site-packages/sklearn/cross_decomposition/pls_.py:293: UserWarning: Y residual constant at iteration 6 > > warnings.warn('Y residual constant at iteration %s' % k) > > /home/paul/anaconda3/envs/wp3-paper/lib/python3.5/site-packages/sklearn/cross_decomposition/pls_.py:293: UserWarning: Y residual constant at iteration 6 > > warnings.warn('Y residual constant at iteration %s' % k) > > Out[30]: array([-35.01267353, -4.94806383, -5.9619526 ]) > > > > In [34]: np.max(np.abs(y - x @ pls.coef_)) > > Out[34]: 0.0 > > > > > > _______________________________________________ > > scikit-learn mailing list > > > > > > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From bertrand.thirion at inria.fr Tue Feb 14 07:04:34 2017 From: bertrand.thirion at inria.fr (Bertrand Thirion) Date: Tue, 14 Feb 2017 13:04:34 +0100 (CET) Subject: [scikit-learn] cross validation scores seem off for PLSRegression In-Reply-To: <1487073499094.130285.96242@webmail5> References: <1487069667072.47907.95300@webmail1> <1487073499094.130285.96242@webmail5> Message-ID: <1841132902.24047782.1487073874871.JavaMail.zimbra@inria.fr> https://en.wikipedia.org/wiki/Coefficient_of_determination "Important cases where the computational definition of R 2 can yield negative values, depending on the definition used, arise where the predictions that are being compared to the corresponding outcomes have not been derived from a model-fitting procedure using those data, and where linear regression is conducted without including an intercept." 
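A small numeric illustration of the definition quoted above, as used by cross_val_score through the regressor's default score method: R^2 = 1 - SS_res / SS_tot is 1 for perfect predictions, 0 for always predicting the mean of the test targets, and arbitrarily negative for anything worse than that, so no complex numbers are involved.

import numpy as np
from sklearn.metrics import r2_score

y_true = np.array([1.0, 2.0, 3.0, 4.0])

print(r2_score(y_true, y_true))                         # 1.0  (perfect)
print(r2_score(y_true, np.full_like(y_true, 2.5)))      # 0.0  (predict the mean)
print(r2_score(y_true, np.array([10.0, -3.0, 8.0, 0.0])))   # about -28.4

With only 10 samples, the default 3-fold split leaves 3-4 test points per fold, so the per-fold R^2 in the session earlier in this thread is extremely noisy; values like -32 are consistent with that, without anything being numerically wrong.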
Best, Bertrand ----- Mail original ----- > De: "Paul Anton Letnes" > ?: "Fabian B?hnlein" > Cc: "Scikit-learn user and developer mailing list" > Envoy?: Mardi 14 F?vrier 2017 12:58:19 > Objet: Re: [scikit-learn] cross validation scores seem off for PLSRegression > Oh, and thanks for pointing out the bit about R^2 being negative - although > it "feels off" in my head! Complex R? > ----------- > Paul Anton -------------- next part -------------- An HTML attachment was scrubbed... URL: From benjamin.merkt at bcf.uni-freiburg.de Tue Feb 14 07:34:51 2017 From: benjamin.merkt at bcf.uni-freiburg.de (Benjamin Merkt) Date: Tue, 14 Feb 2017 13:34:51 +0100 Subject: [scikit-learn] OMP ended prematurely due to linear dependence in the dictionary In-Reply-To: References: <80881741-0259-dbe2-0a63-f5125dd78671@bcf.uni-freiburg.de> <7255cf2b-12da-8c3c-63ca-2189b4fd0a67@bcf.uni-freiburg.de> Message-ID: <66717e36-5cc7-2ad4-a601-17efb75d7fc5@bcf.uni-freiburg.de> Yes, the data array y was already float64. On 14.02.2017 12:28, Vlad Niculae wrote: > One possible issue I can see causing this is if X and y have different > dtypes... was this the case for you? > > On Tue, Feb 14, 2017 at 8:26 PM, Vlad Niculae wrote: >> Hi Ben, >> >> This actually sounds like a bug in this case! At a glance, the code >> should use the correct BLAS calls for the data type you provide. Can >> you reproduce this with a simple small example that gets different >> results if the data is 32 vs 64 bit? Would you mind filing an issue? >> >> Thanks, >> Vlad >> >> >> On Tue, Feb 14, 2017 at 8:19 PM, Benjamin Merkt >> wrote: >>> OK, the issue is resolved. My dictionary was still in 32bit float from >>> saving. When I convert it to 64float before calling fit it works fine. >>> >>> Sorry to bother. >>> >>> >>> >>> On 14.02.2017 11:00, Benjamin Merkt wrote: >>>> >>>> Hi, >>>> >>>> I tried that with no effect. The fit still breaks after two iterations. >>>> >>>> If I set precompute=True I get three coefficients instead of only two. >>>> My Dictionary is fairly large (currently 128x42000). Is it even feasible >>>> to use OMP with such a big Matrix (even with ~120GB ram)? >>>> >>>> -Ben >>>> >>>> >>>> >>>> On 13.02.2017 23:31, Vlad Niculae wrote: >>>>> >>>>> Hi, >>>>> >>>>> Are the columns of your matrix normalized? Try setting `normalized=True`. >>>>> >>>>> Yours, >>>>> Vlad >>>>> >>>>> On Mon, Feb 13, 2017 at 6:55 PM, Benjamin Merkt >>>>> wrote: >>>>>> >>>>>> Hi everyone, >>>>>> >>>>>> I'm using OrthogonalMatchingPursuit to get a sparse coding of a >>>>>> signal using >>>>>> a dictionary learned by a KSVD algorithm (pyksvd). However, during >>>>>> the fit I >>>>>> get the following RuntimeWarning: >>>>>> >>>>>> /usr/local/lib/python2.7/dist-packages/sklearn/linear_model/omp.py:391: >>>>>> RuntimeWarning: Orthogonal matching pursuit ended prematurely due to >>>>>> linear >>>>>> dependence in the dictionary. The requested precision might not have >>>>>> been >>>>>> met. >>>>>> >>>>>> copy_X=copy_X, return_path=return_path) >>>>>> >>>>>> In those cases the results are indeed not satisfactory. I don't get the >>>>>> point of this warning as it is common in sparse coding to have an >>>>>> overcomplete dictionary an thus also linear dependency within it. That >>>>>> should not be an issue for OMP. In fact, the warning is also raised >>>>>> if the >>>>>> dictionary is a square matrix. >>>>>> >>>>>> Might this Warning also point to other issues in the application? 
>>>>>> >>>>>> >>>>>> Thanks, Ben >>>>>> >>>>>> _______________________________________________ >>>>>> scikit-learn mailing list >>>>>> scikit-learn at python.org >>>>>> https://mail.python.org/mailman/listinfo/scikit-learn >>>>> >>>>> _______________________________________________ >>>>> scikit-learn mailing list >>>>> scikit-learn at python.org >>>>> https://mail.python.org/mailman/listinfo/scikit-learn >>>>> >>>> _______________________________________________ >>>> scikit-learn mailing list >>>> scikit-learn at python.org >>>> https://mail.python.org/mailman/listinfo/scikit-learn >>> >>> _______________________________________________ >>> scikit-learn mailing list >>> scikit-learn at python.org >>> https://mail.python.org/mailman/listinfo/scikit-learn > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn > From pa at letnes.com Tue Feb 14 07:53:31 2017 From: pa at letnes.com (Paul Anton Letnes) Date: Tue, 14 Feb 2017 13:53:31 +0100 Subject: [scikit-learn] PLSRegression cross validates poorly when scaling Message-ID: <1487076811458.24654.96908@webmail3> Hi! I've noticed that PLSRegression seems to cross validate incredibly poorly when scale=True. Could there be a bug here, or is there something I'm not getting this time, too? I noticed the very small (i.e. large negative) cross validation scores on a dataset that was far from unit variance; there, too, cross validation was extremely poor: around 0.4 in score when scaling was disabled, but (for example) -54422617.41005663 when scaling was enabled! In [1]: import numpy as np In [2]: from sklearn import cross_decomposition In [3]: x = np.random.random((10,17)) In [4]: y = np.random.random((10, 3)) In [5]: pls = cross_decomposition.PLSRegression(scale=True) In [6]: pls.fit(x,y) Out[6]: PLSRegression(copy=True, max_iter=500, n_components=2, scale=True, tol=1e-06) In [7]: from sklearn import model_selection In [8]: model_selection.cross_val_score(pls, x, y) Out[8]: array([-10.1680294 , -12.94229352, -13.39506559]) In [9]: pls = cross_decomposition.PLSRegression(scale=False) In [10]: model_selection.cross_val_score(pls, x, y) Out[10]: array([-0.5904095 , -1.16551493, -1.71555855]) Cheers Paul -------------- next part -------------- An HTML attachment was scrubbed... URL: From soumyodey at live.com Wed Feb 15 00:22:54 2017 From: soumyodey at live.com (Soumyo Dey) Date: Wed, 15 Feb 2017 05:22:54 +0000 Subject: [scikit-learn] Need help to start contributing Message-ID: Hello, I want to start contributing to the project, help me get started with an easyfix. I was able to setup git repository. Now I would like to start contributing with some code. Thank you, Soumyo Dey Twitter : @SoumyoDey Website: http://ace139.com/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From tom.duprelatour at orange.fr Wed Feb 15 09:47:30 2017 From: tom.duprelatour at orange.fr (Tom DLT) Date: Wed, 15 Feb 2017 15:47:30 +0100 Subject: [scikit-learn] Need help to start contributing In-Reply-To: References: Message-ID: Welcome! If you're looking to get started, you might try sorting issues by those with "Needs contributor" and "easy" to begin with. https://github.com/scikit-learn/scikit-learn/issues?q=is%3Aopen+is%3Aissue+label%3AEasy+label%3A%22Need+Contributor%22 You should also check out the contributor guidelines: http://scikit-learn.org/dev/developers/index.html We look forward to seeing your contributions. 
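Returning to the PLSRegression(scale=True) cross-validation report above: one way to narrow this down is to move the standardization into a Pipeline, so that it is refit on each training fold, and compare that against the estimator's internal scaling. This is only a diagnostic sketch on made-up random data, and it is not an exact equivalent, since scale=True also standardizes Y internally, which the pipeline below does not do:

import numpy as np
from sklearn.cross_decomposition import PLSRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.RandomState(0)
x = rng.random_sample((10, 17))
y = rng.random_sample((10, 3))

# Internal scaling, as reported above:
print(cross_val_score(PLSRegression(scale=True), x, y))

# Explicit scaling of X inside the cross-validation loop, with the
# estimator's own scaling switched off:
pipe = make_pipeline(StandardScaler(), PLSRegression(scale=False))
print(cross_val_score(pipe, x, y))

A large gap between the two runs on data that is already close to unit variance would support the suspicion that the internal scaling path, rather than the data, is at fault.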
Tom 2017-02-15 6:22 GMT+01:00 Soumyo Dey : > Hello, > > > I want to start contributing to the project, help me get started with an > easyfix. I was able to setup git repository. Now I would like to start > contributing with some code. > > > Thank you, > > Soumyo Dey > > Twitter : @SoumyoDey > > Website: http://ace139.com/ > > > > > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From Afarin.Famili at UTSouthwestern.edu Wed Feb 15 19:40:19 2017 From: Afarin.Famili at UTSouthwestern.edu (Afarin Famili) Date: Thu, 16 Feb 2017 00:40:19 +0000 Subject: [scikit-learn] A quick question regarding permutation_test_score Message-ID: <1487205619542.10167@UTSouthwestern.edu> Hi folks, I have a question regarding how to use permutation_test_Score. Given data X (predictor) and Y (target), I hold aside 20% of my data for testing (Xtest and Ytest) and would then Perform hyperparameter-tuning on the rest (using Xtrain and Ytrain). This way I can get the best parameters via RandomizedSearchCV. I now want to call permutation_test_score to compute the score, as well as the p-value of the model prediction. But the question is what X and Y should I send as input arguments to this function? I could send in X and Y but then my hyperparameter parameters were already tuned to Xtrain and Ytrain, which are a part of X and Y and that would bias the output values. Any help would be greatly appreciated. Thanks, Afarin ________________________________ UT Southwestern Medical Center The future of medicine, today. -------------- next part -------------- An HTML attachment was scrubbed... URL: From soumyodey at live.com Thu Feb 16 14:15:25 2017 From: soumyodey at live.com (Soumyo Dey) Date: Thu, 16 Feb 2017 19:15:25 +0000 Subject: [scikit-learn] Need help to start contributing In-Reply-To: References: , Message-ID: Hello, Thank you Tom for the welcome. I would like to know, is it okay to work on the same bug which some other is already working on, or does the core devs/ mentors assign bugs to individuals? Thank you, Soumyo Dey Twitter : @SoumyoDey Website: http://ace139.com/ ________________________________ From: scikit-learn on behalf of Tom DLT Sent: Wednesday, February 15, 2017 8:17:30 PM To: Scikit-learn user and developer mailing list Subject: Re: [scikit-learn] Need help to start contributing Welcome! If you're looking to get started, you might try sorting issues by those with "Needs contributor" and "easy" to begin with. https://github.com/scikit-learn/scikit-learn/issues?q=is%3Aopen+is%3Aissue+label%3AEasy+label%3A%22Need+Contributor%22 You should also check out the contributor guidelines: http://scikit-learn.org/dev/developers/index.html We look forward to seeing your contributions. Tom 2017-02-15 6:22 GMT+01:00 Soumyo Dey >: Hello, I want to start contributing to the project, help me get started with an easyfix. I was able to setup git repository. Now I would like to start contributing with some code. Thank you, Soumyo Dey Twitter : @SoumyoDey Website: http://ace139.com/ _______________________________________________ scikit-learn mailing list scikit-learn at python.org https://mail.python.org/mailman/listinfo/scikit-learn -------------- next part -------------- An HTML attachment was scrubbed... 
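On the permutation_test_score question above: one option that avoids reusing the already-tuned split is to pass the search object itself as the estimator, so that the hyperparameter search is re-run inside every training split that permutation_test_score creates (nested cross-validation). It is computationally heavy, but the resulting score and p-value are then not biased by the tuning. A hedged sketch; the estimator, the parameter grid and the data below are placeholders, not the ones from the original question:

import numpy as np
from sklearn.model_selection import RandomizedSearchCV, permutation_test_score
from sklearn.svm import SVR

# Placeholder data standing in for the original X and Y:
rng = np.random.RandomState(0)
X = rng.randn(100, 5)
Y = rng.randn(100)

# The tuned model is itself an estimator, so the randomized search is
# refit on the training part of every split and every permutation:
search = RandomizedSearchCV(SVR(),
                            param_distributions={"C": list(np.logspace(-2, 2, 20))},
                            n_iter=5, random_state=0)
score, perm_scores, pvalue = permutation_test_score(search, X, Y,
                                                    n_permutations=100,
                                                    random_state=0)
print(score, pvalue)

The held-out 20% can still be kept completely aside and used once at the end to report r2 and MSE of the final refitted model.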
URL: From olivier.grisel at ensta.org Thu Feb 16 15:58:01 2017 From: olivier.grisel at ensta.org (Olivier Grisel) Date: Thu, 16 Feb 2017 21:58:01 +0100 Subject: [scikit-learn] Need help to start contributing In-Reply-To: References: Message-ID: It's ok to work on a bug if the original contributor has not replied to the reviewers comments in a while (e.g. a couple of weeks). -- Olivier From benjamin.merkt at bcf.uni-freiburg.de Thu Feb 16 17:25:37 2017 From: benjamin.merkt at bcf.uni-freiburg.de (Benjamin Merkt) Date: Thu, 16 Feb 2017 23:25:37 +0100 Subject: [scikit-learn] OMP ended prematurely due to linear dependence in the dictionary In-Reply-To: <66717e36-5cc7-2ad4-a601-17efb75d7fc5@bcf.uni-freiburg.de> References: <80881741-0259-dbe2-0a63-f5125dd78671@bcf.uni-freiburg.de> <7255cf2b-12da-8c3c-63ca-2189b4fd0a67@bcf.uni-freiburg.de> <66717e36-5cc7-2ad4-a601-17efb75d7fc5@bcf.uni-freiburg.de> Message-ID: <426e4241-4247-7a73-1527-34d68097f92f@bcf.uni-freiburg.de> Is this still considered a bug and therefore worth an issue? On 14.02.2017 13:34, Benjamin Merkt wrote: > Yes, the data array y was already float64. > > > On 14.02.2017 12:28, Vlad Niculae wrote: >> One possible issue I can see causing this is if X and y have different >> dtypes... was this the case for you? >> >> On Tue, Feb 14, 2017 at 8:26 PM, Vlad Niculae wrote: >>> Hi Ben, >>> >>> This actually sounds like a bug in this case! At a glance, the code >>> should use the correct BLAS calls for the data type you provide. Can >>> you reproduce this with a simple small example that gets different >>> results if the data is 32 vs 64 bit? Would you mind filing an issue? >>> >>> Thanks, >>> Vlad >>> >>> >>> On Tue, Feb 14, 2017 at 8:19 PM, Benjamin Merkt >>> wrote: >>>> OK, the issue is resolved. My dictionary was still in 32bit float from >>>> saving. When I convert it to 64float before calling fit it works fine. >>>> >>>> Sorry to bother. >>>> >>>> >>>> >>>> On 14.02.2017 11:00, Benjamin Merkt wrote: >>>>> >>>>> Hi, >>>>> >>>>> I tried that with no effect. The fit still breaks after two >>>>> iterations. >>>>> >>>>> If I set precompute=True I get three coefficients instead of only two. >>>>> My Dictionary is fairly large (currently 128x42000). Is it even >>>>> feasible >>>>> to use OMP with such a big Matrix (even with ~120GB ram)? >>>>> >>>>> -Ben >>>>> >>>>> >>>>> >>>>> On 13.02.2017 23:31, Vlad Niculae wrote: >>>>>> >>>>>> Hi, >>>>>> >>>>>> Are the columns of your matrix normalized? Try setting >>>>>> `normalized=True`. >>>>>> >>>>>> Yours, >>>>>> Vlad >>>>>> >>>>>> On Mon, Feb 13, 2017 at 6:55 PM, Benjamin Merkt >>>>>> wrote: >>>>>>> >>>>>>> Hi everyone, >>>>>>> >>>>>>> I'm using OrthogonalMatchingPursuit to get a sparse coding of a >>>>>>> signal using >>>>>>> a dictionary learned by a KSVD algorithm (pyksvd). However, during >>>>>>> the fit I >>>>>>> get the following RuntimeWarning: >>>>>>> >>>>>>> /usr/local/lib/python2.7/dist-packages/sklearn/linear_model/omp.py:391: >>>>>>> >>>>>>> RuntimeWarning: Orthogonal matching pursuit ended prematurely >>>>>>> due to >>>>>>> linear >>>>>>> dependence in the dictionary. The requested precision might not have >>>>>>> been >>>>>>> met. >>>>>>> >>>>>>> copy_X=copy_X, return_path=return_path) >>>>>>> >>>>>>> In those cases the results are indeed not satisfactory. I don't >>>>>>> get the >>>>>>> point of this warning as it is common in sparse coding to have an >>>>>>> overcomplete dictionary an thus also linear dependency within it. >>>>>>> That >>>>>>> should not be an issue for OMP. 
In fact, the warning is also raised >>>>>>> if the >>>>>>> dictionary is a square matrix. >>>>>>> >>>>>>> Might this Warning also point to other issues in the application? >>>>>>> >>>>>>> >>>>>>> Thanks, Ben >>>>>>> >>>>>>> _______________________________________________ >>>>>>> scikit-learn mailing list >>>>>>> scikit-learn at python.org >>>>>>> https://mail.python.org/mailman/listinfo/scikit-learn >>>>>> >>>>>> _______________________________________________ >>>>>> scikit-learn mailing list >>>>>> scikit-learn at python.org >>>>>> https://mail.python.org/mailman/listinfo/scikit-learn >>>>>> >>>>> _______________________________________________ >>>>> scikit-learn mailing list >>>>> scikit-learn at python.org >>>>> https://mail.python.org/mailman/listinfo/scikit-learn >>>> >>>> _______________________________________________ >>>> scikit-learn mailing list >>>> scikit-learn at python.org >>>> https://mail.python.org/mailman/listinfo/scikit-learn >> _______________________________________________ >> scikit-learn mailing list >> scikit-learn at python.org >> https://mail.python.org/mailman/listinfo/scikit-learn >> > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn From nelle.varoquaux at gmail.com Thu Feb 16 17:40:45 2017 From: nelle.varoquaux at gmail.com (Nelle Varoquaux) Date: Thu, 16 Feb 2017 14:40:45 -0800 Subject: [scikit-learn] Announcing: Docathon, week of 6 March 2017 Message-ID: Hi everyone, I don't really think scikit-learn's documentation is lacking, but here is an announcement for an event we are organizing called the "Docathon". Several of us will be meeting up to sprint on documentation or documentation-related projects at Berkeley, New York and Seattle. If you are interested in joining us, either remotely or on campus, don't hesitate to join! Cheers, Nelle *What's a Docathon?* It's a week-long sprint where we focus our efforts on improving the state of documentation in the open-source and open-science world. This means writing better documentation, building tools, and sharing skills. *Who?s this for?* Anyone who is interested in improving the understandability, accessibility, and clarity of software! This might mean developers with a particular project, or individuals who would like to contribute to a project. You don?t need to use a specific language (though there will be many Python and R developers) and you don?t need to be a core developer in order to help out. *Where can I sign up?* Check out the *Docathon website* . You can sign up as a *participant* , *suggest a project* to work on, or sign up *to host your own* remote Docathon wherever you like. You don?t have to use a specific language - we?ll be as accommodating as possible! *When is the Docathon?* The Docathon will be held *March 6 through March 10*. For those coming to BIDS at UC Berkeley, on the first day we'll have tutorials about documentation and demos of documentation tools, followed by a few hours of hacking. During the middle of the week, we'll set aside a few hours each afternoon for hacking as a group at BIDS. On the last day, we'll have a wrap-up event to show off what everybody worked on. *Where will the Docathon take place?* There are a *few docathons being held simultaneously* , each with their own schedule. At Berkeley we'll have a physical presence at BIDS over the week, and we encourage you to show up for the hours we set aside for doc hacking. 
However, it is totally fine to work remotely; we will coordinate people via email/GitHub, too. *Where can I get more information?* Check out an updated schedule, list of tutorials, and more information at our website here: *bids.github.io/docathon* . *Contact* If you have any questions, open an issue on our *GitHub repo* . We look forward to hearing from you! Please feel free to forward this email to anyone who may be interested. We'd love for other institutions/groups to get involved. -------------- next part -------------- An HTML attachment was scrubbed... URL: From zephyr14 at gmail.com Thu Feb 16 19:56:54 2017 From: zephyr14 at gmail.com (Vlad Niculae) Date: Fri, 17 Feb 2017 09:56:54 +0900 Subject: [scikit-learn] OMP ended prematurely due to linear dependence in the dictionary In-Reply-To: <426e4241-4247-7a73-1527-34d68097f92f@bcf.uni-freiburg.de> References: <80881741-0259-dbe2-0a63-f5125dd78671@bcf.uni-freiburg.de> <7255cf2b-12da-8c3c-63ca-2189b4fd0a67@bcf.uni-freiburg.de> <66717e36-5cc7-2ad4-a601-17efb75d7fc5@bcf.uni-freiburg.de> <426e4241-4247-7a73-1527-34d68097f92f@bcf.uni-freiburg.de> Message-ID: I would consider this a bug. I'm not 100% sure what the conventions for dtypes are. I'd appreciate it if you could open an issue, and even better if you have a small reproducing example. I'll look into it this weekend. Vlad On Fri, Feb 17, 2017 at 7:25 AM, Benjamin Merkt wrote: > Is this still considered a bug and therefore worth an issue? > > > On 14.02.2017 13:34, Benjamin Merkt wrote: >> >> Yes, the data array y was already float64. >> >> >> On 14.02.2017 12:28, Vlad Niculae wrote: >>> >>> One possible issue I can see causing this is if X and y have different >>> dtypes... was this the case for you? >>> >>> On Tue, Feb 14, 2017 at 8:26 PM, Vlad Niculae wrote: >>>> >>>> Hi Ben, >>>> >>>> This actually sounds like a bug in this case! At a glance, the code >>>> should use the correct BLAS calls for the data type you provide. Can >>>> you reproduce this with a simple small example that gets different >>>> results if the data is 32 vs 64 bit? Would you mind filing an issue? >>>> >>>> Thanks, >>>> Vlad >>>> >>>> >>>> On Tue, Feb 14, 2017 at 8:19 PM, Benjamin Merkt >>>> wrote: >>>>> >>>>> OK, the issue is resolved. My dictionary was still in 32bit float from >>>>> saving. When I convert it to 64float before calling fit it works fine. >>>>> >>>>> Sorry to bother. >>>>> >>>>> >>>>> >>>>> On 14.02.2017 11:00, Benjamin Merkt wrote: >>>>>> >>>>>> >>>>>> Hi, >>>>>> >>>>>> I tried that with no effect. The fit still breaks after two >>>>>> iterations. >>>>>> >>>>>> If I set precompute=True I get three coefficients instead of only two. >>>>>> My Dictionary is fairly large (currently 128x42000). Is it even >>>>>> feasible >>>>>> to use OMP with such a big Matrix (even with ~120GB ram)? >>>>>> >>>>>> -Ben >>>>>> >>>>>> >>>>>> >>>>>> On 13.02.2017 23:31, Vlad Niculae wrote: >>>>>>> >>>>>>> >>>>>>> Hi, >>>>>>> >>>>>>> Are the columns of your matrix normalized? Try setting >>>>>>> `normalized=True`. >>>>>>> >>>>>>> Yours, >>>>>>> Vlad >>>>>>> >>>>>>> On Mon, Feb 13, 2017 at 6:55 PM, Benjamin Merkt >>>>>>> wrote: >>>>>>>> >>>>>>>> >>>>>>>> Hi everyone, >>>>>>>> >>>>>>>> I'm using OrthogonalMatchingPursuit to get a sparse coding of a >>>>>>>> signal using >>>>>>>> a dictionary learned by a KSVD algorithm (pyksvd). 
However, during >>>>>>>> the fit I >>>>>>>> get the following RuntimeWarning: >>>>>>>> >>>>>>>> >>>>>>>> /usr/local/lib/python2.7/dist-packages/sklearn/linear_model/omp.py:391: >>>>>>>> >>>>>>>> RuntimeWarning: Orthogonal matching pursuit ended prematurely >>>>>>>> due to >>>>>>>> linear >>>>>>>> dependence in the dictionary. The requested precision might not have >>>>>>>> been >>>>>>>> met. >>>>>>>> >>>>>>>> copy_X=copy_X, return_path=return_path) >>>>>>>> >>>>>>>> In those cases the results are indeed not satisfactory. I don't >>>>>>>> get the >>>>>>>> point of this warning as it is common in sparse coding to have an >>>>>>>> overcomplete dictionary an thus also linear dependency within it. >>>>>>>> That >>>>>>>> should not be an issue for OMP. In fact, the warning is also raised >>>>>>>> if the >>>>>>>> dictionary is a square matrix. >>>>>>>> >>>>>>>> Might this Warning also point to other issues in the application? >>>>>>>> >>>>>>>> >>>>>>>> Thanks, Ben >>>>>>>> >>>>>>>> _______________________________________________ >>>>>>>> scikit-learn mailing list >>>>>>>> scikit-learn at python.org >>>>>>>> https://mail.python.org/mailman/listinfo/scikit-learn >>>>>>> >>>>>>> >>>>>>> _______________________________________________ >>>>>>> scikit-learn mailing list >>>>>>> scikit-learn at python.org >>>>>>> https://mail.python.org/mailman/listinfo/scikit-learn >>>>>>> >>>>>> _______________________________________________ >>>>>> scikit-learn mailing list >>>>>> scikit-learn at python.org >>>>>> https://mail.python.org/mailman/listinfo/scikit-learn >>>>> >>>>> >>>>> _______________________________________________ >>>>> scikit-learn mailing list >>>>> scikit-learn at python.org >>>>> https://mail.python.org/mailman/listinfo/scikit-learn >>> >>> _______________________________________________ >>> scikit-learn mailing list >>> scikit-learn at python.org >>> https://mail.python.org/mailman/listinfo/scikit-learn >>> >> _______________________________________________ >> scikit-learn mailing list >> scikit-learn at python.org >> https://mail.python.org/mailman/listinfo/scikit-learn > > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn From benjamin.merkt at bcf.uni-freiburg.de Fri Feb 17 05:53:15 2017 From: benjamin.merkt at bcf.uni-freiburg.de (Benjamin Merkt) Date: Fri, 17 Feb 2017 11:53:15 +0100 Subject: [scikit-learn] OMP ended prematurely due to linear dependence in the dictionary In-Reply-To: References: <80881741-0259-dbe2-0a63-f5125dd78671@bcf.uni-freiburg.de> <7255cf2b-12da-8c3c-63ca-2189b4fd0a67@bcf.uni-freiburg.de> <66717e36-5cc7-2ad4-a601-17efb75d7fc5@bcf.uni-freiburg.de> <426e4241-4247-7a73-1527-34d68097f92f@bcf.uni-freiburg.de> Message-ID: While trying to get a minimal example to reproduce the error I found that there it also occurred when both arrays where float64. However, I then realized that my data vector has fairly small values (~1e-4 to 1e-8). If I normalize this as well it works for all combinations of 64 and 32 bit. -Ben On 17.02.2017 01:56, Vlad Niculae wrote: > I would consider this a bug. I'm not 100% sure what the conventions > for dtypes are. I'd appreciate it if you could open an issue, and even > better if you have a small reproducing example. I'll look into it this > weekend. > > Vlad > > On Fri, Feb 17, 2017 at 7:25 AM, Benjamin Merkt > wrote: >> Is this still considered a bug and therefore worth an issue? 
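A note on the resolution just above: as Vlad points out further down in this thread, numerically tiny residuals can trigger the early-stopping check, so a target vector with entries around 1e-4 to 1e-8 can produce the warning even though the dictionary itself is fine. A sketch of the rescaling workaround reported here (unit-norm atoms and a rescaled signal, with the scale folded back into the coefficients afterwards); the shapes are assumptions, smaller than the real 128 x 42000 dictionary, and this is an illustration rather than an official recommendation:

import numpy as np
from sklearn.linear_model import OrthogonalMatchingPursuit

rng = np.random.RandomState(0)
D = rng.randn(128, 4000)            # dictionary: n_samples x n_atoms
y = 1e-6 * rng.randn(128)           # very small-magnitude signal

# Rescale the problem to a sane numerical range:
col_norms = np.linalg.norm(D, axis=0)
D_unit = D / col_norms
y_scale = np.linalg.norm(y)
y_unit = y / y_scale

omp = OrthogonalMatchingPursuit(n_nonzero_coefs=10, fit_intercept=False)
omp.fit(D_unit, y_unit)

# Undo the scaling so the coefficients refer to the original D and y:
coef = omp.coef_ * y_scale / col_norms
print(np.count_nonzero(coef),
      np.linalg.norm(y - D.dot(coef)) / np.linalg.norm(y))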
>>>>>>>>> >>>>>>>>> >>>>>>>>> Thanks, Ben >>>>>>>>> >>>>>>>>> _______________________________________________ >>>>>>>>> scikit-learn mailing list >>>>>>>>> scikit-learn at python.org >>>>>>>>> https://mail.python.org/mailman/listinfo/scikit-learn >>>>>>>> >>>>>>>> >>>>>>>> _______________________________________________ >>>>>>>> scikit-learn mailing list >>>>>>>> scikit-learn at python.org >>>>>>>> https://mail.python.org/mailman/listinfo/scikit-learn >>>>>>>> >>>>>>> _______________________________________________ >>>>>>> scikit-learn mailing list >>>>>>> scikit-learn at python.org >>>>>>> https://mail.python.org/mailman/listinfo/scikit-learn >>>>>> >>>>>> >>>>>> _______________________________________________ >>>>>> scikit-learn mailing list >>>>>> scikit-learn at python.org >>>>>> https://mail.python.org/mailman/listinfo/scikit-learn >>>> >>>> _______________________________________________ >>>> scikit-learn mailing list >>>> scikit-learn at python.org >>>> https://mail.python.org/mailman/listinfo/scikit-learn >>>> >>> _______________________________________________ >>> scikit-learn mailing list >>> scikit-learn at python.org >>> https://mail.python.org/mailman/listinfo/scikit-learn >> >> _______________________________________________ >> scikit-learn mailing list >> scikit-learn at python.org >> https://mail.python.org/mailman/listinfo/scikit-learn > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn > From ian at ianozsvald.com Fri Feb 17 09:41:18 2017 From: ian at ianozsvald.com (Ian Ozsvald) Date: Fri, 17 Feb 2017 14:41:18 +0000 Subject: [scikit-learn] ANN: PyDataLondon Conference in May - Call for Proposals closing in 1 week Message-ID: PyDataLondon 2017 runs in London this May 5-7th at Bloomberg's HQ near London Bridge. Our Call for Proposals is open until February 24th (next Friday), and I'd love to see sklearn talks and tutorial proposals: http://pydata.org/london2017/ This is our 4th annual conference, we'll have 330 active data scientists over the 3 days. Our conference builds on our 4,800+ member meetup which runs every month at hedge fund AHL: http://london.pydata.org/ I'd *love* to see a general sklearn tutorial at the conference, there's a real demand for this here in London. I'm also very interested in communicating complex data visually, applications of data science that "made a difference", data engineering and all the topics you'd expect at a strong data science conference. See last year's schedule if you'd like an idea of what to expect: http://pydata.org/london2016/schedule/ You may also be interested in PyDataAmsterdam (April 8-9th) and PyDataBerlin (June 30th- July 2nd), both have their CfP open at the moment: http://pydata.org/amsterdam2017/ http://pydata.org/berlin2017/ I'm hoping to see some interesting sklearn submissions, Ian (conference co-chair) ps. At our monthly meetups I'm also asking members to think on testimonials they could provide back to the sklearn testimonials page, I think that'll be a slow mission but I'll keep pushing the message. 
Hopefully a few companies will reciprocate to help with your grant applications -- Ian Ozsvald (Data Scientist, PyDataLondon co-chair) ian at IanOzsvald.com http://IanOzsvald.com http://ModelInsight.io http://twitter.com/IanOzsvald From akshay0724 at gmail.com Fri Feb 17 13:04:29 2017 From: akshay0724 at gmail.com (Akshay Gupta) Date: Fri, 17 Feb 2017 23:34:29 +0530 Subject: [scikit-learn] Google Summer of code 2017 Message-ID: Are we having any plans to take part in GSOC this year? If so then I would like to apply this year. Thanks -------------- next part -------------- An HTML attachment was scrubbed... URL: From jeff1evesque at yahoo.com Fri Feb 17 13:12:52 2017 From: jeff1evesque at yahoo.com (Jeffrey Levesque) Date: Fri, 17 Feb 2017 13:12:52 -0500 Subject: [scikit-learn] Google Summer of code 2017 In-Reply-To: References: Message-ID: My project has applied for the Google Summer of Code 2017: - https://github.com/jeff1evesque/machine-learning The project is intended to be an interface to the scikit-learn utilities. This means a visualization HTML interface, as well as a programmatic interface (send post requests to the server). If anyone is interested in helping, let me know. Thank you, Jeff Levesque https://github.com/jeff1evesque > On Feb 17, 2017, at 1:04 PM, Akshay Gupta wrote: > > Are we having any plans to take part in GSOC this year? > > If so then I would like to apply this year. > Thanks > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn From max.linke88 at gmail.com Fri Feb 17 13:21:46 2017 From: max.linke88 at gmail.com (Max Linke) Date: Fri, 17 Feb 2017 19:21:46 +0100 Subject: [scikit-learn] Google Summer of code 2017 In-Reply-To: References: Message-ID: <2decef69-b2a3-c0a8-d2dc-53adeeae78c7@gmail.com> You should check GSoC-general at python.org. There have been questions about scikit-learn participation in GSoC this year. best Max On 02/17/2017 07:12 PM, Jeffrey Levesque via scikit-learn wrote: > My project has applied for the Google Summer of Code 2017: > > - https://github.com/jeff1evesque/machine-learning > > The project is intended to be an interface to the scikit-learn utilities. This means a visualization HTML interface, as well as a programmatic interface (send post requests to the server). If anyone is interested in helping, let me know. > > > Thank you, > > Jeff Levesque > https://github.com/jeff1evesque > >> On Feb 17, 2017, at 1:04 PM, Akshay Gupta wrote: >> >> Are we having any plans to take part in GSOC this year? >> >> If so then I would like to apply this year. >> Thanks >> _______________________________________________ >> scikit-learn mailing list >> scikit-learn at python.org >> https://mail.python.org/mailman/listinfo/scikit-learn > > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn > From stuart at stuartreynolds.net Fri Feb 17 14:06:39 2017 From: stuart at stuartreynolds.net (Stuart Reynolds) Date: Fri, 17 Feb 2017 11:06:39 -0800 Subject: [scikit-learn] Modelling event rates Message-ID: Does scikit provide any event-rate/time-to-event models, or other models that are specifically time-dependent? (e.g. models that output the # events per unit of time) Examples might include: Poisson model, or Cox proportional hazard. 
There was some discussion about pulling from statsmodels, https://github.com/scikit-learn/scikit-learn/issues/5975 but (AFAIK), this was not done. -------------- next part -------------- An HTML attachment was scrubbed... URL: From olivier.grisel at ensta.org Fri Feb 17 14:18:05 2017 From: olivier.grisel at ensta.org (Olivier Grisel) Date: Fri, 17 Feb 2017 20:18:05 +0100 Subject: [scikit-learn] Modelling event rates In-Reply-To: References: Message-ID: I don't think we have any model dedicated to this, but it's possible that expressive non-parametricmodels such as RF and GBRT or richly parameterized models such as MLP with a regression loss can do a good enough job at giving you a point estimate. -- Olivier From zephyr14 at gmail.com Fri Feb 17 20:01:32 2017 From: zephyr14 at gmail.com (Vlad Niculae) Date: Sat, 18 Feb 2017 10:01:32 +0900 Subject: [scikit-learn] OMP ended prematurely due to linear dependence in the dictionary In-Reply-To: References: <80881741-0259-dbe2-0a63-f5125dd78671@bcf.uni-freiburg.de> <7255cf2b-12da-8c3c-63ca-2189b4fd0a67@bcf.uni-freiburg.de> <66717e36-5cc7-2ad4-a601-17efb75d7fc5@bcf.uni-freiburg.de> <426e4241-4247-7a73-1527-34d68097f92f@bcf.uni-freiburg.de> Message-ID: Oh I'm inclined to say this isn't a bug then. Your residuals can simply be low enough to trigger early stopping this way. Although I agree the warning could be improved. However, if it IS the case that plugging in 32bit X and 64bit y leads to *different results* than if both have the same dtype (all other things being equal) than that would be a bug. (even if the different results don't consist in an unwanted early stopping.) Is this the case? On Fri, Feb 17, 2017 at 7:53 PM, Benjamin Merkt wrote: > While trying to get a minimal example to reproduce the error I found that > there it also occurred when both arrays where float64. However, I then > realized that my data vector has fairly small values (~1e-4 to 1e-8). If I > normalize this as well it works for all combinations of 64 and 32 bit. > > -Ben > > > > On 17.02.2017 01:56, Vlad Niculae wrote: >> >> I would consider this a bug. I'm not 100% sure what the conventions >> for dtypes are. I'd appreciate it if you could open an issue, and even >> better if you have a small reproducing example. I'll look into it this >> weekend. >> >> Vlad >> >> On Fri, Feb 17, 2017 at 7:25 AM, Benjamin Merkt >> wrote: >>> >>> Is this still considered a bug and therefore worth an issue? >>> >>> >>> On 14.02.2017 13:34, Benjamin Merkt wrote: >>>> >>>> >>>> Yes, the data array y was already float64. >>>> >>>> >>>> On 14.02.2017 12:28, Vlad Niculae wrote: >>>>> >>>>> >>>>> One possible issue I can see causing this is if X and y have different >>>>> dtypes... was this the case for you? >>>>> >>>>> On Tue, Feb 14, 2017 at 8:26 PM, Vlad Niculae >>>>> wrote: >>>>>> >>>>>> >>>>>> Hi Ben, >>>>>> >>>>>> This actually sounds like a bug in this case! At a glance, the code >>>>>> should use the correct BLAS calls for the data type you provide. Can >>>>>> you reproduce this with a simple small example that gets different >>>>>> results if the data is 32 vs 64 bit? Would you mind filing an issue? >>>>>> >>>>>> Thanks, >>>>>> Vlad >>>>>> >>>>>> >>>>>> On Tue, Feb 14, 2017 at 8:19 PM, Benjamin Merkt >>>>>> wrote: >>>>>>> >>>>>>> >>>>>>> OK, the issue is resolved. My dictionary was still in 32bit float >>>>>>> from >>>>>>> saving. When I convert it to 64float before calling fit it works >>>>>>> fine. >>>>>>> >>>>>>> Sorry to bother. 
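On the event-rate question above: scikit-learn currently has no Poisson or Cox model, but a Poisson regression with an exposure term is a few lines in statsmodels, and Olivier's point-estimate route works with any scikit-learn regressor trained on counts per unit exposure. A hedged sketch with simulated data; the variable names and the simulated counts are placeholders:

import numpy as np
import statsmodels.api as sm

rng = np.random.RandomState(0)
n = 200
X = rng.randn(n, 3)
exposure = rng.uniform(0.5, 2.0, size=n)            # observation time per row
true_rate = np.exp(0.3 * X[:, 0] - 0.2 * X[:, 1])   # events per unit time
counts = rng.poisson(true_rate * exposure)

# Poisson GLM with a log link; `exposure` is handled internally as an offset.
model = sm.GLM(counts, sm.add_constant(X),
               family=sm.families.Poisson(), exposure=exposure)
result = model.fit()
print(result.params)     # intercept and coefficients on the log-rate scale

Since the fitted coefficients are on the log-rate scale, np.exp(result.params) gives multiplicative effects on the event rate.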
>>>>>>>>>> >>>>>>>>>> >>>>>>>>>> Thanks, Ben >>>>>>>>>> >>>>>>>>>> _______________________________________________ >>>>>>>>>> scikit-learn mailing list >>>>>>>>>> scikit-learn at python.org >>>>>>>>>> https://mail.python.org/mailman/listinfo/scikit-learn >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> _______________________________________________ >>>>>>>>> scikit-learn mailing list >>>>>>>>> scikit-learn at python.org >>>>>>>>> https://mail.python.org/mailman/listinfo/scikit-learn >>>>>>>>> >>>>>>>> _______________________________________________ >>>>>>>> scikit-learn mailing list >>>>>>>> scikit-learn at python.org >>>>>>>> https://mail.python.org/mailman/listinfo/scikit-learn >>>>>>> >>>>>>> >>>>>>> >>>>>>> _______________________________________________ >>>>>>> scikit-learn mailing list >>>>>>> scikit-learn at python.org >>>>>>> https://mail.python.org/mailman/listinfo/scikit-learn >>>>> >>>>> >>>>> _______________________________________________ >>>>> scikit-learn mailing list >>>>> scikit-learn at python.org >>>>> https://mail.python.org/mailman/listinfo/scikit-learn >>>>> >>>> _______________________________________________ >>>> scikit-learn mailing list >>>> scikit-learn at python.org >>>> https://mail.python.org/mailman/listinfo/scikit-learn >>> >>> >>> _______________________________________________ >>> scikit-learn mailing list >>> scikit-learn at python.org >>> https://mail.python.org/mailman/listinfo/scikit-learn >> >> _______________________________________________ >> scikit-learn mailing list >> scikit-learn at python.org >> https://mail.python.org/mailman/listinfo/scikit-learn >> > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn From t3kcit at gmail.com Sat Feb 18 13:15:23 2017 From: t3kcit at gmail.com (Andreas Mueller) Date: Sat, 18 Feb 2017 13:15:23 -0500 Subject: [scikit-learn] GSOC call for mentors In-Reply-To: References: Message-ID: So have we made a decision not to participate? I'm totally fine with that, but we should make it a conscious decision and not just wait until the deadline approaches and then hack something together last minute. On 01/31/2017 12:55 PM, Guillaume Lema?tre wrote: > I would be interested in helping for mentoring or whatever is needed > regarding the project. > > On 30 January 2017 at 21:25, Nelson Liu > wrote: > > Hey all, > I'd be willing to help out with mentoring a project as well, > hopefully in tandem with someone else. > > Nelson Liu > > On Mon, Jan 30, 2017 at 10:10 AM Jacob Schreiber > > wrote: > > I discussed this briefly with Gael and Joel. The consensus was > that unless we already know excellent students who will fit > well that it is unlikely we will participate in GSoC. That > being said, if someone (other than me) is willing to step up > and organize it, I'd volunteer to be a mentor again. I think > an important project would be adding multithreading to > individual tree building so we can do gradient boosting in > parallel. > > On Mon, Jan 30, 2017 at 5:38 AM, Andreas Mueller > > wrote: > > Hey all. > It's that time of the year again. > Are we planning on participating in GSOC? > If so, we need mentors and projects. > It's unlikely that I'll have time to help with either in > any substantial way. > If we want to participate, I think we should try to be a > bit more organized than last year ;) > > Andy > > Sent from phone. Please excuse spelling and brevity. 
> > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn > > > > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn > > > > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn > > > > > > -- > Guillaume Lemaitre > INRIA Saclay - Ile-de-France > Equipe PARIETAL > guillaume.lemaitre at inria.f r --- > https://glemaitre.github.io/ > > > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn -------------- next part -------------- An HTML attachment was scrubbed... URL: From akshay0724 at gmail.com Sat Feb 18 13:38:23 2017 From: akshay0724 at gmail.com (Akshay Gupta) Date: Sun, 19 Feb 2017 00:08:23 +0530 Subject: [scikit-learn] Regarding scikit learn to take part in GSOC 2017 Message-ID: Dear Programmers, I'm watching scikit learn on github from last few month and have also made some contribution. Just now I found that this year there is no final plan in community to take part in GSOC. Community like Scikit Learn which have a unique place in industry should promote open source contribution and must take part in events like GSOC. My appeal is that scikit learn should at least have a idea page.Though it is late but there still exist a chance to take part in GSOC. Regards Akshay GitHub Name - Akshay0724 -------------- next part -------------- An HTML attachment was scrubbed... URL: From olivier.grisel at ensta.org Sat Feb 18 13:54:05 2017 From: olivier.grisel at ensta.org (Olivier Grisel) Date: Sat, 18 Feb 2017 19:54:05 +0100 Subject: [scikit-learn] GSOC call for mentors In-Reply-To: References: Message-ID: Personally I don't feel like mentoring this year. I would really like to focus my scikit-learn time on finishing the joblib process refactoring with Thomas Moreau and the binning / thread-based parallelization of boosted trees with Guillaume and Raghav. -- Olivier From shubham.bhardwaj2015 at vit.ac.in Sat Feb 18 20:01:27 2017 From: shubham.bhardwaj2015 at vit.ac.in (SHUBHAM BHARDWAJ 15BCE0704) Date: Sun, 19 Feb 2017 06:31:27 +0530 Subject: [scikit-learn] can we have a slack team for scikit-learn Message-ID: Hello Friends, I have tried Slack and its awesome. Things are more dynamic. I have faced some problems which I am sure slack can alleviate like- When working on some issue if I need some guidance I am not sure when I will get reply. That maybe usually within 2-3 days or more.Maybe some fellow programmer who is free can discuss and we may find a good solution. I think collaboration would be much better. Regards Shubham Bhardwaj -------------- next part -------------- An HTML attachment was scrubbed... URL: From naopon at gmail.com Sat Feb 18 20:28:26 2017 From: naopon at gmail.com (Naoya Kanai) Date: Sat, 18 Feb 2017 17:28:26 -0800 Subject: [scikit-learn] can we have a slack team for scikit-learn In-Reply-To: References: Message-ID: The Gitter channel is occasionally active ( https://gitter.im/scikit-learn/scikit-learn) so you might want to check it out. On Sat, Feb 18, 2017 at 5:01 PM, SHUBHAM BHARDWAJ 15BCE0704 < shubham.bhardwaj2015 at vit.ac.in> wrote: > Hello Friends, > > I have tried Slack and its awesome. Things are more dynamic. 
I have faced > some problems which I am sure slack can alleviate like- > > When working on some issue if I need some guidance I am not sure when I > will get reply. That maybe usually within 2-3 days or more.Maybe some > fellow programmer who is free can discuss and we may find a good solution. > I think collaboration would be much better. > > Regards > Shubham Bhardwaj > > > > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From jmschreiber91 at gmail.com Sat Feb 18 20:44:38 2017 From: jmschreiber91 at gmail.com (Jacob Schreiber) Date: Sat, 18 Feb 2017 17:44:38 -0800 Subject: [scikit-learn] can we have a slack team for scikit-learn In-Reply-To: References: Message-ID: I would support a slack channel --if-- we had channels for different groups of modules, like a tree channel and a linear methods channel, and developers involved in those sections populated the channels. This would allow people to ask questions to developers involved directly. However, I can easily see this becoming yet another chat medium that is sparsely attended in which case it would be detrimental to split everyone's attention even further. On Sat, Feb 18, 2017 at 5:28 PM, Naoya Kanai wrote: > The Gitter channel is occasionally active (https://gitter.im/scikit- > learn/scikit-learn) so you might want to check it out. > > On Sat, Feb 18, 2017 at 5:01 PM, SHUBHAM BHARDWAJ 15BCE0704 < > shubham.bhardwaj2015 at vit.ac.in> wrote: > >> Hello Friends, >> >> I have tried Slack and its awesome. Things are more dynamic. I have faced >> some problems which I am sure slack can alleviate like- >> >> When working on some issue if I need some guidance I am not sure when I >> will get reply. That maybe usually within 2-3 days or more.Maybe some >> fellow programmer who is free can discuss and we may find a good solution. >> I think collaboration would be much better. >> >> Regards >> Shubham Bhardwaj >> >> >> >> _______________________________________________ >> scikit-learn mailing list >> scikit-learn at python.org >> https://mail.python.org/mailman/listinfo/scikit-learn >> >> > > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From jmschreiber91 at gmail.com Sat Feb 18 20:45:55 2017 From: jmschreiber91 at gmail.com (Jacob Schreiber) Date: Sat, 18 Feb 2017 17:45:55 -0800 Subject: [scikit-learn] GSOC call for mentors In-Reply-To: References: Message-ID: I think we have de facto decided not to participate by not having someone step up by now and organize it like Raghav did last year. On Sat, Feb 18, 2017 at 10:54 AM, Olivier Grisel wrote: > Personally I don't feel like mentoring this year. I would really like > to focus my scikit-learn time on finishing the joblib process > refactoring with Thomas Moreau and the binning / thread-based > parallelization of boosted trees with Guillaume and Raghav. > > -- > Olivier > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From jmschreiber91 at gmail.com Sat Feb 18 20:51:31 2017 From: jmschreiber91 at gmail.com (Jacob Schreiber) Date: Sat, 18 Feb 2017 17:51:31 -0800 Subject: [scikit-learn] Regarding scikit learn to take part in GSOC 2017 In-Reply-To: References: Message-ID: Hi Akshay Thanks for the note. We've had several threads discussing this, and appear to have come to the consensus that while there are some people who are willing to serve as mentors, no one has the time right now to organize the entire thing. The team always welcomes contributions and is willing to guide people seeking to merge nice pull requests. For me specifically, I'd still love to work with someone who is willing to parallelize single decision tree building, but I don't have the time myself to implement this now or go through the process to set up a GSoC. Jacob On Sat, Feb 18, 2017 at 10:38 AM, Akshay Gupta wrote: > Dear Programmers, > > I'm watching scikit learn on github from last few month and have also > made some contribution. Just now I found that this year there is no final > plan in community to take part in GSOC. > Community like Scikit Learn which have a unique place in industry should > promote open source contribution and must take part in events like GSOC. > > My appeal is that scikit learn should at least have a idea page.Though it > is late but there still exist a chance to take part in GSOC. > > Regards Akshay > > GitHub Name - Akshay0724 > > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From nfliu at uw.edu Sat Feb 18 20:50:49 2017 From: nfliu at uw.edu (Nelson Liu) Date: Sat, 18 Feb 2017 17:50:49 -0800 Subject: [scikit-learn] can we have a slack team for scikit-learn In-Reply-To: References: Message-ID: > However, I can easily see this becoming yet another chat medium that is sparsely attended in which case it would be detrimental to split everyone's attention even further. I definitely agree with this and think that this would (likely) be the end outcome -- gitter and irc didn't/don't "work", so I'm pessimistic as to slack's chances. On Sat, Feb 18, 2017 at 5:44 PM, Jacob Schreiber wrote: > I would support a slack channel --if-- we had channels for different > groups of modules, like a tree channel and a linear methods channel, and > developers involved in those sections populated the channels. This would > allow people to ask questions to developers involved directly. However, I > can easily see this becoming yet another chat medium that is sparsely > attended in which case it would be detrimental to split everyone's > attention even further. > > On Sat, Feb 18, 2017 at 5:28 PM, Naoya Kanai wrote: > >> The Gitter channel is occasionally active (https://gitter.im/scikit-lear >> n/scikit-learn) so you might want to check it out. >> >> On Sat, Feb 18, 2017 at 5:01 PM, SHUBHAM BHARDWAJ 15BCE0704 < >> shubham.bhardwaj2015 at vit.ac.in> wrote: >> >>> Hello Friends, >>> >>> I have tried Slack and its awesome. Things are more dynamic. I have >>> faced some problems which I am sure slack can alleviate like- >>> >>> When working on some issue if I need some guidance I am not sure when I >>> will get reply. That maybe usually within 2-3 days or more.Maybe some >>> fellow programmer who is free can discuss and we may find a good solution. >>> I think collaboration would be much better. 
>>> >>> Regards >>> Shubham Bhardwaj >>> >>> >>> >>> _______________________________________________ >>> scikit-learn mailing list >>> scikit-learn at python.org >>> https://mail.python.org/mailman/listinfo/scikit-learn >>> >>> >> >> _______________________________________________ >> scikit-learn mailing list >> scikit-learn at python.org >> https://mail.python.org/mailman/listinfo/scikit-learn >> >> > > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From joel.nothman at gmail.com Sun Feb 19 07:43:25 2017 From: joel.nothman at gmail.com (Joel Nothman) Date: Sun, 19 Feb 2017 23:43:25 +1100 Subject: [scikit-learn] GSOC call for mentors In-Reply-To: References: Message-ID: I am sure there are many people disappointed by the idea that we may not run with GSoC this year. On the one hand, we could ? as Ga?l has suggested ? really benefit from having more people involved in the maintenance of scikit-learn, and GSoC provides a potential pathway for newcomers. On the other hand, we have such an enormous quantity of PRs to review and decide upon already, that code is far from the main thing we are lacking in contribution; and, with notable exceptions, the active / long-term core devs have overwhelmingly not come in through the GSoC pathway. I also think we are at a stage of maturity where it is becoming relatively hard to design projects that are clearly beneficial, not going to create future maintenance burden, and can be performed by someone relatively new to developing scikit-learn. But as others have suggested on this list, there may be projects within the scikit-learn ecosystem that *do* need code and can clearly defined projects. I think if there were a clearly scoped project and a promising student, we would find mentor availability. But the core devs have not had capacity to design a project within the above constraints, and no student has come forward with a clear proposition. Potential students must recognise that GSoC funding assumes, essentially, in-kind contributions from mentors in time. Since we're mostly relying here on volunteers, how readily we can afford that contribution needs to be rationalised. On 19 February 2017 at 12:45, Jacob Schreiber wrote: > I think we have de facto decided not to participate by not having someone > step up by now and organize it like Raghav did last year. > > On Sat, Feb 18, 2017 at 10:54 AM, Olivier Grisel > wrote: > >> Personally I don't feel like mentoring this year. I would really like >> to focus my scikit-learn time on finishing the joblib process >> refactoring with Thomas Moreau and the binning / thread-based >> parallelization of boosted trees with Guillaume and Raghav. >> >> -- >> Olivier >> _______________________________________________ >> scikit-learn mailing list >> scikit-learn at python.org >> https://mail.python.org/mailman/listinfo/scikit-learn >> > > > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn > > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From gael.varoquaux at normalesup.org Sun Feb 19 12:08:15 2017 From: gael.varoquaux at normalesup.org (Gael Varoquaux) Date: Sun, 19 Feb 2017 18:08:15 +0100 Subject: [scikit-learn] can we have a slack team for scikit-learn In-Reply-To: References: Message-ID: <20170219170815.GA2499052@phare.normalesup.org> I agree: the limiting factor is everybody's time. Technology doesn't help much in this respect. I am afraid that if we add a slack channel, we are just going to get much dilution. I don't see what killer feature that slack has that would suddenly make it that knowledgeable people have more time. In order to make the best use of my personnal time, I must admit that I stick to a rule: github is where I spend my time. Ga?l On Sat, Feb 18, 2017 at 05:50:49PM -0800, Nelson Liu wrote: > >??However, I can easily see this becoming yet another chat medium that is > sparsely attended in which case it would be detrimental to split everyone's > attention even further.? > I definitely agree with this and think that this would (likely) be the end > outcome -- gitter and irc didn't/don't "work", so I'm pessimistic as to slack's > chances. > On Sat, Feb 18, 2017 at 5:44 PM, Jacob Schreiber > wrote: > I would support a slack channel --if-- we had channels for different groups > of modules, like a tree channel and a linear methods channel, and > developers involved in those sections populated the channels. This would > allow people to ask questions to developers involved directly. However, I > can easily see this becoming yet another chat medium that is sparsely > attended in which case it would be detrimental to split everyone's > attention even further.? > On Sat, Feb 18, 2017 at 5:28 PM, Naoya Kanai wrote: > The Gitter channel is occasionally active (https://gitter.im/ > scikit-learn/scikit-learn) so you might want to check it out. > On Sat, Feb 18, 2017 at 5:01 PM, SHUBHAM BHARDWAJ 15BCE0704 < > shubham.bhardwaj2015 at vit.ac.in> wrote: > Hello Friends, > I have tried Slack and its awesome. Things are more dynamic. I have > faced some problems which I am sure slack can alleviate like- > When working on some issue if I need some guidance I am not sure > when I will get ?reply. That maybe usually within 2-3 days or > more.Maybe some fellow programmer who is free can discuss and we > may find a good solution. I think collaboration would be much > better. 
> Regards > Shubham Bhardwaj > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn -- Gael Varoquaux Researcher, INRIA Parietal NeuroSpin/CEA Saclay , Bat 145, 91191 Gif-sur-Yvette France Phone: ++ 33-1-69-08-79-68 http://gael-varoquaux.info http://twitter.com/GaelVaroquaux From se.raschka at gmail.com Sun Feb 19 13:15:45 2017 From: se.raschka at gmail.com (Sebastian Raschka) Date: Sun, 19 Feb 2017 13:15:45 -0500 Subject: [scikit-learn] can we have a slack team for scikit-learn In-Reply-To: <20170219170815.GA2499052@phare.normalesup.org> References: <20170219170815.GA2499052@phare.normalesup.org> Message-ID: <64F9320E-F7BD-4144-8AC2-F2BAA8DDCF0F@gmail.com> In my opinion, Slack can be quite useful for discussing things ?live.? However, one of the main problems I have with Slack ? I am using it for some other projects ? is that it is easy to lose track if important things are discussed and one is not constantly online and checking the timeline. In any case, I think Slack would be the same as using the already existing Gitter channel ? the only difference in my view would be that Slack, as a brand, is more popular for some reason. I think Slack/Gitter could be useful for sprints though, augmenting GitHub. > On Feb 19, 2017, at 12:08 PM, Gael Varoquaux wrote: > > I agree: the limiting factor is everybody's time. Technology doesn't help > much in this respect. I am afraid that if we add a slack channel, we are > just going to get much dilution. I don't see what killer feature that > slack has that would suddenly make it that knowledgeable people have more > time. > > In order to make the best use of my personnal time, I must admit that I > stick to a rule: github is where I spend my time. > > Ga?l > > On Sat, Feb 18, 2017 at 05:50:49PM -0800, Nelson Liu wrote: >>> However, I can easily see this becoming yet another chat medium that is >> sparsely attended in which case it would be detrimental to split everyone's >> attention even further. >> I definitely agree with this and think that this would (likely) be the end >> outcome -- gitter and irc didn't/don't "work", so I'm pessimistic as to slack's >> chances. > >> On Sat, Feb 18, 2017 at 5:44 PM, Jacob Schreiber >> wrote: > >> I would support a slack channel --if-- we had channels for different groups >> of modules, like a tree channel and a linear methods channel, and >> developers involved in those sections populated the channels. This would >> allow people to ask questions to developers involved directly. However, I >> can easily see this becoming yet another chat medium that is sparsely >> attended in which case it would be detrimental to split everyone's >> attention even further. > >> On Sat, Feb 18, 2017 at 5:28 PM, Naoya Kanai wrote: > >> The Gitter channel is occasionally active (https://gitter.im/ >> scikit-learn/scikit-learn) so you might want to check it out. 
> >> On Sat, Feb 18, 2017 at 5:01 PM, SHUBHAM BHARDWAJ 15BCE0704 < >> shubham.bhardwaj2015 at vit.ac.in> wrote:
> >> Hello Friends,
> >> I have tried Slack and it's awesome. Things are more dynamic. I have faced some problems which I am sure Slack can alleviate, like: when working on some issue, if I need some guidance I am not sure when I will get a reply. That may be within 2-3 days or more. Maybe some fellow programmer who is free can discuss it and we may find a good solution. I think collaboration would be much better.
> >> Regards
> >> Shubham Bhardwaj
> >> _______________________________________________
> >> scikit-learn mailing list
> >> scikit-learn at python.org
> >> https://mail.python.org/mailman/listinfo/scikit-learn
> --
> Gael Varoquaux
> Researcher, INRIA Parietal
> NeuroSpin/CEA Saclay , Bat 145, 91191 Gif-sur-Yvette France
> Phone: ++ 33-1-69-08-79-68
> http://gael-varoquaux.info http://twitter.com/GaelVaroquaux
> _______________________________________________
> scikit-learn mailing list
> scikit-learn at python.org
> https://mail.python.org/mailman/listinfo/scikit-learn

From vincent.dubourg at gmail.com Mon Feb 20 03:21:58 2017
From: vincent.dubourg at gmail.com (Vincent Dubourg)
Date: Mon, 20 Feb 2017 09:21:58 +0100
Subject: [scikit-learn] Confidence intervals on GaussianProcessRegressor hyperparameters estimates
Message-ID:

Hi list,

Did anyone ever consider using the Cramer-Rao lower bound to estimate the variance-covariance matrix of the GaussianProcess hyperparameter estimates? I have seen that the gradient of the marginal log likelihood is already available. What about the Hessian matrix?

Comparing the theta values with the hyperparameter values in the fitted kernel itself, it looks like some normalization occurs, which is fine, but how do I get the true gradient back?

Actually, I am more interested in inferring the parameters than in predicting. I have considered using pymc3, but MCMC is quite expensive in time and I would like to be able to speed this up with a reasonable approximation. George is also an alternative, but it is out of the question since I am running Windows.

Thank you,
Vincent

-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From t3kcit at gmail.com Tue Feb 21 09:18:29 2017
From: t3kcit at gmail.com (Andreas Mueller)
Date: Tue, 21 Feb 2017 09:18:29 -0500
Subject: [scikit-learn] can we have a slack team for scikit-learn
In-Reply-To: <64F9320E-F7BD-4144-8AC2-F2BAA8DDCF0F@gmail.com>
References: <20170219170815.GA2499052@phare.normalesup.org> <64F9320E-F7BD-4144-8AC2-F2BAA8DDCF0F@gmail.com>
Message-ID: <9aa6e4dd-3ee8-6483-2730-077b844c3d79@gmail.com>

I agree with the rest. How would slack channels be different from the existing gitter channels?
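On Vincent's GaussianProcessRegressor question above: scikit-learn does not expose a Hessian or any built-in confidence intervals for the kernel hyperparameters, but a rough observed-information (Cramer-Rao-style) approximation can be assembled from what is exposed. Note that kernel_.theta and the gradient returned by log_marginal_likelihood(..., eval_gradient=True) live in log-transformed hyperparameter space -- that is the "normalization" mentioned in the question. The sketch below uses a made-up kernel and toy data purely for illustration; it is one possible approach, not an official API.

import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

# toy data standing in for the real problem
rng = np.random.RandomState(0)
X = rng.uniform(0, 10, size=(50, 1))
y = np.sin(X).ravel() + 0.1 * rng.randn(50)

kernel = 1.0 * RBF(length_scale=1.0) + WhiteKernel(noise_level=0.1)
gp = GaussianProcessRegressor(kernel=kernel).fit(X, y)

theta_hat = gp.kernel_.theta  # maximum-likelihood hyperparameters, log-transformed

def grad_lml(theta):
    # gradient of the log marginal likelihood w.r.t. theta (log-space)
    return gp.log_marginal_likelihood(theta, eval_gradient=True)[1]

# central finite differences of the gradient give the Hessian in log-space
eps = 1e-5
n_params = theta_hat.shape[0]
hessian = np.empty((n_params, n_params))
for i in range(n_params):
    step = np.zeros(n_params)
    step[i] = eps
    hessian[:, i] = (grad_lml(theta_hat + step) - grad_lml(theta_hat - step)) / (2 * eps)
hessian = 0.5 * (hessian + hessian.T)  # symmetrize

cov_theta = np.linalg.inv(-hessian)   # approximate covariance of theta_hat
print(np.exp(theta_hat))              # hyperparameters on their natural scale
print(np.sqrt(np.diag(cov_theta)))    # approximate standard errors, in log-space

Since theta = log(hyperparameter), the gradient in the original parametrization can be recovered with the chain rule, d/d(param) = (1/param) * d/d(theta), and standard errors in log-space translate into multiplicative intervals on the natural scale.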
From t3kcit at gmail.com Tue Feb 21 09:22:18 2017 From: t3kcit at gmail.com (Andreas Mueller) Date: Tue, 21 Feb 2017 09:22:18 -0500 Subject: [scikit-learn] Request for single pass clustering algorithm In-Reply-To: References: Message-ID: <4be9f13b-fbbe-a33b-c5fa-2bfc30ad0987@gmail.com> Hi Gaurish. scikit-learn-owner is the email address for the mailing list administration. See the FAQ on contacting the project: http://scikit-learn.org/stable/faq.html#what-s-the-best-way-to-get-help-on-scikit-learn-usage For feature requests I would suggest the issue tracker. This sounds like "single pass" is the same as agglomerative clustering with the average linkage criterion. Or is it any different from that? Andy On 02/21/2017 04:39 AM, Gaurish Thakkar wrote: > > I would like to make a request to Scikit team to please implement and > incorporate the single pass clustering algorithm. It is one of the > most basic online algorithms and > the link below dicusses the process in detail. > [http://facweb.cs.depaul.edu/mobasher/classes/csc575/assignments/single-pass.html] > > -- > /Regards:/ > Gaurish P Thakkar > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From t3kcit at gmail.com Tue Feb 21 09:24:54 2017 From: t3kcit at gmail.com (Andreas Mueller) Date: Tue, 21 Feb 2017 09:24:54 -0500 Subject: [scikit-learn] GSOC call for mentors In-Reply-To: References: Message-ID: I agree, I just wanted to make sure we are on the same page, and that we can tell people "no we're not gonna do GSoC" instead of "err I don't know what's happening, maybe not?" From t3kcit at gmail.com Tue Feb 21 09:52:31 2017 From: t3kcit at gmail.com (Andreas Mueller) Date: Tue, 21 Feb 2017 09:52:31 -0500 Subject: [scikit-learn] Scipy 2017 Message-ID: <65ef1d1c-28a9-0772-6da1-3b54feb7cfd1@gmail.com> Hey folks. Who's coming to scipy this year? Any volunteers for tutorials? I'm happy to be part of it but doing 7h by myself is a bit much ;) Andy From jeff1evesque at yahoo.com Tue Feb 21 10:07:00 2017 From: jeff1evesque at yahoo.com (Jeffrey Levesque) Date: Tue, 21 Feb 2017 10:07:00 -0500 Subject: [scikit-learn] GSOC call for mentors In-Reply-To: References: Message-ID: <29E6DB6D-188C-406A-967F-670E6C10D3E6@yahoo.com> Hey guys, Maybe you guys could redirect some of them to related scikit-learn projects? For example, my project intends to interface scikit-learn: - https://github.com/jeff1evesque/machine-learning Even though it's a lot of JavaScript (web-interface), and puppet scripts for automating the build, I will need some help getting the python backend logic to correctly snap in to scikit-learns utilities. If some of you want to assist me mentor (I know some of you wanted to mentor), since you guys are scikit-learn developers, that would be hugely helpful. Even though individuals may not be creating new features (new algorithms, or optimizing) perse, they could assist me interfacing existing scikit-learn utilities, by writing corresponding Python logic which would properly delegate datasets into corresponding databases, and such. This would largely make sklearn utilities available to a web-interface, as well as a server API - at least my intention. My python syntax is the prettiest, so I welcome anyone to help improve it - since, this is largely a home pet project, and sometimes I only have 1-2 hours a day to work on this project. 
Thank you,

Jeff Levesque
https://github.com/jeff1evesque

> On Feb 21, 2017, at 9:24 AM, Andreas Mueller wrote:
>
> I agree, I just wanted to make sure we are on the same page, and that we can tell people "no we're not gonna do GSoC"
> instead of "err I don't know what's happening, maybe not?"
> _______________________________________________
> scikit-learn mailing list
> scikit-learn at python.org
> https://mail.python.org/mailman/listinfo/scikit-learn

-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From t3kcit at gmail.com Tue Feb 21 11:31:43 2017
From: t3kcit at gmail.com (Andreas Mueller)
Date: Tue, 21 Feb 2017 11:31:43 -0500
Subject: [scikit-learn] Preparing a scikit-learn 0.18.2 bugfix release
In-Reply-To: References: <20170109151546.GM2802991@phare.normalesup.org> <20170111215115.GO1585067@phare.normalesup.org>
Message-ID:

On 02/07/2017 09:00 PM, Joel Nothman wrote:
> On 12 January 2017 at 08:51, Gael Varoquaux wrote:
> On Thu, Jan 12, 2017 at 08:41:51AM +1100, Joel Nothman wrote:
> > When the two versions deprecation policy was instituted, releases were much
> > more frequent... Is that enough of an excuse?
> I'd rather say that we can here decide that we are giving a longer grace period.
> I think that slow deprecations are a good thing (see titus's blog post here:
> http://ivory.idyll.org/blog/2017-pof-software-archivability.html )
> Given that 0.18 was a very slow release, and the work for removing
> deprecated material from 0.19 has already been done, I don't think we
> should revert that. I agree that we can delay the deprecation deadline
> for 0.20 and 0.21.
> In terms of release schedule, are we aiming for RC in early-mid March,
> assuming Andy's above prognostications are correct and he is able to
> review in a bigger way in a week or so?
Sometimes I wonder how Amazon ever gave me a job in forecasting....
Spring break is March 13-17th ;)

-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From gael.varoquaux at normalesup.org Mon Feb 27 05:58:35 2017
From: gael.varoquaux at normalesup.org (Gael Varoquaux)
Date: Mon, 27 Feb 2017 11:58:35 +0100
Subject: [scikit-learn] GSoC 2017
Message-ID: <20170227105835.GC2041043@phare.normalesup.org>

Hi,

Students have been inquiring about the GSoC (Google Summer of Code) with scikit-learn, and the core team has been quite silent about it.

I am happy to announce that we will be taking part in GSoC with scikit-learn again. The reason that we decided to do this is to give a chance to young, talented, and motivated students.

Importantly, our most limiting resource is the time of our experienced developers. This is clearly visible from the number of pending pull requests. Hence, we need students to be very able and independent. This of course means that they will be getting supervision from mentors. Such supervision is crucial for moving forward with a good project that delivers mergeable code. However, we will need the students to be very good at interacting efficiently with the mentors.
Also, I should stress that we will be able to take only a very small number of students.

With that said, let me introduce the 2017 GSoC for scikit-learn. We have set up a wiki page which summarizes the experiences from last year and the ideas for this year:
https://github.com/scikit-learn/scikit-learn/wiki/Google-summer-of-code-(GSOC)-2017

Interested students should declare their interest on the mailing list, and discuss with possible mentors here. Factors of success will be

* careful work on a good proposal, that takes one of the ideas on the wiki but breaks it down into a realistic plan with multiple steps and shows a good understanding of the problem.

* demonstration of the required skillset via successful pull requests in scikit-learn.

Cheers,

Gaël

--
Gael Varoquaux
Researcher, INRIA Parietal
NeuroSpin/CEA Saclay , Bat 145, 91191 Gif-sur-Yvette France
Phone: ++ 33-1-69-08-79-68
http://gael-varoquaux.info http://twitter.com/GaelVaroquaux

From ludo25_90 at hotmail.com Mon Feb 27 09:27:59 2017
From: ludo25_90 at hotmail.com (Ludovico Coletta)
Date: Mon, 27 Feb 2017 14:27:59 +0000
Subject: [scikit-learn] Control over the inner loop in GridSearchCV
Message-ID:

Dear Scikit experts,

we are stuck with GridSearchCV. Nobody else was able to or wanted to help us, so we hope you will.

We are analysing neuroimaging data coming from 3 different MRI scanners, where for each scanner we have a healthy group and a disease group. We would like to merge the data from the 3 different scanners in order to classify the healthy subjects from the ones who have the disease.

The problem is that we can almost perfectly classify the subjects according to the scanner (e.g. the healthy subjects from scanner 1 and scanner 2). We are using a custom cross-validation scheme to account for the different scanners: when no hyper-parameter (SVM) optimization is performed, everything is straightforward. Problems arise when we would like to perform hyperparameter optimization: in this case we need to balance for the different scanners in the optimization phase as well. We also found a custom cv scheme for this, but we are not able to pass it to the GridSearchCV object. We would like to get something like the following:

pipeline = Pipeline([('scl', StandardScaler()),
                     ('sel', RFE(estimator, step=0.2)),
                     ('clf', SVC(probability=True, random_state=42))])

param_grid = [{'sel__n_features_to_select': [22, 15, 10, 2],
               'clf__C': np.logspace(-3, 5, 100),
               'clf__kernel': ['linear']}]

clf = GridSearchCV(pipeline,
                   param_grid=param_grid,
                   verbose=1,
                   scoring='roc_auc',
                   n_jobs=-1)

# cv_final is the custom cv for the outer loop (9 folds)

ii = 0
while ii < len(cv_final):
    # fit and predict
    clf.fit(data[?], y[?])
    predictions.append(clf.predict(data[cv_final[ii][1]]))  # outer test data
    ii = ii + 1

We tried almost everything. When we define clf in the loop, pass the i-th cv_nested as the cv argument, and fit it on the training data of the i-th custom_cv fold, we get a "Too many values to unpack" error. On the other hand, when we try to pass the nested i-th cv fold as the cv argument for clf and we call fit on the same cv_nested fold, we get an "Index out of bound" error.

Two questions:
1) Is there any workaround to avoid the split when clf is called without a cv argument?
2) We suppose that for hyperparameter optimization the test data is removed from the dataset and a new dataset is created. Is this true?
In this case we only have to adjust the indices accordingly.

Thank you for your time and sorry for the long text
Ludovico

-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From se.raschka at gmail.com Mon Feb 27 11:27:24 2017
From: se.raschka at gmail.com (Sebastian Raschka)
Date: Mon, 27 Feb 2017 11:27:24 -0500
Subject: Re: [scikit-learn] Control over the inner loop in GridSearchCV
In-Reply-To: References: Message-ID:

Hi, Ludovico,

what format (shape) is data in? Are these the arrays from a KFold iterator? In this case, the "question marks" in your code snippet should simply be the train and validation subset indices generated by the KFold generator. E.g.,

skfold = StratifiedKFold(y=y_train, n_folds=5, shuffle=True, random_state=1)

for outer_train_idx, outer_valid_idx in skfold:
    ...
    gridsearch_object.fit(X_train[outer_train_idx], y_train[outer_train_idx])

> On the other end, when we try to pass the nested -ith cv fold as cv argument for clf, and we call fit on the same cv_nested fold, we get an "Index out of bound" error.
> Two questions:

Are you using a version older than scikit-learn 0.18? Technically, GridSearchCV, RandomizedSearchCV, cross_val_score, ... should all support iterables of train and test indices, e.g.:

outer_cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=1)

for name, gs_est in sorted(gridcvs.items()):
    nested_score = cross_val_score(gs_est, X=X_train, y=y_train, cv=outer_cv, n_jobs=1)

Best,
Sebastian

> On Feb 27, 2017, at 9:27 AM, Ludovico Coletta wrote:
>
> Dear Scikit experts,
>
> we am stucked with GridSearchCV. Nobody else was able/wanted to help us, we hope you will.
>
> We are analysing neuroimaging data coming from 3 different MRI scanners, where for each scanner we have a healthy group and a disease group. We would like to merge the data from the 3 different scanners in order to classify the healthy subjects from the one who have the disease.
>
> The problem is that we can almost perfectly classify the subjects according to the scanner (e.g. the healthy subjects from scanner 1 and scanner 2). We are using a custom cross validation schema to account for the different scanners: when no hyper-parameter (SVM) optimization is performed, everything is straightforward. Problems arise when we would like to perform hyperparameter optimization: in this case we need to balance for the different scanner in the optimization phase as well. We also found a custom cv schema for this, but we are not able to pass it to GridSearchCV object. We would like to get something like the following:
>
> pipeline = Pipeline([('scl', StandardScaler()),
>                      ('sel', RFE(estimator, step=0.2)),
>                      ('clf', SVC(probability=True, random_state=42))])
>
> param_grid = [{'sel__n_features_to_select': [22, 15, 10, 2],
>                'clf__C': np.logspace(-3, 5, 100),
>                'clf__kernel': ['linear']}]
>
> clf = GridSearchCV(pipeline,
>                    param_grid=param_grid,
>                    verbose=1,
>                    scoring='roc_auc',
>                    n_jobs=-1)
>
> # cv_final is the custom cv for the outer loop (9 folds)
>
> ii = 0
> while ii < len(cv_final):
>     # fit and predict
>     clf.fit(data[?], y[?])
>     predictions.append(clf.predict(data[cv_final[ii][1]]))  # outer test data
>     ii = ii + 1
>
> We tried almost everything. When we define clf in the loop, we pass the -ith cv_nested as cv argument, and we fit it on the training data of the -ith custom_cv fold, we get an "Too many values to unpack" error.
On the other end, when we try to pass the nested -ith cv fold as cv argument for clf, and we call fit on the same cv_nested fold, we get an "Index out of bound" error. > Two questions: > 1) Is there any workaround to avoid the split when clf is called without a cv argument? > 2) We suppose that for hyperparameter optimization the test data is removed from the dataset and a new dataset is created. Is this true? In this case we only have to adjust the indices accordingly > > Thank your for your time and sorry for the long text > Ludovico > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn From nfliu at uw.edu Mon Feb 27 12:46:25 2017 From: nfliu at uw.edu (Nelson Liu) Date: Mon, 27 Feb 2017 09:46:25 -0800 Subject: [scikit-learn] GSoC 2017 In-Reply-To: <20170227105835.GC2041043@phare.normalesup.org> References: <20170227105835.GC2041043@phare.normalesup.org> Message-ID: In past years students made a page on the wiki with their proposal; this isn't possible anymore due to GitHub permissions. Perhaps an alternative method for getting feedback should be suggested on the introduction page? Nelson Liu On Mon, Feb 27, 2017 at 2:58 AM, Gael Varoquaux < gael.varoquaux at normalesup.org> wrote: > Hi, > > Students have been inquiring about the GSoC (Google Summer of Code) with > scikit-learn, and the core team has been quite silent about team. > > I am happy to announce that we will be taking part in the scikit-learn > again. The reason that we decided to do this is to give a chance to the > young, talented, and motivated students. > > Importantly, our most limiting resource is the time of our experienced > developers. This is clearly visible from the number of pending pull > requests. Hence, we need students to be very able and independent. This > of course means that they will be getting supervision from mentors. Such > supervision is crucial for moving forward with a good project, that > delivers mergeable code. However, we will need the students to be very > good at interacting efficiently with the mentors. Also, I should stress > that we will be able to take only a very few numbers of students. > > With that said, let me introduce the 2017 GSoC for scikit-learn. We have > set up a wiki page which summarizes the experiences from last year and > the ideas for this year: > https://github.com/scikit-learn/scikit-learn/wiki/Google- > summer-of-code-(GSOC)-2017 > > Interested students should declare their interest on the mailing list, > and discuss with possible mentors here. Factors of success will be > > * careful work on a good proposal, that takes on of the ideas on the wiki > but breaks it down in a realistic plan with multiple steps and shows a > good understanding of the problem. > > * demonstration of the required skillset via successful pull requests in > scikit-learn. > > Cheers, > > Ga?l > > > -- > Gael Varoquaux > Researcher, INRIA Parietal > NeuroSpin/CEA Saclay , Bat 145, 91191 Gif-sur-Yvette France > Phone: ++ 33-1-69-08-79-68 > http://gael-varoquaux.info http://twitter.com/GaelVaroquaux > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From ragvrv at gmail.com Mon Feb 27 14:28:03 2017 From: ragvrv at gmail.com (Raghav R V) Date: Mon, 27 Feb 2017 20:28:03 +0100 Subject: [scikit-learn] GSoC 2017 In-Reply-To: References: <20170227105835.GC2041043@phare.normalesup.org> Message-ID: They can still edit a wiki page from their fork of scikit learn I think. So I'd suggest doing that and mailing to this thread, the link to their proposal... On 27 Feb 2017 6:55 p.m., "Nelson Liu" wrote: > In past years students made a page on the wiki with their proposal; this > isn't possible anymore due to GitHub permissions. Perhaps an alternative > method for getting feedback should be suggested on the introduction page? > > Nelson Liu > > On Mon, Feb 27, 2017 at 2:58 AM, Gael Varoquaux < > gael.varoquaux at normalesup.org> wrote: > >> Hi, >> >> Students have been inquiring about the GSoC (Google Summer of Code) with >> scikit-learn, and the core team has been quite silent about team. >> >> I am happy to announce that we will be taking part in the scikit-learn >> again. The reason that we decided to do this is to give a chance to the >> young, talented, and motivated students. >> >> Importantly, our most limiting resource is the time of our experienced >> developers. This is clearly visible from the number of pending pull >> requests. Hence, we need students to be very able and independent. This >> of course means that they will be getting supervision from mentors. Such >> supervision is crucial for moving forward with a good project, that >> delivers mergeable code. However, we will need the students to be very >> good at interacting efficiently with the mentors. Also, I should stress >> that we will be able to take only a very few numbers of students. >> >> With that said, let me introduce the 2017 GSoC for scikit-learn. We have >> set up a wiki page which summarizes the experiences from last year and >> the ideas for this year: >> https://github.com/scikit-learn/scikit-learn/wiki/Google-sum >> mer-of-code-(GSOC)-2017 >> >> Interested students should declare their interest on the mailing list, >> and discuss with possible mentors here. Factors of success will be >> >> * careful work on a good proposal, that takes on of the ideas on the wiki >> but breaks it down in a realistic plan with multiple steps and shows a >> good understanding of the problem. >> >> * demonstration of the required skillset via successful pull requests in >> scikit-learn. >> >> Cheers, >> >> Ga?l >> >> >> -- >> Gael Varoquaux >> Researcher, INRIA Parietal >> NeuroSpin/CEA Saclay , Bat 145, 91191 Gif-sur-Yvette France >> Phone: ++ 33-1-69-08-79-68 >> http://gael-varoquaux.info http://twitter.com/GaelVaroqua >> ux >> _______________________________________________ >> scikit-learn mailing list >> scikit-learn at python.org >> https://mail.python.org/mailman/listinfo/scikit-learn >> > > > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From ragvrv at gmail.com Mon Feb 27 14:29:06 2017 From: ragvrv at gmail.com (Raghav R V) Date: Mon, 27 Feb 2017 20:29:06 +0100 Subject: [scikit-learn] GSoC 2017 In-Reply-To: References: <20170227105835.GC2041043@phare.normalesup.org> Message-ID: Or simply a public gist and importantly the link mailed here would do I think... On 27 Feb 2017 8:28 p.m., "Raghav R V" wrote: > They can still edit a wiki page from their fork of scikit learn I think. 
> So I'd suggest doing that and mailing to this thread, the link to their > proposal... > > On 27 Feb 2017 6:55 p.m., "Nelson Liu" wrote: > >> In past years students made a page on the wiki with their proposal; this >> isn't possible anymore due to GitHub permissions. Perhaps an alternative >> method for getting feedback should be suggested on the introduction page? >> >> Nelson Liu >> >> On Mon, Feb 27, 2017 at 2:58 AM, Gael Varoquaux < >> gael.varoquaux at normalesup.org> wrote: >> >>> Hi, >>> >>> Students have been inquiring about the GSoC (Google Summer of Code) with >>> scikit-learn, and the core team has been quite silent about team. >>> >>> I am happy to announce that we will be taking part in the scikit-learn >>> again. The reason that we decided to do this is to give a chance to the >>> young, talented, and motivated students. >>> >>> Importantly, our most limiting resource is the time of our experienced >>> developers. This is clearly visible from the number of pending pull >>> requests. Hence, we need students to be very able and independent. This >>> of course means that they will be getting supervision from mentors. Such >>> supervision is crucial for moving forward with a good project, that >>> delivers mergeable code. However, we will need the students to be very >>> good at interacting efficiently with the mentors. Also, I should stress >>> that we will be able to take only a very few numbers of students. >>> >>> With that said, let me introduce the 2017 GSoC for scikit-learn. We have >>> set up a wiki page which summarizes the experiences from last year and >>> the ideas for this year: >>> https://github.com/scikit-learn/scikit-learn/wiki/Google-sum >>> mer-of-code-(GSOC)-2017 >>> >>> Interested students should declare their interest on the mailing list, >>> and discuss with possible mentors here. Factors of success will be >>> >>> * careful work on a good proposal, that takes on of the ideas on the wiki >>> but breaks it down in a realistic plan with multiple steps and shows a >>> good understanding of the problem. >>> >>> * demonstration of the required skillset via successful pull requests in >>> scikit-learn. >>> >>> Cheers, >>> >>> Ga?l >>> >>> >>> -- >>> Gael Varoquaux >>> Researcher, INRIA Parietal >>> NeuroSpin/CEA Saclay , Bat 145, 91191 Gif-sur-Yvette France >>> Phone: ++ 33-1-69-08-79-68 >>> http://gael-varoquaux.info http://twitter.com/GaelVaroqua >>> ux >>> _______________________________________________ >>> scikit-learn mailing list >>> scikit-learn at python.org >>> https://mail.python.org/mailman/listinfo/scikit-learn >>> >> >> >> _______________________________________________ >> scikit-learn mailing list >> scikit-learn at python.org >> https://mail.python.org/mailman/listinfo/scikit-learn >> >> -------------- next part -------------- An HTML attachment was scrubbed... URL: From thalasta at usc.edu Mon Feb 27 14:53:30 2017 From: thalasta at usc.edu (Pradeep Thalasta) Date: Mon, 27 Feb 2017 11:53:30 -0800 Subject: [scikit-learn] GSoC 2017 In-Reply-To: References: <20170227105835.GC2041043@phare.normalesup.org> Message-ID: Hi, I'm new to open source contribution. Can i take part in GSoc as well? On Mon, Feb 27, 2017 at 11:29 AM, Raghav R V wrote: > Or simply a public gist and importantly the link mailed here would do I > think... > > On 27 Feb 2017 8:28 p.m., "Raghav R V" wrote: > >> They can still edit a wiki page from their fork of scikit learn I think. >> So I'd suggest doing that and mailing to this thread, the link to their >> proposal... 
>> >> On 27 Feb 2017 6:55 p.m., "Nelson Liu" wrote: >> >>> In past years students made a page on the wiki with their proposal; this >>> isn't possible anymore due to GitHub permissions. Perhaps an alternative >>> method for getting feedback should be suggested on the introduction page? >>> >>> Nelson Liu >>> >>> On Mon, Feb 27, 2017 at 2:58 AM, Gael Varoquaux < >>> gael.varoquaux at normalesup.org> wrote: >>> >>>> Hi, >>>> >>>> Students have been inquiring about the GSoC (Google Summer of Code) with >>>> scikit-learn, and the core team has been quite silent about team. >>>> >>>> I am happy to announce that we will be taking part in the scikit-learn >>>> again. The reason that we decided to do this is to give a chance to the >>>> young, talented, and motivated students. >>>> >>>> Importantly, our most limiting resource is the time of our experienced >>>> developers. This is clearly visible from the number of pending pull >>>> requests. Hence, we need students to be very able and independent. This >>>> of course means that they will be getting supervision from mentors. Such >>>> supervision is crucial for moving forward with a good project, that >>>> delivers mergeable code. However, we will need the students to be very >>>> good at interacting efficiently with the mentors. Also, I should stress >>>> that we will be able to take only a very few numbers of students. >>>> >>>> With that said, let me introduce the 2017 GSoC for scikit-learn. We have >>>> set up a wiki page which summarizes the experiences from last year and >>>> the ideas for this year: >>>> https://github.com/scikit-learn/scikit-learn/wiki/Google-sum >>>> mer-of-code-(GSOC)-2017 >>>> >>>> >>>> Interested students should declare their interest on the mailing list, >>>> and discuss with possible mentors here. Factors of success will be >>>> >>>> * careful work on a good proposal, that takes on of the ideas on the >>>> wiki >>>> but breaks it down in a realistic plan with multiple steps and shows a >>>> good understanding of the problem. >>>> >>>> * demonstration of the required skillset via successful pull requests in >>>> scikit-learn. >>>> >>>> Cheers, >>>> >>>> Ga?l >>>> >>>> >>>> -- >>>> Gael Varoquaux >>>> Researcher, INRIA Parietal >>>> NeuroSpin/CEA Saclay , Bat 145, 91191 Gif-sur-Yvette France >>>> Phone: ++ 33-1-69-08-79-68 >>>> http://gael-varoquaux.info >>>> >>>> http://twitter.com/GaelVaroquaux >>>> >>>> _______________________________________________ >>>> scikit-learn mailing list >>>> scikit-learn at python.org >>>> https://mail.python.org/mailman/listinfo/scikit-learn >>>> >>>> >>> >>> >>> _______________________________________________ >>> scikit-learn mailing list >>> scikit-learn at python.org >>> https://mail.python.org/mailman/listinfo/scikit-learn >>> >>> >>> > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://urldefense.proofpoint.com/v2/url?u=https-3A__mail. > python.org_mailman_listinfo_scikit-2Dlearn&d=DwICAg&c= > clK7kQUTWtAVEOVIgvi0NU5BOUHhpN0H8p7CSfnc_gI&r=8wN- > jbuYw7VyipS2uLHiQg&m=WOCvB_ncbkX6zknItZ8JGw5QvsCBNqh2DCc_AxGKj10&s= > 2HaUcj6htbntv3V5UTTAgAtZk6luVMnqXA9vEOlfJ_k&e= > > -- Regards, Pradeep Thalasta -------------- next part -------------- An HTML attachment was scrubbed... 
URL:

From alexandre.gramfort at telecom-paristech.fr Mon Feb 27 16:20:40 2017
From: alexandre.gramfort at telecom-paristech.fr (Alexandre Gramfort)
Date: Mon, 27 Feb 2017 22:20:40 +0100
Subject: Re: [scikit-learn] Scipy 2017
In-Reply-To: <65ef1d1c-28a9-0772-6da1-3b54feb7cfd1@gmail.com>
References: <65ef1d1c-28a9-0772-6da1-3b54feb7cfd1@gmail.com>
Message-ID:

Hi Andy,

I'll be happy to share the stage with you for a tutorial.

Alex

On Tue, Feb 21, 2017 at 3:52 PM, Andreas Mueller wrote:
> Hey folks.
> Who's coming to scipy this year?
> Any volunteers for tutorials? I'm happy to be part of it but doing 7h by
> myself is a bit much ;)
>
> Andy
> _______________________________________________
> scikit-learn mailing list
> scikit-learn at python.org
> https://mail.python.org/mailman/listinfo/scikit-learn

From ludo25_90 at hotmail.com Mon Feb 27 17:13:04 2017
From: ludo25_90 at hotmail.com (Ludovico Coletta)
Date: Mon, 27 Feb 2017 22:13:04 +0000
Subject: Re: [scikit-learn] scikit-learn Digest, Vol 11, Issue 29
In-Reply-To: References: Message-ID:

Dear Sebastian,

thank you for the quick answer.

The data is stored in a numpy array (shape: 68, 24). We are using scikit-learn 0.18.1.

I saw that I wrote something wrong in the previous email. Your solution is indeed correct if we let scikit-learn decide how to manage the inner loop. This is what we did at the beginning. By doing so, we noticed that the classifier's performance decreases (in comparison to a non-optimised classifier). We would like to control the inner split, and we need to store the metrics for each fold.

The way we obtained the indices for the optimization, train and test phases is the equivalent of something like this:

rs = ShuffleSplit(n_splits=9, test_size=.25, random_state=42)
indices_for_each_cv = list(rs.split(data[0:11]))

Maybe I can make myself clearer if I write what we would like to achieve for the first cross-validation fold (I acknowledge that the previous email was quite a mess, sorry). Outer loop: 48 subjects for training, 20 for testing. Of the 48 training subjects, we would like to use 42 for optimization and 6 for testing the parameters. We got the indices so that we match the different scanners even in the optimization phase, but we are not able to pass them to the GridSearchCV object.

The following did not work. This is what we get --> ValueError: too many values to unpack

ii = 0
while ii < len(cv_final):
    # fit and predict
    clf = GridSearchCV(
        pipeline,
        param_grid=param_grid,
        verbose=1,
        cv=cv_final_nested[ii],  # how to split the 48 train subjects for the optimization
        scoring='roc_auc',
        n_jobs=-1)
    clf.fit(data[cv_final[ii][0]], y[cv_final[ii][0]])  # the train data of the outer loop for the first fold (i.e. the 48 subjects)
    predictions.append(clf.predict(data[cv_final[ii][1]]))  # predict the 20 subjects left out for testing in the outer loop
    ii = ii + 1

This, however, works and should be (more or less) what we would like to achieve with the above loop. However, extracting the best parameters for each fold in order to predict the left-out data seems impossible or very laborious.

clf = GridSearchCV(
    pipeline,
    param_grid=param_grid,
    verbose=1,
    cv=cv_final_nested,
    scoring='roc_auc',
    n_jobs=-1)

clf.fit(data, y)

Any hint on how to solve this problem would be really appreciated.

Best
Ludovico

-------------- next part --------------
An HTML attachment was scrubbed...
URL:
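To make the indexing concrete, here is a small self-contained sketch of this kind of nested loop. Everything in it is made up for illustration (toy data standing in for the 68 x 24 array, two hand-written inner folds, a reduced grid); cv_outer and cv_inner merely stand in for cv_final and cv_final_nested. The key detail is that the inner (train, test) index pairs handed to GridSearchCV must refer to rows of the array actually passed to fit, i.e. they are positions within the outer training subset, not row numbers of the full data set.

import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# toy data standing in for the real 68 x 24 array
data, y = make_classification(n_samples=68, n_features=24, random_state=42)

pipeline = Pipeline([('scl', StandardScaler()),
                     ('clf', SVC(kernel='linear', random_state=42))])
param_grid = {'clf__C': np.logspace(-3, 5, 9)}

# one outer fold: 48 rows for training, 20 rows held out for testing
cv_outer = [(np.arange(0, 48), np.arange(48, 68))]

# inner folds for that outer fold, expressed as positions 0..47 within data[outer_train]
cv_inner = [[(np.arange(12, 48), np.arange(0, 12)),
             (np.arange(0, 36), np.arange(36, 48))]]

best_params, predictions = [], []
for (outer_train, outer_test), inner_folds in zip(cv_outer, cv_inner):
    clf = GridSearchCV(pipeline, param_grid=param_grid,
                       cv=inner_folds, scoring='roc_auc')
    clf.fit(data[outer_train], y[outer_train])
    best_params.append(clf.best_params_)               # parameters chosen on the inner folds
    predictions.append(clf.predict(data[outer_test]))  # refit best model, outer test set

print(best_params)

Collected this way, best_params_ gives the winning parameters of every outer fold, and the refit model predicts the held-out outer test set, which seems to be what the loop above is after. One likely cause of the "too many values to unpack" error is passing a single (train, test) tuple as cv instead of a sequence of such pairs; one likely cause of "index out of bounds" is inner indices that still refer to the full data set.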
From gael.varoquaux at normalesup.org Mon Feb 27 17:19:33 2017
From: gael.varoquaux at normalesup.org (Gael Varoquaux)
Date: Mon, 27 Feb 2017 23:19:33 +0100
Subject: Re: [scikit-learn] scikit-learn Digest, Vol 11, Issue 29
In-Reply-To: References: Message-ID: <20170227221933.GC2369856@phare.normalesup.org>

On Mon, Feb 27, 2017 at 10:13:04PM +0000, Ludovico Coletta wrote:
> The data is stored in a numpy array (shape: 68, 24). We are using scikit-learn 0.18.1.
> I saw that I wrote something wrong in the previous email. Your solution is indeed
> correct if we let scikit-learn decide how to manage the inner loop. This is what we
> did at the beginning. By doing so, we noticed that the classifier's performance
> decreases (in comparison to a non-optimised classifier).

With 68 samples, it is not that surprising that model selection with cross-validation is not able to select a good model. We found the same problem in brain imaging data [1], and it's an intrinsic problem due to small sample sizes: cross-validation is just not very accurate in these settings.

Gaël

[1] https://arxiv.org/abs/1606.05201

From joel.nothman at gmail.com Mon Feb 27 17:34:43 2017
From: joel.nothman at gmail.com (Joel Nothman)
Date: Tue, 28 Feb 2017 09:34:43 +1100
Subject: Re: [scikit-learn] GSoC 2017
In-Reply-To: References: <20170227105835.GC2041043@phare.normalesup.org>
Message-ID:

Hi Pradeep, we would usually only accept candidates who have shown their proficiency and understanding of our package and processes by making some contributions prior to this stage. You are certainly welcome to aim for GSoC 2018 by beginning to develop your familiarity and rapport now.
The reason that we decided to do this is to give a chance to the >>>>> young, talented, and motivated students. >>>>> >>>>> Importantly, our most limiting resource is the time of our experienced >>>>> developers. This is clearly visible from the number of pending pull >>>>> requests. Hence, we need students to be very able and independent. This >>>>> of course means that they will be getting supervision from mentors. >>>>> Such >>>>> supervision is crucial for moving forward with a good project, that >>>>> delivers mergeable code. However, we will need the students to be very >>>>> good at interacting efficiently with the mentors. Also, I should stress >>>>> that we will be able to take only a very few numbers of students. >>>>> >>>>> With that said, let me introduce the 2017 GSoC for scikit-learn. We >>>>> have >>>>> set up a wiki page which summarizes the experiences from last year and >>>>> the ideas for this year: >>>>> https://github.com/scikit-learn/scikit-learn/wiki/Google-sum >>>>> mer-of-code-(GSOC)-2017 >>>>> >>>>> >>>>> Interested students should declare their interest on the mailing list, >>>>> and discuss with possible mentors here. Factors of success will be >>>>> >>>>> * careful work on a good proposal, that takes on of the ideas on the >>>>> wiki >>>>> but breaks it down in a realistic plan with multiple steps and shows >>>>> a >>>>> good understanding of the problem. >>>>> >>>>> * demonstration of the required skillset via successful pull requests >>>>> in >>>>> scikit-learn. >>>>> >>>>> Cheers, >>>>> >>>>> Ga?l >>>>> >>>>> >>>>> -- >>>>> Gael Varoquaux >>>>> Researcher, INRIA Parietal >>>>> NeuroSpin/CEA Saclay , Bat 145, 91191 Gif-sur-Yvette France >>>>> Phone: ++ 33-1-69-08-79-68 >>>>> http://gael-varoquaux.info >>>>> >>>>> http://twitter.com/GaelVaroquaux >>>>> >>>>> _______________________________________________ >>>>> scikit-learn mailing list >>>>> scikit-learn at python.org >>>>> https://mail.python.org/mailman/listinfo/scikit-learn >>>>> >>>>> >>>> >>>> >>>> _______________________________________________ >>>> scikit-learn mailing list >>>> scikit-learn at python.org >>>> https://mail.python.org/mailman/listinfo/scikit-learn >>>> >>>> >>>> >> _______________________________________________ >> scikit-learn mailing list >> scikit-learn at python.org >> https://urldefense.proofpoint.com/v2/url?u=https-3A__mail.py >> thon.org_mailman_listinfo_scikit-2Dlearn&d=DwICAg&c=clK7kQUT >> WtAVEOVIgvi0NU5BOUHhpN0H8p7CSfnc_gI&r=8wN-jbuYw7VyipS2uLHiQg >> &m=WOCvB_ncbkX6zknItZ8JGw5QvsCBNqh2DCc_AxGKj10&s=2HaUcj6htbn >> tv3V5UTTAgAtZk6luVMnqXA9vEOlfJ_k&e= >> >> > > > -- > Regards, > Pradeep Thalasta > > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From thalasta at usc.edu Mon Feb 27 17:46:36 2017 From: thalasta at usc.edu (Pradeep Thalasta) Date: Mon, 27 Feb 2017 14:46:36 -0800 Subject: [scikit-learn] GSoC 2017 In-Reply-To: References: <20170227105835.GC2041043@phare.normalesup.org> Message-ID: Thanks Joel, will start with the contribution soon. On 27 Feb 2017 2:35 pm, "Joel Nothman" wrote: Hi Pradeep, we would usually only accept candidates who have shown their proficiency and understanding of our package and processes by making some contributions prior to this stage. you are certainly welcome to aim for GSoC 2018 by beginning to develop your familiarity and rapport now. 
cheers, Joel On 28 Feb 2017 7:01 am, "Pradeep Thalasta" wrote: > Hi, > I'm new to open source contribution. Can i take part in GSoc as well? > > > On Mon, Feb 27, 2017 at 11:29 AM, Raghav R V wrote: > >> Or simply a public gist and importantly the link mailed here would do I >> think... >> >> On 27 Feb 2017 8:28 p.m., "Raghav R V" wrote: >> >>> They can still edit a wiki page from their fork of scikit learn I think. >>> So I'd suggest doing that and mailing to this thread, the link to their >>> proposal... >>> >>> On 27 Feb 2017 6:55 p.m., "Nelson Liu" wrote: >>> >>>> In past years students made a page on the wiki with their proposal; >>>> this isn't possible anymore due to GitHub permissions. Perhaps an >>>> alternative method for getting feedback should be suggested on the >>>> introduction page? >>>> >>>> Nelson Liu >>>> >>>> On Mon, Feb 27, 2017 at 2:58 AM, Gael Varoquaux < >>>> gael.varoquaux at normalesup.org> wrote: >>>> >>>>> Hi, >>>>> >>>>> Students have been inquiring about the GSoC (Google Summer of Code) >>>>> with >>>>> scikit-learn, and the core team has been quite silent about team. >>>>> >>>>> I am happy to announce that we will be taking part in the scikit-learn >>>>> again. The reason that we decided to do this is to give a chance to the >>>>> young, talented, and motivated students. >>>>> >>>>> Importantly, our most limiting resource is the time of our experienced >>>>> developers. This is clearly visible from the number of pending pull >>>>> requests. Hence, we need students to be very able and independent. This >>>>> of course means that they will be getting supervision from mentors. >>>>> Such >>>>> supervision is crucial for moving forward with a good project, that >>>>> delivers mergeable code. However, we will need the students to be very >>>>> good at interacting efficiently with the mentors. Also, I should stress >>>>> that we will be able to take only a very few numbers of students. >>>>> >>>>> With that said, let me introduce the 2017 GSoC for scikit-learn. We >>>>> have >>>>> set up a wiki page which summarizes the experiences from last year and >>>>> the ideas for this year: >>>>> https://github.com/scikit-learn/scikit-learn/wiki/Google-sum >>>>> mer-of-code-(GSOC)-2017 >>>>> >>>>> >>>>> Interested students should declare their interest on the mailing list, >>>>> and discuss with possible mentors here. Factors of success will be >>>>> >>>>> * careful work on a good proposal, that takes on of the ideas on the >>>>> wiki >>>>> but breaks it down in a realistic plan with multiple steps and shows >>>>> a >>>>> good understanding of the problem. >>>>> >>>>> * demonstration of the required skillset via successful pull requests >>>>> in >>>>> scikit-learn. 
>>>>> Cheers,
>>>>>
>>>>> Gaël
>>>>>
>>>>> --
>>>>> Gael Varoquaux
>>>>> Researcher, INRIA Parietal
>>>>> NeuroSpin/CEA Saclay , Bat 145, 91191 Gif-sur-Yvette France
>>>>> Phone: ++ 33-1-69-08-79-68
>>>>> http://gael-varoquaux.info
>>>>> http://twitter.com/GaelVaroquaux
>>>>> _______________________________________________
>>>>> scikit-learn mailing list
>>>>> scikit-learn at python.org
>>>>> https://mail.python.org/mailman/listinfo/scikit-learn
>>>>
>>>> _______________________________________________
>>>> scikit-learn mailing list
>>>> scikit-learn at python.org
>>>> https://mail.python.org/mailman/listinfo/scikit-learn
>>
>> _______________________________________________
>> scikit-learn mailing list
>> scikit-learn at python.org
>> https://mail.python.org/mailman/listinfo/scikit-learn
>
> --
> Regards,
> Pradeep Thalasta
>
> _______________________________________________
> scikit-learn mailing list
> scikit-learn at python.org
> https://mail.python.org/mailman/listinfo/scikit-learn

_______________________________________________
scikit-learn mailing list
scikit-learn at python.org
https://mail.python.org/mailman/listinfo/scikit-learn

-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From se.raschka at gmail.com Mon Feb 27 17:47:02 2017
From: se.raschka at gmail.com (Sebastian Raschka)
Date: Mon, 27 Feb 2017 17:47:02 -0500
Subject: Re: [scikit-learn] Control over the inner loop in GridSearchCV
In-Reply-To: References: Message-ID:

Hi, Ludovico,

my bet is that there is an issue with the format of the object that you pass to the `cv` param of the GridSearchCV. What you need is e.g. "an iterable yielding train, test splits". Or more specifically, say you have a generator, my_gen, that is yielding these splits; the way the indices would be organized would be:

list(my_gen)[0][0]  # stores an array of indices used as training fold in the 1st round
                    # e.g., sth like np.array([0, 1, 2, 3, 4, 5, 6, ...])
list(my_gen)[0][1]  # stores an array of indices used as test fold in the 1st round
                    # e.g., sth like np.array([102, 103, 104, 105, 106, 107, 108, ...])
list(my_gen)[1][0]  # stores an array of indices used as training fold in the 2nd round
list(my_gen)[1][1]  # stores an array of indices used as test fold in the 2nd round
list(my_gen)[2][0]  # stores an array of indices used as training fold in the 3rd round
list(my_gen)[2][1]  # stores an array of indices used as test fold in the 3rd round

Hope that helps.

Best,
Sebastian

> The following did not work. This is what we get --> ValueError: too many values to unpack

> On Feb 27, 2017, at 5:13 PM, Ludovico Coletta wrote:
>
> Dear Sebastian,
>
> thank you for the quick answer.
>
> The data is stored in a numpy array (shape: 68, 24). We are using scikit 18.1
>
> I saw that I wrote something wrong in previous email. Your solution is indeed correct if we leave Scikit decide how to manage the inner loop. This is what we did at the beginning.
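One detail worth spelling out next to Sebastian's description: the object given to cv has to be a sequence of (train_indices, test_indices) pairs -- a single (train, test) tuple will itself be iterated over and mis-unpacked, which is one plausible way to end up with "too many values to unpack" -- and the indices inside each pair must refer to the rows of the array that fit receives. If the nested folds were defined in terms of the full data set, they need to be remapped to positions within the outer training subset first. A hypothetical sketch, with all index values invented for illustration:

import numpy as np

# rows of the full data set that form the outer training set
outer_train = np.array([0, 2, 3, 5, 7, 8, 9, 11])
# one inner fold, also expressed as rows of the *full* data set
inner_train = np.array([0, 2, 3, 5, 7, 8])
inner_test = np.array([9, 11])

# remap to positions within data[outer_train] before handing them to GridSearchCV
position = {row: pos for pos, row in enumerate(outer_train)}
inner_train_rel = np.array([position[r] for r in inner_train])  # -> [0 1 2 3 4 5]
inner_test_rel = np.array([position[r] for r in inner_test])    # -> [6 7]

# cv=[(inner_train_rel, inner_test_rel), ...] is then consistent with
# fitting on data[outer_train], y[outer_train].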
By doing so, we noticed that the classifier's perfomance decrease (in comparison to a non-optimised classifier). We would like to control the inner split and we need to store the metrics for each fold > > The way we obtained the indices for the optimization, train and test phase is the equivalent of something like that: > > rs = ShuffleSplit(n_splits=9, test_size=.25,random_state=42) > indices_for_each_cv = list(rs.split(data[0:11])) > > Maybe I can make myself clearer if I write what we would like to achieve for the first cross validation fold (I acknowledge that the previous email was quite a mess, sorry). Outer loop: 48 for training, 20 for testing. Of the 48 training subjects, we would like to use 42 for optimization, 6 for testing the parameters. We got the indices so that we match the different scanners even in the optimization phase, but we are not able to pass them to GridSearch object. > > The following did not work. This is what we get --> ValueError: too many values to unpack > > ii = 0 > > while ii < len(cv_final): > # fit and predict > > clf = GridSearchCV( > pipeline, > param_grid=param_grid, > verbose=1, > cv = cv_final_nested[ii], # how to split the 48 train subjects for the optimization > scoring='roc_auc', > n_jobs= -1) > > clf.fit(data[cv_final[ii][0]], y[cv_final[ii][0]]) # the train data of the outer loop for the first (i.e. the 48 subjects) > predictions.append(clf.predict(data[cv_final[ii][1]])) # Predict the 20 subjects left out for test in the outer loop > > ii = ii + 1 > > This however works and should be (more or less) what we would like to achieve with the above loop. However, extracting the best parameters for each fold in order to predict the left out data seems impossible or very laborious. > > clf = GridSearchCV( > pipeline, > > param_grid=param_grid, > verbose=1, > cv = cv_final_nested, > scoring='roc_auc', > n_jobs= -1) > > clf.fit(data,y) > > > Any hint on how to solve this problem would be really appreciated. > > Best > Ludovico > > > > > Da: scikit-learn per conto di scikit-learn-request at python.org > Inviato: luned? 27 febbraio 2017 17.27 > A: scikit-learn at python.org > Oggetto: scikit-learn Digest, Vol 11, Issue 29 > > Send scikit-learn mailing list submissions to > scikit-learn at python.org > > To subscribe or unsubscribe via the World Wide Web, visit > https://mail.python.org/mailman/listinfo/scikit-learn > scikit-learn Info Page - Python > mail.python.org > To see the collection of prior postings to the list, visit the scikit-learn Archives. Using scikit-learn: To post a message to all the list members ... > > > or, via email, send a message with subject or body 'help' to > scikit-learn-request at python.org > > You can reach the person managing the list at > scikit-learn-owner at python.org > > When replying, please edit your Subject line so it is more specific > than "Re: Contents of scikit-learn digest..." > > > Today's Topics: > > 1. GSoC 2017 (Gael Varoquaux) > 2. Control over the inner loop in GridSearchCV (Ludovico Coletta) > 3. 
> ------------------------------
>
> Message: 1
> Date: Mon, 27 Feb 2017 11:58:35 +0100
> From: Gael Varoquaux
> To: Scikit-learn user and developer mailing list
> Subject: [scikit-learn] GSoC 2017
> Message-ID: <20170227105835.GC2041043 at phare.normalesup.org>
>
> Hi,
>
> Students have been inquiring about the GSoC (Google Summer of Code) with
> scikit-learn, and the core team has been quite silent about it.
>
> I am happy to announce that we will be taking part in the GSoC again.
> The reason that we decided to do this is to give a chance to young,
> talented, and motivated students.
>
> Importantly, our most limiting resource is the time of our experienced
> developers. This is clearly visible from the number of pending pull
> requests. Hence, we need students to be very able and independent. They
> will of course be getting supervision from mentors; such supervision is
> crucial for moving forward with a good project that delivers mergeable
> code. However, we will need the students to be very good at interacting
> efficiently with the mentors. Also, I should stress that we will only be
> able to take a very small number of students.
>
> With that said, let me introduce the 2017 GSoC for scikit-learn. We have
> set up a wiki page which summarizes the experiences from last year and
> the ideas for this year:
> https://github.com/scikit-learn/scikit-learn/wiki/Google-summer-of-code-(GSOC)-2017
>
> Interested students should declare their interest on the mailing list
> and discuss with possible mentors here. Factors of success will be
>
> * careful work on a good proposal that takes one of the ideas on the wiki
>   but breaks it down into a realistic plan with multiple steps and shows a
>   good understanding of the problem.
>
> * demonstration of the required skillset via successful pull requests in
>   scikit-learn.
>
> Cheers,
>
> Gaël
>
> --
> Gael Varoquaux
> Researcher, INRIA Parietal
> NeuroSpin/CEA Saclay, Bat 145, 91191 Gif-sur-Yvette France
> Phone: ++ 33-1-69-08-79-68
> http://gael-varoquaux.info
> http://twitter.com/GaelVaroquaux
>
> ------------------------------
>
> Message: 2
> Date: Mon, 27 Feb 2017 14:27:59 +0000
> From: Ludovico Coletta
> To: "scikit-learn at python.org"
> Subject: [scikit-learn] Control over the inner loop in GridSearchCV
>
> Dear Scikit experts,
>
> we are stuck with GridSearchCV. Nobody else was able/wanted to help us; we hope you will.
>
> We are analysing neuroimaging data coming from 3 different MRI scanners, where for each scanner we have a healthy group and a disease group. We would like to merge the data from the 3 different scanners in order to classify the healthy subjects from the ones who have the disease.
>
> The problem is that we can almost perfectly classify the subjects according to the scanner (e.g. the healthy subjects from scanner 1 and scanner 2). We are using a custom cross-validation schema to account for the different scanners: when no hyperparameter (SVM) optimization is performed, everything is straightforward. Problems arise when we would like to perform hyperparameter optimization: in this case we need to balance for the different scanners in the optimization phase as well. We also found a custom cv schema for this, but we are not able to pass it to the GridSearchCV object. We would like to get something like the following:
>
> pipeline = Pipeline([('scl', StandardScaler()),
>                      ('sel', RFE(estimator, step=0.2)),
>                      ('clf', SVC(probability=True, random_state=42))])
>
> param_grid = [{'sel__n_features_to_select': [22, 15, 10, 2],
>                'clf__C': np.logspace(-3, 5, 100),
>                'clf__kernel': ['linear']}]
>
> clf = GridSearchCV(pipeline,
>                    param_grid=param_grid,
>                    verbose=1,
>                    scoring='roc_auc',
>                    n_jobs=-1)
>
> # cv_final is the custom cv for the outer loop (9 folds)
>
> ii = 0
> while ii < len(cv_final):
>     # fit and predict
>     clf.fit(data[?], y[?])
>     predictions.append(clf.predict(data[cv_final[ii][1]]))  # outer test data
>     ii = ii + 1
>
> We tried almost everything. When we define clf in the loop, pass the i-th cv_nested as the cv argument, and fit it on the training data of the i-th custom_cv fold, we get a "Too many values to unpack" error. On the other hand, when we try to pass the nested i-th cv fold as the cv argument for clf and call fit on the same cv_nested fold, we get an "Index out of bound" error.
>
> Two questions:
>
> 1) Is there any workaround to avoid the split when clf is called without a cv argument?
>
> 2) We suppose that for hyperparameter optimization the test data is removed from the dataset and a new dataset is created. Is this true? In this case we only have to adjust the indices accordingly.
>
> Thank you for your time and sorry for the long text
>
> Ludovico
>
> ------------------------------
>
> Message: 3
> Date: Mon, 27 Feb 2017 11:27:24 -0500
> From: Sebastian Raschka
> To: Scikit-learn user and developer mailing list
> Subject: Re: [scikit-learn] Control over the inner loop in GridSearchCV
>
> Hi Ludovico,
>
> what format (shape) is data in? Are these the arrays from a KFold iterator? In this case, the "question marks" in your code snippet should simply be the train and validation subset indices generated by the KFold generator, e.g.,
>
> skfold = StratifiedKFold(y=y_train, n_folds=5, shuffle=True, random_state=1)
> for outer_train_idx, outer_valid_idx in skfold:
>     ...
>     gridsearch_object.fit(X_train[outer_train_idx], y_train[outer_train_idx])
>
> > On the other hand, when we try to pass the nested i-th cv fold as the cv argument for clf, and we call fit on the same cv_nested fold, we get an "Index out of bound" error.
>
> Are you using a version older than scikit-learn 0.18? Technically, GridSearchCV, RandomizedSearchCV, cross_val_score, etc. should all support iterables of train and test indices, e.g.:
>
> outer_cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=1)
>
> for name, gs_est in sorted(gridcvs.items()):
>     nested_score = cross_val_score(gs_est,
>                                    X=X_train,
>                                    y=y_train,
>                                    cv=outer_cv,
>                                    n_jobs=1)
>
> Best,
> Sebastian
>
> ------------------------------
>
> End of scikit-learn Digest, Vol 11, Issue 29
> ********************************************
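For reference, a minimal, self-contained sketch of the cv format discussed above: GridSearchCV accepts an explicit, precomputed list of (train_indices, test_indices) pairs, so a custom splitting scheme can be built once and passed as-is. The data below is synthetic and the variable names (inner_cv, param_grid) are illustrative only, not taken from the thread.

import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, StratifiedKFold
from sklearn.svm import SVC

# Synthetic stand-in for a small data set (68 samples x 24 features).
X, y = make_classification(n_samples=68, n_features=24, random_state=0)

# Precompute the inner splits as a list of (train_idx, test_idx) index arrays.
# Any custom scheme works here, as long as it yields index pairs of this form.
inner_cv = list(StratifiedKFold(n_splits=5, shuffle=True, random_state=42).split(X, y))

print(inner_cv[0][0][:5])  # training indices of the 1st inner fold
print(inner_cv[0][1][:5])  # test indices of the 1st inner fold

param_grid = {'C': np.logspace(-3, 3, 7)}
clf = GridSearchCV(SVC(kernel='linear'), param_grid=param_grid, cv=inner_cv)
clf.fit(X, y)
print(clf.best_params_)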
> > > > > > ------------------------------ > > Subject: Digest Footer > > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn > scikit-learn Info Page - Python > mail.python.org > To see the collection of prior postings to the list, visit the scikit-learn Archives. Using scikit-learn: To post a message to all the list members ... > > > > > ------------------------------ > > End of scikit-learn Digest, Vol 11, Issue 29 > ******************************************** > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn From ludo25_90 at hotmail.com Mon Feb 27 18:56:59 2017 From: ludo25_90 at hotmail.com (Ludovico Coletta) Date: Mon, 27 Feb 2017 23:56:59 +0000 Subject: [scikit-learn] R: scikit-learn Digest, Vol 11, Issue 32 In-Reply-To: References: Message-ID: Dear Gael, This will probably be the case here, but we would like to exclude the scanner-factor from the possible explanations. We are still lucky that we are not in situation where the number of features >> number of samples. Best Ludovico -------- Messaggio originale -------- Da: scikit-learn-request at python.org Data: 27/02/17 23:49 (GMT+01:00) A: scikit-learn at python.org Oggetto: scikit-learn Digest, Vol 11, Issue 32 Send scikit-learn mailing list submissions to scikit-learn at python.org To subscribe or unsubscribe via the World Wide Web, visit https://mail.python.org/mailman/listinfo/scikit-learn or, via email, send a message with subject or body 'help' to scikit-learn-request at python.org You can reach the person managing the list at scikit-learn-owner at python.org When replying, please edit your Subject line so it is more specific than "Re: Contents of scikit-learn digest..." Today's Topics: 1. Re: scikit-learn Digest, Vol 11, Issue 29 (Gael Varoquaux) 2. Re: GSoC 2017 (Joel Nothman) 3. Re: GSoC 2017 (Pradeep Thalasta) ---------------------------------------------------------------------- Message: 1 Date: Mon, 27 Feb 2017 23:19:33 +0100 From: Gael Varoquaux To: Scikit-learn user and developer mailing list Subject: Re: [scikit-learn] scikit-learn Digest, Vol 11, Issue 29 Message-ID: <20170227221933.GC2369856 at phare.normalesup.org> Content-Type: text/plain; charset=iso-8859-1 On Mon, Feb 27, 2017 at 10:13:04PM +0000, Ludovico Coletta wrote: > The data is stored in a numpy array (shape: 68, 24). We are using scikit 18.1 > I saw that I wrote something wrong in previous email. Your solution is indeed > correct if we leave Scikit decide how to manage the inner loop. This is what we > did at the beginning. By doing so, we noticed that the classifier's perfomance > decrease (in comparison to a non-optimised classifier). With 68 samples, it is not that surprising the model-selection with cross-validation is not able to select a good model. We found the same problem in brain imaging data [1], and it's an intrinsic problem due to small sample sizes: cross-validation is just not very accurate in these settings. 
------------------------------

Message: 2
Date: Tue, 28 Feb 2017 09:34:43 +1100
From: Joel Nothman
To: Scikit-learn user and developer mailing list
Subject: Re: [scikit-learn] GSoC 2017

Hi Pradeep,

we would usually only accept candidates who have shown their proficiency and understanding of our package and processes by making some contributions prior to this stage. You are certainly welcome to aim for GSoC 2018 by beginning to develop your familiarity and rapport now.

cheers,
Joel

On 28 Feb 2017 7:01 am, "Pradeep Thalasta" wrote:
> Hi,
> I'm new to open source contribution. Can I take part in GSoC as well?
------------------------------

Message: 3
Date: Mon, 27 Feb 2017 14:46:36 -0800
From: Pradeep Thalasta
To: Scikit-learn user and developer mailing list
Subject: Re: [scikit-learn] GSoC 2017

Thanks Joel, will start with the contribution soon.

------------------------------

End of scikit-learn Digest, Vol 11, Issue 32
********************************************
From ludo25_90 at hotmail.com Mon Feb 27 19:28:19 2017
From: ludo25_90 at hotmail.com (Ludovico Coletta)
Date: Tue, 28 Feb 2017 00:28:19 +0000
Subject: [scikit-learn] scikit-learn Digest, Vol 11, Issue 33

Dear Sebastian,

this is exactly what we did, but it is not working. cv_final[0][0] and cv_final[0][1] hold the training and test indices for the first fold (outer loop), while cv_final_nested[0][0] and cv_final_nested[0][1] hold the indices for the parameter optimization of the first fold (inner loop, training and test respectively). You are probably right, there must be a (I hope so) little error somewhere. I will try again in the next days.

Thank you for your time
Ludovico
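One common source of the errors described in this sub-thread — offered here as a hedged guess, not a confirmed diagnosis — is that the inner (nested) indices are expressed relative to the full data set, while GridSearchCV is fitted on the outer training subset only, so the inner indices need to address rows of that subset. A minimal sketch under that assumption, on synthetic data and with illustrative names (cv_outer, cv_inner, not the thread's actual variables), which also records the best parameters of each outer fold:

import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, StratifiedKFold
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = make_classification(n_samples=68, n_features=24, random_state=0)

pipeline = Pipeline([('scl', StandardScaler()),
                     ('clf', SVC(kernel='linear', probability=True, random_state=42))])
param_grid = {'clf__C': np.logspace(-3, 3, 7)}

# Outer split: evaluation folds.
cv_outer = list(StratifiedKFold(n_splits=3, shuffle=True, random_state=0).split(X, y))

predictions, best_params = [], []
for train_idx, test_idx in cv_outer:
    X_tr, y_tr = X[train_idx], y[train_idx]
    # Inner splits are generated on the outer training subset, so their
    # indices refer to rows of X_tr rather than rows of X.
    cv_inner = list(StratifiedKFold(n_splits=3, shuffle=True, random_state=1).split(X_tr, y_tr))
    clf = GridSearchCV(pipeline, param_grid=param_grid, cv=cv_inner, scoring='roc_auc')
    clf.fit(X_tr, y_tr)
    best_params.append(clf.best_params_)          # best parameters for this outer fold
    predictions.append(clf.predict(X[test_idx]))  # refit-on-best model predicts the held-out fold

print(best_params)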
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

------------------------------

Subject: Digest Footer

_______________________________________________
scikit-learn mailing list
scikit-learn at python.org
https://mail.python.org/mailman/listinfo/scikit-learn

------------------------------

End of scikit-learn Digest, Vol 11, Issue 32
********************************************

-------------- next part --------------
An HTML attachment was scrubbed...
URL:

------------------------------

Subject: Digest Footer

_______________________________________________
scikit-learn mailing list
scikit-learn at python.org
https://mail.python.org/mailman/listinfo/scikit-learn

------------------------------

End of scikit-learn Digest, Vol 11, Issue 33
********************************************

-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From rovik05 at gmail.com Mon Feb 27 22:43:05 2017
From: rovik05 at gmail.com (Rohan Koodli)
Date: Mon, 27 Feb 2017 19:43:05 -0800
Subject: [scikit-learn] Clustering 4 dimensional data
Message-ID:

I'm having trouble understanding how to cluster multidimensional data. Specifically, a 4 dimensional array.

test = [[[[3,10],[1,5],[3,18]],[[3,1],[0,0],[0,0]],[[3,3],[1,5],[0,0]]],[[[1,5],[2,7],[0,0]],[[1,7],[0,0],[0,0]],[[0,0],[0,0],[0,0]]]]

from sklearn import mixture
gmm = mixture.GMM()
gmm.fit(test)

The code returns the following error:

"Found array with dim 4. GMM expected <= 2."

Do I need to change the way my data is formatted? Is there a way of doing clustering on 4 dimensional data?

-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From joel.nothman at gmail.com Mon Feb 27 22:53:02 2017
From: joel.nothman at gmail.com (Joel Nothman)
Date: Tue, 28 Feb 2017 14:53:02 +1100
Subject: [scikit-learn] Clustering 4 dimensional data
In-Reply-To:
References:
Message-ID:

What do your four dimensions mean? Can you reshape your data such that it can be seen as a collection of 1d vectors drawn independently from some distribution?

On 28 February 2017 at 14:43, Rohan Koodli wrote:

> I'm having trouble understanding how to cluster multidimensional data. Specifically, a 4 dimensional array.
>
> test = [[[[3,10],[1,5],[3,18]],[[3,1],[0,0],[0,0]],[[3,3],[1,5],[0,0]]],[[[1,5],[2,7],[0,0]],[[1,7],[0,0],[0,0]],[[0,0],[0,0],[0,0]]]]
>
> from sklearn import mixture
> gmm = mixture.GMM()
> gmm.fit(test)
>
> The code returns the following error:
>
> "Found array with dim 4. GMM expected <= 2."
>
> Do I need to change the way my data is formatted? Is there a way of doing clustering on 4 dimensional data?
>
> _______________________________________________
> scikit-learn mailing list
> scikit-learn at python.org
> https://mail.python.org/mailman/listinfo/scikit-learn

-------------- next part --------------
An HTML attachment was scrubbed...
URL:
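Joel's suggestion above -- reshaping the data into a collection of 1d vectors -- might look like the following minimal sketch. This is illustrative only: which reshape is right depends entirely on what the four dimensions mean, and here it is simply assumed that each innermost [a, b] pair is one two-feature observation. Note also that mixture.GMM is deprecated as of scikit-learn 0.18 in favour of mixture.GaussianMixture.

import numpy as np
from sklearn import mixture

test = [[[[3, 10], [1, 5], [3, 18]], [[3, 1], [0, 0], [0, 0]], [[3, 3], [1, 5], [0, 0]]],
        [[[1, 5], [2, 7], [0, 0]], [[1, 7], [0, 0], [0, 0]], [[0, 0], [0, 0], [0, 0]]]]

# Assumption: each innermost pair is one observation with two features.
X = np.asarray(test, dtype=float)   # shape (2, 3, 3, 2)
X = X.reshape(-1, X.shape[-1])      # shape (18, 2): 18 samples, 2 features

# GaussianMixture replaces the deprecated GMM estimator (scikit-learn >= 0.18).
gmm = mixture.GaussianMixture(n_components=2, random_state=0)
gmm.fit(X)
print(gmm.predict(X))               # one cluster label per 2-feature sample

If the four dimensions are instead meant to describe two whole objects, the alternative would be to flatten each outer item into a single 18-dimensional vector (X.reshape(2, -1)); two samples are far too few to cluster, which is why the meaning of the dimensions matters.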
From john_ladasky at sbcglobal.net Mon Feb 27 23:06:06 2017
From: john_ladasky at sbcglobal.net (John Ladasky)
Date: Mon, 27 Feb 2017 20:06:06 -0800
Subject: [scikit-learn] Clustering 4 dimensional data
In-Reply-To:
Message-ID: <5e48fccb-4c38-4dde-9235-c0b96126b3e7@email.android.com>

An HTML attachment was scrubbed...
URL:

From dmitrii.ignatov at gmail.com Mon Feb 27 23:35:01 2017
From: dmitrii.ignatov at gmail.com (Dmitry Ignatov)
Date: Tue, 28 Feb 2017 07:35:01 +0300
Subject: [scikit-learn] Clustering 4 dimensional data
In-Reply-To:
References:
Message-ID:

Sometimes, when you need to find homogeneous subtensors, you can refer to this as multimodal clustering, an extension of biclustering. I cannot see clearly whether that is the case here.

On 28 Feb 2017 at 06:54, "Joel Nothman" wrote:

What do your four dimensions mean? Can you reshape your data such that it can be seen as a collection of 1d vectors drawn independently from some distribution?

On 28 February 2017 at 14:43, Rohan Koodli wrote:

> I'm having trouble understanding how to cluster multidimensional data. Specifically, a 4 dimensional array.
>
> test = [[[[3,10],[1,5],[3,18]],[[3,1],[0,0],[0,0]],[[3,3],[1,5],[0,0]]],[[[1,5],[2,7],[0,0]],[[1,7],[0,0],[0,0]],[[0,0],[0,0],[0,0]]]]
>
> from sklearn import mixture
> gmm = mixture.GMM()
> gmm.fit(test)
>
> The code returns the following error:
>
> "Found array with dim 4. GMM expected <= 2."
>
> Do I need to change the way my data is formatted? Is there a way of doing clustering on 4 dimensional data?
>
> _______________________________________________
> scikit-learn mailing list
> scikit-learn at python.org
> https://mail.python.org/mailman/listinfo/scikit-learn

_______________________________________________
scikit-learn mailing list
scikit-learn at python.org
https://mail.python.org/mailman/listinfo/scikit-learn

-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From t3kcit at gmail.com Mon Feb 27 23:50:39 2017
From: t3kcit at gmail.com (Andreas Mueller)
Date: Mon, 27 Feb 2017 23:50:39 -0500
Subject: [scikit-learn] Women in Machine Learning and Data Science Sprint next Weekend (also call for help)
Message-ID: <314038f1-d325-1e0d-8399-aea6a4a47d95@gmail.com>

Hey all.

There's gonna be an introductory scikit-learn sprint at NYC on Saturday that a local Women's DS/ML group is organizing with me. I feel like we could do a bit more to improve (gender) diversity in the scipy/pydata space, and so I think this will be cool.

If anyone wants to review code on Saturday, that would be a great help for people getting started. Also, if anyone wants to help beforehand, making sure there is enough "easy" and "need contributor" issues tagged is important, as well as ensuring that all the tagged issues actually still need contributors.

I'll try to do as much of these as I can but my time is limited these days :(

Thanks y'all!

Andy

From jmschreiber91 at gmail.com Mon Feb 27 23:58:26 2017
From: jmschreiber91 at gmail.com (Jacob Schreiber)
Date: Mon, 27 Feb 2017 20:58:26 -0800
Subject: [scikit-learn] Women in Machine Learning and Data Science Sprint next Weekend (also call for help)
In-Reply-To: <314038f1-d325-1e0d-8399-aea6a4a47d95@gmail.com>
References: <314038f1-d325-1e0d-8399-aea6a4a47d95@gmail.com>
Message-ID:

I will try to carve out some time Saturday to review PRs. What time is it occurring?

On Mon, Feb 27, 2017 at 8:50 PM, Andreas Mueller wrote:

> Hey all.
>
> There's gonna be an introductory scikit-learn sprint at NYC on Saturday that a local Women's DS/ML group is organizing with me.
> I feel like we could do a bit more to improve (gender) diversity in the scipy/pydata space, and so I think this will be cool.
>
> If anyone wants to review code on Saturday that would be a great help for people getting started.
> Also, if anyone wants to help beforehand, making sure there is enough "easy" and "need contributor" issues tagged is important, as well as ensuring that all the tagged issues actually still need contributors.
>
> I'll try to do as much of these as I can but my time is limited these days :(
>
> Thanks y'all!
>
> Andy
>
> _______________________________________________
> scikit-learn mailing list
> scikit-learn at python.org
> https://mail.python.org/mailman/listinfo/scikit-learn

-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From amandalmia18 at gmail.com Tue Feb 28 08:06:36 2017
From: amandalmia18 at gmail.com (Aman Dalmia)
Date: Tue, 28 Feb 2017 18:36:36 +0530
Subject: [scikit-learn] GSoC, 2017 - Parallel Decision Tree Building
Message-ID:

Hello everyone,

I am a pre-final year student studying Electronics & Communication Engineering at IIT Guwahati. I am a member of Prof. Amit Sethi's research group, where I work on cancer recurrence prediction using deep learning, and have also started working with Prof. Ashish Anand, using NLP for genome sequencing. I want to contribute to scikit-learn by working on the project 'Parallel Decision Tree Building' for GSoC, 2017. I have been contributing to scikit-learn for the past few weeks, working on issues across different modules. Although I am familiar with the tree building algorithms, I have not worked a lot on the tree module of scikit-learn, and hence I am trying to familiarize myself by working on these issues:

https://github.com/scikit-learn/scikit-learn/issues/4225
https://github.com/scikit-learn/scikit-learn/issues/6557

Please let me know what the next steps are that I should follow to build a good proposal.

Thank you,
Aman Dalmia,
Pre-final year student,
Electronics & Communication Engineering,
IIT Guwahati,
+91-8011492025

-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From Dale.T.Smith at macys.com Tue Feb 28 08:08:14 2017
From: Dale.T.Smith at macys.com (Dale T Smith)
Date: Tue, 28 Feb 2017 13:08:14 +0000
Subject: [scikit-learn] Clustering 4 dimensional data
In-Reply-To:
References:
Message-ID:

Use whitespace and carriage returns to reformat your data. It's not clear what you are doing. Also, put it into a Pandas dataframe and make a few plots. The Visualization page is very helpful, along with the Seaborn examples.

____________________________________________________________________
Dale T. Smith | Macy's Systems and Technology | IFS eCom CSE Data Science
5985 State Bridge Road, Johns Creek, GA 30097 | dale.t.smith at macys.com

From: scikit-learn [mailto:scikit-learn-bounces+dale.t.smith=macys.com at python.org] On Behalf Of Rohan Koodli
Sent: Monday, February 27, 2017 10:43 PM
To: scikit-learn at python.org
Subject: [scikit-learn] Clustering 4 dimensional data

I'm having trouble understanding how to cluster multidimensional data. Specifically, a 4 dimensional array.
test = [[[[3,10],[1,5],[3,18]],[[3,1],[0,0],[0,0]],[[3,3],[1,5],[0,0]]],[[[1,5],[2,7],[0,0]],[[1,7],[0,0],[0,0]],[[0,0],[0,0],[0,0]]]]

from sklearn import mixture
gmm = mixture.GMM()
gmm.fit(test)

The code returns the following error:

"Found array with dim 4. GMM expected <= 2."

Do I need to change the way my data is formatted? Is there a way of doing clustering on 4 dimensional data?

-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From ruchika.work at gmail.com Tue Feb 28 12:37:59 2017
From: ruchika.work at gmail.com (Ruchika Nayyar)
Date: Tue, 28 Feb 2017 10:37:59 -0700
Subject: [scikit-learn] Scipy 2017
In-Reply-To:
References: <65ef1d1c-28a9-0772-6da1-3b54feb7cfd1@gmail.com>
Message-ID:

Hello

Will there be a video link?

Thanks,
Ruchika
----------------------------------------
Dr Ruchika Nayyar,
Post Doctoral Fellow for ATLAS Collaboration
University of Arizona
Arizona, USA.
--------------------------------------------

On Mon, Feb 27, 2017 at 2:20 PM, Alexandre Gramfort <alexandre.gramfort at telecom-paristech.fr> wrote:

> Hi Andy,
>
> I'll be happy to share the stage with you for a tutorial.
>
> Alex
>
> On Tue, Feb 21, 2017 at 3:52 PM, Andreas Mueller wrote:
> > Hey folks.
> > Who's coming to scipy this year?
> > Any volunteers for tutorials? I'm happy to be part of it but doing 7h by myself is a bit much ;)
> >
> > Andy
> > _______________________________________________
> > scikit-learn mailing list
> > scikit-learn at python.org
> > https://mail.python.org/mailman/listinfo/scikit-learn
> _______________________________________________
> scikit-learn mailing list
> scikit-learn at python.org
> https://mail.python.org/mailman/listinfo/scikit-learn

-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From nfliu at uw.edu Tue Feb 28 12:43:27 2017
From: nfliu at uw.edu (Nelson Liu)
Date: Tue, 28 Feb 2017 09:43:27 -0800
Subject: [scikit-learn] Scipy 2017
In-Reply-To:
References: <65ef1d1c-28a9-0772-6da1-3b54feb7cfd1@gmail.com>
Message-ID:

The conference generally (at least for the last three years) uploads recordings of the tutorials afterwards; e.g., here is part one of the scikit-learn tutorial at Scipy 2016. I would assume that they are doing this again.

Nelson Liu

On Tue, Feb 28, 2017 at 9:37 AM, Ruchika Nayyar wrote:

> Hello
>
> Will there be a video link?
>
> Thanks,
> Ruchika
> ----------------------------------------
> Dr Ruchika Nayyar,
> Post Doctoral Fellow for ATLAS Collaboration
> University of Arizona
> Arizona, USA.
> --------------------------------------------
>
> On Mon, Feb 27, 2017 at 2:20 PM, Alexandre Gramfort <alexandre.gramfort at telecom-paristech.fr> wrote:
>
>> Hi Andy,
>>
>> I'll be happy to share the stage with you for a tutorial.
>>
>> Alex
>>
>> On Tue, Feb 21, 2017 at 3:52 PM, Andreas Mueller wrote:
>> > Hey folks.
>> > Who's coming to scipy this year?
>> > Any volunteers for tutorials?
>> > I'm happy to be part of it but doing 7h by myself is a bit much ;)
>> >
>> > Andy
>> > _______________________________________________
>> > scikit-learn mailing list
>> > scikit-learn at python.org
>> > https://mail.python.org/mailman/listinfo/scikit-learn
>> _______________________________________________
>> scikit-learn mailing list
>> scikit-learn at python.org
>> https://mail.python.org/mailman/listinfo/scikit-learn
>
> _______________________________________________
> scikit-learn mailing list
> scikit-learn at python.org
> https://mail.python.org/mailman/listinfo/scikit-learn

-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From jmschreiber91 at gmail.com Tue Feb 28 14:15:37 2017
From: jmschreiber91 at gmail.com (Jacob Schreiber)
Date: Tue, 28 Feb 2017 11:15:37 -0800
Subject: [scikit-learn] GSoC, 2017 - Parallel Decision Tree Building
In-Reply-To:
References:
Message-ID:

Hi Aman,

I responded to your other email, but I'm not sure if it actually went through. Thanks for your interest in the project, and your current PRs. If you're looking to apply, you should write a gist which follows the format that nelson-liu used here: https://github.com/scikit-learn/scikit-learn/wiki/GSoC-2016-Proposal:-Addition-of-various-enhancements-to-the-tree-module-by-completing-stalled-pull-requests.

The goal of this project is to parallelize the building of single decision trees, likely by parallelizing the task of finding the optimal split at each node. You should put as much detail as possible into this proposal. As Gael mentioned in the other thread, the limiting factor for GSoC this year is mentor time, and the most successful students will be those who can operate independently. A detailed proposal outlining exactly what needs to be done will go a long way in showing us that you understand the problem and the codebase well enough to set achievable goals for the summer. In addition, we want to ensure that you have the requisite background in Python, Cython, parallel processing, and tree building required for the project, so you should emphasize those skills and previous work you've done that uses them.

Let me know if you have any further questions, and I look forward to seeing your proposal!

Jacob

On Tue, Feb 28, 2017 at 5:06 AM, Aman Dalmia wrote:

> Hello everyone,
>
> I am a pre-final year student studying Electronics & Communication Engineering at IIT Guwahati. I am a member of Prof. Amit Sethi's research group, where I work on cancer recurrence prediction using deep learning, and have also started working with Prof. Ashish Anand, using NLP for genome sequencing. I want to contribute to scikit-learn by working on the project 'Parallel Decision Tree Building' for GSoC, 2017. I have been contributing to scikit-learn for the past few weeks, working on issues across different modules. Although I am familiar with the tree building algorithms, I have not worked a lot on the tree module of scikit-learn, and hence I am trying to familiarize myself by working on these issues:
>
> https://github.com/scikit-learn/scikit-learn/issues/4225
> https://github.com/scikit-learn/scikit-learn/issues/6557
>
> Please let me know what the next steps are that I should follow to build a good proposal.
>
> Thank you,
> Aman Dalmia,
> Pre-final year student,
> Electronics & Communication Engineering,
> IIT Guwahati,
> +91-8011492025
>
> _______________________________________________
> scikit-learn mailing list
> scikit-learn at python.org
> https://mail.python.org/mailman/listinfo/scikit-learn

-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From t3kcit at gmail.com Tue Feb 28 19:28:46 2017
From: t3kcit at gmail.com (Andreas Mueller)
Date: Tue, 28 Feb 2017 19:28:46 -0500
Subject: [scikit-learn] Women in Machine Learning and Data Science Sprint next Weekend (also call for help)
In-Reply-To:
References: <314038f1-d325-1e0d-8399-aea6a4a47d95@gmail.com>
Message-ID:

Thanks!
It's gonna be 9:30 till 4, but I'd be surprised if there's a lot going on on the issue tracker before 11h with setup etc. (EST that is).

Andy

On 02/27/2017 11:58 PM, Jacob Schreiber wrote:

> I will try to carve out some time Saturday to review PRs. What time is it occurring?
>
> On Mon, Feb 27, 2017 at 8:50 PM, Andreas Mueller wrote:
>
> Hey all.
>
> There's gonna be an introductory scikit-learn sprint at NYC on Saturday that a local Women's DS/ML group is organizing with me.
> I feel like we could do a bit more to improve (gender) diversity in the scipy/pydata space, and so I think this will be cool.
>
> If anyone wants to review code on Saturday that would be a great help for people getting started.
> Also, if anyone wants to help beforehand, making sure there is enough "easy" and "need contributor" issues tagged is important, as well as ensuring that all the tagged issues actually still need contributors.
>
> I'll try to do as much of these as I can but my time is limited these days :(
>
> Thanks y'all!
>
> Andy
> _______________________________________________
> scikit-learn mailing list
> scikit-learn at python.org
> https://mail.python.org/mailman/listinfo/scikit-learn
>
> _______________________________________________
> scikit-learn mailing list
> scikit-learn at python.org
> https://mail.python.org/mailman/listinfo/scikit-learn

-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From jmschreiber91 at gmail.com Tue Feb 28 23:07:42 2017
From: jmschreiber91 at gmail.com (Jacob Schreiber)
Date: Tue, 28 Feb 2017 20:07:42 -0800
Subject: [scikit-learn] Women in Machine Learning and Data Science Sprint next Weekend (also call for help)
In-Reply-To:
References: <314038f1-d325-1e0d-8399-aea6a4a47d95@gmail.com>
Message-ID:

Okay. I will be there. Is there going to be a chat channel of some sort to organize things?

On Tue, Feb 28, 2017 at 4:28 PM, Andreas Mueller wrote:

> Thanks!
> It's gonna be 9:30 till 4, but I'd be surprised if there's a lot going on on the issue tracker before 11h with setup etc. (EST that is).
>
> Andy
>
> On 02/27/2017 11:58 PM, Jacob Schreiber wrote:
>
> I will try to carve out some time Saturday to review PRs. What time is it occurring?
>
> On Mon, Feb 27, 2017 at 8:50 PM, Andreas Mueller wrote:
>
>> Hey all.
>>
>> There's gonna be an introductory scikit-learn sprint at NYC on Saturday that a local Women's DS/ML group is organizing with me.
>> I feel like we could do a bit more to improve (gender) diversity in the scipy/pydata space, and so I think this will be cool.
>>
>> If anyone wants to review code on Saturday that would be a great help for people getting started.
>> Also, if anyone wants to help beforehand, making sure there is enough "easy" and "need contributor" issues tagged is important, as well as ensuring that all the tagged issues actually still need contributors.
>>
>> I'll try to do as much of these as I can but my time is limited these days :(
>>
>> Thanks y'all!
>>
>> Andy
>> _______________________________________________
>> scikit-learn mailing list
>> scikit-learn at python.org
>> https://mail.python.org/mailman/listinfo/scikit-learn
>
> _______________________________________________
> scikit-learn mailing list
> scikit-learn at python.org
> https://mail.python.org/mailman/listinfo/scikit-learn
>
> _______________________________________________
> scikit-learn mailing list
> scikit-learn at python.org
> https://mail.python.org/mailman/listinfo/scikit-learn

-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From amandalmia18 at gmail.com Tue Feb 28 23:46:08 2017
From: amandalmia18 at gmail.com (Aman Dalmia)
Date: Wed, 1 Mar 2017 10:16:08 +0530
Subject: [scikit-learn] GSoC, 2017 - Parallel Decision Tree Building
Message-ID:

Hello Sir,

Thank you for your response. You made it very clear to me what needs to be done. I'll take a careful look at the code of the tree module and try to start implementing some of the desired functionality. I'll get back to you if I get stuck, and will post the link to my proposal once I am done with its first draft.

However, I don't see scikit-learn mentioned on the ideas page for the Python Software Foundation - https://summerofcode.withgoogle.com/organizations/5164886469378048/. Is there an error?

Thanks,
Aman Dalmia

-------------- next part --------------
An HTML attachment was scrubbed...
URL:
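The parallel decision tree project that Jacob describes earlier in this digest -- parallelizing the search for the optimal split at each node -- can be illustrated with a rough sketch. This is not scikit-learn's actual tree code (the real builders are written in Cython and do not dispatch joblib workers per split); the function names and the Gini-based split search below are invented for illustration only, a minimal sketch of handing one candidate feature to each worker and keeping the best split found.

import numpy as np
from joblib import Parallel, delayed  # older scikit-learn bundled this as sklearn.externals.joblib


def gini(labels, classes):
    """Gini impurity of one side of a candidate split."""
    p = np.array([np.mean(labels == c) for c in classes])
    return 1.0 - np.sum(p ** 2)


def best_split_for_feature(X, y, feature):
    """Best (impurity, threshold) obtainable by splitting on a single feature."""
    classes = np.unique(y)
    order = np.argsort(X[:, feature])
    values, labels = X[order, feature], y[order]
    n = len(labels)
    best_impurity, best_threshold = np.inf, None
    for i in range(1, n):
        if values[i] == values[i - 1]:
            continue  # no threshold separates identical values
        left, right = labels[:i], labels[i:]
        impurity = (len(left) * gini(left, classes) + len(right) * gini(right, classes)) / n
        if impurity < best_impurity:
            best_impurity = impurity
            best_threshold = (values[i] + values[i - 1]) / 2.0
    return feature, best_impurity, best_threshold


def parallel_best_split(X, y, n_jobs=2):
    """Evaluate every feature in parallel and return the best split found."""
    results = Parallel(n_jobs=n_jobs)(
        delayed(best_split_for_feature)(X, y, f) for f in range(X.shape[1])
    )
    return min(results, key=lambda r: r[1])  # (feature, impurity, threshold)


if __name__ == "__main__":
    rng = np.random.RandomState(0)
    X = rng.rand(200, 5)
    y = (X[:, 2] > 0.5).astype(int)
    print(parallel_best_split(X, y))  # expected: feature 2, threshold near 0.5

At this per-split granularity the overhead of dispatching work to separate processes typically outweighs the gain, which is part of what makes doing this efficiently inside the Cython tree builders a genuinely hard project.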