From Afarin.Famili at UTSouthwestern.edu Fri Feb 3 15:53:54 2017 From: Afarin.Famili at UTSouthwestern.edu (Afarin Famili) Date: Fri, 3 Feb 2017 20:53:54 +0000 Subject: [scikit-learn] Calculate p-value, the measure of statistical significance, in scikit-learn Message-ID: <1486155234925.50514@UTSouthwestern.edu> Hi all, I am aiming at calculating the p-value of regression models using scikit-learn, in order to report their statistical significance. Aside from permutation_test_score in scikit-learn, do you have any suggestions for calculating the p-value of the model? Ultimately, I am interested in computing the coefficient of determination, r2 as well as MSE to indicate the performance of the model for those models that were statistically significant. Thank you, Afarin? ? ________________________________ UT Southwestern Medical Center The future of medicine, today. -------------- next part -------------- An HTML attachment was scrubbed... URL: From jakevdp at cs.washington.edu Fri Feb 3 16:51:07 2017 From: jakevdp at cs.washington.edu (Jacob Vanderplas) Date: Fri, 3 Feb 2017 13:51:07 -0800 Subject: [scikit-learn] Calculate p-value, the measure of statistical significance, in scikit-learn In-Reply-To: <1486155234925.50514@UTSouthwestern.edu> References: <1486155234925.50514@UTSouthwestern.edu> Message-ID: Hi Afarin, The short answer is no, you can't really compute p-values and related statistics in Scikit-Learn. This stems from a fundamental divide in statistics/AI between machine learning on one hand, and statistical modeling on the other. A classic treatment of this divide is "Statistical Modeling: the Two Cultures" by Leo Breiman. In short, statistical modeling is about *estimating parameters of models*, and in that context things like significance, p-values, etc. are relevant. Machine learning is about *predicting outputs*, and generally treats models and their parameters as a black box, the contents of which are not of any explicit interest. As such, p-values and related statistics concerning model parameters are not a concern. Scikit-learn is firmly in the latter camp of Machine learning. Of course, there is plenty of overlap between the two cultures, and the divide is somewhat fuzzy in practice, but it's a useful way to frame the issue. If you're interested in statistical modeling rather than machine learning (and it sounds like you are), scikit-learn is not really the right tool. You might check out the statsmodels package, Jake Jake VanderPlas Senior Data Science Fellow Director of Research in Physical Sciences University of Washington eScience Institute On Fri, Feb 3, 2017 at 12:53 PM, Afarin Famili < Afarin.Famili at utsouthwestern.edu> wrote: > Hi all, > > I am aiming at calculating the p-value of regression models using > scikit-learn, in order to report their statistical significance. Aside from > permutation_test_score in scikit-learn, do you have any suggestions for > calculating the p-value of the model? Ultimately, I am interested in > computing the coefficient of determination, r2 as well as MSE to indicate > the performance of the model for those models that were statistically > significant. > > Thank you, > > Afarin? > > ? > > > > ------------------------------ > > UT Southwestern > > Medical Center > > The future of medicine, today. 
> > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From michael.eickenberg at gmail.com Fri Feb 3 16:54:14 2017 From: michael.eickenberg at gmail.com (Michael Eickenberg) Date: Fri, 3 Feb 2017 22:54:14 +0100 Subject: [scikit-learn] Calculate p-value, the measure of statistical significance, in scikit-learn In-Reply-To: <1486155234925.50514@UTSouthwestern.edu> References: <1486155234925.50514@UTSouthwestern.edu> Message-ID: Dear Afarin, scikit-learn is designed for predictive modelling, where evaluation is done out of sample (using train and test sets). You seem to be looking for a package with which you can do classical in-sample statistics and their corresponding evaluations among which p-values. You are probably better off using statsmodels for that or R directly if you don't mind changing languages. Hope that helps! Michael On Friday, 3 February 2017, Afarin Famili wrote: > Hi all, > > I am aiming at calculating the p-value of regression models using > scikit-learn, in order to report their statistical significance. Aside from > permutation_test_score in scikit-learn, do you have any suggestions for > calculating the p-value of the model? Ultimately, I am interested in > computing the coefficient of determination, r2 as well as MSE to indicate > the performance of the model for those models that were statistically > significant. > > Thank you, > > Afarin? > > ? > > > > ------------------------------ > > UT Southwestern > > Medical Center > > The future of medicine, today. > -------------- next part -------------- An HTML attachment was scrubbed... URL: From stuart at stuartreynolds.net Fri Feb 3 17:47:47 2017 From: stuart at stuartreynolds.net (Stuart Reynolds) Date: Fri, 3 Feb 2017 14:47:47 -0800 Subject: [scikit-learn] Calculate p-value, the measure of statistical significance, in scikit-learn In-Reply-To: <1486155234925.50514@UTSouthwestern.edu> References: <1486155234925.50514@UTSouthwestern.edu> Message-ID: The statsmodels package may have more of this kind of thing. http://statsmodels.sourceforge.net/devel/glm.html http://statsmodels.sourceforge.net/devel/dev/generated/statsmodels.base.model.GenericLikelihoodModelResults.pvalues.html?highlight=pvalue I assume you're talking about pvalues for a model's parameters, not on the models performance. For the latter, there's various basic stats functions. On Fri, Feb 3, 2017 at 12:53 PM, Afarin Famili < Afarin.Famili at utsouthwestern.edu> wrote: > Hi all, > > I am aiming at calculating the p-value of regression models using > scikit-learn, in order to report their statistical significance. Aside from > permutation_test_score in scikit-learn, do you have any suggestions for > calculating the p-value of the model? Ultimately, I am interested in > computing the coefficient of determination, r2 as well as MSE to indicate > the performance of the model for those models that were statistically > significant. > > Thank you, > > Afarin? > > ? > > > > ------------------------------ > > UT Southwestern > > Medical Center > > The future of medicine, today. > > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn > > -------------- next part -------------- An HTML attachment was scrubbed... 
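To make the statsmodels suggestion concrete, here is a minimal sketch, not from the original thread, of fitting an ordinary least squares model and reading off the classical statistics asked about; the data, variable names and numbers are invented for illustration:

# Minimal sketch, assuming statsmodels is installed; synthetic data for illustration only.
import numpy as np
import statsmodels.api as sm

rng = np.random.RandomState(0)
X = rng.randn(100, 3)                                   # 100 samples, 3 features (made up)
y = X.dot(np.array([1.5, 0.0, -2.0])) + rng.randn(100)  # only features 0 and 2 matter

X_design = sm.add_constant(X)            # statsmodels does not add an intercept by default
results = sm.OLS(y, X_design).fit()

print(results.pvalues)     # per-coefficient p-values (t-tests)
print(results.f_pvalue)    # p-value of the overall F-test for the regression
print(results.rsquared)    # coefficient of determination, r2
print(results.mse_resid)   # residual mean squared error
print(results.summary())   # the full classical regression table

For penalized or otherwise non-standard models this classical machinery does not carry over directly, which is where the permutation test discussed later in the thread comes in.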
From Afarin.Famili at UTSouthwestern.edu Fri Feb 3 18:32:23 2017
From: Afarin.Famili at UTSouthwestern.edu (Afarin Famili)
Date: Fri, 3 Feb 2017 23:32:23 +0000
Subject: [scikit-learn] Does permutation_test_score not output the p_value for statistical significance of the model? Re: scikit-learn Digest, Vol 11, Issue 2
In-Reply-To:
References:
Message-ID: <1486164743283.49517@UTSouthwestern.edu>

Thank you all for your answers. I am interested in the statistical significance of the model and not the parameters of the model. I thought "permutation_test_score" from scikit-learn, and the p_value it returns, work for the purpose of my work. Am I wrong though? Is this function only used for measuring the statistical significance of classifiers and not regression models?

Kind regards,
Afarin

From raga.markely at gmail.com Fri Feb 3 23:18:39 2017
From: raga.markely at gmail.com (Raga Markely)
Date: Fri, 3 Feb 2017 23:18:39 -0500
Subject: [scikit-learn] Linear Discriminant Analysis - "The priors do not sum to 1. Renormalizing"
Message-ID:

Hello,

I ran LDA for dimensionality reduction, and got the following message on the command prompt (not on the Jupyter Notebook): "The priors do not sum to 1. Renormalizing", UserWarning

If I understand correctly, the prior = sum of y bincount / len(y)? So, does it mean I am getting this message due to some rounding errors? I wonder how I can check if I make any mistake somewhere?

Thank you,
Raga
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From raga.markely at gmail.com Fri Feb 3 23:36:50 2017
From: raga.markely at gmail.com (Raga Markely)
Date: Fri, 3 Feb 2017 23:36:50 -0500
Subject: [scikit-learn] PC Desktop requirement for Machine Learning
Message-ID:

Hello,

I am planning to buy office PC desktop for machine learning work. I wonder if you could provide some recommendation on the computer specs and brand? I don't need cloud capacity, just a standalone, but powerful desktop.. to simplify, let's ignore the price.. i can scale down according to budget as appropriate later..
Just to give a rough ballpark, I ran repeated nested loop (50 outer repeats x 50 inner repeats, ~35 data points, <10 features) with different classification algorithms (Logistic Regressions, KNN, SVC, Kernel SVC, Random Forest) on lightweight office laptop, and as expected, it took a very long time to complete (it finished during the time I left overnight). I would like to be able to complete this in a few mins or less maybe? :D.. so that I can quickly assess and modify the code as necessary .. In the long run, I will also need to do regressions and may use larger data sets (up to 10^4 data points order of magnitude)... I guess this is a very vague question, but I will take any tips and suggestions. Thank you! Raga -------------- next part -------------- An HTML attachment was scrubbed... URL: From ahowe42 at gmail.com Sat Feb 4 03:23:33 2017 From: ahowe42 at gmail.com (Andrew Howe) Date: Sat, 4 Feb 2017 11:23:33 +0300 Subject: [scikit-learn] Calculate p-value, the measure of statistical significance, in scikit-learn In-Reply-To: <1486155234925.50514@UTSouthwestern.edu> References: <1486155234925.50514@UTSouthwestern.edu> Message-ID: I'm fairly certain that the scikit-learn regression result, plus what you already have about the data is enough for you to compute all those statistical measures yourself. It should be rather trivial to do so. Andrew On Feb 4, 2017 00:34, "Afarin Famili" wrote: > Hi all, > > I am aiming at calculating the p-value of regression models using > scikit-learn, in order to report their statistical significance. Aside from > permutation_test_score in scikit-learn, do you have any suggestions for > calculating the p-value of the model? Ultimately, I am interested in > computing the coefficient of determination, r2 as well as MSE to indicate > the performance of the model for those models that were statistically > significant. > > Thank you, > > Afarin? > > ? > > > > ------------------------------ > > UT Southwestern > > Medical Center > > The future of medicine, today. > > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From alekhka at gmail.com Sat Feb 4 07:45:54 2017 From: alekhka at gmail.com (Alekh Karkada Ashok) Date: Sat, 4 Feb 2017 18:15:54 +0530 Subject: [scikit-learn] 10 years of Scikit-learn Message-ID: Hi all! 2017 marks the 10th year of Scikit-learn (started as a GSoC project in 2007). Can we do anything to celebrate? Perhaps a sticker on the website? or T-shirts commemorating this? Thank you! -------------- next part -------------- An HTML attachment was scrubbed... URL: From nelle.varoquaux at gmail.com Sat Feb 4 14:52:05 2017 From: nelle.varoquaux at gmail.com (Nelle Varoquaux) Date: Sat, 4 Feb 2017 11:52:05 -0800 Subject: [scikit-learn] Calculate p-value, the measure of statistical significance, in scikit-learn In-Reply-To: References: <1486155234925.50514@UTSouthwestern.edu> Message-ID: > I'm fairly certain that the scikit-learn regression result, plus what you > already have about the data is enough for you to compute all those > statistical measures yourself. It should be rather trivial to do so. > That is highly dependent on the regression model you use. For example computing a p-value for a lasso regression parameter is not so trivial, though a significance test has recently been proposed. 
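To illustrate Andrew's point for the plain least-squares case, and keeping Nelle's caveat in mind (none of this carries over to penalized models such as the lasso), here is a rough sketch, not from the thread itself, of computing r2, MSE and classical t-test p-values around scikit-learn's LinearRegression; the data and names are invented for the example:

# Rough sketch: classical OLS statistics computed by hand around sklearn's
# LinearRegression. Valid for plain least squares only, NOT for penalized
# models such as Lasso or Ridge. Synthetic data for illustration.
import numpy as np
from scipy import stats
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score, mean_squared_error

rng = np.random.RandomState(0)
X = rng.randn(80, 2)                       # illustrative data
y = 3.0 * X[:, 0] + rng.randn(80)

model = LinearRegression().fit(X, y)
y_pred = model.predict(X)

r2 = r2_score(y, y_pred)
mse = mean_squared_error(y, y_pred)

# t-test p-values for the coefficients (intercept included in the design matrix)
n, p = X.shape
X_design = np.column_stack([np.ones(n), X])
dof = n - p - 1                                          # residual degrees of freedom
sigma2 = np.sum((y - y_pred) ** 2) / dof                 # unbiased residual variance
cov = sigma2 * np.linalg.inv(X_design.T.dot(X_design))   # covariance of [intercept, coefs]
se = np.sqrt(np.diag(cov))
t_stats = np.r_[model.intercept_, model.coef_] / se
p_values = 2 * stats.t.sf(np.abs(t_stats), dof)

print(r2, mse)
print(p_values)    # p-values for the intercept and each coefficient

The p-values come from the usual OLS coefficient covariance, sigma^2 * inv(X'X), so they are only meaningful for an unpenalized fit with more samples than features.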
> > Andrew
> >
> > On Feb 4, 2017 00:34, "Afarin Famili" > wrote:
>
>> Hi all,
>>
>> I am aiming at calculating the p-value of regression models using
>> scikit-learn, in order to report their statistical significance. Aside from
>> permutation_test_score in scikit-learn, do you have any suggestions for
>> calculating the p-value of the model? Ultimately, I am interested in
>> computing the coefficient of determination, r2 as well as MSE to indicate
>> the performance of the model for those models that were statistically
>> significant.
>>
>> Thank you,
>>
>> Afarin
>>
>> ------------------------------
>>
>> UT Southwestern
>>
>> Medical Center
>>
>> The future of medicine, today.
>>
>> _______________________________________________
>> scikit-learn mailing list
>> scikit-learn at python.org
>> https://mail.python.org/mailman/listinfo/scikit-learn
>
> _______________________________________________
> scikit-learn mailing list
> scikit-learn at python.org
> https://mail.python.org/mailman/listinfo/scikit-learn
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From gael.varoquaux at normalesup.org Sat Feb 4 16:39:47 2017
From: gael.varoquaux at normalesup.org (Gael Varoquaux)
Date: Sat, 4 Feb 2017 22:39:47 +0100
Subject: [scikit-learn] 10 years of Scikit-learn
In-Reply-To:
References:
Message-ID: <20170204213947.GE1858410@phare.normalesup.org>

Indeed, that's a good point. We should mention it in our talks, and maybe in the release notes of the next release.

Gaël

On Sat, Feb 04, 2017 at 06:15:54PM +0530, Alekh Karkada Ashok wrote:
> Hi all!
> 2017 marks the 10th year of Scikit-learn (started as a GSoC project in 2007).
> Can we do anything to celebrate? Perhaps a sticker on the website? or T-shirts
> commemorating this?
> Thank you!

> _______________________________________________
> scikit-learn mailing list
> scikit-learn at python.org
> https://mail.python.org/mailman/listinfo/scikit-learn

--
Gael Varoquaux
Researcher, INRIA Parietal
NeuroSpin/CEA Saclay, Bat 145, 91191 Gif-sur-Yvette France
Phone: ++ 33-1-69-08-79-68
http://gael-varoquaux.info http://twitter.com/GaelVaroquaux

From Afarin.Famili at UTSouthwestern.edu Sat Feb 4 18:43:36 2017
From: Afarin.Famili at UTSouthwestern.edu (Afarin Famili)
Date: Sat, 4 Feb 2017 23:43:36 +0000
Subject: [scikit-learn] Permutation-test-score
Message-ID: <1486251816290.82720@UTSouthwestern.edu>

Hi,

Can anyone please tell me what "permutation_test_score" (and the p_value it returns) does in scikit-learn? I am assuming it outputs the statistical significance of the performance of regression models.

I am planning to compare the performance of various regression models, but only where the reported performance measure is statistically significant. To this end, I want to output the p-value of the prediction first, and if it is smaller than a certain cut-off, I would then report the performance metrics, such as r2 and MSE. Do the p-value and score outputs from "permutation_test_score" not provide me with what I want?

Afarin

________________________________
UT Southwestern
Medical Center
The future of medicine, today.
-------------- next part --------------
An HTML attachment was scrubbed...
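A minimal sketch of the call in question, not from the thread itself (invented data and names, assuming the sklearn.model_selection version introduced in 0.18); it also shows that the function accepts a regressor and a regression scorer, not only classifiers:

# Minimal sketch; small, noisy synthetic data set for illustration only.
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import permutation_test_score

rng = np.random.RandomState(0)
X = rng.randn(60, 5)
y = 0.5 * X[:, 0] + rng.randn(60)

score, perm_scores, p_value = permutation_test_score(
    Ridge(), X, y,
    scoring="r2",          # any regression scorer works, not only classification ones
    cv=5,
    n_permutations=100,
    random_state=0)
print("cross-validated r2: %.3f, permutation p-value: %.3f" % (score, p_value))

Olivier's reply below explains what the returned p-value means and when it is actually informative.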
URL: From ahowe42 at gmail.com Sun Feb 5 00:15:18 2017 From: ahowe42 at gmail.com (Andrew Howe) Date: Sun, 5 Feb 2017 08:15:18 +0300 Subject: [scikit-learn] Calculate p-value, the measure of statistical significance, in scikit-learn In-Reply-To: References: <1486155234925.50514@UTSouthwestern.edu> Message-ID: Yep - in which case the OP would have difficulty computing p-values (but not the other usual stats) with any software tool that provided those methods. But since the question was specifically about scikit-learn, my main point is that the quantities are easy to compute (if they exist). Andrew <~~~~~~~~~~~~~~~~~~~~~~~~~~~> J. Andrew Howe, PhD www.andrewhowe.com http://www.linkedin.com/in/ahowe42 https://www.researchgate.net/profile/John_Howe12/ I live to learn, so I can learn to live. - me <~~~~~~~~~~~~~~~~~~~~~~~~~~~> On Sat, Feb 4, 2017 at 10:52 PM, Nelle Varoquaux wrote: > > I'm fairly certain that the scikit-learn regression result, plus what you >> already have about the data is enough for you to compute all those >> statistical measures yourself. It should be rather trivial to do so. >> > > That is highly dependent on the regression model you use. For example > computing a p-value for a lasso regression parameter is not so trivial, > though a significance test has recently been proposed. > > >> >> Andrew >> >> On Feb 4, 2017 00:34, "Afarin Famili" >> wrote: >> >>> Hi all, >>> >>> I am aiming at calculating the p-value of regression models using >>> scikit-learn, in order to report their statistical significance. Aside from >>> permutation_test_score in scikit-learn, do you have any suggestions for >>> calculating the p-value of the model? Ultimately, I am interested in >>> computing the coefficient of determination, r2 as well as MSE to indicate >>> the performance of the model for those models that were statistically >>> significant. >>> >>> Thank you, >>> >>> Afarin? >>> >>> ? >>> >>> >>> >>> ------------------------------ >>> >>> UT Southwestern >>> >>> Medical Center >>> >>> The future of medicine, today. >>> >>> _______________________________________________ >>> scikit-learn mailing list >>> scikit-learn at python.org >>> https://mail.python.org/mailman/listinfo/scikit-learn >>> >>> >> _______________________________________________ >> scikit-learn mailing list >> scikit-learn at python.org >> https://mail.python.org/mailman/listinfo/scikit-learn >> >> > > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From olivier.grisel at ensta.org Sun Feb 5 04:44:01 2017 From: olivier.grisel at ensta.org (Olivier Grisel) Date: Sun, 5 Feb 2017 10:44:01 +0100 Subject: [scikit-learn] Permutation-test-score In-Reply-To: <1486251816290.82720@UTSouthwestern.edu> References: <1486251816290.82720@UTSouthwestern.edu> Message-ID: This is non-parametric (aka brute force) way to check that a model has a predictive performance significantly higher than chance. For models with 90% accuracy this is useless as we already know for sure that the model is better than predicting at random. This method is only useful if you have very little data or very noisy data and you are not even sure that your predictive method is able to pick anything predictive from the data. E.g. you have a balanced binary classification problem with ~52% accuracy. 
It proceeds as follows: it first does a single cross-validation round with the true label to compute a reference score. Then it does the same 100 times but each time with independently randomly permuted variants of the labels (the y array). Then it returns the fraction of the time the reference CV score was higher than the CV scores of the models trained and evaluated with permuted labels. Here is an example: http://scikit-learn.org/stable/auto_examples/feature_selection/plot_permutation_test_for_classification.html Note that you should not use than method to select the best model from a collection of possible models and then report its permutation test p-value without correcting for multiple comparisons. -- Olivier -------------- next part -------------- An HTML attachment was scrubbed... URL: From nixnmtm at gmail.com Tue Feb 7 09:26:09 2017 From: nixnmtm at gmail.com (Nixon Raj) Date: Tue, 7 Feb 2017 22:26:09 +0800 Subject: [scikit-learn] Need Corresponding indices array of values in each split of a DesicisionTreeClassifier Message-ID: For Example, In the below decision tree dot file, I have 223 samples which splits into [174, 49] in the first split and [110, 1] in the 2nd split I would like to get the array of indices for the values of each split like *[174, 49] and their corresponding indices (idx) like [[0, 1 ,5, 7,....,200,221], [3, 4, 6, ....., 199,222,223]]* *[110, 1] and their corresponding indices (idx) like [[0,5,....200,221], [7]]* Please help me node [shape=box] ; 0 [label="X[0] <= 13.9191\nentropy = 0.7597\nsamples = 223\nvalue = [174, 49]"] ; 1 [label="X[1] <= 3.1973\nentropy = 0.0741\nsamples = 111\nvalue = [110, 1]"] ; 0 -> 1 [labeldistance=2.5, labelangle=45, headlabel="True"] ; 2 [label="entropy = 0.0\nsamples = 109\nvalue = [109, 0]"] ; 1 -> 2 ; 3 [label="entropy = 1.0\nsamples = 2\nvalue = [1, 1]"] ; 1 -> 3 ; 4 [label="X[1] <= 3.1266\nentropy = 0.9852\nsamples = 112\nvalue = [64, 48]"] ; 0 -> 4 [labeldistance=2.5, labelangle=-45, headlabel="False"] ; 5 [label="X[2] <= -0.4882\nentropy = 0.7919\nsamples = 63\nvalue = [48, 15]"] ; 4 -> 5 ; 6 [label="entropy = 0.684\nsamples = 11\nvalue = [2, 9]"] ; 5 -> 6 ; 7 [label="X[2] <= 0.5422\nentropy = 0.5159\nsamples = 52\nvalue = [46, 6]"] ; 5 -> 7 ; 8 [label="entropy = 0.0\nsamples = 18\nvalue = [18, 0]"] ; 7 -> 8 ; 9 [label="X[2] <= 0.6497\nentropy = 0.6723\nsamples = 34\nvalue = [28, 6]"] ; 7 -> 9 ; 10 [label="entropy = 0.0\nsamples = 1\nvalue = [0, 1]"] ; 9 -> 10 ; 11 [label="X[2] <= 1.887\nentropy = 0.6136\nsamples = 33\nvalue = [28, 5]"] ; 9 -> 11 ; 12 [label="entropy = 0.0\nsamples = 12\nvalue = [12, 0]"] ; 11 -> 12 ; 13 [label="X[2] <= 2.6691\nentropy = 0.7919\nsamples = 21\nvalue = [16, 5]"] ; 11 -> 13 ; 14 [label="entropy = 0.8113\nsamples = 4\nvalue = [1, 3]"] ; 13 -> 14 ; 15 [label="entropy = 0.5226\nsamples = 17\nvalue = [15, 2]"] ; 13 -> 15 ; 16 [label="X[0] <= 17.3284\nentropy = 0.9113\nsamples = 49\nvalue = [16, 33]"] ; 4 -> 16 ; 17 [label="entropy = 0.9183\nsamples = 6\nvalue = [4, 2]"] ; 16 -> 17 ; 18 [label="X[2] <= 19.7048\nentropy = 0.8542\nsamples = 43\nvalue = [12, 31]"] ; 16 -> 18 ; 19 [label="X[2] <= 5.8511\nentropy = 0.8296\nsamples = 42\nvalue = [11, 31]"] ; 18 -> 19 ; 20 [label="X[0] <= 31.8916\nentropy = 0.878\nsamples = 37\nvalue = [11, 26]"] ; 19 -> 20 ; 21 [label="X[1] <= 3.3612\nentropy = 0.6666\nsamples = 23\nvalue = [4, 19]"] ; 20 -> 21 ; 22 [label="entropy = 0.8905\nsamples = 13\nvalue = [4, 9]"] ; 21 -> 22 ; 23 [label="entropy = 0.0\nsamples = 10\nvalue = [0, 10]"] ; 21 -> 23 ; 24 
[label="entropy = 1.0\nsamples = 14\nvalue = [7, 7]"] ; 20 -> 24 ; 25 [label="entropy = 0.0\nsamples = 5\nvalue = [0, 5]"] ; 19 -> 25 ; 26 [label="entropy = 0.0\nsamples = 1\nvalue = [1, 0]"] ; 18 -> 26 ; } -------------- next part -------------- An HTML attachment was scrubbed... URL: From joel.nothman at gmail.com Tue Feb 7 18:21:16 2017 From: joel.nothman at gmail.com (Joel Nothman) Date: Wed, 8 Feb 2017 10:21:16 +1100 Subject: [scikit-learn] Need Corresponding indices array of values in each split of a DesicisionTreeClassifier In-Reply-To: References: Message-ID: I don't think putting that array of indices in a visualisation is a great idea! If you use my_tree.apply(X) you will be able to determine which leaf each instance in X lands up at, and potentially trace up the tree from there. On 8 February 2017 at 01:26, Nixon Raj wrote: > > For Example, In the below decision tree dot file, I have 223 samples which > splits into [174, 49] in the first split and [110, 1] in the 2nd split > > I would like to get the array of indices for the values of each split like > > *[174, 49] and their corresponding indices (idx) like [[0, 1 ,5, > 7,....,200,221], [3, 4, 6, ....., 199,222,223]]* > > *[110, 1] and their corresponding indices (idx) like [[0,5,....200,221], > [7]]* > > Please help me > > node [shape=box] ; > 0 [label="X[0] <= 13.9191\nentropy = 0.7597\nsamples = 223\nvalue = [174, > 49]"] ; > 1 [label="X[1] <= 3.1973\nentropy = 0.0741\nsamples = 111\nvalue = [110, > 1]"] ; > 0 -> 1 [labeldistance=2.5, labelangle=45, headlabel="True"] ; > 2 [label="entropy = 0.0\nsamples = 109\nvalue = [109, 0]"] ; > 1 -> 2 ; > 3 [label="entropy = 1.0\nsamples = 2\nvalue = [1, 1]"] ; > 1 -> 3 ; > 4 [label="X[1] <= 3.1266\nentropy = 0.9852\nsamples = 112\nvalue = [64, > 48]"] ; > 0 -> 4 [labeldistance=2.5, labelangle=-45, headlabel="False"] ; > 5 [label="X[2] <= -0.4882\nentropy = 0.7919\nsamples = 63\nvalue = [48, > 15]"] ; > 4 -> 5 ; > 6 [label="entropy = 0.684\nsamples = 11\nvalue = [2, 9]"] ; > 5 -> 6 ; > 7 [label="X[2] <= 0.5422\nentropy = 0.5159\nsamples = 52\nvalue = [46, > 6]"] ; > 5 -> 7 ; > 8 [label="entropy = 0.0\nsamples = 18\nvalue = [18, 0]"] ; > 7 -> 8 ; > 9 [label="X[2] <= 0.6497\nentropy = 0.6723\nsamples = 34\nvalue = [28, > 6]"] ; > 7 -> 9 ; > 10 [label="entropy = 0.0\nsamples = 1\nvalue = [0, 1]"] ; > 9 -> 10 ; > 11 [label="X[2] <= 1.887\nentropy = 0.6136\nsamples = 33\nvalue = [28, > 5]"] ; > 9 -> 11 ; > 12 [label="entropy = 0.0\nsamples = 12\nvalue = [12, 0]"] ; > 11 -> 12 ; > 13 [label="X[2] <= 2.6691\nentropy = 0.7919\nsamples = 21\nvalue = [16, > 5]"] ; > 11 -> 13 ; > 14 [label="entropy = 0.8113\nsamples = 4\nvalue = [1, 3]"] ; > 13 -> 14 ; > 15 [label="entropy = 0.5226\nsamples = 17\nvalue = [15, 2]"] ; > 13 -> 15 ; > 16 [label="X[0] <= 17.3284\nentropy = 0.9113\nsamples = 49\nvalue = [16, > 33]"] ; > 4 -> 16 ; > 17 [label="entropy = 0.9183\nsamples = 6\nvalue = [4, 2]"] ; > 16 -> 17 ; > 18 [label="X[2] <= 19.7048\nentropy = 0.8542\nsamples = 43\nvalue = [12, > 31]"] ; > 16 -> 18 ; > 19 [label="X[2] <= 5.8511\nentropy = 0.8296\nsamples = 42\nvalue = [11, > 31]"] ; > 18 -> 19 ; > 20 [label="X[0] <= 31.8916\nentropy = 0.878\nsamples = 37\nvalue = [11, > 26]"] ; > 19 -> 20 ; > 21 [label="X[1] <= 3.3612\nentropy = 0.6666\nsamples = 23\nvalue = [4, > 19]"] ; > 20 -> 21 ; > 22 [label="entropy = 0.8905\nsamples = 13\nvalue = [4, 9]"] ; > 21 -> 22 ; > 23 [label="entropy = 0.0\nsamples = 10\nvalue = [0, 10]"] ; > 21 -> 23 ; > 24 [label="entropy = 1.0\nsamples = 14\nvalue = [7, 7]"] ; > 20 -> 24 
; > 25 [label="entropy = 0.0\nsamples = 5\nvalue = [0, 5]"] ; > 19 -> 25 ; > 26 [label="entropy = 0.0\nsamples = 1\nvalue = [1, 0]"] ; > 18 -> 26 ; > } > > > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From jblackburne at gmail.com Tue Feb 7 19:13:40 2017 From: jblackburne at gmail.com (Jeff Blackburne) Date: Tue, 7 Feb 2017 16:13:40 -0800 Subject: [scikit-learn] Need Corresponding indices array of values in each split of a DesicisionTreeClassifier In-Reply-To: References: Message-ID: Nixon, If you are using version 0.18 or later, you can reconstruct the information you need using the `decision_path` method: http://scikit-learn.org/stable/auto_examples/tree/plot_unveil_tree_structure.html -Jeff On Tue, Feb 7, 2017 at 3:21 PM, Joel Nothman wrote: > I don't think putting that array of indices in a visualisation is a great > idea! > > If you use my_tree.apply(X) you will be able to determine which leaf each > instance in X lands up at, and potentially trace up the tree from there. > > On 8 February 2017 at 01:26, Nixon Raj wrote: > >> >> For Example, In the below decision tree dot file, I have 223 samples >> which splits into [174, 49] in the first split and [110, 1] in the 2nd split >> >> I would like to get the array of indices for the values of each split >> like >> >> *[174, 49] and their corresponding indices (idx) like [[0, 1 ,5, >> 7,....,200,221], [3, 4, 6, ....., 199,222,223]]* >> >> *[110, 1] and their corresponding indices (idx) like [[0,5,....200,221], >> [7]]* >> >> Please help me >> >> node [shape=box] ; >> 0 [label="X[0] <= 13.9191\nentropy = 0.7597\nsamples = 223\nvalue = [174, >> 49]"] ; >> 1 [label="X[1] <= 3.1973\nentropy = 0.0741\nsamples = 111\nvalue = [110, >> 1]"] ; >> 0 -> 1 [labeldistance=2.5, labelangle=45, headlabel="True"] ; >> 2 [label="entropy = 0.0\nsamples = 109\nvalue = [109, 0]"] ; >> 1 -> 2 ; >> 3 [label="entropy = 1.0\nsamples = 2\nvalue = [1, 1]"] ; >> 1 -> 3 ; >> 4 [label="X[1] <= 3.1266\nentropy = 0.9852\nsamples = 112\nvalue = [64, >> 48]"] ; >> 0 -> 4 [labeldistance=2.5, labelangle=-45, headlabel="False"] ; >> 5 [label="X[2] <= -0.4882\nentropy = 0.7919\nsamples = 63\nvalue = [48, >> 15]"] ; >> 4 -> 5 ; >> 6 [label="entropy = 0.684\nsamples = 11\nvalue = [2, 9]"] ; >> 5 -> 6 ; >> 7 [label="X[2] <= 0.5422\nentropy = 0.5159\nsamples = 52\nvalue = [46, >> 6]"] ; >> 5 -> 7 ; >> 8 [label="entropy = 0.0\nsamples = 18\nvalue = [18, 0]"] ; >> 7 -> 8 ; >> 9 [label="X[2] <= 0.6497\nentropy = 0.6723\nsamples = 34\nvalue = [28, >> 6]"] ; >> 7 -> 9 ; >> 10 [label="entropy = 0.0\nsamples = 1\nvalue = [0, 1]"] ; >> 9 -> 10 ; >> 11 [label="X[2] <= 1.887\nentropy = 0.6136\nsamples = 33\nvalue = [28, >> 5]"] ; >> 9 -> 11 ; >> 12 [label="entropy = 0.0\nsamples = 12\nvalue = [12, 0]"] ; >> 11 -> 12 ; >> 13 [label="X[2] <= 2.6691\nentropy = 0.7919\nsamples = 21\nvalue = [16, >> 5]"] ; >> 11 -> 13 ; >> 14 [label="entropy = 0.8113\nsamples = 4\nvalue = [1, 3]"] ; >> 13 -> 14 ; >> 15 [label="entropy = 0.5226\nsamples = 17\nvalue = [15, 2]"] ; >> 13 -> 15 ; >> 16 [label="X[0] <= 17.3284\nentropy = 0.9113\nsamples = 49\nvalue = [16, >> 33]"] ; >> 4 -> 16 ; >> 17 [label="entropy = 0.9183\nsamples = 6\nvalue = [4, 2]"] ; >> 16 -> 17 ; >> 18 [label="X[2] <= 19.7048\nentropy = 0.8542\nsamples = 43\nvalue = [12, >> 31]"] ; >> 16 -> 18 ; >> 19 [label="X[2] <= 5.8511\nentropy = 
0.8296\nsamples = 42\nvalue = [11, >> 31]"] ; >> 18 -> 19 ; >> 20 [label="X[0] <= 31.8916\nentropy = 0.878\nsamples = 37\nvalue = [11, >> 26]"] ; >> 19 -> 20 ; >> 21 [label="X[1] <= 3.3612\nentropy = 0.6666\nsamples = 23\nvalue = [4, >> 19]"] ; >> 20 -> 21 ; >> 22 [label="entropy = 0.8905\nsamples = 13\nvalue = [4, 9]"] ; >> 21 -> 22 ; >> 23 [label="entropy = 0.0\nsamples = 10\nvalue = [0, 10]"] ; >> 21 -> 23 ; >> 24 [label="entropy = 1.0\nsamples = 14\nvalue = [7, 7]"] ; >> 20 -> 24 ; >> 25 [label="entropy = 0.0\nsamples = 5\nvalue = [0, 5]"] ; >> 19 -> 25 ; >> 26 [label="entropy = 0.0\nsamples = 1\nvalue = [1, 0]"] ; >> 18 -> 26 ; >> } >> >> >> _______________________________________________ >> scikit-learn mailing list >> scikit-learn at python.org >> https://mail.python.org/mailman/listinfo/scikit-learn >> >> > > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From joel.nothman at gmail.com Tue Feb 7 21:00:12 2017 From: joel.nothman at gmail.com (Joel Nothman) Date: Tue, 7 Feb 2017 21:00:12 -0500 Subject: [scikit-learn] Preparing a scikit-learn 0.18.2 bugfix release In-Reply-To: <20170111215115.GO1585067@phare.normalesup.org> References: <20170109151546.GM2802991@phare.normalesup.org> <20170111215115.GO1585067@phare.normalesup.org> Message-ID: On 12 January 2017 at 08:51, Gael Varoquaux wrote: > On Thu, Jan 12, 2017 at 08:41:51AM +1100, Joel Nothman wrote: > > When the two versions deprecation policy was instituted, releases were > much > > more frequent... Is that enough of an excuse? > > I'd rather say that we can here decide that we are giving a longer grace > period. > > I think that slow deprecations are a good things (see titus's blog post > here: http://ivory.idyll.org/blog/2017-pof-software-archivability.html ) > Given that 0.18 was a very slow release, and the work for removing deprecated material from 0.19 has already been done, I don't think we should revert that. I agree that we can delay the deprecation deadline for 0.20 and 0.21. In terms of release schedule, are we aiming for RC in early-mid March, assuming Andy's above prognostications are correct and he is able to review in a bigger way in a week or so? J -------------- next part -------------- An HTML attachment was scrubbed... URL: From nixnmtm at gmail.com Wed Feb 8 04:43:17 2017 From: nixnmtm at gmail.com (Nixon Raj) Date: Wed, 8 Feb 2017 17:43:17 +0800 Subject: [scikit-learn] Need Corresponding indices array of values in each split of a DesicisionTreeClassifier In-Reply-To: References: Message-ID: Hi Joel andJeff Thanks for your valuable comment, i got that to work On 8 February 2017 at 08:13, Jeff Blackburne wrote: > Nixon, > > If you are using version 0.18 or later, you can reconstruct the > information you need using the `decision_path` method: > > http://scikit-learn.org/stable/auto_examples/tree/ > plot_unveil_tree_structure.html > > -Jeff > > > On Tue, Feb 7, 2017 at 3:21 PM, Joel Nothman > wrote: > >> I don't think putting that array of indices in a visualisation is a great >> idea! >> >> If you use my_tree.apply(X) you will be able to determine which leaf each >> instance in X lands up at, and potentially trace up the tree from there. 
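Putting Joel's tree.apply() suggestion and Jeff's decision_path() pointer together, here is a rough sketch, not from the thread, of collecting the sample indices that pass through each node; it uses the iris data purely as a stand-in and assumes scikit-learn >= 0.18 for decision_path():

# Rough sketch; iris is a placeholder for the real training data.
import numpy as np
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

iris = load_iris()
X, y = iris.data, iris.target
tree = DecisionTreeClassifier(random_state=0).fit(X, y)

# decision_path returns a sparse indicator matrix of shape (n_samples, n_nodes);
# entry (i, j) is 1 when sample i passes through node j.
node_indicator = tree.decision_path(X)

indices_per_node = {
    node_id: np.nonzero(node_indicator[:, node_id].toarray().ravel())[0]
    for node_id in range(tree.tree_.node_count)}

print(indices_per_node[0][:10])   # first few sample indices reaching the root node
leaf_ids = tree.apply(X)          # leaf id for each sample, as in Joel's suggestion

len(indices_per_node[node_id]) then matches the samples = ... count that graphviz prints for each node.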
>> >> On 8 February 2017 at 01:26, Nixon Raj wrote: >> >>> >>> For Example, In the below decision tree dot file, I have 223 samples >>> which splits into [174, 49] in the first split and [110, 1] in the 2nd split >>> >>> I would like to get the array of indices for the values of each split >>> like >>> >>> *[174, 49] and their corresponding indices (idx) like [[0, 1 ,5, >>> 7,....,200,221], [3, 4, 6, ....., 199,222,223]]* >>> >>> *[110, 1] and their corresponding indices (idx) like [[0,5,....200,221], >>> [7]]* >>> >>> Please help me >>> >>> node [shape=box] ; >>> 0 [label="X[0] <= 13.9191\nentropy = 0.7597\nsamples = 223\nvalue = >>> [174, 49]"] ; >>> 1 [label="X[1] <= 3.1973\nentropy = 0.0741\nsamples = 111\nvalue = [110, >>> 1]"] ; >>> 0 -> 1 [labeldistance=2.5, labelangle=45, headlabel="True"] ; >>> 2 [label="entropy = 0.0\nsamples = 109\nvalue = [109, 0]"] ; >>> 1 -> 2 ; >>> 3 [label="entropy = 1.0\nsamples = 2\nvalue = [1, 1]"] ; >>> 1 -> 3 ; >>> 4 [label="X[1] <= 3.1266\nentropy = 0.9852\nsamples = 112\nvalue = [64, >>> 48]"] ; >>> 0 -> 4 [labeldistance=2.5, labelangle=-45, headlabel="False"] ; >>> 5 [label="X[2] <= -0.4882\nentropy = 0.7919\nsamples = 63\nvalue = [48, >>> 15]"] ; >>> 4 -> 5 ; >>> 6 [label="entropy = 0.684\nsamples = 11\nvalue = [2, 9]"] ; >>> 5 -> 6 ; >>> 7 [label="X[2] <= 0.5422\nentropy = 0.5159\nsamples = 52\nvalue = [46, >>> 6]"] ; >>> 5 -> 7 ; >>> 8 [label="entropy = 0.0\nsamples = 18\nvalue = [18, 0]"] ; >>> 7 -> 8 ; >>> 9 [label="X[2] <= 0.6497\nentropy = 0.6723\nsamples = 34\nvalue = [28, >>> 6]"] ; >>> 7 -> 9 ; >>> 10 [label="entropy = 0.0\nsamples = 1\nvalue = [0, 1]"] ; >>> 9 -> 10 ; >>> 11 [label="X[2] <= 1.887\nentropy = 0.6136\nsamples = 33\nvalue = [28, >>> 5]"] ; >>> 9 -> 11 ; >>> 12 [label="entropy = 0.0\nsamples = 12\nvalue = [12, 0]"] ; >>> 11 -> 12 ; >>> 13 [label="X[2] <= 2.6691\nentropy = 0.7919\nsamples = 21\nvalue = [16, >>> 5]"] ; >>> 11 -> 13 ; >>> 14 [label="entropy = 0.8113\nsamples = 4\nvalue = [1, 3]"] ; >>> 13 -> 14 ; >>> 15 [label="entropy = 0.5226\nsamples = 17\nvalue = [15, 2]"] ; >>> 13 -> 15 ; >>> 16 [label="X[0] <= 17.3284\nentropy = 0.9113\nsamples = 49\nvalue = [16, >>> 33]"] ; >>> 4 -> 16 ; >>> 17 [label="entropy = 0.9183\nsamples = 6\nvalue = [4, 2]"] ; >>> 16 -> 17 ; >>> 18 [label="X[2] <= 19.7048\nentropy = 0.8542\nsamples = 43\nvalue = [12, >>> 31]"] ; >>> 16 -> 18 ; >>> 19 [label="X[2] <= 5.8511\nentropy = 0.8296\nsamples = 42\nvalue = [11, >>> 31]"] ; >>> 18 -> 19 ; >>> 20 [label="X[0] <= 31.8916\nentropy = 0.878\nsamples = 37\nvalue = [11, >>> 26]"] ; >>> 19 -> 20 ; >>> 21 [label="X[1] <= 3.3612\nentropy = 0.6666\nsamples = 23\nvalue = [4, >>> 19]"] ; >>> 20 -> 21 ; >>> 22 [label="entropy = 0.8905\nsamples = 13\nvalue = [4, 9]"] ; >>> 21 -> 22 ; >>> 23 [label="entropy = 0.0\nsamples = 10\nvalue = [0, 10]"] ; >>> 21 -> 23 ; >>> 24 [label="entropy = 1.0\nsamples = 14\nvalue = [7, 7]"] ; >>> 20 -> 24 ; >>> 25 [label="entropy = 0.0\nsamples = 5\nvalue = [0, 5]"] ; >>> 19 -> 25 ; >>> 26 [label="entropy = 0.0\nsamples = 1\nvalue = [1, 0]"] ; >>> 18 -> 26 ; >>> } >>> >>> >>> _______________________________________________ >>> scikit-learn mailing list >>> scikit-learn at python.org >>> https://mail.python.org/mailman/listinfo/scikit-learn >>> >>> >> >> _______________________________________________ >> scikit-learn mailing list >> scikit-learn at python.org >> https://mail.python.org/mailman/listinfo/scikit-learn >> >> > > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org 
> https://mail.python.org/mailman/listinfo/scikit-learn > > -- Regards Nixon Raj N Department of Biological Science and Technology Institute of Bioinformatics and Systems Biology National Chiao Tung University 208 Lab Building 1, 75 Bo-Ai St. Dong District, Hsinchu, Taiwan 30062 (R.O.C.) Mob:+886-989353921 0ffice ext: 56997 -------------- next part -------------- An HTML attachment was scrubbed... URL: From ahowe42 at gmail.com Wed Feb 8 12:15:44 2017 From: ahowe42 at gmail.com (Andrew Howe) Date: Wed, 8 Feb 2017 20:15:44 +0300 Subject: [scikit-learn] Preparing a scikit-learn 0.18.2 bugfix release In-Reply-To: References: <20170109151546.GM2802991@phare.normalesup.org> <20170111215115.GO1585067@phare.normalesup.org> Message-ID: How many current deprecations are expected in the next release? Andrew On Jan 12, 2017 00:53, "Gael Varoquaux" wrote: On Thu, Jan 12, 2017 at 08:41:51AM +1100, Joel Nothman wrote: > When the two versions deprecation policy was instituted, releases were much > more frequent... Is that enough of an excuse? I'd rather say that we can here decide that we are giving a longer grace period. I think that slow deprecations are a good things (see titus's blog post here: http://ivory.idyll.org/blog/2017-pof-software-archivability.html ) G > On 12 January 2017 at 03:43, Andreas Mueller wrote: > On 01/09/2017 10:15 AM, Gael Varoquaux wrote: > instead of setting up a roadmap I would rather just identify bugs > that > are blockers and fix only those and don't wait for any feature > before > cutting 0.19.X. > I agree with the sentiment, but this would mess with our deprecation cycle. > If we release now, and then release again soonish, that means people have > less calendar time > to react to deprecations. > We could either accept this or change all deprecations and bump the removal > by a version? > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn -- Gael Varoquaux Researcher, INRIA Parietal NeuroSpin/CEA Saclay , Bat 145, 91191 Gif-sur-Yvette France Phone: ++ 33-1-69-08-79-68 http://gael-varoquaux.info http://twitter.com/GaelVaroquaux _______________________________________________ scikit-learn mailing list scikit-learn at python.org https://mail.python.org/mailman/listinfo/scikit-learn -------------- next part -------------- An HTML attachment was scrubbed... URL: From joel.nothman at gmail.com Wed Feb 8 22:30:40 2017 From: joel.nothman at gmail.com (Joel Nothman) Date: Thu, 9 Feb 2017 14:30:40 +1100 Subject: [scikit-learn] Preparing a scikit-learn 0.18.2 bugfix release In-Reply-To: References: <20170109151546.GM2802991@phare.normalesup.org> <20170111215115.GO1585067@phare.normalesup.org> Message-ID: Not sure that this quite gives you a number, but: $git checkout 0.18.1 $ git grep -pwB1 0.19 sklearn | grep -ve ^- -e .csv: -e /tests/ > /tmp/dep19.txt etc. edited results attached. On 9 February 2017 at 04:15, Andrew Howe wrote: > How many current deprecations are expected in the next release? > > Andrew > > On Jan 12, 2017 00:53, "Gael Varoquaux" > wrote: > > On Thu, Jan 12, 2017 at 08:41:51AM +1100, Joel Nothman wrote: > > When the two versions deprecation policy was instituted, releases were > much > > more frequent... Is that enough of an excuse? 
> > I'd rather say that we can here decide that we are giving a longer grace > period. > > I think that slow deprecations are a good things (see titus's blog post > here: http://ivory.idyll.org/blog/2017-pof-software-archivability.html ) > > G > > > On 12 January 2017 at 03:43, Andreas Mueller wrote: > > > > > On 01/09/2017 10:15 AM, Gael Varoquaux wrote: > > > instead of setting up a roadmap I would rather just identify > bugs > > that > > are blockers and fix only those and don't wait for any > feature > > before > > cutting 0.19.X. > > > > > I agree with the sentiment, but this would mess with our deprecation > cycle. > > If we release now, and then release again soonish, that means people > have > > less calendar time > > to react to deprecations. > > > We could either accept this or change all deprecations and bump the > removal > > by a version? > > > _______________________________________________ > > scikit-learn mailing list > > scikit-learn at python.org > > https://mail.python.org/mailman/listinfo/scikit-learn > > > > > _______________________________________________ > > scikit-learn mailing list > > scikit-learn at python.org > > https://mail.python.org/mailman/listinfo/scikit-learn > > > -- > Gael Varoquaux > Researcher, INRIA Parietal > NeuroSpin/CEA Saclay , Bat 145, 91191 Gif-sur-Yvette France > Phone: ++ 33-1-69-08-79-68 > http://gael-varoquaux.info http://twitter.com/GaelVaroquaux > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn > > > > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn > > -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- sklearn/base.py=from . import __version__ sklearn/base.py- at deprecated("ChangedBehaviorWarning has been moved into the sklearn.exceptions" sklearn/base.py: " module. 
It will not be available here from version 0.19") sklearn/datasets/data/boston_house_prices.csv-1.62864,0,21.89,0,0.624,5.019,100,1.4394,4,437,21.2,396.9,34.41,14.4 sklearn/datasets/data/boston_house_prices.csv-0.40202,0,9.9,0,0.544,6.382,67.2,3.5325,4,304,18.4,395.21,10.36,23.1 sklearn/datasets/data/breast_cancer.csv-14.71,21.59,95.55,656.9,0.1137,0.1365,0.1293,0.08123,0.2027,0.06758,0.4226,1.15,2.735,40.09,0.003659,0.02855,0.02572,0.01272,0.01817,0.004108,17.87,30.7,115.7,985.5,0.1368,0.429,0.3587,0.1834,0.3698,0.1094,0 sklearn/datasets/data/breast_cancer.csv-20.26,23.03,132.4,1264,0.09078,0.1313,0.1465,0.08683,0.2095,0.05649,0.7576,1.509,4.554,87.87,0.006016,0.03482,0.04232,0.01269,0.02657,0.004411,24.22,31.59,156.1,1750,0.119,0.3539,0.4098,0.1573,0.3689,0.08368,0 sklearn/datasets/data/breast_cancer.csv-12.86,13.32,82.82,504.8,0.1134,0.08834,0.038,0.034,0.1543,0.06476,0.2212,1.042,1.614,16.57,0.00591,0.02016,0.01902,0.01011,0.01202,0.003107,14.04,21.08,92.8,599.5,0.1547,0.2231,0.1791,0.1155,0.2382,0.08553,1 sklearn/datasets/data/breast_cancer.csv-11.87,21.54,76.83,432,0.06613,0.1064,0.08777,0.02386,0.1349,0.06612,0.256,1.554,1.955,20.24,0.006854,0.06063,0.06663,0.01553,0.02354,0.008925,12.79,28.18,83.51,507.2,0.09457,0.3399,0.3218,0.0875,0.2305,0.09952,1 sklearn/datasets/data/breast_cancer.csv-13,25.13,82.61,520.2,0.08369,0.05073,0.01206,0.01762,0.1667,0.05449,0.2621,1.232,1.657,21.19,0.006054,0.008974,0.005681,0.006336,0.01215,0.001514,14.34,31.88,91.06,628.5,0.1218,0.1093,0.04462,0.05921,0.2306,0.06291,1 sklearn/datasets/lfw.py=def _fetch_lfw_pairs(index_file_path, data_folder_path, slice_=None, sklearn/datasets/lfw.py- at deprecated("Function 'load_lfw_people' has been deprecated in 0.17 and will " sklearn/datasets/lfw.py: "be removed in 0.19." sklearn/datasets/lfw.py=def load_lfw_people(download_if_missing=False, **kwargs): sklearn/datasets/lfw.py- .. deprecated:: 0.17 sklearn/datasets/lfw.py: This function will be removed in 0.19. sklearn/datasets/lfw.py=def fetch_lfw_pairs(subset='train', data_home=None, funneled=True, resize=0.5, sklearn/datasets/lfw.py- at deprecated("Function 'load_lfw_pairs' has been deprecated in 0.17 and will " sklearn/datasets/lfw.py: "be removed in 0.19." sklearn/datasets/lfw.py=def load_lfw_pairs(download_if_missing=False, **kwargs): sklearn/datasets/lfw.py- .. deprecated:: 0.17 sklearn/datasets/lfw.py: This function will be removed in 0.19. sklearn/decomposition/nmf.py=def non_negative_factorization(X, W=None, H=None, n_components=None, sklearn/decomposition/nmf.py- if solver == 'pg': sklearn/decomposition/nmf.py: warnings.warn("'pg' solver will be removed in release 0.19." sklearn/decomposition/nmf.py=class NMF(BaseEstimator, TransformerMixin): sklearn/decomposition/nmf.py- " for 'pg' solver, which will be removed" sklearn/decomposition/nmf.py: " in release 0.19. Use another solver with L1 or L2" sklearn/decomposition/nmf.py- sklearn/decomposition/nmf.py:@deprecated("It will be removed in release 0.19. Use NMF instead." sklearn/decomposition/nmf.py: "'pg' solver is still available until release 0.19.") sklearn/discriminant_analysis.py=class LinearDiscriminantAnalysis(BaseEstimator, LinearClassifierMixin, sklearn/discriminant_analysis.py- warnings.warn("The parameter 'store_covariance' is deprecated as " sklearn/discriminant_analysis.py: "of version 0.17 and will be removed in 0.19. The " sklearn/discriminant_analysis.py- warnings.warn("The parameter 'tol' is deprecated as of version " sklearn/discriminant_analysis.py: "0.17 and will be removed in 0.19. 
The parameter is " sklearn/discriminant_analysis.py=class QuadraticDiscriminantAnalysis(BaseEstimator, ClassifierMixin): sklearn/discriminant_analysis.py- warnings.warn("The parameter 'store_covariances' is deprecated as " sklearn/discriminant_analysis.py: "of version 0.17 and will be removed in 0.19. The " sklearn/discriminant_analysis.py- warnings.warn("The parameter 'tol' is deprecated as of version " sklearn/discriminant_analysis.py: "0.17 and will be removed in 0.19. The parameter is " sklearn/ensemble/forest.py=class ForestClassifier(six.with_metaclass(ABCMeta, BaseForest, sklearn/ensemble/forest.py- warn("class_weight='subsample' is deprecated in 0.17 and" sklearn/ensemble/forest.py: "will be removed in 0.19. It was replaced by " sklearn/ensemble/gradient_boosting.py=class BaseGradientBoosting(six.with_metaclass(ABCMeta, BaseEnsemble, sklearn/ensemble/gradient_boosting.py- sklearn/ensemble/gradient_boosting.py: @deprecated(" and will be removed in 0.19") sklearn/ensemble/gradient_boosting.py- sklearn/ensemble/gradient_boosting.py: @deprecated(" and will be removed in 0.19") sklearn/feature_selection/from_model.py=class _LearntSelectorMixin(TransformerMixin): sklearn/feature_selection/from_model.py- @deprecated('Support to use estimators as feature selectors will be ' sklearn/feature_selection/from_model.py: 'removed in version 0.19. Use SelectFromModel instead.') sklearn/lda.py=warnings.warn("lda.LDA has been moved to " sklearn/lda.py- "discriminant_analysis.LinearDiscriminantAnalysis " sklearn/lda.py: "in 0.17 and will be removed in 0.19", DeprecationWarning) sklearn/lda.py=class LDA(_LDA): sklearn/lda.py- .. deprecated:: 0.17 sklearn/lda.py: This class will be removed in 0.19. sklearn/linear_model/base.py=class LinearModel(six.with_metaclass(ABCMeta, BaseEstimator)): sklearn/linear_model/base.py- sklearn/linear_model/base.py: @deprecated(" and will be removed in 0.19.") sklearn/linear_model/base.py=class LinearRegression(LinearModel, RegressorMixin): sklearn/linear_model/base.py- @property sklearn/linear_model/base.py: @deprecated("``residues_`` is deprecated and will be removed in 0.19") sklearn/linear_model/coordinate_descent.py=class ElasticNet(LinearModel, RegressorMixin): sklearn/linear_model/coordinate_descent.py- sklearn/linear_model/coordinate_descent.py: @deprecated(" and will be removed in 0.19") sklearn/linear_model/logistic.py=def logistic_regression_path(X, y, pos_class=None, Cs=10, fit_intercept=True, sklearn/linear_model/logistic.py- Whether or not to produce a copy of the data. A copy is not required sklearn/linear_model/logistic.py: anymore. This parameter is deprecated and will be removed in 0.19. sklearn/linear_model/logistic.py- warnings.warn("A copy is not required anymore. 
The 'copy' parameter " sklearn/linear_model/logistic.py: "is deprecated and will be removed in 0.19.", sklearn/linear_model/logistic.py- sklearn/linear_model/logistic.py: # 'auto' is deprecated and will be removed in 0.19 sklearn/linear_model/logistic.py=class LogisticRegressionCV(LogisticRegression, BaseEstimator, sklearn/linear_model/logistic.py- class_weight in ['balanced', 'auto']): sklearn/linear_model/logistic.py: # 'auto' is deprecated and will be removed in 0.19 sklearn/linear_model/stochastic_gradient.py=class BaseSGDRegressor(BaseSGD, RegressorMixin): sklearn/linear_model/stochastic_gradient.py- sklearn/linear_model/stochastic_gradient.py: @deprecated(" and will be removed in 0.19.") sklearn/metrics/base.py=from ..utils import deprecated sklearn/metrics/base.py- at deprecated("UndefinedMetricWarning has been moved into the sklearn.exceptions" sklearn/metrics/base.py: " module. It will not be available here from version 0.19") sklearn/metrics/regression.py=def r2_score(y_true, y_pred, sklearn/metrics/regression.py- deprecated since version 0.17 and will be changed to 'uniform_average' sklearn/metrics/regression.py: starting from 0.19. sklearn/metrics/regression.py- "0.17, it will be changed to 'uniform_average' " sklearn/metrics/regression.py: "starting from 0.19.", sklearn/multioutput.py=class MultiOutputRegressor(MultiOutputEstimator, RegressorMixin): sklearn/multioutput.py- """ sklearn/multioutput.py: # XXX remove in 0.19 when r2_score default for multioutput changes sklearn/pipeline.py=class Pipeline(_BasePipeline): sklearn/pipeline.py- if hasattr(X, 'ndim') and X.ndim == 1: sklearn/pipeline.py: warn("From version 0.19, a 1d X will not be reshaped in" sklearn/preprocessing/data.py=DEPRECATION_MSG_1D = ( sklearn/preprocessing/data.py- "Passing 1d arrays as data is deprecated in 0.17 and will " sklearn/preprocessing/data.py: "raise ValueError in 0.19. Reshape your data either using " sklearn/preprocessing/data.py=class MinMaxScaler(BaseEstimator, TransformerMixin): sklearn/preprocessing/data.py- @deprecated("Attribute data_range will be removed in " sklearn/preprocessing/data.py: "0.19. Use ``data_range_`` instead") sklearn/preprocessing/data.py- @deprecated("Attribute data_min will be removed in " sklearn/preprocessing/data.py: "0.19. Use ``data_min_`` instead") sklearn/preprocessing/data.py=class StandardScaler(BaseEstimator, TransformerMixin): sklearn/preprocessing/data.py- @property sklearn/preprocessing/data.py: @deprecated("Attribute ``std_`` will be removed in 0.19. " sklearn/qda.py=warnings.warn("qda.QDA has been moved to " sklearn/qda.py- "discriminant_analysis.QuadraticDiscriminantAnalysis " sklearn/qda.py: "in 0.17 and will be removed in 0.19.", DeprecationWarning) sklearn/qda.py=class QDA(_QDA): sklearn/qda.py- .. deprecated:: 0.17 sklearn/qda.py: This class will be removed in 0.19. sklearn/svm/base.py=class BaseLibSVM(six.with_metaclass(ABCMeta, BaseEstimator)): sklearn/svm/base.py- sklearn/svm/base.py: @deprecated(" and will be removed in 0.19") sklearn/svm/base.py=class BaseSVC(six.with_metaclass(ABCMeta, BaseLibSVM, ClassifierMixin)): sklearn/svm/base.py- warnings.warn("The decision_function_shape default value will " sklearn/svm/base.py: "change from 'ovo' to 'ovr' in 0.19. This will change " sklearn/svm/classes.py=class SVC(BaseSVC): sklearn/svm/classes.py- compatibility and raise a deprecation warning, but will change 'ovr' sklearn/svm/classes.py: in 0.19. 
sklearn/svm/classes.py=class NuSVC(BaseSVC): sklearn/svm/classes.py- compatibility and raise a deprecation warning, but will change 'ovr' sklearn/svm/classes.py: in 0.19. sklearn/utils/__init__.py=from ..exceptions import DataConversionWarning sklearn/utils/__init__.py- at deprecated("ConvergenceWarning has been moved into the sklearn.exceptions " sklearn/utils/__init__.py: "module. It will not be available here from version 0.19") sklearn/utils/class_weight.py=def compute_class_weight(class_weight, classes, y): sklearn/utils/class_weight.py- "class_weight='balanced'. 'auto' will be removed in" sklearn/utils/class_weight.py: " 0.19", DeprecationWarning) sklearn/utils/estimator_checks.py=MULTI_OUTPUT = ['CCA', 'DecisionTreeRegressor', 'ElasticNet', sklearn/utils/estimator_checks.py- sklearn/utils/estimator_checks.py:# Estimators with deprecated transform methods. Should be removed in 0.19 when sklearn/utils/testing.py=def if_not_mac_os(versions=('10.7', '10.8', '10.9'), sklearn/utils/testing.py- warnings.warn("if_not_mac_os is deprecated in 0.17 and will be removed" sklearn/utils/testing.py: " in 0.19: use the safer and more generic" sklearn/utils/validation.py=from ..exceptions import NotFittedError as _NotFittedError sklearn/utils/validation.py- at deprecated("DataConversionWarning has been moved into the sklearn.exceptions" sklearn/utils/validation.py: " module. It will not be available here from version 0.19") sklearn/utils/validation.py=class DataConversionWarning(_DataConversionWarning): sklearn/utils/validation.py- at deprecated("NonBLASDotWarning has been moved into the sklearn.exceptions" sklearn/utils/validation.py: " module. It will not be available here from version 0.19") sklearn/utils/validation.py=class NonBLASDotWarning(_NonBLASDotWarning): sklearn/utils/validation.py- at deprecated("NotFittedError has been moved into the sklearn.exceptions module." sklearn/utils/validation.py: " It will not be available here from version 0.19") sklearn/utils/validation.py=def check_array(array, accept_sparse=None, dtype="numeric", order=None, sklearn/utils/validation.py- "Passing 1d arrays as data is deprecated in 0.17 and will " sklearn/utils/validation.py: "raise ValueError in 0.19. Reshape your data either using " sklearn/utils/validation.py=def check_is_fitted(estimator, attributes, msg=None, all_or_any=all): sklearn/utils/validation.py- if not all_or_any([hasattr(estimator, attr) for attr in attributes]): sklearn/utils/validation.py: # FIXME NotFittedError_ --> NotFittedError in 0.19 -------------- next part -------------- sklearn/base.py=def clone(estimator, safe=True): sklearn/base.py- " This behavior is deprecated as of 0.18 and " sklearn/base.py: "support for this behavior will be removed in 0.20." sklearn/cross_validation.py=warnings.warn("This module was deprecated in version 0.18 in favor of the " sklearn/cross_validation.py- "new CV iterators are different from that of this module. " sklearn/cross_validation.py: "This module will be removed in 0.20.", DeprecationWarning) sklearn/cross_validation.py=class LeaveOneOut(_PartitionIterator): sklearn/cross_validation.py- .. deprecated:: 0.18 sklearn/cross_validation.py: This module will be removed in 0.20. sklearn/cross_validation.py=class LeavePOut(_PartitionIterator): sklearn/cross_validation.py- .. deprecated:: 0.18 sklearn/cross_validation.py: This module will be removed in 0.20. sklearn/cross_validation.py=class KFold(_BaseKFold): sklearn/cross_validation.py- .. 
deprecated:: 0.18 sklearn/cross_validation.py: This module will be removed in 0.20. sklearn/cross_validation.py=class LabelKFold(_BaseKFold): sklearn/cross_validation.py- .. deprecated:: 0.18 sklearn/cross_validation.py: This module will be removed in 0.20. sklearn/cross_validation.py=class StratifiedKFold(_BaseKFold): sklearn/cross_validation.py- .. deprecated:: 0.18 sklearn/cross_validation.py: This module will be removed in 0.20. sklearn/cross_validation.py=class LeaveOneLabelOut(_PartitionIterator): sklearn/cross_validation.py- .. deprecated:: 0.18 sklearn/cross_validation.py: This module will be removed in 0.20. sklearn/cross_validation.py=class LeavePLabelOut(_PartitionIterator): sklearn/cross_validation.py- .. deprecated:: 0.18 sklearn/cross_validation.py: This module will be removed in 0.20. sklearn/cross_validation.py=class ShuffleSplit(BaseShuffleSplit): sklearn/cross_validation.py- .. deprecated:: 0.18 sklearn/cross_validation.py: This module will be removed in 0.20. sklearn/cross_validation.py=class StratifiedShuffleSplit(BaseShuffleSplit): sklearn/cross_validation.py- .. deprecated:: 0.18 sklearn/cross_validation.py: This module will be removed in 0.20. sklearn/cross_validation.py=class PredefinedSplit(_PartitionIterator): sklearn/cross_validation.py- .. deprecated:: 0.18 sklearn/cross_validation.py: This module will be removed in 0.20. sklearn/cross_validation.py=class LabelShuffleSplit(ShuffleSplit): sklearn/cross_validation.py- .. deprecated:: 0.18 sklearn/cross_validation.py: This module will be removed in 0.20. sklearn/cross_validation.py=def cross_val_predict(estimator, X, y=None, cv=None, n_jobs=1, sklearn/cross_validation.py- .. deprecated:: 0.18 sklearn/cross_validation.py: This module will be removed in 0.20. sklearn/cross_validation.py=def cross_val_score(estimator, X, y=None, scoring=None, cv=None, n_jobs=1, sklearn/cross_validation.py- .. deprecated:: 0.18 sklearn/cross_validation.py: This module will be removed in 0.20. sklearn/cross_validation.py=def check_cv(cv, X=None, y=None, classifier=False): sklearn/cross_validation.py- .. deprecated:: 0.18 sklearn/cross_validation.py: This module will be removed in 0.20. sklearn/cross_validation.py=def permutation_test_score(estimator, X, y, cv=None, sklearn/cross_validation.py- .. deprecated:: 0.18 sklearn/cross_validation.py: This module will be removed in 0.20. sklearn/cross_validation.py=def train_test_split(*arrays, **options): sklearn/cross_validation.py- .. deprecated:: 0.18 sklearn/cross_validation.py: This module will be removed in 0.20. sklearn/decomposition/online_lda.py=class LatentDirichletAllocation(BaseEstimator, TransformerMixin): sklearn/decomposition/online_lda.py- faster than the batch update. sklearn/decomposition/online_lda.py: The default learning method is going to be changed to 'batch' in the 0.20 release. sklearn/decomposition/online_lda.py- warnings.warn("The default value for 'learning_method' will be " sklearn/decomposition/online_lda.py: "changed from 'online' to 'batch' in the release 0.20. " sklearn/decomposition/pca.py=class PCA(_BasePCA): sklearn/decomposition/pca.py- sklearn/decomposition/pca.py:@deprecated("RandomizedPCA was deprecated in 0.18 and will be removed in 0.20. " sklearn/decomposition/pca.py=class RandomizedPCA(BaseEstimator, TransformerMixin): sklearn/decomposition/pca.py- .. deprecated:: 0.18 sklearn/decomposition/pca.py: This class will be removed in 0.20. 
sklearn/gaussian_process/gaussian_process.py=MACHINE_EPSILON = np.finfo(np.double).eps sklearn/gaussian_process/gaussian_process.py- at deprecated("l1_cross_distances was deprecated in version 0.18 " sklearn/gaussian_process/gaussian_process.py: "and will be removed in 0.20.") sklearn/gaussian_process/gaussian_process.py=def l1_cross_distances(X): sklearn/gaussian_process/gaussian_process.py- at deprecated("GaussianProcess was deprecated in version 0.18 and will be " sklearn/gaussian_process/gaussian_process.py: "removed in 0.20. Use the GaussianProcessRegressor instead.") sklearn/gaussian_process/gaussian_process.py=class GaussianProcess(BaseEstimator, RegressorMixin): sklearn/gaussian_process/gaussian_process.py- .. deprecated:: 0.18 sklearn/gaussian_process/gaussian_process.py: This class will be removed in 0.20. sklearn/grid_search.py=warnings.warn("This module was deprecated in version 0.18 in favor of the " sklearn/grid_search.py- "model_selection module into which all the refactored classes " sklearn/grid_search.py: "and functions are moved. This module will be removed in 0.20.", sklearn/grid_search.py=class ParameterGrid(object): sklearn/grid_search.py- .. deprecated:: 0.18 sklearn/grid_search.py: This module will be removed in 0.20. sklearn/grid_search.py=class ParameterSampler(object): sklearn/grid_search.py- .. deprecated:: 0.18 sklearn/grid_search.py: This module will be removed in 0.20. sklearn/grid_search.py=def fit_grid_point(X, y, estimator, parameters, train, test, scorer, sklearn/grid_search.py- .. deprecated:: 0.18 sklearn/grid_search.py: This module will be removed in 0.20. sklearn/grid_search.py=class GridSearchCV(BaseSearchCV): sklearn/grid_search.py- .. deprecated:: 0.18 sklearn/grid_search.py: This module will be removed in 0.20. sklearn/grid_search.py=class RandomizedSearchCV(BaseSearchCV): sklearn/grid_search.py- .. deprecated:: 0.18 sklearn/grid_search.py: This module will be removed in 0.20. sklearn/isotonic.py=class IsotonicRegression(BaseEstimator, TransformerMixin, RegressorMixin): sklearn/isotonic.py- @deprecated("Attribute ``X_`` is deprecated in version 0.18 and will be" sklearn/isotonic.py: " removed in version 0.20.") sklearn/isotonic.py- @deprecated("Attribute ``y_`` is deprecated in version 0.18 and will" sklearn/isotonic.py: " be removed in version 0.20.") sklearn/learning_curve.py=warnings.warn("This module was deprecated in version 0.18 in favor of the " sklearn/learning_curve.py- "model_selection module into which all the functions are moved." sklearn/learning_curve.py: " This module will be removed in 0.20", sklearn/learning_curve.py=def learning_curve(estimator, X, y, train_sizes=np.linspace(0.1, 1.0, 5), sklearn/learning_curve.py- .. deprecated:: 0.18 sklearn/learning_curve.py: This module will be removed in 0.20. sklearn/learning_curve.py=def validation_curve(estimator, X, y, param_name, param_range, cv=None, sklearn/learning_curve.py- .. deprecated:: 0.18 sklearn/learning_curve.py: This module will be removed in 0.20. sklearn/linear_model/base.py=def make_dataset(X, y, sample_weight, random_state=None): sklearn/linear_model/base.py- at deprecated("sparse_center_data was deprecated in version 0.18 and will be " sklearn/linear_model/base.py: "removed in 0.20. Use utilities in preprocessing.data instead") sklearn/linear_model/base.py=def sparse_center_data(X, y, fit_intercept, normalize=False): sklearn/linear_model/base.py- at deprecated("center_data was deprecated in version 0.18 and will be removed in " sklearn/linear_model/base.py: "0.20. 
Use utilities in preprocessing.data instead") sklearn/linear_model/ransac.py=class RANSACRegressor(BaseEstimator, MetaEstimatorMixin, RegressorMixin): sklearn/linear_model/ransac.py- sklearn/linear_model/ransac.py: NOTE: residual_metric is deprecated from 0.18 and will be removed in 0.20 sklearn/linear_model/ransac.py- "'residual_metric' was deprecated in version 0.18 and " sklearn/linear_model/ransac.py: "will be removed in version 0.20. Use 'loss' instead.", sklearn/linear_model/ransac.py- sklearn/linear_model/ransac.py: # XXX: Deprecation: Remove this if block in 0.20 sklearn/metrics/classification.py=def hamming_loss(y_true, y_pred, labels=None, sample_weight=None, sklearn/metrics/classification.py- (deprecated) Integer array of labels. This parameter has been sklearn/metrics/classification.py: renamed to ``labels`` in version 0.18 and will be removed in 0.20. sklearn/metrics/classification.py- warnings.warn("'classes' was renamed to 'labels' in version 0.18 and " sklearn/metrics/classification.py: "will be removed in 0.20.", DeprecationWarning) sklearn/metrics/scorer.py=deprecation_msg = ('Scoring method mean_squared_error was renamed to ' sklearn/metrics/scorer.py- 'neg_mean_squared_error in version 0.18 and will ' sklearn/metrics/scorer.py: 'be removed in 0.20.') sklearn/metrics/scorer.py=deprecation_msg = ('Scoring method mean_absolute_error was renamed to ' sklearn/metrics/scorer.py- 'neg_mean_absolute_error in version 0.18 and will ' sklearn/metrics/scorer.py: 'be removed in 0.20.') sklearn/metrics/scorer.py=deprecation_msg = ('Scoring method median_absolute_error was renamed to ' sklearn/metrics/scorer.py- 'neg_median_absolute_error in version 0.18 and will ' sklearn/metrics/scorer.py: 'be removed in 0.20.') sklearn/metrics/scorer.py=deprecation_msg = ('Scoring method log_loss was renamed to ' sklearn/metrics/scorer.py: 'neg_log_loss in version 0.18 and will be removed in 0.20.') sklearn/mixture/dpgmm.py=from __future__ import print_function sklearn/mixture/dpgmm.py- sklearn/mixture/dpgmm.py:# Important note for the deprecation cleaning of 0.20 : sklearn/mixture/dpgmm.py=from .gmm import _GMMBase sklearn/mixture/dpgmm.py- at deprecated("The function digamma is deprecated in 0.18 and " sklearn/mixture/dpgmm.py: "will be removed in 0.20. Use scipy.special.digamma instead.") sklearn/mixture/dpgmm.py=def digamma(x): sklearn/mixture/dpgmm.py- at deprecated("The function gammaln is deprecated in 0.18 and " sklearn/mixture/dpgmm.py: "will be removed in 0.20. Use scipy.special.gammaln instead.") sklearn/mixture/dpgmm.py=def gammaln(x): sklearn/mixture/dpgmm.py- at deprecated("The function log_normalize is deprecated in 0.18 and " sklearn/mixture/dpgmm.py: "will be removed in 0.20.") sklearn/mixture/dpgmm.py=def log_normalize(v, axis=0): sklearn/mixture/dpgmm.py- at deprecated("The function wishart_log_det is deprecated in 0.18 and " sklearn/mixture/dpgmm.py: "will be removed in 0.20.") sklearn/mixture/dpgmm.py=def wishart_log_det(a, b, detB, n_features): sklearn/mixture/dpgmm.py- at deprecated("The function wishart_logz is deprecated in 0.18 and " sklearn/mixture/dpgmm.py: "will be removed in 0.20.") sklearn/mixture/dpgmm.py=class _DPGMMBase(_GMMBase): sklearn/mixture/dpgmm.py- "instead. DPGMM is deprecated in 0.18 and will be " sklearn/mixture/dpgmm.py: "removed in 0.20.") sklearn/mixture/dpgmm.py=class DPGMM(_DPGMMBase): sklearn/mixture/dpgmm.py- .. deprecated:: 0.18 sklearn/mixture/dpgmm.py: This class will be removed in 0.20. 
sklearn/mixture/dpgmm.py- "'dirichlet_distribution'` instead. " sklearn/mixture/dpgmm.py: "VBGMM is deprecated in 0.18 and will be removed in 0.20.") sklearn/mixture/dpgmm.py=class VBGMM(_DPGMMBase): sklearn/mixture/dpgmm.py- .. deprecated:: 0.18 sklearn/mixture/dpgmm.py: This class will be removed in 0.20. sklearn/mixture/gmm.py=of Gaussian Mixture Models. sklearn/mixture/gmm.py- sklearn/mixture/gmm.py:# Important note for the deprecation cleaning of 0.20 : sklearn/mixture/gmm.py=EPS = np.finfo(float).eps sklearn/mixture/gmm.py- at deprecated("The function log_multivariate_normal_density is deprecated in 0.18" sklearn/mixture/gmm.py: " and will be removed in 0.20.") sklearn/mixture/gmm.py=def log_multivariate_normal_density(X, means, covars, covariance_type='diag'): sklearn/mixture/gmm.py- at deprecated("The function sample_gaussian is deprecated in 0.18" sklearn/mixture/gmm.py: " and will be removed in 0.20." sklearn/mixture/gmm.py=class _GMMBase(BaseEstimator): sklearn/mixture/gmm.py- at deprecated("The class GMM is deprecated in 0.18 and will be " sklearn/mixture/gmm.py: " removed in 0.20. Use class GaussianMixture instead.") sklearn/mixture/gmm.py=class GMM(_GMMBase): sklearn/mixture/gmm.py- .. deprecated:: 0.18 sklearn/mixture/gmm.py: This class will be removed in 0.20. sklearn/mixture/gmm.py=def _validate_covars(covars, covariance_type, n_components): sklearn/mixture/gmm.py- at deprecated("The functon distribute_covar_matrix_to_match_covariance_type" sklearn/mixture/gmm.py: "is deprecated in 0.18 and will be removed in 0.20.") sklearn/model_selection/_search.py=def _check_param_grid(param_grid): sklearn/model_selection/_search.py- sklearn/model_selection/_search.py:# XXX Remove in 0.20 sklearn/model_selection/_search.py=class BaseSearchCV(six.with_metaclass(ABCMeta, BaseEstimator, sklearn/model_selection/_search.py- " in favor of the more elaborate cv_results_ attribute." sklearn/model_selection/_search.py: " The grid_scores_ attribute will not be available from 0.20", sklearn/tree/_utils.pyx=cdef realloc_ptr safe_realloc(realloc_ptr* p, size_t nelems) except *: sklearn/tree/_utils.pyx- # sizeof(realloc_ptr[0]) would be more like idiomatic C, but causes Cython sklearn/tree/_utils.pyx: # 0.20.1 to crash. sklearn/tree/export.py=def export_graphviz(decision_tree, out_file=SENTINEL, max_depth=None, sklearn/tree/export.py- Handle or name of the output file. If ``None``, the result is sklearn/tree/export.py: returned as a string. This will the default from version 0.20. sklearn/tree/export.py- warnings.warn("out_file can be set to None starting from 0.18. " sklearn/tree/export.py: "This will be the default in 0.20.", sklearn/utils/fast_dict.pyx=cdef class IntFloatDict: sklearn/utils/fast_dict.pyx- sklearn/utils/fast_dict.pyx: # Cython 0.20 generates buggy code below. Commenting this out for now -------------- next part -------------- sklearn/covariance/graph_lasso_.py=class GraphLassoCV(GraphLasso): sklearn/covariance/graph_lasso_.py- @deprecated("Attribute grid_scores was deprecated in version 0.19 and " sklearn/covariance/graph_lasso_.py: "will be removed in 0.21. 
Use 'grid_scores_' instead") sklearn/datasets/data/boston_house_prices.csv-0.14455,12.5,7.87,0,0.524,6.172,96.1,5.9505,5,311,15.2,396.9,19.15,27.1 sklearn/datasets/data/boston_house_prices.csv-0.04684,0,3.41,0,0.489,6.417,66.1,3.0923,2,270,17.8,392.18,8.81,22.6 sklearn/datasets/data/boston_house_prices.csv-0.38735,0,25.65,0,0.581,5.613,95.6,1.7572,2,188,19.1,359.29,27.26,15.7 sklearn/datasets/data/breast_cancer.csv-15.12,16.68,98.78,716.6,0.08876,0.09588,0.0755,0.04079,0.1594,0.05986,0.2711,0.3621,1.974,26.44,0.005472,0.01919,0.02039,0.00826,0.01523,0.002881,17.77,20.24,117.7,989.5,0.1491,0.3331,0.3327,0.1252,0.3415,0.0974,0 sklearn/datasets/data/breast_cancer.csv-17.93,24.48,115.2,998.9,0.08855,0.07027,0.05699,0.04744,0.1538,0.0551,0.4212,1.433,2.765,45.81,0.005444,0.01169,0.01622,0.008522,0.01419,0.002751,20.92,34.69,135.1,1320,0.1315,0.1806,0.208,0.1136,0.2504,0.07948,0 sklearn/datasets/data/breast_cancer.csv-9,14.4,56.36,246.3,0.07005,0.03116,0.003681,0.003472,0.1788,0.06833,0.1746,1.305,1.144,9.789,0.007389,0.004883,0.003681,0.003472,0.02701,0.002153,9.699,20.07,60.9,285.5,0.09861,0.05232,0.01472,0.01389,0.2991,0.07804,1 sklearn/datasets/data/breast_cancer.csv-12.2,15.21,78.01,457.9,0.08673,0.06545,0.01994,0.01692,0.1638,0.06129,0.2575,0.8073,1.959,19.01,0.005403,0.01418,0.01051,0.005142,0.01333,0.002065,13.75,21.38,91.11,583.1,0.1256,0.1928,0.1167,0.05556,0.2661,0.07961,1 sklearn/decomposition/online_lda.py=class LatentDirichletAllocation(BaseEstimator, TransformerMixin): sklearn/decomposition/online_lda.py- "be ignored as of 0.19. Support for this argument " sklearn/decomposition/online_lda.py: "will be removed in 0.21.", DeprecationWarning) sklearn/decomposition/sparse_pca.py=class SparsePCA(BaseEstimator, TransformerMixin): sklearn/decomposition/sparse_pca.py- .. deprecated:: 0.19 sklearn/decomposition/sparse_pca.py: This parameter will be removed in 0.21. sklearn/decomposition/sparse_pca.py- warnings.warn("The ridge_alpha parameter on transform() is " sklearn/decomposition/sparse_pca.py: "deprecated since 0.19 and will be removed in 0.21. " sklearn/ensemble/gradient_boosting.py=class BaseGradientBoosting(six.with_metaclass(ABCMeta, BaseEnsemble)): sklearn/ensemble/gradient_boosting.py- @deprecated("Attribute n_features was deprecated in version 0.19 and " sklearn/ensemble/gradient_boosting.py: "will be removed in 0.21.") sklearn/gaussian_process/gpr.py=class GaussianProcessRegressor(BaseEstimator, RegressorMixin): sklearn/gaussian_process/gpr.py- @deprecated("Attribute rng was deprecated in version 0.19 and " sklearn/gaussian_process/gpr.py: "will be removed in 0.21.") sklearn/gaussian_process/gpr.py- @deprecated("Attribute y_train_mean was deprecated in version 0.19 and " sklearn/gaussian_process/gpr.py: "will be removed in 0.21.") sklearn/linear_model/stochastic_gradient.py=class BaseSGDClassifier(six.with_metaclass(ABCMeta, BaseSGD, sklearn/linear_model/stochastic_gradient.py- @deprecated("Attribute loss_function was deprecated in version 0.19 and " sklearn/linear_model/stochastic_gradient.py: "will be removed in 0.21. Use 'loss_function_' instead") sklearn/manifold/t_sne.py=class TSNE(BaseEstimator): sklearn/manifold/t_sne.py- @deprecated("Attribute n_iter_final was deprecated in version 0.19 and " sklearn/manifold/t_sne.py: "will be removed in 0.21. 
Use 'n_iter_' instead") sklearn/utils/validation.py=def check_array(array, accept_sparse=False, dtype="numeric", order=None, sklearn/utils/validation.py- "check_array and check_X_y is deprecated in version 0.19 " sklearn/utils/validation.py: "and will be removed in 0.21. Use 'accept_sparse=False' " From joel.nothman at gmail.com Wed Feb 8 22:39:20 2017 From: joel.nothman at gmail.com (Joel Nothman) Date: Thu, 9 Feb 2017 14:39:20 +1100 Subject: [scikit-learn] Preparing a scikit-learn 0.18.2 bugfix release In-Reply-To: References: <20170109151546.GM2802991@phare.normalesup.org> <20170111215115.GO1585067@phare.normalesup.org> Message-ID: See also http://scikit-learn.org/stable/modules/classes.html#recently-deprecated On 9 February 2017 at 14:30, Joel Nothman wrote: > Not sure that this quite gives you a number, but: > > > $git checkout 0.18.1 > $ git grep -pwB1 0.19 sklearn | grep -ve ^- -e .csv: -e /tests/ > > /tmp/dep19.txt > > etc. > > edited results attached. > > > On 9 February 2017 at 04:15, Andrew Howe wrote: > >> How many current deprecations are expected in the next release? >> >> Andrew >> >> On Jan 12, 2017 00:53, "Gael Varoquaux" >> wrote: >> >> On Thu, Jan 12, 2017 at 08:41:51AM +1100, Joel Nothman wrote: >> > When the two versions deprecation policy was instituted, releases were >> much >> > more frequent... Is that enough of an excuse? >> >> I'd rather say that we can here decide that we are giving a longer grace >> period. >> >> I think that slow deprecations are a good things (see titus's blog post >> here: http://ivory.idyll.org/blog/2017-pof-software-archivability.html ) >> >> G >> >> > On 12 January 2017 at 03:43, Andreas Mueller wrote: >> >> >> >> > On 01/09/2017 10:15 AM, Gael Varoquaux wrote: >> >> > instead of setting up a roadmap I would rather just >> identify bugs >> > that >> > are blockers and fix only those and don't wait for any >> feature >> > before >> > cutting 0.19.X. >> >> >> >> > I agree with the sentiment, but this would mess with our >> deprecation cycle. >> > If we release now, and then release again soonish, that means >> people have >> > less calendar time >> > to react to deprecations. >> >> > We could either accept this or change all deprecations and bump the >> removal >> > by a version? >> >> > _______________________________________________ >> > scikit-learn mailing list >> > scikit-learn at python.org >> > https://mail.python.org/mailman/listinfo/scikit-learn >> >> >> >> > _______________________________________________ >> > scikit-learn mailing list >> > scikit-learn at python.org >> > https://mail.python.org/mailman/listinfo/scikit-learn >> >> >> -- >> Gael Varoquaux >> Researcher, INRIA Parietal >> NeuroSpin/CEA Saclay , Bat 145, 91191 Gif-sur-Yvette France >> Phone: ++ 33-1-69-08-79-68 >> http://gael-varoquaux.info http://twitter.com/GaelVaroqua >> ux >> _______________________________________________ >> scikit-learn mailing list >> scikit-learn at python.org >> https://mail.python.org/mailman/listinfo/scikit-learn >> >> >> >> _______________________________________________ >> scikit-learn mailing list >> scikit-learn at python.org >> https://mail.python.org/mailman/listinfo/scikit-learn >> >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From mmahesh.chandra873 at gmail.com Sat Feb 11 09:18:50 2017 From: mmahesh.chandra873 at gmail.com (Mahesh Chandra) Date: Sat, 11 Feb 2017 15:18:50 +0100 Subject: [scikit-learn] Logistic regression doesnt converge? 
Message-ID: >reg = 0.1 lr = LogisticRegression(C=1/reg,max_iter=100, fit_intercept=True,solver='lbfgs').fit(X_train, y_train) ytrain_hat = lr.predict_proba(X_train) loss = log_loss(y_train,ytrain_hat) print loss print loss + 0.5*reg*LA.norm(lr.coef_) Maybe i am doing it wrong -------------- next part -------------- An HTML attachment was scrubbed... URL: From mmahesh.chandra873 at gmail.com Sat Feb 11 09:24:09 2017 From: mmahesh.chandra873 at gmail.com (Mahesh Chandra) Date: Sat, 11 Feb 2017 15:24:09 +0100 Subject: [scikit-learn] Logistic regression doesnt converge? In-Reply-To: References: Message-ID: Sorry for incomplete email. Hi, My question was that even after using many solvers, i dont get convergence for Logistic regression. The loss value as calculated in the previous email was less for maxiter=10 than when maxiter = 30. So, does the optimization method diverge and also how do we monitor and store the loss (or any metric) after each iteration? Thanks Mahesh On Sat, Feb 11, 2017 at 3:18 PM, Mahesh Chandra < mmahesh.chandra873 at gmail.com> wrote: > >reg = 0.1 > lr = LogisticRegression(C=1/reg,max_iter=100, fit_intercept=True,solver='lbfgs').fit(X_train, > y_train) > ytrain_hat = lr.predict_proba(X_train) > loss = log_loss(y_train,ytrain_hat) > print loss > print loss + 0.5*reg*LA.norm(lr.coef_) > > Maybe i am doing it wrong > -------------- next part -------------- An HTML attachment was scrubbed... URL: From benjamin.merkt at bcf.uni-freiburg.de Mon Feb 13 04:55:55 2017 From: benjamin.merkt at bcf.uni-freiburg.de (Benjamin Merkt) Date: Mon, 13 Feb 2017 10:55:55 +0100 Subject: [scikit-learn] OMP ended prematurely due to linear dependence in the dictionary Message-ID: Hi everyone, I'm using OrthogonalMatchingPursuit to get a sparse coding of a signal using a dictionary learned by a KSVD algorithm (pyksvd). However, during the fit I get the following RuntimeWarning: /usr/local/lib/python2.7/dist-packages/sklearn/linear_model/omp.py:391: RuntimeWarning: Orthogonal matching pursuit ended prematurely due to linear dependence in the dictionary. The requested precision might not have been met. copy_X=copy_X, return_path=return_path) In those cases the results are indeed not satisfactory. I don't get the point of this warning as it is common in sparse coding to have an overcomplete dictionary an thus also linear dependency within it. That should not be an issue for OMP. In fact, the warning is also raised if the dictionary is a square matrix. Might this Warning also point to other issues in the application? Thanks, Ben From zephyr14 at gmail.com Mon Feb 13 17:31:35 2017 From: zephyr14 at gmail.com (Vlad Niculae) Date: Tue, 14 Feb 2017 07:31:35 +0900 Subject: [scikit-learn] OMP ended prematurely due to linear dependence in the dictionary In-Reply-To: References: Message-ID: Hi, Are the columns of your matrix normalized? Try setting `normalized=True`. Yours, Vlad On Mon, Feb 13, 2017 at 6:55 PM, Benjamin Merkt wrote: > Hi everyone, > > I'm using OrthogonalMatchingPursuit to get a sparse coding of a signal using > a dictionary learned by a KSVD algorithm (pyksvd). However, during the fit I > get the following RuntimeWarning: > > /usr/local/lib/python2.7/dist-packages/sklearn/linear_model/omp.py:391: > RuntimeWarning: Orthogonal matching pursuit ended prematurely due to linear > dependence in the dictionary. The requested precision might not have been > met. > > copy_X=copy_X, return_path=return_path) > > In those cases the results are indeed not satisfactory. 
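On Mahesh's question earlier in this digest about monitoring and storing the loss after each iteration: LogisticRegression exposes no per-iteration callback, so one minimal sketch is to refit with an increasing max_iter and recompute the penalized objective each time. X_train and y_train below are hypothetical stand-ins for the data in the original post, and the loop assumes the lbfgs solver; note that scikit-learn minimizes C * sum(log-losses) + 0.5 * ||w||^2, so a hand-added penalty should use the squared norm scaled by 1 / (C * n_samples).

import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import log_loss

# Hypothetical stand-ins for the X_train / y_train of the original post.
X_train, y_train = make_classification(n_samples=500, n_features=20,
                                        random_state=0)

reg = 0.1  # regularization strength; scikit-learn's C is its inverse
for max_iter in (1, 5, 10, 30, 100, 300):
    lr = LogisticRegression(C=1.0 / reg, solver='lbfgs',
                            fit_intercept=True, max_iter=max_iter)
    lr.fit(X_train, y_train)
    data_loss = log_loss(y_train, lr.predict_proba(X_train))
    # LogisticRegression minimizes C * sum(log-losses) + 0.5 * ||w||^2, so the
    # comparable per-sample penalty is 0.5 * reg * ||w||^2 / n_samples.
    penalty = 0.5 * reg * np.sum(lr.coef_ ** 2) / len(y_train)
    print(max_iter, data_loss, data_loss + penalty)

With a deterministic solver started from the same initialization, the penalized objective should not increase as max_iter grows, even when the unpenalized log-loss alone does; warm_start=True can reuse the previous solution for the solvers that support it.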
I don't get the > point of this warning as it is common in sparse coding to have an > overcomplete dictionary an thus also linear dependency within it. That > should not be an issue for OMP. In fact, the warning is also raised if the > dictionary is a square matrix. > > Might this Warning also point to other issues in the application? > > > Thanks, Ben > > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn From benjamin.merkt at bcf.uni-freiburg.de Tue Feb 14 05:00:52 2017 From: benjamin.merkt at bcf.uni-freiburg.de (Benjamin Merkt) Date: Tue, 14 Feb 2017 11:00:52 +0100 Subject: [scikit-learn] OMP ended prematurely due to linear dependence in the dictionary In-Reply-To: References: Message-ID: <80881741-0259-dbe2-0a63-f5125dd78671@bcf.uni-freiburg.de> Hi, I tried that with no effect. The fit still breaks after two iterations. If I set precompute=True I get three coefficients instead of only two. My Dictionary is fairly large (currently 128x42000). Is it even feasible to use OMP with such a big Matrix (even with ~120GB ram)? -Ben On 13.02.2017 23:31, Vlad Niculae wrote: > Hi, > > Are the columns of your matrix normalized? Try setting `normalized=True`. > > Yours, > Vlad > > On Mon, Feb 13, 2017 at 6:55 PM, Benjamin Merkt > wrote: >> Hi everyone, >> >> I'm using OrthogonalMatchingPursuit to get a sparse coding of a signal using >> a dictionary learned by a KSVD algorithm (pyksvd). However, during the fit I >> get the following RuntimeWarning: >> >> /usr/local/lib/python2.7/dist-packages/sklearn/linear_model/omp.py:391: >> RuntimeWarning: Orthogonal matching pursuit ended prematurely due to linear >> dependence in the dictionary. The requested precision might not have been >> met. >> >> copy_X=copy_X, return_path=return_path) >> >> In those cases the results are indeed not satisfactory. I don't get the >> point of this warning as it is common in sparse coding to have an >> overcomplete dictionary an thus also linear dependency within it. That >> should not be an issue for OMP. In fact, the warning is also raised if the >> dictionary is a square matrix. >> >> Might this Warning also point to other issues in the application? >> >> >> Thanks, Ben >> >> _______________________________________________ >> scikit-learn mailing list >> scikit-learn at python.org >> https://mail.python.org/mailman/listinfo/scikit-learn > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn > From pa at letnes.com Tue Feb 14 05:54:27 2017 From: pa at letnes.com (Paul Anton Letnes) Date: Tue, 14 Feb 2017 11:54:27 +0100 Subject: [scikit-learn] cross validation scores seem off for PLSRegression Message-ID: <1487069667072.47907.95300@webmail1> Hi! Versions: sklearn 0.18.1 numpy 1.11.3 Anaconda python 3.5 on ubuntu 16.04 What range is the cross_val_score supposed to be in? I was under the impression from the documentation, although I cannot find it stated explicitly anywhere, that it should be a number in the range [0, 1]. However, it appears that one can get large negative values; see the ipython session below. 
Cheers Paul In [2]: import numpy as np In [3]: y = np.random.random((10, 3)) In [4]: x = np.random.random((10, 17)) In [5]: from sklearn.cross_decomposition import PLSRegression In [6]: pls = PLSRegression(n_components=3) In [7]: from sklearn.cross_validation import cross_val_score In [8]: from sklearn.model_selection import cross_val_score In [9]: cross_val_score(pls, x, y) Out[9]: array([-32.52217837, -4.17228083, -5.88632365]) PS: This happens even if I cheat by setting y to the predicted value, and cross validate on that. In [29]: y = x @ pls.coef_ In [30]: cross_val_score(pls, x, y) /home/paul/anaconda3/envs/wp3-paper/lib/python3.5/site-packages/sklearn/cross_decomposition/pls_.py:293: UserWarning: Y residual constant at iteration 5 warnings.warn('Y residual constant at iteration %s' % k) /home/paul/anaconda3/envs/wp3-paper/lib/python3.5/site-packages/sklearn/cross_decomposition/pls_.py:293: UserWarning: Y residual constant at iteration 6 warnings.warn('Y residual constant at iteration %s' % k) /home/paul/anaconda3/envs/wp3-paper/lib/python3.5/site-packages/sklearn/cross_decomposition/pls_.py:293: UserWarning: Y residual constant at iteration 6 warnings.warn('Y residual constant at iteration %s' % k) Out[30]: array([-35.01267353, -4.94806383, -5.9619526 ]) In [34]: np.max(np.abs(y - x @ pls.coef_)) Out[34]: 0.0 -------------- next part -------------- An HTML attachment was scrubbed... URL: From abdalrahman.eweiwi at gmail.com Tue Feb 14 06:05:52 2017 From: abdalrahman.eweiwi at gmail.com (abdalrahman eweiwi) Date: Tue, 14 Feb 2017 12:05:52 +0100 Subject: [scikit-learn] cross validation scores seem off for PLSRegression In-Reply-To: <1487069667072.47907.95300@webmail1> References: <1487069667072.47907.95300@webmail1> Message-ID: Hi Paul, PLSRegression in sklearn uses an iterative method to estimate the eigenvectors and eigenvalues (I think it is the power method), which mostly varies depending on the underlying library that you use. I would suggest using SVD instead if you want stable results and your dataset is small. I have also written a kernel PLS, which you can find here: https://gist.github.com/aeweiwi/7788156 Cheers, On Tue, Feb 14, 2017 at 11:54 AM, Paul Anton Letnes wrote: > Hi! > > Versions: > sklearn 0.18.1 > numpy 1.11.3 > Anaconda python 3.5 on ubuntu 16.04 > > What range is the cross_val_score supposed to be in? I was under the > impression from the documentation, although I cannot find it stated > explicitly anywhere, that it should be a number in the range [0, 1]. > However, it appears that one can get large negative values; see the ipython > session below. > > Cheers > Paul > > In [2]: import numpy as np > > In [3]: y = np.random.random((10, 3)) > > In [4]: x = np.random.random((10, 17)) > > In [5]: from sklearn.cross_decomposition import PLSRegression > > In [6]: pls = PLSRegression(n_components=3) > > In [7]: from sklearn.cross_validation import cross_val_score > > In [8]: from sklearn.model_selection import cross_val_score > > In [9]: cross_val_score(pls, x, y) > Out[9]: array([-32.52217837, -4.17228083, -5.88632365]) > > > PS: > This happens even if I cheat by setting y to the predicted value, and > cross validate on that.

> > In [29]: y = x @ pls.coef_ > > In [30]: cross_val_score(pls, x, y) > /home/paul/anaconda3/envs/wp3-paper/lib/python3.5/site- > packages/sklearn/cross_decomposition/pls_.py:293: UserWarning: Y residual > constant at iteration 5 > warnings.warn('Y residual constant at iteration %s' % k) > /home/paul/anaconda3/envs/wp3-paper/lib/python3.5/site- > packages/sklearn/cross_decomposition/pls_.py:293: UserWarning: Y residual > constant at iteration 6 > warnings.warn('Y residual constant at iteration %s' % k) > /home/paul/anaconda3/envs/wp3-paper/lib/python3.5/site- > packages/sklearn/cross_decomposition/pls_.py:293: UserWarning: Y residual > constant at iteration 6 > warnings.warn('Y residual constant at iteration %s' % k) > Out[30]: array([-35.01267353, -4.94806383, -5.9619526 ]) > > In [34]: np.max(np.abs(y - x @ pls.coef_)) > Out[34]: 0.0 > > > > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From fabian.boehnlein at gmail.com Tue Feb 14 06:08:11 2017 From: fabian.boehnlein at gmail.com (=?UTF-8?Q?Fabian_B=C3=B6hnlein?=) Date: Tue, 14 Feb 2017 11:08:11 +0000 Subject: [scikit-learn] cross validation scores seem off for PLSRegression In-Reply-To: <1487069667072.47907.95300@webmail1> References: <1487069667072.47907.95300@webmail1> Message-ID: Hi Paul, not sure what @ syntax does in ipython, but seems you're setting y to the coefficients of the model instead of y_hat = pls.predict(x). Also see in the documentation why R^2 can be negative: http://scikit-learn.org/stable/modules/generated/sklearn.cross_decomposition.PLSRegression.html#sklearn.cross_decomposition.PLSRegression.score Best, Fabian On Tue, 14 Feb 2017 at 11:57 Paul Anton Letnes wrote: > Hi! > > Versions: > sklearn 0.18.1 > numpy 1.11.3 > Anaconda python 3.5 on ubuntu 16.04 > > What range is the cross_val_score supposed to be in? I was under the > impression from the documentation, although I cannot find it stated > explicitly anywhere, that it should be a number in the range [0, 1]. > However, it appears that one can get large negative values; see the ipython > session below. > > Cheers > Paul > > In [2]: import numpy as np > > In [3]: y = np.random.random((10, 3)) > > In [4]: x = np.random.random((10, 17)) > > In [5]: from sklearn.cross_decomposition import PLSRegression > > In [6]: pls = PLSRegression(n_components=3) > > In [7]: from sklearn.cross_validation import cross_val_score > > In [8]: from sklearn.model_selection import cross_val_score > > In [9]: cross_val_score(pls, x, y) > Out[9]: array([-32.52217837, -4.17228083, -5.88632365]) > > > PS: > This happens even if I cheat by setting y to the predicted value, and > cross validate on that. 
> > In [29]: y = x @ pls.coef_ > > In [30]: cross_val_score(pls, x, y) > /home/paul/anaconda3/envs/wp3-paper/lib/python3.5/site-packages/sklearn/cross_decomposition/pls_.py:293: > UserWarning: Y residual constant at iteration 5 > warnings.warn('Y residual constant at iteration %s' % k) > /home/paul/anaconda3/envs/wp3-paper/lib/python3.5/site-packages/sklearn/cross_decomposition/pls_.py:293: > UserWarning: Y residual constant at iteration 6 > warnings.warn('Y residual constant at iteration %s' % k) > /home/paul/anaconda3/envs/wp3-paper/lib/python3.5/site-packages/sklearn/cross_decomposition/pls_.py:293: > UserWarning: Y residual constant at iteration 6 > warnings.warn('Y residual constant at iteration %s' % k) > Out[30]: array([-35.01267353, -4.94806383, -5.9619526 ]) > > In [34]: np.max(np.abs(y - x @ pls.coef_)) > Out[34]: 0.0 > > > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn > -------------- next part -------------- An HTML attachment was scrubbed... URL: From benjamin.merkt at bcf.uni-freiburg.de Tue Feb 14 06:19:42 2017 From: benjamin.merkt at bcf.uni-freiburg.de (Benjamin Merkt) Date: Tue, 14 Feb 2017 12:19:42 +0100 Subject: [scikit-learn] OMP ended prematurely due to linear dependence in the dictionary In-Reply-To: <80881741-0259-dbe2-0a63-f5125dd78671@bcf.uni-freiburg.de> References: <80881741-0259-dbe2-0a63-f5125dd78671@bcf.uni-freiburg.de> Message-ID: <7255cf2b-12da-8c3c-63ca-2189b4fd0a67@bcf.uni-freiburg.de> OK, the issue is resolved. My dictionary was still in 32bit float from saving. When I convert it to 64float before calling fit it works fine. Sorry to bother. On 14.02.2017 11:00, Benjamin Merkt wrote: > Hi, > > I tried that with no effect. The fit still breaks after two iterations. > > If I set precompute=True I get three coefficients instead of only two. > My Dictionary is fairly large (currently 128x42000). Is it even feasible > to use OMP with such a big Matrix (even with ~120GB ram)? > > -Ben > > > > On 13.02.2017 23:31, Vlad Niculae wrote: >> Hi, >> >> Are the columns of your matrix normalized? Try setting `normalized=True`. >> >> Yours, >> Vlad >> >> On Mon, Feb 13, 2017 at 6:55 PM, Benjamin Merkt >> wrote: >>> Hi everyone, >>> >>> I'm using OrthogonalMatchingPursuit to get a sparse coding of a >>> signal using >>> a dictionary learned by a KSVD algorithm (pyksvd). However, during >>> the fit I >>> get the following RuntimeWarning: >>> >>> /usr/local/lib/python2.7/dist-packages/sklearn/linear_model/omp.py:391: >>> RuntimeWarning: Orthogonal matching pursuit ended prematurely due to >>> linear >>> dependence in the dictionary. The requested precision might not have >>> been >>> met. >>> >>> copy_X=copy_X, return_path=return_path) >>> >>> In those cases the results are indeed not satisfactory. I don't get the >>> point of this warning as it is common in sparse coding to have an >>> overcomplete dictionary an thus also linear dependency within it. That >>> should not be an issue for OMP. In fact, the warning is also raised >>> if the >>> dictionary is a square matrix. >>> >>> Might this Warning also point to other issues in the application? 
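A minimal sketch of the float32 vs float64 effect Benjamin describes above, and of the small reproduction Vlad asks for in the next message; the random, unit-norm dictionary here is only a hypothetical stand-in for the KSVD one:

import numpy as np
from sklearn.linear_model import OrthogonalMatchingPursuit

rng = np.random.RandomState(0)
D = rng.randn(128, 4200)            # hypothetical overcomplete dictionary
D /= np.linalg.norm(D, axis=0)      # unit-norm atoms (columns)
w = np.zeros(4200)
w[rng.choice(4200, 10, replace=False)] = rng.randn(10)
y = D.dot(w)                        # synthetic signal with a 10-atom code

for dtype in (np.float32, np.float64):
    omp = OrthogonalMatchingPursuit(n_nonzero_coefs=10)
    omp.fit(D.astype(dtype), y.astype(dtype))
    # float32 input may trigger the "ended prematurely" warning and return
    # fewer non-zero coefficients than requested; float64 should not.
    print(dtype.__name__, np.count_nonzero(omp.coef_))

Casting the dictionary with np.asarray(D, dtype=np.float64) before calling fit is the workaround Benjamin ends up with in this thread.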
>>> >>> >>> Thanks, Ben >>> >>> _______________________________________________ >>> scikit-learn mailing list >>> scikit-learn at python.org >>> https://mail.python.org/mailman/listinfo/scikit-learn >> _______________________________________________ >> scikit-learn mailing list >> scikit-learn at python.org >> https://mail.python.org/mailman/listinfo/scikit-learn >> > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn From zephyr14 at gmail.com Tue Feb 14 06:26:07 2017 From: zephyr14 at gmail.com (Vlad Niculae) Date: Tue, 14 Feb 2017 20:26:07 +0900 Subject: [scikit-learn] OMP ended prematurely due to linear dependence in the dictionary In-Reply-To: <7255cf2b-12da-8c3c-63ca-2189b4fd0a67@bcf.uni-freiburg.de> References: <80881741-0259-dbe2-0a63-f5125dd78671@bcf.uni-freiburg.de> <7255cf2b-12da-8c3c-63ca-2189b4fd0a67@bcf.uni-freiburg.de> Message-ID: Hi Ben, This actually sounds like a bug in this case! At a glance, the code should use the correct BLAS calls for the data type you provide. Can you reproduce this with a simple small example that gets different results if the data is 32 vs 64 bit? Would you mind filing an issue? Thanks, Vlad On Tue, Feb 14, 2017 at 8:19 PM, Benjamin Merkt wrote: > OK, the issue is resolved. My dictionary was still in 32bit float from > saving. When I convert it to 64float before calling fit it works fine. > > Sorry to bother. > > > > On 14.02.2017 11:00, Benjamin Merkt wrote: >> >> Hi, >> >> I tried that with no effect. The fit still breaks after two iterations. >> >> If I set precompute=True I get three coefficients instead of only two. >> My Dictionary is fairly large (currently 128x42000). Is it even feasible >> to use OMP with such a big Matrix (even with ~120GB ram)? >> >> -Ben >> >> >> >> On 13.02.2017 23:31, Vlad Niculae wrote: >>> >>> Hi, >>> >>> Are the columns of your matrix normalized? Try setting `normalized=True`. >>> >>> Yours, >>> Vlad >>> >>> On Mon, Feb 13, 2017 at 6:55 PM, Benjamin Merkt >>> wrote: >>>> >>>> Hi everyone, >>>> >>>> I'm using OrthogonalMatchingPursuit to get a sparse coding of a >>>> signal using >>>> a dictionary learned by a KSVD algorithm (pyksvd). However, during >>>> the fit I >>>> get the following RuntimeWarning: >>>> >>>> /usr/local/lib/python2.7/dist-packages/sklearn/linear_model/omp.py:391: >>>> RuntimeWarning: Orthogonal matching pursuit ended prematurely due to >>>> linear >>>> dependence in the dictionary. The requested precision might not have >>>> been >>>> met. >>>> >>>> copy_X=copy_X, return_path=return_path) >>>> >>>> In those cases the results are indeed not satisfactory. I don't get the >>>> point of this warning as it is common in sparse coding to have an >>>> overcomplete dictionary an thus also linear dependency within it. That >>>> should not be an issue for OMP. In fact, the warning is also raised >>>> if the >>>> dictionary is a square matrix. >>>> >>>> Might this Warning also point to other issues in the application? 
>>>> >>>> >>>> Thanks, Ben >>>> >>>> _______________________________________________ >>>> scikit-learn mailing list >>>> scikit-learn at python.org >>>> https://mail.python.org/mailman/listinfo/scikit-learn >>> >>> _______________________________________________ >>> scikit-learn mailing list >>> scikit-learn at python.org >>> https://mail.python.org/mailman/listinfo/scikit-learn >>> >> _______________________________________________ >> scikit-learn mailing list >> scikit-learn at python.org >> https://mail.python.org/mailman/listinfo/scikit-learn > > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn From pa at letnes.com Tue Feb 14 06:27:11 2017 From: pa at letnes.com (Paul Anton Letnes) Date: Tue, 14 Feb 2017 12:27:11 +0100 Subject: [scikit-learn] cross validation scores seem off for PLSRegression In-Reply-To: References: <1487069667072.47907.95300@webmail1> Message-ID: <1487071631037.11717.96286@webmail8> @ is a python operator meaning "matrix multiplication". I was deliberately setting y to the prediction to make sure that the PLS model should be able to recreate the values completely and give a sensible score. Paul On 14 February 2017 at 12:08:11 +01:00, Fabian B?hnlein wrote: > Hi Paul, > > not sure what @ syntax does in ipython, but seems you're setting y to the coefficients of the model instead of y_hat = pls.predict(x). > > Also see in the documentation why R^2 can be negative: > > Best, > Fabian > > > On Tue, 14 Feb 2017 at 11:57 Paul Anton Letnes <> wrote: > > > Hi! > > > > Versions: > > sklearn 0.18.1 > > numpy 1.11.3 > > Anaconda python 3.5 on ubuntu 16.04 > > > > What range is the cross_val_score supposed to be in? I was under the impression from the documentation, although I cannot find it stated explicitly anywhere, that it should be a number in the range [0, 1]. However, it appears that one can get large negative values; see the ipython session below. > > > > Cheers > > Paul > > > > In [2]: import numpy as np > > > > In [3]: y = np.random.random((10, 3)) > > > > In [4]: x = np.random.random((10, 17)) > > > > In [5]: from sklearn.cross_decomposition import PLSRegression > > > > In [6]: pls = PLSRegression(n_components=3) > > > > In [7]: from sklearn.cross_validation import cross_val_score > > > > In [8]: from sklearn.model_selection import cross_val_score > > > > In [9]: cross_val_score(pls, x, y) > > Out[9]: array([-32.52217837, -4.17228083, -5.88632365]) > > > > > > PS: > > This happens even if I cheat by setting y to the predicted value, and cross validate on that. 
> > > > In [29]: y = x @ pls.coef_ > > > > In [30]: cross_val_score(pls, x, y) > > /home/paul/anaconda3/envs/wp3-paper/lib/python3.5/site-packages/sklearn/cross_decomposition/pls_.py:293: UserWarning: Y residual constant at iteration 5 > > warnings.warn('Y residual constant at iteration %s' % k) > > /home/paul/anaconda3/envs/wp3-paper/lib/python3.5/site-packages/sklearn/cross_decomposition/pls_.py:293: UserWarning: Y residual constant at iteration 6 > > warnings.warn('Y residual constant at iteration %s' % k) > > /home/paul/anaconda3/envs/wp3-paper/lib/python3.5/site-packages/sklearn/cross_decomposition/pls_.py:293: UserWarning: Y residual constant at iteration 6 > > warnings.warn('Y residual constant at iteration %s' % k) > > Out[30]: array([-35.01267353, -4.94806383, -5.9619526 ]) > > > > In [34]: np.max(np.abs(y - x @ pls.coef_)) > > Out[34]: 0.0 > > > > > > _______________________________________________ > > scikit-learn mailing list > > > > > > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From zephyr14 at gmail.com Tue Feb 14 06:28:08 2017 From: zephyr14 at gmail.com (Vlad Niculae) Date: Tue, 14 Feb 2017 20:28:08 +0900 Subject: [scikit-learn] OMP ended prematurely due to linear dependence in the dictionary In-Reply-To: References: <80881741-0259-dbe2-0a63-f5125dd78671@bcf.uni-freiburg.de> <7255cf2b-12da-8c3c-63ca-2189b4fd0a67@bcf.uni-freiburg.de> Message-ID: One possible issue I can see causing this is if X and y have different dtypes... was this the case for you? On Tue, Feb 14, 2017 at 8:26 PM, Vlad Niculae wrote: > Hi Ben, > > This actually sounds like a bug in this case! At a glance, the code > should use the correct BLAS calls for the data type you provide. Can > you reproduce this with a simple small example that gets different > results if the data is 32 vs 64 bit? Would you mind filing an issue? > > Thanks, > Vlad > > > On Tue, Feb 14, 2017 at 8:19 PM, Benjamin Merkt > wrote: >> OK, the issue is resolved. My dictionary was still in 32bit float from >> saving. When I convert it to 64float before calling fit it works fine. >> >> Sorry to bother. >> >> >> >> On 14.02.2017 11:00, Benjamin Merkt wrote: >>> >>> Hi, >>> >>> I tried that with no effect. The fit still breaks after two iterations. >>> >>> If I set precompute=True I get three coefficients instead of only two. >>> My Dictionary is fairly large (currently 128x42000). Is it even feasible >>> to use OMP with such a big Matrix (even with ~120GB ram)? >>> >>> -Ben >>> >>> >>> >>> On 13.02.2017 23:31, Vlad Niculae wrote: >>>> >>>> Hi, >>>> >>>> Are the columns of your matrix normalized? Try setting `normalized=True`. >>>> >>>> Yours, >>>> Vlad >>>> >>>> On Mon, Feb 13, 2017 at 6:55 PM, Benjamin Merkt >>>> wrote: >>>>> >>>>> Hi everyone, >>>>> >>>>> I'm using OrthogonalMatchingPursuit to get a sparse coding of a >>>>> signal using >>>>> a dictionary learned by a KSVD algorithm (pyksvd). However, during >>>>> the fit I >>>>> get the following RuntimeWarning: >>>>> >>>>> /usr/local/lib/python2.7/dist-packages/sklearn/linear_model/omp.py:391: >>>>> RuntimeWarning: Orthogonal matching pursuit ended prematurely due to >>>>> linear >>>>> dependence in the dictionary. The requested precision might not have >>>>> been >>>>> met. >>>>> >>>>> copy_X=copy_X, return_path=return_path) >>>>> >>>>> In those cases the results are indeed not satisfactory. 
I don't get the >>>>> point of this warning as it is common in sparse coding to have an >>>>> overcomplete dictionary an thus also linear dependency within it. That >>>>> should not be an issue for OMP. In fact, the warning is also raised >>>>> if the >>>>> dictionary is a square matrix. >>>>> >>>>> Might this Warning also point to other issues in the application? >>>>> >>>>> >>>>> Thanks, Ben >>>>> >>>>> _______________________________________________ >>>>> scikit-learn mailing list >>>>> scikit-learn at python.org >>>>> https://mail.python.org/mailman/listinfo/scikit-learn >>>> >>>> _______________________________________________ >>>> scikit-learn mailing list >>>> scikit-learn at python.org >>>> https://mail.python.org/mailman/listinfo/scikit-learn >>>> >>> _______________________________________________ >>> scikit-learn mailing list >>> scikit-learn at python.org >>> https://mail.python.org/mailman/listinfo/scikit-learn >> >> _______________________________________________ >> scikit-learn mailing list >> scikit-learn at python.org >> https://mail.python.org/mailman/listinfo/scikit-learn From emanuela.boros at gmail.com Tue Feb 14 06:52:48 2017 From: emanuela.boros at gmail.com (Emanuela Boros) Date: Tue, 14 Feb 2017 12:52:48 +0100 Subject: [scikit-learn] OMP ended prematurely due to linear dependence in the dictionary In-Reply-To: References: Message-ID: Just as a side point - which will not contribute to the purpose of this discussion - you can use pyksvd for sparse coding also. Emanuela Boros LIMSI-CNRS CDS/LAL-CNRS Orsay, France personal: 06 52 17 4595 work: 01 64 46 8954 emanuela.boros@{u-psud.fr,gmail.com} boros@{limsi.fr,lal.in2p3.fr} On Mon, Feb 13, 2017 at 10:55 AM, Benjamin Merkt < benjamin.merkt at bcf.uni-freiburg.de> wrote: > Hi everyone, > > I'm using OrthogonalMatchingPursuit to get a sparse coding of a signal > using a dictionary learned by a KSVD algorithm (pyksvd). However, during > the fit I get the following RuntimeWarning: > > /usr/local/lib/python2.7/dist-packages/sklearn/linear_model/omp.py:391: > RuntimeWarning: Orthogonal matching pursuit ended prematurely due to > linear dependence in the dictionary. The requested precision might not have > been met. > > copy_X=copy_X, return_path=return_path) > > In those cases the results are indeed not satisfactory. I don't get the > point of this warning as it is common in sparse coding to have an > overcomplete dictionary an thus also linear dependency within it. That > should not be an issue for OMP. In fact, the warning is also raised if the > dictionary is a square matrix. > > Might this Warning also point to other issues in the application? > > > Thanks, Ben > > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn > -------------- next part -------------- An HTML attachment was scrubbed... URL: From pa at letnes.com Tue Feb 14 06:58:19 2017 From: pa at letnes.com (Paul Anton Letnes) Date: Tue, 14 Feb 2017 12:58:19 +0100 Subject: [scikit-learn] cross validation scores seem off for PLSRegression In-Reply-To: References: <1487069667072.47907.95300@webmail1> Message-ID: <1487073499094.130285.96242@webmail5> Oh, and thanks for pointing out the bit about R^2 being negative - although it "feels off" in my head! Complex R? 
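On the exchange above about y = x @ pls.coef_: pls.predict(x) is generally not the same quantity, because predict first applies the centering (and, with the default scale=True, the per-column scaling) learned during fit, multiplies by coef_, and then adds the y mean back. A quick check, re-creating x, y and pls along the lines of the earlier session; the attribute names x_mean_, x_std_ and y_mean_ are taken from the 0.18 implementation:

import numpy as np
from sklearn.cross_decomposition import PLSRegression

rng = np.random.RandomState(0)
x = rng.random_sample((10, 17))
y = rng.random_sample((10, 3))

pls = PLSRegression(n_components=3).fit(x, y)
print(np.allclose(pls.predict(x), x @ pls.coef_))   # False in general
# predict() works on centered/scaled x and adds the y mean back:
print(np.allclose(pls.predict(x),
                  (x - pls.x_mean_) / pls.x_std_ @ pls.coef_ + pls.y_mean_))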
----------- Paul Anton On 14 February 2017 at 12:08:11 +01:00, Fabian B?hnlein wrote: > Hi Paul, > > not sure what @ syntax does in ipython, but seems you're setting y to the coefficients of the model instead of y_hat = pls.predict(x). > > Also see in the documentation why R^2 can be negative: > > Best, > Fabian > > > On Tue, 14 Feb 2017 at 11:57 Paul Anton Letnes <> wrote: > > > Hi! > > > > Versions: > > sklearn 0.18.1 > > numpy 1.11.3 > > Anaconda python 3.5 on ubuntu 16.04 > > > > What range is the cross_val_score supposed to be in? I was under the impression from the documentation, although I cannot find it stated explicitly anywhere, that it should be a number in the range [0, 1]. However, it appears that one can get large negative values; see the ipython session below. > > > > Cheers > > Paul > > > > In [2]: import numpy as np > > > > In [3]: y = np.random.random((10, 3)) > > > > In [4]: x = np.random.random((10, 17)) > > > > In [5]: from sklearn.cross_decomposition import PLSRegression > > > > In [6]: pls = PLSRegression(n_components=3) > > > > In [7]: from sklearn.cross_validation import cross_val_score > > > > In [8]: from sklearn.model_selection import cross_val_score > > > > In [9]: cross_val_score(pls, x, y) > > Out[9]: array([-32.52217837, -4.17228083, -5.88632365]) > > > > > > PS: > > This happens even if I cheat by setting y to the predicted value, and cross validate on that. > > > > In [29]: y = x @ pls.coef_ > > > > In [30]: cross_val_score(pls, x, y) > > /home/paul/anaconda3/envs/wp3-paper/lib/python3.5/site-packages/sklearn/cross_decomposition/pls_.py:293: UserWarning: Y residual constant at iteration 5 > > warnings.warn('Y residual constant at iteration %s' % k) > > /home/paul/anaconda3/envs/wp3-paper/lib/python3.5/site-packages/sklearn/cross_decomposition/pls_.py:293: UserWarning: Y residual constant at iteration 6 > > warnings.warn('Y residual constant at iteration %s' % k) > > /home/paul/anaconda3/envs/wp3-paper/lib/python3.5/site-packages/sklearn/cross_decomposition/pls_.py:293: UserWarning: Y residual constant at iteration 6 > > warnings.warn('Y residual constant at iteration %s' % k) > > Out[30]: array([-35.01267353, -4.94806383, -5.9619526 ]) > > > > In [34]: np.max(np.abs(y - x @ pls.coef_)) > > Out[34]: 0.0 > > > > > > _______________________________________________ > > scikit-learn mailing list > > > > > > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From bertrand.thirion at inria.fr Tue Feb 14 07:04:34 2017 From: bertrand.thirion at inria.fr (Bertrand Thirion) Date: Tue, 14 Feb 2017 13:04:34 +0100 (CET) Subject: [scikit-learn] cross validation scores seem off for PLSRegression In-Reply-To: <1487073499094.130285.96242@webmail5> References: <1487069667072.47907.95300@webmail1> <1487073499094.130285.96242@webmail5> Message-ID: <1841132902.24047782.1487073874871.JavaMail.zimbra@inria.fr> https://en.wikipedia.org/wiki/Coefficient_of_determination "Important cases where the computational definition of R 2 can yield negative values, depending on the definition used, arise where the predictions that are being compared to the corresponding outcomes have not been derived from a model-fitting procedure using those data, and where linear regression is conducted without including an intercept." 
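A small numeric illustration of the definition quoted above, as used by cross_val_score through the regressor's default score method: R^2 = 1 - SS_res / SS_tot is 1 for perfect predictions, 0 for always predicting the mean of the test targets, and arbitrarily negative for anything worse than that, so no complex numbers are involved.

import numpy as np
from sklearn.metrics import r2_score

y_true = np.array([1.0, 2.0, 3.0, 4.0])

print(r2_score(y_true, y_true))                         # 1.0  (perfect)
print(r2_score(y_true, np.full_like(y_true, 2.5)))      # 0.0  (predict the mean)
print(r2_score(y_true, np.array([10.0, -3.0, 8.0, 0.0])))   # about -28.4

With only 10 samples, the default 3-fold split leaves 3-4 test points per fold, so the per-fold R^2 in the session earlier in this thread is extremely noisy; values like -32 are consistent with that, without anything being numerically wrong.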
Best, Bertrand ----- Mail original ----- > De: "Paul Anton Letnes" > ?: "Fabian B?hnlein" > Cc: "Scikit-learn user and developer mailing list" > Envoy?: Mardi 14 F?vrier 2017 12:58:19 > Objet: Re: [scikit-learn] cross validation scores seem off for PLSRegression > Oh, and thanks for pointing out the bit about R^2 being negative - although > it "feels off" in my head! Complex R? > ----------- > Paul Anton -------------- next part -------------- An HTML attachment was scrubbed... URL: From benjamin.merkt at bcf.uni-freiburg.de Tue Feb 14 07:34:51 2017 From: benjamin.merkt at bcf.uni-freiburg.de (Benjamin Merkt) Date: Tue, 14 Feb 2017 13:34:51 +0100 Subject: [scikit-learn] OMP ended prematurely due to linear dependence in the dictionary In-Reply-To: References: <80881741-0259-dbe2-0a63-f5125dd78671@bcf.uni-freiburg.de> <7255cf2b-12da-8c3c-63ca-2189b4fd0a67@bcf.uni-freiburg.de> Message-ID: <66717e36-5cc7-2ad4-a601-17efb75d7fc5@bcf.uni-freiburg.de> Yes, the data array y was already float64. On 14.02.2017 12:28, Vlad Niculae wrote: > One possible issue I can see causing this is if X and y have different > dtypes... was this the case for you? > > On Tue, Feb 14, 2017 at 8:26 PM, Vlad Niculae wrote: >> Hi Ben, >> >> This actually sounds like a bug in this case! At a glance, the code >> should use the correct BLAS calls for the data type you provide. Can >> you reproduce this with a simple small example that gets different >> results if the data is 32 vs 64 bit? Would you mind filing an issue? >> >> Thanks, >> Vlad >> >> >> On Tue, Feb 14, 2017 at 8:19 PM, Benjamin Merkt >> wrote: >>> OK, the issue is resolved. My dictionary was still in 32bit float from >>> saving. When I convert it to 64float before calling fit it works fine. >>> >>> Sorry to bother. >>> >>> >>> >>> On 14.02.2017 11:00, Benjamin Merkt wrote: >>>> >>>> Hi, >>>> >>>> I tried that with no effect. The fit still breaks after two iterations. >>>> >>>> If I set precompute=True I get three coefficients instead of only two. >>>> My Dictionary is fairly large (currently 128x42000). Is it even feasible >>>> to use OMP with such a big Matrix (even with ~120GB ram)? >>>> >>>> -Ben >>>> >>>> >>>> >>>> On 13.02.2017 23:31, Vlad Niculae wrote: >>>>> >>>>> Hi, >>>>> >>>>> Are the columns of your matrix normalized? Try setting `normalized=True`. >>>>> >>>>> Yours, >>>>> Vlad >>>>> >>>>> On Mon, Feb 13, 2017 at 6:55 PM, Benjamin Merkt >>>>> wrote: >>>>>> >>>>>> Hi everyone, >>>>>> >>>>>> I'm using OrthogonalMatchingPursuit to get a sparse coding of a >>>>>> signal using >>>>>> a dictionary learned by a KSVD algorithm (pyksvd). However, during >>>>>> the fit I >>>>>> get the following RuntimeWarning: >>>>>> >>>>>> /usr/local/lib/python2.7/dist-packages/sklearn/linear_model/omp.py:391: >>>>>> RuntimeWarning: Orthogonal matching pursuit ended prematurely due to >>>>>> linear >>>>>> dependence in the dictionary. The requested precision might not have >>>>>> been >>>>>> met. >>>>>> >>>>>> copy_X=copy_X, return_path=return_path) >>>>>> >>>>>> In those cases the results are indeed not satisfactory. I don't get the >>>>>> point of this warning as it is common in sparse coding to have an >>>>>> overcomplete dictionary an thus also linear dependency within it. That >>>>>> should not be an issue for OMP. In fact, the warning is also raised >>>>>> if the >>>>>> dictionary is a square matrix. >>>>>> >>>>>> Might this Warning also point to other issues in the application? 
>>>>>> >>>>>> >>>>>> Thanks, Ben >>>>>> >>>>>> _______________________________________________ >>>>>> scikit-learn mailing list >>>>>> scikit-learn at python.org >>>>>> https://mail.python.org/mailman/listinfo/scikit-learn >>>>> >>>>> _______________________________________________ >>>>> scikit-learn mailing list >>>>> scikit-learn at python.org >>>>> https://mail.python.org/mailman/listinfo/scikit-learn >>>>> >>>> _______________________________________________ >>>> scikit-learn mailing list >>>> scikit-learn at python.org >>>> https://mail.python.org/mailman/listinfo/scikit-learn >>> >>> _______________________________________________ >>> scikit-learn mailing list >>> scikit-learn at python.org >>> https://mail.python.org/mailman/listinfo/scikit-learn > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn > From pa at letnes.com Tue Feb 14 07:53:31 2017 From: pa at letnes.com (Paul Anton Letnes) Date: Tue, 14 Feb 2017 13:53:31 +0100 Subject: [scikit-learn] PLSRegression cross validates poorly when scaling Message-ID: <1487076811458.24654.96908@webmail3> Hi! I've noticed that PLSRegression seems to cross validate incredibly poorly when scale=True. Could there be a bug here, or is there something I'm not getting this time, too? I noticed the very small (i.e. large negative) cross validation scores on a dataset that was far from unit variance; there, too, cross validation was extremely poor: around 0.4 in score when scaling was disabled, but (for example) -54422617.41005663 when scaling was enabled! In [1]: import numpy as np In [2]: from sklearn import cross_decomposition In [3]: x = np.random.random((10,17)) In [4]: y = np.random.random((10, 3)) In [5]: pls = cross_decomposition.PLSRegression(scale=True) In [6]: pls.fit(x,y) Out[6]: PLSRegression(copy=True, max_iter=500, n_components=2, scale=True, tol=1e-06) In [7]: from sklearn import model_selection In [8]: model_selection.cross_val_score(pls, x, y) Out[8]: array([-10.1680294 , -12.94229352, -13.39506559]) In [9]: pls = cross_decomposition.PLSRegression(scale=False) In [10]: model_selection.cross_val_score(pls, x, y) Out[10]: array([-0.5904095 , -1.16551493, -1.71555855]) Cheers Paul -------------- next part -------------- An HTML attachment was scrubbed... URL: From soumyodey at live.com Wed Feb 15 00:22:54 2017 From: soumyodey at live.com (Soumyo Dey) Date: Wed, 15 Feb 2017 05:22:54 +0000 Subject: [scikit-learn] Need help to start contributing Message-ID: Hello, I want to start contributing to the project, help me get started with an easyfix. I was able to setup git repository. Now I would like to start contributing with some code. Thank you, Soumyo Dey Twitter : @SoumyoDey Website: http://ace139.com/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From tom.duprelatour at orange.fr Wed Feb 15 09:47:30 2017 From: tom.duprelatour at orange.fr (Tom DLT) Date: Wed, 15 Feb 2017 15:47:30 +0100 Subject: [scikit-learn] Need help to start contributing In-Reply-To: References: Message-ID: Welcome! If you're looking to get started, you might try sorting issues by those with "Needs contributor" and "easy" to begin with. https://github.com/scikit-learn/scikit-learn/issues?q=is%3Aopen+is%3Aissue+label%3AEasy+label%3A%22Need+Contributor%22 You should also check out the contributor guidelines: http://scikit-learn.org/dev/developers/index.html We look forward to seeing your contributions. 
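Returning to the PLSRegression(scale=True) cross-validation report above: one way to narrow this down is to move the standardization into a Pipeline, so that it is refit on each training fold, and compare that against the estimator's internal scaling. This is only a diagnostic sketch on made-up random data, and it is not an exact equivalent, since scale=True also standardizes Y internally, which the pipeline below does not do:

import numpy as np
from sklearn.cross_decomposition import PLSRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.RandomState(0)
x = rng.random_sample((10, 17))
y = rng.random_sample((10, 3))

# Internal scaling, as reported above:
print(cross_val_score(PLSRegression(scale=True), x, y))

# Explicit scaling of X inside the cross-validation loop, with the
# estimator's own scaling switched off:
pipe = make_pipeline(StandardScaler(), PLSRegression(scale=False))
print(cross_val_score(pipe, x, y))

A large gap between the two runs on data that is already close to unit variance would support the suspicion that the internal scaling path, rather than the data, is at fault.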
Tom 2017-02-15 6:22 GMT+01:00 Soumyo Dey : > Hello, > > > I want to start contributing to the project, help me get started with an > easyfix. I was able to setup git repository. Now I would like to start > contributing with some code. > > > Thank you, > > Soumyo Dey > > Twitter : @SoumyoDey > > Website: http://ace139.com/ > > > > > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From Afarin.Famili at UTSouthwestern.edu Wed Feb 15 19:40:19 2017 From: Afarin.Famili at UTSouthwestern.edu (Afarin Famili) Date: Thu, 16 Feb 2017 00:40:19 +0000 Subject: [scikit-learn] A quick question regarding permutation_test_score Message-ID: <1487205619542.10167@UTSouthwestern.edu> Hi folks, I have a question regarding how to use permutation_test_Score. Given data X (predictor) and Y (target), I hold aside 20% of my data for testing (Xtest and Ytest) and would then Perform hyperparameter-tuning on the rest (using Xtrain and Ytrain). This way I can get the best parameters via RandomizedSearchCV. I now want to call permutation_test_score to compute the score, as well as the p-value of the model prediction. But the question is what X and Y should I send as input arguments to this function? I could send in X and Y but then my hyperparameter parameters were already tuned to Xtrain and Ytrain, which are a part of X and Y and that would bias the output values. Any help would be greatly appreciated. Thanks, Afarin ________________________________ UT Southwestern Medical Center The future of medicine, today. -------------- next part -------------- An HTML attachment was scrubbed... URL: From soumyodey at live.com Thu Feb 16 14:15:25 2017 From: soumyodey at live.com (Soumyo Dey) Date: Thu, 16 Feb 2017 19:15:25 +0000 Subject: [scikit-learn] Need help to start contributing In-Reply-To: References: , Message-ID: Hello, Thank you Tom for the welcome. I would like to know, is it okay to work on the same bug which some other is already working on, or does the core devs/ mentors assign bugs to individuals? Thank you, Soumyo Dey Twitter : @SoumyoDey Website: http://ace139.com/ ________________________________ From: scikit-learn on behalf of Tom DLT Sent: Wednesday, February 15, 2017 8:17:30 PM To: Scikit-learn user and developer mailing list Subject: Re: [scikit-learn] Need help to start contributing Welcome! If you're looking to get started, you might try sorting issues by those with "Needs contributor" and "easy" to begin with. https://github.com/scikit-learn/scikit-learn/issues?q=is%3Aopen+is%3Aissue+label%3AEasy+label%3A%22Need+Contributor%22 You should also check out the contributor guidelines: http://scikit-learn.org/dev/developers/index.html We look forward to seeing your contributions. Tom 2017-02-15 6:22 GMT+01:00 Soumyo Dey >: Hello, I want to start contributing to the project, help me get started with an easyfix. I was able to setup git repository. Now I would like to start contributing with some code. Thank you, Soumyo Dey Twitter : @SoumyoDey Website: http://ace139.com/ _______________________________________________ scikit-learn mailing list scikit-learn at python.org https://mail.python.org/mailman/listinfo/scikit-learn -------------- next part -------------- An HTML attachment was scrubbed... 
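On the permutation_test_score question above: one option that avoids reusing the already-tuned split is to pass the search object itself as the estimator, so that the hyperparameter search is re-run inside every training split that permutation_test_score creates (nested cross-validation). It is computationally heavy, but the resulting score and p-value are then not biased by the tuning. A hedged sketch; the estimator, the parameter grid and the data below are placeholders, not the ones from the original question:

import numpy as np
from sklearn.model_selection import RandomizedSearchCV, permutation_test_score
from sklearn.svm import SVR

# Placeholder data standing in for the original X and Y:
rng = np.random.RandomState(0)
X = rng.randn(100, 5)
Y = rng.randn(100)

# The tuned model is itself an estimator, so the randomized search is
# refit on the training part of every split and every permutation:
search = RandomizedSearchCV(SVR(),
                            param_distributions={"C": list(np.logspace(-2, 2, 20))},
                            n_iter=5, random_state=0)
score, perm_scores, pvalue = permutation_test_score(search, X, Y,
                                                    n_permutations=100,
                                                    random_state=0)
print(score, pvalue)

The held-out 20% can still be kept completely aside and used once at the end to report r2 and MSE of the final refitted model.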
URL: From olivier.grisel at ensta.org Thu Feb 16 15:58:01 2017 From: olivier.grisel at ensta.org (Olivier Grisel) Date: Thu, 16 Feb 2017 21:58:01 +0100 Subject: [scikit-learn] Need help to start contributing In-Reply-To: References: Message-ID: It's ok to work on a bug if the original contributor has not replied to the reviewers comments in a while (e.g. a couple of weeks). -- Olivier From benjamin.merkt at bcf.uni-freiburg.de Thu Feb 16 17:25:37 2017 From: benjamin.merkt at bcf.uni-freiburg.de (Benjamin Merkt) Date: Thu, 16 Feb 2017 23:25:37 +0100 Subject: [scikit-learn] OMP ended prematurely due to linear dependence in the dictionary In-Reply-To: <66717e36-5cc7-2ad4-a601-17efb75d7fc5@bcf.uni-freiburg.de> References: <80881741-0259-dbe2-0a63-f5125dd78671@bcf.uni-freiburg.de> <7255cf2b-12da-8c3c-63ca-2189b4fd0a67@bcf.uni-freiburg.de> <66717e36-5cc7-2ad4-a601-17efb75d7fc5@bcf.uni-freiburg.de> Message-ID: <426e4241-4247-7a73-1527-34d68097f92f@bcf.uni-freiburg.de> Is this still considered a bug and therefore worth an issue? On 14.02.2017 13:34, Benjamin Merkt wrote: > Yes, the data array y was already float64. > > > On 14.02.2017 12:28, Vlad Niculae wrote: >> One possible issue I can see causing this is if X and y have different >> dtypes... was this the case for you? >> >> On Tue, Feb 14, 2017 at 8:26 PM, Vlad Niculae wrote: >>> Hi Ben, >>> >>> This actually sounds like a bug in this case! At a glance, the code >>> should use the correct BLAS calls for the data type you provide. Can >>> you reproduce this with a simple small example that gets different >>> results if the data is 32 vs 64 bit? Would you mind filing an issue? >>> >>> Thanks, >>> Vlad >>> >>> >>> On Tue, Feb 14, 2017 at 8:19 PM, Benjamin Merkt >>> wrote: >>>> OK, the issue is resolved. My dictionary was still in 32bit float from >>>> saving. When I convert it to 64float before calling fit it works fine. >>>> >>>> Sorry to bother. >>>> >>>> >>>> >>>> On 14.02.2017 11:00, Benjamin Merkt wrote: >>>>> >>>>> Hi, >>>>> >>>>> I tried that with no effect. The fit still breaks after two >>>>> iterations. >>>>> >>>>> If I set precompute=True I get three coefficients instead of only two. >>>>> My Dictionary is fairly large (currently 128x42000). Is it even >>>>> feasible >>>>> to use OMP with such a big Matrix (even with ~120GB ram)? >>>>> >>>>> -Ben >>>>> >>>>> >>>>> >>>>> On 13.02.2017 23:31, Vlad Niculae wrote: >>>>>> >>>>>> Hi, >>>>>> >>>>>> Are the columns of your matrix normalized? Try setting >>>>>> `normalized=True`. >>>>>> >>>>>> Yours, >>>>>> Vlad >>>>>> >>>>>> On Mon, Feb 13, 2017 at 6:55 PM, Benjamin Merkt >>>>>> wrote: >>>>>>> >>>>>>> Hi everyone, >>>>>>> >>>>>>> I'm using OrthogonalMatchingPursuit to get a sparse coding of a >>>>>>> signal using >>>>>>> a dictionary learned by a KSVD algorithm (pyksvd). However, during >>>>>>> the fit I >>>>>>> get the following RuntimeWarning: >>>>>>> >>>>>>> /usr/local/lib/python2.7/dist-packages/sklearn/linear_model/omp.py:391: >>>>>>> >>>>>>> RuntimeWarning: Orthogonal matching pursuit ended prematurely >>>>>>> due to >>>>>>> linear >>>>>>> dependence in the dictionary. The requested precision might not have >>>>>>> been >>>>>>> met. >>>>>>> >>>>>>> copy_X=copy_X, return_path=return_path) >>>>>>> >>>>>>> In those cases the results are indeed not satisfactory. I don't >>>>>>> get the >>>>>>> point of this warning as it is common in sparse coding to have an >>>>>>> overcomplete dictionary an thus also linear dependency within it. >>>>>>> That >>>>>>> should not be an issue for OMP. 
In fact, the warning is also raised >>>>>>> if the >>>>>>> dictionary is a square matrix. >>>>>>> >>>>>>> Might this Warning also point to other issues in the application? >>>>>>> >>>>>>> >>>>>>> Thanks, Ben >>>>>>> >>>>>>> _______________________________________________ >>>>>>> scikit-learn mailing list >>>>>>> scikit-learn at python.org >>>>>>> https://mail.python.org/mailman/listinfo/scikit-learn >>>>>> >>>>>> _______________________________________________ >>>>>> scikit-learn mailing list >>>>>> scikit-learn at python.org >>>>>> https://mail.python.org/mailman/listinfo/scikit-learn >>>>>> >>>>> _______________________________________________ >>>>> scikit-learn mailing list >>>>> scikit-learn at python.org >>>>> https://mail.python.org/mailman/listinfo/scikit-learn >>>> >>>> _______________________________________________ >>>> scikit-learn mailing list >>>> scikit-learn at python.org >>>> https://mail.python.org/mailman/listinfo/scikit-learn >> _______________________________________________ >> scikit-learn mailing list >> scikit-learn at python.org >> https://mail.python.org/mailman/listinfo/scikit-learn >> > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn From nelle.varoquaux at gmail.com Thu Feb 16 17:40:45 2017 From: nelle.varoquaux at gmail.com (Nelle Varoquaux) Date: Thu, 16 Feb 2017 14:40:45 -0800 Subject: [scikit-learn] Announcing: Docathon, week of 6 March 2017 Message-ID: Hi everyone, I don't really think scikit-learn's documentation is lacking, but here is an announcement for an event we are organizing called the "Docathon". Several of us will be meeting up to sprint on documentation or documentation-related projects at Berkeley, New York and Seattle. If you are interested in joining us, either remotely or on campus, don't hesitate to join! Cheers, Nelle *What's a Docathon?* It's a week-long sprint where we focus our efforts on improving the state of documentation in the open-source and open-science world. This means writing better documentation, building tools, and sharing skills. *Who?s this for?* Anyone who is interested in improving the understandability, accessibility, and clarity of software! This might mean developers with a particular project, or individuals who would like to contribute to a project. You don?t need to use a specific language (though there will be many Python and R developers) and you don?t need to be a core developer in order to help out. *Where can I sign up?* Check out the *Docathon website* . You can sign up as a *participant* , *suggest a project* to work on, or sign up *to host your own* remote Docathon wherever you like. You don?t have to use a specific language - we?ll be as accommodating as possible! *When is the Docathon?* The Docathon will be held *March 6 through March 10*. For those coming to BIDS at UC Berkeley, on the first day we'll have tutorials about documentation and demos of documentation tools, followed by a few hours of hacking. During the middle of the week, we'll set aside a few hours each afternoon for hacking as a group at BIDS. On the last day, we'll have a wrap-up event to show off what everybody worked on. *Where will the Docathon take place?* There are a *few docathons being held simultaneously* , each with their own schedule. At Berkeley we'll have a physical presence at BIDS over the week, and we encourage you to show up for the hours we set aside for doc hacking. 
However, it is totally fine to work remotely; we will coordinate people via email/GitHub, too. *Where can I get more information?* Check out an updated schedule, list of tutorials, and more information at our website here: *bids.github.io/docathon* . *Contact* If you have any questions, open an issue on our *GitHub repo* . We look forward to hearing from you! Please feel free to forward this email to anyone who may be interested. We'd love for other institutions/groups to get involved. -------------- next part -------------- An HTML attachment was scrubbed... URL: From zephyr14 at gmail.com Thu Feb 16 19:56:54 2017 From: zephyr14 at gmail.com (Vlad Niculae) Date: Fri, 17 Feb 2017 09:56:54 +0900 Subject: [scikit-learn] OMP ended prematurely due to linear dependence in the dictionary In-Reply-To: <426e4241-4247-7a73-1527-34d68097f92f@bcf.uni-freiburg.de> References: <80881741-0259-dbe2-0a63-f5125dd78671@bcf.uni-freiburg.de> <7255cf2b-12da-8c3c-63ca-2189b4fd0a67@bcf.uni-freiburg.de> <66717e36-5cc7-2ad4-a601-17efb75d7fc5@bcf.uni-freiburg.de> <426e4241-4247-7a73-1527-34d68097f92f@bcf.uni-freiburg.de> Message-ID: I would consider this a bug. I'm not 100% sure what the conventions for dtypes are. I'd appreciate it if you could open an issue, and even better if you have a small reproducing example. I'll look into it this weekend. Vlad On Fri, Feb 17, 2017 at 7:25 AM, Benjamin Merkt wrote: > Is this still considered a bug and therefore worth an issue? > > > On 14.02.2017 13:34, Benjamin Merkt wrote: >> >> Yes, the data array y was already float64. >> >> >> On 14.02.2017 12:28, Vlad Niculae wrote: >>> >>> One possible issue I can see causing this is if X and y have different >>> dtypes... was this the case for you? >>> >>> On Tue, Feb 14, 2017 at 8:26 PM, Vlad Niculae wrote: >>>> >>>> Hi Ben, >>>> >>>> This actually sounds like a bug in this case! At a glance, the code >>>> should use the correct BLAS calls for the data type you provide. Can >>>> you reproduce this with a simple small example that gets different >>>> results if the data is 32 vs 64 bit? Would you mind filing an issue? >>>> >>>> Thanks, >>>> Vlad >>>> >>>> >>>> On Tue, Feb 14, 2017 at 8:19 PM, Benjamin Merkt >>>> wrote: >>>>> >>>>> OK, the issue is resolved. My dictionary was still in 32bit float from >>>>> saving. When I convert it to 64float before calling fit it works fine. >>>>> >>>>> Sorry to bother. >>>>> >>>>> >>>>> >>>>> On 14.02.2017 11:00, Benjamin Merkt wrote: >>>>>> >>>>>> >>>>>> Hi, >>>>>> >>>>>> I tried that with no effect. The fit still breaks after two >>>>>> iterations. >>>>>> >>>>>> If I set precompute=True I get three coefficients instead of only two. >>>>>> My Dictionary is fairly large (currently 128x42000). Is it even >>>>>> feasible >>>>>> to use OMP with such a big Matrix (even with ~120GB ram)? >>>>>> >>>>>> -Ben >>>>>> >>>>>> >>>>>> >>>>>> On 13.02.2017 23:31, Vlad Niculae wrote: >>>>>>> >>>>>>> >>>>>>> Hi, >>>>>>> >>>>>>> Are the columns of your matrix normalized? Try setting >>>>>>> `normalized=True`. >>>>>>> >>>>>>> Yours, >>>>>>> Vlad >>>>>>> >>>>>>> On Mon, Feb 13, 2017 at 6:55 PM, Benjamin Merkt >>>>>>> wrote: >>>>>>>> >>>>>>>> >>>>>>>> Hi everyone, >>>>>>>> >>>>>>>> I'm using OrthogonalMatchingPursuit to get a sparse coding of a >>>>>>>> signal using >>>>>>>> a dictionary learned by a KSVD algorithm (pyksvd). 
However, during >>>>>>>> the fit I >>>>>>>> get the following RuntimeWarning: >>>>>>>> >>>>>>>> >>>>>>>> /usr/local/lib/python2.7/dist-packages/sklearn/linear_model/omp.py:391: >>>>>>>> >>>>>>>> RuntimeWarning: Orthogonal matching pursuit ended prematurely >>>>>>>> due to >>>>>>>> linear >>>>>>>> dependence in the dictionary. The requested precision might not have >>>>>>>> been >>>>>>>> met. >>>>>>>> >>>>>>>> copy_X=copy_X, return_path=return_path) >>>>>>>> >>>>>>>> In those cases the results are indeed not satisfactory. I don't >>>>>>>> get the >>>>>>>> point of this warning as it is common in sparse coding to have an >>>>>>>> overcomplete dictionary an thus also linear dependency within it. >>>>>>>> That >>>>>>>> should not be an issue for OMP. In fact, the warning is also raised >>>>>>>> if the >>>>>>>> dictionary is a square matrix. >>>>>>>> >>>>>>>> Might this Warning also point to other issues in the application? >>>>>>>> >>>>>>>> >>>>>>>> Thanks, Ben >>>>>>>> >>>>>>>> _______________________________________________ >>>>>>>> scikit-learn mailing list >>>>>>>> scikit-learn at python.org >>>>>>>> https://mail.python.org/mailman/listinfo/scikit-learn >>>>>>> >>>>>>> >>>>>>> _______________________________________________ >>>>>>> scikit-learn mailing list >>>>>>> scikit-learn at python.org >>>>>>> https://mail.python.org/mailman/listinfo/scikit-learn >>>>>>> >>>>>> _______________________________________________ >>>>>> scikit-learn mailing list >>>>>> scikit-learn at python.org >>>>>> https://mail.python.org/mailman/listinfo/scikit-learn >>>>> >>>>> >>>>> _______________________________________________ >>>>> scikit-learn mailing list >>>>> scikit-learn at python.org >>>>> https://mail.python.org/mailman/listinfo/scikit-learn >>> >>> _______________________________________________ >>> scikit-learn mailing list >>> scikit-learn at python.org >>> https://mail.python.org/mailman/listinfo/scikit-learn >>> >> _______________________________________________ >> scikit-learn mailing list >> scikit-learn at python.org >> https://mail.python.org/mailman/listinfo/scikit-learn > > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn From benjamin.merkt at bcf.uni-freiburg.de Fri Feb 17 05:53:15 2017 From: benjamin.merkt at bcf.uni-freiburg.de (Benjamin Merkt) Date: Fri, 17 Feb 2017 11:53:15 +0100 Subject: [scikit-learn] OMP ended prematurely due to linear dependence in the dictionary In-Reply-To: References: <80881741-0259-dbe2-0a63-f5125dd78671@bcf.uni-freiburg.de> <7255cf2b-12da-8c3c-63ca-2189b4fd0a67@bcf.uni-freiburg.de> <66717e36-5cc7-2ad4-a601-17efb75d7fc5@bcf.uni-freiburg.de> <426e4241-4247-7a73-1527-34d68097f92f@bcf.uni-freiburg.de> Message-ID: While trying to get a minimal example to reproduce the error I found that there it also occurred when both arrays where float64. However, I then realized that my data vector has fairly small values (~1e-4 to 1e-8). If I normalize this as well it works for all combinations of 64 and 32 bit. -Ben On 17.02.2017 01:56, Vlad Niculae wrote: > I would consider this a bug. I'm not 100% sure what the conventions > for dtypes are. I'd appreciate it if you could open an issue, and even > better if you have a small reproducing example. I'll look into it this > weekend. > > Vlad > > On Fri, Feb 17, 2017 at 7:25 AM, Benjamin Merkt > wrote: >> Is this still considered a bug and therefore worth an issue? 
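A note on the resolution just above: as Vlad points out further down in this thread, numerically tiny residuals can trigger the early-stopping check, so a target vector with entries around 1e-4 to 1e-8 can produce the warning even though the dictionary itself is fine. A sketch of the rescaling workaround reported here (unit-norm atoms and a rescaled signal, with the scale folded back into the coefficients afterwards); the shapes are assumptions, smaller than the real 128 x 42000 dictionary, and this is an illustration rather than an official recommendation:

import numpy as np
from sklearn.linear_model import OrthogonalMatchingPursuit

rng = np.random.RandomState(0)
D = rng.randn(128, 4000)            # dictionary: n_samples x n_atoms
y = 1e-6 * rng.randn(128)           # very small-magnitude signal

# Rescale the problem to a sane numerical range:
col_norms = np.linalg.norm(D, axis=0)
D_unit = D / col_norms
y_scale = np.linalg.norm(y)
y_unit = y / y_scale

omp = OrthogonalMatchingPursuit(n_nonzero_coefs=10, fit_intercept=False)
omp.fit(D_unit, y_unit)

# Undo the scaling so the coefficients refer to the original D and y:
coef = omp.coef_ * y_scale / col_norms
print(np.count_nonzero(coef),
      np.linalg.norm(y - D.dot(coef)) / np.linalg.norm(y))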
>>>>>>>>> >>>>>>>>> >>>>>>>>> Thanks, Ben >>>>>>>>> >>>>>>>>> _______________________________________________ >>>>>>>>> scikit-learn mailing list >>>>>>>>> scikit-learn at python.org >>>>>>>>> https://mail.python.org/mailman/listinfo/scikit-learn >>>>>>>> >>>>>>>> >>>>>>>> _______________________________________________ >>>>>>>> scikit-learn mailing list >>>>>>>> scikit-learn at python.org >>>>>>>> https://mail.python.org/mailman/listinfo/scikit-learn >>>>>>>> >>>>>>> _______________________________________________ >>>>>>> scikit-learn mailing list >>>>>>> scikit-learn at python.org >>>>>>> https://mail.python.org/mailman/listinfo/scikit-learn >>>>>> >>>>>> >>>>>> _______________________________________________ >>>>>> scikit-learn mailing list >>>>>> scikit-learn at python.org >>>>>> https://mail.python.org/mailman/listinfo/scikit-learn >>>> >>>> _______________________________________________ >>>> scikit-learn mailing list >>>> scikit-learn at python.org >>>> https://mail.python.org/mailman/listinfo/scikit-learn >>>> >>> _______________________________________________ >>> scikit-learn mailing list >>> scikit-learn at python.org >>> https://mail.python.org/mailman/listinfo/scikit-learn >> >> _______________________________________________ >> scikit-learn mailing list >> scikit-learn at python.org >> https://mail.python.org/mailman/listinfo/scikit-learn > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn > From ian at ianozsvald.com Fri Feb 17 09:41:18 2017 From: ian at ianozsvald.com (Ian Ozsvald) Date: Fri, 17 Feb 2017 14:41:18 +0000 Subject: [scikit-learn] ANN: PyDataLondon Conference in May - Call for Proposals closing in 1 week Message-ID: PyDataLondon 2017 runs in London this May 5-7th at Bloomberg's HQ near London Bridge. Our Call for Proposals is open until February 24th (next Friday), and I'd love to see sklearn talks and tutorial proposals: http://pydata.org/london2017/ This is our 4th annual conference, we'll have 330 active data scientists over the 3 days. Our conference builds on our 4,800+ member meetup which runs every month at hedge fund AHL: http://london.pydata.org/ I'd *love* to see a general sklearn tutorial at the conference, there's a real demand for this here in London. I'm also very interested in communicating complex data visually, applications of data science that "made a difference", data engineering and all the topics you'd expect at a strong data science conference. See last year's schedule if you'd like an idea of what to expect: http://pydata.org/london2016/schedule/ You may also be interested in PyDataAmsterdam (April 8-9th) and PyDataBerlin (June 30th- July 2nd), both have their CfP open at the moment: http://pydata.org/amsterdam2017/ http://pydata.org/berlin2017/ I'm hoping to see some interesting sklearn submissions, Ian (conference co-chair) ps. At our monthly meetups I'm also asking members to think on testimonials they could provide back to the sklearn testimonials page, I think that'll be a slow mission but I'll keep pushing the message. 
Hopefully a few companies will reciprocate to help with your grant applications -- Ian Ozsvald (Data Scientist, PyDataLondon co-chair) ian at IanOzsvald.com http://IanOzsvald.com http://ModelInsight.io http://twitter.com/IanOzsvald From akshay0724 at gmail.com Fri Feb 17 13:04:29 2017 From: akshay0724 at gmail.com (Akshay Gupta) Date: Fri, 17 Feb 2017 23:34:29 +0530 Subject: [scikit-learn] Google Summer of code 2017 Message-ID: Are we having any plans to take part in GSOC this year? If so then I would like to apply this year. Thanks -------------- next part -------------- An HTML attachment was scrubbed... URL: From jeff1evesque at yahoo.com Fri Feb 17 13:12:52 2017 From: jeff1evesque at yahoo.com (Jeffrey Levesque) Date: Fri, 17 Feb 2017 13:12:52 -0500 Subject: [scikit-learn] Google Summer of code 2017 In-Reply-To: References: Message-ID: My project has applied for the Google Summer of Code 2017: - https://github.com/jeff1evesque/machine-learning The project is intended to be an interface to the scikit-learn utilities. This means a visualization HTML interface, as well as a programmatic interface (send post requests to the server). If anyone is interested in helping, let me know. Thank you, Jeff Levesque https://github.com/jeff1evesque > On Feb 17, 2017, at 1:04 PM, Akshay Gupta wrote: > > Are we having any plans to take part in GSOC this year? > > If so then I would like to apply this year. > Thanks > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn From max.linke88 at gmail.com Fri Feb 17 13:21:46 2017 From: max.linke88 at gmail.com (Max Linke) Date: Fri, 17 Feb 2017 19:21:46 +0100 Subject: [scikit-learn] Google Summer of code 2017 In-Reply-To: References: Message-ID: <2decef69-b2a3-c0a8-d2dc-53adeeae78c7@gmail.com> You should check GSoC-general at python.org. There have been questions about scikit-learn participation in GSoC this year. best Max On 02/17/2017 07:12 PM, Jeffrey Levesque via scikit-learn wrote: > My project has applied for the Google Summer of Code 2017: > > - https://github.com/jeff1evesque/machine-learning > > The project is intended to be an interface to the scikit-learn utilities. This means a visualization HTML interface, as well as a programmatic interface (send post requests to the server). If anyone is interested in helping, let me know. > > > Thank you, > > Jeff Levesque > https://github.com/jeff1evesque > >> On Feb 17, 2017, at 1:04 PM, Akshay Gupta wrote: >> >> Are we having any plans to take part in GSOC this year? >> >> If so then I would like to apply this year. >> Thanks >> _______________________________________________ >> scikit-learn mailing list >> scikit-learn at python.org >> https://mail.python.org/mailman/listinfo/scikit-learn > > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn > From stuart at stuartreynolds.net Fri Feb 17 14:06:39 2017 From: stuart at stuartreynolds.net (Stuart Reynolds) Date: Fri, 17 Feb 2017 11:06:39 -0800 Subject: [scikit-learn] Modelling event rates Message-ID: Does scikit provide any event-rate/time-to-event models, or other models that are specifically time-dependent? (e.g. models that output the # events per unit of time) Examples might include: Poisson model, or Cox proportional hazard. 
There was some discussion about pulling from statsmodels, https://github.com/scikit-learn/scikit-learn/issues/5975 but (AFAIK), this was not done. -------------- next part -------------- An HTML attachment was scrubbed... URL: From olivier.grisel at ensta.org Fri Feb 17 14:18:05 2017 From: olivier.grisel at ensta.org (Olivier Grisel) Date: Fri, 17 Feb 2017 20:18:05 +0100 Subject: [scikit-learn] Modelling event rates In-Reply-To: References: Message-ID: I don't think we have any model dedicated to this, but it's possible that expressive non-parametricmodels such as RF and GBRT or richly parameterized models such as MLP with a regression loss can do a good enough job at giving you a point estimate. -- Olivier From zephyr14 at gmail.com Fri Feb 17 20:01:32 2017 From: zephyr14 at gmail.com (Vlad Niculae) Date: Sat, 18 Feb 2017 10:01:32 +0900 Subject: [scikit-learn] OMP ended prematurely due to linear dependence in the dictionary In-Reply-To: References: <80881741-0259-dbe2-0a63-f5125dd78671@bcf.uni-freiburg.de> <7255cf2b-12da-8c3c-63ca-2189b4fd0a67@bcf.uni-freiburg.de> <66717e36-5cc7-2ad4-a601-17efb75d7fc5@bcf.uni-freiburg.de> <426e4241-4247-7a73-1527-34d68097f92f@bcf.uni-freiburg.de> Message-ID: Oh I'm inclined to say this isn't a bug then. Your residuals can simply be low enough to trigger early stopping this way. Although I agree the warning could be improved. However, if it IS the case that plugging in 32bit X and 64bit y leads to *different results* than if both have the same dtype (all other things being equal) than that would be a bug. (even if the different results don't consist in an unwanted early stopping.) Is this the case? On Fri, Feb 17, 2017 at 7:53 PM, Benjamin Merkt wrote: > While trying to get a minimal example to reproduce the error I found that > there it also occurred when both arrays where float64. However, I then > realized that my data vector has fairly small values (~1e-4 to 1e-8). If I > normalize this as well it works for all combinations of 64 and 32 bit. > > -Ben > > > > On 17.02.2017 01:56, Vlad Niculae wrote: >> >> I would consider this a bug. I'm not 100% sure what the conventions >> for dtypes are. I'd appreciate it if you could open an issue, and even >> better if you have a small reproducing example. I'll look into it this >> weekend. >> >> Vlad >> >> On Fri, Feb 17, 2017 at 7:25 AM, Benjamin Merkt >> wrote: >>> >>> Is this still considered a bug and therefore worth an issue? >>> >>> >>> On 14.02.2017 13:34, Benjamin Merkt wrote: >>>> >>>> >>>> Yes, the data array y was already float64. >>>> >>>> >>>> On 14.02.2017 12:28, Vlad Niculae wrote: >>>>> >>>>> >>>>> One possible issue I can see causing this is if X and y have different >>>>> dtypes... was this the case for you? >>>>> >>>>> On Tue, Feb 14, 2017 at 8:26 PM, Vlad Niculae >>>>> wrote: >>>>>> >>>>>> >>>>>> Hi Ben, >>>>>> >>>>>> This actually sounds like a bug in this case! At a glance, the code >>>>>> should use the correct BLAS calls for the data type you provide. Can >>>>>> you reproduce this with a simple small example that gets different >>>>>> results if the data is 32 vs 64 bit? Would you mind filing an issue? >>>>>> >>>>>> Thanks, >>>>>> Vlad >>>>>> >>>>>> >>>>>> On Tue, Feb 14, 2017 at 8:19 PM, Benjamin Merkt >>>>>> wrote: >>>>>>> >>>>>>> >>>>>>> OK, the issue is resolved. My dictionary was still in 32bit float >>>>>>> from >>>>>>> saving. When I convert it to 64float before calling fit it works >>>>>>> fine. >>>>>>> >>>>>>> Sorry to bother. 
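On the event-rate question above: scikit-learn currently has no Poisson or Cox model, but a Poisson regression with an exposure term is a few lines in statsmodels, and Olivier's point-estimate route works with any scikit-learn regressor trained on counts per unit exposure. A hedged sketch with simulated data; the variable names and the simulated counts are placeholders:

import numpy as np
import statsmodels.api as sm

rng = np.random.RandomState(0)
n = 200
X = rng.randn(n, 3)
exposure = rng.uniform(0.5, 2.0, size=n)            # observation time per row
true_rate = np.exp(0.3 * X[:, 0] - 0.2 * X[:, 1])   # events per unit time
counts = rng.poisson(true_rate * exposure)

# Poisson GLM with a log link; `exposure` is handled internally as an offset.
model = sm.GLM(counts, sm.add_constant(X),
               family=sm.families.Poisson(), exposure=exposure)
result = model.fit()
print(result.params)     # intercept and coefficients on the log-rate scale

Since the fitted coefficients are on the log-rate scale, np.exp(result.params) gives multiplicative effects on the event rate.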
>>>>>>>>>> >>>>>>>>>> >>>>>>>>>> Thanks, Ben >>>>>>>>>> >>>>>>>>>> _______________________________________________ >>>>>>>>>> scikit-learn mailing list >>>>>>>>>> scikit-learn at python.org >>>>>>>>>> https://mail.python.org/mailman/listinfo/scikit-learn >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> _______________________________________________ >>>>>>>>> scikit-learn mailing list >>>>>>>>> scikit-learn at python.org >>>>>>>>> https://mail.python.org/mailman/listinfo/scikit-learn >>>>>>>>> >>>>>>>> _______________________________________________ >>>>>>>> scikit-learn mailing list >>>>>>>> scikit-learn at python.org >>>>>>>> https://mail.python.org/mailman/listinfo/scikit-learn >>>>>>> >>>>>>> >>>>>>> >>>>>>> _______________________________________________ >>>>>>> scikit-learn mailing list >>>>>>> scikit-learn at python.org >>>>>>> https://mail.python.org/mailman/listinfo/scikit-learn >>>>> >>>>> >>>>> _______________________________________________ >>>>> scikit-learn mailing list >>>>> scikit-learn at python.org >>>>> https://mail.python.org/mailman/listinfo/scikit-learn >>>>> >>>> _______________________________________________ >>>> scikit-learn mailing list >>>> scikit-learn at python.org >>>> https://mail.python.org/mailman/listinfo/scikit-learn >>> >>> >>> _______________________________________________ >>> scikit-learn mailing list >>> scikit-learn at python.org >>> https://mail.python.org/mailman/listinfo/scikit-learn >> >> _______________________________________________ >> scikit-learn mailing list >> scikit-learn at python.org >> https://mail.python.org/mailman/listinfo/scikit-learn >> > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn From t3kcit at gmail.com Sat Feb 18 13:15:23 2017 From: t3kcit at gmail.com (Andreas Mueller) Date: Sat, 18 Feb 2017 13:15:23 -0500 Subject: [scikit-learn] GSOC call for mentors In-Reply-To: References: Message-ID: So have we made a decision not to participate? I'm totally fine with that, but we should make it a conscious decision and not just wait until the deadline approaches and then hack something together last minute. On 01/31/2017 12:55 PM, Guillaume Lema?tre wrote: > I would be interested in helping for mentoring or whatever is needed > regarding the project. > > On 30 January 2017 at 21:25, Nelson Liu > wrote: > > Hey all, > I'd be willing to help out with mentoring a project as well, > hopefully in tandem with someone else. > > Nelson Liu > > On Mon, Jan 30, 2017 at 10:10 AM Jacob Schreiber > > wrote: > > I discussed this briefly with Gael and Joel. The consensus was > that unless we already know excellent students who will fit > well that it is unlikely we will participate in GSoC. That > being said, if someone (other than me) is willing to step up > and organize it, I'd volunteer to be a mentor again. I think > an important project would be adding multithreading to > individual tree building so we can do gradient boosting in > parallel. > > On Mon, Jan 30, 2017 at 5:38 AM, Andreas Mueller > > wrote: > > Hey all. > It's that time of the year again. > Are we planning on participating in GSOC? > If so, we need mentors and projects. > It's unlikely that I'll have time to help with either in > any substantial way. > If we want to participate, I think we should try to be a > bit more organized than last year ;) > > Andy > > Sent from phone. Please excuse spelling and brevity. 
> > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn > > > > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn > > > > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn > > > > > > -- > Guillaume Lemaitre > INRIA Saclay - Ile-de-France > Equipe PARIETAL > guillaume.lemaitre at inria.f r --- > https://glemaitre.github.io/ > > > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn -------------- next part -------------- An HTML attachment was scrubbed... URL: From akshay0724 at gmail.com Sat Feb 18 13:38:23 2017 From: akshay0724 at gmail.com (Akshay Gupta) Date: Sun, 19 Feb 2017 00:08:23 +0530 Subject: [scikit-learn] Regarding scikit learn to take part in GSOC 2017 Message-ID: Dear Programmers, I'm watching scikit learn on github from last few month and have also made some contribution. Just now I found that this year there is no final plan in community to take part in GSOC. Community like Scikit Learn which have a unique place in industry should promote open source contribution and must take part in events like GSOC. My appeal is that scikit learn should at least have a idea page.Though it is late but there still exist a chance to take part in GSOC. Regards Akshay GitHub Name - Akshay0724 -------------- next part -------------- An HTML attachment was scrubbed... URL: From olivier.grisel at ensta.org Sat Feb 18 13:54:05 2017 From: olivier.grisel at ensta.org (Olivier Grisel) Date: Sat, 18 Feb 2017 19:54:05 +0100 Subject: [scikit-learn] GSOC call for mentors In-Reply-To: References: Message-ID: Personally I don't feel like mentoring this year. I would really like to focus my scikit-learn time on finishing the joblib process refactoring with Thomas Moreau and the binning / thread-based parallelization of boosted trees with Guillaume and Raghav. -- Olivier From shubham.bhardwaj2015 at vit.ac.in Sat Feb 18 20:01:27 2017 From: shubham.bhardwaj2015 at vit.ac.in (SHUBHAM BHARDWAJ 15BCE0704) Date: Sun, 19 Feb 2017 06:31:27 +0530 Subject: [scikit-learn] can we have a slack team for scikit-learn Message-ID: Hello Friends, I have tried Slack and its awesome. Things are more dynamic. I have faced some problems which I am sure slack can alleviate like- When working on some issue if I need some guidance I am not sure when I will get reply. That maybe usually within 2-3 days or more.Maybe some fellow programmer who is free can discuss and we may find a good solution. I think collaboration would be much better. Regards Shubham Bhardwaj -------------- next part -------------- An HTML attachment was scrubbed... URL: From naopon at gmail.com Sat Feb 18 20:28:26 2017 From: naopon at gmail.com (Naoya Kanai) Date: Sat, 18 Feb 2017 17:28:26 -0800 Subject: [scikit-learn] can we have a slack team for scikit-learn In-Reply-To: References: Message-ID: The Gitter channel is occasionally active ( https://gitter.im/scikit-learn/scikit-learn) so you might want to check it out. On Sat, Feb 18, 2017 at 5:01 PM, SHUBHAM BHARDWAJ 15BCE0704 < shubham.bhardwaj2015 at vit.ac.in> wrote: > Hello Friends, > > I have tried Slack and its awesome. Things are more dynamic. 
I have faced > some problems which I am sure slack can alleviate like- > > When working on some issue if I need some guidance I am not sure when I > will get reply. That maybe usually within 2-3 days or more.Maybe some > fellow programmer who is free can discuss and we may find a good solution. > I think collaboration would be much better. > > Regards > Shubham Bhardwaj > > > > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From jmschreiber91 at gmail.com Sat Feb 18 20:44:38 2017 From: jmschreiber91 at gmail.com (Jacob Schreiber) Date: Sat, 18 Feb 2017 17:44:38 -0800 Subject: [scikit-learn] can we have a slack team for scikit-learn In-Reply-To: References: Message-ID: I would support a slack channel --if-- we had channels for different groups of modules, like a tree channel and a linear methods channel, and developers involved in those sections populated the channels. This would allow people to ask questions to developers involved directly. However, I can easily see this becoming yet another chat medium that is sparsely attended in which case it would be detrimental to split everyone's attention even further. On Sat, Feb 18, 2017 at 5:28 PM, Naoya Kanai wrote: > The Gitter channel is occasionally active (https://gitter.im/scikit- > learn/scikit-learn) so you might want to check it out. > > On Sat, Feb 18, 2017 at 5:01 PM, SHUBHAM BHARDWAJ 15BCE0704 < > shubham.bhardwaj2015 at vit.ac.in> wrote: > >> Hello Friends, >> >> I have tried Slack and its awesome. Things are more dynamic. I have faced >> some problems which I am sure slack can alleviate like- >> >> When working on some issue if I need some guidance I am not sure when I >> will get reply. That maybe usually within 2-3 days or more.Maybe some >> fellow programmer who is free can discuss and we may find a good solution. >> I think collaboration would be much better. >> >> Regards >> Shubham Bhardwaj >> >> >> >> _______________________________________________ >> scikit-learn mailing list >> scikit-learn at python.org >> https://mail.python.org/mailman/listinfo/scikit-learn >> >> > > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From jmschreiber91 at gmail.com Sat Feb 18 20:45:55 2017 From: jmschreiber91 at gmail.com (Jacob Schreiber) Date: Sat, 18 Feb 2017 17:45:55 -0800 Subject: [scikit-learn] GSOC call for mentors In-Reply-To: References: Message-ID: I think we have de facto decided not to participate by not having someone step up by now and organize it like Raghav did last year. On Sat, Feb 18, 2017 at 10:54 AM, Olivier Grisel wrote: > Personally I don't feel like mentoring this year. I would really like > to focus my scikit-learn time on finishing the joblib process > refactoring with Thomas Moreau and the binning / thread-based > parallelization of boosted trees with Guillaume and Raghav. > > -- > Olivier > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From jmschreiber91 at gmail.com Sat Feb 18 20:51:31 2017 From: jmschreiber91 at gmail.com (Jacob Schreiber) Date: Sat, 18 Feb 2017 17:51:31 -0800 Subject: [scikit-learn] Regarding scikit learn to take part in GSOC 2017 In-Reply-To: References: Message-ID: Hi Akshay Thanks for the note. We've had several threads discussing this, and appear to have come to the consensus that while there are some people who are willing to serve as mentors, no one has the time right now to organize the entire thing. The team always welcomes contributions and is willing to guide people seeking to merge nice pull requests. For me specifically, I'd still love to work with someone who is willing to parallelize single decision tree building, but I don't have the time myself to implement this now or go through the process to set up a GSoC. Jacob On Sat, Feb 18, 2017 at 10:38 AM, Akshay Gupta wrote: > Dear Programmers, > > I'm watching scikit learn on github from last few month and have also > made some contribution. Just now I found that this year there is no final > plan in community to take part in GSOC. > Community like Scikit Learn which have a unique place in industry should > promote open source contribution and must take part in events like GSOC. > > My appeal is that scikit learn should at least have a idea page.Though it > is late but there still exist a chance to take part in GSOC. > > Regards Akshay > > GitHub Name - Akshay0724 > > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From nfliu at uw.edu Sat Feb 18 20:50:49 2017 From: nfliu at uw.edu (Nelson Liu) Date: Sat, 18 Feb 2017 17:50:49 -0800 Subject: [scikit-learn] can we have a slack team for scikit-learn In-Reply-To: References: Message-ID: > However, I can easily see this becoming yet another chat medium that is sparsely attended in which case it would be detrimental to split everyone's attention even further. I definitely agree with this and think that this would (likely) be the end outcome -- gitter and irc didn't/don't "work", so I'm pessimistic as to slack's chances. On Sat, Feb 18, 2017 at 5:44 PM, Jacob Schreiber wrote: > I would support a slack channel --if-- we had channels for different > groups of modules, like a tree channel and a linear methods channel, and > developers involved in those sections populated the channels. This would > allow people to ask questions to developers involved directly. However, I > can easily see this becoming yet another chat medium that is sparsely > attended in which case it would be detrimental to split everyone's > attention even further. > > On Sat, Feb 18, 2017 at 5:28 PM, Naoya Kanai wrote: > >> The Gitter channel is occasionally active (https://gitter.im/scikit-lear >> n/scikit-learn) so you might want to check it out. >> >> On Sat, Feb 18, 2017 at 5:01 PM, SHUBHAM BHARDWAJ 15BCE0704 < >> shubham.bhardwaj2015 at vit.ac.in> wrote: >> >>> Hello Friends, >>> >>> I have tried Slack and its awesome. Things are more dynamic. I have >>> faced some problems which I am sure slack can alleviate like- >>> >>> When working on some issue if I need some guidance I am not sure when I >>> will get reply. That maybe usually within 2-3 days or more.Maybe some >>> fellow programmer who is free can discuss and we may find a good solution. >>> I think collaboration would be much better. 
>>> >>> Regards >>> Shubham Bhardwaj >>> >>> >>> >>> _______________________________________________ >>> scikit-learn mailing list >>> scikit-learn at python.org >>> https://mail.python.org/mailman/listinfo/scikit-learn >>> >>> >> >> _______________________________________________ >> scikit-learn mailing list >> scikit-learn at python.org >> https://mail.python.org/mailman/listinfo/scikit-learn >> >> > > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From joel.nothman at gmail.com Sun Feb 19 07:43:25 2017 From: joel.nothman at gmail.com (Joel Nothman) Date: Sun, 19 Feb 2017 23:43:25 +1100 Subject: [scikit-learn] GSOC call for mentors In-Reply-To: References: Message-ID: I am sure there are many people disappointed by the idea that we may not run with GSoC this year. On the one hand, we could ? as Ga?l has suggested ? really benefit from having more people involved in the maintenance of scikit-learn, and GSoC provides a potential pathway for newcomers. On the other hand, we have such an enormous quantity of PRs to review and decide upon already, that code is far from the main thing we are lacking in contribution; and, with notable exceptions, the active / long-term core devs have overwhelmingly not come in through the GSoC pathway. I also think we are at a stage of maturity where it is becoming relatively hard to design projects that are clearly beneficial, not going to create future maintenance burden, and can be performed by someone relatively new to developing scikit-learn. But as others have suggested on this list, there may be projects within the scikit-learn ecosystem that *do* need code and can clearly defined projects. I think if there were a clearly scoped project and a promising student, we would find mentor availability. But the core devs have not had capacity to design a project within the above constraints, and no student has come forward with a clear proposition. Potential students must recognise that GSoC funding assumes, essentially, in-kind contributions from mentors in time. Since we're mostly relying here on volunteers, how readily we can afford that contribution needs to be rationalised. On 19 February 2017 at 12:45, Jacob Schreiber wrote: > I think we have de facto decided not to participate by not having someone > step up by now and organize it like Raghav did last year. > > On Sat, Feb 18, 2017 at 10:54 AM, Olivier Grisel > wrote: > >> Personally I don't feel like mentoring this year. I would really like >> to focus my scikit-learn time on finishing the joblib process >> refactoring with Thomas Moreau and the binning / thread-based >> parallelization of boosted trees with Guillaume and Raghav. >> >> -- >> Olivier >> _______________________________________________ >> scikit-learn mailing list >> scikit-learn at python.org >> https://mail.python.org/mailman/listinfo/scikit-learn >> > > > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn > > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From gael.varoquaux at normalesup.org Sun Feb 19 12:08:15 2017 From: gael.varoquaux at normalesup.org (Gael Varoquaux) Date: Sun, 19 Feb 2017 18:08:15 +0100 Subject: [scikit-learn] can we have a slack team for scikit-learn In-Reply-To: References: Message-ID: <20170219170815.GA2499052@phare.normalesup.org> I agree: the limiting factor is everybody's time. Technology doesn't help much in this respect. I am afraid that if we add a slack channel, we are just going to get much dilution. I don't see what killer feature that slack has that would suddenly make it that knowledgeable people have more time. In order to make the best use of my personnal time, I must admit that I stick to a rule: github is where I spend my time. Ga?l On Sat, Feb 18, 2017 at 05:50:49PM -0800, Nelson Liu wrote: > >??However, I can easily see this becoming yet another chat medium that is > sparsely attended in which case it would be detrimental to split everyone's > attention even further.? > I definitely agree with this and think that this would (likely) be the end > outcome -- gitter and irc didn't/don't "work", so I'm pessimistic as to slack's > chances. > On Sat, Feb 18, 2017 at 5:44 PM, Jacob Schreiber > wrote: > I would support a slack channel --if-- we had channels for different groups > of modules, like a tree channel and a linear methods channel, and > developers involved in those sections populated the channels. This would > allow people to ask questions to developers involved directly. However, I > can easily see this becoming yet another chat medium that is sparsely > attended in which case it would be detrimental to split everyone's > attention even further.? > On Sat, Feb 18, 2017 at 5:28 PM, Naoya Kanai wrote: > The Gitter channel is occasionally active (https://gitter.im/ > scikit-learn/scikit-learn) so you might want to check it out. > On Sat, Feb 18, 2017 at 5:01 PM, SHUBHAM BHARDWAJ 15BCE0704 < > shubham.bhardwaj2015 at vit.ac.in> wrote: > Hello Friends, > I have tried Slack and its awesome. Things are more dynamic. I have > faced some problems which I am sure slack can alleviate like- > When working on some issue if I need some guidance I am not sure > when I will get ?reply. That maybe usually within 2-3 days or > more.Maybe some fellow programmer who is free can discuss and we > may find a good solution. I think collaboration would be much > better. 
> Regards > Shubham Bhardwaj > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn -- Gael Varoquaux Researcher, INRIA Parietal NeuroSpin/CEA Saclay , Bat 145, 91191 Gif-sur-Yvette France Phone: ++ 33-1-69-08-79-68 http://gael-varoquaux.info http://twitter.com/GaelVaroquaux From se.raschka at gmail.com Sun Feb 19 13:15:45 2017 From: se.raschka at gmail.com (Sebastian Raschka) Date: Sun, 19 Feb 2017 13:15:45 -0500 Subject: [scikit-learn] can we have a slack team for scikit-learn In-Reply-To: <20170219170815.GA2499052@phare.normalesup.org> References: <20170219170815.GA2499052@phare.normalesup.org> Message-ID: <64F9320E-F7BD-4144-8AC2-F2BAA8DDCF0F@gmail.com> In my opinion, Slack can be quite useful for discussing things ?live.? However, one of the main problems I have with Slack ? I am using it for some other projects ? is that it is easy to lose track if important things are discussed and one is not constantly online and checking the timeline. In any case, I think Slack would be the same as using the already existing Gitter channel ? the only difference in my view would be that Slack, as a brand, is more popular for some reason. I think Slack/Gitter could be useful for sprints though, augmenting GitHub. > On Feb 19, 2017, at 12:08 PM, Gael Varoquaux wrote: > > I agree: the limiting factor is everybody's time. Technology doesn't help > much in this respect. I am afraid that if we add a slack channel, we are > just going to get much dilution. I don't see what killer feature that > slack has that would suddenly make it that knowledgeable people have more > time. > > In order to make the best use of my personnal time, I must admit that I > stick to a rule: github is where I spend my time. > > Ga?l > > On Sat, Feb 18, 2017 at 05:50:49PM -0800, Nelson Liu wrote: >>> However, I can easily see this becoming yet another chat medium that is >> sparsely attended in which case it would be detrimental to split everyone's >> attention even further. >> I definitely agree with this and think that this would (likely) be the end >> outcome -- gitter and irc didn't/don't "work", so I'm pessimistic as to slack's >> chances. > >> On Sat, Feb 18, 2017 at 5:44 PM, Jacob Schreiber >> wrote: > >> I would support a slack channel --if-- we had channels for different groups >> of modules, like a tree channel and a linear methods channel, and >> developers involved in those sections populated the channels. This would >> allow people to ask questions to developers involved directly. However, I >> can easily see this becoming yet another chat medium that is sparsely >> attended in which case it would be detrimental to split everyone's >> attention even further. > >> On Sat, Feb 18, 2017 at 5:28 PM, Naoya Kanai wrote: > >> The Gitter channel is occasionally active (https://gitter.im/ >> scikit-learn/scikit-learn) so you might want to check it out. 
> >> On Sat, Feb 18, 2017 at 5:01 PM, SHUBHAM BHARDWAJ 15BCE0704 < >> shubham.bhardwaj2015 at vit.ac.in> wrote:
> >> Hello Friends,
> >> I have tried Slack and it's awesome. Things are more dynamic. I have faced some problems which I am sure Slack can alleviate, like: when working on some issue, if I need some guidance I am not sure when I will get a reply. That may be within 2-3 days or more. Maybe some fellow programmer who is free can discuss it and we may find a good solution. I think collaboration would be much better.
> >> Regards
> >> Shubham Bhardwaj
> >> _______________________________________________
> >> scikit-learn mailing list
> >> scikit-learn at python.org
> >> https://mail.python.org/mailman/listinfo/scikit-learn
> --
> Gael Varoquaux
> Researcher, INRIA Parietal
> NeuroSpin/CEA Saclay , Bat 145, 91191 Gif-sur-Yvette France
> Phone: ++ 33-1-69-08-79-68
> http://gael-varoquaux.info http://twitter.com/GaelVaroquaux
> _______________________________________________
> scikit-learn mailing list
> scikit-learn at python.org
> https://mail.python.org/mailman/listinfo/scikit-learn

From vincent.dubourg at gmail.com Mon Feb 20 03:21:58 2017
From: vincent.dubourg at gmail.com (Vincent Dubourg)
Date: Mon, 20 Feb 2017 09:21:58 +0100
Subject: [scikit-learn] Confidence intervals on GaussianProcessRegressor hyperparameters estimates
Message-ID:

Hi list,

Did anyone ever consider using the Cramer-Rao lower bound to estimate the variance-covariance matrix of the GaussianProcess hyperparameter estimates? I have seen that the gradient of the marginal log likelihood is already available. What about the Hessian matrix?

Comparing the theta values with the hyperparameter values in the fitted kernel itself, it looks like some normalization occurs, which is fine, but how do I get the true gradient back?

Actually, I am more interested in inferring the parameters than in predicting. I have considered using pymc3, but MCMC is quite expensive in time and I would like to be able to speed this up with a reasonable approximation. George is also an alternative, but it is out of the question since I am running Windows.

Thank you,
Vincent

-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From t3kcit at gmail.com Tue Feb 21 09:18:29 2017
From: t3kcit at gmail.com (Andreas Mueller)
Date: Tue, 21 Feb 2017 09:18:29 -0500
Subject: [scikit-learn] can we have a slack team for scikit-learn
In-Reply-To: <64F9320E-F7BD-4144-8AC2-F2BAA8DDCF0F@gmail.com>
References: <20170219170815.GA2499052@phare.normalesup.org> <64F9320E-F7BD-4144-8AC2-F2BAA8DDCF0F@gmail.com>
Message-ID: <9aa6e4dd-3ee8-6483-2730-077b844c3d79@gmail.com>

I agree with the rest. How would slack channels be different from the existing gitter channels?
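On Vincent's GaussianProcessRegressor question above: scikit-learn does not expose a Hessian or any built-in confidence intervals for the kernel hyperparameters, but a rough observed-information (Cramer-Rao-style) approximation can be assembled from what is exposed. Note that kernel_.theta and the gradient returned by log_marginal_likelihood(..., eval_gradient=True) live in log-transformed hyperparameter space -- that is the "normalization" mentioned in the question. The sketch below uses a made-up kernel and toy data purely for illustration; it is one possible approach, not an official API.

import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

# toy data standing in for the real problem
rng = np.random.RandomState(0)
X = rng.uniform(0, 10, size=(50, 1))
y = np.sin(X).ravel() + 0.1 * rng.randn(50)

kernel = 1.0 * RBF(length_scale=1.0) + WhiteKernel(noise_level=0.1)
gp = GaussianProcessRegressor(kernel=kernel).fit(X, y)

theta_hat = gp.kernel_.theta  # maximum-likelihood hyperparameters, log-transformed

def grad_lml(theta):
    # gradient of the log marginal likelihood w.r.t. theta (log-space)
    return gp.log_marginal_likelihood(theta, eval_gradient=True)[1]

# central finite differences of the gradient give the Hessian in log-space
eps = 1e-5
n_params = theta_hat.shape[0]
hessian = np.empty((n_params, n_params))
for i in range(n_params):
    step = np.zeros(n_params)
    step[i] = eps
    hessian[:, i] = (grad_lml(theta_hat + step) - grad_lml(theta_hat - step)) / (2 * eps)
hessian = 0.5 * (hessian + hessian.T)  # symmetrize

cov_theta = np.linalg.inv(-hessian)   # approximate covariance of theta_hat
print(np.exp(theta_hat))              # hyperparameters on their natural scale
print(np.sqrt(np.diag(cov_theta)))    # approximate standard errors, in log-space

Since theta = log(hyperparameter), the gradient in the original parametrization can be recovered with the chain rule, d/d(param) = (1/param) * d/d(theta), and standard errors in log-space translate into multiplicative intervals on the natural scale.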
From t3kcit at gmail.com Tue Feb 21 09:22:18 2017 From: t3kcit at gmail.com (Andreas Mueller) Date: Tue, 21 Feb 2017 09:22:18 -0500 Subject: [scikit-learn] Request for single pass clustering algorithm In-Reply-To: References: Message-ID: <4be9f13b-fbbe-a33b-c5fa-2bfc30ad0987@gmail.com> Hi Gaurish. scikit-learn-owner is the email address for the mailing list administration. See the FAQ on contacting the project: http://scikit-learn.org/stable/faq.html#what-s-the-best-way-to-get-help-on-scikit-learn-usage For feature requests I would suggest the issue tracker. This sounds like "single pass" is the same as agglomerative clustering with the average linkage criterion. Or is it any different from that? Andy On 02/21/2017 04:39 AM, Gaurish Thakkar wrote: > > I would like to make a request to Scikit team to please implement and > incorporate the single pass clustering algorithm. It is one of the > most basic online algorithms and > the link below dicusses the process in detail. > [http://facweb.cs.depaul.edu/mobasher/classes/csc575/assignments/single-pass.html] > > -- > /Regards:/ > Gaurish P Thakkar > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From t3kcit at gmail.com Tue Feb 21 09:24:54 2017 From: t3kcit at gmail.com (Andreas Mueller) Date: Tue, 21 Feb 2017 09:24:54 -0500 Subject: [scikit-learn] GSOC call for mentors In-Reply-To: References: Message-ID: I agree, I just wanted to make sure we are on the same page, and that we can tell people "no we're not gonna do GSoC" instead of "err I don't know what's happening, maybe not?" From t3kcit at gmail.com Tue Feb 21 09:52:31 2017 From: t3kcit at gmail.com (Andreas Mueller) Date: Tue, 21 Feb 2017 09:52:31 -0500 Subject: [scikit-learn] Scipy 2017 Message-ID: <65ef1d1c-28a9-0772-6da1-3b54feb7cfd1@gmail.com> Hey folks. Who's coming to scipy this year? Any volunteers for tutorials? I'm happy to be part of it but doing 7h by myself is a bit much ;) Andy From jeff1evesque at yahoo.com Tue Feb 21 10:07:00 2017 From: jeff1evesque at yahoo.com (Jeffrey Levesque) Date: Tue, 21 Feb 2017 10:07:00 -0500 Subject: [scikit-learn] GSOC call for mentors In-Reply-To: References: Message-ID: <29E6DB6D-188C-406A-967F-670E6C10D3E6@yahoo.com> Hey guys, Maybe you guys could redirect some of them to related scikit-learn projects? For example, my project intends to interface scikit-learn: - https://github.com/jeff1evesque/machine-learning Even though it's a lot of JavaScript (web-interface), and puppet scripts for automating the build, I will need some help getting the python backend logic to correctly snap in to scikit-learns utilities. If some of you want to assist me mentor (I know some of you wanted to mentor), since you guys are scikit-learn developers, that would be hugely helpful. Even though individuals may not be creating new features (new algorithms, or optimizing) perse, they could assist me interfacing existing scikit-learn utilities, by writing corresponding Python logic which would properly delegate datasets into corresponding databases, and such. This would largely make sklearn utilities available to a web-interface, as well as a server API - at least my intention. My python syntax is the prettiest, so I welcome anyone to help improve it - since, this is largely a home pet project, and sometimes I only have 1-2 hours a day to work on this project. 
Thank you,

Jeff Levesque
https://github.com/jeff1evesque

> On Feb 21, 2017, at 9:24 AM, Andreas Mueller wrote:
>
> I agree, I just wanted to make sure we are on the same page, and that we can tell people "no we're not gonna do GSoC"
> instead of "err I don't know what's happening, maybe not?"
> _______________________________________________
> scikit-learn mailing list
> scikit-learn at python.org
> https://mail.python.org/mailman/listinfo/scikit-learn

-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From t3kcit at gmail.com Tue Feb 21 11:31:43 2017
From: t3kcit at gmail.com (Andreas Mueller)
Date: Tue, 21 Feb 2017 11:31:43 -0500
Subject: [scikit-learn] Preparing a scikit-learn 0.18.2 bugfix release
In-Reply-To: References: <20170109151546.GM2802991@phare.normalesup.org> <20170111215115.GO1585067@phare.normalesup.org>
Message-ID:

On 02/07/2017 09:00 PM, Joel Nothman wrote:
> On 12 January 2017 at 08:51, Gael Varoquaux wrote:
> On Thu, Jan 12, 2017 at 08:41:51AM +1100, Joel Nothman wrote:
> > When the two versions deprecation policy was instituted, releases were much
> > more frequent... Is that enough of an excuse?
> I'd rather say that we can here decide that we are giving a longer grace period.
> I think that slow deprecations are a good thing (see titus's blog post here:
> http://ivory.idyll.org/blog/2017-pof-software-archivability.html )
> Given that 0.18 was a very slow release, and the work for removing
> deprecated material from 0.19 has already been done, I don't think we
> should revert that. I agree that we can delay the deprecation deadline
> for 0.20 and 0.21.
> In terms of release schedule, are we aiming for RC in early-mid March,
> assuming Andy's above prognostications are correct and he is able to
> review in a bigger way in a week or so?
Sometimes I wonder how Amazon ever gave me a job in forecasting....
Spring break is March 13-17th ;)

-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From gael.varoquaux at normalesup.org Mon Feb 27 05:58:35 2017
From: gael.varoquaux at normalesup.org (Gael Varoquaux)
Date: Mon, 27 Feb 2017 11:58:35 +0100
Subject: [scikit-learn] GSoC 2017
Message-ID: <20170227105835.GC2041043@phare.normalesup.org>

Hi,

Students have been inquiring about the GSoC (Google Summer of Code) with scikit-learn, and the core team has been quite silent about it.

I am happy to announce that we will be taking part in GSoC with scikit-learn again. The reason that we decided to do this is to give a chance to young, talented, and motivated students.

Importantly, our most limiting resource is the time of our experienced developers. This is clearly visible from the number of pending pull requests. Hence, we need students to be very able and independent. This of course means that they will be getting supervision from mentors. Such supervision is crucial for moving forward with a good project that delivers mergeable code. However, we will need the students to be very good at interacting efficiently with the mentors.
Also, I should stress that we will be able to take only a very small number of students.

With that said, let me introduce the 2017 GSoC for scikit-learn. We have set up a wiki page which summarizes the experiences from last year and the ideas for this year:
https://github.com/scikit-learn/scikit-learn/wiki/Google-summer-of-code-(GSOC)-2017

Interested students should declare their interest on the mailing list, and discuss with possible mentors here. Factors of success will be

* careful work on a good proposal, that takes one of the ideas on the wiki but breaks it down into a realistic plan with multiple steps and shows a good understanding of the problem.

* demonstration of the required skillset via successful pull requests in scikit-learn.

Cheers,

Gaël

--
Gael Varoquaux
Researcher, INRIA Parietal
NeuroSpin/CEA Saclay , Bat 145, 91191 Gif-sur-Yvette France
Phone: ++ 33-1-69-08-79-68
http://gael-varoquaux.info http://twitter.com/GaelVaroquaux

From ludo25_90 at hotmail.com Mon Feb 27 09:27:59 2017
From: ludo25_90 at hotmail.com (Ludovico Coletta)
Date: Mon, 27 Feb 2017 14:27:59 +0000
Subject: [scikit-learn] Control over the inner loop in GridSearchCV
Message-ID:

Dear Scikit experts,

we are stuck with GridSearchCV. Nobody else was able to or wanted to help us, so we hope you will.

We are analysing neuroimaging data coming from 3 different MRI scanners, where for each scanner we have a healthy group and a disease group. We would like to merge the data from the 3 different scanners in order to classify the healthy subjects from the ones who have the disease.

The problem is that we can almost perfectly classify the subjects according to the scanner (e.g. the healthy subjects from scanner 1 and scanner 2). We are using a custom cross-validation scheme to account for the different scanners: when no hyper-parameter (SVM) optimization is performed, everything is straightforward. Problems arise when we would like to perform hyperparameter optimization: in this case we need to balance for the different scanners in the optimization phase as well. We also found a custom cv scheme for this, but we are not able to pass it to the GridSearchCV object. We would like to get something like the following:

pipeline = Pipeline([('scl', StandardScaler()),
                     ('sel', RFE(estimator, step=0.2)),
                     ('clf', SVC(probability=True, random_state=42))])

param_grid = [{'sel__n_features_to_select': [22, 15, 10, 2],
               'clf__C': np.logspace(-3, 5, 100),
               'clf__kernel': ['linear']}]

clf = GridSearchCV(pipeline,
                   param_grid=param_grid,
                   verbose=1,
                   scoring='roc_auc',
                   n_jobs=-1)

# cv_final is the custom cv for the outer loop (9 folds)

ii = 0
while ii < len(cv_final):
    # fit and predict
    clf.fit(data[?], y[?])
    predictions.append(clf.predict(data[cv_final[ii][1]]))  # outer test data
    ii = ii + 1

We tried almost everything. When we define clf in the loop, pass the i-th cv_nested as the cv argument, and fit it on the training data of the i-th custom_cv fold, we get a "Too many values to unpack" error. On the other hand, when we try to pass the nested i-th cv fold as the cv argument for clf and we call fit on the same cv_nested fold, we get an "Index out of bound" error.

Two questions:
1) Is there any workaround to avoid the split when clf is called without a cv argument?
2) We suppose that for hyperparameter optimization the test data is removed from the dataset and a new dataset is created. Is this true?
In this case we only have to adjust the indices accordingly.

Thank you for your time and sorry for the long text
Ludovico

-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From se.raschka at gmail.com Mon Feb 27 11:27:24 2017
From: se.raschka at gmail.com (Sebastian Raschka)
Date: Mon, 27 Feb 2017 11:27:24 -0500
Subject: Re: [scikit-learn] Control over the inner loop in GridSearchCV
In-Reply-To: References: Message-ID:

Hi, Ludovico,

what format (shape) is data in? Are these the arrays from a KFold iterator? In this case, the "question marks" in your code snippet should simply be the train and validation subset indices generated by the KFold generator. E.g.,

skfold = StratifiedKFold(y=y_train, n_folds=5, shuffle=True, random_state=1)

for outer_train_idx, outer_valid_idx in skfold:
    ...
    gridsearch_object.fit(X_train[outer_train_idx], y_train[outer_train_idx])

> On the other end, when we try to pass the nested -ith cv fold as cv argument for clf, and we call fit on the same cv_nested fold, we get an "Index out of bound" error.
> Two questions:

Are you using a version older than scikit-learn 0.18? Technically, GridSearchCV, RandomizedSearchCV, cross_val_score, ... should all support iterables of train and test indices, e.g.:

outer_cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=1)

for name, gs_est in sorted(gridcvs.items()):
    nested_score = cross_val_score(gs_est, X=X_train, y=y_train, cv=outer_cv, n_jobs=1)

Best,
Sebastian

> On Feb 27, 2017, at 9:27 AM, Ludovico Coletta wrote:
>
> Dear Scikit experts,
>
> we am stucked with GridSearchCV. Nobody else was able/wanted to help us, we hope you will.
>
> We are analysing neuroimaging data coming from 3 different MRI scanners, where for each scanner we have a healthy group and a disease group. We would like to merge the data from the 3 different scanners in order to classify the healthy subjects from the one who have the disease.
>
> The problem is that we can almost perfectly classify the subjects according to the scanner (e.g. the healthy subjects from scanner 1 and scanner 2). We are using a custom cross validation schema to account for the different scanners: when no hyper-parameter (SVM) optimization is performed, everything is straightforward. Problems arise when we would like to perform hyperparameter optimization: in this case we need to balance for the different scanner in the optimization phase as well. We also found a custom cv schema for this, but we are not able to pass it to GridSearchCV object. We would like to get something like the following:
>
> pipeline = Pipeline([('scl', StandardScaler()),
>                      ('sel', RFE(estimator, step=0.2)),
>                      ('clf', SVC(probability=True, random_state=42))])
>
> param_grid = [{'sel__n_features_to_select': [22, 15, 10, 2],
>                'clf__C': np.logspace(-3, 5, 100),
>                'clf__kernel': ['linear']}]
>
> clf = GridSearchCV(pipeline,
>                    param_grid=param_grid,
>                    verbose=1,
>                    scoring='roc_auc',
>                    n_jobs=-1)
>
> # cv_final is the custom cv for the outer loop (9 folds)
>
> ii = 0
> while ii < len(cv_final):
>     # fit and predict
>     clf.fit(data[?], y[?])
>     predictions.append(clf.predict(data[cv_final[ii][1]]))  # outer test data
>     ii = ii + 1
>
> We tried almost everything. When we define clf in the loop, we pass the -ith cv_nested as cv argument, and we fit it on the training data of the -ith custom_cv fold, we get an "Too many values to unpack" error.
On the other end, when we try to pass the nested -ith cv fold as cv argument for clf, and we call fit on the same cv_nested fold, we get an "Index out of bound" error. > Two questions: > 1) Is there any workaround to avoid the split when clf is called without a cv argument? > 2) We suppose that for hyperparameter optimization the test data is removed from the dataset and a new dataset is created. Is this true? In this case we only have to adjust the indices accordingly > > Thank your for your time and sorry for the long text > Ludovico > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn From nfliu at uw.edu Mon Feb 27 12:46:25 2017 From: nfliu at uw.edu (Nelson Liu) Date: Mon, 27 Feb 2017 09:46:25 -0800 Subject: [scikit-learn] GSoC 2017 In-Reply-To: <20170227105835.GC2041043@phare.normalesup.org> References: <20170227105835.GC2041043@phare.normalesup.org> Message-ID: In past years students made a page on the wiki with their proposal; this isn't possible anymore due to GitHub permissions. Perhaps an alternative method for getting feedback should be suggested on the introduction page? Nelson Liu On Mon, Feb 27, 2017 at 2:58 AM, Gael Varoquaux < gael.varoquaux at normalesup.org> wrote: > Hi, > > Students have been inquiring about the GSoC (Google Summer of Code) with > scikit-learn, and the core team has been quite silent about team. > > I am happy to announce that we will be taking part in the scikit-learn > again. The reason that we decided to do this is to give a chance to the > young, talented, and motivated students. > > Importantly, our most limiting resource is the time of our experienced > developers. This is clearly visible from the number of pending pull > requests. Hence, we need students to be very able and independent. This > of course means that they will be getting supervision from mentors. Such > supervision is crucial for moving forward with a good project, that > delivers mergeable code. However, we will need the students to be very > good at interacting efficiently with the mentors. Also, I should stress > that we will be able to take only a very few numbers of students. > > With that said, let me introduce the 2017 GSoC for scikit-learn. We have > set up a wiki page which summarizes the experiences from last year and > the ideas for this year: > https://github.com/scikit-learn/scikit-learn/wiki/Google- > summer-of-code-(GSOC)-2017 > > Interested students should declare their interest on the mailing list, > and discuss with possible mentors here. Factors of success will be > > * careful work on a good proposal, that takes on of the ideas on the wiki > but breaks it down in a realistic plan with multiple steps and shows a > good understanding of the problem. > > * demonstration of the required skillset via successful pull requests in > scikit-learn. > > Cheers, > > Ga?l > > > -- > Gael Varoquaux > Researcher, INRIA Parietal > NeuroSpin/CEA Saclay , Bat 145, 91191 Gif-sur-Yvette France > Phone: ++ 33-1-69-08-79-68 > http://gael-varoquaux.info http://twitter.com/GaelVaroquaux > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From ragvrv at gmail.com Mon Feb 27 14:28:03 2017 From: ragvrv at gmail.com (Raghav R V) Date: Mon, 27 Feb 2017 20:28:03 +0100 Subject: [scikit-learn] GSoC 2017 In-Reply-To: References: <20170227105835.GC2041043@phare.normalesup.org> Message-ID: They can still edit a wiki page from their fork of scikit learn I think. So I'd suggest doing that and mailing to this thread, the link to their proposal... On 27 Feb 2017 6:55 p.m., "Nelson Liu" wrote: > In past years students made a page on the wiki with their proposal; this > isn't possible anymore due to GitHub permissions. Perhaps an alternative > method for getting feedback should be suggested on the introduction page? > > Nelson Liu > > On Mon, Feb 27, 2017 at 2:58 AM, Gael Varoquaux < > gael.varoquaux at normalesup.org> wrote: > >> Hi, >> >> Students have been inquiring about the GSoC (Google Summer of Code) with >> scikit-learn, and the core team has been quite silent about team. >> >> I am happy to announce that we will be taking part in the scikit-learn >> again. The reason that we decided to do this is to give a chance to the >> young, talented, and motivated students. >> >> Importantly, our most limiting resource is the time of our experienced >> developers. This is clearly visible from the number of pending pull >> requests. Hence, we need students to be very able and independent. This >> of course means that they will be getting supervision from mentors. Such >> supervision is crucial for moving forward with a good project, that >> delivers mergeable code. However, we will need the students to be very >> good at interacting efficiently with the mentors. Also, I should stress >> that we will be able to take only a very few numbers of students. >> >> With that said, let me introduce the 2017 GSoC for scikit-learn. We have >> set up a wiki page which summarizes the experiences from last year and >> the ideas for this year: >> https://github.com/scikit-learn/scikit-learn/wiki/Google-sum >> mer-of-code-(GSOC)-2017 >> >> Interested students should declare their interest on the mailing list, >> and discuss with possible mentors here. Factors of success will be >> >> * careful work on a good proposal, that takes on of the ideas on the wiki >> but breaks it down in a realistic plan with multiple steps and shows a >> good understanding of the problem. >> >> * demonstration of the required skillset via successful pull requests in >> scikit-learn. >> >> Cheers, >> >> Ga?l >> >> >> -- >> Gael Varoquaux >> Researcher, INRIA Parietal >> NeuroSpin/CEA Saclay , Bat 145, 91191 Gif-sur-Yvette France >> Phone: ++ 33-1-69-08-79-68 >> http://gael-varoquaux.info http://twitter.com/GaelVaroqua >> ux >> _______________________________________________ >> scikit-learn mailing list >> scikit-learn at python.org >> https://mail.python.org/mailman/listinfo/scikit-learn >> > > > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From ragvrv at gmail.com Mon Feb 27 14:29:06 2017 From: ragvrv at gmail.com (Raghav R V) Date: Mon, 27 Feb 2017 20:29:06 +0100 Subject: [scikit-learn] GSoC 2017 In-Reply-To: References: <20170227105835.GC2041043@phare.normalesup.org> Message-ID: Or simply a public gist and importantly the link mailed here would do I think... On 27 Feb 2017 8:28 p.m., "Raghav R V" wrote: > They can still edit a wiki page from their fork of scikit learn I think. 
> So I'd suggest doing that and mailing to this thread, the link to their > proposal... > > On 27 Feb 2017 6:55 p.m., "Nelson Liu" wrote: > >> In past years students made a page on the wiki with their proposal; this >> isn't possible anymore due to GitHub permissions. Perhaps an alternative >> method for getting feedback should be suggested on the introduction page? >> >> Nelson Liu >> >> On Mon, Feb 27, 2017 at 2:58 AM, Gael Varoquaux < >> gael.varoquaux at normalesup.org> wrote: >> >>> Hi, >>> >>> Students have been inquiring about the GSoC (Google Summer of Code) with >>> scikit-learn, and the core team has been quite silent about team. >>> >>> I am happy to announce that we will be taking part in the scikit-learn >>> again. The reason that we decided to do this is to give a chance to the >>> young, talented, and motivated students. >>> >>> Importantly, our most limiting resource is the time of our experienced >>> developers. This is clearly visible from the number of pending pull >>> requests. Hence, we need students to be very able and independent. This >>> of course means that they will be getting supervision from mentors. Such >>> supervision is crucial for moving forward with a good project, that >>> delivers mergeable code. However, we will need the students to be very >>> good at interacting efficiently with the mentors. Also, I should stress >>> that we will be able to take only a very few numbers of students. >>> >>> With that said, let me introduce the 2017 GSoC for scikit-learn. We have >>> set up a wiki page which summarizes the experiences from last year and >>> the ideas for this year: >>> https://github.com/scikit-learn/scikit-learn/wiki/Google-sum >>> mer-of-code-(GSOC)-2017 >>> >>> Interested students should declare their interest on the mailing list, >>> and discuss with possible mentors here. Factors of success will be >>> >>> * careful work on a good proposal, that takes on of the ideas on the wiki >>> but breaks it down in a realistic plan with multiple steps and shows a >>> good understanding of the problem. >>> >>> * demonstration of the required skillset via successful pull requests in >>> scikit-learn. >>> >>> Cheers, >>> >>> Ga?l >>> >>> >>> -- >>> Gael Varoquaux >>> Researcher, INRIA Parietal >>> NeuroSpin/CEA Saclay , Bat 145, 91191 Gif-sur-Yvette France >>> Phone: ++ 33-1-69-08-79-68 >>> http://gael-varoquaux.info http://twitter.com/GaelVaroqua >>> ux >>> _______________________________________________ >>> scikit-learn mailing list >>> scikit-learn at python.org >>> https://mail.python.org/mailman/listinfo/scikit-learn >>> >> >> >> _______________________________________________ >> scikit-learn mailing list >> scikit-learn at python.org >> https://mail.python.org/mailman/listinfo/scikit-learn >> >> -------------- next part -------------- An HTML attachment was scrubbed... URL: From thalasta at usc.edu Mon Feb 27 14:53:30 2017 From: thalasta at usc.edu (Pradeep Thalasta) Date: Mon, 27 Feb 2017 11:53:30 -0800 Subject: [scikit-learn] GSoC 2017 In-Reply-To: References: <20170227105835.GC2041043@phare.normalesup.org> Message-ID: Hi, I'm new to open source contribution. Can i take part in GSoc as well? On Mon, Feb 27, 2017 at 11:29 AM, Raghav R V wrote: > Or simply a public gist and importantly the link mailed here would do I > think... > > On 27 Feb 2017 8:28 p.m., "Raghav R V" wrote: > >> They can still edit a wiki page from their fork of scikit learn I think. >> So I'd suggest doing that and mailing to this thread, the link to their >> proposal... 
>> >> On 27 Feb 2017 6:55 p.m., "Nelson Liu" wrote: >> >>> In past years students made a page on the wiki with their proposal; this >>> isn't possible anymore due to GitHub permissions. Perhaps an alternative >>> method for getting feedback should be suggested on the introduction page? >>> >>> Nelson Liu >>> >>> On Mon, Feb 27, 2017 at 2:58 AM, Gael Varoquaux < >>> gael.varoquaux at normalesup.org> wrote: >>> >>>> Hi, >>>> >>>> Students have been inquiring about the GSoC (Google Summer of Code) with >>>> scikit-learn, and the core team has been quite silent about team. >>>> >>>> I am happy to announce that we will be taking part in the scikit-learn >>>> again. The reason that we decided to do this is to give a chance to the >>>> young, talented, and motivated students. >>>> >>>> Importantly, our most limiting resource is the time of our experienced >>>> developers. This is clearly visible from the number of pending pull >>>> requests. Hence, we need students to be very able and independent. This >>>> of course means that they will be getting supervision from mentors. Such >>>> supervision is crucial for moving forward with a good project, that >>>> delivers mergeable code. However, we will need the students to be very >>>> good at interacting efficiently with the mentors. Also, I should stress >>>> that we will be able to take only a very few numbers of students. >>>> >>>> With that said, let me introduce the 2017 GSoC for scikit-learn. We have >>>> set up a wiki page which summarizes the experiences from last year and >>>> the ideas for this year: >>>> https://github.com/scikit-learn/scikit-learn/wiki/Google-sum >>>> mer-of-code-(GSOC)-2017 >>>> >>>> >>>> Interested students should declare their interest on the mailing list, >>>> and discuss with possible mentors here. Factors of success will be >>>> >>>> * careful work on a good proposal, that takes on of the ideas on the >>>> wiki >>>> but breaks it down in a realistic plan with multiple steps and shows a >>>> good understanding of the problem. >>>> >>>> * demonstration of the required skillset via successful pull requests in >>>> scikit-learn. >>>> >>>> Cheers, >>>> >>>> Ga?l >>>> >>>> >>>> -- >>>> Gael Varoquaux >>>> Researcher, INRIA Parietal >>>> NeuroSpin/CEA Saclay , Bat 145, 91191 Gif-sur-Yvette France >>>> Phone: ++ 33-1-69-08-79-68 >>>> http://gael-varoquaux.info >>>> >>>> http://twitter.com/GaelVaroquaux >>>> >>>> _______________________________________________ >>>> scikit-learn mailing list >>>> scikit-learn at python.org >>>> https://mail.python.org/mailman/listinfo/scikit-learn >>>> >>>> >>> >>> >>> _______________________________________________ >>> scikit-learn mailing list >>> scikit-learn at python.org >>> https://mail.python.org/mailman/listinfo/scikit-learn >>> >>> >>> > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://urldefense.proofpoint.com/v2/url?u=https-3A__mail. > python.org_mailman_listinfo_scikit-2Dlearn&d=DwICAg&c= > clK7kQUTWtAVEOVIgvi0NU5BOUHhpN0H8p7CSfnc_gI&r=8wN- > jbuYw7VyipS2uLHiQg&m=WOCvB_ncbkX6zknItZ8JGw5QvsCBNqh2DCc_AxGKj10&s= > 2HaUcj6htbntv3V5UTTAgAtZk6luVMnqXA9vEOlfJ_k&e= > > -- Regards, Pradeep Thalasta -------------- next part -------------- An HTML attachment was scrubbed... 
URL:

From alexandre.gramfort at telecom-paristech.fr Mon Feb 27 16:20:40 2017
From: alexandre.gramfort at telecom-paristech.fr (Alexandre Gramfort)
Date: Mon, 27 Feb 2017 22:20:40 +0100
Subject: Re: [scikit-learn] Scipy 2017
In-Reply-To: <65ef1d1c-28a9-0772-6da1-3b54feb7cfd1@gmail.com>
References: <65ef1d1c-28a9-0772-6da1-3b54feb7cfd1@gmail.com>
Message-ID:

Hi Andy,

I'll be happy to share the stage with you for a tutorial.

Alex

On Tue, Feb 21, 2017 at 3:52 PM, Andreas Mueller wrote:
> Hey folks.
> Who's coming to scipy this year?
> Any volunteers for tutorials? I'm happy to be part of it but doing 7h by
> myself is a bit much ;)
>
> Andy
> _______________________________________________
> scikit-learn mailing list
> scikit-learn at python.org
> https://mail.python.org/mailman/listinfo/scikit-learn

From ludo25_90 at hotmail.com Mon Feb 27 17:13:04 2017
From: ludo25_90 at hotmail.com (Ludovico Coletta)
Date: Mon, 27 Feb 2017 22:13:04 +0000
Subject: Re: [scikit-learn] scikit-learn Digest, Vol 11, Issue 29
In-Reply-To: References: Message-ID:

Dear Sebastian,

thank you for the quick answer.

The data is stored in a numpy array (shape: 68, 24). We are using scikit-learn 0.18.1.

I saw that I wrote something wrong in the previous email. Your solution is indeed correct if we let scikit-learn decide how to manage the inner loop. This is what we did at the beginning. By doing so, we noticed that the classifier's performance decreases (in comparison to a non-optimised classifier). We would like to control the inner split, and we need to store the metrics for each fold.

The way we obtained the indices for the optimization, train and test phases is the equivalent of something like this:

rs = ShuffleSplit(n_splits=9, test_size=.25, random_state=42)
indices_for_each_cv = list(rs.split(data[0:11]))

Maybe I can make myself clearer if I write what we would like to achieve for the first cross-validation fold (I acknowledge that the previous email was quite a mess, sorry). Outer loop: 48 subjects for training, 20 for testing. Of the 48 training subjects, we would like to use 42 for optimization and 6 for testing the parameters. We got the indices so that we match the different scanners even in the optimization phase, but we are not able to pass them to the GridSearchCV object.

The following did not work. This is what we get --> ValueError: too many values to unpack

ii = 0
while ii < len(cv_final):
    # fit and predict
    clf = GridSearchCV(
        pipeline,
        param_grid=param_grid,
        verbose=1,
        cv=cv_final_nested[ii],  # how to split the 48 train subjects for the optimization
        scoring='roc_auc',
        n_jobs=-1)
    clf.fit(data[cv_final[ii][0]], y[cv_final[ii][0]])  # the train data of the outer loop for the first fold (i.e. the 48 subjects)
    predictions.append(clf.predict(data[cv_final[ii][1]]))  # predict the 20 subjects left out for testing in the outer loop
    ii = ii + 1

This, however, works and should be (more or less) what we would like to achieve with the above loop. However, extracting the best parameters for each fold in order to predict the left-out data seems impossible or very laborious.

clf = GridSearchCV(
    pipeline,
    param_grid=param_grid,
    verbose=1,
    cv=cv_final_nested,
    scoring='roc_auc',
    n_jobs=-1)

clf.fit(data, y)

Any hint on how to solve this problem would be really appreciated.

Best
Ludovico

-------------- next part --------------
An HTML attachment was scrubbed...
URL:
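To make the indexing concrete, here is a small self-contained sketch of this kind of nested loop. Everything in it is made up for illustration (toy data standing in for the 68 x 24 array, two hand-written inner folds, a reduced grid); cv_outer and cv_inner merely stand in for cv_final and cv_final_nested. The key detail is that the inner (train, test) index pairs handed to GridSearchCV must refer to rows of the array actually passed to fit, i.e. they are positions within the outer training subset, not row numbers of the full data set.

import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# toy data standing in for the real 68 x 24 array
data, y = make_classification(n_samples=68, n_features=24, random_state=42)

pipeline = Pipeline([('scl', StandardScaler()),
                     ('clf', SVC(kernel='linear', random_state=42))])
param_grid = {'clf__C': np.logspace(-3, 5, 9)}

# one outer fold: 48 rows for training, 20 rows held out for testing
cv_outer = [(np.arange(0, 48), np.arange(48, 68))]

# inner folds for that outer fold, expressed as positions 0..47 within data[outer_train]
cv_inner = [[(np.arange(12, 48), np.arange(0, 12)),
             (np.arange(0, 36), np.arange(36, 48))]]

best_params, predictions = [], []
for (outer_train, outer_test), inner_folds in zip(cv_outer, cv_inner):
    clf = GridSearchCV(pipeline, param_grid=param_grid,
                       cv=inner_folds, scoring='roc_auc')
    clf.fit(data[outer_train], y[outer_train])
    best_params.append(clf.best_params_)               # parameters chosen on the inner folds
    predictions.append(clf.predict(data[outer_test]))  # refit best model, outer test set

print(best_params)

Collected this way, best_params_ gives the winning parameters of every outer fold, and the refit model predicts the held-out outer test set, which seems to be what the loop above is after. One likely cause of the "too many values to unpack" error is passing a single (train, test) tuple as cv instead of a sequence of such pairs; one likely cause of "index out of bounds" is inner indices that still refer to the full data set.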
From gael.varoquaux at normalesup.org Mon Feb 27 17:19:33 2017
From: gael.varoquaux at normalesup.org (Gael Varoquaux)
Date: Mon, 27 Feb 2017 23:19:33 +0100
Subject: Re: [scikit-learn] scikit-learn Digest, Vol 11, Issue 29
In-Reply-To: References: Message-ID: <20170227221933.GC2369856@phare.normalesup.org>

On Mon, Feb 27, 2017 at 10:13:04PM +0000, Ludovico Coletta wrote:
> The data is stored in a numpy array (shape: 68, 24). We are using scikit-learn 0.18.1.
> I saw that I wrote something wrong in the previous email. Your solution is indeed
> correct if we let scikit-learn decide how to manage the inner loop. This is what we
> did at the beginning. By doing so, we noticed that the classifier's performance
> decreases (in comparison to a non-optimised classifier).

With 68 samples, it is not that surprising that model selection with cross-validation is not able to select a good model. We found the same problem in brain imaging data [1], and it's an intrinsic problem due to small sample sizes: cross-validation is just not very accurate in these settings.

Gaël

[1] https://arxiv.org/abs/1606.05201

From joel.nothman at gmail.com Mon Feb 27 17:34:43 2017
From: joel.nothman at gmail.com (Joel Nothman)
Date: Tue, 28 Feb 2017 09:34:43 +1100
Subject: Re: [scikit-learn] GSoC 2017
In-Reply-To: References: <20170227105835.GC2041043@phare.normalesup.org>
Message-ID:

Hi Pradeep, we would usually only accept candidates who have shown their proficiency and understanding of our package and processes by making some contributions prior to this stage. You are certainly welcome to aim for GSoC 2018 by beginning to develop your familiarity and rapport now.
The reason that we decided to do this is to give a chance to the >>>>> young, talented, and motivated students. >>>>> >>>>> Importantly, our most limiting resource is the time of our experienced >>>>> developers. This is clearly visible from the number of pending pull >>>>> requests. Hence, we need students to be very able and independent. This >>>>> of course means that they will be getting supervision from mentors. >>>>> Such >>>>> supervision is crucial for moving forward with a good project, that >>>>> delivers mergeable code. However, we will need the students to be very >>>>> good at interacting efficiently with the mentors. Also, I should stress >>>>> that we will be able to take only a very few numbers of students. >>>>> >>>>> With that said, let me introduce the 2017 GSoC for scikit-learn. We >>>>> have >>>>> set up a wiki page which summarizes the experiences from last year and >>>>> the ideas for this year: >>>>> https://github.com/scikit-learn/scikit-learn/wiki/Google-sum >>>>> mer-of-code-(GSOC)-2017 >>>>> >>>>> >>>>> Interested students should declare their interest on the mailing list, >>>>> and discuss with possible mentors here. Factors of success will be >>>>> >>>>> * careful work on a good proposal, that takes on of the ideas on the >>>>> wiki >>>>> but breaks it down in a realistic plan with multiple steps and shows >>>>> a >>>>> good understanding of the problem. >>>>> >>>>> * demonstration of the required skillset via successful pull requests >>>>> in >>>>> scikit-learn. >>>>> >>>>> Cheers, >>>>> >>>>> Ga?l >>>>> >>>>> >>>>> -- >>>>> Gael Varoquaux >>>>> Researcher, INRIA Parietal >>>>> NeuroSpin/CEA Saclay , Bat 145, 91191 Gif-sur-Yvette France >>>>> Phone: ++ 33-1-69-08-79-68 >>>>> http://gael-varoquaux.info >>>>> >>>>> http://twitter.com/GaelVaroquaux >>>>> >>>>> _______________________________________________ >>>>> scikit-learn mailing list >>>>> scikit-learn at python.org >>>>> https://mail.python.org/mailman/listinfo/scikit-learn >>>>> >>>>> >>>> >>>> >>>> _______________________________________________ >>>> scikit-learn mailing list >>>> scikit-learn at python.org >>>> https://mail.python.org/mailman/listinfo/scikit-learn >>>> >>>> >>>> >> _______________________________________________ >> scikit-learn mailing list >> scikit-learn at python.org >> https://urldefense.proofpoint.com/v2/url?u=https-3A__mail.py >> thon.org_mailman_listinfo_scikit-2Dlearn&d=DwICAg&c=clK7kQUT >> WtAVEOVIgvi0NU5BOUHhpN0H8p7CSfnc_gI&r=8wN-jbuYw7VyipS2uLHiQg >> &m=WOCvB_ncbkX6zknItZ8JGw5QvsCBNqh2DCc_AxGKj10&s=2HaUcj6htbn >> tv3V5UTTAgAtZk6luVMnqXA9vEOlfJ_k&e= >> >> > > > -- > Regards, > Pradeep Thalasta > > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From thalasta at usc.edu Mon Feb 27 17:46:36 2017 From: thalasta at usc.edu (Pradeep Thalasta) Date: Mon, 27 Feb 2017 14:46:36 -0800 Subject: [scikit-learn] GSoC 2017 In-Reply-To: References: <20170227105835.GC2041043@phare.normalesup.org> Message-ID: Thanks Joel, will start with the contribution soon. On 27 Feb 2017 2:35 pm, "Joel Nothman" wrote: Hi Pradeep, we would usually only accept candidates who have shown their proficiency and understanding of our package and processes by making some contributions prior to this stage. you are certainly welcome to aim for GSoC 2018 by beginning to develop your familiarity and rapport now. 
cheers, Joel On 28 Feb 2017 7:01 am, "Pradeep Thalasta" wrote: > Hi, > I'm new to open source contribution. Can i take part in GSoc as well? > > > On Mon, Feb 27, 2017 at 11:29 AM, Raghav R V wrote: > >> Or simply a public gist and importantly the link mailed here would do I >> think... >> >> On 27 Feb 2017 8:28 p.m., "Raghav R V" wrote: >> >>> They can still edit a wiki page from their fork of scikit learn I think. >>> So I'd suggest doing that and mailing to this thread, the link to their >>> proposal... >>> >>> On 27 Feb 2017 6:55 p.m., "Nelson Liu" wrote: >>> >>>> In past years students made a page on the wiki with their proposal; >>>> this isn't possible anymore due to GitHub permissions. Perhaps an >>>> alternative method for getting feedback should be suggested on the >>>> introduction page? >>>> >>>> Nelson Liu >>>> >>>> On Mon, Feb 27, 2017 at 2:58 AM, Gael Varoquaux < >>>> gael.varoquaux at normalesup.org> wrote: >>>> >>>>> Hi, >>>>> >>>>> Students have been inquiring about the GSoC (Google Summer of Code) >>>>> with >>>>> scikit-learn, and the core team has been quite silent about team. >>>>> >>>>> I am happy to announce that we will be taking part in the scikit-learn >>>>> again. The reason that we decided to do this is to give a chance to the >>>>> young, talented, and motivated students. >>>>> >>>>> Importantly, our most limiting resource is the time of our experienced >>>>> developers. This is clearly visible from the number of pending pull >>>>> requests. Hence, we need students to be very able and independent. This >>>>> of course means that they will be getting supervision from mentors. >>>>> Such >>>>> supervision is crucial for moving forward with a good project, that >>>>> delivers mergeable code. However, we will need the students to be very >>>>> good at interacting efficiently with the mentors. Also, I should stress >>>>> that we will be able to take only a very few numbers of students. >>>>> >>>>> With that said, let me introduce the 2017 GSoC for scikit-learn. We >>>>> have >>>>> set up a wiki page which summarizes the experiences from last year and >>>>> the ideas for this year: >>>>> https://github.com/scikit-learn/scikit-learn/wiki/Google-sum >>>>> mer-of-code-(GSOC)-2017 >>>>> >>>>> >>>>> Interested students should declare their interest on the mailing list, >>>>> and discuss with possible mentors here. Factors of success will be >>>>> >>>>> * careful work on a good proposal, that takes on of the ideas on the >>>>> wiki >>>>> but breaks it down in a realistic plan with multiple steps and shows >>>>> a >>>>> good understanding of the problem. >>>>> >>>>> * demonstration of the required skillset via successful pull requests >>>>> in >>>>> scikit-learn. 
>>>>> Cheers,
>>>>>
>>>>> Gaël
>>>>>
>>>>> --
>>>>> Gael Varoquaux
>>>>> Researcher, INRIA Parietal
>>>>> NeuroSpin/CEA Saclay , Bat 145, 91191 Gif-sur-Yvette France
>>>>> Phone: ++ 33-1-69-08-79-68
>>>>> http://gael-varoquaux.info
>>>>> http://twitter.com/GaelVaroquaux
>>>>> _______________________________________________
>>>>> scikit-learn mailing list
>>>>> scikit-learn at python.org
>>>>> https://mail.python.org/mailman/listinfo/scikit-learn
>>>>
>>>> _______________________________________________
>>>> scikit-learn mailing list
>>>> scikit-learn at python.org
>>>> https://mail.python.org/mailman/listinfo/scikit-learn
>>
>> _______________________________________________
>> scikit-learn mailing list
>> scikit-learn at python.org
>> https://mail.python.org/mailman/listinfo/scikit-learn
>
> --
> Regards,
> Pradeep Thalasta
>
> _______________________________________________
> scikit-learn mailing list
> scikit-learn at python.org
> https://mail.python.org/mailman/listinfo/scikit-learn

_______________________________________________
scikit-learn mailing list
scikit-learn at python.org
https://mail.python.org/mailman/listinfo/scikit-learn

-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From se.raschka at gmail.com Mon Feb 27 17:47:02 2017
From: se.raschka at gmail.com (Sebastian Raschka)
Date: Mon, 27 Feb 2017 17:47:02 -0500
Subject: Re: [scikit-learn] Control over the inner loop in GridSearchCV
In-Reply-To: References: Message-ID:

Hi, Ludovico,

my bet is that there is an issue with the format of the object that you pass to the `cv` param of the GridSearchCV. What you need is e.g. "an iterable yielding train, test splits". Or more specifically, say you have a generator, my_gen, that is yielding these splits; the way the indices would be organized would be:

list(my_gen)[0][0]  # stores an array of indices used as training fold in the 1st round
                    # e.g., sth like np.array([0, 1, 2, 3, 4, 5, 6, ...])
list(my_gen)[0][1]  # stores an array of indices used as test fold in the 1st round
                    # e.g., sth like np.array([102, 103, 104, 105, 106, 107, 108, ...])
list(my_gen)[1][0]  # stores an array of indices used as training fold in the 2nd round
list(my_gen)[1][1]  # stores an array of indices used as test fold in the 2nd round
list(my_gen)[2][0]  # stores an array of indices used as training fold in the 3rd round
list(my_gen)[2][1]  # stores an array of indices used as test fold in the 3rd round

Hope that helps.

Best,
Sebastian

> The following did not work. This is what we get --> ValueError: too many values to unpack

> On Feb 27, 2017, at 5:13 PM, Ludovico Coletta wrote:
>
> Dear Sebastian,
>
> thank you for the quick answer.
>
> The data is stored in a numpy array (shape: 68, 24). We are using scikit 18.1
>
> I saw that I wrote something wrong in previous email. Your solution is indeed correct if we leave Scikit decide how to manage the inner loop. This is what we did at the beginning.
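One detail worth spelling out next to Sebastian's description: the object given to cv has to be a sequence of (train_indices, test_indices) pairs -- a single (train, test) tuple will itself be iterated over and mis-unpacked, which is one plausible way to end up with "too many values to unpack" -- and the indices inside each pair must refer to the rows of the array that fit receives. If the nested folds were defined in terms of the full data set, they need to be remapped to positions within the outer training subset first. A hypothetical sketch, with all index values invented for illustration:

import numpy as np

# rows of the full data set that form the outer training set
outer_train = np.array([0, 2, 3, 5, 7, 8, 9, 11])
# one inner fold, also expressed as rows of the *full* data set
inner_train = np.array([0, 2, 3, 5, 7, 8])
inner_test = np.array([9, 11])

# remap to positions within data[outer_train] before handing them to GridSearchCV
position = {row: pos for pos, row in enumerate(outer_train)}
inner_train_rel = np.array([position[r] for r in inner_train])  # -> [0 1 2 3 4 5]
inner_test_rel = np.array([position[r] for r in inner_test])    # -> [6 7]

# cv=[(inner_train_rel, inner_test_rel), ...] is then consistent with
# fitting on data[outer_train], y[outer_train].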
By doing so, we noticed that the classifier's perfomance decrease (in comparison to a non-optimised classifier). We would like to control the inner split and we need to store the metrics for each fold > > The way we obtained the indices for the optimization, train and test phase is the equivalent of something like that: > > rs = ShuffleSplit(n_splits=9, test_size=.25,random_state=42) > indices_for_each_cv = list(rs.split(data[0:11])) > > Maybe I can make myself clearer if I write what we would like to achieve for the first cross validation fold (I acknowledge that the previous email was quite a mess, sorry). Outer loop: 48 for training, 20 for testing. Of the 48 training subjects, we would like to use 42 for optimization, 6 for testing the parameters. We got the indices so that we match the different scanners even in the optimization phase, but we are not able to pass them to GridSearch object. > > The following did not work. This is what we get --> ValueError: too many values to unpack > > ii = 0 > > while ii < len(cv_final): > # fit and predict > > clf = GridSearchCV( > pipeline, > param_grid=param_grid, > verbose=1, > cv = cv_final_nested[ii], # how to split the 48 train subjects for the optimization > scoring='roc_auc', > n_jobs= -1) > > clf.fit(data[cv_final[ii][0]], y[cv_final[ii][0]]) # the train data of the outer loop for the first (i.e. the 48 subjects) > predictions.append(clf.predict(data[cv_final[ii][1]])) # Predict the 20 subjects left out for test in the outer loop > > ii = ii + 1 > > This however works and should be (more or less) what we would like to achieve with the above loop. However, extracting the best parameters for each fold in order to predict the left out data seems impossible or very laborious. > > clf = GridSearchCV( > pipeline, > > param_grid=param_grid, > verbose=1, > cv = cv_final_nested, > scoring='roc_auc', > n_jobs= -1) > > clf.fit(data,y) > > > Any hint on how to solve this problem would be really appreciated. > > Best > Ludovico > > > > > Da: scikit-learn per conto di scikit-learn-request at python.org > Inviato: luned? 27 febbraio 2017 17.27 > A: scikit-learn at python.org > Oggetto: scikit-learn Digest, Vol 11, Issue 29 > > Send scikit-learn mailing list submissions to > scikit-learn at python.org > > To subscribe or unsubscribe via the World Wide Web, visit > https://mail.python.org/mailman/listinfo/scikit-learn > scikit-learn Info Page - Python > mail.python.org > To see the collection of prior postings to the list, visit the scikit-learn Archives. Using scikit-learn: To post a message to all the list members ... > > > or, via email, send a message with subject or body 'help' to > scikit-learn-request at python.org > > You can reach the person managing the list at > scikit-learn-owner at python.org > > When replying, please edit your Subject line so it is more specific > than "Re: Contents of scikit-learn digest..." > > > Today's Topics: > > 1. GSoC 2017 (Gael Varoquaux) > 2. Control over the inner loop in GridSearchCV (Ludovico Coletta) > 3. 
> ------------------------------
>
> Message: 1
> Date: Mon, 27 Feb 2017 11:58:35 +0100
> From: Gael Varoquaux
> To: Scikit-learn user and developer mailing list
> Subject: [scikit-learn] GSoC 2017
> Message-ID: <20170227105835.GC2041043 at phare.normalesup.org>
>
> Hi,
>
> Students have been inquiring about the GSoC (Google Summer of Code) with
> scikit-learn, and the core team has been quite silent about it.
>
> I am happy to announce that we will be taking part in the GSoC again.
> The reason that we decided to do this is to give a chance to young,
> talented, and motivated students.
>
> Importantly, our most limiting resource is the time of our experienced
> developers. This is clearly visible from the number of pending pull
> requests. Hence, we need students to be very able and independent. They
> will of course be getting supervision from mentors; such supervision is
> crucial for moving forward with a good project that delivers mergeable
> code. However, we will need the students to be very good at interacting
> efficiently with the mentors. Also, I should stress that we will only be
> able to take a very small number of students.
>
> With that said, let me introduce the 2017 GSoC for scikit-learn. We have
> set up a wiki page which summarizes the experiences from last year and
> the ideas for this year:
> https://github.com/scikit-learn/scikit-learn/wiki/Google-summer-of-code-(GSOC)-2017
>
> Interested students should declare their interest on the mailing list
> and discuss with possible mentors here. Factors of success will be
>
> * careful work on a good proposal that takes one of the ideas on the wiki
>   but breaks it down into a realistic plan with multiple steps and shows a
>   good understanding of the problem.
>
> * demonstration of the required skillset via successful pull requests in
>   scikit-learn.
>
> Cheers,
>
> Gaël
>
> --
> Gael Varoquaux
> Researcher, INRIA Parietal
> NeuroSpin/CEA Saclay, Bat 145, 91191 Gif-sur-Yvette France
> Phone: ++ 33-1-69-08-79-68
> http://gael-varoquaux.info
> http://twitter.com/GaelVaroquaux
>
> ------------------------------
>
> Message: 2
> Date: Mon, 27 Feb 2017 14:27:59 +0000
> From: Ludovico Coletta
> To: "scikit-learn at python.org"
> Subject: [scikit-learn] Control over the inner loop in GridSearchCV
>
> Dear Scikit experts,
>
> we are stuck with GridSearchCV. Nobody else was able/wanted to help us; we hope you will.
>
> We are analysing neuroimaging data coming from 3 different MRI scanners, where for each scanner we have a healthy group and a disease group. We would like to merge the data from the 3 different scanners in order to classify the healthy subjects from the ones who have the disease.
>
> The problem is that we can almost perfectly classify the subjects according to the scanner (e.g. the healthy subjects from scanner 1 and scanner 2). We are using a custom cross-validation schema to account for the different scanners: when no hyperparameter (SVM) optimization is performed, everything is straightforward. Problems arise when we would like to perform hyperparameter optimization: in this case we need to balance for the different scanners in the optimization phase as well. We also found a custom cv schema for this, but we are not able to pass it to the GridSearchCV object. We would like to get something like the following:
>
> pipeline = Pipeline([('scl', StandardScaler()),
>                      ('sel', RFE(estimator, step=0.2)),
>                      ('clf', SVC(probability=True, random_state=42))])
>
> param_grid = [{'sel__n_features_to_select': [22, 15, 10, 2],
>                'clf__C': np.logspace(-3, 5, 100),
>                'clf__kernel': ['linear']}]
>
> clf = GridSearchCV(pipeline,
>                    param_grid=param_grid,
>                    verbose=1,
>                    scoring='roc_auc',
>                    n_jobs=-1)
>
> # cv_final is the custom cv for the outer loop (9 folds)
>
> ii = 0
> while ii < len(cv_final):
>     # fit and predict
>     clf.fit(data[?], y[?])
>     predictions.append(clf.predict(data[cv_final[ii][1]]))  # outer test data
>     ii = ii + 1
>
> We tried almost everything. When we define clf in the loop, pass the i-th cv_nested as the cv argument, and fit it on the training data of the i-th custom_cv fold, we get a "Too many values to unpack" error. On the other hand, when we try to pass the nested i-th cv fold as the cv argument for clf and call fit on the same cv_nested fold, we get an "Index out of bound" error.
>
> Two questions:
>
> 1) Is there any workaround to avoid the split when clf is called without a cv argument?
>
> 2) We suppose that for hyperparameter optimization the test data is removed from the dataset and a new dataset is created. Is this true? In this case we only have to adjust the indices accordingly.
>
> Thank you for your time and sorry for the long text
>
> Ludovico
>
> ------------------------------
>
> Message: 3
> Date: Mon, 27 Feb 2017 11:27:24 -0500
> From: Sebastian Raschka
> To: Scikit-learn user and developer mailing list
> Subject: Re: [scikit-learn] Control over the inner loop in GridSearchCV
>
> Hi Ludovico,
>
> what format (shape) is data in? Are these the arrays from a KFold iterator? In this case, the "question marks" in your code snippet should simply be the train and validation subset indices generated by the KFold generator, e.g.,
>
> skfold = StratifiedKFold(y=y_train, n_folds=5, shuffle=True, random_state=1)
> for outer_train_idx, outer_valid_idx in skfold:
>     ...
>     gridsearch_object.fit(X_train[outer_train_idx], y_train[outer_train_idx])
>
> > On the other hand, when we try to pass the nested i-th cv fold as the cv argument for clf, and we call fit on the same cv_nested fold, we get an "Index out of bound" error.
>
> Are you using a version older than scikit-learn 0.18? Technically, GridSearchCV, RandomizedSearchCV, cross_val_score, etc. should all support iterables of train and test indices, e.g.:
>
> outer_cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=1)
>
> for name, gs_est in sorted(gridcvs.items()):
>     nested_score = cross_val_score(gs_est,
>                                    X=X_train,
>                                    y=y_train,
>                                    cv=outer_cv,
>                                    n_jobs=1)
>
> Best,
> Sebastian
>
> ------------------------------
>
> End of scikit-learn Digest, Vol 11, Issue 29
> ********************************************
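For reference, a minimal, self-contained sketch of the cv format discussed above: GridSearchCV accepts an explicit, precomputed list of (train_indices, test_indices) pairs, so a custom splitting scheme can be built once and passed as-is. The data below is synthetic and the variable names (inner_cv, param_grid) are illustrative only, not taken from the thread.

import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, StratifiedKFold
from sklearn.svm import SVC

# Synthetic stand-in for a small data set (68 samples x 24 features).
X, y = make_classification(n_samples=68, n_features=24, random_state=0)

# Precompute the inner splits as a list of (train_idx, test_idx) index arrays.
# Any custom scheme works here, as long as it yields index pairs of this form.
inner_cv = list(StratifiedKFold(n_splits=5, shuffle=True, random_state=42).split(X, y))

print(inner_cv[0][0][:5])  # training indices of the 1st inner fold
print(inner_cv[0][1][:5])  # test indices of the 1st inner fold

param_grid = {'C': np.logspace(-3, 3, 7)}
clf = GridSearchCV(SVC(kernel='linear'), param_grid=param_grid, cv=inner_cv)
clf.fit(X, y)
print(clf.best_params_)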
> > > > > > ------------------------------ > > Subject: Digest Footer > > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn > scikit-learn Info Page - Python > mail.python.org > To see the collection of prior postings to the list, visit the scikit-learn Archives. Using scikit-learn: To post a message to all the list members ... > > > > > ------------------------------ > > End of scikit-learn Digest, Vol 11, Issue 29 > ******************************************** > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn From ludo25_90 at hotmail.com Mon Feb 27 18:56:59 2017 From: ludo25_90 at hotmail.com (Ludovico Coletta) Date: Mon, 27 Feb 2017 23:56:59 +0000 Subject: [scikit-learn] R: scikit-learn Digest, Vol 11, Issue 32 In-Reply-To: References: Message-ID: Dear Gael, This will probably be the case here, but we would like to exclude the scanner-factor from the possible explanations. We are still lucky that we are not in situation where the number of features >> number of samples. Best Ludovico -------- Messaggio originale -------- Da: scikit-learn-request at python.org Data: 27/02/17 23:49 (GMT+01:00) A: scikit-learn at python.org Oggetto: scikit-learn Digest, Vol 11, Issue 32 Send scikit-learn mailing list submissions to scikit-learn at python.org To subscribe or unsubscribe via the World Wide Web, visit https://mail.python.org/mailman/listinfo/scikit-learn or, via email, send a message with subject or body 'help' to scikit-learn-request at python.org You can reach the person managing the list at scikit-learn-owner at python.org When replying, please edit your Subject line so it is more specific than "Re: Contents of scikit-learn digest..." Today's Topics: 1. Re: scikit-learn Digest, Vol 11, Issue 29 (Gael Varoquaux) 2. Re: GSoC 2017 (Joel Nothman) 3. Re: GSoC 2017 (Pradeep Thalasta) ---------------------------------------------------------------------- Message: 1 Date: Mon, 27 Feb 2017 23:19:33 +0100 From: Gael Varoquaux To: Scikit-learn user and developer mailing list Subject: Re: [scikit-learn] scikit-learn Digest, Vol 11, Issue 29 Message-ID: <20170227221933.GC2369856 at phare.normalesup.org> Content-Type: text/plain; charset=iso-8859-1 On Mon, Feb 27, 2017 at 10:13:04PM +0000, Ludovico Coletta wrote: > The data is stored in a numpy array (shape: 68, 24). We are using scikit 18.1 > I saw that I wrote something wrong in previous email. Your solution is indeed > correct if we leave Scikit decide how to manage the inner loop. This is what we > did at the beginning. By doing so, we noticed that the classifier's perfomance > decrease (in comparison to a non-optimised classifier). With 68 samples, it is not that surprising the model-selection with cross-validation is not able to select a good model. We found the same problem in brain imaging data [1], and it's an intrinsic problem due to small sample sizes: cross-validation is just not very accurate in these settings. 
------------------------------

Message: 2
Date: Tue, 28 Feb 2017 09:34:43 +1100
From: Joel Nothman
To: Scikit-learn user and developer mailing list
Subject: Re: [scikit-learn] GSoC 2017

Hi Pradeep,

we would usually only accept candidates who have shown their proficiency and understanding of our package and processes by making some contributions prior to this stage. You are certainly welcome to aim for GSoC 2018 by beginning to develop your familiarity and rapport now.

cheers,
Joel

On 28 Feb 2017 7:01 am, "Pradeep Thalasta" wrote:
> Hi,
> I'm new to open source contribution. Can I take part in GSoC as well?
------------------------------

Message: 3
Date: Mon, 27 Feb 2017 14:46:36 -0800
From: Pradeep Thalasta
To: Scikit-learn user and developer mailing list
Subject: Re: [scikit-learn] GSoC 2017

Thanks Joel, will start with the contribution soon.

------------------------------

End of scikit-learn Digest, Vol 11, Issue 32
********************************************
From ludo25_90 at hotmail.com Mon Feb 27 19:28:19 2017
From: ludo25_90 at hotmail.com (Ludovico Coletta)
Date: Tue, 28 Feb 2017 00:28:19 +0000
Subject: [scikit-learn] scikit-learn Digest, Vol 11, Issue 33

Dear Sebastian,

this is exactly what we did, but it is not working. cv_final[0][0] and cv_final[0][1] hold the training and test indices for the first fold (outer loop), while cv_final_nested[0][0] and cv_final_nested[0][1] hold the indices for the parameter optimization of the first fold (inner loop, training and test respectively). You are probably right, there must be a (I hope so) little error somewhere. I will try again in the next days.

Thank you for your time
Ludovico
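One common source of the errors described in this sub-thread — offered here as a hedged guess, not a confirmed diagnosis — is that the inner (nested) indices are expressed relative to the full data set, while GridSearchCV is fitted on the outer training subset only, so the inner indices need to address rows of that subset. A minimal sketch under that assumption, on synthetic data and with illustrative names (cv_outer, cv_inner, not the thread's actual variables), which also records the best parameters of each outer fold:

import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, StratifiedKFold
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = make_classification(n_samples=68, n_features=24, random_state=0)

pipeline = Pipeline([('scl', StandardScaler()),
                     ('clf', SVC(kernel='linear', probability=True, random_state=42))])
param_grid = {'clf__C': np.logspace(-3, 3, 7)}

# Outer split: evaluation folds.
cv_outer = list(StratifiedKFold(n_splits=3, shuffle=True, random_state=0).split(X, y))

predictions, best_params = [], []
for train_idx, test_idx in cv_outer:
    X_tr, y_tr = X[train_idx], y[train_idx]
    # Inner splits are generated on the outer training subset, so their
    # indices refer to rows of X_tr rather than rows of X.
    cv_inner = list(StratifiedKFold(n_splits=3, shuffle=True, random_state=1).split(X_tr, y_tr))
    clf = GridSearchCV(pipeline, param_grid=param_grid, cv=cv_inner, scoring='roc_auc')
    clf.fit(X_tr, y_tr)
    best_params.append(clf.best_params_)          # best parameters for this outer fold
    predictions.append(clf.predict(X[test_idx]))  # refit-on-best model predicts the held-out fold

print(best_params)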
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

------------------------------

Subject: Digest Footer

_______________________________________________
scikit-learn mailing list
scikit-learn at python.org
https://mail.python.org/mailman/listinfo/scikit-learn

------------------------------

End of scikit-learn Digest, Vol 11, Issue 32
********************************************

-------------- next part --------------
An HTML attachment was scrubbed...
URL:

------------------------------

Subject: Digest Footer

_______________________________________________
scikit-learn mailing list
scikit-learn at python.org
https://mail.python.org/mailman/listinfo/scikit-learn

------------------------------

End of scikit-learn Digest, Vol 11, Issue 33
********************************************

-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From rovik05 at gmail.com Mon Feb 27 22:43:05 2017
From: rovik05 at gmail.com (Rohan Koodli)
Date: Mon, 27 Feb 2017 19:43:05 -0800
Subject: [scikit-learn] Clustering 4 dimensional data
Message-ID:

I'm having trouble understanding how to cluster multidimensional data. Specifically, a 4 dimensional array.

test = [[[[3,10],[1,5],[3,18]],[[3,1],[0,0],[0,0]],[[3,3],[1,5],[0,0]]],[[[1,5],[2,7],[0,0]],[[1,7],[0,0],[0,0]],[[0,0],[0,0],[0,0]]]]

from sklearn import mixture
gmm = mixture.GMM()
gmm.fit(test)

The code returns the following error:

"Found array with dim 4. GMM expected <= 2."

Do I need to change the way my data is formatted? Is there a way of doing clustering on 4 dimensional data?

-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From joel.nothman at gmail.com Mon Feb 27 22:53:02 2017
From: joel.nothman at gmail.com (Joel Nothman)
Date: Tue, 28 Feb 2017 14:53:02 +1100
Subject: [scikit-learn] Clustering 4 dimensional data
In-Reply-To:
References:
Message-ID:

What do your four dimensions mean? Can you reshape your data such that it can be seen as a collection of 1d vectors drawn independently from some distribution?

On 28 February 2017 at 14:43, Rohan Koodli wrote:

> I'm having trouble understanding how to cluster multidimensional data. Specifically, a 4 dimensional array.
>
> test = [[[[3,10],[1,5],[3,18]],[[3,1],[0,0],[0,0]],[[3,3],[1,5],[0,0]]],[[[1,5],[2,7],[0,0]],[[1,7],[0,0],[0,0]],[[0,0],[0,0],[0,0]]]]
>
> from sklearn import mixture
> gmm = mixture.GMM()
> gmm.fit(test)
>
> The code returns the following error:
>
> "Found array with dim 4. GMM expected <= 2."
>
> Do I need to change the way my data is formatted? Is there a way of doing clustering on 4 dimensional data?
>
> _______________________________________________
> scikit-learn mailing list
> scikit-learn at python.org
> https://mail.python.org/mailman/listinfo/scikit-learn

-------------- next part --------------
An HTML attachment was scrubbed...
URL:
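Joel's suggestion above -- reshaping the data into a collection of 1d vectors -- might look like the following minimal sketch. This is illustrative only: which reshape is right depends entirely on what the four dimensions mean, and here it is simply assumed that each innermost [a, b] pair is one two-feature observation. Note also that mixture.GMM is deprecated as of scikit-learn 0.18 in favour of mixture.GaussianMixture.

import numpy as np
from sklearn import mixture

test = [[[[3, 10], [1, 5], [3, 18]], [[3, 1], [0, 0], [0, 0]], [[3, 3], [1, 5], [0, 0]]],
        [[[1, 5], [2, 7], [0, 0]], [[1, 7], [0, 0], [0, 0]], [[0, 0], [0, 0], [0, 0]]]]

# Assumption: each innermost pair is one observation with two features.
X = np.asarray(test, dtype=float)   # shape (2, 3, 3, 2)
X = X.reshape(-1, X.shape[-1])      # shape (18, 2): 18 samples, 2 features

# GaussianMixture replaces the deprecated GMM estimator (scikit-learn >= 0.18).
gmm = mixture.GaussianMixture(n_components=2, random_state=0)
gmm.fit(X)
print(gmm.predict(X))               # one cluster label per 2-feature sample

If the four dimensions are instead meant to describe two whole objects, the alternative would be to flatten each outer item into a single 18-dimensional vector (X.reshape(2, -1)); two samples are far too few to cluster, which is why the meaning of the dimensions matters.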
From john_ladasky at sbcglobal.net Mon Feb 27 23:06:06 2017
From: john_ladasky at sbcglobal.net (John Ladasky)
Date: Mon, 27 Feb 2017 20:06:06 -0800
Subject: [scikit-learn] Clustering 4 dimensional data
In-Reply-To:
Message-ID: <5e48fccb-4c38-4dde-9235-c0b96126b3e7@email.android.com>

An HTML attachment was scrubbed...
URL:

From dmitrii.ignatov at gmail.com Mon Feb 27 23:35:01 2017
From: dmitrii.ignatov at gmail.com (Dmitry Ignatov)
Date: Tue, 28 Feb 2017 07:35:01 +0300
Subject: [scikit-learn] Clustering 4 dimensional data
In-Reply-To:
References:
Message-ID:

Sometimes, when you need to find homogeneous subtensors, you can refer to this as multimodal clustering, an extension of biclustering. I cannot see clearly whether that is the case here.

On 28 Feb 2017 at 06:54, "Joel Nothman" wrote:

What do your four dimensions mean? Can you reshape your data such that it can be seen as a collection of 1d vectors drawn independently from some distribution?

On 28 February 2017 at 14:43, Rohan Koodli wrote:

> I'm having trouble understanding how to cluster multidimensional data. Specifically, a 4 dimensional array.
>
> test = [[[[3,10],[1,5],[3,18]],[[3,1],[0,0],[0,0]],[[3,3],[1,5],[0,0]]],[[[1,5],[2,7],[0,0]],[[1,7],[0,0],[0,0]],[[0,0],[0,0],[0,0]]]]
>
> from sklearn import mixture
> gmm = mixture.GMM()
> gmm.fit(test)
>
> The code returns the following error:
>
> "Found array with dim 4. GMM expected <= 2."
>
> Do I need to change the way my data is formatted? Is there a way of doing clustering on 4 dimensional data?
>
> _______________________________________________
> scikit-learn mailing list
> scikit-learn at python.org
> https://mail.python.org/mailman/listinfo/scikit-learn

_______________________________________________
scikit-learn mailing list
scikit-learn at python.org
https://mail.python.org/mailman/listinfo/scikit-learn

-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From t3kcit at gmail.com Mon Feb 27 23:50:39 2017
From: t3kcit at gmail.com (Andreas Mueller)
Date: Mon, 27 Feb 2017 23:50:39 -0500
Subject: [scikit-learn] Women in Machine Learning and Data Science Sprint next Weekend (also call for help)
Message-ID: <314038f1-d325-1e0d-8399-aea6a4a47d95@gmail.com>

Hey all.

There's gonna be an introductory scikit-learn sprint at NYC on Saturday that a local Women's DS/ML group is organizing with me. I feel like we could do a bit more to improve (gender) diversity in the scipy/pydata space, and so I think this will be cool.

If anyone wants to review code on Saturday, that would be a great help for people getting started. Also, if anyone wants to help beforehand, making sure there is enough "easy" and "need contributor" issues tagged is important, as well as ensuring that all the tagged issues actually still need contributors.

I'll try to do as much of these as I can but my time is limited these days :(

Thanks y'all!

Andy

From jmschreiber91 at gmail.com Mon Feb 27 23:58:26 2017
From: jmschreiber91 at gmail.com (Jacob Schreiber)
Date: Mon, 27 Feb 2017 20:58:26 -0800
Subject: [scikit-learn] Women in Machine Learning and Data Science Sprint next Weekend (also call for help)
In-Reply-To: <314038f1-d325-1e0d-8399-aea6a4a47d95@gmail.com>
References: <314038f1-d325-1e0d-8399-aea6a4a47d95@gmail.com>
Message-ID:

I will try to carve out some time Saturday to review PRs. What time is it occurring?

On Mon, Feb 27, 2017 at 8:50 PM, Andreas Mueller wrote:

> Hey all.
>
> There's gonna be an introductory scikit-learn sprint at NYC on Saturday that a local Women's DS/ML group is organizing with me.
> I feel like we could do a bit more to improve (gender) diversity in the scipy/pydata space, and so I think this will be cool.
>
> If anyone wants to review code on Saturday that would be a great help for people getting started.
> Also, if anyone wants to help beforehand, making sure there is enough "easy" and "need contributor" issues tagged is important, as well as ensuring that all the tagged issues actually still need contributors.
>
> I'll try to do as much of these as I can but my time is limited these days :(
>
> Thanks y'all!
>
> Andy
>
> _______________________________________________
> scikit-learn mailing list
> scikit-learn at python.org
> https://mail.python.org/mailman/listinfo/scikit-learn

-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From amandalmia18 at gmail.com Tue Feb 28 08:06:36 2017
From: amandalmia18 at gmail.com (Aman Dalmia)
Date: Tue, 28 Feb 2017 18:36:36 +0530
Subject: [scikit-learn] GSoC, 2017 - Parallel Decision Tree Building
Message-ID:

Hello everyone,

I am a pre-final year student studying Electronics & Communication Engineering at IIT Guwahati. I am a member of Prof. Amit Sethi's research group, where I work on cancer recurrence prediction using deep learning, and have also started working with Prof. Ashish Anand, using NLP for genome sequencing. I want to contribute to scikit-learn by working on the project 'Parallel Decision Tree Building' for GSoC, 2017. I have been contributing to scikit-learn for the past few weeks, working on issues across different modules. Although I am familiar with the tree building algorithms, I have not worked a lot on the tree module of scikit-learn, and hence I am trying to familiarize myself by working on these issues:

https://github.com/scikit-learn/scikit-learn/issues/4225
https://github.com/scikit-learn/scikit-learn/issues/6557

Please let me know what the next steps are that I should follow to build a good proposal.

Thank you,
Aman Dalmia,
Pre-final year student,
Electronics & Communication Engineering,
IIT Guwahati,
+91-8011492025

-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From Dale.T.Smith at macys.com Tue Feb 28 08:08:14 2017
From: Dale.T.Smith at macys.com (Dale T Smith)
Date: Tue, 28 Feb 2017 13:08:14 +0000
Subject: [scikit-learn] Clustering 4 dimensional data
In-Reply-To:
References:
Message-ID:

Use whitespace and carriage returns to reformat your data. It's not clear what you are doing. Also, put it into a Pandas dataframe and make a few plots. The Visualization page is very helpful, along with the Seaborn examples.

____________________________________________________________________
Dale T. Smith | Macy's Systems and Technology | IFS eCom CSE Data Science
5985 State Bridge Road, Johns Creek, GA 30097 | dale.t.smith at macys.com

From: scikit-learn [mailto:scikit-learn-bounces+dale.t.smith=macys.com at python.org] On Behalf Of Rohan Koodli
Sent: Monday, February 27, 2017 10:43 PM
To: scikit-learn at python.org
Subject: [scikit-learn] Clustering 4 dimensional data

I'm having trouble understanding how to cluster multidimensional data. Specifically, a 4 dimensional array.
test = [[[[3,10],[1,5],[3,18]],[[3,1],[0,0],[0,0]],[[3,3],[1,5],[0,0]]],[[[1,5],[2,7],[0,0]],[[1,7],[0,0],[0,0]],[[0,0],[0,0],[0,0]]]]

from sklearn import mixture
gmm = mixture.GMM()
gmm.fit(test)

The code returns the following error:

"Found array with dim 4. GMM expected <= 2."

Do I need to change the way my data is formatted? Is there a way of doing clustering on 4 dimensional data?

-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From ruchika.work at gmail.com Tue Feb 28 12:37:59 2017
From: ruchika.work at gmail.com (Ruchika Nayyar)
Date: Tue, 28 Feb 2017 10:37:59 -0700
Subject: [scikit-learn] Scipy 2017
In-Reply-To:
References: <65ef1d1c-28a9-0772-6da1-3b54feb7cfd1@gmail.com>
Message-ID:

Hello

Will there be a video link?

Thanks,
Ruchika
----------------------------------------
Dr Ruchika Nayyar,
Post Doctoral Fellow for ATLAS Collaboration
University of Arizona
Arizona, USA.
--------------------------------------------

On Mon, Feb 27, 2017 at 2:20 PM, Alexandre Gramfort <alexandre.gramfort at telecom-paristech.fr> wrote:

> Hi Andy,
>
> I'll be happy to share the stage with you for a tutorial.
>
> Alex
>
> On Tue, Feb 21, 2017 at 3:52 PM, Andreas Mueller wrote:
> > Hey folks.
> > Who's coming to scipy this year?
> > Any volunteers for tutorials? I'm happy to be part of it but doing 7h by myself is a bit much ;)
> >
> > Andy
> > _______________________________________________
> > scikit-learn mailing list
> > scikit-learn at python.org
> > https://mail.python.org/mailman/listinfo/scikit-learn
> _______________________________________________
> scikit-learn mailing list
> scikit-learn at python.org
> https://mail.python.org/mailman/listinfo/scikit-learn

-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From nfliu at uw.edu Tue Feb 28 12:43:27 2017
From: nfliu at uw.edu (Nelson Liu)
Date: Tue, 28 Feb 2017 09:43:27 -0800
Subject: [scikit-learn] Scipy 2017
In-Reply-To:
References: <65ef1d1c-28a9-0772-6da1-3b54feb7cfd1@gmail.com>
Message-ID:

The conference generally (at least for the last three years) uploads recordings of the tutorials afterwards; e.g., here is part one of the scikit-learn tutorial at Scipy 2016. I would assume that they are doing this again.

Nelson Liu

On Tue, Feb 28, 2017 at 9:37 AM, Ruchika Nayyar wrote:

> Hello
>
> Will there be a video link?
>
> Thanks,
> Ruchika
> ----------------------------------------
> Dr Ruchika Nayyar,
> Post Doctoral Fellow for ATLAS Collaboration
> University of Arizona
> Arizona, USA.
> --------------------------------------------
>
> On Mon, Feb 27, 2017 at 2:20 PM, Alexandre Gramfort <alexandre.gramfort at telecom-paristech.fr> wrote:
>
>> Hi Andy,
>>
>> I'll be happy to share the stage with you for a tutorial.
>>
>> Alex
>>
>> On Tue, Feb 21, 2017 at 3:52 PM, Andreas Mueller wrote:
>> > Hey folks.
>> > Who's coming to scipy this year?
>> > Any volunteers for tutorials?
>> > I'm happy to be part of it but doing 7h by myself is a bit much ;)
>> >
>> > Andy
>> > _______________________________________________
>> > scikit-learn mailing list
>> > scikit-learn at python.org
>> > https://mail.python.org/mailman/listinfo/scikit-learn
>> _______________________________________________
>> scikit-learn mailing list
>> scikit-learn at python.org
>> https://mail.python.org/mailman/listinfo/scikit-learn
>
> _______________________________________________
> scikit-learn mailing list
> scikit-learn at python.org
> https://mail.python.org/mailman/listinfo/scikit-learn

-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From jmschreiber91 at gmail.com Tue Feb 28 14:15:37 2017
From: jmschreiber91 at gmail.com (Jacob Schreiber)
Date: Tue, 28 Feb 2017 11:15:37 -0800
Subject: [scikit-learn] GSoC, 2017 - Parallel Decision Tree Building
In-Reply-To:
References:
Message-ID:

Hi Aman,

I responded to your other email, but I'm not sure if it actually went through. Thanks for your interest in the project, and your current PRs. If you're looking to apply, you should write a gist which follows the format that nelson-liu used here: https://github.com/scikit-learn/scikit-learn/wiki/GSoC-2016-Proposal:-Addition-of-various-enhancements-to-the-tree-module-by-completing-stalled-pull-requests.

The goal of this project is to parallelize the building of single decision trees, likely by parallelizing the task of finding the optimal split at each node. You should put as much detail as possible into this proposal. As Gael mentioned in the other thread, the limiting factor for GSoC this year is mentor time, and the most successful students will be those who can operate independently. A detailed proposal outlining exactly what needs to be done will go a long way in showing us that you understand the problem and the codebase well enough to set achievable goals for the summer. In addition, we want to ensure that you have the requisite background in Python, Cython, parallel processing, and tree building required for the project, so you should emphasize those skills and previous work you've done that uses them.

Let me know if you have any further questions, and I look forward to seeing your proposal!

Jacob

On Tue, Feb 28, 2017 at 5:06 AM, Aman Dalmia wrote:

> Hello everyone,
>
> I am a pre-final year student studying Electronics & Communication Engineering at IIT Guwahati. I am a member of Prof. Amit Sethi's research group, where I work on cancer recurrence prediction using deep learning, and have also started working with Prof. Ashish Anand, using NLP for genome sequencing. I want to contribute to scikit-learn by working on the project 'Parallel Decision Tree Building' for GSoC, 2017. I have been contributing to scikit-learn for the past few weeks, working on issues across different modules. Although I am familiar with the tree building algorithms, I have not worked a lot on the tree module of scikit-learn, and hence I am trying to familiarize myself by working on these issues:
>
> https://github.com/scikit-learn/scikit-learn/issues/4225
> https://github.com/scikit-learn/scikit-learn/issues/6557
>
> Please let me know what the next steps are that I should follow to build a good proposal.
>
> Thank you,
> Aman Dalmia,
> Pre-final year student,
> Electronics & Communication Engineering,
> IIT Guwahati,
> +91-8011492025
>
> _______________________________________________
> scikit-learn mailing list
> scikit-learn at python.org
> https://mail.python.org/mailman/listinfo/scikit-learn

-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From t3kcit at gmail.com Tue Feb 28 19:28:46 2017
From: t3kcit at gmail.com (Andreas Mueller)
Date: Tue, 28 Feb 2017 19:28:46 -0500
Subject: [scikit-learn] Women in Machine Learning and Data Science Sprint next Weekend (also call for help)
In-Reply-To:
References: <314038f1-d325-1e0d-8399-aea6a4a47d95@gmail.com>
Message-ID:

Thanks!
It's gonna be 9:30 till 4, but I'd be surprised if there's a lot going on on the issue tracker before 11h with setup etc. (EST that is).

Andy

On 02/27/2017 11:58 PM, Jacob Schreiber wrote:

> I will try to carve out some time Saturday to review PRs. What time is it occurring?
>
> On Mon, Feb 27, 2017 at 8:50 PM, Andreas Mueller wrote:
>
> Hey all.
>
> There's gonna be an introductory scikit-learn sprint at NYC on Saturday that a local Women's DS/ML group is organizing with me.
> I feel like we could do a bit more to improve (gender) diversity in the scipy/pydata space, and so I think this will be cool.
>
> If anyone wants to review code on Saturday that would be a great help for people getting started.
> Also, if anyone wants to help beforehand, making sure there is enough "easy" and "need contributor" issues tagged is important, as well as ensuring that all the tagged issues actually still need contributors.
>
> I'll try to do as much of these as I can but my time is limited these days :(
>
> Thanks y'all!
>
> Andy
> _______________________________________________
> scikit-learn mailing list
> scikit-learn at python.org
> https://mail.python.org/mailman/listinfo/scikit-learn
>
> _______________________________________________
> scikit-learn mailing list
> scikit-learn at python.org
> https://mail.python.org/mailman/listinfo/scikit-learn

-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From jmschreiber91 at gmail.com Tue Feb 28 23:07:42 2017
From: jmschreiber91 at gmail.com (Jacob Schreiber)
Date: Tue, 28 Feb 2017 20:07:42 -0800
Subject: [scikit-learn] Women in Machine Learning and Data Science Sprint next Weekend (also call for help)
In-Reply-To:
References: <314038f1-d325-1e0d-8399-aea6a4a47d95@gmail.com>
Message-ID:

Okay. I will be there. Is there going to be a chat channel of some sort to organize things?

On Tue, Feb 28, 2017 at 4:28 PM, Andreas Mueller wrote:

> Thanks!
> It's gonna be 9:30 till 4, but I'd be surprised if there's a lot going on on the issue tracker before 11h with setup etc. (EST that is).
>
> Andy
>
> On 02/27/2017 11:58 PM, Jacob Schreiber wrote:
>
> I will try to carve out some time Saturday to review PRs. What time is it occurring?
>
> On Mon, Feb 27, 2017 at 8:50 PM, Andreas Mueller wrote:
>
>> Hey all.
>>
>> There's gonna be an introductory scikit-learn sprint at NYC on Saturday that a local Women's DS/ML group is organizing with me.
>> I feel like we could do a bit more to improve (gender) diversity in the scipy/pydata space, and so I think this will be cool.
>>
>> If anyone wants to review code on Saturday that would be a great help for people getting started.
>> Also, if anyone wants to help beforehand, making sure there is enough "easy" and "need contributor" issues tagged is important, as well as ensuring that all the tagged issues actually still need contributors.
>>
>> I'll try to do as much of these as I can but my time is limited these days :(
>>
>> Thanks y'all!
>>
>> Andy
>> _______________________________________________
>> scikit-learn mailing list
>> scikit-learn at python.org
>> https://mail.python.org/mailman/listinfo/scikit-learn
>
> _______________________________________________
> scikit-learn mailing list
> scikit-learn at python.org
> https://mail.python.org/mailman/listinfo/scikit-learn
>
> _______________________________________________
> scikit-learn mailing list
> scikit-learn at python.org
> https://mail.python.org/mailman/listinfo/scikit-learn

-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From amandalmia18 at gmail.com Tue Feb 28 23:46:08 2017
From: amandalmia18 at gmail.com (Aman Dalmia)
Date: Wed, 1 Mar 2017 10:16:08 +0530
Subject: [scikit-learn] GSoC, 2017 - Parallel Decision Tree Building
Message-ID:

Hello Sir,

Thank you for your response. You made it very clear to me what needs to be done. I'll take a careful look at the code of the tree module and try to start implementing some of the desired functionality. I'll get back to you if I get stuck, and will post the link to my proposal once I am done with its first draft.

However, I don't see scikit-learn mentioned on the ideas page for the Python Software Foundation - https://summerofcode.withgoogle.com/organizations/5164886469378048/. Is there an error?

Thanks,
Aman Dalmia

-------------- next part --------------
An HTML attachment was scrubbed...
URL:
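The parallel decision tree project that Jacob describes earlier in this digest -- parallelizing the search for the optimal split at each node -- can be illustrated with a rough sketch. This is not scikit-learn's actual tree code (the real builders are written in Cython and do not dispatch joblib workers per split); the function names and the Gini-based split search below are invented for illustration only, a minimal sketch of handing one candidate feature to each worker and keeping the best split found.

import numpy as np
from joblib import Parallel, delayed  # older scikit-learn bundled this as sklearn.externals.joblib


def gini(labels, classes):
    """Gini impurity of one side of a candidate split."""
    p = np.array([np.mean(labels == c) for c in classes])
    return 1.0 - np.sum(p ** 2)


def best_split_for_feature(X, y, feature):
    """Best (impurity, threshold) obtainable by splitting on a single feature."""
    classes = np.unique(y)
    order = np.argsort(X[:, feature])
    values, labels = X[order, feature], y[order]
    n = len(labels)
    best_impurity, best_threshold = np.inf, None
    for i in range(1, n):
        if values[i] == values[i - 1]:
            continue  # no threshold separates identical values
        left, right = labels[:i], labels[i:]
        impurity = (len(left) * gini(left, classes) + len(right) * gini(right, classes)) / n
        if impurity < best_impurity:
            best_impurity = impurity
            best_threshold = (values[i] + values[i - 1]) / 2.0
    return feature, best_impurity, best_threshold


def parallel_best_split(X, y, n_jobs=2):
    """Evaluate every feature in parallel and return the best split found."""
    results = Parallel(n_jobs=n_jobs)(
        delayed(best_split_for_feature)(X, y, f) for f in range(X.shape[1])
    )
    return min(results, key=lambda r: r[1])  # (feature, impurity, threshold)


if __name__ == "__main__":
    rng = np.random.RandomState(0)
    X = rng.rand(200, 5)
    y = (X[:, 2] > 0.5).astype(int)
    print(parallel_best_split(X, y))  # expected: feature 2, threshold near 0.5

At this per-split granularity the overhead of dispatching work to separate processes typically outweighs the gain, which is part of what makes doing this efficiently inside the Cython tree builders a genuinely hard project.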