From pedropazzini at gmail.com  Mon Jan 2 15:44:25 2017
From: pedropazzini at gmail.com (Pedro Pazzini)
Date: Mon, 2 Jan 2017 18:44:25 -0200
Subject: [scikit-learn] KNeighborsClassifier and metric='precomputed'
Message-ID:

Hi all!

I'm trying to use a KNeighborsClassifier with precomputed metric. In its
predict method
(http://scikit-learn.org/stable/modules/generated/sklearn.neighbors.KNeighborsClassifier.html#sklearn.neighbors.KNeighborsClassifier.predict)
it says the input should be:

"(n_query, n_indexed) if metric == 'precomputed'"

What is n_indexed?

Shouldn't the shape of the input in the predict method be
(n_query, n_query)?

How can I use the predict method after fitting the classifier with a
distance matrix?

Regards,
Pedro Pazzini
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From joel.nothman at gmail.com  Mon Jan 2 16:10:20 2017
From: joel.nothman at gmail.com (Joel Nothman)
Date: Tue, 3 Jan 2017 08:10:20 +1100
Subject: [scikit-learn] KNeighborsClassifier and metric='precomputed'
In-Reply-To:
References:
Message-ID:

n_indexed means the number of samples in the X passed to fit. It needs to
be able to compare each prediction sample with each training sample.

On 3 January 2017 at 07:44, Pedro Pazzini wrote:

> Hi all!
>
> I'm trying to use a KNeighborsClassifier with precomputed metric. In its
> predict method
> (http://scikit-learn.org/stable/modules/generated/sklearn.neighbors.KNeighborsClassifier.html#sklearn.neighbors.KNeighborsClassifier.predict)
> it says the input should be:
>
> "(n_query, n_indexed) if metric == 'precomputed'"
>
> What is n_indexed?
>
> Shouldn't the shape of the input in the predict method be
> (n_query, n_query)?
>
> How can I use the predict method after fitting the classifier with a
> distance matrix?
>
> Regards,
> Pedro Pazzini
>
> _______________________________________________
> scikit-learn mailing list
> scikit-learn at python.org
> https://mail.python.org/mailman/listinfo/scikit-learn
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From pedropazzini at gmail.com  Tue Jan 3 10:33:22 2017
From: pedropazzini at gmail.com (Pedro Pazzini)
Date: Tue, 3 Jan 2017 13:33:22 -0200
Subject: [scikit-learn] KNeighborsClassifier and metric='precomputed'
In-Reply-To:
References:
Message-ID:

Joel,

Your explanation helped me understand it. Thanks!

2017-01-02 19:10 GMT-02:00 Joel Nothman :

> n_indexed means the number of samples in the X passed to fit. It needs to
> be able to compare each prediction sample with each training sample.
>
> On 3 January 2017 at 07:44, Pedro Pazzini wrote:
>
>> Hi all!
>>
>> I'm trying to use a KNeighborsClassifier with precomputed metric. In its
>> predict method
>> (http://scikit-learn.org/stable/modules/generated/sklearn.neighbors.KNeighborsClassifier.html#sklearn.neighbors.KNeighborsClassifier.predict)
>> it says the input should be:
>>
>> "(n_query, n_indexed) if metric == 'precomputed'"
>>
>> What is n_indexed?
>>
>> Shouldn't the shape of the input in the predict method be
>> (n_query, n_query)?
>>
>> How can I use the predict method after fitting the classifier with a
>> distance matrix?
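A minimal sketch of the shapes Joel describes; the data and variable names
below are invented for illustration and are not part of the original
exchange:

import numpy as np
from sklearn.metrics import pairwise_distances
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.RandomState(0)
X_train = rng.rand(30, 4)             # 30 "indexed" (training) samples
y_train = rng.randint(0, 2, size=30)
X_query = rng.rand(5, 4)              # 5 query samples

knn = KNeighborsClassifier(n_neighbors=3, metric='precomputed')

# fit takes the square (n_indexed, n_indexed) matrix of training distances
knn.fit(pairwise_distances(X_train), y_train)

# predict takes the rectangular (n_query, n_indexed) matrix holding the
# distance from each query sample to each *training* sample
print(knn.predict(pairwise_distances(X_query, X_train)))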
>> >> Regards, >> Pedro Pazzini >> >> _______________________________________________ >> scikit-learn mailing list >> scikit-learn at python.org >> https://mail.python.org/mailman/listinfo/scikit-learn >> >> > > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From t3kcit at gmail.com Tue Jan 3 12:19:33 2017 From: t3kcit at gmail.com (Andy) Date: Tue, 3 Jan 2017 12:19:33 -0500 Subject: [scikit-learn] KNeighborsClassifier and metric='precomputed' In-Reply-To: References: Message-ID: <1e4a624b-4f02-b621-f7c8-1cee4c2c6786@gmail.com> Should probably be called n_samples_train? On 01/02/2017 04:10 PM, Joel Nothman wrote: > n_indexed means the number of samples in the X passed to fit. It needs > to be able to compare each prediction sample with each training sample. > > On 3 January 2017 at 07:44, Pedro Pazzini > wrote: > > Hi all! > > I'm trying to use a KNeighborsClassifier with precomputed metric. > In it's predict method > (http://scikit-learn.org/stable/modules/generated/ > sklearn.neighbors.KNeighborsClassifier.html#sklearn.neighbors.KNeighborsClassifier.predict) > it says the input should be: > > "(n_query, n_indexed) if metric == ?precomputed?" > > What is n_indexed? > > Shouldn't the shape of the input in the predict method be > (n_query,n_query)? > > How can I use the predict method after fitting the classifier with > a distance matrix? > > Regards, > Pedro Pazzini > > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn > > > > > > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn -------------- next part -------------- An HTML attachment was scrubbed... URL: From vaggi.federico at gmail.com Tue Jan 3 12:31:44 2017 From: vaggi.federico at gmail.com (federico vaggi) Date: Tue, 03 Jan 2017 17:31:44 +0000 Subject: [scikit-learn] KNeighborsClassifier and metric='precomputed' In-Reply-To: <1e4a624b-4f02-b621-f7c8-1cee4c2c6786@gmail.com> References: <1e4a624b-4f02-b621-f7c8-1cee4c2c6786@gmail.com> Message-ID: That would be most helpful. Maybe also explain the logic? On Tue, 3 Jan 2017 at 18:19 Andy wrote: > Should probably be called n_samples_train? > > > On 01/02/2017 04:10 PM, Joel Nothman wrote: > > n_indexed means the number of samples in the X passed to fit. It needs to > be able to compare each prediction sample with each training sample. > > On 3 January 2017 at 07:44, Pedro Pazzini wrote: > > Hi all! > > I'm trying to use a KNeighborsClassifier with precomputed metric. In it's > predict method (http://scikit-learn.org/stable/modules/generated/sklearn > .neighbors.KNeighborsClassifier.html#sklearn.neighbors. > KNeighborsClassifier.predict) it says the input should be: > > "(n_query, n_indexed) if metric == ?precomputed?" > > What is n_indexed? > > Shouldn't the shape of the input in the predict method be > (n_query,n_query)? > > How can I use the predict method after fitting the classifier with a > distance matrix? 
> > Regards, > Pedro Pazzini > > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn > > > > > _______________________________________________ > scikit-learn mailing listscikit-learn at python.orghttps://mail.python.org/mailman/listinfo/scikit-learn > > > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn > -------------- next part -------------- An HTML attachment was scrubbed... URL: From pedropazzini at gmail.com Tue Jan 3 13:09:57 2017 From: pedropazzini at gmail.com (Pedro Pazzini) Date: Tue, 3 Jan 2017 16:09:57 -0200 Subject: [scikit-learn] KNeighborsClassifier and metric='precomputed' In-Reply-To: References: <1e4a624b-4f02-b621-f7c8-1cee4c2c6786@gmail.com> Message-ID: If I understood, each row of the input matrix in the predict method contains the distances from a query point to each point in the training set. I think the reference should make this more clear. 2017-01-03 15:31 GMT-02:00 federico vaggi : > That would be most helpful. Maybe also explain the logic? > > On Tue, 3 Jan 2017 at 18:19 Andy wrote: >> >> Should probably be called n_samples_train? >> >> >> On 01/02/2017 04:10 PM, Joel Nothman wrote: >> >> n_indexed means the number of samples in the X passed to fit. It needs to >> be able to compare each prediction sample with each training sample. >> >> On 3 January 2017 at 07:44, Pedro Pazzini wrote: >>> >>> Hi all! >>> >>> I'm trying to use a KNeighborsClassifier with precomputed metric. In it's >>> predict method >>> (http://scikit-learn.org/stable/modules/generated/sklearn.neighbors.KNeighborsClassifier.html#sklearn.neighbors.KNeighborsClassifier.predict) >>> it says the input should be: >>> >>> "(n_query, n_indexed) if metric == ?precomputed?" >>> >>> What is n_indexed? >>> >>> Shouldn't the shape of the input in the predict method be >>> (n_query,n_query)? >>> >>> How can I use the predict method after fitting the classifier with a >>> distance matrix? >>> >>> Regards, >>> Pedro Pazzini >>> >>> _______________________________________________ >>> scikit-learn mailing list >>> scikit-learn at python.org >>> https://mail.python.org/mailman/listinfo/scikit-learn >>> >> >> >> >> _______________________________________________ >> scikit-learn mailing list >> scikit-learn at python.org >> https://mail.python.org/mailman/listinfo/scikit-learn >> >> >> _______________________________________________ >> scikit-learn mailing list >> scikit-learn at python.org >> https://mail.python.org/mailman/listinfo/scikit-learn > > > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn > From jonathan.taylor at stanford.edu Tue Jan 3 20:07:11 2017 From: jonathan.taylor at stanford.edu (Jonathan Taylor) Date: Tue, 3 Jan 2017 17:07:11 -0800 Subject: [scikit-learn] modifying CV score Message-ID: I'm looking for a simple way to get a small pipeline for choosing a parameter using a modification of CV for regression type problems. The modification is pretty simple, so, for squared-error or logistic deviance, it is a simple modification of the score of `Y` (binary labels) and `X.dot(beta)` (linear predictor). I've been trying to understand how to use sklearn for this as there is no need for me to rewrite the basic CV functions. 
I'd like to be able to use my own custom estimator (so I guess I just need
a subclass of BaseEstimator with a `fit` method with (X,y) signature?), as
well as my own modification of the score. I'd be happy to understand the
code for an estimator whose fit returns `np.zeros(X.shape[1])` and a given
scoring function like

def score(estimator, X_test, y_test):
    # estimator.parameters_ is just a zero vector for my estimator -- I
    # guess this is the way I should extract the linear predictor
    beta = estimator.parameters_
    linpred = X_test.dot(beta)
    # or maybe?  linpred = estimator.transform(X_test)
    return np.linalg.norm(y_test - linpred)

This would not be an interesting model, but it would help me understand
how things are evaluated in the CV loop.

I have read how to create a custom scorer in the docs but it does not seem
to describe what `estimator` will be inside the CV loop. I presume a
custom scorer will get called with values X_test and y_test and I suppose
estimator will be a model fit to X_train and y_train?

-- 
Jonathan Taylor
Dept. of Statistics
Sequoia Hall, 137
390 Serra Mall
Stanford, CA 94305
Tel:   650.723.9230
Fax:   650.725.8977
Web: http://www-stat.stanford.edu/~jtaylo
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From olivier.grisel at ensta.org  Wed Jan 4 07:44:22 2017
From: olivier.grisel at ensta.org (Olivier Grisel)
Date: Wed, 4 Jan 2017 13:44:22 +0100
Subject: [scikit-learn] modifying CV score
In-Reply-To:
References:
Message-ID:

You can indeed derive from BaseEstimator and implement fit, predict and
optionally score.

Here is the documentation for the expected estimator API:

http://scikit-learn.org/stable/developers/contributing.html#apis-of-scikit-learn-objects

As this is a linear regression model, you may also want to have a look at
the LinearModel and RegressorMixin base classes for inspiration:

https://github.com/scikit-learn/scikit-learn/blob/master/sklearn/linear_model/base.py#L401

Note that the score function should always be "higher is better". The
explained variance ratio and negative mean squared error are valid scoring
functions for model selection in scikit-learn while raw MSE is not.

-- 
Olivier

From gael.varoquaux at normalesup.org  Wed Jan 4 07:50:42 2017
From: gael.varoquaux at normalesup.org (Gael Varoquaux)
Date: Wed, 4 Jan 2017 13:50:42 +0100
Subject: [scikit-learn] modifying CV score
In-Reply-To:
References:
Message-ID: <20170104125042.GG3264230@phare.normalesup.org>

> I've been trying to understand how to use sklearn for this as there is
> no need for me to rewrite the basic CV functions. I'd like to be able
> to use my own custom estimator (so I guess I just need a subclass of
> BaseEstimator with a `fit` method with (X,y) signature?), as well as my
> own modification of the score.

Be aware that scikit-learn assumes a few things about estimators. One of
them is that the __init__ should not do anything other than store the
parameters that it is given.

> I'd be happy to understand the code for an estimator whose fit returns
> `np.zeros(X.shape[1])`

Another assumption is that "fit" always returns self. The API that defines
a scikit-learn object is detailed here:
http://scikit-learn.org/stable/developers/contributing.html#apis-of-scikit-learn-objects

From jonathan.taylor at stanford.edu  Wed Jan 4 16:47:29 2017
From: jonathan.taylor at stanford.edu (Jonathan Taylor)
Date: Wed, 4 Jan 2017 13:47:29 -0800
Subject: [scikit-learn] modifying CV score
Message-ID:

(I think this is the right reply-to from a digest... If not, apologies)

Thanks for the pointers.
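A minimal sketch tying the advice in this thread together: a toy estimator
whose fit just stores a zero coefficient vector and returns self, plus a
scorer with the (estimator, X_test, y_test) signature that returns a
"higher is better" value. Everything here (names, data, the unused alpha
parameter) is invented for illustration.

import numpy as np
from sklearn.base import BaseEstimator, RegressorMixin
from sklearn.model_selection import GridSearchCV

class ZeroRegressor(BaseEstimator, RegressorMixin):
    def __init__(self, alpha=1.0):
        self.alpha = alpha            # __init__ only stores its parameters

    def fit(self, X, y):
        self.coef_ = np.zeros(X.shape[1])
        return self                   # fit always returns self

    def predict(self, X):
        return X.dot(self.coef_)

def neg_error(estimator, X_test, y_test):
    # called with the estimator already fit on the training fold;
    # the minus sign makes "higher is better"
    return -np.linalg.norm(y_test - estimator.predict(X_test))

rng = np.random.RandomState(0)
X, y = rng.rand(40, 3), rng.rand(40)
search = GridSearchCV(ZeroRegressor(), {'alpha': [0.1, 1.0]},
                      scoring=neg_error, cv=5)
search.fit(X, y)
print(search.best_score_)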
From what I read on the API, I gather that for an estimator with a score method, inside GridSearchCV there will be pseudo-code like ... estimator.fit(X_train, y_train) scorer = estimator.score return scorer(X_test, y_test) For custom scores that are not methods of an estimator, I guess the `make_scorer` function returns a callable with the same signature as a score method of an estimator? -- Jonathan Taylor Dept. of Statistics Sequoia Hall, 137 390 Serra Mall Stanford, CA 94305 Tel: 650.723.9230 Fax: 650.725.8977 Web: http://www-stat.stanford.edu/~jtaylo -------------- next part -------------- An HTML attachment was scrubbed... URL: From joel.nothman at gmail.com Wed Jan 4 22:06:43 2017 From: joel.nothman at gmail.com (Joel Nothman) Date: Thu, 5 Jan 2017 14:06:43 +1100 Subject: [scikit-learn] modifying CV score In-Reply-To: References: Message-ID: Well, it returns the equivalent of lambda estimator, X, y: estimator.score(X, y) On 5 January 2017 at 08:47, Jonathan Taylor wrote: > (Think this is right reply to from a digest... If not, apologies) > > Thanks for the pointers. From what I read on the API, I gather that for an > estimator with a score method, inside GridSearchCV there will be > pseudo-code like > > ... > estimator.fit(X_train, y_train) > scorer = estimator.score > return scorer(X_test, y_test) > > > For custom scores that are not methods of an estimator, I guess the > `make_scorer` function returns a callable with the same signature as a > score method of an estimator? > > -- > Jonathan Taylor > Dept. of Statistics > Sequoia Hall, 137 > 390 Serra Mall > Stanford, CA 94305 > Tel: 650.723.9230 <(650)%20723-9230> > Fax: 650.725.8977 <(650)%20725-8977> > Web: http://www-stat.stanford.edu/~jtaylo > > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From tevang3 at gmail.com Sat Jan 7 11:15:54 2017 From: tevang3 at gmail.com (Thomas Evangelidis) Date: Sat, 7 Jan 2017 17:15:54 +0100 Subject: [scikit-learn] meta-estimator for multiple MLPRegressor Message-ID: Greetings, I have trained many MLPRegressors using different random_state value and estimated the R^2 using cross-validation. Now I want to combine the top 10% of them in how to get more accurate predictions. Is there a meta-estimator that can get as input a few precomputed MLPRegressors and give consensus predictions? Can the BaggingRegressor do this job using MLPRegressors as input? Thanks in advance for any hint. Thomas -- ====================================================================== Thomas Evangelidis Research Specialist CEITEC - Central European Institute of Technology Masaryk University Kamenice 5/A35/1S081, 62500 Brno, Czech Republic email: tevang at pharm.uoa.gr tevang3 at gmail.com website: https://sites.google.com/site/thomasevangelidishomepage/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From se.raschka at gmail.com Sat Jan 7 13:27:21 2017 From: se.raschka at gmail.com (Sebastian Raschka) Date: Sat, 7 Jan 2017 13:27:21 -0500 Subject: [scikit-learn] meta-estimator for multiple MLPRegressor In-Reply-To: References: Message-ID: <27CD690B-CA77-4121-8C95-9F2E52B99B95@gmail.com> Hi, Thomas, the VotingClassifier can combine different models per majority voting amongst their predictions. Unfortunately, it refits the classifiers though (after cloning them). 
I think we implemented it this way to make it compatible to GridSearch and so forth. However, I have a version of the estimator that you can initialize with ?refit=False? to avoid refitting if it helps. http://rasbt.github.io/mlxtend/user_guide/classifier/EnsembleVoteClassifier/#example-5-using-pre-fitted-classifiers Best, Sebastian > On Jan 7, 2017, at 11:15 AM, Thomas Evangelidis wrote: > > Greetings, > > I have trained many MLPRegressors using different random_state value and estimated the R^2 using cross-validation. Now I want to combine the top 10% of them in how to get more accurate predictions. Is there a meta-estimator that can get as input a few precomputed MLPRegressors and give consensus predictions? Can the BaggingRegressor do this job using MLPRegressors as input? > > Thanks in advance for any hint. > Thomas > > > -- > ====================================================================== > Thomas Evangelidis > Research Specialist > CEITEC - Central European Institute of Technology > Masaryk University > Kamenice 5/A35/1S081, > 62500 Brno, Czech Republic > > email: tevang at pharm.uoa.gr > tevang3 at gmail.com > > website: https://sites.google.com/site/thomasevangelidishomepage/ > > > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn From tevang3 at gmail.com Sat Jan 7 13:49:03 2017 From: tevang3 at gmail.com (Thomas Evangelidis) Date: Sat, 7 Jan 2017 19:49:03 +0100 Subject: [scikit-learn] meta-estimator for multiple MLPRegressor In-Reply-To: <27CD690B-CA77-4121-8C95-9F2E52B99B95@gmail.com> References: <27CD690B-CA77-4121-8C95-9F2E52B99B95@gmail.com> Message-ID: Hi Sebastian, Thanks, I will try it in another classification problem I have. However, this time I am using regressors not classifiers. On Jan 7, 2017 19:28, "Sebastian Raschka" wrote: > Hi, Thomas, > > the VotingClassifier can combine different models per majority voting > amongst their predictions. Unfortunately, it refits the classifiers though > (after cloning them). I think we implemented it this way to make it > compatible to GridSearch and so forth. However, I have a version of the > estimator that you can initialize with ?refit=False? to avoid refitting if > it helps. http://rasbt.github.io/mlxtend/user_guide/classifier/ > EnsembleVoteClassifier/#example-5-using-pre-fitted-classifiers > > Best, > Sebastian > > > > > On Jan 7, 2017, at 11:15 AM, Thomas Evangelidis > wrote: > > > > Greetings, > > > > I have trained many MLPRegressors using different random_state value and > estimated the R^2 using cross-validation. Now I want to combine the top 10% > of them in how to get more accurate predictions. Is there a meta-estimator > that can get as input a few precomputed MLPRegressors and give consensus > predictions? Can the BaggingRegressor do this job using MLPRegressors as > input? > > > > Thanks in advance for any hint. 
> > Thomas > > > > > > -- > > ====================================================================== > > Thomas Evangelidis > > Research Specialist > > CEITEC - Central European Institute of Technology > > Masaryk University > > Kamenice 5/A35/1S081, > > 62500 Brno, Czech Republic > > > > email: tevang at pharm.uoa.gr > > tevang3 at gmail.com > > > > website: https://sites.google.com/site/thomasevangelidishomepage/ > > > > > > _______________________________________________ > > scikit-learn mailing list > > scikit-learn at python.org > > https://mail.python.org/mailman/listinfo/scikit-learn > > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn > -------------- next part -------------- An HTML attachment was scrubbed... URL: From se.raschka at gmail.com Sat Jan 7 15:20:55 2017 From: se.raschka at gmail.com (Sebastian Raschka) Date: Sat, 7 Jan 2017 15:20:55 -0500 Subject: [scikit-learn] meta-estimator for multiple MLPRegressor In-Reply-To: References: <27CD690B-CA77-4121-8C95-9F2E52B99B95@gmail.com> Message-ID: <450C2C8D-86FC-4A87-B307-C5E45FE97C4B@gmail.com> Hi, Thomas, sorry, I overread the regression part ? This would be a bit trickier, I am not sure what a good strategy for averaging regression outputs would be. However, if you just want to compute the average, you could do sth like np.mean(np.asarray([r.predict(X) for r in list_or_your_mlps])) However, it may be better to use stacking, and use the output of r.predict(X) as meta features to train a model based on these? Best, Sebastian > On Jan 7, 2017, at 1:49 PM, Thomas Evangelidis wrote: > > Hi Sebastian, > > Thanks, I will try it in another classification problem I have. However, this time I am using regressors not classifiers. > > On Jan 7, 2017 19:28, "Sebastian Raschka" wrote: > Hi, Thomas, > > the VotingClassifier can combine different models per majority voting amongst their predictions. Unfortunately, it refits the classifiers though (after cloning them). I think we implemented it this way to make it compatible to GridSearch and so forth. However, I have a version of the estimator that you can initialize with ?refit=False? to avoid refitting if it helps. http://rasbt.github.io/mlxtend/user_guide/classifier/EnsembleVoteClassifier/#example-5-using-pre-fitted-classifiers > > Best, > Sebastian > > > > > On Jan 7, 2017, at 11:15 AM, Thomas Evangelidis wrote: > > > > Greetings, > > > > I have trained many MLPRegressors using different random_state value and estimated the R^2 using cross-validation. Now I want to combine the top 10% of them in how to get more accurate predictions. Is there a meta-estimator that can get as input a few precomputed MLPRegressors and give consensus predictions? Can the BaggingRegressor do this job using MLPRegressors as input? > > > > Thanks in advance for any hint. 
> > Thomas > > > > > > -- > > ====================================================================== > > Thomas Evangelidis > > Research Specialist > > CEITEC - Central European Institute of Technology > > Masaryk University > > Kamenice 5/A35/1S081, > > 62500 Brno, Czech Republic > > > > email: tevang at pharm.uoa.gr > > tevang3 at gmail.com > > > > website: https://sites.google.com/site/thomasevangelidishomepage/ > > > > > > _______________________________________________ > > scikit-learn mailing list > > scikit-learn at python.org > > https://mail.python.org/mailman/listinfo/scikit-learn > > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn From ismaelfm_ at ciencias.unam.mx Sat Jan 7 15:52:10 2017 From: ismaelfm_ at ciencias.unam.mx (=?utf-8?Q?Jos=C3=A9_Ismael_Fern=C3=A1ndez_Mart=C3=ADnez?=) Date: Sat, 7 Jan 2017 14:52:10 -0600 Subject: [scikit-learn] Roc curve from multilabel classification has slope Message-ID: Hi, I have a multilabel classifier written in Keras from which I want to compute AUC and plot a ROC curve for every element classified from my test set. Everything seems fine, except that some elements have a roc curve that have a slope as follows: I don't know how to interpret the slope in such cases. Basically my workflow goes as follows, I have a pre-trained model, instance of Keras, and I have the features X and the binarized labels y, every element in y is an array of length 1000, as it is a multilabel classification problem each element in y might contain many 1s, indicating that the element belongs to multiples classes, so I used the built-in loss of binary_crossentropy and my outputs of the model prediction are score probailities. Then I plot the roc curve as follows. The predict method returns probabilities, as I'm using the functional api of keras. Does anyone knows why my roc curves looks like this? Ismael Sent from my iPhone -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image1.PNG Type: image/png Size: 132225 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image3.PNG Type: image/png Size: 42172 bytes Desc: not available URL: From tevang3 at gmail.com Sat Jan 7 16:36:37 2017 From: tevang3 at gmail.com (Thomas Evangelidis) Date: Sat, 7 Jan 2017 22:36:37 +0100 Subject: [scikit-learn] meta-estimator for multiple MLPRegressor In-Reply-To: <450C2C8D-86FC-4A87-B307-C5E45FE97C4B@gmail.com> References: <27CD690B-CA77-4121-8C95-9F2E52B99B95@gmail.com> <450C2C8D-86FC-4A87-B307-C5E45FE97C4B@gmail.com> Message-ID: On 7 January 2017 at 21:20, Sebastian Raschka wrote: > Hi, Thomas, > sorry, I overread the regression part ? > This would be a bit trickier, I am not sure what a good strategy for > averaging regression outputs would be. However, if you just want to compute > the average, you could do sth like > np.mean(np.asarray([r.predict(X) for r in list_or_your_mlps])) > > However, it may be better to use stacking, and use the output of > r.predict(X) as meta features to train a model based on these? 
> ?Like to train an SVR to combine the predictions of the top 10% MLPRegressors using the same data that were used for training of the MLPRegressors? Wouldn't that lead to overfitting? ? > > Best, > Sebastian > > > On Jan 7, 2017, at 1:49 PM, Thomas Evangelidis > wrote: > > > > Hi Sebastian, > > > > Thanks, I will try it in another classification problem I have. However, > this time I am using regressors not classifiers. > > > > On Jan 7, 2017 19:28, "Sebastian Raschka" wrote: > > Hi, Thomas, > > > > the VotingClassifier can combine different models per majority voting > amongst their predictions. Unfortunately, it refits the classifiers though > (after cloning them). I think we implemented it this way to make it > compatible to GridSearch and so forth. However, I have a version of the > estimator that you can initialize with ?refit=False? to avoid refitting if > it helps. http://rasbt.github.io/mlxtend/user_guide/classifier/ > EnsembleVoteClassifier/#example-5-using-pre-fitted-classifiers > > > > Best, > > Sebastian > > > > > > > > > On Jan 7, 2017, at 11:15 AM, Thomas Evangelidis > wrote: > > > > > > Greetings, > > > > > > I have trained many MLPRegressors using different random_state value > and estimated the R^2 using cross-validation. Now I want to combine the top > 10% of them in how to get more accurate predictions. Is there a > meta-estimator that can get as input a few precomputed MLPRegressors and > give consensus predictions? Can the BaggingRegressor do this job using > MLPRegressors as input? > > > > > > Thanks in advance for any hint. > > > Thomas > > > > > > > > > -- > > > ====================================================================== > > > Thomas Evangelidis > > > Research Specialist > > > CEITEC - Central European Institute of Technology > > > Masaryk University > > > Kamenice 5/A35/1S081, > > > 62500 Brno, Czech Republic > > > > > > email: tevang at pharm.uoa.gr > > > tevang3 at gmail.com > > > > > > website: https://sites.google.com/site/thomasevangelidishomepage/ > > > > > > > > > _______________________________________________ > > > scikit-learn mailing list > > > scikit-learn at python.org > > > https://mail.python.org/mailman/listinfo/scikit-learn > > > > _______________________________________________ > > scikit-learn mailing list > > scikit-learn at python.org > > https://mail.python.org/mailman/listinfo/scikit-learn > > _______________________________________________ > > scikit-learn mailing list > > scikit-learn at python.org > > https://mail.python.org/mailman/listinfo/scikit-learn > > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn > -- ====================================================================== Thomas Evangelidis Research Specialist CEITEC - Central European Institute of Technology Masaryk University Kamenice 5/A35/1S081, 62500 Brno, Czech Republic email: tevang at pharm.uoa.gr tevang3 at gmail.com website: https://sites.google.com/site/thomasevangelidishomepage/ -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From joel.nothman at gmail.com Sat Jan 7 17:03:05 2017 From: joel.nothman at gmail.com (Joel Nothman) Date: Sun, 8 Jan 2017 09:03:05 +1100 Subject: [scikit-learn] meta-estimator for multiple MLPRegressor In-Reply-To: References: <27CD690B-CA77-4121-8C95-9F2E52B99B95@gmail.com> <450C2C8D-86FC-4A87-B307-C5E45FE97C4B@gmail.com> Message-ID: On 8 January 2017 at 08:36, Thomas Evangelidis wrote: > > > On 7 January 2017 at 21:20, Sebastian Raschka > wrote: > >> Hi, Thomas, >> sorry, I overread the regression part ? >> This would be a bit trickier, I am not sure what a good strategy for >> averaging regression outputs would be. However, if you just want to compute >> the average, you could do sth like >> np.mean(np.asarray([r.predict(X) for r in list_or_your_mlps])) >> >> However, it may be better to use stacking, and use the output of >> r.predict(X) as meta features to train a model based on these? >> > > ?Like to train an SVR to combine the predictions of the top 10% > MLPRegressors using the same data that were used for training of the > MLPRegressors? Wouldn't that lead to overfitting? > You could certainly hold out a different data sample and that might indeed be valuable regularisation, but it's not obvious to me that this is substantially more prone to overfitting than just training a handful of MLPRegressors on the same data and having them vote by other means. There is no problem, in general, with overfitting, as long as your evaluation of an estimator's performance isn't biased towards the training set. We've not talked about overfitting. -------------- next part -------------- An HTML attachment was scrubbed... URL: From joel.nothman at gmail.com Sat Jan 7 17:03:22 2017 From: joel.nothman at gmail.com (Joel Nothman) Date: Sun, 8 Jan 2017 09:03:22 +1100 Subject: [scikit-learn] meta-estimator for multiple MLPRegressor In-Reply-To: References: <27CD690B-CA77-4121-8C95-9F2E52B99B95@gmail.com> <450C2C8D-86FC-4A87-B307-C5E45FE97C4B@gmail.com> Message-ID: * > There is no problem, in general, with overfitting, as long as your > evaluation of an estimator's performance isn't biased towards the training > set. We've not talked about evaluation. > -------------- next part -------------- An HTML attachment was scrubbed... URL: From joel.nothman at gmail.com Sat Jan 7 17:04:14 2017 From: joel.nothman at gmail.com (Joel Nothman) Date: Sun, 8 Jan 2017 09:04:14 +1100 Subject: [scikit-learn] Roc curve from multilabel classification has slope In-Reply-To: References: Message-ID: predict method should not return probabilities in scikit-learn classifiers. predict_proba should. On 8 January 2017 at 07:52, Jos? Ismael Fern?ndez Mart?nez < ismaelfm_ at ciencias.unam.mx> wrote: > Hi, I have a multilabel classifier written in Keras from which I want to > compute AUC and plot a ROC curve for every element classified from my test > set. > > [image: image1.PNG] > > Everything seems fine, except that some elements have a roc curve that > have a slope as follows: > > [image: enter image description here] > I don't know how to interpret the > slope in such cases. 
> > Basically my workflow goes as follows, I have a pre-trained model, > instance of Keras, and I have the features X and the binarized labels y, > every element in y is an array of length 1000, as it is a multilabel > classification problem each element in y might contain many 1s, > indicating that the element belongs to multiples classes, so I used the > built-in loss of binary_crossentropy and my outputs of the model > prediction are score probailities. Then I plot the roc curve as follows. > > [image: image3.PNG] > > The predict method returns probabilities, as I'm using the functional api > of keras. > > Does anyone knows why my roc curves looks like this? > > > Ismael > > > Sent from my iPhone > > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn > > -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image1.PNG Type: image/png Size: 132225 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image3.PNG Type: image/png Size: 42172 bytes Desc: not available URL: From tevang3 at gmail.com Sat Jan 7 17:26:41 2017 From: tevang3 at gmail.com (Thomas Evangelidis) Date: Sat, 7 Jan 2017 23:26:41 +0100 Subject: [scikit-learn] meta-estimator for multiple MLPRegressor In-Reply-To: References: <27CD690B-CA77-4121-8C95-9F2E52B99B95@gmail.com> <450C2C8D-86FC-4A87-B307-C5E45FE97C4B@gmail.com> Message-ID: Regarding the evaluation, I use the leave 20% out cross validation method. I cannot leave more out because my data sets are very small, between 30 and 40 observations, each one with 600 features. Is there a limit in the number of MLPRegressors I can combine with stacking considering my small data sets? On Jan 7, 2017 23:04, "Joel Nothman" wrote: > * > > >> There is no problem, in general, with overfitting, as long as your >> evaluation of an estimator's performance isn't biased towards the training >> set. We've not talked about evaluation. >> > > > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From jmschreiber91 at gmail.com Sat Jan 7 18:04:08 2017 From: jmschreiber91 at gmail.com (Jacob Schreiber) Date: Sat, 7 Jan 2017 15:04:08 -0800 Subject: [scikit-learn] meta-estimator for multiple MLPRegressor In-Reply-To: References: <27CD690B-CA77-4121-8C95-9F2E52B99B95@gmail.com> <450C2C8D-86FC-4A87-B307-C5E45FE97C4B@gmail.com> Message-ID: If you have such a small number of observations (with a much higher feature space) then why do you think you can accurately train not just a single MLP, but an ensemble of them without overfitting dramatically? On Sat, Jan 7, 2017 at 2:26 PM, Thomas Evangelidis wrote: > Regarding the evaluation, I use the leave 20% out cross validation method. > I cannot leave more out because my data sets are very small, between 30 and > 40 observations, each one with 600 features. Is there a limit in the number > of MLPRegressors I can combine with stacking considering my small data > sets? > > On Jan 7, 2017 23:04, "Joel Nothman" wrote: > >> * >> >> >>> There is no problem, in general, with overfitting, as long as your >>> evaluation of an estimator's performance isn't biased towards the training >>> set. 
We've not talked about evaluation.
>>>
>>
>> _______________________________________________
>> scikit-learn mailing list
>> scikit-learn at python.org
>> https://mail.python.org/mailman/listinfo/scikit-learn
>>
>>
> _______________________________________________
> scikit-learn mailing list
> scikit-learn at python.org
> https://mail.python.org/mailman/listinfo/scikit-learn
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From tevang3 at gmail.com  Sat Jan 7 19:01:55 2017
From: tevang3 at gmail.com (Thomas Evangelidis)
Date: Sun, 8 Jan 2017 01:01:55 +0100
Subject: [scikit-learn] meta-estimator for multiple MLPRegressor
In-Reply-To:
References: <27CD690B-CA77-4121-8C95-9F2E52B99B95@gmail.com>
	<450C2C8D-86FC-4A87-B307-C5E45FE97C4B@gmail.com>
Message-ID:

On 8 January 2017 at 00:04, Jacob Schreiber wrote:

> If you have such a small number of observations (with a much higher
> feature space) then why do you think you can accurately train not just a
> single MLP, but an ensemble of them without overfitting dramatically?
>
>
Because the observations in the data set don't differ much between them.
To be more specific, the data set consists of a congeneric series of
organic molecules and the observation is their binding strength to a
target protein. The idea was to train predictors that can predict the
binding strength of new molecules that belong to the same congeneric
series. Therefore special care is taken to apply the predictors to the
right domain of applicability. According to the literature, the same
strategy has been followed in the past several times. The novelty of my
approach stems from other factors that are irrelevant to this thread.

-- 
======================================================================
Thomas Evangelidis
Research Specialist
CEITEC - Central European Institute of Technology
Masaryk University
Kamenice 5/A35/1S081,
62500 Brno, Czech Republic

email: tevang at pharm.uoa.gr
       tevang3 at gmail.com

website: https://sites.google.com/site/thomasevangelidishomepage/
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From ismaelfm_ at ciencias.unam.mx  Sat Jan 7 19:32:49 2017
From: ismaelfm_ at ciencias.unam.mx (José Ismael Fernández Martínez)
Date: Sat, 7 Jan 2017 18:32:49 -0600
Subject: [scikit-learn] Roc curve from multilabel classification has slope
In-Reply-To:
References:
Message-ID: <6EEF6426-91D8-40D1-8FB8-E2F10D0327CA@ciencias.unam.mx>

But it is not a scikit-learn classifier, it is a Keras classifier which,
in the functional API, returns probabilities from predict.
What I don't understand is why my plot of the roc curve has a slope, since
I call roc_curve passing the actual label as y_true and the output of the
classifier (score probabilities) as y_score for every element tested.


Sent from my iPhone

> On Jan 7, 2017, at 4:04 PM, Joel Nothman wrote:
>
> predict method should not return probabilities in scikit-learn
> classifiers. predict_proba should.
>
>> On 8 January 2017 at 07:52, José Ismael Fernández Martínez wrote:
>> Hi, I have a multilabel classifier written in Keras from which I want
>> to compute AUC and plot a ROC curve for every element classified from
>> my test set.
>>
>>
>>
>> Everything seems fine, except that some elements have a roc curve that
>> have a slope as follows:
>> I don't know how to interpret the slope in such cases.
>> Basically my workflow goes as follows, I have a pre-trained model,
>> instance of Keras, and I have the features X and the binarized labels y,
>> every element in y is an array of length 1000, as it is a multilabel
>> classification problem each element in y might contain many 1s,
>> indicating that the element belongs to multiples classes, so I used the
>> built-in loss of binary_crossentropy and my outputs of the model
>> prediction are score probailities. Then I plot the roc curve as follows.
>>
>>
>>
>> The predict method returns probabilities, as I'm using the functional
>> api of keras.
>>
>> Does anyone knows why my roc curves looks like this?
>>
>> Ismael
>>
>>
>> Sent from my iPhone
>>
>> _______________________________________________
>> scikit-learn mailing list
>> scikit-learn at python.org
>> https://mail.python.org/mailman/listinfo/scikit-learn
>>
>
> _______________________________________________
> scikit-learn mailing list
> scikit-learn at python.org
> https://mail.python.org/mailman/listinfo/scikit-learn
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From jmschreiber91 at gmail.com  Sat Jan 7 19:40:41 2017
From: jmschreiber91 at gmail.com (Jacob Schreiber)
Date: Sat, 7 Jan 2017 16:40:41 -0800
Subject: [scikit-learn] meta-estimator for multiple MLPRegressor
In-Reply-To:
References: <27CD690B-CA77-4121-8C95-9F2E52B99B95@gmail.com>
	<450C2C8D-86FC-4A87-B307-C5E45FE97C4B@gmail.com>
Message-ID:

This is an aside to what your original question was, but as someone who
has dealt with similar data in bioinformatics (gene expression,
specifically) I think you should tread -very- carefully if you have such a
small sample set and more dimensions than samples. MLPs are already prone
to overfit and both of those factors would make me inherently suspicious
of the results. This sounds like an easy way to trick yourself into
thinking you are making good predictions. Perhaps consider LASSO?

Back to the original question, it is true that using an SVR in a stacking
technique would add more parameters to your model, but it is likely an
insignificant amount when compared to the MLPs themselves. Alternatively
you may consider using LASSO with all of the MLPs (not just the top 10%)
so you can learn which ones yield useful features for a meta-estimator
instead of just selecting the top 10%.

On Sat, Jan 7, 2017 at 4:01 PM, Thomas Evangelidis wrote:

>
>
> On 8 January 2017 at 00:04, Jacob Schreiber wrote:
>
>> If you have such a small number of observations (with a much higher
>> feature space) then why do you think you can accurately train not just a
>> single MLP, but an ensemble of them without overfitting dramatically?
>>
>>
>>
> Because the observations in the data set don't differ much between them.
> To be more specific, the data set consists of a congeneric series of
> organic molecules and the observation is their binding strength to a
> target protein. The idea was to train predictors that can predict the
> binding strength of new molecules that belong to the same congeneric
> series. Therefore special care is taken to apply the predictors to the
> right domain of applicability. According to the literature, the same
> strategy has been followed in the past several times. The novelty of my
> approach stems from other factors that are irrelevant to this thread.
> > > -- > > ====================================================================== > > Thomas Evangelidis > > Research Specialist > CEITEC - Central European Institute of Technology > Masaryk University > Kamenice 5/A35/1S081, > 62500 Brno, Czech Republic > > email: tevang at pharm.uoa.gr > > tevang3 at gmail.com > > > website: https://sites.google.com/site/thomasevangelidishomepage/ > > > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From jmschreiber91 at gmail.com Sat Jan 7 19:42:02 2017 From: jmschreiber91 at gmail.com (Jacob Schreiber) Date: Sat, 7 Jan 2017 16:42:02 -0800 Subject: [scikit-learn] Roc curve from multilabel classification has slope In-Reply-To: <6EEF6426-91D8-40D1-8FB8-E2F10D0327CA@ciencias.unam.mx> References: <6EEF6426-91D8-40D1-8FB8-E2F10D0327CA@ciencias.unam.mx> Message-ID: Slope usually means there are ties in your predictions. Check your dataset to see if you have repeated predicted values (possibly 1 or 0). On Sat, Jan 7, 2017 at 4:32 PM, Jos? Ismael Fern?ndez Mart?nez < ismaelfm_ at ciencias.unam.mx> wrote: > But is not a scikit-learn classifier, is a keras classifier which, in the > functional API, predict returns probabilities. > What I don't understand is why my plot of the roc curve has a slope, since > I call roc_curve passing the actual label as y_true and the output of the > classifier (score probabilities) as y_score for every element tested. > > > > Sent from my iPhone > On Jan 7, 2017, at 4:04 PM, Joel Nothman wrote: > > predict method should not return probabilities in scikit-learn > classifiers. predict_proba should. > > On 8 January 2017 at 07:52, Jos? Ismael Fern?ndez Mart?nez < > ismaelfm_ at ciencias.unam.mx> wrote: > >> Hi, I have a multilabel classifier written in Keras from which I want to >> compute AUC and plot a ROC curve for every element classified from my test >> set. >> >> >> >> Everything seems fine, except that some elements have a roc curve that >> have a slope as follows: >> >> [image: enter image description here] >> I don't know how to interpret the >> slope in such cases. >> >> Basically my workflow goes as follows, I have a pre-trained model, >> instance of Keras, and I have the features X and the binarized labels y, >> every element in y is an array of length 1000, as it is a multilabel >> classification problem each element in y might contain many 1s, >> indicating that the element belongs to multiples classes, so I used the >> built-in loss of binary_crossentropy and my outputs of the model >> prediction are score probailities. Then I plot the roc curve as follows. >> >> >> The predict method returns probabilities, as I'm using the functional api >> of keras. >> >> Does anyone knows why my roc curves looks like this? 
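A tiny sketch of the "ties" effect Jacob describes, with invented numbers:
when y_score takes only a few distinct values (for example hard 0/1
outputs), roc_curve returns only a few points, and joining them with
straight lines gives exactly this kind of long diagonal segment.

import numpy as np
from sklearn.metrics import roc_curve

y_true = np.array([0, 0, 1, 1, 0, 1, 1, 0])
y_score = np.array([0., 0., 1., 1., 1., 1., 0., 0.])   # heavily tied scores
fpr, tpr, thresholds = roc_curve(y_true, y_score)
print(fpr, tpr)   # only three points, so the plotted curve is mostly slope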
>> >> >> Ismael >> >> >> Sent from my iPhone >> >> _______________________________________________ >> scikit-learn mailing list >> scikit-learn at python.org >> https://mail.python.org/mailman/listinfo/scikit-learn >> >> > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn > > > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From jeff1evesque at yahoo.com Sun Jan 8 00:25:05 2017 From: jeff1evesque at yahoo.com (Jeffrey Levesque) Date: Sun, 8 Jan 2017 00:25:05 -0500 Subject: [scikit-learn] Jeff Levesque: Sample SVM / SVR dataset Message-ID: <7C01E03F-882B-45C2-A72B-54631180338F@yahoo.com> Hey guys, Im working on developing a web-interface, and programmatic api, to scikit-learn: - https://github.com/jeff1evesque/machine-learning However, I've only interfaced the SVM, and SVR classes. To be thorough, for development within git, I've created unit tests for the Travis CI. But, I made up some bogus datasets, in order to unit test the SVM, and SVR predictions: - dataset: https://github.com/jeff1evesque/machine-learning/tree/master/interface/static/data - unit tests: https://github.com/jeff1evesque/machine-learning/tree/master/test/live_server But, I'd prefer to have real data, so the computed prediction is more meaningful, instead of predicating on made up data. The corresponding unit tests I have, simply check if a prediction can be made for the supplied dataset. However, I'd like to check the prediction against a known, expected result, which is the motivation of having real meaningful dataset(s): - https://github.com/jeff1evesque/machine-learning/issues/2751 Does anyone have sample dataset(s) they have used for SVM, or SVR predictions? I'd like my unit tests to be somewhat interesting, yet more meaningful. Thank you, Jeff Levesque https://github.com/jeff1evesque From rth.yurchak at gmail.com Sun Jan 8 04:27:08 2017 From: rth.yurchak at gmail.com (Roman Yurchak) Date: Sun, 8 Jan 2017 10:27:08 +0100 Subject: [scikit-learn] Roc curve from multilabel classification has slope In-Reply-To: References: <6EEF6426-91D8-40D1-8FB8-E2F10D0327CA@ciencias.unam.mx> Message-ID: <587205EC.6060402@gmail.com> Jos?, I might be misunderstanding something, but wouldn't it make more sens to plot one ROC curve for every class in your result (using all samples at once), as opposed to plotting it for every training sample as you are doing now? Cf the example below, http://scikit-learn.org/stable/auto_examples/model_selection/plot_roc.html Roman On 08/01/17 01:42, Jacob Schreiber wrote: > Slope usually means there are ties in your predictions. Check your > dataset to see if you have repeated predicted values (possibly 1 or 0). > > On Sat, Jan 7, 2017 at 4:32 PM, Jos? Ismael Fern?ndez Mart?nez > > wrote: > > But is not a scikit-learn classifier, is a keras classifier which, > in the functional API, predict returns probabilities. > What I don't understand is why my plot of the roc curve has a slope, > since I call roc_curve passing the actual label as y_true and the > output of the classifier (score probabilities) as y_score for every > element tested. > > > > Sent from my iPhone > On Jan 7, 2017, at 4:04 PM, Joel Nothman > wrote: > >> predict method should not return probabilities in scikit-learn >> classifiers. 
predict_proba should. >> >> On 8 January 2017 at 07:52, Jos? Ismael Fern?ndez Mart?nez >> > >> wrote: >> >> Hi, I have a multilabel classifier written in Keras from which >> I want to compute AUC and plot a ROC curve for every element >> classified from my test set. >> >> >> >> Everything seems fine, except that some elements have a roc >> curve that have a slope as follows: >> >> enter image description here >> I don't know how to >> interpret the slope in such cases. >> >> Basically my workflow goes as follows, I have a >> pre-trained |model|, instance of Keras, and I have the >> features |X| and the binarized labels |y|, every element >> in |y| is an array of length 1000, as it is a multilabel >> classification problem each element in |y| might contain many >> 1s, indicating that the element belongs to multiples classes, >> so I used the built-in loss of |binary_crossentropy| and my >> outputs of the model prediction are score probailities. Then I >> plot the roc curve as follows. >> >> >> The predict method returns probabilities, as I'm using the >> functional api of keras. >> >> Does anyone knows why my roc curves looks like this? >> >> >> Ismael >> >> >> >> Sent from my iPhone >> >> _______________________________________________ >> scikit-learn mailing list >> scikit-learn at python.org >> https://mail.python.org/mailman/listinfo/scikit-learn >> >> >> >> _______________________________________________ >> scikit-learn mailing list >> scikit-learn at python.org >> https://mail.python.org/mailman/listinfo/scikit-learn >> > > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn > > > > > > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn > From se.raschka at gmail.com Sun Jan 8 05:53:53 2017 From: se.raschka at gmail.com (Sebastian Raschka) Date: Sun, 8 Jan 2017 05:53:53 -0500 Subject: [scikit-learn] meta-estimator for multiple MLPRegressor In-Reply-To: References: <27CD690B-CA77-4121-8C95-9F2E52B99B95@gmail.com> <450C2C8D-86FC-4A87-B307-C5E45FE97C4B@gmail.com> Message-ID: > Like to train an SVR to combine the predictions of the top 10% MLPRegressors using the same data that were used for training of the MLPRegressors? Wouldn't that lead to overfitting? It could, but you don't need to use the same data that you used for training to fit the meta estimator. Like it is commonly done in stacking with cross validation, you can train the mlps on training folds and pass predictions from a test fold to the meta estimator but then you'd have to retrain your mlps and it sounded like you wanted to avoid that. I am currently on mobile and only browsed through the thread briefly, but I agree with others that it may sound like your model(s) may have too much capacity for such a small dataset -- can be tricky to fit the parameters without overfitting. In any case, if you to do the stacking, I'd probably insert a k-fold cv between the mlps and the meta estimator. However I'd definitely also recommend simpler models als alternative. Best, Sebastian > On Jan 7, 2017, at 4:36 PM, Thomas Evangelidis wrote: > > > >> On 7 January 2017 at 21:20, Sebastian Raschka wrote: >> Hi, Thomas, >> sorry, I overread the regression part ? >> This would be a bit trickier, I am not sure what a good strategy for averaging regression outputs would be. 
However, if you just want to compute the average, you could do sth like >> np.mean(np.asarray([r.predict(X) for r in list_or_your_mlps])) >> >> However, it may be better to use stacking, and use the output of r.predict(X) as meta features to train a model based on these? > > ?Like to train an SVR to combine the predictions of the top 10% MLPRegressors using the same data that were used for training of the MLPRegressors? Wouldn't that lead to overfitting? > ? >> >> Best, >> Sebastian >> >> > On Jan 7, 2017, at 1:49 PM, Thomas Evangelidis wrote: >> > >> > Hi Sebastian, >> > >> > Thanks, I will try it in another classification problem I have. However, this time I am using regressors not classifiers. >> > >> > On Jan 7, 2017 19:28, "Sebastian Raschka" wrote: >> > Hi, Thomas, >> > >> > the VotingClassifier can combine different models per majority voting amongst their predictions. Unfortunately, it refits the classifiers though (after cloning them). I think we implemented it this way to make it compatible to GridSearch and so forth. However, I have a version of the estimator that you can initialize with ?refit=False? to avoid refitting if it helps. http://rasbt.github.io/mlxtend/user_guide/classifier/EnsembleVoteClassifier/#example-5-using-pre-fitted-classifiers >> > >> > Best, >> > Sebastian >> > >> > >> > >> > > On Jan 7, 2017, at 11:15 AM, Thomas Evangelidis wrote: >> > > >> > > Greetings, >> > > >> > > I have trained many MLPRegressors using different random_state value and estimated the R^2 using cross-validation. Now I want to combine the top 10% of them in how to get more accurate predictions. Is there a meta-estimator that can get as input a few precomputed MLPRegressors and give consensus predictions? Can the BaggingRegressor do this job using MLPRegressors as input? >> > > >> > > Thanks in advance for any hint. 
>> > > Thomas
>> > >
>> > >
>> > > --
>> > > ======================================================================
>> > > Thomas Evangelidis
>> > > Research Specialist
>> > > CEITEC - Central European Institute of Technology
>> > > Masaryk University
>> > > Kamenice 5/A35/1S081,
>> > > 62500 Brno, Czech Republic
>> > >
>> > > email: tevang at pharm.uoa.gr
>> > > tevang3 at gmail.com
>> > >
>> > > website: https://sites.google.com/site/thomasevangelidishomepage/
>> > >
>> > >
>> > > _______________________________________________
>> > > scikit-learn mailing list
>> > > scikit-learn at python.org
>> > > https://mail.python.org/mailman/listinfo/scikit-learn
>> >
>> > _______________________________________________
>> > scikit-learn mailing list
>> > scikit-learn at python.org
>> > https://mail.python.org/mailman/listinfo/scikit-learn
>> > _______________________________________________
>> > scikit-learn mailing list
>> > scikit-learn at python.org
>> > https://mail.python.org/mailman/listinfo/scikit-learn
>>
>> _______________________________________________
>> scikit-learn mailing list
>> scikit-learn at python.org
>> https://mail.python.org/mailman/listinfo/scikit-learn
>>
>
>
> --
>
> ======================================================================
>
> Thomas Evangelidis
>
> Research Specialist
> CEITEC - Central European Institute of Technology
> Masaryk University
> Kamenice 5/A35/1S081,
> 62500 Brno, Czech Republic
>
> email: tevang at pharm.uoa.gr
>
> tevang3 at gmail.com
>
>
> website: https://sites.google.com/site/thomasevangelidishomepage/
>
> _______________________________________________
> scikit-learn mailing list
> scikit-learn at python.org
> https://mail.python.org/mailman/listinfo/scikit-learn
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From tevang3 at gmail.com  Sun Jan 8 06:42:09 2017
From: tevang3 at gmail.com (Thomas Evangelidis)
Date: Sun, 8 Jan 2017 12:42:09 +0100
Subject: [scikit-learn] meta-estimator for multiple MLPRegressor
In-Reply-To:
References: <27CD690B-CA77-4121-8C95-9F2E52B99B95@gmail.com>
	<450C2C8D-86FC-4A87-B307-C5E45FE97C4B@gmail.com>
Message-ID:

Sebastian and Jacob,

Regarding overfitting, Lasso, Ridge regression and ElasticNet have poor
performance on my data. MLPRegressors are way superior. On another note,
the MLPRegressor class has some parameters to control overfitting, like
controlling the alpha parameter for the L2 regularization (maybe setting
it to a high value?) or the number of neurons in the hidden layers
(lowering the hidden_layer_sizes?) or even "early_stopping=True". Wouldn't
these be sufficient to be on the safe side?

Once more I want to highlight something I wrote previously but might have
been overlooked. The resulting MLPRegressors will be applied to new
datasets that *ARE VERY SIMILAR TO THE TRAINING DATA*. In other words the
application of the models will be strictly confined to their applicability
domain. Wouldn't that be sufficient to not worry about model overfitting
too much?

On 8 January 2017 at 11:53, Sebastian Raschka wrote:

> Like to train an SVR to combine the predictions of the top 10%
> MLPRegressors using the same data that were used for training of the
> MLPRegressors? Wouldn't that lead to overfitting?
>
>
> It could, but you don't need to use the same data that you used for
> training to fit the meta estimator.
Like it is commonly done in stacking > with cross validation, you can train the mlps on training folds and pass > predictions from a test fold to the meta estimator but then you'd have to > retrain your mlps and it sounded like you wanted to avoid that. > > I am currently on mobile and only browsed through the thread briefly, but > I agree with others that it may sound like your model(s) may have too much > capacity for such a small dataset -- can be tricky to fit the parameters > without overfitting. In any case, if you to do the stacking, I'd probably > insert a k-fold cv between the mlps and the meta estimator. However I'd > definitely also recommend simpler models als > alternative. > > Best, > Sebastian > > On Jan 7, 2017, at 4:36 PM, Thomas Evangelidis wrote: > > > > On 7 January 2017 at 21:20, Sebastian Raschka > wrote: > >> Hi, Thomas, >> sorry, I overread the regression part ? >> This would be a bit trickier, I am not sure what a good strategy for >> averaging regression outputs would be. However, if you just want to compute >> the average, you could do sth like >> np.mean(np.asarray([r.predict(X) for r in list_or_your_mlps])) >> >> However, it may be better to use stacking, and use the output of >> r.predict(X) as meta features to train a model based on these? >> > > ?Like to train an SVR to combine the predictions of the top 10% > MLPRegressors using the same data that were used for training of the > MLPRegressors? Wouldn't that lead to overfitting? > ? > > >> >> Best, >> Sebastian >> >> > On Jan 7, 2017, at 1:49 PM, Thomas Evangelidis >> wrote: >> > >> > Hi Sebastian, >> > >> > Thanks, I will try it in another classification problem I have. >> However, this time I am using regressors not classifiers. >> > >> > On Jan 7, 2017 19:28, "Sebastian Raschka" wrote: >> > Hi, Thomas, >> > >> > the VotingClassifier can combine different models per majority voting >> amongst their predictions. Unfortunately, it refits the classifiers though >> (after cloning them). I think we implemented it this way to make it >> compatible to GridSearch and so forth. However, I have a version of the >> estimator that you can initialize with ?refit=False? to avoid refitting if >> it helps. http://rasbt.github.io/mlxtend/user_guide/classifier/Ensembl >> eVoteClassifier/#example-5-using-pre-fitted-classifiers >> > >> > Best, >> > Sebastian >> > >> > >> > >> > > On Jan 7, 2017, at 11:15 AM, Thomas Evangelidis >> wrote: >> > > >> > > Greetings, >> > > >> > > I have trained many MLPRegressors using different random_state value >> and estimated the R^2 using cross-validation. Now I want to combine the top >> 10% of them in how to get more accurate predictions. Is there a >> meta-estimator that can get as input a few precomputed MLPRegressors and >> give consensus predictions? Can the BaggingRegressor do this job using >> MLPRegressors as input? >> > > >> > > Thanks in advance for any hint. 
>> > > Thomas >> > > >> > > >> > > -- >> > > ============================================================ >> ========== >> > > Thomas Evangelidis >> > > Research Specialist >> > > CEITEC - Central European Institute of Technology >> > > Masaryk University >> > > Kamenice 5/A35/1S081, >> > > 62500 Brno, Czech Republic >> > > >> > > email: tevang at pharm.uoa.gr >> > > tevang3 at gmail.com >> > > >> > > website: https://sites.google.com/site/thomasevangelidishomepage/ >> > > >> > > >> > > _______________________________________________ >> > > scikit-learn mailing list >> > > scikit-learn at python.org >> > > https://mail.python.org/mailman/listinfo/scikit-learn >> > >> > _______________________________________________ >> > scikit-learn mailing list >> > scikit-learn at python.org >> > https://mail.python.org/mailman/listinfo/scikit-learn >> > _______________________________________________ >> > scikit-learn mailing list >> > scikit-learn at python.org >> > https://mail.python.org/mailman/listinfo/scikit-learn >> >> _______________________________________________ >> scikit-learn mailing list >> scikit-learn at python.org >> https://mail.python.org/mailman/listinfo/scikit-learn >> > > > > -- > > ====================================================================== > > Thomas Evangelidis > > Research Specialist > CEITEC - Central European Institute of Technology > Masaryk University > Kamenice 5/A35/1S081, > 62500 Brno, Czech Republic > > email: tevang at pharm.uoa.gr > > tevang3 at gmail.com > > > website: https://sites.google.com/site/thomasevangelidishomepage/ > > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn > > > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn > > -- ====================================================================== Thomas Evangelidis Research Specialist CEITEC - Central European Institute of Technology Masaryk University Kamenice 5/A35/1S081, 62500 Brno, Czech Republic email: tevang at pharm.uoa.gr tevang3 at gmail.com website: https://sites.google.com/site/thomasevangelidishomepage/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From joel.nothman at gmail.com Sun Jan 8 23:08:53 2017 From: joel.nothman at gmail.com (Joel Nothman) Date: Mon, 9 Jan 2017 15:08:53 +1100 Subject: [scikit-learn] meta-estimator for multiple MLPRegressor In-Reply-To: References: <27CD690B-CA77-4121-8C95-9F2E52B99B95@gmail.com> <450C2C8D-86FC-4A87-B307-C5E45FE97C4B@gmail.com> Message-ID: Btw, I may have been unclear in the discussion of overfitting. For *training* the meta-estimator in stacking, it's standard to do something like cross_val_predict on your training set to produce its input features. On 8 January 2017 at 22:42, Thomas Evangelidis wrote: > Sebastian and Jacob, > > Regarding overfitting, Lasso, Ridge regression and ElasticNet have poor > performance on my data. MLPregressors are way superior. On an other note, > MLPregressor class has some methods to contol overfitting, like controling > the alpha parameter for the L2 regularization (maybe setting it to a high > value?) or the number of neurons in the hidden layers (lowering the hidden_layer_sizes?) > or even "early_stopping=True". Wouldn't these be sufficient to be on the > safe side. 
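To make the cross_val_predict idea concrete, below is a rough, self-contained sketch of such a stacking setup. The synthetic data, the two MLP base models and the SVR meta-estimator are stand-ins chosen to mirror the discussion in this thread, not a recommended recipe.

import numpy as np
from sklearn.datasets import make_regression
from sklearn.model_selection import cross_val_predict, train_test_split
from sklearn.neural_network import MLPRegressor
from sklearn.svm import SVR

X, y = make_regression(n_samples=200, n_features=60, noise=10.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

base_models = [MLPRegressor(hidden_layer_sizes=(10,), alpha=10, max_iter=1000,
                            random_state=s) for s in (0, 1)]

# out-of-fold predictions become the meta-estimator's training features,
# so the SVR never sees predictions made on data the MLPs were fitted on
meta_train = np.column_stack([cross_val_predict(m, X_train, y_train, cv=5)
                              for m in base_models])
meta_model = SVR().fit(meta_train, y_train)

# for new data, the base models are refitted on the full training set first
for m in base_models:
    m.fit(X_train, y_train)
meta_test = np.column_stack([m.predict(X_test) for m in base_models])
print(meta_model.score(meta_test, y_test))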
> > Once more I want to highlight something I wrote previously but might have > been overlooked. The resulting MLPRegressors will be applied to new > datasets that *ARE VERY SIMILAR TO THE TRAINING DATA*. In other words the > application of the models will be strictly confined to their applicability > domain. Wouldn't that be sufficient to not worry about model overfitting > too much? > > > > > > On 8 January 2017 at 11:53, Sebastian Raschka > wrote: > >> Like to train an SVR to combine the predictions of the top 10% >> MLPRegressors using the same data that were used for training of the >> MLPRegressors? Wouldn't that lead to overfitting? >> >> >> It could, but you don't need to use the same data that you used for >> training to fit the meta estimator. Like it is commonly done in stacking >> with cross validation, you can train the mlps on training folds and pass >> predictions from a test fold to the meta estimator but then you'd have to >> retrain your mlps and it sounded like you wanted to avoid that. >> >> I am currently on mobile and only browsed through the thread briefly, but >> I agree with others that it may sound like your model(s) may have too much >> capacity for such a small dataset -- can be tricky to fit the parameters >> without overfitting. In any case, if you to do the stacking, I'd probably >> insert a k-fold cv between the mlps and the meta estimator. However I'd >> definitely also recommend simpler models als >> alternative. >> >> Best, >> Sebastian >> >> On Jan 7, 2017, at 4:36 PM, Thomas Evangelidis wrote: >> >> >> >> On 7 January 2017 at 21:20, Sebastian Raschka >> wrote: >> >>> Hi, Thomas, >>> sorry, I overread the regression part ? >>> This would be a bit trickier, I am not sure what a good strategy for >>> averaging regression outputs would be. However, if you just want to compute >>> the average, you could do sth like >>> np.mean(np.asarray([r.predict(X) for r in list_or_your_mlps])) >>> >>> However, it may be better to use stacking, and use the output of >>> r.predict(X) as meta features to train a model based on these? >>> >> >> ?Like to train an SVR to combine the predictions of the top 10% >> MLPRegressors using the same data that were used for training of the >> MLPRegressors? Wouldn't that lead to overfitting? >> ? >> >> >>> >>> Best, >>> Sebastian >>> >>> > On Jan 7, 2017, at 1:49 PM, Thomas Evangelidis >>> wrote: >>> > >>> > Hi Sebastian, >>> > >>> > Thanks, I will try it in another classification problem I have. >>> However, this time I am using regressors not classifiers. >>> > >>> > On Jan 7, 2017 19:28, "Sebastian Raschka" >>> wrote: >>> > Hi, Thomas, >>> > >>> > the VotingClassifier can combine different models per majority voting >>> amongst their predictions. Unfortunately, it refits the classifiers though >>> (after cloning them). I think we implemented it this way to make it >>> compatible to GridSearch and so forth. However, I have a version of the >>> estimator that you can initialize with ?refit=False? to avoid refitting if >>> it helps. http://rasbt.github.io/mlxtend/user_guide/classifier/Ensembl >>> eVoteClassifier/#example-5-using-pre-fitted-classifiers >>> > >>> > Best, >>> > Sebastian >>> > >>> > >>> > >>> > > On Jan 7, 2017, at 11:15 AM, Thomas Evangelidis >>> wrote: >>> > > >>> > > Greetings, >>> > > >>> > > I have trained many MLPRegressors using different random_state value >>> and estimated the R^2 using cross-validation. Now I want to combine the top >>> 10% of them in how to get more accurate predictions. 
Is there a >>> meta-estimator that can get as input a few precomputed MLPRegressors and >>> give consensus predictions? Can the BaggingRegressor do this job using >>> MLPRegressors as input? >>> > > >>> > > Thanks in advance for any hint. >>> > > Thomas >>> > > >>> > > >>> > > -- >>> > > ============================================================ >>> ========== >>> > > Thomas Evangelidis >>> > > Research Specialist >>> > > CEITEC - Central European Institute of Technology >>> > > Masaryk University >>> > > Kamenice 5/A35/1S081, >>> > > 62500 Brno, Czech Republic >>> > > >>> > > email: tevang at pharm.uoa.gr >>> > > tevang3 at gmail.com >>> > > >>> > > website: https://sites.google.com/site/thomasevangelidishomepage/ >>> > > >>> > > >>> > > _______________________________________________ >>> > > scikit-learn mailing list >>> > > scikit-learn at python.org >>> > > https://mail.python.org/mailman/listinfo/scikit-learn >>> > >>> > _______________________________________________ >>> > scikit-learn mailing list >>> > scikit-learn at python.org >>> > https://mail.python.org/mailman/listinfo/scikit-learn >>> > _______________________________________________ >>> > scikit-learn mailing list >>> > scikit-learn at python.org >>> > https://mail.python.org/mailman/listinfo/scikit-learn >>> >>> _______________________________________________ >>> scikit-learn mailing list >>> scikit-learn at python.org >>> https://mail.python.org/mailman/listinfo/scikit-learn >>> >> >> >> >> -- >> >> ====================================================================== >> >> Thomas Evangelidis >> >> Research Specialist >> CEITEC - Central European Institute of Technology >> Masaryk University >> Kamenice 5/A35/1S081, >> 62500 Brno, Czech Republic >> >> email: tevang at pharm.uoa.gr >> >> tevang3 at gmail.com >> >> >> website: https://sites.google.com/site/thomasevangelidishomepage/ >> >> _______________________________________________ >> scikit-learn mailing list >> scikit-learn at python.org >> https://mail.python.org/mailman/listinfo/scikit-learn >> >> >> _______________________________________________ >> scikit-learn mailing list >> scikit-learn at python.org >> https://mail.python.org/mailman/listinfo/scikit-learn >> >> > > > -- > > ====================================================================== > > Thomas Evangelidis > > Research Specialist > CEITEC - Central European Institute of Technology > Masaryk University > Kamenice 5/A35/1S081, > 62500 Brno, Czech Republic > > email: tevang at pharm.uoa.gr > > tevang3 at gmail.com > > > website: https://sites.google.com/site/thomasevangelidishomepage/ > > > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From olivier.grisel at ensta.org Mon Jan 9 04:48:41 2017 From: olivier.grisel at ensta.org (Olivier Grisel) Date: Mon, 9 Jan 2017 10:48:41 +0100 Subject: [scikit-learn] Preparing a scikit-learn 0.18.2 bugfix release Message-ID: Hi all, I think we should release 0.18.2 to get some important fixes and make it easy to release Python 3.6 wheel package for all the operating systems using the automated procedure. I identified a couple of PR to backport to 0.18.X to prepare the 0.18.2 release. Are there any other important recently fixed bugfs people would like to see backported in this release? 
https://github.com/scikit-learn/scikit-learn/milestone/23?closed=1 Best, -- Olivier http://twitter.com/ogrisel - http://github.com/ogrisel From joel.nothman at gmail.com Mon Jan 9 06:04:05 2017 From: joel.nothman at gmail.com (Joel Nothman) Date: Mon, 9 Jan 2017 22:04:05 +1100 Subject: [scikit-learn] Preparing a scikit-learn 0.18.2 bugfix release In-Reply-To: References: Message-ID: In terms of the bug fixes listed in the change-log, most seem non-urgent. I would consider pulling across #7954, #8006, #8087, #7872, #7983. But I also wonder whether we'd be better off sprinting towards a small 0.19 release. On 9 January 2017 at 20:48, Olivier Grisel wrote: > Hi all, > > I think we should release 0.18.2 to get some important fixes and make > it easy to release Python 3.6 wheel package for all the operating > systems using the automated procedure. > > I identified a couple of PR to backport to 0.18.X to prepare the > 0.18.2 release. Are there any other important recently fixed bugfs > people would like to see backported in this release? > > https://github.com/scikit-learn/scikit-learn/milestone/23?closed=1 > > Best, > > -- > Olivier > http://twitter.com/ogrisel - http://github.com/ogrisel > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn > -------------- next part -------------- An HTML attachment was scrubbed... URL: From olivier.grisel at ensta.org Mon Jan 9 09:43:10 2017 From: olivier.grisel at ensta.org (Olivier Grisel) Date: Mon, 9 Jan 2017 15:43:10 +0100 Subject: [scikit-learn] Preparing a scikit-learn 0.18.2 bugfix release In-Reply-To: References: Message-ID: In retrospect, making a small 0.19 release is probably a good idea. I would like to get https://github.com/scikit-learn/scikit-learn/pull/8002 in before cutting the 0.19.X branch. -- Olivier Grisel From ragvrv at gmail.com Mon Jan 9 10:06:35 2017 From: ragvrv at gmail.com (Raghav R V) Date: Mon, 9 Jan 2017 16:06:35 +0100 Subject: [scikit-learn] Preparing a scikit-learn 0.18.2 bugfix release In-Reply-To: References: Message-ID: I think it would be nice to have 0.19 by April. We'd have 3 more months and we can frame some roadmap towards it? On Mon, Jan 9, 2017 at 3:43 PM, Olivier Grisel wrote: > In retrospect, making a small 0.19 release is probably a good idea. > > I would like to get > https://github.com/scikit-learn/scikit-learn/pull/8002 in before > cutting the 0.19.X branch. > > -- > Olivier Grisel > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn > -- Raghav RV https://github.com/raghavrv -------------- next part -------------- An HTML attachment was scrubbed... URL: From ragvrv at gmail.com Mon Jan 9 10:07:53 2017 From: ragvrv at gmail.com (Raghav R V) Date: Mon, 9 Jan 2017 16:07:53 +0100 Subject: [scikit-learn] Preparing a scikit-learn 0.18.2 bugfix release In-Reply-To: References: Message-ID: (So we can get back to the one release per 4 month cycle?) On Mon, Jan 9, 2017 at 4:06 PM, Raghav R V wrote: > I think it would be nice to have 0.19 by April. We'd have 3 more months > and we can frame some roadmap towards it? > > On Mon, Jan 9, 2017 at 3:43 PM, Olivier Grisel > wrote: > >> In retrospect, making a small 0.19 release is probably a good idea. >> >> I would like to get >> https://github.com/scikit-learn/scikit-learn/pull/8002 in before >> cutting the 0.19.X branch. 
>> >> -- >> Olivier Grisel >> _______________________________________________ >> scikit-learn mailing list >> scikit-learn at python.org >> https://mail.python.org/mailman/listinfo/scikit-learn >> > > > > -- > Raghav RV > https://github.com/raghavrv > > -- Raghav RV https://github.com/raghavrv -------------- next part -------------- An HTML attachment was scrubbed... URL: From olivier.grisel at ensta.org Mon Jan 9 10:12:08 2017 From: olivier.grisel at ensta.org (Olivier Grisel) Date: Mon, 9 Jan 2017 16:12:08 +0100 Subject: [scikit-learn] Preparing a scikit-learn 0.18.2 bugfix release In-Reply-To: References: Message-ID: I would rather like to get it out before April ideally and instead of setting up a roadmap I would rather just identify bugs that are blockers and fix only those and don't wait for any feature before cutting 0.19.X. -- Olivier From gael.varoquaux at normalesup.org Mon Jan 9 10:15:46 2017 From: gael.varoquaux at normalesup.org (Gael Varoquaux) Date: Mon, 9 Jan 2017 16:15:46 +0100 Subject: [scikit-learn] Preparing a scikit-learn 0.18.2 bugfix release In-Reply-To: References: Message-ID: <20170109151546.GM2802991@phare.normalesup.org> > instead of setting up a roadmap I would rather just identify bugs that > are blockers and fix only those and don't wait for any feature before > cutting 0.19.X. +1 From raga.markely at gmail.com Mon Jan 9 11:29:28 2017 From: raga.markely at gmail.com (Raga Markely) Date: Mon, 9 Jan 2017 11:29:28 -0500 Subject: [scikit-learn] Generalized Discriminant Analysis with Kernel Message-ID: Hello, I wonder if scikit-learn has implementation for generalized discriminant analysis using kernel approach? http://www.kernel-machines.org/papers/upload_21840_GDA.pdf I did some search, but couldn't find. Thank you, Raga -------------- next part -------------- An HTML attachment was scrubbed... URL: From jmschreiber91 at gmail.com Mon Jan 9 13:21:59 2017 From: jmschreiber91 at gmail.com (Jacob Schreiber) Date: Mon, 9 Jan 2017 10:21:59 -0800 Subject: [scikit-learn] meta-estimator for multiple MLPRegressor In-Reply-To: References: <27CD690B-CA77-4121-8C95-9F2E52B99B95@gmail.com> <450C2C8D-86FC-4A87-B307-C5E45FE97C4B@gmail.com> Message-ID: Thomas, it can be difficult to fine tune L1/L2 regularization in the case where n_parameters >>> n_samples ~and~ n_features >> n_samples. If your samples are very similar to the training data, why are simpler models not working well? On Sun, Jan 8, 2017 at 8:08 PM, Joel Nothman wrote: > Btw, I may have been unclear in the discussion of overfitting. For > *training* the meta-estimator in stacking, it's standard to do something > like cross_val_predict on your training set to produce its input features. > > On 8 January 2017 at 22:42, Thomas Evangelidis wrote: > >> Sebastian and Jacob, >> >> Regarding overfitting, Lasso, Ridge regression and ElasticNet have poor >> performance on my data. MLPregressors are way superior. On an other note, >> MLPregressor class has some methods to contol overfitting, like controling >> the alpha parameter for the L2 regularization (maybe setting it to a high >> value?) or the number of neurons in the hidden layers (lowering the hidden_layer_sizes?) >> or even "early_stopping=True". Wouldn't these be sufficient to be on the >> safe side. >> >> Once more I want to highlight something I wrote previously but might have >> been overlooked. The resulting MLPRegressors will be applied to new >> datasets that *ARE VERY SIMILAR TO THE TRAINING DATA*. 
In other words >> the application of the models will be strictly confined to their >> applicability domain. Wouldn't that be sufficient to not worry about model >> overfitting too much? >> >> >> >> >> >> On 8 January 2017 at 11:53, Sebastian Raschka >> wrote: >> >>> Like to train an SVR to combine the predictions of the top 10% >>> MLPRegressors using the same data that were used for training of the >>> MLPRegressors? Wouldn't that lead to overfitting? >>> >>> >>> It could, but you don't need to use the same data that you used for >>> training to fit the meta estimator. Like it is commonly done in stacking >>> with cross validation, you can train the mlps on training folds and pass >>> predictions from a test fold to the meta estimator but then you'd have to >>> retrain your mlps and it sounded like you wanted to avoid that. >>> >>> I am currently on mobile and only browsed through the thread briefly, >>> but I agree with others that it may sound like your model(s) may have too >>> much capacity for such a small dataset -- can be tricky to fit the >>> parameters without overfitting. In any case, if you to do the stacking, I'd >>> probably insert a k-fold cv between the mlps and the meta estimator. >>> However I'd definitely also recommend simpler models als >>> alternative. >>> >>> Best, >>> Sebastian >>> >>> On Jan 7, 2017, at 4:36 PM, Thomas Evangelidis >>> wrote: >>> >>> >>> >>> On 7 January 2017 at 21:20, Sebastian Raschka >>> wrote: >>> >>>> Hi, Thomas, >>>> sorry, I overread the regression part ? >>>> This would be a bit trickier, I am not sure what a good strategy for >>>> averaging regression outputs would be. However, if you just want to compute >>>> the average, you could do sth like >>>> np.mean(np.asarray([r.predict(X) for r in list_or_your_mlps])) >>>> >>>> However, it may be better to use stacking, and use the output of >>>> r.predict(X) as meta features to train a model based on these? >>>> >>> >>> ?Like to train an SVR to combine the predictions of the top 10% >>> MLPRegressors using the same data that were used for training of the >>> MLPRegressors? Wouldn't that lead to overfitting? >>> ? >>> >>> >>>> >>>> Best, >>>> Sebastian >>>> >>>> > On Jan 7, 2017, at 1:49 PM, Thomas Evangelidis >>>> wrote: >>>> > >>>> > Hi Sebastian, >>>> > >>>> > Thanks, I will try it in another classification problem I have. >>>> However, this time I am using regressors not classifiers. >>>> > >>>> > On Jan 7, 2017 19:28, "Sebastian Raschka" >>>> wrote: >>>> > Hi, Thomas, >>>> > >>>> > the VotingClassifier can combine different models per majority voting >>>> amongst their predictions. Unfortunately, it refits the classifiers though >>>> (after cloning them). I think we implemented it this way to make it >>>> compatible to GridSearch and so forth. However, I have a version of the >>>> estimator that you can initialize with ?refit=False? to avoid refitting if >>>> it helps. http://rasbt.github.io/mlxtend/user_guide/classifier/Ensembl >>>> eVoteClassifier/#example-5-using-pre-fitted-classifiers >>>> > >>>> > Best, >>>> > Sebastian >>>> > >>>> > >>>> > >>>> > > On Jan 7, 2017, at 11:15 AM, Thomas Evangelidis >>>> wrote: >>>> > > >>>> > > Greetings, >>>> > > >>>> > > I have trained many MLPRegressors using different random_state >>>> value and estimated the R^2 using cross-validation. Now I want to combine >>>> the top 10% of them in how to get more accurate predictions. Is there a >>>> meta-estimator that can get as input a few precomputed MLPRegressors and >>>> give consensus predictions? 
Can the BaggingRegressor do this job using >>>> MLPRegressors as input? >>>> > > >>>> > > Thanks in advance for any hint. >>>> > > Thomas >>>> > > >>>> > > >>>> > > -- >>>> > > ============================================================ >>>> ========== >>>> > > Thomas Evangelidis >>>> > > Research Specialist >>>> > > CEITEC - Central European Institute of Technology >>>> > > Masaryk University >>>> > > Kamenice 5/A35/1S081, >>>> > > 62500 Brno, Czech Republic >>>> > > >>>> > > email: tevang at pharm.uoa.gr >>>> > > tevang3 at gmail.com >>>> > > >>>> > > website: https://sites.google.com/site/thomasevangelidishomepage/ >>>> > > >>>> > > >>>> > > _______________________________________________ >>>> > > scikit-learn mailing list >>>> > > scikit-learn at python.org >>>> > > https://mail.python.org/mailman/listinfo/scikit-learn >>>> > >>>> > _______________________________________________ >>>> > scikit-learn mailing list >>>> > scikit-learn at python.org >>>> > https://mail.python.org/mailman/listinfo/scikit-learn >>>> > _______________________________________________ >>>> > scikit-learn mailing list >>>> > scikit-learn at python.org >>>> > https://mail.python.org/mailman/listinfo/scikit-learn >>>> >>>> _______________________________________________ >>>> scikit-learn mailing list >>>> scikit-learn at python.org >>>> https://mail.python.org/mailman/listinfo/scikit-learn >>>> >>> >>> >>> >>> -- >>> >>> ====================================================================== >>> >>> Thomas Evangelidis >>> >>> Research Specialist >>> CEITEC - Central European Institute of Technology >>> Masaryk University >>> Kamenice 5/A35/1S081, >>> 62500 Brno, Czech Republic >>> >>> email: tevang at pharm.uoa.gr >>> >>> tevang3 at gmail.com >>> >>> >>> website: https://sites.google.com/site/thomasevangelidishomepage/ >>> >>> _______________________________________________ >>> scikit-learn mailing list >>> scikit-learn at python.org >>> https://mail.python.org/mailman/listinfo/scikit-learn >>> >>> >>> _______________________________________________ >>> scikit-learn mailing list >>> scikit-learn at python.org >>> https://mail.python.org/mailman/listinfo/scikit-learn >>> >>> >> >> >> -- >> >> ====================================================================== >> >> Thomas Evangelidis >> >> Research Specialist >> CEITEC - Central European Institute of Technology >> Masaryk University >> Kamenice 5/A35/1S081, >> 62500 Brno, Czech Republic >> >> email: tevang at pharm.uoa.gr >> >> tevang3 at gmail.com >> >> >> website: https://sites.google.com/site/thomasevangelidishomepage/ >> >> >> _______________________________________________ >> scikit-learn mailing list >> scikit-learn at python.org >> https://mail.python.org/mailman/listinfo/scikit-learn >> >> > > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From smith_r at ligo.caltech.edu Mon Jan 9 14:34:26 2017 From: smith_r at ligo.caltech.edu (Rory Smith) Date: Mon, 9 Jan 2017 11:34:26 -0800 Subject: [scikit-learn] Complex variables in Gaussian mixture models? Message-ID: <1A6E40A6-5019-44F8-BF56-EC382E8908FD@ligo.caltech.edu> Hi All, I?d like to set up a GMM using mixture.BayesianGaussianMixture to model a probability density of complex random variables (the learned means and covariances should also be complex valued). 
I wasn?t able to see any mention of how to handle complex variables in the documentation so I?m curious if it?s possible in the current implementation. I tried the obvious thing of first generating a 1D array of complex random numbers, but I see these warning when I try and fit the array X using dpgmm = mixture.BayesianGaussianMixture(n_components=4, covariance_type='full', n_init=1).fit(X) ~/miniconda2/lib/python2.7/site-packages/sklearn/utils/validation.py:382: ComplexWarning: Casting complex values to real discards the imaginary part array = np.array(array, dtype=dtype, order=order, copy=copy) And as might be expected from the warning, the learned means are real. Any advice on this problem would be greatly appreciated! Best, Rory -------------- next part -------------- An HTML attachment was scrubbed... URL: From vaggi.federico at gmail.com Mon Jan 9 15:42:24 2017 From: vaggi.federico at gmail.com (federico vaggi) Date: Mon, 09 Jan 2017 20:42:24 +0000 Subject: [scikit-learn] Complex variables in Gaussian mixture models? In-Reply-To: <1A6E40A6-5019-44F8-BF56-EC382E8908FD@ligo.caltech.edu> References: <1A6E40A6-5019-44F8-BF56-EC382E8908FD@ligo.caltech.edu> Message-ID: Probably not the most principled way to handle it, but: can't you treat 1 dimensional complex numbers as 2 dimensional real numbers, and then try to cluster those with the GMM? On Mon, 9 Jan 2017 at 20:34 Rory Smith wrote: > Hi All, > > I?d like to set up a GMM using mixture.BayesianGaussianMixture to model a > probability density of complex random variables (the learned means and > covariances should also be complex valued). I wasn?t able to see any > mention of how to handle complex variables in the documentation so I?m > curious if it?s possible in the current implementation. > I tried the obvious thing of first generating a 1D array of complex > random numbers, but I see these warning when I try and fit the array X > using > > dpgmm = mixture.BayesianGaussianMixture(n_components=4, > covariance_type='full', n_init=1 > ).fit(X) > > ~/miniconda2/lib/python2.7/site-packages/sklearn/utils/validation.py:382: > ComplexWarning: Casting complex values to real discards the imaginary part > array = np.array(array, dtype=dtype, order=order, copy=copy) > > > And as might be expected from the warning, the learned means are real. > > Any advice on this problem would be greatly appreciated! > > Best, > Rory > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn > -------------- next part -------------- An HTML attachment was scrubbed... URL: From jmschreiber91 at gmail.com Mon Jan 9 15:43:23 2017 From: jmschreiber91 at gmail.com (Jacob Schreiber) Date: Mon, 9 Jan 2017 12:43:23 -0800 Subject: [scikit-learn] Complex variables in Gaussian mixture models? In-Reply-To: <1A6E40A6-5019-44F8-BF56-EC382E8908FD@ligo.caltech.edu> References: <1A6E40A6-5019-44F8-BF56-EC382E8908FD@ligo.caltech.edu> Message-ID: I'm not too familiar with how complex values are traditionally treated, but is it possible to make the complex component a real valued component and treat it just as having twice as many features? On Mon, Jan 9, 2017 at 11:34 AM, Rory Smith wrote: > Hi All, > > I?d like to set up a GMM using mixture.BayesianGaussianMixture to model a > probability density of complex random variables (the learned means and > covariances should also be complex valued). 
I wasn't able to see any > mention of how to handle complex variables in the documentation so I'm > curious if it's possible in the current implementation. > I tried the obvious thing of first generating a 1D array of complex > random numbers, but I see these warning when I try and fit the array X > using > > dpgmm = mixture.BayesianGaussianMixture(n_components=4, > covariance_type='full', n_init=1 > ).fit(X) > > ~/miniconda2/lib/python2.7/site-packages/sklearn/utils/validation.py:382: > ComplexWarning: Casting complex values to real discards the imaginary part > array = np.array(array, dtype=dtype, order=order, copy=copy) > > > And as might be expected from the warning, the learned means are real. > > Any advice on this problem would be greatly appreciated! > > Best, > Rory > > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn > > -------------- next part -------------- An HTML attachment was scrubbed... URL:
From se.raschka at gmail.com Mon Jan 9 17:55:57 2017
From: se.raschka at gmail.com (Sebastian Raschka)
Date: Mon, 9 Jan 2017 17:55:57 -0500
Subject: Re: [scikit-learn] meta-estimator for multiple MLPRegressor
In-Reply-To: References: <27CD690B-CA77-4121-8C95-9F2E52B99B95@gmail.com> <450C2C8D-86FC-4A87-B307-C5E45FE97C4B@gmail.com>
Message-ID:
> Once more I want to highlight something I wrote previously but might have been overlooked. The resulting MLPRegressors will be applied to new datasets that ARE VERY SIMILAR TO THE TRAINING DATA. In other words the application of the models will be strictly confined to their applicability domain. Wouldn't that be sufficient to not worry about model overfitting too much?
If you have a very small dataset and a very large number of features, I'd always be careful with / avoid models that have a high capacity. However, it is really hard to answer this question because we don't know much about your training and evaluation approach. If you didn't do much hyperparameter tuning and cross-validation for model selection, and if you set aside a sufficiently large portion as an independent test set that you only looked at once and get a good performance on that, you may be lucky and a complex MLP may generalize well. However, like others said, it's really hard to get an MLP right (not memorizing the training data) if n_samples is small and n_features is large. And for n_features > n_samples, that may be very, very hard.
> like controling the alpha parameter for the L2 regularization (maybe setting it to a high value?) or the number of neurons in the hidden layers (lowering the hidden_layer_sizes?) or even "early_stopping=True"
As a rule of thumb, the higher the capacity, the higher the degree/chance of overfitting. So yes, this could help a little bit. You probably also want to try dropout instead of L2 (or in addition), which usually has a stronger effect on regularization (esp. if you have a very large set of redundant features). Can't remember the exact paper, but I read about an approach where the authors set a max-norm constraint on the weights in combination with dropout, e.g. "||w||_2 < constant", which worked even better than dropout alone (the constant becomes another hyperparameter to tune though).
Best, Sebastian
> On Jan 9, 2017, at 1:21 PM, Jacob Schreiber wrote: > > Thomas, it can be difficult to fine tune L1/L2 regularization in the case where n_parameters >>> n_samples ~and~ n_features >> n_samples.
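Scikit-learn's MLPRegressor currently exposes neither dropout nor weight-norm constraints, so the dropout/max-norm combination mentioned above would have to be tried in another library. Purely as an illustration, and assuming a Keras 2-style API (names differ between versions), a comparable model could be sketched like this; X_train and y_train are assumed to exist:

from keras.models import Sequential
from keras.layers import Dense, Dropout
from keras.constraints import max_norm

model = Sequential()
# one small hidden layer, with dropout and a max-norm constraint on its weights
model.add(Dense(10, activation='relu', input_dim=60,
                kernel_constraint=max_norm(3.0)))
model.add(Dropout(0.5))
model.add(Dense(1))
model.compile(optimizer='adam', loss='mse')
# model.fit(X_train, y_train, epochs=200, verbose=0)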
If your samples are very similar to the training data, why are simpler models not working well? > > > > On Sun, Jan 8, 2017 at 8:08 PM, Joel Nothman wrote: > Btw, I may have been unclear in the discussion of overfitting. For *training* the meta-estimator in stacking, it's standard to do something like cross_val_predict on your training set to produce its input features. > > On 8 January 2017 at 22:42, Thomas Evangelidis wrote: > Sebastian and Jacob, > > Regarding overfitting, Lasso, Ridge regression and ElasticNet have poor performance on my data. MLPregressors are way superior. On an other note, MLPregressor class has some methods to contol overfitting, like controling the alpha parameter for the L2 regularization (maybe setting it to a high value?) or the number of neurons in the hidden layers (lowering the hidden_layer_sizes?) or even "early_stopping=True". Wouldn't these be sufficient to be on the safe side. > > Once more I want to highlight something I wrote previously but might have been overlooked. The resulting MLPRegressors will be applied to new datasets that ARE VERY SIMILAR TO THE TRAINING DATA. In other words the application of the models will be strictly confined to their applicability domain. Wouldn't that be sufficient to not worry about model overfitting too much? > > > > > > On 8 January 2017 at 11:53, Sebastian Raschka wrote: >> Like to train an SVR to combine the predictions of the top 10% MLPRegressors using the same data that were used for training of the MLPRegressors? Wouldn't that lead to overfitting? > > It could, but you don't need to use the same data that you used for training to fit the meta estimator. Like it is commonly done in stacking with cross validation, you can train the mlps on training folds and pass predictions from a test fold to the meta estimator but then you'd have to retrain your mlps and it sounded like you wanted to avoid that. > > I am currently on mobile and only browsed through the thread briefly, but I agree with others that it may sound like your model(s) may have too much capacity for such a small dataset -- can be tricky to fit the parameters without overfitting. In any case, if you to do the stacking, I'd probably insert a k-fold cv between the mlps and the meta estimator. However I'd definitely also recommend simpler models als > alternative. > > Best, > Sebastian > > On Jan 7, 2017, at 4:36 PM, Thomas Evangelidis wrote: > >> >> >> On 7 January 2017 at 21:20, Sebastian Raschka wrote: >> Hi, Thomas, >> sorry, I overread the regression part ? >> This would be a bit trickier, I am not sure what a good strategy for averaging regression outputs would be. However, if you just want to compute the average, you could do sth like >> np.mean(np.asarray([r.predict(X) for r in list_or_your_mlps])) >> >> However, it may be better to use stacking, and use the output of r.predict(X) as meta features to train a model based on these? >> >> ?Like to train an SVR to combine the predictions of the top 10% MLPRegressors using the same data that were used for training of the MLPRegressors? Wouldn't that lead to overfitting? >> ? >> >> Best, >> Sebastian >> >> > On Jan 7, 2017, at 1:49 PM, Thomas Evangelidis wrote: >> > >> > Hi Sebastian, >> > >> > Thanks, I will try it in another classification problem I have. However, this time I am using regressors not classifiers. >> > >> > On Jan 7, 2017 19:28, "Sebastian Raschka" wrote: >> > Hi, Thomas, >> > >> > the VotingClassifier can combine different models per majority voting amongst their predictions. 
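Since the capacity arguments above come down to a few MLPRegressor hyperparameters (alpha for the L2 penalty, hidden_layer_sizes for the number of weights), one hedged way to pick them is a small cross-validated grid search instead of fixing them by hand. The grid and the synthetic data below are only an example:

from sklearn.datasets import make_regression
from sklearn.model_selection import GridSearchCV
from sklearn.neural_network import MLPRegressor

X, y = make_regression(n_samples=35, n_features=60, noise=10.0, random_state=0)

param_grid = {'alpha': [0.1, 1.0, 10.0, 100.0],     # L2 regularization strength
              'hidden_layer_sizes': [(5,), (10,)]}  # network capacity
search = GridSearchCV(MLPRegressor(max_iter=2000, random_state=0),
                      param_grid, cv=5)
search.fit(X, y)
print(search.best_params_, search.best_score_)      # default score is R^2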
Unfortunately, it refits the classifiers though (after cloning them). I think we implemented it this way to make it compatible to GridSearch and so forth. However, I have a version of the estimator that you can initialize with ?refit=False? to avoid refitting if it helps. http://rasbt.github.io/mlxtend/user_guide/classifier/EnsembleVoteClassifier/#example-5-using-pre-fitted-classifiers >> > >> > Best, >> > Sebastian >> > >> > >> > >> > > On Jan 7, 2017, at 11:15 AM, Thomas Evangelidis wrote: >> > > >> > > Greetings, >> > > >> > > I have trained many MLPRegressors using different random_state value and estimated the R^2 using cross-validation. Now I want to combine the top 10% of them in how to get more accurate predictions. Is there a meta-estimator that can get as input a few precomputed MLPRegressors and give consensus predictions? Can the BaggingRegressor do this job using MLPRegressors as input? >> > > >> > > Thanks in advance for any hint. >> > > Thomas >> > > >> > > >> > > -- >> > > ====================================================================== >> > > Thomas Evangelidis >> > > Research Specialist >> > > CEITEC - Central European Institute of Technology >> > > Masaryk University >> > > Kamenice 5/A35/1S081, >> > > 62500 Brno, Czech Republic >> > > >> > > email: tevang at pharm.uoa.gr >> > > tevang3 at gmail.com >> > > >> > > website: https://sites.google.com/site/thomasevangelidishomepage/ >> > > >> > > >> > > _______________________________________________ >> > > scikit-learn mailing list >> > > scikit-learn at python.org >> > > https://mail.python.org/mailman/listinfo/scikit-learn >> > >> > _______________________________________________ >> > scikit-learn mailing list >> > scikit-learn at python.org >> > https://mail.python.org/mailman/listinfo/scikit-learn >> > _______________________________________________ >> > scikit-learn mailing list >> > scikit-learn at python.org >> > https://mail.python.org/mailman/listinfo/scikit-learn >> >> _______________________________________________ >> scikit-learn mailing list >> scikit-learn at python.org >> https://mail.python.org/mailman/listinfo/scikit-learn >> >> >> >> -- >> ====================================================================== >> Thomas Evangelidis >> Research Specialist >> CEITEC - Central European Institute of Technology >> Masaryk University >> Kamenice 5/A35/1S081, >> 62500 Brno, Czech Republic >> >> email: tevang at pharm.uoa.gr >> tevang3 at gmail.com >> >> website: https://sites.google.com/site/thomasevangelidishomepage/ >> >> >> _______________________________________________ >> scikit-learn mailing list >> scikit-learn at python.org >> https://mail.python.org/mailman/listinfo/scikit-learn > > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn > > > > > -- > ====================================================================== > Thomas Evangelidis > Research Specialist > CEITEC - Central European Institute of Technology > Masaryk University > Kamenice 5/A35/1S081, > 62500 Brno, Czech Republic > > email: tevang at pharm.uoa.gr > tevang3 at gmail.com > > website: https://sites.google.com/site/thomasevangelidishomepage/ > > > > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn > > > > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > 
https://mail.python.org/mailman/listinfo/scikit-learn > > > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn From tevang3 at gmail.com Mon Jan 9 18:40:59 2017 From: tevang3 at gmail.com (Thomas Evangelidis) Date: Tue, 10 Jan 2017 00:40:59 +0100 Subject: [scikit-learn] meta-estimator for multiple MLPRegressor In-Reply-To: References: <27CD690B-CA77-4121-8C95-9F2E52B99B95@gmail.com> <450C2C8D-86FC-4A87-B307-C5E45FE97C4B@gmail.com> Message-ID: Jacob & Sebastian, I think the best way to find out if my modeling approach works is to find a larger dataset, split it into two parts, the first one will be used as training/cross-validation set and the second as a test set, like in a real case scenario. Regarding the MLPRegressor regularization, below is my optimum setup: MLPRegressor(random_state=random_state, max_iter=400, early_stopping=True, > validation_fraction=0.2, alpha=10, hidden_layer_sizes=(10,)) This means only one hidden layer with maximum 10 neurons, alpha=10 for L2 regularization and early stopping to terminate training if validation score is not improving. I think this is a quite simple model. My final predictor is an SVR that combines 2 MLPRegressors, each one trained with different types of input data. @Sebastian You have mentioned dropout again but I could not find it in the docs: http://scikit-learn.org/stable/modules/generated/sklearn.neural_network.MLPRegressor.html#sklearn.neural_network.MLPRegressor Maybe you are referring to another MLPRegressor implementation? I have seen a while ago another implementation you had on github. Can you clarify which one you recommend and why? Thank you both of you for your hints! best Thomas -- ====================================================================== Thomas Evangelidis Research Specialist CEITEC - Central European Institute of Technology Masaryk University Kamenice 5/A35/1S081, 62500 Brno, Czech Republic email: tevang at pharm.uoa.gr tevang3 at gmail.com website: https://sites.google.com/site/thomasevangelidishomepage/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From stuart at stuartreynolds.net Mon Jan 9 19:21:09 2017 From: stuart at stuartreynolds.net (Stuart Reynolds) Date: Tue, 10 Jan 2017 00:21:09 +0000 Subject: [scikit-learn] meta-estimator for multiple MLPRegressor In-Reply-To: References: <27CD690B-CA77-4121-8C95-9F2E52B99B95@gmail.com> <450C2C8D-86FC-4A87-B307-C5E45FE97C4B@gmail.com> Message-ID: If you dont have a large dataset, you can still do leave one out cross validation. On Mon, Jan 9, 2017 at 3:42 PM Thomas Evangelidis wrote: > > Jacob & Sebastian, > > I think the best way to find out if my modeling approach works is to find > a larger dataset, split it into two parts, the first one will be used as > training/cross-validation set and the second as a test set, like in a real > case scenario. > > Regarding the MLPRegressor regularization, below is my optimum setup: > > MLPRegressor(random_state=random_state, max_iter=400, early_stopping=True, > validation_fraction=0.2, alpha=10, hidden_layer_sizes=(10,)) > > > This means only one hidden layer with maximum 10 neurons, alpha=10 for L2 > regularization and early stopping to terminate training if validation score > is not improving. I think this is a quite simple model. My final predictor > is an SVR that combines 2 MLPRegressors, each one trained with different > types of input data. 
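A sketch of the leave-one-out suggestion above, written with cross_val_predict so that every sample receives exactly one held-out prediction; the data set and the regressor settings are placeholders only:

import numpy as np
from sklearn.datasets import make_regression
from sklearn.model_selection import LeaveOneOut, cross_val_predict
from sklearn.neural_network import MLPRegressor

X, y = make_regression(n_samples=35, n_features=60, noise=10.0, random_state=0)
reg = MLPRegressor(hidden_layer_sizes=(10,), alpha=10, max_iter=2000,
                   random_state=0)

# one fit per left-out sample; all predictions are out-of-sample
y_loo = cross_val_predict(reg, X, y, cv=LeaveOneOut())
print(np.corrcoef(y, y_loo)[0, 1])   # correlation R between observed and predicted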
> > @Sebastian > You have mentioned dropout again but I could not find it in the docs: > > http://scikit-learn.org/stable/modules/generated/sklearn.neural_network.MLPRegressor.html#sklearn.neural_network.MLPRegressor > > Maybe you are referring to another MLPRegressor implementation? I have > seen a while ago another implementation you had on github. Can you clarify > which one you recommend and why? > > > Thank you both of you for your hints! > > best > Thomas > > > > -- > > > > > > > > > > > > > > > > > ====================================================================== > > > Thomas Evangelidis > > > Research Specialist > CEITEC - Central European Institute of Technology > Masaryk University > Kamenice 5/A35/1S081, > 62500 Brno, Czech Republic > > email: tevang at pharm.uoa.gr > > > tevang3 at gmail.com > > > > website: > > https://sites.google.com/site/thomasevangelidishomepage/ > > > > > > > > > > _______________________________________________ > > scikit-learn mailing list > > scikit-learn at python.org > > https://mail.python.org/mailman/listinfo/scikit-learn > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From jmschreiber91 at gmail.com Mon Jan 9 19:36:41 2017 From: jmschreiber91 at gmail.com (Jacob Schreiber) Date: Mon, 9 Jan 2017 16:36:41 -0800 Subject: [scikit-learn] meta-estimator for multiple MLPRegressor In-Reply-To: References: <27CD690B-CA77-4121-8C95-9F2E52B99B95@gmail.com> <450C2C8D-86FC-4A87-B307-C5E45FE97C4B@gmail.com> Message-ID: Even with a single layer with 10 neurons you're still trying to train over 6000 parameters using ~30 samples. Dropout is a concept common in neural networks, but doesn't appear to be in sklearn's implementation of MLPs. Early stopping based on validation performance isn't an "extra" step for reducing overfitting, it's basically a required step for neural networks. It seems like you have a validation sample of ~6 datapoints.. I'm still very skeptical of that giving you proper results for a complex model. Will this larger dataset be of exactly the same data? Just taking another unrelated dataset and showing that a MLP can learn it doesn't mean it will work for your specific data. Can you post the actual results from using LASSO, RandomForestRegressor, GradientBoostingRegressor, and MLP? On Mon, Jan 9, 2017 at 4:21 PM, Stuart Reynolds wrote: > If you dont have a large dataset, you can still do leave one out cross > validation. > > On Mon, Jan 9, 2017 at 3:42 PM Thomas Evangelidis > wrote: > >> >> Jacob & Sebastian, >> >> I think the best way to find out if my modeling approach works is to find >> a larger dataset, split it into two parts, the first one will be used as >> training/cross-validation set and the second as a test set, like in a real >> case scenario. >> >> Regarding the MLPRegressor regularization, below is my optimum setup: >> >> MLPRegressor(random_state=random_state, max_iter=400, >> early_stopping=True, validation_fraction=0.2, alpha=10, >> hidden_layer_sizes=(10,)) >> >> >> This means only one hidden layer with maximum 10 neurons, alpha=10 for L2 >> regularization and early stopping to terminate training if validation score >> is not improving. I think this is a quite simple model. My final predictor >> is an SVR that combines 2 MLPRegressors, each one trained with different >> types of input data. >> >> @Sebastian >> You have mentioned dropout again but I could not find it in the docs: >> http://scikit-learn.org/stable/modules/generated/sklearn.neural_network. 
>> MLPRegressor.html#sklearn.neural_network.MLPRegressor >> >> Maybe you are referring to another MLPRegressor implementation? I have >> seen a while ago another implementation you had on github. Can you clarify >> which one you recommend and why? >> >> >> Thank you both of you for your hints! >> >> best >> Thomas >> >> >> >> -- >> >> >> >> >> >> >> >> >> >> >> >> >> >> >> >> >> ====================================================================== >> >> >> Thomas Evangelidis >> >> >> Research Specialist >> CEITEC - Central European Institute of Technology >> Masaryk University >> Kamenice 5/A35/1S081, >> 62500 Brno, Czech Republic >> >> email: tevang at pharm.uoa.gr >> >> >> tevang3 at gmail.com >> >> >> >> website: >> >> https://sites.google.com/site/thomasevangelidishomepage/ >> >> >> >> >> >> >> >> >> >> _______________________________________________ >> >> scikit-learn mailing list >> >> scikit-learn at python.org >> >> https://mail.python.org/mailman/listinfo/scikit-learn >> >> > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From smith_r at ligo.caltech.edu Mon Jan 9 20:16:37 2017 From: smith_r at ligo.caltech.edu (Rory Smith) Date: Mon, 9 Jan 2017 17:16:37 -0800 Subject: [scikit-learn] Complex variables in Gaussian mixture models? In-Reply-To: References: <1A6E40A6-5019-44F8-BF56-EC382E8908FD@ligo.caltech.edu> Message-ID: Hi Jacob, Fredrico It should be possible to treat the problem as one of having twice as many real features, but it comes at the expense of more complex code on the user's side and extra bookkeeping that would be nice to have scikit handle under the hood. I would expect that all the tricks needed to break up a Gaussian Kernel of complex variables into real and imaginary components would be relatively simple to implement within the source code. Do you think that this is worth submitting an issue to the issue tracker? (I?m not familiar with Best, Rory > On Jan 9, 2017, at 12:43 PM, Jacob Schreiber wrote: > > I'm not too familiar with how complex values are traditionally treated, but is it possible to make the complex component a real valued component and treat it just as having twice as many features? > > On Mon, Jan 9, 2017 at 11:34 AM, Rory Smith > wrote: > Hi All, > > I?d like to set up a GMM using mixture.BayesianGaussianMixture to model a probability density of complex random variables (the learned means and covariances should also be complex valued). I wasn?t able to see any mention of how to handle complex variables in the documentation so I?m curious if it?s possible in the current implementation. > I tried the obvious thing of first generating a 1D array of complex random numbers, but I see these warning when I try and fit the array X using > > dpgmm = mixture.BayesianGaussianMixture(n_components=4, > covariance_type='full', n_init=1).fit(X) > > ~/miniconda2/lib/python2.7/site-packages/sklearn/utils/validation.py:382: ComplexWarning: Casting complex values to real discards the imaginary part > array = np.array(array, dtype=dtype, order=order, copy=copy) > > > And as might be expected from the warning, the learned means are real. > > Any advice on this problem would be greatly appreciated! 
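In the absence of native complex support, the "twice as many real features" workaround discussed in this thread takes only a few lines. This is a sketch on made-up data; the complex means are recovered afterwards from the fitted real-valued means, and the 2x2 covariance blocks of the fitted model hold the real/imaginary (co)variances.

import numpy as np
from sklearn.mixture import BayesianGaussianMixture

rng = np.random.RandomState(0)
# toy complex samples drawn around two complex centres
z = rng.randn(500) + 1j * rng.randn(500) + rng.choice([0, 3 + 3j], size=500)

# represent each complex sample as (real, imag) -> two real features
X = np.column_stack([z.real, z.imag])
dpgmm = BayesianGaussianMixture(n_components=4, covariance_type='full',
                                n_init=1).fit(X)

# complex means recovered from the real-valued ones
complex_means = dpgmm.means_[:, 0] + 1j * dpgmm.means_[:, 1]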
> > Best, > Rory > > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn > > > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn -------------- next part -------------- An HTML attachment was scrubbed... URL: From avn at mccme.ru Tue Jan 10 03:58:59 2017 From: avn at mccme.ru (avn at mccme.ru) Date: Tue, 10 Jan 2017 11:58:59 +0300 Subject: [scikit-learn] Generalized Discriminant Analysis with Kernel In-Reply-To: References: Message-ID: Hi Raga, You may try approximating your kernel using Nystroem kernel approximator (kernel_approximation.Nystroem) and then apply LDA to the transformed feature vectors. If you choose dimensionality of the target space (n_components) large enough (depending on your kernel and data), Nystroem approximator should provide sufficiently good kernel approximation for such combination to approximate GDA. Raga Markely ????? 2017-01-09 19:29: > Hello, > > I wonder if scikit-learn has implementation for generalized > discriminant analysis using kernel approach? > http://www.kernel-machines.org/papers/upload_21840_GDA.pdf > > I did some search, but couldn't find. > > Thank you, > Raga > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn From tevang3 at gmail.com Tue Jan 10 07:46:58 2017 From: tevang3 at gmail.com (Thomas Evangelidis) Date: Tue, 10 Jan 2017 13:46:58 +0100 Subject: [scikit-learn] meta-estimator for multiple MLPRegressor In-Reply-To: References: <27CD690B-CA77-4121-8C95-9F2E52B99B95@gmail.com> <450C2C8D-86FC-4A87-B307-C5E45FE97C4B@gmail.com> Message-ID: Jacob, The features are not 6000. I train 2 MLPRegressors from two types of data, both refer to the same dataset (35 molecules in total) but each one contains different type of information. The first data consist of 60 features. I tried 100 different random states and measured the average |R| using the leave-20%-out cross-validation. Below are the results from the first data: RandomForestRegressor: |R|= 0.389018243545 +- 0.252891783658 LASSO: |R|= 0.247411754937 +- 0.232325286471 GradientBoostingRegressor: |R|= 0.324483769202 +- 0.211778410841 MLPRegressor: |R|= 0.540528696597 +- 0.255714448793 The second type of data consist of 456 features. Below are the results for these too: RandomForestRegressor: |R|= 0.361562548904 +- 0.234872385318 LASSO: |R|= 3.27752711304e-16 +- 2.60800139195e-16 GradientBoostingRegressor: |R|= 0.328087138161 +- 0.229588427086 MLPRegressor: |R|= 0.455473342507 +- 0.24579081197 At the end I want to combine models created from these data types using a meta-estimator (that was my original question). The combination with the highest |R| (0.631851796403 +- 0.247911204514) was produced by an SVR that combined the best MLPRegressor from data type 1 and the best MLPRegressor from data type2: On 10 January 2017 at 01:36, Jacob Schreiber wrote: > Even with a single layer with 10 neurons you're still trying to train over > 6000 parameters using ~30 samples. Dropout is a concept common in neural > networks, but doesn't appear to be in sklearn's implementation of MLPs. > Early stopping based on validation performance isn't an "extra" step for > reducing overfitting, it's basically a required step for neural networks. 
> It seems like you have a validation sample of ~6 datapoints.. I'm still > very skeptical of that giving you proper results for a complex model. Will > this larger dataset be of exactly the same data? Just taking another > unrelated dataset and showing that a MLP can learn it doesn't mean it will > work for your specific data. Can you post the actual results from using > LASSO, RandomForestRegressor, GradientBoostingRegressor, and MLP? > > On Mon, Jan 9, 2017 at 4:21 PM, Stuart Reynolds > wrote: > >> If you dont have a large dataset, you can still do leave one out cross >> validation. >> >> On Mon, Jan 9, 2017 at 3:42 PM Thomas Evangelidis >> wrote: >> >>> >>> Jacob & Sebastian, >>> >>> I think the best way to find out if my modeling approach works is to >>> find a larger dataset, split it into two parts, the first one will be used >>> as training/cross-validation set and the second as a test set, like in a >>> real case scenario. >>> >>> Regarding the MLPRegressor regularization, below is my optimum setup: >>> >>> MLPRegressor(random_state=random_state, max_iter=400, >>> early_stopping=True, validation_fraction=0.2, alpha=10, >>> hidden_layer_sizes=(10,)) >>> >>> >>> This means only one hidden layer with maximum 10 neurons, alpha=10 for >>> L2 regularization and early stopping to terminate training if validation >>> score is not improving. I think this is a quite simple model. My final >>> predictor is an SVR that combines 2 MLPRegressors, each one trained with >>> different types of input data. >>> >>> @Sebastian >>> You have mentioned dropout again but I could not find it in the docs: >>> http://scikit-learn.org/stable/modules/generated/sklearn. >>> neural_network.MLPRegressor.html#sklearn.neural_network.MLPRegressor >>> >>> Maybe you are referring to another MLPRegressor implementation? I have >>> seen a while ago another implementation you had on github. Can you clarify >>> which one you recommend and why? >>> >>> >>> Thank you both of you for your hints! >>> >>> best >>> Thomas >>> >>> >>> >>> -- >>> >>> >>> >>> >>> >>> >>> >>> >>> >>> >>> >>> >>> >>> >>> >>> >>> ====================================================================== >>> >>> >>> Thomas Evangelidis >>> >>> >>> Research Specialist >>> CEITEC - Central European Institute of Technology >>> Masaryk University >>> Kamenice 5/A35/1S081, >>> 62500 Brno, Czech Republic >>> >>> email: tevang at pharm.uoa.gr >>> >>> >>> tevang3 at gmail.com >>> >>> >>> >>> website: >>> >>> https://sites.google.com/site/thomasevangelidishomepage/ >>> >>> >>> >>> >>> >>> >>> >>> >>> >>> _______________________________________________ >>> >>> scikit-learn mailing list >>> >>> scikit-learn at python.org >>> >>> https://mail.python.org/mailman/listinfo/scikit-learn >>> >>> >> _______________________________________________ >> scikit-learn mailing list >> scikit-learn at python.org >> https://mail.python.org/mailman/listinfo/scikit-learn >> >> > > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn > > -- ====================================================================== Thomas Evangelidis Research Specialist CEITEC - Central European Institute of Technology Masaryk University Kamenice 5/A35/1S081, 62500 Brno, Czech Republic email: tevang at pharm.uoa.gr tevang3 at gmail.com website: https://sites.google.com/site/thomasevangelidishomepage/ -------------- next part -------------- An HTML attachment was scrubbed... 
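Coming back to the Nystroem suggestion for approximating kernel (generalized) discriminant analysis a few messages above, a rough sketch of that pipeline is given below. The data set, the RBF kernel and the gamma/n_components values are arbitrary choices for illustration and would need tuning on real data.

from sklearn.datasets import load_iris
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.kernel_approximation import Nystroem
from sklearn.pipeline import make_pipeline

X, y = load_iris(return_X_y=True)

# approximate the kernel feature map, then run ordinary LDA on top of it
gda_approx = make_pipeline(
    Nystroem(kernel='rbf', gamma=0.5, n_components=100, random_state=0),
    LinearDiscriminantAnalysis())
print(gda_approx.fit(X, y).score(X, y))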
URL: From raga.markely at gmail.com Tue Jan 10 10:16:16 2017 From: raga.markely at gmail.com (Raga Markely) Date: Tue, 10 Jan 2017 10:16:16 -0500 Subject: [scikit-learn] Generalized Discriminant Analysis with Kernel Message-ID: Thank you very much for your info on Nystroem kernel approximator. I appreciate it! Best, Raga On Tue, Jan 10, 2017 at 7:47 AM, wrote: > Send scikit-learn mailing list submissions to > scikit-learn at python.org > > To subscribe or unsubscribe via the World Wide Web, visit > https://mail.python.org/mailman/listinfo/scikit-learn > or, via email, send a message with subject or body 'help' to > scikit-learn-request at python.org > > You can reach the person managing the list at > scikit-learn-owner at python.org > > When replying, please edit your Subject line so it is more specific > than "Re: Contents of scikit-learn digest..." > > Date: Tue, 10 Jan 2017 11:58:59 +0300 > From: avn at mccme.ru > To: Scikit-learn user and developer mailing list > > Subject: Re: [scikit-learn] Generalized Discriminant Analysis with > Kernel > Message-ID: > Content-Type: text/plain; charset=UTF-8; format=flowed > > Hi Raga, > > You may try approximating your kernel using Nystroem kernel approximator > (kernel_approximation.Nystroem) and then apply LDA to the transformed > feature vectors. If you choose dimensionality of the target space > (n_components) large enough (depending on your kernel and data), > Nystroem approximator should provide sufficiently good kernel > approximation for such combination to approximate GDA. > > Raga Markely ????? 2017-01-09 19:29: > > Hello, > > > > I wonder if scikit-learn has implementation for generalized > > discriminant analysis using kernel approach? > > http://www.kernel-machines.org/papers/upload_21840_GDA.pdf > > > > I did some search, but couldn't find. > > > > Thank you, > > Raga > > _______________________________________________ > > scikit-learn mailing list > > scikit-learn at python.org > > https://mail.python.org/mailman/listinfo/scikit-learn > -------------- next part -------------- An HTML attachment was scrubbed... URL: From surangakas at gmail.com Tue Jan 10 12:36:33 2017 From: surangakas at gmail.com (Suranga Kasthurirathne) Date: Tue, 10 Jan 2017 12:36:33 -0500 Subject: [scikit-learn] Specify boosting percentage using Randomoversampling? Message-ID: Hi all, I apologize - i've been looking for this answer all over the internet, and it could be that I'm not googling the right terms. For managing unbalanced datasets, Weka has SMOTE, and scikit has randomoversampling. In weka, we can ask it to boost by a given percentage (say 100%) so an undersampled class with 10 values ends up with 20 values (100% increase) after boosting. In Scikit learn, I cant seem to find a way to do this. The ramdomoversampler boosts arbitrarily. and seem to try to balance the two classes, which may not be realistic in some cases. Can anyone point me to how I can manage boosting percentage using scikit? -- Best Regards, Suranga -------------- next part -------------- An HTML attachment was scrubbed... URL: From michael.eickenberg at gmail.com Tue Jan 10 13:04:03 2017 From: michael.eickenberg at gmail.com (Michael Eickenberg) Date: Tue, 10 Jan 2017 19:04:03 +0100 Subject: [scikit-learn] Specify boosting percentage using Randomoversampling? In-Reply-To: References: Message-ID: Is maybe this contrib what you are looking for? Take a close look to see whether it does what you expect. 
http://contrib.scikit-learn.org/imbalanced-learn/auto_examples/over-sampling/plot_smote.html On Tue, Jan 10, 2017 at 6:36 PM, Suranga Kasthurirathne < surangakas at gmail.com> wrote: > > Hi all, > > I apologize - i've been looking for this answer all over the internet, and > it could be that I'm not googling the right terms. > > For managing unbalanced datasets, Weka has SMOTE, and scikit has > randomoversampling. > > In weka, we can ask it to boost by a given percentage (say 100%) so an > undersampled class with 10 values ends up with 20 values (100% increase) > after boosting. > > In Scikit learn, I cant seem to find a way to do this. The > ramdomoversampler boosts arbitrarily. and seem to try to balance the two > classes, which may not be realistic in some cases. > > Can anyone point me to how I can manage boosting percentage using scikit? > > -- > Best Regards, > Suranga > > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From g.lemaitre58 at gmail.com Tue Jan 10 13:05:49 2017 From: g.lemaitre58 at gmail.com (=?UTF-8?Q?Guillaume_Lema=C3=AEtre?=) Date: Tue, 10 Jan 2017 19:05:49 +0100 Subject: [scikit-learn] Specify boosting percentage using Randomoversampling? In-Reply-To: References: Message-ID: I will first assume that RandomOverSampling refer to imbalanced-learn API (a scikit-learn-contrib project). The parameter that you are seeking for is the ratio parameter. By default ratio='auto' which will balance the classes, as you described. The ratio can be given as a float as the ratio of the number of samples in the minority class over the number of samples in in the majority class. Check there for more info: http://contrib.scikit-learn.org/imbalanced-learn/generated/imblearn.over_sampling.RandomOverSampler.html#imblearn.over_sampling.RandomOverSampler On 10 January 2017 at 18:36, Suranga Kasthurirathne wrote: > > Hi all, > > I apologize - i've been looking for this answer all over the internet, and > it could be that I'm not googling the right terms. > > For managing unbalanced datasets, Weka has SMOTE, and scikit has > randomoversampling. > > In weka, we can ask it to boost by a given percentage (say 100%) so an > undersampled class with 10 values ends up with 20 values (100% increase) > after boosting. > > In Scikit learn, I cant seem to find a way to do this. The > ramdomoversampler boosts arbitrarily. and seem to try to balance the two > classes, which may not be realistic in some cases. > > Can anyone point me to how I can manage boosting percentage using scikit? > > -- > Best Regards, > Suranga > > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn > > -- Guillaume Lemaitre INRIA Saclay - Ile-de-France Equipe PARIETAL guillaume.lemaitre at inria.f r --- https://glemaitre.github.io/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From surangakas at gmail.com Tue Jan 10 13:24:14 2017 From: surangakas at gmail.com (Suranga Kasthurirathne) Date: Tue, 10 Jan 2017 13:24:14 -0500 Subject: [scikit-learn] Specify boosting percentage using Randomoversampling? In-Reply-To: References: Message-ID: Well actually, i'm able to answer this myself. 
Its the ratio attribute (see: http://contrib.scikit-learn.org/imbalanced-learn/generated/imblearn.over_sampling.RandomOverSampler.html ) :) :) On Tue, Jan 10, 2017 at 12:36 PM, Suranga Kasthurirathne < surangakas at gmail.com> wrote: > > Hi all, > > I apologize - i've been looking for this answer all over the internet, and > it could be that I'm not googling the right terms. > > For managing unbalanced datasets, Weka has SMOTE, and scikit has > randomoversampling. > > In weka, we can ask it to boost by a given percentage (say 100%) so an > undersampled class with 10 values ends up with 20 values (100% increase) > after boosting. > > In Scikit learn, I cant seem to find a way to do this. The > ramdomoversampler boosts arbitrarily. and seem to try to balance the two > classes, which may not be realistic in some cases. > > Can anyone point me to how I can manage boosting percentage using scikit? > > -- > Best Regards, > Suranga > -- Best Regards, Suranga -------------- next part -------------- An HTML attachment was scrubbed... URL: From stuart at stuartreynolds.net Tue Jan 10 13:47:16 2017 From: stuart at stuartreynolds.net (Stuart Reynolds) Date: Tue, 10 Jan 2017 10:47:16 -0800 Subject: [scikit-learn] meta-estimator for multiple MLPRegressor In-Reply-To: References: <27CD690B-CA77-4121-8C95-9F2E52B99B95@gmail.com> <450C2C8D-86FC-4A87-B307-C5E45FE97C4B@gmail.com> Message-ID: Thomas, Jacob's point is important -- its not the number of features that's important, its the number of free parameters. As the number of free parameters increases, the space of representable functions grows to the point where the cost function is minimized by having a single parameter explain each variable. This is true of many ML methods. In the case of a decision trees, for example you can allow each node (a free parameter) hold exactly 1 training example, and see perfect training performance. In linear methods, you can perfectly fit training data by adding additional polynomial features (for feature x_i, add x^2_i, x^3_i, x^4_i, ....) Performance on unseen data will be terrible. MLP is no different -- adding more free parameters (more flexibility to precisely model the training data) may harm more than help when it comes to unseen data performance, especially when the number of examples it small. Early stopping may help overfitting, as might dropout. The likely reasons that LASSO and GBR performed well is that they're methods that explicit manage overfitting. Perform a grid search on: - the number of hidden nodes in you MLP. - the number of iterations for both, you may find lowering values will improve performance on unseen data. On Tue, Jan 10, 2017 at 4:46 AM, Thomas Evangelidis wrote: > Jacob, > > The features are not 6000. I train 2 MLPRegressors from two types of > data, both refer to the same dataset (35 molecules in total) but each one > contains different type of information. The first data consist of 60 > features. I tried 100 different random states and measured the average |R| > using the leave-20%-out cross-validation. Below are the results from the > first data: > > RandomForestRegressor: |R|= 0.389018243545 +- 0.252891783658 > LASSO: |R|= 0.247411754937 +- 0.232325286471 > GradientBoostingRegressor: |R|= 0.324483769202 +- 0.211778410841 > MLPRegressor: |R|= 0.540528696597 +- 0.255714448793 > > The second type of data consist of 456 features. 
Below are the results for > these too: > > RandomForestRegressor: |R|= 0.361562548904 +- 0.234872385318 > LASSO: |R|= 3.27752711304e-16 +- 2.60800139195e-16 > GradientBoostingRegressor: |R|= 0.328087138161 +- 0.229588427086 > MLPRegressor: |R|= 0.455473342507 +- 0.24579081197 > > > At the end I want to combine models created from these data types using a > meta-estimator (that was my original question). The combination with the > highest |R| (0.631851796403 +- 0.247911204514) was produced by an SVR > that combined the best MLPRegressor from data type 1 and the best > MLPRegressor from data type2: > > > > > > On 10 January 2017 at 01:36, Jacob Schreiber > wrote: > >> Even with a single layer with 10 neurons you're still trying to train >> over 6000 parameters using ~30 samples. Dropout is a concept common in >> neural networks, but doesn't appear to be in sklearn's implementation of >> MLPs. Early stopping based on validation performance isn't an "extra" step >> for reducing overfitting, it's basically a required step for neural >> networks. It seems like you have a validation sample of ~6 datapoints.. I'm >> still very skeptical of that giving you proper results for a complex model. >> Will this larger dataset be of exactly the same data? Just taking another >> unrelated dataset and showing that a MLP can learn it doesn't mean it will >> work for your specific data. Can you post the actual results from using >> LASSO, RandomForestRegressor, GradientBoostingRegressor, and MLP? >> >> On Mon, Jan 9, 2017 at 4:21 PM, Stuart Reynolds < >> stuart at stuartreynolds.net> wrote: >> >>> If you dont have a large dataset, you can still do leave one out cross >>> validation. >>> >>> On Mon, Jan 9, 2017 at 3:42 PM Thomas Evangelidis >>> wrote: >>> >>>> >>>> Jacob & Sebastian, >>>> >>>> I think the best way to find out if my modeling approach works is to >>>> find a larger dataset, split it into two parts, the first one will be used >>>> as training/cross-validation set and the second as a test set, like in a >>>> real case scenario. >>>> >>>> Regarding the MLPRegressor regularization, below is my optimum setup: >>>> >>>> MLPRegressor(random_state=random_state, max_iter=400, >>>> early_stopping=True, validation_fraction=0.2, alpha=10, >>>> hidden_layer_sizes=(10,)) >>>> >>>> >>>> This means only one hidden layer with maximum 10 neurons, alpha=10 for >>>> L2 regularization and early stopping to terminate training if validation >>>> score is not improving. I think this is a quite simple model. My final >>>> predictor is an SVR that combines 2 MLPRegressors, each one trained with >>>> different types of input data. >>>> >>>> @Sebastian >>>> You have mentioned dropout again but I could not find it in the docs: >>>> http://scikit-learn.org/stable/modules/generated/sklearn.neu >>>> ral_network.MLPRegressor.html#sklearn.neural_network.MLPRegressor >>>> >>>> Maybe you are referring to another MLPRegressor implementation? I have >>>> seen a while ago another implementation you had on github. Can you clarify >>>> which one you recommend and why? >>>> >>>> >>>> Thank you both of you for your hints! 
>>>> >>>> best >>>> Thomas >>>> >>>> >>>> >>>> -- >>>> >>>> >>>> >>>> >>>> >>>> >>>> >>>> >>>> >>>> >>>> >>>> >>>> >>>> >>>> >>>> >>>> ====================================================================== >>>> >>>> >>>> Thomas Evangelidis >>>> >>>> >>>> Research Specialist >>>> CEITEC - Central European Institute of Technology >>>> Masaryk University >>>> Kamenice 5/A35/1S081, >>>> 62500 Brno, Czech Republic >>>> >>>> email: tevang at pharm.uoa.gr >>>> >>>> >>>> tevang3 at gmail.com >>>> >>>> >>>> >>>> website: >>>> >>>> https://sites.google.com/site/thomasevangelidishomepage/ >>>> >>>> >>>> >>>> >>>> >>>> >>>> >>>> >>>> >>>> _______________________________________________ >>>> >>>> scikit-learn mailing list >>>> >>>> scikit-learn at python.org >>>> >>>> https://mail.python.org/mailman/listinfo/scikit-learn >>>> >>>> >>> _______________________________________________ >>> scikit-learn mailing list >>> scikit-learn at python.org >>> https://mail.python.org/mailman/listinfo/scikit-learn >>> >>> >> >> _______________________________________________ >> scikit-learn mailing list >> scikit-learn at python.org >> https://mail.python.org/mailman/listinfo/scikit-learn >> >> > > > -- > > ====================================================================== > > Thomas Evangelidis > > Research Specialist > CEITEC - Central European Institute of Technology > Masaryk University > Kamenice 5/A35/1S081, > 62500 Brno, Czech Republic > > email: tevang at pharm.uoa.gr > > tevang3 at gmail.com > > > website: https://sites.google.com/site/thomasevangelidishomepage/ > > > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From tevang3 at gmail.com Tue Jan 10 14:47:23 2017 From: tevang3 at gmail.com (Thomas Evangelidis) Date: Tue, 10 Jan 2017 20:47:23 +0100 Subject: [scikit-learn] meta-estimator for multiple MLPRegressor In-Reply-To: References: <27CD690B-CA77-4121-8C95-9F2E52B99B95@gmail.com> <450C2C8D-86FC-4A87-B307-C5E45FE97C4B@gmail.com> Message-ID: Stuart, I didn't see LASSO performing well, especially with the second type of data. The alpha parameter probably needs adjustment with LassoCV. I don't know if you have read my previous messages on this thread, so I quote again my setting for MLPRegressor. MLPRegressor(random_state=random_state, max_iter=400, early_stopping=True, validation_fraction=0.2, alpha=10, hidden_layer_sizes=(10,)) So to sum up, I must select the lowest possible value for the following parameters: * max_iter * hidden_layer_sizes (lower than 10?) * number of features in my training data. I.e. the first type of data that consisted of 60 features are preferable from that second that consisted of 456. Is this correct? On 10 January 2017 at 19:47, Stuart Reynolds wrote: > Thomas, > Jacob's point is important -- its not the number of features that's > important, its the number of free parameters. As the number of free > parameters increases, the space of representable functions grows to the > point where the cost function is minimized by having a single parameter > explain each variable. This is true of many ML methods. > > In the case of a decision trees, for example you can allow each node (a > free parameter) hold exactly 1 training example, and see perfect training > performance. 
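That decision-tree point is easy to check numerically; the data below are random stand-ins of roughly the size discussed in this thread, not anything from it.

import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeRegressor

rng = np.random.RandomState(0)
X = rng.rand(35, 60)
y = rng.rand(35)

tree = DecisionTreeRegressor(random_state=0)     # grows until every leaf is pure
print(tree.fit(X, y).score(X, y))                # 1.0: perfect training R^2
print(cross_val_score(tree, X, y, cv=5).mean())  # typically near zero or negative on held-out folds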
In linear methods, you can perfectly fit training data by > adding additional polynomial features (for feature x_i, add x^2_i, x^3_i, > x^4_i, ....) Performance on unseen data will be terrible. > MLP is no different -- adding more free parameters (more flexibility to > precisely model the training data) may harm more than help when it comes to > unseen data performance, especially when the number of examples it small. > > Early stopping may help overfitting, as might dropout. > > The likely reasons that LASSO and GBR performed well is that they're > methods that explicit manage overfitting. > > Perform a grid search on: > - the number of hidden nodes in you MLP. > - the number of iterations > > for both, you may find lowering values will improve performance on unseen > data. > > > > > > > > > > On Tue, Jan 10, 2017 at 4:46 AM, Thomas Evangelidis > wrote: > >> Jacob, >> >> The features are not 6000. I train 2 MLPRegressors from two types of >> data, both refer to the same dataset (35 molecules in total) but each >> one contains different type of information. The first data consist of 60 >> features. I tried 100 different random states and measured the average |R| >> using the leave-20%-out cross-validation. Below are the results from the >> first data: >> >> RandomForestRegressor: |R|= 0.389018243545 +- 0.252891783658 >> LASSO: |R|= 0.247411754937 +- 0.232325286471 >> GradientBoostingRegressor: |R|= 0.324483769202 +- 0.211778410841 >> MLPRegressor: |R|= 0.540528696597 +- 0.255714448793 >> >> The second type of data consist of 456 features. Below are the results >> for these too: >> >> RandomForestRegressor: |R|= 0.361562548904 +- 0.234872385318 >> LASSO: |R|= 3.27752711304e-16 +- 2.60800139195e-16 >> GradientBoostingRegressor: |R|= 0.328087138161 +- 0.229588427086 >> MLPRegressor: |R|= 0.455473342507 +- 0.24579081197 >> >> >> At the end I want to combine models created from these data types using a >> meta-estimator (that was my original question). The combination with the >> highest |R| (0.631851796403 +- 0.247911204514) was produced by an SVR >> that combined the best MLPRegressor from data type 1 and the best >> MLPRegressor from data type2: >> >> >> >> >> >> On 10 January 2017 at 01:36, Jacob Schreiber >> wrote: >> >>> Even with a single layer with 10 neurons you're still trying to train >>> over 6000 parameters using ~30 samples. Dropout is a concept common in >>> neural networks, but doesn't appear to be in sklearn's implementation of >>> MLPs. Early stopping based on validation performance isn't an "extra" step >>> for reducing overfitting, it's basically a required step for neural >>> networks. It seems like you have a validation sample of ~6 datapoints.. I'm >>> still very skeptical of that giving you proper results for a complex model. >>> Will this larger dataset be of exactly the same data? Just taking another >>> unrelated dataset and showing that a MLP can learn it doesn't mean it will >>> work for your specific data. Can you post the actual results from using >>> LASSO, RandomForestRegressor, GradientBoostingRegressor, and MLP? >>> >>> On Mon, Jan 9, 2017 at 4:21 PM, Stuart Reynolds < >>> stuart at stuartreynolds.net> wrote: >>> >>>> If you dont have a large dataset, you can still do leave one out cross >>>> validation. 
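The leave-one-out suggestion quoted above can be written directly with the model_selection utilities; the estimator and data here are placeholders rather than the models from this thread.

import numpy as np
from scipy.stats import pearsonr
from sklearn.model_selection import LeaveOneOut, cross_val_predict
from sklearn.svm import SVR

X = np.random.rand(35, 60)   # placeholder data
y = np.random.rand(35)

# each sample is predicted by a model fitted on the remaining 34 samples
y_pred = cross_val_predict(SVR(), X, y, cv=LeaveOneOut())
print(abs(pearsonr(y, y_pred)[0]))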
>>>> >>>> On Mon, Jan 9, 2017 at 3:42 PM Thomas Evangelidis >>>> wrote: >>>> >>>>> >>>>> Jacob & Sebastian, >>>>> >>>>> I think the best way to find out if my modeling approach works is to >>>>> find a larger dataset, split it into two parts, the first one will be used >>>>> as training/cross-validation set and the second as a test set, like in a >>>>> real case scenario. >>>>> >>>>> Regarding the MLPRegressor regularization, below is my optimum setup: >>>>> >>>>> MLPRegressor(random_state=random_state, max_iter=400, >>>>> early_stopping=True, validation_fraction=0.2, alpha=10, >>>>> hidden_layer_sizes=(10,)) >>>>> >>>>> >>>>> This means only one hidden layer with maximum 10 neurons, alpha=10 for >>>>> L2 regularization and early stopping to terminate training if validation >>>>> score is not improving. I think this is a quite simple model. My final >>>>> predictor is an SVR that combines 2 MLPRegressors, each one trained with >>>>> different types of input data. >>>>> >>>>> @Sebastian >>>>> You have mentioned dropout again but I could not find it in the docs: >>>>> http://scikit-learn.org/stable/modules/generated/sklearn.neu >>>>> ral_network.MLPRegressor.html#sklearn.neural_network.MLPRegressor >>>>> >>>>> Maybe you are referring to another MLPRegressor implementation? I have >>>>> seen a while ago another implementation you had on github. Can you clarify >>>>> which one you recommend and why? >>>>> >>>>> >>>>> Thank you both of you for your hints! >>>>> >>>>> best >>>>> Thomas >>>>> >>>>> >>>>> >>>>> -- >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> ====================================================================== >>>>> >>>>> >>>>> Thomas Evangelidis >>>>> >>>>> >>>>> Research Specialist >>>>> CEITEC - Central European Institute of Technology >>>>> Masaryk University >>>>> Kamenice 5/A35/1S081, >>>>> 62500 Brno, Czech Republic >>>>> >>>>> email: tevang at pharm.uoa.gr >>>>> >>>>> >>>>> tevang3 at gmail.com >>>>> >>>>> >>>>> >>>>> website: >>>>> >>>>> https://sites.google.com/site/thomasevangelidishomepage/ >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> _______________________________________________ >>>>> >>>>> scikit-learn mailing list >>>>> >>>>> scikit-learn at python.org >>>>> >>>>> https://mail.python.org/mailman/listinfo/scikit-learn >>>>> >>>>> >>>> _______________________________________________ >>>> scikit-learn mailing list >>>> scikit-learn at python.org >>>> https://mail.python.org/mailman/listinfo/scikit-learn >>>> >>>> >>> >>> _______________________________________________ >>> scikit-learn mailing list >>> scikit-learn at python.org >>> https://mail.python.org/mailman/listinfo/scikit-learn >>> >>> >> >> >> -- >> >> ====================================================================== >> >> Thomas Evangelidis >> >> Research Specialist >> CEITEC - Central European Institute of Technology >> Masaryk University >> Kamenice 5/A35/1S081, >> 62500 Brno, Czech Republic >> >> email: tevang at pharm.uoa.gr >> >> tevang3 at gmail.com >> >> >> website: https://sites.google.com/site/thomasevangelidishomepage/ >> >> >> _______________________________________________ >> scikit-learn mailing list >> scikit-learn at python.org >> https://mail.python.org/mailman/listinfo/scikit-learn >> >> > > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn > > -- 
====================================================================== Thomas Evangelidis Research Specialist CEITEC - Central European Institute of Technology Masaryk University Kamenice 5/A35/1S081, 62500 Brno, Czech Republic email: tevang at pharm.uoa.gr tevang3 at gmail.com website: https://sites.google.com/site/thomasevangelidishomepage/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From t3kcit at gmail.com Wed Jan 11 11:43:08 2017 From: t3kcit at gmail.com (Andreas Mueller) Date: Wed, 11 Jan 2017 11:43:08 -0500 Subject: [scikit-learn] Preparing a scikit-learn 0.18.2 bugfix release In-Reply-To: <20170109151546.GM2802991@phare.normalesup.org> References: <20170109151546.GM2802991@phare.normalesup.org> Message-ID: On 01/09/2017 10:15 AM, Gael Varoquaux wrote: >> instead of setting up a roadmap I would rather just identify bugs that >> are blockers and fix only those and don't wait for any feature before >> cutting 0.19.X. > I agree with the sentiment, but this would mess with our deprecation cycle. If we release now, and then release again soonish, that means people have less calendar time to react to deprecations. We could either accept this or change all deprecations and bump the removal by a version? From t3kcit at gmail.com Wed Jan 11 11:48:01 2017 From: t3kcit at gmail.com (Andreas Mueller) Date: Wed, 11 Jan 2017 11:48:01 -0500 Subject: [scikit-learn] Preparing a scikit-learn 0.18.2 bugfix release In-Reply-To: References: Message-ID: On 01/09/2017 09:43 AM, Olivier Grisel wrote: > In retrospect, making a small 0.19 release is probably a good idea. > > I would like to get > https://github.com/scikit-learn/scikit-learn/pull/8002 in before > cutting the 0.19.X branch. > Either way, I consider these two blocking for any kind of release: https://github.com/scikit-learn/scikit-learn/pull/7356 https://github.com/scikit-learn/scikit-learn/pull/6727 I have to write three grants in the next ~three weeks and start my first lecture. Don't count on me too much until mid-Feb. From se.raschka at gmail.com Wed Jan 11 15:16:22 2017 From: se.raschka at gmail.com (Sebastian Raschka) Date: Wed, 11 Jan 2017 15:16:22 -0500 Subject: [scikit-learn] meta-estimator for multiple MLPRegressor In-Reply-To: References: <27CD690B-CA77-4121-8C95-9F2E52B99B95@gmail.com> <450C2C8D-86FC-4A87-B307-C5E45FE97C4B@gmail.com> Message-ID: <4FB27AB1-1DFB-4D0C-A29A-405AF30B65AE@gmail.com> Hi, Thomas, I was just reading through a recent preprint (Protein-Ligand Scoring with Convolutional Neural Networks, https://arxiv.org/abs/1612.02751), and I thought that may be related to your task and maybe interesting or even useful for your work. Also check out references 13, 21, 22, and 24, where they talk about alternative (the more classic) representations of protein-ligand complexes or interactions as inputs to either random forests or multi-layer perceptrons. Best, Sebastian > On Jan 10, 2017, at 7:46 AM, Thomas Evangelidis wrote: > > Jacob, > > The features are not 6000. I train 2 MLPRegressors from two types of data, both refer to the same dataset (35 molecules in total) but each one contains different type of information. The first data consist of 60 features. I tried 100 different random states and measured the average |R| using the leave-20%-out cross-validation. 
Below are the results from the first data: > > RandomForestRegressor: |R|= 0.389018243545 +- 0.252891783658 > LASSO: |R|= 0.247411754937 +- 0.232325286471 > GradientBoostingRegressor: |R|= 0.324483769202 +- 0.211778410841 > MLPRegressor: |R|= 0.540528696597 +- 0.255714448793 > > The second type of data consist of 456 features. Below are the results for these too: > > RandomForestRegressor: |R|= 0.361562548904 +- 0.234872385318 > LASSO: |R|= 3.27752711304e-16 +- 2.60800139195e-16 > GradientBoostingRegressor: |R|= 0.328087138161 +- 0.229588427086 > MLPRegressor: |R|= 0.455473342507 +- 0.24579081197 > > > At the end I want to combine models created from these data types using a meta-estimator (that was my original question). The combination with the highest |R| (0.631851796403 +- 0.247911204514) was produced by an SVR that combined the best MLPRegressor from data type 1 and the best MLPRegressor from data type2: > > > > > > On 10 January 2017 at 01:36, Jacob Schreiber wrote: > Even with a single layer with 10 neurons you're still trying to train over 6000 parameters using ~30 samples. Dropout is a concept common in neural networks, but doesn't appear to be in sklearn's implementation of MLPs. Early stopping based on validation performance isn't an "extra" step for reducing overfitting, it's basically a required step for neural networks. It seems like you have a validation sample of ~6 datapoints.. I'm still very skeptical of that giving you proper results for a complex model. Will this larger dataset be of exactly the same data? Just taking another unrelated dataset and showing that a MLP can learn it doesn't mean it will work for your specific data. Can you post the actual results from using LASSO, RandomForestRegressor, GradientBoostingRegressor, and MLP? > > On Mon, Jan 9, 2017 at 4:21 PM, Stuart Reynolds wrote: > If you dont have a large dataset, you can still do leave one out cross validation. > > On Mon, Jan 9, 2017 at 3:42 PM Thomas Evangelidis wrote: > > Jacob & Sebastian, > > I think the best way to find out if my modeling approach works is to find a larger dataset, split it into two parts, the first one will be used as training/cross-validation set and the second as a test set, like in a real case scenario. > > Regarding the MLPRegressor regularization, below is my optimum setup: > > MLPRegressor(random_state=random_state, max_iter=400, early_stopping=True, validation_fraction=0.2, alpha=10, hidden_layer_sizes=(10,)) > > This means only one hidden layer with maximum 10 neurons, alpha=10 for L2 regularization and early stopping to terminate training if validation score is not improving. I think this is a quite simple model. My final predictor is an SVR that combines 2 MLPRegressors, each one trained with different types of input data. > > @Sebastian > You have mentioned dropout again but I could not find it in the docs: > http://scikit-learn.org/stable/modules/generated/sklearn.neural_network.MLPRegressor.html#sklearn.neural_network.MLPRegressor > > Maybe you are referring to another MLPRegressor implementation? I have seen a while ago another implementation you had on github. Can you clarify which one you recommend and why? > > > Thank you both of you for your hints! 
> > best > Thomas > > > > -- > > > > > > > > > > > > > > > > > ====================================================================== > > > Thomas Evangelidis > > > Research Specialist > CEITEC - Central European Institute of Technology > Masaryk University > Kamenice 5/A35/1S081, > 62500 Brno, Czech Republic > > email: tevang at pharm.uoa.gr > > > tevang3 at gmail.com > > > > website: > > https://sites.google.com/site/thomasevangelidishomepage/ > > > > > > > > > > _______________________________________________ > > scikit-learn mailing list > > scikit-learn at python.org > > https://mail.python.org/mailman/listinfo/scikit-learn > > > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn > > > > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn > > > > > -- > ====================================================================== > Thomas Evangelidis > Research Specialist > CEITEC - Central European Institute of Technology > Masaryk University > Kamenice 5/A35/1S081, > 62500 Brno, Czech Republic > > email: tevang at pharm.uoa.gr > tevang3 at gmail.com > > website: https://sites.google.com/site/thomasevangelidishomepage/ > > > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn From joel.nothman at gmail.com Wed Jan 11 16:41:51 2017 From: joel.nothman at gmail.com (Joel Nothman) Date: Thu, 12 Jan 2017 08:41:51 +1100 Subject: [scikit-learn] Preparing a scikit-learn 0.18.2 bugfix release In-Reply-To: References: <20170109151546.GM2802991@phare.normalesup.org> Message-ID: When the two versions deprecation policy was instituted, releases were much more frequent... Is that enough of an excuse? On 12 January 2017 at 03:43, Andreas Mueller wrote: > > > On 01/09/2017 10:15 AM, Gael Varoquaux wrote: > >> instead of setting up a roadmap I would rather just identify bugs that >>> are blockers and fix only those and don't wait for any feature before >>> cutting 0.19.X. >>> >> >> I agree with the sentiment, but this would mess with our deprecation > cycle. > If we release now, and then release again soonish, that means people have > less calendar time > to react to deprecations. > > We could either accept this or change all deprecations and bump the > removal by a version? > > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn > -------------- next part -------------- An HTML attachment was scrubbed... URL: From gael.varoquaux at normalesup.org Wed Jan 11 16:51:15 2017 From: gael.varoquaux at normalesup.org (Gael Varoquaux) Date: Wed, 11 Jan 2017 22:51:15 +0100 Subject: [scikit-learn] Preparing a scikit-learn 0.18.2 bugfix release In-Reply-To: References: <20170109151546.GM2802991@phare.normalesup.org> Message-ID: <20170111215115.GO1585067@phare.normalesup.org> On Thu, Jan 12, 2017 at 08:41:51AM +1100, Joel Nothman wrote: > When the two versions deprecation policy was instituted, releases were much > more frequent... Is that enough of an excuse? I'd rather say that we can here decide that we are giving a longer grace period. 
I think that slow deprecations are a good things (see titus's blog post here: http://ivory.idyll.org/blog/2017-pof-software-archivability.html ) G > On 12 January 2017 at 03:43, Andreas Mueller wrote: > On 01/09/2017 10:15 AM, Gael Varoquaux wrote: > instead of setting up a roadmap I would rather just identify bugs > that > are blockers and fix only those and don't wait for any feature > before > cutting 0.19.X. > I agree with the sentiment, but this would mess with our deprecation cycle. > If we release now, and then release again soonish, that means people have > less calendar time > to react to deprecations. > We could either accept this or change all deprecations and bump the removal > by a version? > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn -- Gael Varoquaux Researcher, INRIA Parietal NeuroSpin/CEA Saclay , Bat 145, 91191 Gif-sur-Yvette France Phone: ++ 33-1-69-08-79-68 http://gael-varoquaux.info http://twitter.com/GaelVaroquaux From ismaelfm_ at ciencias.unam.mx Thu Jan 12 11:47:20 2017 From: ismaelfm_ at ciencias.unam.mx (=?UTF-8?B?Sm9zw6kgSXNtYWVsIEZlcm7DoW5kZXogTWFydMOtbmV6?=) Date: Thu, 12 Jan 2017 10:47:20 -0600 Subject: [scikit-learn] Roc curve from multilabel classification has slope In-Reply-To: <587205EC.6060402@gmail.com> References: <6EEF6426-91D8-40D1-8FB8-E2F10D0327CA@ciencias.unam.mx> <587205EC.6060402@gmail.com> Message-ID: That's indeed the case, there are ties in my predictions. In response to "plotting one ROC curve for every class in your result", it's also part of my analysis. Thank you very much. Ismael 2017-01-08 3:27 GMT-06:00 Roman Yurchak : > Jos?, I might be misunderstanding something, but wouldn't it make more > sens to plot one ROC curve for every class in your result (using all > samples at once), as opposed to plotting it for every training sample as > you are doing now? Cf the example below, > > http://scikit-learn.org/stable/auto_examples/model_selection/plot_roc.html > > Roman > > On 08/01/17 01:42, Jacob Schreiber wrote: > > Slope usually means there are ties in your predictions. Check your > > dataset to see if you have repeated predicted values (possibly 1 or 0). > > > > On Sat, Jan 7, 2017 at 4:32 PM, Jos? Ismael Fern?ndez Mart?nez > > > wrote: > > > > But is not a scikit-learn classifier, is a keras classifier which, > > in the functional API, predict returns probabilities. > > What I don't understand is why my plot of the roc curve has a slope, > > since I call roc_curve passing the actual label as y_true and the > > output of the classifier (score probabilities) as y_score for every > > element tested. > > > > > > > > Sent from my iPhone > > On Jan 7, 2017, at 4:04 PM, Joel Nothman > > wrote: > > > >> predict method should not return probabilities in scikit-learn > >> classifiers. predict_proba should. > >> > >> On 8 January 2017 at 07:52, Jos? Ismael Fern?ndez Mart?nez > >> > > >> wrote: > >> > >> Hi, I have a multilabel classifier written in Keras from which > >> I want to compute AUC and plot a ROC curve for every element > >> classified from my test set. 
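Regarding the tied-predictions explanation given earlier in this thread, the effect is easy to reproduce: roc_curve keeps at most one point per distinct score value, so a block of tied scores containing both positives and negatives is drawn as a single sloped segment. The numbers below are made up purely for illustration.

import numpy as np
from sklearn.metrics import roc_curve

y_true  = np.array([0, 0, 1, 1, 0, 1])
y_score = np.array([0.1, 0.5, 0.5, 0.5, 0.5, 0.9])  # several predictions tied at 0.5

fpr, tpr, thresholds = roc_curve(y_true, y_score)
# the tied block of mixed positives and negatives collapses to one ROC point,
# so plotting fpr vs. tpr joins across it with a diagonal (sloped) segment
print(thresholds)
print(fpr, tpr)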
> >> > >> > >> > >> Everything seems fine, except that some elements have a roc > >> curve that have a slope as follows: > >> > >> enter image description here > >> I don't know how to > >> interpret the slope in such cases. > >> > >> Basically my workflow goes as follows, I have a > >> pre-trained |model|, instance of Keras, and I have the > >> features |X| and the binarized labels |y|, every element > >> in |y| is an array of length 1000, as it is a multilabel > >> classification problem each element in |y| might contain many > >> 1s, indicating that the element belongs to multiples classes, > >> so I used the built-in loss of |binary_crossentropy| and my > >> outputs of the model prediction are score probailities. Then I > >> plot the roc curve as follows. > >> > >> > >> The predict method returns probabilities, as I'm using the > >> functional api of keras. > >> > >> Does anyone knows why my roc curves looks like this? > >> > >> > >> Ismael > >> > >> > >> > >> Sent from my iPhone > >> > >> _______________________________________________ > >> scikit-learn mailing list > >> scikit-learn at python.org > >> https://mail.python.org/mailman/listinfo/scikit-learn > >> > >> > >> > >> _______________________________________________ > >> scikit-learn mailing list > >> scikit-learn at python.org > >> https://mail.python.org/mailman/listinfo/scikit-learn > >> > > > > _______________________________________________ > > scikit-learn mailing list > > scikit-learn at python.org > > https://mail.python.org/mailman/listinfo/scikit-learn > > > > > > > > > > > > _______________________________________________ > > scikit-learn mailing list > > scikit-learn at python.org > > https://mail.python.org/mailman/listinfo/scikit-learn > > > > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn > -------------- next part -------------- An HTML attachment was scrubbed... URL: From maciek at wojcikowski.pl Mon Jan 16 09:57:05 2017 From: maciek at wojcikowski.pl (=?UTF-8?Q?Maciek_W=C3=B3jcikowski?=) Date: Mon, 16 Jan 2017 15:57:05 +0100 Subject: [scikit-learn] meta-estimator for multiple MLPRegressor In-Reply-To: <4FB27AB1-1DFB-4D0C-A29A-405AF30B65AE@gmail.com> References: <27CD690B-CA77-4121-8C95-9F2E52B99B95@gmail.com> <450C2C8D-86FC-4A87-B307-C5E45FE97C4B@gmail.com> <4FB27AB1-1DFB-4D0C-A29A-405AF30B65AE@gmail.com> Message-ID: Hi Thomas, An example os such "dummy" meta-regressor can be seen in NNScore, which is protein-ligand scoring function (one of Sebastian's suggestions). A meta-class is implemented in Open Drug Discovery Toolkit [here: https://github.com/oddt/oddt/blob/master/oddt/scoring/__init__.py#L200], along with also suggested RF-Score and few other methods you might find useful. Actually, what NNScore does it train 1000 MLPRegressors and pick 20 best scored on PDBbind test set. An ensemble prediction is mean prediction of those best models. ---- Pozdrawiam, | Best regards, Maciek W?jcikowski maciek at wojcikowski.pl 2017-01-11 21:16 GMT+01:00 Sebastian Raschka : > Hi, Thomas, > > I was just reading through a recent preprint (Protein-Ligand Scoring with > Convolutional Neural Networks, https://arxiv.org/abs/1612.02751), and I > thought that may be related to your task and maybe interesting or even > useful for your work. 
> Also check out references 13, 21, 22, and 24, where they talk about > alternative (the more classic) representations of protein-ligand complexes > or interactions as inputs to either random forests or multi-layer > perceptrons. > > Best, > Sebastian > > > > On Jan 10, 2017, at 7:46 AM, Thomas Evangelidis > wrote: > > > > Jacob, > > > > The features are not 6000. I train 2 MLPRegressors from two types of > data, both refer to the same dataset (35 molecules in total) but each one > contains different type of information. The first data consist of 60 > features. I tried 100 different random states and measured the average |R| > using the leave-20%-out cross-validation. Below are the results from the > first data: > > > > RandomForestRegressor: |R|= 0.389018243545 +- 0.252891783658 > > LASSO: |R|= 0.247411754937 +- 0.232325286471 > > GradientBoostingRegressor: |R|= 0.324483769202 +- 0.211778410841 > > MLPRegressor: |R|= 0.540528696597 +- 0.255714448793 > > > > The second type of data consist of 456 features. Below are the results > for these too: > > > > RandomForestRegressor: |R|= 0.361562548904 +- 0.234872385318 > > LASSO: |R|= 3.27752711304e-16 +- 2.60800139195e-16 > > GradientBoostingRegressor: |R|= 0.328087138161 +- 0.229588427086 > > MLPRegressor: |R|= 0.455473342507 +- 0.24579081197 > > > > > > At the end I want to combine models created from these data types using > a meta-estimator (that was my original question). The combination with the > highest |R| (0.631851796403 +- 0.247911204514) was produced by an SVR that > combined the best MLPRegressor from data type 1 and the best MLPRegressor > from data type2: > > > > > > > > > > > > On 10 January 2017 at 01:36, Jacob Schreiber > wrote: > > Even with a single layer with 10 neurons you're still trying to train > over 6000 parameters using ~30 samples. Dropout is a concept common in > neural networks, but doesn't appear to be in sklearn's implementation of > MLPs. Early stopping based on validation performance isn't an "extra" step > for reducing overfitting, it's basically a required step for neural > networks. It seems like you have a validation sample of ~6 datapoints.. I'm > still very skeptical of that giving you proper results for a complex model. > Will this larger dataset be of exactly the same data? Just taking another > unrelated dataset and showing that a MLP can learn it doesn't mean it will > work for your specific data. Can you post the actual results from using > LASSO, RandomForestRegressor, GradientBoostingRegressor, and MLP? > > > > On Mon, Jan 9, 2017 at 4:21 PM, Stuart Reynolds < > stuart at stuartreynolds.net> wrote: > > If you dont have a large dataset, you can still do leave one out cross > validation. > > > > On Mon, Jan 9, 2017 at 3:42 PM Thomas Evangelidis > wrote: > > > > Jacob & Sebastian, > > > > I think the best way to find out if my modeling approach works is to > find a larger dataset, split it into two parts, the first one will be used > as training/cross-validation set and the second as a test set, like in a > real case scenario. > > > > Regarding the MLPRegressor regularization, below is my optimum setup: > > > > MLPRegressor(random_state=random_state, max_iter=400, > early_stopping=True, validation_fraction=0.2, alpha=10, > hidden_layer_sizes=(10,)) > > > > This means only one hidden layer with maximum 10 neurons, alpha=10 for > L2 regularization and early stopping to terminate training if validation > score is not improving. I think this is a quite simple model. 
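The combination described in this thread, an SVR fitted on the outputs of two MLPRegressors trained on different feature sets, might be sketched roughly as below. X1, X2 and y are hypothetical arrays, and since the thread does not spell out how the base-model predictions are generated, out-of-fold predictions via cross_val_predict are used here as one way to avoid training the combiner on leaked fits.

import numpy as np
from sklearn.model_selection import KFold, cross_val_predict
from sklearn.neural_network import MLPRegressor
from sklearn.svm import SVR

X1 = np.random.rand(35, 60)    # hypothetical first feature set
X2 = np.random.rand(35, 456)   # hypothetical second feature set
y = np.random.rand(35)

def base_model(seed):
    return MLPRegressor(hidden_layer_sizes=(10,), alpha=10,
                        early_stopping=True, validation_fraction=0.2,
                        max_iter=400, random_state=seed)

cv = KFold(n_splits=5, shuffle=True, random_state=0)
# out-of-fold predictions from each base model become the two meta-features
meta_features = np.column_stack([
    cross_val_predict(base_model(0), X1, y, cv=cv),
    cross_val_predict(base_model(1), X2, y, cv=cv),
])

combiner = SVR().fit(meta_features, y)
print(combiner.predict(meta_features[:5]))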
My final > predictor is an SVR that combines 2 MLPRegressors, each one trained with > different types of input data. > > > > @Sebastian > > You have mentioned dropout again but I could not find it in the docs: > > http://scikit-learn.org/stable/modules/generated/sklearn.neural_network. > MLPRegressor.html#sklearn.neural_network.MLPRegressor > > > > Maybe you are referring to another MLPRegressor implementation? I have > seen a while ago another implementation you had on github. Can you clarify > which one you recommend and why? > > > > > > Thank you both of you for your hints! > > > > best > > Thomas > > > > > > > > -- > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > ====================================================================== > > > > > > Thomas Evangelidis > > > > > > Research Specialist > > CEITEC - Central European Institute of Technology > > Masaryk University > > Kamenice 5/A35/1S081, > > 62500 Brno, Czech Republic > > > > email: tevang at pharm.uoa.gr > > > > > > tevang3 at gmail.com > > > > > > > > website: > > > > https://sites.google.com/site/thomasevangelidishomepage/ > > > > > > > > > > > > > > > > > > > > _______________________________________________ > > > > scikit-learn mailing list > > > > scikit-learn at python.org > > > > https://mail.python.org/mailman/listinfo/scikit-learn > > > > > > _______________________________________________ > > scikit-learn mailing list > > scikit-learn at python.org > > https://mail.python.org/mailman/listinfo/scikit-learn > > > > > > > > _______________________________________________ > > scikit-learn mailing list > > scikit-learn at python.org > > https://mail.python.org/mailman/listinfo/scikit-learn > > > > > > > > > > -- > > ====================================================================== > > Thomas Evangelidis > > Research Specialist > > CEITEC - Central European Institute of Technology > > Masaryk University > > Kamenice 5/A35/1S081, > > 62500 Brno, Czech Republic > > > > email: tevang at pharm.uoa.gr > > tevang3 at gmail.com > > > > website: https://sites.google.com/site/thomasevangelidishomepage/ > > > > > > _______________________________________________ > > scikit-learn mailing list > > scikit-learn at python.org > > https://mail.python.org/mailman/listinfo/scikit-learn > > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn > -------------- next part -------------- An HTML attachment was scrubbed... URL: From max.linke88 at gmail.com Mon Jan 16 10:23:05 2017 From: max.linke88 at gmail.com (Max Linke) Date: Mon, 16 Jan 2017 16:23:05 +0100 Subject: [scikit-learn] GSOC 2017: NumFOCUS will be an umbrella organization Message-ID: <40f85dfb-0635-a042-79b7-039d3dd347a9@gmail.com> Hi Organizations can start submitting applications for Google Summer of Code 2017 on January 19 (and the deadline is February 9) https://developers.google.com/open-source/gsoc/timeline?hl=en NumFOCUS will be applying again this year. If you want to work with us please let me know and if you apply as an organization yourself or under a different umbrella organization please tell me as well. If you participate with us it would be great if you start to add possible projects to the ideas page on github soon. We some general information for mentors on github. https://github.com/numfocus/gsoc/blob/master/CONTRIBUTING-mentors.md We also have a template for ideas that might help. It lists the things Google likes to see. 
https://github.com/numfocus/gsoc/blob/master/2017/ideas-list-skeleton.md In case you participated in earlier years with NumFOCUS there are some small changes this year. Raniere won't be the admin this year. Instead I'm going to be the admin. We are also planning to include two explicit rules when a student should be failed, they have to communicate regularly and commit code into your development branch at the end of the summer. best, Max From aadityajamuar at gmail.com Fri Jan 20 09:19:02 2017 From: aadityajamuar at gmail.com (Aaditya Jamuar) Date: Fri, 20 Jan 2017 19:49:02 +0530 Subject: [scikit-learn] Pipeline conventions for wrappers Message-ID: Hi Guys, I am currently working on gensim ( https://github.com/RaRe-Technologies/gensim) , writing wrappers for Scikit-learn for easy integration of LDA ( https://github.com/RaRe-Technologies/gensim/pull/932/files). While I have covered most of the API conventions as specified on scikit-learn's website, I am stuck at how to implement pipelines. I am particularly looking for what are some of the conventions very specific to the pipeline architecture. Please suggest Thank you Aaditya Jamuar -------------- next part -------------- An HTML attachment was scrubbed... URL: From malcorn at redhat.com Fri Jan 20 11:37:02 2017 From: malcorn at redhat.com (Michael Alcorn) Date: Fri, 20 Jan 2017 10:37:02 -0600 Subject: [scikit-learn] PR #8190: "Implement Complement Naive Bayes." Message-ID: Hi all, I would appreciate it if a couple of maintainers could take a look at my pull request (https://github.com/scikit-learn/scikit-learn/pull/8190) implementing the Complement Naive Bayes (CNB) classifier described in Rennie et al. (2003). CNB regularly outperforms the standard Multinomial Naive Bayes (MNB) classifier on real world data sets due to the tendency for real world data sets to suffer from class imbalance. Apache Mahout offers its own implementation of CNB alongside MNB, but it would be nice to have an easily usable CNB implementation available in scikit-learn. Training the CNB classifier on a reasonably sized data set of 493,038 documents with a median length of 87 tokens and 1,155,784 distinct tokens took around 8.5 seconds. For comparison, the MNB classifier took around 4.5 seconds to train, but the CNB had a 10% lower error rate, a seemingly worthwhile tradeoff. Happy to answer any questions or discuss further. Thanks, Michael A. Alcorn -------------- next part -------------- An HTML attachment was scrubbed... URL: From sebastian.illner at imtek.uni-freiburg.de Fri Jan 20 10:52:08 2017 From: sebastian.illner at imtek.uni-freiburg.de (Sebastian Illner) Date: Fri, 20 Jan 2017 16:52:08 +0100 Subject: [scikit-learn] Identify spectra with "marker" Message-ID: <784ddc76-6e28-77a6-0ae1-7de9212d3764@imtek.uni-freiburg.de> Hi guys, I'm new to NIR-measurement as wenn as chemometrics. My current project involvs the recognition of determined spectra (of a reference system) among others. The materials are currentlys not really set. So I try to give a predetermined mixture of substances into another matrix and group the measured NIR-spectra according to a) contains predetermined mixture and b) does not contain mixture (but other mixtures could be possible). This way the mixture could be used as a unique marker. What would be the best chemometric way to accomplish this task? Currently I am trying to use PLS-DA, SMC and PCA (combined with a distance quantifier). Thanks for your thought about this. 
seb -------------- next part -------------- An HTML attachment was scrubbed... URL: From Jeremiah.Johnson at unh.edu Fri Jan 20 14:16:54 2017 From: Jeremiah.Johnson at unh.edu (Johnson, Jeremiah) Date: Fri, 20 Jan 2017 19:16:54 +0000 Subject: [scikit-learn] top N accuracy classification metric Message-ID: Hi all, It's common to use a top-n accuracy metric for multi-class classification problems, where for each observation the prediction is the set of probabilities for each of the classes, and a prediction is top-N accurate if the correct class is among the N highest predicted probability classes. I've written a simple implementation, but I don't think it quite fits the sklearn api. Specifically, _check_targets objects to the the continuous-multioutput format of the predictions for a classification task. Is there any interest in including a metric like this? I'd be happy to submit a pull request. Jeremiah -------------- next part -------------- An HTML attachment was scrubbed... URL: From joel.nothman at gmail.com Sat Jan 21 05:49:39 2017 From: joel.nothman at gmail.com (Joel Nothman) Date: Sat, 21 Jan 2017 21:49:39 +1100 Subject: [scikit-learn] Pipeline conventions for wrappers In-Reply-To: References: Message-ID: I think you'll need to be more specific. What do you want a pipeline to do for you? On 21 January 2017 at 01:19, Aaditya Jamuar wrote: > Hi Guys, > > I am currently working on gensim (https://github.com/RaRe- > Technologies/gensim) , writing wrappers for Scikit-learn for easy > integration of LDA (https://github.com/RaRe-Technologies/gensim/pull/932/ > files). > > While I have covered most of the API conventions as specified on > scikit-learn's website, I am stuck at how to implement pipelines. > > I am particularly looking for what are some of the conventions very > specific to the pipeline architecture. > > Please suggest > > Thank you > > Aaditya Jamuar > > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From joel.nothman at gmail.com Sat Jan 21 05:50:33 2017 From: joel.nothman at gmail.com (Joel Nothman) Date: Sat, 21 Jan 2017 21:50:33 +1100 Subject: [scikit-learn] Identify spectra with "marker" In-Reply-To: <784ddc76-6e28-77a6-0ae1-7de9212d3764@imtek.uni-freiburg.de> References: <784ddc76-6e28-77a6-0ae1-7de9212d3764@imtek.uni-freiburg.de> Message-ID: Wrong mailing list? On 21 January 2017 at 02:52, Sebastian Illner < sebastian.illner at imtek.uni-freiburg.de> wrote: > Hi guys, > I'm new to NIR-measurement as wenn as chemometrics. My current project > involvs the recognition of determined spectra (of a reference system) among > others. > The materials are currentlys not really set. So I try to give a > predetermined mixture of substances into another matrix and group the > measured NIR-spectra according to a) contains predetermined mixture and b) > does not contain mixture (but other mixtures could be possible). This way > the mixture could be used as a unique marker. > What would be the best chemometric way to accomplish this task? > > Currently I am trying to use PLS-DA, SMC and PCA (combined with a distance > quantifier). > Thanks for your thought about this. 
> seb > > > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From joel.nothman at gmail.com Sat Jan 21 05:52:10 2017 From: joel.nothman at gmail.com (Joel Nothman) Date: Sat, 21 Jan 2017 21:52:10 +1100 Subject: [scikit-learn] top N accuracy classification metric In-Reply-To: References: Message-ID: There are metrics with that kind of input in sklearn.metrics.ranking. I don't have the time to look them up now, but there have been proposals and PRs for similar ranking metrics. Please search the issue tracker for related issues. Thanks, Joel On 21 January 2017 at 06:16, Johnson, Jeremiah wrote: > Hi all, > > It?s common to use a top-n accuracy metric for multi-class classification > problems, where for each observation the prediction is the set of > probabilities for each of the classes, and a prediction is top-N accurate > if the correct class is among the N highest predicted probability classes. > I?ve written a simple implementation, but I don?t think it quite fits the > sklearn api. Specifically, _check_targets objects to the the > continuous-multioutput format of the predictions for a classification task. > Is there any interest in including a metric like this? I?d be happy to > submit a pull request. > > Jeremiah > > > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From noflaco at gmail.com Sat Jan 21 05:54:48 2017 From: noflaco at gmail.com (Carlton Banks) Date: Sat, 21 Jan 2017 11:54:48 +0100 Subject: [scikit-learn] numpy integration with random forrest implementation Message-ID: <9DA76233-6CAD-4ABC-8A30-241B2F8A61CA@gmail.com> Hi guys.. I am currently working on a ASR project in which the objective is to substitute part of the general ASR framework with some form of neural network, to see whether the tested part improves in any way. I started working with the feature extraction and tried, to make a neural network (NN) that could create MFCC features. I already know what the desired output is supposed to be, so the problem boils down to a simple input - output mapping. Problem here is the my NN doesn?t seem to perform that well.. and i seem to get pretty large error for some reason. I therefore wanted to give random forrest a try, and see whether it could provide me a better result. I am currently storing my input and output in numpy.ndarrays, in which the input and output columns is consistent throughout all the examples, but the number of rows changes depending on length of the audio file. Is it possible with the random forrest implementation in scikit-learn to train a random forrest to map an input an output, given they are stored numpy.ndarrays? Or do i have do it in a different way? and if so how? kind regards Carl truz From noflaco at gmail.com Sat Jan 21 06:18:15 2017 From: noflaco at gmail.com (Carlton Banks) Date: Sat, 21 Jan 2017 12:18:15 +0100 Subject: [scikit-learn] numpy integration with random forrest implementation Message-ID: <8DEC1F0F-D487-4C98-AE36-A7D23B78D6BB@gmail.com> Hi guys.. I am currently working on a ASR project in which the objective is to substitute part of the general ASR framework with some form of neural network, to see whether the tested part improves in any way. 
I started working with the feature extraction and tried, to make a neural network (NN) that could create MFCC features. I already know what the desired output is supposed to be, so the problem boils down to a simple input - output mapping. Problem here is the my NN doesn?t seem to perform that well.. and i seem to get pretty large error for some reason. I therefore wanted to give random forrest a try, and see whether it could provide me a better result. I am currently storing my input and output in numpy.ndarrays, in which the input and output columns is consistent throughout all the examples, but the number of rows changes depending on length of the audio file. Is it possible with the random forrest implementation in scikit-learn to train a random forrest to map an input an output, given they are stored numpy.ndarrays? Or do i have do it in a different way? and if so how? kind regards Carl truz From jmschreiber91 at gmail.com Sat Jan 21 12:25:22 2017 From: jmschreiber91 at gmail.com (Jacob Schreiber) Date: Sat, 21 Jan 2017 09:25:22 -0800 Subject: [scikit-learn] numpy integration with random forrest implementation In-Reply-To: <8DEC1F0F-D487-4C98-AE36-A7D23B78D6BB@gmail.com> References: <8DEC1F0F-D487-4C98-AE36-A7D23B78D6BB@gmail.com> Message-ID: If what you're saying is that you have a variable length input, then most sklearn classifiers won't work on this data. They expect a fixed feature set. Perhaps you could try extracting a set of informative features being fed into the classifier? On Sat, Jan 21, 2017 at 3:18 AM, Carlton Banks wrote: > Hi guys.. > > I am currently working on a ASR project in which the objective is to > substitute part of the general ASR framework with some form of neural > network, to see whether the tested part improves in any way. > > I started working with the feature extraction and tried, to make a neural > network (NN) that could create MFCC features. I already know what the > desired output is supposed to be, so the problem boils down to a simple > input - output mapping. Problem here is the my NN doesn?t seem to perform > that well.. and i seem to get pretty large error for some reason. > > I therefore wanted to give random forrest a try, and see whether it could > provide me a better result. > > I am currently storing my input and output in numpy.ndarrays, in which the > input and output columns is consistent throughout all the examples, but the > number of rows changes > depending on length of the audio file. > > Is it possible with the random forrest implementation in scikit-learn to > train a random forrest to map an input an output, given they are stored > numpy.ndarrays? > Or do i have do it in a different way? and if so how? > > kind regards > > Carl truz > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn > -------------- next part -------------- An HTML attachment was scrubbed... URL: From noflaco at gmail.com Sat Jan 21 12:35:00 2017 From: noflaco at gmail.com (Carlton Banks) Date: Sat, 21 Jan 2017 18:35:00 +0100 Subject: [scikit-learn] numpy integration with random forrest implementation In-Reply-To: References: <8DEC1F0F-D487-4C98-AE36-A7D23B78D6BB@gmail.com> Message-ID: Thanks for the response! If you see it in 1d then yes?. it has variable length. In 2d will the number of columns always be constant both for the input and output. > Den 21. jan. 2017 kl. 
18.25 skrev Jacob Schreiber : > > If what you're saying is that you have a variable length input, then most sklearn classifiers won't work on this data. They expect a fixed feature set. Perhaps you could try extracting a set of informative features being fed into the classifier? > > On Sat, Jan 21, 2017 at 3:18 AM, Carlton Banks > wrote: > Hi guys.. > > I am currently working on a ASR project in which the objective is to substitute part of the general ASR framework with some form of neural network, to see whether the tested part improves in any way. > > I started working with the feature extraction and tried, to make a neural network (NN) that could create MFCC features. I already know what the desired output is supposed to be, so the problem boils down to a simple > input - output mapping. Problem here is the my NN doesn?t seem to perform that well.. and i seem to get pretty large error for some reason. > > I therefore wanted to give random forrest a try, and see whether it could provide me a better result. > > I am currently storing my input and output in numpy.ndarrays, in which the input and output columns is consistent throughout all the examples, but the number of rows changes > depending on length of the audio file. > > Is it possible with the random forrest implementation in scikit-learn to train a random forrest to map an input an output, given they are stored numpy.ndarrays? > Or do i have do it in a different way? and if so how? > > kind regards > > Carl truz > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn > > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn -------------- next part -------------- An HTML attachment was scrubbed... URL: From jmschreiber91 at gmail.com Sat Jan 21 12:42:52 2017 From: jmschreiber91 at gmail.com (Jacob Schreiber) Date: Sat, 21 Jan 2017 09:42:52 -0800 Subject: [scikit-learn] numpy integration with random forrest implementation In-Reply-To: References: <8DEC1F0F-D487-4C98-AE36-A7D23B78D6BB@gmail.com> Message-ID: I don't understand what you mean. Does each sample have a fixed number of features or not? On Sat, Jan 21, 2017 at 9:35 AM, Carlton Banks wrote: > Thanks for the response! > > If you see it in 1d then yes?. it has variable length. In 2d will the > number of columns always be constant both for the input and output. > > Den 21. jan. 2017 kl. 18.25 skrev Jacob Schreiber >: > > If what you're saying is that you have a variable length input, then most > sklearn classifiers won't work on this data. They expect a fixed feature > set. Perhaps you could try extracting a set of informative features being > fed into the classifier? > > On Sat, Jan 21, 2017 at 3:18 AM, Carlton Banks wrote: > >> Hi guys.. >> >> I am currently working on a ASR project in which the objective is to >> substitute part of the general ASR framework with some form of neural >> network, to see whether the tested part improves in any way. >> >> I started working with the feature extraction and tried, to make a neural >> network (NN) that could create MFCC features. I already know what the >> desired output is supposed to be, so the problem boils down to a simple >> input - output mapping. Problem here is the my NN doesn?t seem to >> perform that well.. and i seem to get pretty large error for some reason. 
>> >> I therefore wanted to give random forrest a try, and see whether it could >> provide me a better result. >> >> I am currently storing my input and output in numpy.ndarrays, in which >> the input and output columns is consistent throughout all the examples, but >> the number of rows changes >> depending on length of the audio file. >> >> Is it possible with the random forrest implementation in scikit-learn to >> train a random forrest to map an input an output, given they are stored >> numpy.ndarrays? >> Or do i have do it in a different way? and if so how? >> >> kind regards >> >> Carl truz >> _______________________________________________ >> scikit-learn mailing list >> scikit-learn at python.org >> https://mail.python.org/mailman/listinfo/scikit-learn >> > > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn > > > > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From noflaco at gmail.com Sat Jan 21 12:59:22 2017 From: noflaco at gmail.com (Carlton Banks) Date: Sat, 21 Jan 2017 18:59:22 +0100 Subject: [scikit-learn] numpy integration with random forrest implementation In-Reply-To: References: <8DEC1F0F-D487-4C98-AE36-A7D23B78D6BB@gmail.com> Message-ID: Most of the machine learning library i?ve tried has an option of of just give the dimension? In this case my input consist of an numpy.ndarray with shape (x,2050) and the output is an numpy.ndarray with shape (x,13) x is different for each set? But for each set is the number of columns consistent. Column consistency is usually enough for most library tools i?ve worked with? But is this not the case here? > Den 21. jan. 2017 kl. 18.42 skrev Jacob Schreiber : > > I don't understand what you mean. Does each sample have a fixed number of features or not? > > On Sat, Jan 21, 2017 at 9:35 AM, Carlton Banks > wrote: > Thanks for the response! > > If you see it in 1d then yes?. it has variable length. In 2d will the number of columns always be constant both for the input and output. > >> Den 21. jan. 2017 kl. 18.25 skrev Jacob Schreiber >: >> >> If what you're saying is that you have a variable length input, then most sklearn classifiers won't work on this data. They expect a fixed feature set. Perhaps you could try extracting a set of informative features being fed into the classifier? >> >> On Sat, Jan 21, 2017 at 3:18 AM, Carlton Banks > wrote: >> Hi guys.. >> >> I am currently working on a ASR project in which the objective is to substitute part of the general ASR framework with some form of neural network, to see whether the tested part improves in any way. >> >> I started working with the feature extraction and tried, to make a neural network (NN) that could create MFCC features. I already know what the desired output is supposed to be, so the problem boils down to a simple >> input - output mapping. Problem here is the my NN doesn?t seem to perform that well.. and i seem to get pretty large error for some reason. >> >> I therefore wanted to give random forrest a try, and see whether it could provide me a better result. 
>> >> I am currently storing my input and output in numpy.ndarrays, in which the input and output columns is consistent throughout all the examples, but the number of rows changes >> depending on length of the audio file. >> >> Is it possible with the random forrest implementation in scikit-learn to train a random forrest to map an input an output, given they are stored numpy.ndarrays? >> Or do i have do it in a different way? and if so how? >> >> kind regards >> >> Carl truz >> _______________________________________________ >> scikit-learn mailing list >> scikit-learn at python.org >> https://mail.python.org/mailman/listinfo/scikit-learn >> >> _______________________________________________ >> scikit-learn mailing list >> scikit-learn at python.org >> https://mail.python.org/mailman/listinfo/scikit-learn > > > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn > > > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn -------------- next part -------------- An HTML attachment was scrubbed... URL: From se.raschka at gmail.com Sat Jan 21 13:24:17 2017 From: se.raschka at gmail.com (Sebastian Raschka) Date: Sat, 21 Jan 2017 13:24:17 -0500 Subject: [scikit-learn] numpy integration with random forrest implementation In-Reply-To: References: <8DEC1F0F-D487-4C98-AE36-A7D23B78D6BB@gmail.com> Message-ID: Hi, Carlton, sounds like you are looking for multilabel classification and your target array has the shape [n_samples, n_outputs]? If the output shape is consistent (aka all output label arrays have 13 columns), you should be fine, otherwise, you could use the MultiLabelBinarizer (http://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.MultiLabelBinarizer.html#sklearn.preprocessing.MultiLabelBinarizer). Also, the RandomForestClassifier should support multillabel classification. Best, Sebastian > On Jan 21, 2017, at 12:59 PM, Carlton Banks wrote: > > Most of the machine learning library i?ve tried has an option of of just give the dimension? > In this case my input consist of an numpy.ndarray with shape (x,2050) and the output is an numpy.ndarray with shape (x,13) > x is different for each set? > But for each set is the number of columns consistent. > > Column consistency is usually enough for most library tools i?ve worked with? > But is this not the case here? >> Den 21. jan. 2017 kl. 18.42 skrev Jacob Schreiber : >> >> I don't understand what you mean. Does each sample have a fixed number of features or not? >> >> On Sat, Jan 21, 2017 at 9:35 AM, Carlton Banks wrote: >> Thanks for the response! >> >> If you see it in 1d then yes?. it has variable length. In 2d will the number of columns always be constant both for the input and output. >> >>> Den 21. jan. 2017 kl. 18.25 skrev Jacob Schreiber : >>> >>> If what you're saying is that you have a variable length input, then most sklearn classifiers won't work on this data. They expect a fixed feature set. Perhaps you could try extracting a set of informative features being fed into the classifier? >>> >>> On Sat, Jan 21, 2017 at 3:18 AM, Carlton Banks wrote: >>> Hi guys.. >>> >>> I am currently working on a ASR project in which the objective is to substitute part of the general ASR framework with some form of neural network, to see whether the tested part improves in any way. 
>>> >>> I started working with the feature extraction and tried, to make a neural network (NN) that could create MFCC features. I already know what the desired output is supposed to be, so the problem boils down to a simple >>> input - output mapping. Problem here is the my NN doesn?t seem to perform that well.. and i seem to get pretty large error for some reason. >>> >>> I therefore wanted to give random forrest a try, and see whether it could provide me a better result. >>> >>> I am currently storing my input and output in numpy.ndarrays, in which the input and output columns is consistent throughout all the examples, but the number of rows changes >>> depending on length of the audio file. >>> >>> Is it possible with the random forrest implementation in scikit-learn to train a random forrest to map an input an output, given they are stored numpy.ndarrays? >>> Or do i have do it in a different way? and if so how? >>> >>> kind regards >>> >>> Carl truz >>> _______________________________________________ >>> scikit-learn mailing list >>> scikit-learn at python.org >>> https://mail.python.org/mailman/listinfo/scikit-learn >>> >>> _______________________________________________ >>> scikit-learn mailing list >>> scikit-learn at python.org >>> https://mail.python.org/mailman/listinfo/scikit-learn >> >> >> _______________________________________________ >> scikit-learn mailing list >> scikit-learn at python.org >> https://mail.python.org/mailman/listinfo/scikit-learn >> >> >> _______________________________________________ >> scikit-learn mailing list >> scikit-learn at python.org >> https://mail.python.org/mailman/listinfo/scikit-learn > > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn From mailfordebu at gmail.com Sat Jan 21 13:18:45 2017 From: mailfordebu at gmail.com (Debabrata Ghosh) Date: Sat, 21 Jan 2017 23:48:45 +0530 Subject: [scikit-learn] Query regarding parameter class_weight in Random Forest Classifier Message-ID: Hi All, Greetings ! I have a very basic question regarding the usage of the parameter class_weight in scikit learn's Random Forest Classifier's fit method. I have a fairly unbalanced sample and my positive class : negative class ratio is 1:100. In other words, I have a million records corresponding to negative class and 10,000 records corresponding to positive class. I have trained the random forest classifier model using the above record set successfully. Further, for a different problem, I want to test the parameter class_weight. So, I am setting the class_weight as [0:0.001 , 1:0.999] and I have tried running my model on the same dataset as mentioned in the above paragraph but with the positive class records reduced to 1000 [because now each positive class is given approximately 10 times more weight than a negative class]. However, the model run results are very very different between the 2 runs (with and without class_weight). And I expected a similar run results. Would you please be able to let me know where am I getting wrong. I know it's something silly but just want to improve on my concept. Thanks ! -------------- next part -------------- An HTML attachment was scrubbed... 
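For the class_weight question just above, a rough, self-contained sketch of the setup being described might look like the following. The data is synthetic and far smaller than the 1,000,000 / 10,000 split in the question, and the weights, names, and numbers are illustrative rather than taken from the original model code.

import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.RandomState(0)
n_neg, n_pos = 20000, 200                        # roughly 100:1 imbalance
X = rng.rand(n_neg + n_pos, 10)
y = np.r_[np.zeros(n_neg, dtype=int), np.ones(n_pos, dtype=int)]

# class_weight scales the weight of every sample of that class, so
# {0: 1, 1: 100} makes each positive sample count 100 times as much.
clf = RandomForestClassifier(n_estimators=100,
                             class_weight={0: 1, 1: 100},
                             random_state=0)
clf.fit(X, y)
scores = clf.predict_proba(X)[:, 1]              # positive-class scores

class_weight='balanced' is another option; it derives the per-class weights from the class frequencies in y automatically.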
URL: From noflaco at gmail.com Sat Jan 21 13:27:37 2017 From: noflaco at gmail.com (Carlton Banks) Date: Sat, 21 Jan 2017 19:27:37 +0100 Subject: [scikit-learn] numpy integration with random forrest implementation In-Reply-To: References: <8DEC1F0F-D487-4C98-AE36-A7D23B78D6BB@gmail.com> Message-ID: Not classifiication? but regression.. and yes both the input and output should be stored stored like that.. > Den 21. jan. 2017 kl. 19.24 skrev Sebastian Raschka : > > Hi, Carlton, > sounds like you are looking for multilabel classification and your target array has the shape [n_samples, n_outputs]? If the output shape is consistent (aka all output label arrays have 13 columns), you should be fine, otherwise, you could use the MultiLabelBinarizer (http://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.MultiLabelBinarizer.html#sklearn.preprocessing.MultiLabelBinarizer). > > Also, the RandomForestClassifier should support multillabel classification. > > Best, > Sebastian > >> On Jan 21, 2017, at 12:59 PM, Carlton Banks wrote: >> >> Most of the machine learning library i?ve tried has an option of of just give the dimension? >> In this case my input consist of an numpy.ndarray with shape (x,2050) and the output is an numpy.ndarray with shape (x,13) >> x is different for each set? >> But for each set is the number of columns consistent. >> >> Column consistency is usually enough for most library tools i?ve worked with? >> But is this not the case here? >>> Den 21. jan. 2017 kl. 18.42 skrev Jacob Schreiber : >>> >>> I don't understand what you mean. Does each sample have a fixed number of features or not? >>> >>> On Sat, Jan 21, 2017 at 9:35 AM, Carlton Banks wrote: >>> Thanks for the response! >>> >>> If you see it in 1d then yes?. it has variable length. In 2d will the number of columns always be constant both for the input and output. >>> >>>> Den 21. jan. 2017 kl. 18.25 skrev Jacob Schreiber : >>>> >>>> If what you're saying is that you have a variable length input, then most sklearn classifiers won't work on this data. They expect a fixed feature set. Perhaps you could try extracting a set of informative features being fed into the classifier? >>>> >>>> On Sat, Jan 21, 2017 at 3:18 AM, Carlton Banks wrote: >>>> Hi guys.. >>>> >>>> I am currently working on a ASR project in which the objective is to substitute part of the general ASR framework with some form of neural network, to see whether the tested part improves in any way. >>>> >>>> I started working with the feature extraction and tried, to make a neural network (NN) that could create MFCC features. I already know what the desired output is supposed to be, so the problem boils down to a simple >>>> input - output mapping. Problem here is the my NN doesn?t seem to perform that well.. and i seem to get pretty large error for some reason. >>>> >>>> I therefore wanted to give random forrest a try, and see whether it could provide me a better result. >>>> >>>> I am currently storing my input and output in numpy.ndarrays, in which the input and output columns is consistent throughout all the examples, but the number of rows changes >>>> depending on length of the audio file. >>>> >>>> Is it possible with the random forrest implementation in scikit-learn to train a random forrest to map an input an output, given they are stored numpy.ndarrays? >>>> Or do i have do it in a different way? and if so how? 
>>>> >>>> kind regards >>>> >>>> Carl truz >>>> _______________________________________________ >>>> scikit-learn mailing list >>>> scikit-learn at python.org >>>> https://mail.python.org/mailman/listinfo/scikit-learn >>>> >>>> _______________________________________________ >>>> scikit-learn mailing list >>>> scikit-learn at python.org >>>> https://mail.python.org/mailman/listinfo/scikit-learn >>> >>> >>> _______________________________________________ >>> scikit-learn mailing list >>> scikit-learn at python.org >>> https://mail.python.org/mailman/listinfo/scikit-learn >>> >>> >>> _______________________________________________ >>> scikit-learn mailing list >>> scikit-learn at python.org >>> https://mail.python.org/mailman/listinfo/scikit-learn >> >> _______________________________________________ >> scikit-learn mailing list >> scikit-learn at python.org >> https://mail.python.org/mailman/listinfo/scikit-learn > > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn From se.raschka at gmail.com Sat Jan 21 13:32:58 2017 From: se.raschka at gmail.com (Sebastian Raschka) Date: Sat, 21 Jan 2017 13:32:58 -0500 Subject: [scikit-learn] numpy integration with random forrest implementation In-Reply-To: References: <8DEC1F0F-D487-4C98-AE36-A7D23B78D6BB@gmail.com> Message-ID: <287B065E-1841-4F12-9CBE-4D06A6C8525F@gmail.com> Oh okay. But that shouldn?t be a problem, the RandomForestRegressor also supports multi-outpout regression; same expected target array shape: [n_samples, n_outputs] Best, Sebastian > On Jan 21, 2017, at 1:27 PM, Carlton Banks wrote: > > Not classifiication? but regression.. > and yes both the input and output should be stored stored like that.. > >> Den 21. jan. 2017 kl. 19.24 skrev Sebastian Raschka : >> >> Hi, Carlton, >> sounds like you are looking for multilabel classification and your target array has the shape [n_samples, n_outputs]? If the output shape is consistent (aka all output label arrays have 13 columns), you should be fine, otherwise, you could use the MultiLabelBinarizer (http://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.MultiLabelBinarizer.html#sklearn.preprocessing.MultiLabelBinarizer). >> >> Also, the RandomForestClassifier should support multillabel classification. >> >> Best, >> Sebastian >> >>> On Jan 21, 2017, at 12:59 PM, Carlton Banks wrote: >>> >>> Most of the machine learning library i?ve tried has an option of of just give the dimension? >>> In this case my input consist of an numpy.ndarray with shape (x,2050) and the output is an numpy.ndarray with shape (x,13) >>> x is different for each set? >>> But for each set is the number of columns consistent. >>> >>> Column consistency is usually enough for most library tools i?ve worked with? >>> But is this not the case here? >>>> Den 21. jan. 2017 kl. 18.42 skrev Jacob Schreiber : >>>> >>>> I don't understand what you mean. Does each sample have a fixed number of features or not? >>>> >>>> On Sat, Jan 21, 2017 at 9:35 AM, Carlton Banks wrote: >>>> Thanks for the response! >>>> >>>> If you see it in 1d then yes?. it has variable length. In 2d will the number of columns always be constant both for the input and output. >>>> >>>>> Den 21. jan. 2017 kl. 18.25 skrev Jacob Schreiber : >>>>> >>>>> If what you're saying is that you have a variable length input, then most sklearn classifiers won't work on this data. They expect a fixed feature set. 
Perhaps you could try extracting a set of informative features being fed into the classifier? >>>>> >>>>> On Sat, Jan 21, 2017 at 3:18 AM, Carlton Banks wrote: >>>>> Hi guys.. >>>>> >>>>> I am currently working on a ASR project in which the objective is to substitute part of the general ASR framework with some form of neural network, to see whether the tested part improves in any way. >>>>> >>>>> I started working with the feature extraction and tried, to make a neural network (NN) that could create MFCC features. I already know what the desired output is supposed to be, so the problem boils down to a simple >>>>> input - output mapping. Problem here is the my NN doesn?t seem to perform that well.. and i seem to get pretty large error for some reason. >>>>> >>>>> I therefore wanted to give random forrest a try, and see whether it could provide me a better result. >>>>> >>>>> I am currently storing my input and output in numpy.ndarrays, in which the input and output columns is consistent throughout all the examples, but the number of rows changes >>>>> depending on length of the audio file. >>>>> >>>>> Is it possible with the random forrest implementation in scikit-learn to train a random forrest to map an input an output, given they are stored numpy.ndarrays? >>>>> Or do i have do it in a different way? and if so how? >>>>> >>>>> kind regards >>>>> >>>>> Carl truz >>>>> _______________________________________________ >>>>> scikit-learn mailing list >>>>> scikit-learn at python.org >>>>> https://mail.python.org/mailman/listinfo/scikit-learn >>>>> >>>>> _______________________________________________ >>>>> scikit-learn mailing list >>>>> scikit-learn at python.org >>>>> https://mail.python.org/mailman/listinfo/scikit-learn >>>> >>>> >>>> _______________________________________________ >>>> scikit-learn mailing list >>>> scikit-learn at python.org >>>> https://mail.python.org/mailman/listinfo/scikit-learn >>>> >>>> >>>> _______________________________________________ >>>> scikit-learn mailing list >>>> scikit-learn at python.org >>>> https://mail.python.org/mailman/listinfo/scikit-learn >>> >>> _______________________________________________ >>> scikit-learn mailing list >>> scikit-learn at python.org >>> https://mail.python.org/mailman/listinfo/scikit-learn >> >> _______________________________________________ >> scikit-learn mailing list >> scikit-learn at python.org >> https://mail.python.org/mailman/listinfo/scikit-learn > > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn From noflaco at gmail.com Sat Jan 21 13:36:51 2017 From: noflaco at gmail.com (Carlton Banks) Date: Sat, 21 Jan 2017 19:36:51 +0100 Subject: [scikit-learn] numpy integration with random forrest implementation In-Reply-To: <287B065E-1841-4F12-9CBE-4D06A6C8525F@gmail.com> References: <8DEC1F0F-D487-4C98-AE36-A7D23B78D6BB@gmail.com> <287B065E-1841-4F12-9CBE-4D06A6C8525F@gmail.com> Message-ID: <9A97C9C1-8D1E-4C46-886D-A14F840ADE58@gmail.com> Thanks for the Info!.. How do you set it up.. There doesn?t seem a example available for regression purposes.. > Den 21. jan. 2017 kl. 19.32 skrev Sebastian Raschka : > > Oh okay. But that shouldn?t be a problem, the RandomForestRegressor also supports multi-outpout regression; same expected target array shape: [n_samples, n_outputs] > > Best, > Sebastian > >> On Jan 21, 2017, at 1:27 PM, Carlton Banks wrote: >> >> Not classifiication? but regression.. 
>> and yes both the input and output should be stored stored like that.. >> >>> Den 21. jan. 2017 kl. 19.24 skrev Sebastian Raschka : >>> >>> Hi, Carlton, >>> sounds like you are looking for multilabel classification and your target array has the shape [n_samples, n_outputs]? If the output shape is consistent (aka all output label arrays have 13 columns), you should be fine, otherwise, you could use the MultiLabelBinarizer (http://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.MultiLabelBinarizer.html#sklearn.preprocessing.MultiLabelBinarizer). >>> >>> Also, the RandomForestClassifier should support multillabel classification. >>> >>> Best, >>> Sebastian >>> >>>> On Jan 21, 2017, at 12:59 PM, Carlton Banks wrote: >>>> >>>> Most of the machine learning library i?ve tried has an option of of just give the dimension? >>>> In this case my input consist of an numpy.ndarray with shape (x,2050) and the output is an numpy.ndarray with shape (x,13) >>>> x is different for each set? >>>> But for each set is the number of columns consistent. >>>> >>>> Column consistency is usually enough for most library tools i?ve worked with? >>>> But is this not the case here? >>>>> Den 21. jan. 2017 kl. 18.42 skrev Jacob Schreiber : >>>>> >>>>> I don't understand what you mean. Does each sample have a fixed number of features or not? >>>>> >>>>> On Sat, Jan 21, 2017 at 9:35 AM, Carlton Banks wrote: >>>>> Thanks for the response! >>>>> >>>>> If you see it in 1d then yes?. it has variable length. In 2d will the number of columns always be constant both for the input and output. >>>>> >>>>>> Den 21. jan. 2017 kl. 18.25 skrev Jacob Schreiber : >>>>>> >>>>>> If what you're saying is that you have a variable length input, then most sklearn classifiers won't work on this data. They expect a fixed feature set. Perhaps you could try extracting a set of informative features being fed into the classifier? >>>>>> >>>>>> On Sat, Jan 21, 2017 at 3:18 AM, Carlton Banks wrote: >>>>>> Hi guys.. >>>>>> >>>>>> I am currently working on a ASR project in which the objective is to substitute part of the general ASR framework with some form of neural network, to see whether the tested part improves in any way. >>>>>> >>>>>> I started working with the feature extraction and tried, to make a neural network (NN) that could create MFCC features. I already know what the desired output is supposed to be, so the problem boils down to a simple >>>>>> input - output mapping. Problem here is the my NN doesn?t seem to perform that well.. and i seem to get pretty large error for some reason. >>>>>> >>>>>> I therefore wanted to give random forrest a try, and see whether it could provide me a better result. >>>>>> >>>>>> I am currently storing my input and output in numpy.ndarrays, in which the input and output columns is consistent throughout all the examples, but the number of rows changes >>>>>> depending on length of the audio file. >>>>>> >>>>>> Is it possible with the random forrest implementation in scikit-learn to train a random forrest to map an input an output, given they are stored numpy.ndarrays? >>>>>> Or do i have do it in a different way? and if so how? 
>>>>>> >>>>>> kind regards >>>>>> >>>>>> Carl truz >>>>>> _______________________________________________ >>>>>> scikit-learn mailing list >>>>>> scikit-learn at python.org >>>>>> https://mail.python.org/mailman/listinfo/scikit-learn >>>>>> >>>>>> _______________________________________________ >>>>>> scikit-learn mailing list >>>>>> scikit-learn at python.org >>>>>> https://mail.python.org/mailman/listinfo/scikit-learn >>>>> >>>>> >>>>> _______________________________________________ >>>>> scikit-learn mailing list >>>>> scikit-learn at python.org >>>>> https://mail.python.org/mailman/listinfo/scikit-learn >>>>> >>>>> >>>>> _______________________________________________ >>>>> scikit-learn mailing list >>>>> scikit-learn at python.org >>>>> https://mail.python.org/mailman/listinfo/scikit-learn >>>> >>>> _______________________________________________ >>>> scikit-learn mailing list >>>> scikit-learn at python.org >>>> https://mail.python.org/mailman/listinfo/scikit-learn >>> >>> _______________________________________________ >>> scikit-learn mailing list >>> scikit-learn at python.org >>> https://mail.python.org/mailman/listinfo/scikit-learn >> >> _______________________________________________ >> scikit-learn mailing list >> scikit-learn at python.org >> https://mail.python.org/mailman/listinfo/scikit-learn > > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn From se.raschka at gmail.com Sat Jan 21 13:55:40 2017 From: se.raschka at gmail.com (Sebastian Raschka) Date: Sat, 21 Jan 2017 13:55:40 -0500 Subject: [scikit-learn] numpy integration with random forrest implementation In-Reply-To: <9A97C9C1-8D1E-4C46-886D-A14F840ADE58@gmail.com> References: <8DEC1F0F-D487-4C98-AE36-A7D23B78D6BB@gmail.com> <287B065E-1841-4F12-9CBE-4D06A6C8525F@gmail.com> <9A97C9C1-8D1E-4C46-886D-A14F840ADE58@gmail.com> Message-ID: It should be simply tf = RandomForestRegressor() rf.fit(X_train, y_train) rf.predict(X_validation) ... Maybe also check out this documentation example here: http://scikit-learn.org/stable/auto_examples/ensemble/plot_random_forest_regression_multioutput.html > On Jan 21, 2017, at 1:36 PM, Carlton Banks wrote: > > Thanks for the Info!.. > How do you set it up.. > > There doesn?t seem a example available for regression purposes.. >> Den 21. jan. 2017 kl. 19.32 skrev Sebastian Raschka : >> >> Oh okay. But that shouldn?t be a problem, the RandomForestRegressor also supports multi-outpout regression; same expected target array shape: [n_samples, n_outputs] >> >> Best, >> Sebastian >> >>> On Jan 21, 2017, at 1:27 PM, Carlton Banks wrote: >>> >>> Not classifiication? but regression.. >>> and yes both the input and output should be stored stored like that.. >>> >>>> Den 21. jan. 2017 kl. 19.24 skrev Sebastian Raschka : >>>> >>>> Hi, Carlton, >>>> sounds like you are looking for multilabel classification and your target array has the shape [n_samples, n_outputs]? If the output shape is consistent (aka all output label arrays have 13 columns), you should be fine, otherwise, you could use the MultiLabelBinarizer (http://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.MultiLabelBinarizer.html#sklearn.preprocessing.MultiLabelBinarizer). >>>> >>>> Also, the RandomForestClassifier should support multillabel classification. 
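To make the RandomForestRegressor suggestion above concrete, here is a self-contained sketch using the shapes from this thread: rows are frames stacked across audio files, each with 2050 input features and 13 output values. All of the data is random stand-in data, not the poster's actual MFCC pipeline; note also that the short snippet above assigns the estimator to tf but then calls rf, so the two names need to match.

import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.RandomState(0)
X_train = rng.rand(200, 2050)      # stacked frames from the training files
Y_train = rng.rand(200, 13)        # 13 target values per frame
X_test = rng.rand(50, 2050)

rf = RandomForestRegressor(n_estimators=50, random_state=0)
rf.fit(X_train, Y_train)           # multi-output: Y_train has shape (n_samples, 13)
Y_pred = rf.predict(X_test)        # -> shape (50, 13)

Because every row is a single frame, audio files of different lengths simply contribute different numbers of rows, which is what keeps the per-sample feature count fixed.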
>>>> >>>> Best, >>>> Sebastian >>>> >>>>> On Jan 21, 2017, at 12:59 PM, Carlton Banks wrote: >>>>> >>>>> Most of the machine learning library i?ve tried has an option of of just give the dimension? >>>>> In this case my input consist of an numpy.ndarray with shape (x,2050) and the output is an numpy.ndarray with shape (x,13) >>>>> x is different for each set? >>>>> But for each set is the number of columns consistent. >>>>> >>>>> Column consistency is usually enough for most library tools i?ve worked with? >>>>> But is this not the case here? >>>>>> Den 21. jan. 2017 kl. 18.42 skrev Jacob Schreiber : >>>>>> >>>>>> I don't understand what you mean. Does each sample have a fixed number of features or not? >>>>>> >>>>>> On Sat, Jan 21, 2017 at 9:35 AM, Carlton Banks wrote: >>>>>> Thanks for the response! >>>>>> >>>>>> If you see it in 1d then yes?. it has variable length. In 2d will the number of columns always be constant both for the input and output. >>>>>> >>>>>>> Den 21. jan. 2017 kl. 18.25 skrev Jacob Schreiber : >>>>>>> >>>>>>> If what you're saying is that you have a variable length input, then most sklearn classifiers won't work on this data. They expect a fixed feature set. Perhaps you could try extracting a set of informative features being fed into the classifier? >>>>>>> >>>>>>> On Sat, Jan 21, 2017 at 3:18 AM, Carlton Banks wrote: >>>>>>> Hi guys.. >>>>>>> >>>>>>> I am currently working on a ASR project in which the objective is to substitute part of the general ASR framework with some form of neural network, to see whether the tested part improves in any way. >>>>>>> >>>>>>> I started working with the feature extraction and tried, to make a neural network (NN) that could create MFCC features. I already know what the desired output is supposed to be, so the problem boils down to a simple >>>>>>> input - output mapping. Problem here is the my NN doesn?t seem to perform that well.. and i seem to get pretty large error for some reason. >>>>>>> >>>>>>> I therefore wanted to give random forrest a try, and see whether it could provide me a better result. >>>>>>> >>>>>>> I am currently storing my input and output in numpy.ndarrays, in which the input and output columns is consistent throughout all the examples, but the number of rows changes >>>>>>> depending on length of the audio file. >>>>>>> >>>>>>> Is it possible with the random forrest implementation in scikit-learn to train a random forrest to map an input an output, given they are stored numpy.ndarrays? >>>>>>> Or do i have do it in a different way? and if so how? 
>>>>>>> >>>>>>> kind regards >>>>>>> >>>>>>> Carl truz >>>>>>> _______________________________________________ >>>>>>> scikit-learn mailing list >>>>>>> scikit-learn at python.org >>>>>>> https://mail.python.org/mailman/listinfo/scikit-learn >>>>>>> >>>>>>> _______________________________________________ >>>>>>> scikit-learn mailing list >>>>>>> scikit-learn at python.org >>>>>>> https://mail.python.org/mailman/listinfo/scikit-learn >>>>>> >>>>>> >>>>>> _______________________________________________ >>>>>> scikit-learn mailing list >>>>>> scikit-learn at python.org >>>>>> https://mail.python.org/mailman/listinfo/scikit-learn >>>>>> >>>>>> >>>>>> _______________________________________________ >>>>>> scikit-learn mailing list >>>>>> scikit-learn at python.org >>>>>> https://mail.python.org/mailman/listinfo/scikit-learn >>>>> >>>>> _______________________________________________ >>>>> scikit-learn mailing list >>>>> scikit-learn at python.org >>>>> https://mail.python.org/mailman/listinfo/scikit-learn >>>> >>>> _______________________________________________ >>>> scikit-learn mailing list >>>> scikit-learn at python.org >>>> https://mail.python.org/mailman/listinfo/scikit-learn >>> >>> _______________________________________________ >>> scikit-learn mailing list >>> scikit-learn at python.org >>> https://mail.python.org/mailman/listinfo/scikit-learn >> >> _______________________________________________ >> scikit-learn mailing list >> scikit-learn at python.org >> https://mail.python.org/mailman/listinfo/scikit-learn > > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn From cleverless at gmail.com Sat Jan 21 15:26:05 2017 From: cleverless at gmail.com (Josh Vredevoogd) Date: Sat, 21 Jan 2017 12:26:05 -0800 Subject: [scikit-learn] Query regarding parameter class_weight in Random Forest Classifier In-Reply-To: References: Message-ID: The class_weight parameter doesn't behave the way you're expecting. The value in class_weight is the weight applied to each sample in that class - in your example, each class zero sample has weight 0.001 and each class one sample has weight 0.999, so each class one samples carries 999 times the weight of a class zero sample. If you would like each class one sample to have ten times the weight, you would set `class_weight={0: 1, 1: 10}` or `class_weight={0:0.1, 1:1}` equivalently. On Sat, Jan 21, 2017 at 10:18 AM, Debabrata Ghosh wrote: > Hi All, > Greetings ! > > I have a very basic question regarding the usage of the > parameter class_weight in scikit learn's Random Forest Classifier's fit > method. > > I have a fairly unbalanced sample and my positive class : > negative class ratio is 1:100. In other words, I have a million records > corresponding to negative class and 10,000 records corresponding to > positive class. I have trained the random forest classifier model using the > above record set successfully. > > Further, for a different problem, I want to test the > parameter class_weight. So, I am setting the class_weight as [0:0.001 , > 1:0.999] and I have tried running my model on the same dataset as mentioned > in the above paragraph but with the positive class records reduced to 1000 > [because now each positive class is given approximately 10 times more > weight than a negative class]. However, the model run results are very very > different between the 2 runs (with and without class_weight). And I > expected a similar run results. 
> > Would you please be able to let me know where am I getting > wrong. I know it's something silly but just want to improve on my concept. > > Thanks ! > > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From mailfordebu at gmail.com Sun Jan 22 08:00:31 2017 From: mailfordebu at gmail.com (Debabrata Ghosh) Date: Sun, 22 Jan 2017 18:30:31 +0530 Subject: [scikit-learn] Query regarding parameter class_weight in Random Forest Classifier In-Reply-To: References: Message-ID: Thanks Josh ! I have used the parameter class_weight={0: 1, 1: 10} and the model code has run successfully. However, just to get a further clarity around it's concept, I am having another question for you please. I did the following 2 tests: 1. In my dataset , I have 1 million negative classes and 10,000 positive classes. First I ran my model code without supplying any class_weight parameter and it gave me certain True Positive and False Positive results. 2. Now in the second test, I had the same 1 million negative classes but reduced the positive classes to 1000 . But this time, I supplied the parameter class_weight={0: 1, 1: 10} and got my True Positive and False Positive Results My question is , when I multiply the results obtained from my second test with a factor of 10, I don't match with the results obtained from my first test. In other words, say I get the true positive against a threshold from the second test as 8 , while the true positive from the first test against the same threshold is 260. I am getting similar observations for the false positive results wherein if I multiply the results obtained in the second test by 10, I don't come close to the results obtained from the first set. Is my expectation correct ? Is my way of executing the test (i.e., reducing the the positive classes by 10 times and then feeding a class weight of 10 times the negative classes) and comparing the results with a model run without any class weight parameter correct ? Please let me know as per your convenience as this will help me a big way to understand the concept further. Thanks in advance ! On Sun, Jan 22, 2017 at 1:56 AM, Josh Vredevoogd wrote: > The class_weight parameter doesn't behave the way you're expecting. > > The value in class_weight is the weight applied to each sample in that > class - in your example, each class zero sample has weight 0.001 and each > class one sample has weight 0.999, so each class one samples carries 999 > times the weight of a class zero sample. > > If you would like each class one sample to have ten times the weight, you > would set `class_weight={0: 1, 1: 10}` or `class_weight={0:0.1, 1:1}` > equivalently. > > > On Sat, Jan 21, 2017 at 10:18 AM, Debabrata Ghosh > wrote: > >> Hi All, >> Greetings ! >> >> I have a very basic question regarding the usage of the >> parameter class_weight in scikit learn's Random Forest Classifier's fit >> method. >> >> I have a fairly unbalanced sample and my positive class : >> negative class ratio is 1:100. In other words, I have a million records >> corresponding to negative class and 10,000 records corresponding to >> positive class. I have trained the random forest classifier model using the >> above record set successfully. >> >> Further, for a different problem, I want to test the >> parameter class_weight. 
So, I am setting the class_weight as [0:0.001 , >> 1:0.999] and I have tried running my model on the same dataset as mentioned >> in the above paragraph but with the positive class records reduced to 1000 >> [because now each positive class is given approximately 10 times more >> weight than a negative class]. However, the model run results are very very >> different between the 2 runs (with and without class_weight). And I >> expected a similar run results. >> >> Would you please be able to let me know where am I >> getting wrong. I know it's something silly but just want to improve on my >> concept. >> >> Thanks ! >> >> _______________________________________________ >> scikit-learn mailing list >> scikit-learn at python.org >> https://mail.python.org/mailman/listinfo/scikit-learn >> >> > > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From cleverless at gmail.com Sun Jan 22 23:26:23 2017 From: cleverless at gmail.com (Josh Vredevoogd) Date: Sun, 22 Jan 2017 20:26:23 -0800 Subject: [scikit-learn] Query regarding parameter class_weight in Random Forest Classifier In-Reply-To: References: Message-ID: If you undersample, taking only 10% of the negative class, the classifier will see different combinations of attributes and produce a different fit to explain those distributions. In the worse case, imagine you are classifying birds and through sampling you eliminate all `red` examples. Your classifier likely now will not understand that red objects can be birds. That's an overly simple example, but given a classifier capable of exploring and explaining feature combinations, less obvious versions of this are bound to happen. The extrapolation only works in the other direction: if you manually duplicate samples by the sampling factor, you should get the exact same fit as if you increased the class weight. Hope that helps, Josh On Sun, Jan 22, 2017 at 5:00 AM, Debabrata Ghosh wrote: > Thanks Josh ! > > I have used the parameter class_weight={0: 1, 1: 10} and the model code > has run successfully. However, just to get a further clarity around it's > concept, I am having another question for you please. I did the following 2 > tests: > > 1. In my dataset , I have 1 million negative classes and 10,000 positive > classes. First I ran my model code without supplying any class_weight > parameter and it gave me certain True Positive and False Positive results. > > 2. Now in the second test, I had the same 1 million negative classes but > reduced the positive classes to 1000 . But this time, I supplied the > parameter class_weight={0: 1, 1: 10} and got my True Positive and False > Positive Results > > My question is , when I multiply the results obtained from my second test > with a factor of 10, I don't match with the results obtained from my first > test. In other words, say I get the true positive against a threshold from > the second test as 8 , while the true positive from the first test against > the same threshold is 260. I am getting similar observations for the false > positive results wherein if I multiply the results obtained in the second > test by 10, I don't come close to the results obtained from the first set. > > Is my expectation correct ? 
Is my way of executing the test (i.e., > reducing the the positive classes by 10 times and then feeding a class > weight of 10 times the negative classes) and comparing the results with a > model run without any class weight parameter correct ? > > Please let me know as per your convenience as this will help me a big way > to understand the concept further. > > Thanks in advance ! > > On Sun, Jan 22, 2017 at 1:56 AM, Josh Vredevoogd > wrote: > >> The class_weight parameter doesn't behave the way you're expecting. >> >> The value in class_weight is the weight applied to each sample in that >> class - in your example, each class zero sample has weight 0.001 and each >> class one sample has weight 0.999, so each class one samples carries 999 >> times the weight of a class zero sample. >> >> If you would like each class one sample to have ten times the weight, you >> would set `class_weight={0: 1, 1: 10}` or `class_weight={0:0.1, 1:1}` >> equivalently. >> >> >> On Sat, Jan 21, 2017 at 10:18 AM, Debabrata Ghosh >> wrote: >> >>> Hi All, >>> Greetings ! >>> >>> I have a very basic question regarding the usage of the >>> parameter class_weight in scikit learn's Random Forest Classifier's fit >>> method. >>> >>> I have a fairly unbalanced sample and my positive class : >>> negative class ratio is 1:100. In other words, I have a million records >>> corresponding to negative class and 10,000 records corresponding to >>> positive class. I have trained the random forest classifier model using the >>> above record set successfully. >>> >>> Further, for a different problem, I want to test the >>> parameter class_weight. So, I am setting the class_weight as [0:0.001 , >>> 1:0.999] and I have tried running my model on the same dataset as mentioned >>> in the above paragraph but with the positive class records reduced to 1000 >>> [because now each positive class is given approximately 10 times more >>> weight than a negative class]. However, the model run results are very very >>> different between the 2 runs (with and without class_weight). And I >>> expected a similar run results. >>> >>> Would you please be able to let me know where am I >>> getting wrong. I know it's something silly but just want to improve on my >>> concept. >>> >>> Thanks ! >>> >>> _______________________________________________ >>> scikit-learn mailing list >>> scikit-learn at python.org >>> https://mail.python.org/mailman/listinfo/scikit-learn >>> >>> >> >> _______________________________________________ >> scikit-learn mailing list >> scikit-learn at python.org >> https://mail.python.org/mailman/listinfo/scikit-learn >> >> > > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From mailfordebu at gmail.com Mon Jan 23 19:48:33 2017 From: mailfordebu at gmail.com (Debabrata Ghosh) Date: Tue, 24 Jan 2017 06:18:33 +0530 Subject: [scikit-learn] class_weight: How to assign a higher weightage to values in a specific column as opposed to values in another column Message-ID: Thanks Josh for your quick feedback ! It's quite helpful indeed . Further to it , I am having another burning question. In my sample dataset , I have 2 label columns (let's say x and y) My objective is to give the labels within column 'x' 10 times more weight as compared to labels within column y. 
My question is the parameter class_weight={0: 1, 1: 10} works for a single column, i.e., within a single column I have assigned 10 times weight to the positive labels. But my objective is to provide a 10 times weight to the positive labels within column 'x' as compared to the positive labels within column 'y'. May I please get a feedback from you around how to achieve this please. Thanks for your help in advance ! On Mon, Jan 23, 2017 at 9:56 AM, Josh Vredevoogd wrote: > If you undersample, taking only 10% of the negative class, the classifier > will see different combinations of attributes and produce a different fit > to explain those distributions. In the worse case, imagine you are > classifying birds and through sampling you eliminate all `red` examples. > Your classifier likely now will not understand that red objects can be > birds. That's an overly simple example, but given a classifier capable of > exploring and explaining feature combinations, less obvious versions of > this are bound to happen. > > The extrapolation only works in the other direction: if you manually > duplicate samples by the sampling factor, you should get the exact same fit > as if you increased the class weight. > > Hope that helps, > Josh > > > On Sun, Jan 22, 2017 at 5:00 AM, Debabrata Ghosh > wrote: > >> Thanks Josh ! >> >> I have used the parameter class_weight={0: 1, 1: 10} and the model code >> has run successfully. However, just to get a further clarity around it's >> concept, I am having another question for you please. I did the following 2 >> tests: >> >> 1. In my dataset , I have 1 million negative classes and 10,000 positive >> classes. First I ran my model code without supplying any class_weight >> parameter and it gave me certain True Positive and False Positive results. >> >> 2. Now in the second test, I had the same 1 million negative classes but >> reduced the positive classes to 1000 . But this time, I supplied the >> parameter class_weight={0: 1, 1: 10} and got my True Positive and False >> Positive Results >> >> My question is , when I multiply the results obtained from my second test >> with a factor of 10, I don't match with the results obtained from my first >> test. In other words, say I get the true positive against a threshold from >> the second test as 8 , while the true positive from the first test against >> the same threshold is 260. I am getting similar observations for the false >> positive results wherein if I multiply the results obtained in the second >> test by 10, I don't come close to the results obtained from the first set. >> >> Is my expectation correct ? Is my way of executing the test (i.e., >> reducing the the positive classes by 10 times and then feeding a class >> weight of 10 times the negative classes) and comparing the results with a >> model run without any class weight parameter correct ? >> >> Please let me know as per your convenience as this will help me a big way >> to understand the concept further. >> >> Thanks in advance ! >> >> On Sun, Jan 22, 2017 at 1:56 AM, Josh Vredevoogd >> wrote: >> >>> The class_weight parameter doesn't behave the way you're expecting. >>> >>> The value in class_weight is the weight applied to each sample in that >>> class - in your example, each class zero sample has weight 0.001 and each >>> class one sample has weight 0.999, so each class one samples carries 999 >>> times the weight of a class zero sample. 
>>> >>> If you would like each class one sample to have ten times the weight, >>> you would set `class_weight={0: 1, 1: 10}` or `class_weight={0:0.1, 1:1}` >>> equivalently. >>> >>> >>> On Sat, Jan 21, 2017 at 10:18 AM, Debabrata Ghosh >> > wrote: >>> >>>> Hi All, >>>> Greetings ! >>>> >>>> I have a very basic question regarding the usage of the >>>> parameter class_weight in scikit learn's Random Forest Classifier's fit >>>> method. >>>> >>>> I have a fairly unbalanced sample and my positive class : >>>> negative class ratio is 1:100. In other words, I have a million records >>>> corresponding to negative class and 10,000 records corresponding to >>>> positive class. I have trained the random forest classifier model using the >>>> above record set successfully. >>>> >>>> Further, for a different problem, I want to test the >>>> parameter class_weight. So, I am setting the class_weight as [0:0.001 , >>>> 1:0.999] and I have tried running my model on the same dataset as mentioned >>>> in the above paragraph but with the positive class records reduced to 1000 >>>> [because now each positive class is given approximately 10 times more >>>> weight than a negative class]. However, the model run results are very very >>>> different between the 2 runs (with and without class_weight). And I >>>> expected a similar run results. >>>> >>>> Would you please be able to let me know where am I >>>> getting wrong. I know it's something silly but just want to improve on my >>>> concept. >>>> >>>> Thanks ! >>>> >>>> _______________________________________________ >>>> scikit-learn mailing list >>>> scikit-learn at python.org >>>> https://mail.python.org/mailman/listinfo/scikit-learn >>>> >>>> >>> >>> _______________________________________________ >>> scikit-learn mailing list >>> scikit-learn at python.org >>> https://mail.python.org/mailman/listinfo/scikit-learn >>> >>> >> >> _______________________________________________ >> scikit-learn mailing list >> scikit-learn at python.org >> https://mail.python.org/mailman/listinfo/scikit-learn >> >> > > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From cleverless at gmail.com Mon Jan 23 20:28:18 2017 From: cleverless at gmail.com (Josh Vredevoogd) Date: Mon, 23 Jan 2017 17:28:18 -0800 Subject: [scikit-learn] class_weight: How to assign a higher weightage to values in a specific column as opposed to values in another column In-Reply-To: References: Message-ID: If you do not want the weights to be uniform by class, then you need to generate weights for each sample and pass the sample weight vector to the fit method of the classifier. On Mon, Jan 23, 2017 at 4:48 PM, Debabrata Ghosh wrote: > Thanks Josh for your quick feedback ! It's quite helpful indeed . > > Further to it , I am having another burning question. In my sample dataset > , I have 2 label columns (let's say x and y) > > My objective is to give the labels within column 'x' 10 times more weight > as compared to labels within column y. > > My question is the parameter class_weight={0: 1, 1: 10} works for a single > column, i.e., within a single column I have assigned 10 times weight to the > positive labels. > > But my objective is to provide a 10 times weight to the positive labels > within column 'x' as compared to the positive labels within column 'y'. 
> > May I please get a feedback from you around how to achieve this please. > Thanks for your help in advance ! > > On Mon, Jan 23, 2017 at 9:56 AM, Josh Vredevoogd > wrote: > >> If you undersample, taking only 10% of the negative class, the classifier >> will see different combinations of attributes and produce a different fit >> to explain those distributions. In the worse case, imagine you are >> classifying birds and through sampling you eliminate all `red` examples. >> Your classifier likely now will not understand that red objects can be >> birds. That's an overly simple example, but given a classifier capable of >> exploring and explaining feature combinations, less obvious versions of >> this are bound to happen. >> >> The extrapolation only works in the other direction: if you manually >> duplicate samples by the sampling factor, you should get the exact same fit >> as if you increased the class weight. >> >> Hope that helps, >> Josh >> >> >> On Sun, Jan 22, 2017 at 5:00 AM, Debabrata Ghosh >> wrote: >> >>> Thanks Josh ! >>> >>> I have used the parameter class_weight={0: 1, 1: 10} and the model code >>> has run successfully. However, just to get a further clarity around it's >>> concept, I am having another question for you please. I did the following 2 >>> tests: >>> >>> 1. In my dataset , I have 1 million negative classes and 10,000 positive >>> classes. First I ran my model code without supplying any class_weight >>> parameter and it gave me certain True Positive and False Positive results. >>> >>> 2. Now in the second test, I had the same 1 million negative classes but >>> reduced the positive classes to 1000 . But this time, I supplied the >>> parameter class_weight={0: 1, 1: 10} and got my True Positive and False >>> Positive Results >>> >>> My question is , when I multiply the results obtained from my second >>> test with a factor of 10, I don't match with the results obtained from my >>> first test. In other words, say I get the true positive against a threshold >>> from the second test as 8 , while the true positive from the first test >>> against the same threshold is 260. I am getting similar observations for >>> the false positive results wherein if I multiply the results obtained in >>> the second test by 10, I don't come close to the results obtained from the >>> first set. >>> >>> Is my expectation correct ? Is my way of executing the test (i.e., >>> reducing the the positive classes by 10 times and then feeding a class >>> weight of 10 times the negative classes) and comparing the results with a >>> model run without any class weight parameter correct ? >>> >>> Please let me know as per your convenience as this will help me a big >>> way to understand the concept further. >>> >>> Thanks in advance ! >>> >>> On Sun, Jan 22, 2017 at 1:56 AM, Josh Vredevoogd >>> wrote: >>> >>>> The class_weight parameter doesn't behave the way you're expecting. >>>> >>>> The value in class_weight is the weight applied to each sample in that >>>> class - in your example, each class zero sample has weight 0.001 and each >>>> class one sample has weight 0.999, so each class one samples carries 999 >>>> times the weight of a class zero sample. >>>> >>>> If you would like each class one sample to have ten times the weight, >>>> you would set `class_weight={0: 1, 1: 10}` or `class_weight={0:0.1, 1:1}` >>>> equivalently. >>>> >>>> >>>> On Sat, Jan 21, 2017 at 10:18 AM, Debabrata Ghosh < >>>> mailfordebu at gmail.com> wrote: >>>> >>>>> Hi All, >>>>> Greetings ! 
>>>>> >>>>> I have a very basic question regarding the usage of the >>>>> parameter class_weight in scikit learn's Random Forest Classifier's fit >>>>> method. >>>>> >>>>> I have a fairly unbalanced sample and my positive class >>>>> : negative class ratio is 1:100. In other words, I have a million records >>>>> corresponding to negative class and 10,000 records corresponding to >>>>> positive class. I have trained the random forest classifier model using the >>>>> above record set successfully. >>>>> >>>>> Further, for a different problem, I want to test the >>>>> parameter class_weight. So, I am setting the class_weight as [0:0.001 , >>>>> 1:0.999] and I have tried running my model on the same dataset as mentioned >>>>> in the above paragraph but with the positive class records reduced to 1000 >>>>> [because now each positive class is given approximately 10 times more >>>>> weight than a negative class]. However, the model run results are very very >>>>> different between the 2 runs (with and without class_weight). And I >>>>> expected a similar run results. >>>>> >>>>> Would you please be able to let me know where am I >>>>> getting wrong. I know it's something silly but just want to improve on my >>>>> concept. >>>>> >>>>> Thanks ! >>>>> >>>>> _______________________________________________ >>>>> scikit-learn mailing list >>>>> scikit-learn at python.org >>>>> https://mail.python.org/mailman/listinfo/scikit-learn >>>>> >>>>> >>>> >>>> _______________________________________________ >>>> scikit-learn mailing list >>>> scikit-learn at python.org >>>> https://mail.python.org/mailman/listinfo/scikit-learn >>>> >>>> >>> >>> _______________________________________________ >>> scikit-learn mailing list >>> scikit-learn at python.org >>> https://mail.python.org/mailman/listinfo/scikit-learn >>> >>> >> >> _______________________________________________ >> scikit-learn mailing list >> scikit-learn at python.org >> https://mail.python.org/mailman/listinfo/scikit-learn >> >> > > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From mailfordebu at gmail.com Tue Jan 24 02:36:57 2017 From: mailfordebu at gmail.com (Debabrata Ghosh) Date: Tue, 24 Jan 2017 13:06:57 +0530 Subject: [scikit-learn] class_weight: How to assign a higher weightage to values in a specific column as opposed to values in another column In-Reply-To: References: Message-ID: What would be the sample command for achieving it ? Sorry a bit new in this area and that's why I will be better able to understand it through certain example commands . Thanks again ! On Tue, Jan 24, 2017 at 6:58 AM, Josh Vredevoogd wrote: > If you do not want the weights to be uniform by class, then you need to > generate weights for each sample and pass the sample weight vector to the > fit method of the classifier. > > On Mon, Jan 23, 2017 at 4:48 PM, Debabrata Ghosh > wrote: > >> Thanks Josh for your quick feedback ! It's quite helpful indeed . >> >> Further to it , I am having another burning question. In my sample >> dataset , I have 2 label columns (let's say x and y) >> >> My objective is to give the labels within column 'x' 10 times more weight >> as compared to labels within column y. 
>> >> My question is the parameter class_weight={0: 1, 1: 10} works for a >> single column, i.e., within a single column I have assigned 10 times weight >> to the positive labels. >> >> But my objective is to provide a 10 times weight to the positive labels >> within column 'x' as compared to the positive labels within column 'y'. >> >> May I please get a feedback from you around how to achieve this please. >> Thanks for your help in advance ! >> >> On Mon, Jan 23, 2017 at 9:56 AM, Josh Vredevoogd >> wrote: >> >>> If you undersample, taking only 10% of the negative class, the >>> classifier will see different combinations of attributes and produce a >>> different fit to explain those distributions. In the worse case, imagine >>> you are classifying birds and through sampling you eliminate all `red` >>> examples. Your classifier likely now will not understand that red objects >>> can be birds. That's an overly simple example, but given a classifier >>> capable of exploring and explaining feature combinations, less obvious >>> versions of this are bound to happen. >>> >>> The extrapolation only works in the other direction: if you manually >>> duplicate samples by the sampling factor, you should get the exact same fit >>> as if you increased the class weight. >>> >>> Hope that helps, >>> Josh >>> >>> >>> On Sun, Jan 22, 2017 at 5:00 AM, Debabrata Ghosh >>> wrote: >>> >>>> Thanks Josh ! >>>> >>>> I have used the parameter class_weight={0: 1, 1: 10} and the model code >>>> has run successfully. However, just to get a further clarity around it's >>>> concept, I am having another question for you please. I did the following 2 >>>> tests: >>>> >>>> 1. In my dataset , I have 1 million negative classes and 10,000 >>>> positive classes. First I ran my model code without supplying any >>>> class_weight parameter and it gave me certain True Positive and False >>>> Positive results. >>>> >>>> 2. Now in the second test, I had the same 1 million negative classes >>>> but reduced the positive classes to 1000 . But this time, I supplied the >>>> parameter class_weight={0: 1, 1: 10} and got my True Positive and False >>>> Positive Results >>>> >>>> My question is , when I multiply the results obtained from my second >>>> test with a factor of 10, I don't match with the results obtained from my >>>> first test. In other words, say I get the true positive against a threshold >>>> from the second test as 8 , while the true positive from the first test >>>> against the same threshold is 260. I am getting similar observations for >>>> the false positive results wherein if I multiply the results obtained in >>>> the second test by 10, I don't come close to the results obtained from the >>>> first set. >>>> >>>> Is my expectation correct ? Is my way of executing the test (i.e., >>>> reducing the the positive classes by 10 times and then feeding a class >>>> weight of 10 times the negative classes) and comparing the results with a >>>> model run without any class weight parameter correct ? >>>> >>>> Please let me know as per your convenience as this will help me a big >>>> way to understand the concept further. >>>> >>>> Thanks in advance ! >>>> >>>> On Sun, Jan 22, 2017 at 1:56 AM, Josh Vredevoogd >>>> wrote: >>>> >>>>> The class_weight parameter doesn't behave the way you're expecting. 
>>>>> >>>>> The value in class_weight is the weight applied to each sample in that >>>>> class - in your example, each class zero sample has weight 0.001 and each >>>>> class one sample has weight 0.999, so each class one samples carries 999 >>>>> times the weight of a class zero sample. >>>>> >>>>> If you would like each class one sample to have ten times the weight, >>>>> you would set `class_weight={0: 1, 1: 10}` or `class_weight={0:0.1, 1:1}` >>>>> equivalently. >>>>> >>>>> >>>>> On Sat, Jan 21, 2017 at 10:18 AM, Debabrata Ghosh < >>>>> mailfordebu at gmail.com> wrote: >>>>> >>>>>> Hi All, >>>>>> Greetings ! >>>>>> >>>>>> I have a very basic question regarding the usage of the >>>>>> parameter class_weight in scikit learn's Random Forest Classifier's fit >>>>>> method. >>>>>> >>>>>> I have a fairly unbalanced sample and my positive class >>>>>> : negative class ratio is 1:100. In other words, I have a million records >>>>>> corresponding to negative class and 10,000 records corresponding to >>>>>> positive class. I have trained the random forest classifier model using the >>>>>> above record set successfully. >>>>>> >>>>>> Further, for a different problem, I want to test the >>>>>> parameter class_weight. So, I am setting the class_weight as [0:0.001 , >>>>>> 1:0.999] and I have tried running my model on the same dataset as mentioned >>>>>> in the above paragraph but with the positive class records reduced to 1000 >>>>>> [because now each positive class is given approximately 10 times more >>>>>> weight than a negative class]. However, the model run results are very very >>>>>> different between the 2 runs (with and without class_weight). And I >>>>>> expected a similar run results. >>>>>> >>>>>> Would you please be able to let me know where am I >>>>>> getting wrong. I know it's something silly but just want to improve on my >>>>>> concept. >>>>>> >>>>>> Thanks ! >>>>>> >>>>>> _______________________________________________ >>>>>> scikit-learn mailing list >>>>>> scikit-learn at python.org >>>>>> https://mail.python.org/mailman/listinfo/scikit-learn >>>>>> >>>>>> >>>>> >>>>> _______________________________________________ >>>>> scikit-learn mailing list >>>>> scikit-learn at python.org >>>>> https://mail.python.org/mailman/listinfo/scikit-learn >>>>> >>>>> >>>> >>>> _______________________________________________ >>>> scikit-learn mailing list >>>> scikit-learn at python.org >>>> https://mail.python.org/mailman/listinfo/scikit-learn >>>> >>>> >>> >>> _______________________________________________ >>> scikit-learn mailing list >>> scikit-learn at python.org >>> https://mail.python.org/mailman/listinfo/scikit-learn >>> >>> >> >> _______________________________________________ >> scikit-learn mailing list >> scikit-learn at python.org >> https://mail.python.org/mailman/listinfo/scikit-learn >> >> > > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn > > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From naopon at gmail.com Tue Jan 24 02:51:43 2017 From: naopon at gmail.com (Naoya Kanai) Date: Mon, 23 Jan 2017 23:51:43 -0800 Subject: [scikit-learn] class_weight: How to assign a higher weightage to values in a specific column as opposed to values in another column In-Reply-To: References: Message-ID: You need to write your own function to compute a vector assigning a weight to each sample in X, then pass that as sample_weight parameter on RandomForestClassifier.fit() . If you also use class_weight on the model constructor, class_weight and sample_weight are multiplied through for each sample. On Mon, Jan 23, 2017 at 11:36 PM, Debabrata Ghosh wrote: > What would be the sample command for achieving it ? Sorry a bit new in > this area and that's why I will be better able to understand it through > certain example commands . > > Thanks again ! > > On Tue, Jan 24, 2017 at 6:58 AM, Josh Vredevoogd > wrote: > >> If you do not want the weights to be uniform by class, then you need to >> generate weights for each sample and pass the sample weight vector to the >> fit method of the classifier. >> >> On Mon, Jan 23, 2017 at 4:48 PM, Debabrata Ghosh >> wrote: >> >>> Thanks Josh for your quick feedback ! It's quite helpful indeed . >>> >>> Further to it , I am having another burning question. In my sample >>> dataset , I have 2 label columns (let's say x and y) >>> >>> My objective is to give the labels within column 'x' 10 times more >>> weight as compared to labels within column y. >>> >>> My question is the parameter class_weight={0: 1, 1: 10} works for a >>> single column, i.e., within a single column I have assigned 10 times weight >>> to the positive labels. >>> >>> But my objective is to provide a 10 times weight to the positive labels >>> within column 'x' as compared to the positive labels within column 'y'. >>> >>> May I please get a feedback from you around how to achieve this please. >>> Thanks for your help in advance ! >>> >>> On Mon, Jan 23, 2017 at 9:56 AM, Josh Vredevoogd >>> wrote: >>> >>>> If you undersample, taking only 10% of the negative class, the >>>> classifier will see different combinations of attributes and produce a >>>> different fit to explain those distributions. In the worse case, imagine >>>> you are classifying birds and through sampling you eliminate all `red` >>>> examples. Your classifier likely now will not understand that red objects >>>> can be birds. That's an overly simple example, but given a classifier >>>> capable of exploring and explaining feature combinations, less obvious >>>> versions of this are bound to happen. >>>> >>>> The extrapolation only works in the other direction: if you manually >>>> duplicate samples by the sampling factor, you should get the exact same fit >>>> as if you increased the class weight. >>>> >>>> Hope that helps, >>>> Josh >>>> >>>> >>>> On Sun, Jan 22, 2017 at 5:00 AM, Debabrata Ghosh >>> > wrote: >>>> >>>>> Thanks Josh ! >>>>> >>>>> I have used the parameter class_weight={0: 1, 1: 10} and the model >>>>> code has run successfully. However, just to get a further clarity around >>>>> it's concept, I am having another question for you please. I did the >>>>> following 2 tests: >>>>> >>>>> 1. In my dataset , I have 1 million negative classes and 10,000 >>>>> positive classes. First I ran my model code without supplying any >>>>> class_weight parameter and it gave me certain True Positive and False >>>>> Positive results. >>>>> >>>>> 2. 
Now in the second test, I had the same 1 million negative classes >>>>> but reduced the positive classes to 1000 . But this time, I supplied the >>>>> parameter class_weight={0: 1, 1: 10} and got my True Positive and False >>>>> Positive Results >>>>> >>>>> My question is , when I multiply the results obtained from my second >>>>> test with a factor of 10, I don't match with the results obtained from my >>>>> first test. In other words, say I get the true positive against a threshold >>>>> from the second test as 8 , while the true positive from the first test >>>>> against the same threshold is 260. I am getting similar observations for >>>>> the false positive results wherein if I multiply the results obtained in >>>>> the second test by 10, I don't come close to the results obtained from the >>>>> first set. >>>>> >>>>> Is my expectation correct ? Is my way of executing the test (i.e., >>>>> reducing the the positive classes by 10 times and then feeding a class >>>>> weight of 10 times the negative classes) and comparing the results with a >>>>> model run without any class weight parameter correct ? >>>>> >>>>> Please let me know as per your convenience as this will help me a big >>>>> way to understand the concept further. >>>>> >>>>> Thanks in advance ! >>>>> >>>>> On Sun, Jan 22, 2017 at 1:56 AM, Josh Vredevoogd >>>> > wrote: >>>>> >>>>>> The class_weight parameter doesn't behave the way you're expecting. >>>>>> >>>>>> The value in class_weight is the weight applied to each sample in >>>>>> that class - in your example, each class zero sample has weight 0.001 and >>>>>> each class one sample has weight 0.999, so each class one samples carries >>>>>> 999 times the weight of a class zero sample. >>>>>> >>>>>> If you would like each class one sample to have ten times the weight, >>>>>> you would set `class_weight={0: 1, 1: 10}` or `class_weight={0:0.1, 1:1}` >>>>>> equivalently. >>>>>> >>>>>> >>>>>> On Sat, Jan 21, 2017 at 10:18 AM, Debabrata Ghosh < >>>>>> mailfordebu at gmail.com> wrote: >>>>>> >>>>>>> Hi All, >>>>>>> Greetings ! >>>>>>> >>>>>>> I have a very basic question regarding the usage of >>>>>>> the parameter class_weight in scikit learn's Random Forest Classifier's fit >>>>>>> method. >>>>>>> >>>>>>> I have a fairly unbalanced sample and my positive >>>>>>> class : negative class ratio is 1:100. In other words, I have a million >>>>>>> records corresponding to negative class and 10,000 records corresponding to >>>>>>> positive class. I have trained the random forest classifier model using the >>>>>>> above record set successfully. >>>>>>> >>>>>>> Further, for a different problem, I want to test the >>>>>>> parameter class_weight. So, I am setting the class_weight as [0:0.001 , >>>>>>> 1:0.999] and I have tried running my model on the same dataset as mentioned >>>>>>> in the above paragraph but with the positive class records reduced to 1000 >>>>>>> [because now each positive class is given approximately 10 times more >>>>>>> weight than a negative class]. However, the model run results are very very >>>>>>> different between the 2 runs (with and without class_weight). And I >>>>>>> expected a similar run results. >>>>>>> >>>>>>> Would you please be able to let me know where am I >>>>>>> getting wrong. I know it's something silly but just want to improve on my >>>>>>> concept. >>>>>>> >>>>>>> Thanks ! 
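On the point above that class_weight and sample_weight are multiplied through for each sample, a rough hand computation outside the estimator may help; all numbers below are made up for illustration:

import numpy as np

y = np.array([0, 0, 1, 1])
sample_weight = np.array([1.0, 2.0, 1.0, 2.0])  # illustrative per-sample weights
class_weight = {0: 1, 1: 10}

# class_weight is expanded to one weight per sample and multiplied into sample_weight,
# so the effective weight of each sample is the product of the two.
effective = sample_weight * np.where(y == 1, class_weight[1], class_weight[0])
print(effective)  # expected: [ 1.  2. 10. 20.]
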
>>>>>>> >>>>>>> _______________________________________________ >>>>>>> scikit-learn mailing list >>>>>>> scikit-learn at python.org >>>>>>> https://mail.python.org/mailman/listinfo/scikit-learn >>>>>>> >>>>>>> >>>>>> >>>>>> _______________________________________________ >>>>>> scikit-learn mailing list >>>>>> scikit-learn at python.org >>>>>> https://mail.python.org/mailman/listinfo/scikit-learn >>>>>> >>>>>> >>>>> >>>>> _______________________________________________ >>>>> scikit-learn mailing list >>>>> scikit-learn at python.org >>>>> https://mail.python.org/mailman/listinfo/scikit-learn >>>>> >>>>> >>>> >>>> _______________________________________________ >>>> scikit-learn mailing list >>>> scikit-learn at python.org >>>> https://mail.python.org/mailman/listinfo/scikit-learn >>>> >>>> >>> >>> _______________________________________________ >>> scikit-learn mailing list >>> scikit-learn at python.org >>> https://mail.python.org/mailman/listinfo/scikit-learn >>> >>> >> >> _______________________________________________ >> scikit-learn mailing list >> scikit-learn at python.org >> https://mail.python.org/mailman/listinfo/scikit-learn >> >> > > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From t3kcit at gmail.com Thu Jan 26 10:51:54 2017 From: t3kcit at gmail.com (Andreas Mueller) Date: Thu, 26 Jan 2017 10:51:54 -0500 Subject: [scikit-learn] (personal) Survey for future scikit-learn development Message-ID: Hey all. I created a survey to prioritize and justify (to people that give me money) future scikit-learn development. It would be great if you could answer it, it should be pretty sort (it's 10 questions, mostly multiple choice). Feel free to share, more replies are better ;) https://www.surveymonkey.com/r/GJFK32S Disclaimer: While I will share the results with the project (and anyone that cares), I want to make clear that that this is not an "official scikit-learn survey" in that it wasn't designed or endorsed by the whole team. Also, the results are not binding for anyone, though I will use them to steer my work and projects, and I think the rest of the project will certainly take the input into account. Thanks! Andy From raga.markely at gmail.com Thu Jan 26 11:02:48 2017 From: raga.markely at gmail.com (Raga Markely) Date: Thu, 26 Jan 2017 11:02:48 -0500 Subject: [scikit-learn] Scores in Cross Validation Message-ID: Hello, I have 2 questions regarding cross_val_score. 1. Do the scores returned by cross_val_score correspond to only the test set or the whole data set (training and test sets)? I tried to look at the source code, and it looks like it returns the score of only the test set (line 145: "return_train_score=False") - I am not sure if I am reading the codes properly, though.. https://github.com/scikit-learn/scikit-learn/blob/14031f6/sklearn/model_ selection/_validation.py#L36 I came across the paper below and the authors use the score of the whole dataset when the author performs repeated nested loop, grid search cv, etc.. e.g. see algorithm 1 (line 1c) and 2 (line 2d) on page 3. https://jcheminf.springeropen.com/articles/10.1186/1758-2946-6-10 I wonder what's the pros and cons of using the accuracy score of the whole dataset vs just the test set.. any thoughts? 2. On line 283 of the cross_val_score source code, there is a function _score. 
However, I can't find where this function is called. Could you let me know where this function is called? Thank you very much! Raga -------------- next part -------------- An HTML attachment was scrubbed... URL: From g.lemaitre58 at gmail.com Thu Jan 26 12:05:12 2017 From: g.lemaitre58 at gmail.com (=?UTF-8?Q?Guillaume_Lema=C3=AEtre?=) Date: Thu, 26 Jan 2017 18:05:12 +0100 Subject: [scikit-learn] Scores in Cross Validation In-Reply-To: References: Message-ID: 1. You should not evaluate an estimator on the data which have been used to train it. Usually, you try to minimize the classification or loss using those data and fit them as good as possible. Evaluating on an unseen testing set will give you an idea how good your estimator was able to generalize to your problem during the training. Furthermore, a training, validation, and testing set should be used when setting up parameters. Validation will be used to set the parameters and the testing will be used to evaluate your best estimator. That is why, when using the GridSearchCV, fit will train the estimator using a training and validation test (using a given CV startegies). Finally, predict will be performed on another unseen testing set. The bottom line is that using training data to select parameters will not ensure that you are selecting the best parameters for your problems. 2. The function is call in _fit_and_score, l. 260 and 263 for instance. On 26 January 2017 at 17:02, Raga Markely wrote: > Hello, > > I have 2 questions regarding cross_val_score. > 1. Do the scores returned by cross_val_score correspond to only the test > set or the whole data set (training and test sets)? > I tried to look at the source code, and it looks like it returns the score > of only the test set (line 145: "return_train_score=False") - I am not sure > if I am reading the codes properly, though.. > https://github.com/scikit-learn/scikit-learn/blob/14031f6/ > sklearn/model_selection/_validation.py#L36 > I came across the paper below and the authors use the score of the whole > dataset when the author performs repeated nested loop, grid search cv, > etc.. e.g. see algorithm 1 (line 1c) and 2 (line 2d) on page 3. > https://jcheminf.springeropen.com/articles/10.1186/1758-2946-6-10 > I wonder what's the pros and cons of using the accuracy score of the whole > dataset vs just the test set.. any thoughts? > > 2. On line 283 of the cross_val_score source code, there is a function > _score. However, I can't find where this function is called. Could you let > me know where this function is called? > > Thank you very much! > Raga > > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn > > -- Guillaume Lemaitre INRIA Saclay - Ile-de-France Equipe PARIETAL guillaume.lemaitre at inria.f r --- https://glemaitre.github.io/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From raga.markely at gmail.com Thu Jan 26 13:19:39 2017 From: raga.markely at gmail.com (Raga Markely) Date: Thu, 26 Jan 2017 13:19:39 -0500 Subject: [scikit-learn] Scores in Cross Validation In-Reply-To: References: Message-ID: Thank you, Guillaume. 1. I agree with you - that's what I have been learning and makes sense.. I was a bit surprised when I read the paper today.. 2. Ah.. thank you.. 
I got to change my glasses :P Best, Raga *Guillaume Lema?tre* g.lemaitre58 at gmail.com *Thu Jan 26 12:05:12 EST 2017* - Previous message (by thread): [scikit-learn] Scores in Cross Validation - *Messages sorted by:* [ date ] [ thread ] [ subject ] [ author ] ------------------------------ 1. You should not evaluate an estimator on the data which have been used to train it. Usually, you try to minimize the classification or loss using those data and fit them as good as possible. Evaluating on an unseen testing set will give you an idea how good your estimator was able to generalize to your problem during the training. Furthermore, a training, validation, and testing set should be used when setting up parameters. Validation will be used to set the parameters and the testing will be used to evaluate your best estimator. That is why, when using the GridSearchCV, fit will train the estimator using a training and validation test (using a given CV startegies). Finally, predict will be performed on another unseen testing set. The bottom line is that using training data to select parameters will not ensure that you are selecting the best parameters for your problems. 2. The function is call in _fit_and_score, l. 260 and 263 for instance. On 26 January 2017 at 17:02, Raga Markely > wrote: >* Hello, *>>* I have 2 questions regarding cross_val_score. *>* 1. Do the scores returned by cross_val_score correspond to only the test *>* set or the whole data set (training and test sets)? *>* I tried to look at the source code, and it looks like it returns the score *>* of only the test set (line 145: "return_train_score=False") - I am not sure *>* if I am reading the codes properly, though.. *>* https://github.com/scikit-learn/scikit-learn/blob/14031f6/ *>* sklearn/model_selection/_validation.py#L36 *>* I came across the paper below and the authors use the score of the whole *>* dataset when the author performs repeated nested loop, grid search cv, *>* etc.. e.g. see algorithm 1 (line 1c) and 2 (line 2d) on page 3. *>* https://jcheminf.springeropen.com/articles/10.1186/1758-2946-6-10 *>* I wonder what's the pros and cons of using the accuracy score of the whole *>* dataset vs just the test set.. any thoughts? *>>* 2. On line 283 of the cross_val_score source code, there is a function *>* _score. However, I can't find where this function is called. Could you let *>* me know where this function is called? *>>* Thank you very much! *>* Raga *>>* _______________________________________________ *>* scikit-learn mailing list *>* scikit-learn at python.org *>* https://mail.python.org/mailman/listinfo/scikit-learn *>> -- Guillaume Lemaitre INRIA Saclay - Ile-de-France Equipe PARIETALguillaume.lemaitre at inria.f >r ---https://glemaitre.github.io/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From Jeremiah.Johnson at unh.edu Thu Jan 26 15:27:34 2017 From: Jeremiah.Johnson at unh.edu (Johnson, Jeremiah) Date: Thu, 26 Jan 2017 20:27:34 +0000 Subject: [scikit-learn] top N accuracy classification metric In-Reply-To: References: , Message-ID: <1485462271395.2206@unh.edu> Okay, I didn't see anything equivalent in the issue tracker, so submitted a pull request. Jeremiah =============================== Jeremiah W. Johnson, Ph. 
D Assistant Professor of Data Science Analytics Bachelor of Science Program Coordinator University of New Hampshire http://linkedin.com/jwjohnson314 ________________________________ From: scikit-learn on behalf of Joel Nothman Sent: Saturday, January 21, 2017 5:52 AM To: Scikit-learn user and developer mailing list Subject: Re: [scikit-learn] top N accuracy classification metric There are metrics with that kind of input in sklearn.metrics.ranking. I don't have the time to look them up now, but there have been proposals and PRs for similar ranking metrics. Please search the issue tracker for related issues. Thanks, Joel On 21 January 2017 at 06:16, Johnson, Jeremiah > wrote: Hi all, It's common to use a top-n accuracy metric for multi-class classification problems, where for each observation the prediction is the set of probabilities for each of the classes, and a prediction is top-N accurate if the correct class is among the N highest predicted probability classes. I've written a simple implementation, but I don't think it quite fits the sklearn api. Specifically, _check_targets objects to the the continuous-multioutput format of the predictions for a classification task. Is there any interest in including a metric like this? I'd be happy to submit a pull request. Jeremiah _______________________________________________ scikit-learn mailing list scikit-learn at python.org https://mail.python.org/mailman/listinfo/scikit-learn -------------- next part -------------- An HTML attachment was scrubbed... URL: From raga.markely at gmail.com Thu Jan 26 17:39:41 2017 From: raga.markely at gmail.com (Raga Markely) Date: Thu, 26 Jan 2017 17:39:41 -0500 Subject: [scikit-learn] Random StratifiedKFold Grid Search CV Message-ID: Hello, I was trying to do repeated Grid Search CV (20 repeats). I thought that each time I call GridSearchCV, the training and test sets separated in different splits would be different. However, I got the same best_params_ and best_scores_ for all 20 repeats. It looks like the training and test sets are separated in identical folds in each run? Just to clarify, e.g. I have the following data: 0,1,2,3,4. Class 1 = [0,1,2] and Class 2 = [3,4]. Suppose I call cv = 2. The split is always for instance [0,3] [1,2,4] in each repeat, and I couldn't get [1,3] [0,2,4] or other combinations. If I understand correctly, GridSearchCV uses StratifiedKFold when I enter cv = integer. The StratifiedKFold command has random state; I wonder if there is anyway I can make the the training and test sets randomly separated each time I call the GridSearchCV? Just a note, I used the following classifiers: Logistic Regression, KNN, SVC, Kernel SVC, Random Forest, and had the same observation regardless of the classifiers. Thank you very much! Raga -------------- next part -------------- An HTML attachment was scrubbed... URL: From se.raschka at gmail.com Thu Jan 26 18:34:25 2017 From: se.raschka at gmail.com (Sebastian Raschka) Date: Thu, 26 Jan 2017 18:34:25 -0500 Subject: [scikit-learn] Random StratifiedKFold Grid Search CV In-Reply-To: References: Message-ID: Hi, Raga, I think that if GridSearchCV is used for classification, the stratified k-fold doesn?t do shuffling by default. Say you do 20 grid search repetitions, you could then do sth like: from sklearn.model_selection import StratifiedKFold for i in range(n_reps): k_fold = StratifiedKFold(n_splits=5, shuffle=True, random_state=i) gs = GridSearchCV(..., cv=k_fold) ... 
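Filled in with placeholders so it runs end to end (the dataset, estimator, and grid below are illustrative only; the point is that shuffle=True plus a different random_state per repetition gives differently shuffled stratified folds):

from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, StratifiedKFold

X, y = load_iris(return_X_y=True)
param_grid = {"C": [0.1, 1.0, 10.0]}

for i in range(20):
    # a varying random_state makes each repetition use different folds
    cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=i)
    gs = GridSearchCV(LogisticRegression(max_iter=1000), param_grid, cv=cv)
    gs.fit(X, y)
    print(i, gs.best_params_, round(gs.best_score_, 3))
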
Best, Sebastian > On Jan 26, 2017, at 5:39 PM, Raga Markely wrote: > > Hello, > > I was trying to do repeated Grid Search CV (20 repeats). I thought that each time I call GridSearchCV, the training and test sets separated in different splits would be different. > > However, I got the same best_params_ and best_scores_ for all 20 repeats. It looks like the training and test sets are separated in identical folds in each run? Just to clarify, e.g. I have the following data: 0,1,2,3,4. Class 1 = [0,1,2] and Class 2 = [3,4]. Suppose I call cv = 2. The split is always for instance [0,3] [1,2,4] in each repeat, and I couldn't get [1,3] [0,2,4] or other combinations. > > If I understand correctly, GridSearchCV uses StratifiedKFold when I enter cv = integer. The StratifiedKFold command has random state; I wonder if there is anyway I can make the the training and test sets randomly separated each time I call the GridSearchCV? > > Just a note, I used the following classifiers: Logistic Regression, KNN, SVC, Kernel SVC, Random Forest, and had the same observation regardless of the classifiers. > > Thank you very much! > Raga > > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn From se.raschka at gmail.com Thu Jan 26 18:37:25 2017 From: se.raschka at gmail.com (Sebastian Raschka) Date: Thu, 26 Jan 2017 18:37:25 -0500 Subject: [scikit-learn] Scores in Cross Validation In-Reply-To: References: Message-ID: <83184497-0DEB-44B8-9713-E9B87DBDE4F3@gmail.com> > Furthermore, a training, validation, and testing set should be used when > setting up > parameters. Usually, it?s better to use a train set and separate test set, and do model selection via k-fold on the training set. Then, you do the final model estimation on the test set that you haven?t touched before. I often use ?training, validation, and testing ? approach as well, though, especially when working with large datasets and for early stopping on neural nets. Best, Sebastian > On Jan 26, 2017, at 1:19 PM, Raga Markely wrote: > > Thank you, Guillaume. > > 1. I agree with you - that's what I have been learning and makes sense.. I was a bit surprised when I read the paper today.. > > 2. Ah.. thank you.. I got to change my glasses :P > > Best, > Raga > > Guillaume Lema?tre g.lemaitre58 at gmail.com > Thu Jan 26 12:05:12 EST 2017 > > ? Previous message (by thread): [scikit-learn] Scores in Cross Validation > ? Messages sorted by: [ date ] [ thread ] [ subject ] [ author ] > 1. You should not evaluate an estimator on the data which have been used to > train it. > Usually, you try to minimize the classification or loss using those data > and fit them as > good as possible. Evaluating on an unseen testing set will give you an idea > how good > your estimator was able to generalize to your problem during the training. > Furthermore, a training, validation, and testing set should be used when > setting up > parameters. Validation will be used to set the parameters and the testing > will be used > to evaluate your best estimator. > > That is why, when using the GridSearchCV, fit will train the estimator > using a training > and validation test (using a given CV startegies). Finally, predict will be > performed on > another unseen testing set. > > The bottom line is that using training data to select parameters will not > ensure that you > are selecting the best parameters for your problems. > > 2. The function is call in _fit_and_score, l. 
260 and 263 for instance. > > On 26 January 2017 at 17:02, Raga Markely < > raga.markely at gmail.com > > wrote: > > > > Hello, > > > > > > I have 2 questions regarding cross_val_score. > > > > 1. Do the scores returned by cross_val_score correspond to only the test > > > > set or the whole data set (training and test sets)? > > > > I tried to look at the source code, and it looks like it returns the score > > > > of only the test set (line 145: "return_train_score=False") - I am not sure > > > > if I am reading the codes properly, though.. > > > https://github.com/scikit-learn/scikit-learn/blob/14031f6/ > > > sklearn/model_selection/_validation.py#L36 > > > > I came across the paper below and the authors use the score of the whole > > > > dataset when the author performs repeated nested loop, grid search cv, > > > > etc.. e.g. see algorithm 1 (line 1c) and 2 (line 2d) on page 3. > > > https://jcheminf.springeropen.com/articles/10.1186/1758-2946-6-10 > > > I wonder what's the pros and cons of using the accuracy score of the whole > > > > dataset vs just the test set.. any thoughts? > > > > > > 2. On line 283 of the cross_val_score source code, there is a function > > > > _score. However, I can't find where this function is called. Could you let > > > > me know where this function is called? > > > > > > Thank you very much! > > > > Raga > > > > > > _______________________________________________ > > > > scikit-learn mailing list > > > scikit-learn at python.org > > https://mail.python.org/mailman/listinfo/scikit-learn > > > > > > > -- > Guillaume Lemaitre > INRIA Saclay - Ile-de-France > Equipe PARIETAL > > guillaume.lemaitre at inria.f >r --- > > https://glemaitre.github.io/ > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn From g.lemaitre58 at gmail.com Thu Jan 26 18:41:48 2017 From: g.lemaitre58 at gmail.com (=?UTF-8?Q?Guillaume_Lema=C3=AEtre?=) Date: Fri, 27 Jan 2017 00:41:48 +0100 Subject: [scikit-learn] Scores in Cross Validation In-Reply-To: <83184497-0DEB-44B8-9713-E9B87DBDE4F3@gmail.com> References: <83184497-0DEB-44B8-9713-E9B87DBDE4F3@gmail.com> Message-ID: I didn't express myself well but I was meaning: > model selection via k-fold on the training set for the training/validation set :D On 27 January 2017 at 00:37, Sebastian Raschka wrote: > > Furthermore, a training, validation, and testing set should be used when > > setting up > > parameters. > > Usually, it?s better to use a train set and separate test set, and do > model selection via k-fold on the training set. Then, you do the final > model estimation on the test set that you haven?t touched before. I often > use ?training, validation, and testing ? approach as well, though, > especially when working with large datasets and for early stopping on > neural nets. > > Best, > Sebastian > > > > On Jan 26, 2017, at 1:19 PM, Raga Markely > wrote: > > > > Thank you, Guillaume. > > > > 1. I agree with you - that's what I have been learning and makes sense.. > I was a bit surprised when I read the paper today.. > > > > 2. Ah.. thank you.. I got to change my glasses :P > > > > Best, > > Raga > > > > Guillaume Lema?tre g.lemaitre58 at gmail.com > > Thu Jan 26 12:05:12 EST 2017 > > > > ? Previous message (by thread): [scikit-learn] Scores in Cross > Validation > > ? Messages sorted by: [ date ] [ thread ] [ subject ] [ author ] > > 1. You should not evaluate an estimator on the data which have been used > to > > train it. 
> > Usually, you try to minimize the classification or loss using those data > > and fit them as > > good as possible. Evaluating on an unseen testing set will give you an > idea > > how good > > your estimator was able to generalize to your problem during the > training. > > Furthermore, a training, validation, and testing set should be used when > > setting up > > parameters. Validation will be used to set the parameters and the testing > > will be used > > to evaluate your best estimator. > > > > That is why, when using the GridSearchCV, fit will train the estimator > > using a training > > and validation test (using a given CV startegies). Finally, predict will > be > > performed on > > another unseen testing set. > > > > The bottom line is that using training data to select parameters will not > > ensure that you > > are selecting the best parameters for your problems. > > > > 2. The function is call in _fit_and_score, l. 260 and 263 for instance. > > > > On 26 January 2017 at 17:02, Raga Markely < > > raga.markely at gmail.com > > > wrote: > > > > > > > Hello, > > > > > > > > > > I have 2 questions regarding cross_val_score. > > > > > > > 1. Do the scores returned by cross_val_score correspond to only the test > > > > > > > set or the whole data set (training and test sets)? > > > > > > > I tried to look at the source code, and it looks like it returns the > score > > > > > > > of only the test set (line 145: "return_train_score=False") - I am not > sure > > > > > > > if I am reading the codes properly, though.. > > > > > https://github.com/scikit-learn/scikit-learn/blob/14031f6/ > > > > > sklearn/model_selection/_validation.py#L36 > > > > > > > I came across the paper below and the authors use the score of the whole > > > > > > > dataset when the author performs repeated nested loop, grid search cv, > > > > > > > etc.. e.g. see algorithm 1 (line 1c) and 2 (line 2d) on page 3. > > > > > https://jcheminf.springeropen.com/articles/10.1186/1758-2946-6-10 > > > > > I wonder what's the pros and cons of using the accuracy score of the > whole > > > > > > > dataset vs just the test set.. any thoughts? > > > > > > > > > > 2. On line 283 of the cross_val_score source code, there is a function > > > > > > > _score. However, I can't find where this function is called. Could you > let > > > > > > > me know where this function is called? > > > > > > > > > > Thank you very much! > > > > > > > Raga > > > > > > > > > > _______________________________________________ > > > > > > > scikit-learn mailing list > > > > > scikit-learn at python.org > > > https://mail.python.org/mailman/listinfo/scikit-learn > > > > > > > > > > > > -- > > Guillaume Lemaitre > > INRIA Saclay - Ile-de-France > > Equipe PARIETAL > > > > guillaume.lemaitre at inria.f > >r --- > > > > https://glemaitre.github.io/ > > _______________________________________________ > > scikit-learn mailing list > > scikit-learn at python.org > > https://mail.python.org/mailman/listinfo/scikit-learn > > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn > -- Guillaume Lemaitre INRIA Saclay - Ile-de-France Equipe PARIETAL guillaume.lemaitre at inria.f r --- https://glemaitre.github.io/ -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From raga.markely at gmail.com Thu Jan 26 20:06:06 2017 From: raga.markely at gmail.com (Raga Markely) Date: Thu, 26 Jan 2017 20:06:06 -0500 Subject: [scikit-learn] Scores in Cross Validation In-Reply-To: References: <83184497-0DEB-44B8-9713-E9B87DBDE4F3@gmail.com> Message-ID: Got it.. thank you for the clarification, Sebastian & Guillaume.. appreciate it! Best, Raga On Thu, Jan 26, 2017 at 6:41 PM, Guillaume Lema?tre wrote: > I didn't express myself well but I was meaning: > > > model selection via k-fold on the training set > > for the training/validation set :D > > On 27 January 2017 at 00:37, Sebastian Raschka > wrote: > >> > Furthermore, a training, validation, and testing set should be used when >> > setting up >> > parameters. >> >> Usually, it?s better to use a train set and separate test set, and do >> model selection via k-fold on the training set. Then, you do the final >> model estimation on the test set that you haven?t touched before. I often >> use ?training, validation, and testing ? approach as well, though, >> especially when working with large datasets and for early stopping on >> neural nets. >> >> Best, >> Sebastian >> >> >> > On Jan 26, 2017, at 1:19 PM, Raga Markely >> wrote: >> > >> > Thank you, Guillaume. >> > >> > 1. I agree with you - that's what I have been learning and makes >> sense.. I was a bit surprised when I read the paper today.. >> > >> > 2. Ah.. thank you.. I got to change my glasses :P >> > >> > Best, >> > Raga >> > >> > Guillaume Lema?tre g.lemaitre58 at gmail.com >> > Thu Jan 26 12:05:12 EST 2017 >> > >> > ? Previous message (by thread): [scikit-learn] Scores in Cross >> Validation >> > ? Messages sorted by: [ date ] [ thread ] [ subject ] [ author ] >> > 1. You should not evaluate an estimator on the data which have been >> used to >> > train it. >> > Usually, you try to minimize the classification or loss using those data >> > and fit them as >> > good as possible. Evaluating on an unseen testing set will give you an >> idea >> > how good >> > your estimator was able to generalize to your problem during the >> training. >> > Furthermore, a training, validation, and testing set should be used when >> > setting up >> > parameters. Validation will be used to set the parameters and the >> testing >> > will be used >> > to evaluate your best estimator. >> > >> > That is why, when using the GridSearchCV, fit will train the estimator >> > using a training >> > and validation test (using a given CV startegies). Finally, predict >> will be >> > performed on >> > another unseen testing set. >> > >> > The bottom line is that using training data to select parameters will >> not >> > ensure that you >> > are selecting the best parameters for your problems. >> > >> > 2. The function is call in _fit_and_score, l. 260 and 263 for instance. >> > >> > On 26 January 2017 at 17:02, Raga Markely < >> > raga.markely at gmail.com >> > > wrote: >> > >> > > >> > Hello, >> > >> > > >> > > >> > I have 2 questions regarding cross_val_score. >> > >> > > >> > 1. Do the scores returned by cross_val_score correspond to only the >> test >> > >> > > >> > set or the whole data set (training and test sets)? >> > >> > > >> > I tried to look at the source code, and it looks like it returns the >> score >> > >> > > >> > of only the test set (line 145: "return_train_score=False") - I am not >> sure >> > >> > > >> > if I am reading the codes properly, though.. 
>> > >> > > https://github.com/scikit-learn/scikit-learn/blob/14031f6/ >> > > >> > sklearn/model_selection/_validation.py#L36 >> > >> > > >> > I came across the paper below and the authors use the score of the >> whole >> > >> > > >> > dataset when the author performs repeated nested loop, grid search cv, >> > >> > > >> > etc.. e.g. see algorithm 1 (line 1c) and 2 (line 2d) on page 3. >> > >> > > https://jcheminf.springeropen.com/articles/10.1186/1758-2946-6-10 >> > > >> > I wonder what's the pros and cons of using the accuracy score of the >> whole >> > >> > > >> > dataset vs just the test set.. any thoughts? >> > >> > > >> > > >> > 2. On line 283 of the cross_val_score source code, there is a function >> > >> > > >> > _score. However, I can't find where this function is called. Could you >> let >> > >> > > >> > me know where this function is called? >> > >> > > >> > > >> > Thank you very much! >> > >> > > >> > Raga >> > >> > > >> > > >> > _______________________________________________ >> > >> > > >> > scikit-learn mailing list >> > >> > > scikit-learn at python.org >> > > https://mail.python.org/mailman/listinfo/scikit-learn >> > > >> > > >> > >> > >> > -- >> > Guillaume Lemaitre >> > INRIA Saclay - Ile-de-France >> > Equipe PARIETAL >> > >> > guillaume.lemaitre at inria.f > > >r --- >> > >> > https://glemaitre.github.io/ >> > _______________________________________________ >> > scikit-learn mailing list >> > scikit-learn at python.org >> > https://mail.python.org/mailman/listinfo/scikit-learn >> >> _______________________________________________ >> scikit-learn mailing list >> scikit-learn at python.org >> https://mail.python.org/mailman/listinfo/scikit-learn >> > > > > -- > Guillaume Lemaitre > INRIA Saclay - Ile-de-France > Equipe PARIETAL > guillaume.lemaitre at inria.f r --- > https://glemaitre.github.io/ > > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From raga.markely at gmail.com Thu Jan 26 20:09:52 2017 From: raga.markely at gmail.com (Raga Markely) Date: Thu, 26 Jan 2017 20:09:52 -0500 Subject: [scikit-learn] Random StratifiedKFold Grid Search CV In-Reply-To: References: Message-ID: Ahh.. nice.. I will use that.. thanks a lot, Sebastian! Best, Raga On Thu, Jan 26, 2017 at 6:34 PM, Sebastian Raschka wrote: > Hi, Raga, > > I think that if GridSearchCV is used for classification, the stratified > k-fold doesn?t do shuffling by default. > > Say you do 20 grid search repetitions, you could then do sth like: > > > from sklearn.model_selection import StratifiedKFold > > for i in range(n_reps): > k_fold = StratifiedKFold(n_splits=5, shuffle=True, random_state=i) > gs = GridSearchCV(..., cv=k_fold) > ... > > Best, > Sebastian > > > On Jan 26, 2017, at 5:39 PM, Raga Markely > wrote: > > > > Hello, > > > > I was trying to do repeated Grid Search CV (20 repeats). I thought that > each time I call GridSearchCV, the training and test sets separated in > different splits would be different. > > > > However, I got the same best_params_ and best_scores_ for all 20 > repeats. It looks like the training and test sets are separated in > identical folds in each run? Just to clarify, e.g. I have the following > data: 0,1,2,3,4. Class 1 = [0,1,2] and Class 2 = [3,4]. Suppose I call cv = > 2. 
The split is always for instance [0,3] [1,2,4] in each repeat, and I > couldn't get [1,3] [0,2,4] or other combinations. > > > > If I understand correctly, GridSearchCV uses StratifiedKFold when I > enter cv = integer. The StratifiedKFold command has random state; I wonder > if there is anyway I can make the the training and test sets randomly > separated each time I call the GridSearchCV? > > > > Just a note, I used the following classifiers: Logistic Regression, KNN, > SVC, Kernel SVC, Random Forest, and had the same observation regardless of > the classifiers. > > > > Thank you very much! > > Raga > > > > _______________________________________________ > > scikit-learn mailing list > > scikit-learn at python.org > > https://mail.python.org/mailman/listinfo/scikit-learn > > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn > -------------- next part -------------- An HTML attachment was scrubbed... URL: From se.raschka at gmail.com Thu Jan 26 20:31:20 2017 From: se.raschka at gmail.com (Sebastian Raschka) Date: Thu, 26 Jan 2017 20:31:20 -0500 Subject: [scikit-learn] Random StratifiedKFold Grid Search CV In-Reply-To: References: Message-ID: <91E91A60-8B9C-44E1-85E4-5DB1CB86EBDC@gmail.com> You are welcome! And in addition, if you select among different algorithms, here are some more suggestions a) don?t do it based on your independent test set if this is going to your final model performance estimate, or be aware that it would be overly optimistic b) also, it?s not the best idea to select algorithms using cross-validation on the same training set that you used for model selection; a more robust way would be nested CV (e.g,. http://scikit-learn.org/stable/auto_examples/model_selection/plot_nested_cross_validation_iris.html) But yeah, it all depends on your dataset and size. If you have a neural net that takes week to train, and if you have a large dataset anyway so that you can set aside large sets for testing, I?d train on train/validation splits and evaluate on the test set. And to compare e.g., two networks against each other on large test sets, you could do a McNemar test. Best, Sebastian > On Jan 26, 2017, at 8:09 PM, Raga Markely wrote: > > Ahh.. nice.. I will use that.. thanks a lot, Sebastian! > > Best, > Raga > > On Thu, Jan 26, 2017 at 6:34 PM, Sebastian Raschka wrote: > Hi, Raga, > > I think that if GridSearchCV is used for classification, the stratified k-fold doesn?t do shuffling by default. > > Say you do 20 grid search repetitions, you could then do sth like: > > > from sklearn.model_selection import StratifiedKFold > > for i in range(n_reps): > k_fold = StratifiedKFold(n_splits=5, shuffle=True, random_state=i) > gs = GridSearchCV(..., cv=k_fold) > ... > > Best, > Sebastian > > > On Jan 26, 2017, at 5:39 PM, Raga Markely wrote: > > > > Hello, > > > > I was trying to do repeated Grid Search CV (20 repeats). I thought that each time I call GridSearchCV, the training and test sets separated in different splits would be different. > > > > However, I got the same best_params_ and best_scores_ for all 20 repeats. It looks like the training and test sets are separated in identical folds in each run? Just to clarify, e.g. I have the following data: 0,1,2,3,4. Class 1 = [0,1,2] and Class 2 = [3,4]. Suppose I call cv = 2. The split is always for instance [0,3] [1,2,4] in each repeat, and I couldn't get [1,3] [0,2,4] or other combinations. 
> > > > If I understand correctly, GridSearchCV uses StratifiedKFold when I enter cv = integer. The StratifiedKFold command has random state; I wonder if there is anyway I can make the the training and test sets randomly separated each time I call the GridSearchCV? > > > > Just a note, I used the following classifiers: Logistic Regression, KNN, SVC, Kernel SVC, Random Forest, and had the same observation regardless of the classifiers. > > > > Thank you very much! > > Raga > > > > _______________________________________________ > > scikit-learn mailing list > > scikit-learn at python.org > > https://mail.python.org/mailman/listinfo/scikit-learn > > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn > > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn From raga.markely at gmail.com Fri Jan 27 10:23:42 2017 From: raga.markely at gmail.com (Raga Markely) Date: Fri, 27 Jan 2017 10:23:42 -0500 Subject: [scikit-learn] Random StratifiedKFold Grid Search CV In-Reply-To: <91E91A60-8B9C-44E1-85E4-5DB1CB86EBDC@gmail.com> References: <91E91A60-8B9C-44E1-85E4-5DB1CB86EBDC@gmail.com> Message-ID: Sounds good, Sebastian.. thanks for the suggestions.. My dataset is relatively small (only ~35 samples), and this is the workflow I have set up so far.. 1. Model selection: use nested loop using cross_val_score(GridSearchCV(...),...) same as shown in the scikit-learn page that you provided - the results show no statistically significant difference in accuracy mean +/- SD among classifiers.. this is expected as the pattern is pretty obvious and simple to separate by eyes after dimensionality reduction (I use pipeline of stdscaler, LDA, and classifier)... so i take all of them and use voting classifier in step #3.. 2. Hyperparameter optimization: use GridSearchCV to optimize hyperparameters of each classifiers 3. Decision Region: use the hyperparameters from step #2, fit each classifier separately to the whole dataset, and use voting classifier to get decision region This sounds reasonable? Thank you very much! Raga On Thu, Jan 26, 2017 at 8:31 PM, Sebastian Raschka wrote: > You are welcome! And in addition, if you select among different > algorithms, here are some more suggestions > > a) don?t do it based on your independent test set if this is going to your > final model performance estimate, or be aware that it would be overly > optimistic > b) also, it?s not the best idea to select algorithms using > cross-validation on the same training set that you used for model > selection; a more robust way would be nested CV (e.g,. > http://scikit-learn.org/stable/auto_examples/model_ > selection/plot_nested_cross_validation_iris.html) > > But yeah, it all depends on your dataset and size. If you have a neural > net that takes week to train, and if you have a large dataset anyway so > that you can set aside large sets for testing, I?d train on > train/validation splits and evaluate on the test set. And to compare e.g., > two networks against each other on large test sets, you could do a McNemar > test. > > Best, > Sebastian > > > On Jan 26, 2017, at 8:09 PM, Raga Markely > wrote: > > > > Ahh.. nice.. I will use that.. thanks a lot, Sebastian! 
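A compact sketch of the nested cross_val_score(GridSearchCV(...)) evaluation mentioned in step 1 of the workflow above: the inner GridSearchCV tunes hyperparameters, the outer cross_val_score estimates how well the whole selection procedure generalizes. The pipeline, grid, and dataset are placeholders, not the ones from this thread:

from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV, StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)

pipe = make_pipeline(StandardScaler(), SVC())
param_grid = {"svc__C": [0.1, 1, 10], "svc__gamma": [0.01, 0.1, 1]}
inner_cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
outer_cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=1)

# Inner loop picks hyperparameters; outer loop scores the selection procedure itself.
gs = GridSearchCV(pipe, param_grid, cv=inner_cv)
scores = cross_val_score(gs, X, y, cv=outer_cv)
print(scores.mean(), scores.std())
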
> > > > Best, > > Raga > > > > On Thu, Jan 26, 2017 at 6:34 PM, Sebastian Raschka > wrote: > > Hi, Raga, > > > > I think that if GridSearchCV is used for classification, the stratified > k-fold doesn?t do shuffling by default. > > > > Say you do 20 grid search repetitions, you could then do sth like: > > > > > > from sklearn.model_selection import StratifiedKFold > > > > for i in range(n_reps): > > k_fold = StratifiedKFold(n_splits=5, shuffle=True, random_state=i) > > gs = GridSearchCV(..., cv=k_fold) > > ... > > > > Best, > > Sebastian > > > > > On Jan 26, 2017, at 5:39 PM, Raga Markely > wrote: > > > > > > Hello, > > > > > > I was trying to do repeated Grid Search CV (20 repeats). I thought > that each time I call GridSearchCV, the training and test sets separated in > different splits would be different. > > > > > > However, I got the same best_params_ and best_scores_ for all 20 > repeats. It looks like the training and test sets are separated in > identical folds in each run? Just to clarify, e.g. I have the following > data: 0,1,2,3,4. Class 1 = [0,1,2] and Class 2 = [3,4]. Suppose I call cv = > 2. The split is always for instance [0,3] [1,2,4] in each repeat, and I > couldn't get [1,3] [0,2,4] or other combinations. > > > > > > If I understand correctly, GridSearchCV uses StratifiedKFold when I > enter cv = integer. The StratifiedKFold command has random state; I wonder > if there is anyway I can make the the training and test sets randomly > separated each time I call the GridSearchCV? > > > > > > Just a note, I used the following classifiers: Logistic Regression, > KNN, SVC, Kernel SVC, Random Forest, and had the same observation > regardless of the classifiers. > > > > > > Thank you very much! > > > Raga > > > > > > _______________________________________________ > > > scikit-learn mailing list > > > scikit-learn at python.org > > > https://mail.python.org/mailman/listinfo/scikit-learn > > > > _______________________________________________ > > scikit-learn mailing list > > scikit-learn at python.org > > https://mail.python.org/mailman/listinfo/scikit-learn > > > > _______________________________________________ > > scikit-learn mailing list > > scikit-learn at python.org > > https://mail.python.org/mailman/listinfo/scikit-learn > > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn > -------------- next part -------------- An HTML attachment was scrubbed... URL: From mail at sebastianraschka.com Fri Jan 27 12:49:50 2017 From: mail at sebastianraschka.com (Sebastian Raschka) Date: Fri, 27 Jan 2017 12:49:50 -0500 Subject: [scikit-learn] Random StratifiedKFold Grid Search CV In-Reply-To: References: <91E91A60-8B9C-44E1-85E4-5DB1CB86EBDC@gmail.com> Message-ID: <43515838-969C-495F-8C22-BEB30C04D1DD@sebastianraschka.com> Hi, Raga, sounds good, but I am wondering a bit about the order. 2) should come before 1), right? Because model selection is basically done via hyperparam optimization. Not saying that this is the optimal/right approach, but I usually do it like this: 1.) algo selection via nested cv 2.) model selection based on best algo via k-fold on whole training set 3.) fit best algo w. best hyperparams (from 2.) to whole training set 4.) evaluate on test set 5.) fit classifier to whole dataset, done Best, Sebastian > On Jan 27, 2017, at 10:23 AM, Raga Markely wrote: > > Sounds good, Sebastian.. thanks for the suggestions.. 
> > My dataset is relatively small (only ~35 samples), and this is the workflow I have set up so far.. > 1. Model selection: use nested loop using cross_val_score(GridSearchCV(...),...) same as shown in the scikit-learn page that you provided - the results show no statistically significant difference in accuracy mean +/- SD among classifiers.. this is expected as the pattern is pretty obvious and simple to separate by eyes after dimensionality reduction (I use pipeline of stdscaler, LDA, and classifier)... so i take all of them and use voting classifier in step #3.. > 2. Hyperparameter optimization: use GridSearchCV to optimize hyperparameters of each classifiers > 3. Decision Region: use the hyperparameters from step #2, fit each classifier separately to the whole dataset, and use voting classifier to get decision region > > This sounds reasonable? > > Thank you very much! > Raga > > On Thu, Jan 26, 2017 at 8:31 PM, Sebastian Raschka wrote: > You are welcome! And in addition, if you select among different algorithms, here are some more suggestions > > a) don?t do it based on your independent test set if this is going to your final model performance estimate, or be aware that it would be overly optimistic > b) also, it?s not the best idea to select algorithms using cross-validation on the same training set that you used for model selection; a more robust way would be nested CV (e.g,. http://scikit-learn.org/stable/auto_examples/model_selection/plot_nested_cross_validation_iris.html) > > But yeah, it all depends on your dataset and size. If you have a neural net that takes week to train, and if you have a large dataset anyway so that you can set aside large sets for testing, I?d train on train/validation splits and evaluate on the test set. And to compare e.g., two networks against each other on large test sets, you could do a McNemar test. > > Best, > Sebastian > > > On Jan 26, 2017, at 8:09 PM, Raga Markely wrote: > > > > Ahh.. nice.. I will use that.. thanks a lot, Sebastian! > > > > Best, > > Raga > > > > On Thu, Jan 26, 2017 at 6:34 PM, Sebastian Raschka wrote: > > Hi, Raga, > > > > I think that if GridSearchCV is used for classification, the stratified k-fold doesn?t do shuffling by default. > > > > Say you do 20 grid search repetitions, you could then do sth like: > > > > > > from sklearn.model_selection import StratifiedKFold > > > > for i in range(n_reps): > > k_fold = StratifiedKFold(n_splits=5, shuffle=True, random_state=i) > > gs = GridSearchCV(..., cv=k_fold) > > ... > > > > Best, > > Sebastian > > > > > On Jan 26, 2017, at 5:39 PM, Raga Markely wrote: > > > > > > Hello, > > > > > > I was trying to do repeated Grid Search CV (20 repeats). I thought that each time I call GridSearchCV, the training and test sets separated in different splits would be different. > > > > > > However, I got the same best_params_ and best_scores_ for all 20 repeats. It looks like the training and test sets are separated in identical folds in each run? Just to clarify, e.g. I have the following data: 0,1,2,3,4. Class 1 = [0,1,2] and Class 2 = [3,4]. Suppose I call cv = 2. The split is always for instance [0,3] [1,2,4] in each repeat, and I couldn't get [1,3] [0,2,4] or other combinations. > > > > > > If I understand correctly, GridSearchCV uses StratifiedKFold when I enter cv = integer. The StratifiedKFold command has random state; I wonder if there is anyway I can make the the training and test sets randomly separated each time I call the GridSearchCV? 
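For step 3 quoted above (combining the individually tuned classifiers into a voting ensemble fitted on the whole dataset), one possible sketch is below; the classifiers and their "tuned" hyperparameter values are placeholders, not results from this thread.

from sklearn.datasets import load_iris
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)  # placeholder data

def scaled_lda(clf):
    # the preprocessing described above: scaling, then LDA, then the classifier
    return make_pipeline(StandardScaler(), LinearDiscriminantAnalysis(), clf)

vote = VotingClassifier(
    estimators=[
        ('lr', scaled_lda(LogisticRegression(C=1.0))),             # C from step 2 (placeholder)
        ('knn', scaled_lda(KNeighborsClassifier(n_neighbors=5))),  # k from step 2 (placeholder)
        ('svc', scaled_lda(SVC(C=1.0, probability=True))),         # probability=True enables soft voting
    ],
    voting='soft')

vote.fit(X, y)  # fit on the whole dataset before plotting the decision regions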
> > > > > > Just a note, I used the following classifiers: Logistic Regression, KNN, SVC, Kernel SVC, Random Forest, and had the same observation regardless of the classifiers. > > > > > > Thank you very much! > > > Raga > > > > > > _______________________________________________ > > > scikit-learn mailing list > > > scikit-learn at python.org > > > https://mail.python.org/mailman/listinfo/scikit-learn > > > > _______________________________________________ > > scikit-learn mailing list > > scikit-learn at python.org > > https://mail.python.org/mailman/listinfo/scikit-learn > > > > _______________________________________________ > > scikit-learn mailing list > > scikit-learn at python.org > > https://mail.python.org/mailman/listinfo/scikit-learn > > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn > > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn From se.raschka at gmail.com Fri Jan 27 13:01:26 2017 From: se.raschka at gmail.com (Sebastian Raschka) Date: Fri, 27 Jan 2017 13:01:26 -0500 Subject: [scikit-learn] Random StratifiedKFold Grid Search CV In-Reply-To: <43515838-969C-495F-8C22-BEB30C04D1DD@sebastianraschka.com> References: <91E91A60-8B9C-44E1-85E4-5DB1CB86EBDC@gmail.com> <43515838-969C-495F-8C22-BEB30C04D1DD@sebastianraschka.com> Message-ID: <5EF61074-E96F-4EA8-BA5E-7C4B07505D7B@gmail.com> Hi, Raga, sounds good, but I am wondering a bit about the order. 2) should come before 1), right? Because model selection is basically done via hyperparam optimization. Not saying that this is the optimal/right approach, but I usually do it like this: 1.) algo selection via nested cv 2.) model selection based on best algo via k-fold on whole training set 3.) fit best algo w. best hyperparams (from 2.) to whole training set 4.) evaluate on test set 5.) fit classifier to whole dataset, done Best, Sebastian > On Jan 27, 2017, at 12:49 PM, Sebastian Raschka wrote: > > Hi, Raga, > > sounds good, but I am wondering a bit about the order. 2) should come before 1), right? Because model selection is basically done via hyperparam optimization. > > Not saying that this is the optimal/right approach, but I usually do it like this: > > 1.) algo selection via nested cv > 2.) model selection based on best algo via k-fold on whole training set > 3.) fit best algo w. best hyperparams (from 2.) to whole training set > 4.) evaluate on test set > 5.) fit classifier to whole dataset, done > > Best, > Sebastian > >> On Jan 27, 2017, at 10:23 AM, Raga Markely wrote: >> >> Sounds good, Sebastian.. thanks for the suggestions.. >> >> My dataset is relatively small (only ~35 samples), and this is the workflow I have set up so far.. >> 1. Model selection: use nested loop using cross_val_score(GridSearchCV(...),...) same as shown in the scikit-learn page that you provided - the results show no statistically significant difference in accuracy mean +/- SD among classifiers.. this is expected as the pattern is pretty obvious and simple to separate by eyes after dimensionality reduction (I use pipeline of stdscaler, LDA, and classifier)... so i take all of them and use voting classifier in step #3.. >> 2. Hyperparameter optimization: use GridSearchCV to optimize hyperparameters of each classifiers >> 3. 
Decision Region: use the hyperparameters from step #2, fit each classifier separately to the whole dataset, and use voting classifier to get decision region >> >> This sounds reasonable? >> >> Thank you very much! >> Raga >> >> On Thu, Jan 26, 2017 at 8:31 PM, Sebastian Raschka wrote: >> You are welcome! And in addition, if you select among different algorithms, here are some more suggestions >> >> a) don?t do it based on your independent test set if this is going to your final model performance estimate, or be aware that it would be overly optimistic >> b) also, it?s not the best idea to select algorithms using cross-validation on the same training set that you used for model selection; a more robust way would be nested CV (e.g,. http://scikit-learn.org/stable/auto_examples/model_selection/plot_nested_cross_validation_iris.html) >> >> But yeah, it all depends on your dataset and size. If you have a neural net that takes week to train, and if you have a large dataset anyway so that you can set aside large sets for testing, I?d train on train/validation splits and evaluate on the test set. And to compare e.g., two networks against each other on large test sets, you could do a McNemar test. >> >> Best, >> Sebastian >> >>> On Jan 26, 2017, at 8:09 PM, Raga Markely wrote: >>> >>> Ahh.. nice.. I will use that.. thanks a lot, Sebastian! >>> >>> Best, >>> Raga >>> >>> On Thu, Jan 26, 2017 at 6:34 PM, Sebastian Raschka wrote: >>> Hi, Raga, >>> >>> I think that if GridSearchCV is used for classification, the stratified k-fold doesn?t do shuffling by default. >>> >>> Say you do 20 grid search repetitions, you could then do sth like: >>> >>> >>> from sklearn.model_selection import StratifiedKFold >>> >>> for i in range(n_reps): >>> k_fold = StratifiedKFold(n_splits=5, shuffle=True, random_state=i) >>> gs = GridSearchCV(..., cv=k_fold) >>> ... >>> >>> Best, >>> Sebastian >>> >>>> On Jan 26, 2017, at 5:39 PM, Raga Markely wrote: >>>> >>>> Hello, >>>> >>>> I was trying to do repeated Grid Search CV (20 repeats). I thought that each time I call GridSearchCV, the training and test sets separated in different splits would be different. >>>> >>>> However, I got the same best_params_ and best_scores_ for all 20 repeats. It looks like the training and test sets are separated in identical folds in each run? Just to clarify, e.g. I have the following data: 0,1,2,3,4. Class 1 = [0,1,2] and Class 2 = [3,4]. Suppose I call cv = 2. The split is always for instance [0,3] [1,2,4] in each repeat, and I couldn't get [1,3] [0,2,4] or other combinations. >>>> >>>> If I understand correctly, GridSearchCV uses StratifiedKFold when I enter cv = integer. The StratifiedKFold command has random state; I wonder if there is anyway I can make the the training and test sets randomly separated each time I call the GridSearchCV? >>>> >>>> Just a note, I used the following classifiers: Logistic Regression, KNN, SVC, Kernel SVC, Random Forest, and had the same observation regardless of the classifiers. >>>> >>>> Thank you very much! 
>>>> Raga >>>> >>>> _______________________________________________ >>>> scikit-learn mailing list >>>> scikit-learn at python.org >>>> https://mail.python.org/mailman/listinfo/scikit-learn >>> >>> _______________________________________________ >>> scikit-learn mailing list >>> scikit-learn at python.org >>> https://mail.python.org/mailman/listinfo/scikit-learn >>> >>> _______________________________________________ >>> scikit-learn mailing list >>> scikit-learn at python.org >>> https://mail.python.org/mailman/listinfo/scikit-learn >> >> _______________________________________________ >> scikit-learn mailing list >> scikit-learn at python.org >> https://mail.python.org/mailman/listinfo/scikit-learn >> >> _______________________________________________ >> scikit-learn mailing list >> scikit-learn at python.org >> https://mail.python.org/mailman/listinfo/scikit-learn > > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn From raga.markely at gmail.com Fri Jan 27 13:16:29 2017 From: raga.markely at gmail.com (Raga Markely) Date: Fri, 27 Jan 2017 13:16:29 -0500 Subject: [scikit-learn] Random StratifiedKFold Grid Search CV In-Reply-To: <5EF61074-E96F-4EA8-BA5E-7C4B07505D7B@gmail.com> References: <91E91A60-8B9C-44E1-85E4-5DB1CB86EBDC@gmail.com> <43515838-969C-495F-8C22-BEB30C04D1DD@sebastianraschka.com> <5EF61074-E96F-4EA8-BA5E-7C4B07505D7B@gmail.com> Message-ID: Hi Sebastian, Sorry, I used the wrong terms (I was referring to algo as model).. great then, i think what i have is aligned with your workflow.. Thank you very much for your help! Have a good weekend, Raga On Fri, Jan 27, 2017 at 1:01 PM, Sebastian Raschka wrote: > Hi, Raga, > > sounds good, but I am wondering a bit about the order. 2) should come > before 1), right? Because model selection is basically done via hyperparam > optimization. > > Not saying that this is the optimal/right approach, but I usually do it > like this: > > 1.) algo selection via nested cv > 2.) model selection based on best algo via k-fold on whole training set > 3.) fit best algo w. best hyperparams (from 2.) to whole training set > 4.) evaluate on test set > 5.) fit classifier to whole dataset, done > > Best, > Sebastian > > > On Jan 27, 2017, at 12:49 PM, Sebastian Raschka < > mail at sebastianraschka.com> wrote: > > > > Hi, Raga, > > > > sounds good, but I am wondering a bit about the order. 2) should come > before 1), right? Because model selection is basically done via hyperparam > optimization. > > > > Not saying that this is the optimal/right approach, but I usually do it > like this: > > > > 1.) algo selection via nested cv > > 2.) model selection based on best algo via k-fold on whole training set > > 3.) fit best algo w. best hyperparams (from 2.) to whole training set > > 4.) evaluate on test set > > 5.) fit classifier to whole dataset, done > > > > Best, > > Sebastian > > > >> On Jan 27, 2017, at 10:23 AM, Raga Markely > wrote: > >> > >> Sounds good, Sebastian.. thanks for the suggestions.. > >> > >> My dataset is relatively small (only ~35 samples), and this is the > workflow I have set up so far.. > >> 1. Model selection: use nested loop using cross_val_score(GridSearchCV(...),...) > same as shown in the scikit-learn page that you provided - the results show > no statistically significant difference in accuracy mean +/- SD among > classifiers.. 
this is expected as the pattern is pretty obvious and simple > to separate by eyes after dimensionality reduction (I use pipeline of > stdscaler, LDA, and classifier)... so i take all of them and use voting > classifier in step #3.. > >> 2. Hyperparameter optimization: use GridSearchCV to optimize > hyperparameters of each classifiers > >> 3. Decision Region: use the hyperparameters from step #2, fit each > classifier separately to the whole dataset, and use voting classifier to > get decision region > >> > >> This sounds reasonable? > >> > >> Thank you very much! > >> Raga > >> > >> On Thu, Jan 26, 2017 at 8:31 PM, Sebastian Raschka < > se.raschka at gmail.com> wrote: > >> You are welcome! And in addition, if you select among different > algorithms, here are some more suggestions > >> > >> a) don?t do it based on your independent test set if this is going to > your final model performance estimate, or be aware that it would be overly > optimistic > >> b) also, it?s not the best idea to select algorithms using > cross-validation on the same training set that you used for model > selection; a more robust way would be nested CV (e.g,. > http://scikit-learn.org/stable/auto_examples/model_ > selection/plot_nested_cross_validation_iris.html) > >> > >> But yeah, it all depends on your dataset and size. If you have a neural > net that takes week to train, and if you have a large dataset anyway so > that you can set aside large sets for testing, I?d train on > train/validation splits and evaluate on the test set. And to compare e.g., > two networks against each other on large test sets, you could do a McNemar > test. > >> > >> Best, > >> Sebastian > >> > >>> On Jan 26, 2017, at 8:09 PM, Raga Markely > wrote: > >>> > >>> Ahh.. nice.. I will use that.. thanks a lot, Sebastian! > >>> > >>> Best, > >>> Raga > >>> > >>> On Thu, Jan 26, 2017 at 6:34 PM, Sebastian Raschka < > se.raschka at gmail.com> wrote: > >>> Hi, Raga, > >>> > >>> I think that if GridSearchCV is used for classification, the > stratified k-fold doesn?t do shuffling by default. > >>> > >>> Say you do 20 grid search repetitions, you could then do sth like: > >>> > >>> > >>> from sklearn.model_selection import StratifiedKFold > >>> > >>> for i in range(n_reps): > >>> k_fold = StratifiedKFold(n_splits=5, shuffle=True, random_state=i) > >>> gs = GridSearchCV(..., cv=k_fold) > >>> ... > >>> > >>> Best, > >>> Sebastian > >>> > >>>> On Jan 26, 2017, at 5:39 PM, Raga Markely > wrote: > >>>> > >>>> Hello, > >>>> > >>>> I was trying to do repeated Grid Search CV (20 repeats). I thought > that each time I call GridSearchCV, the training and test sets separated in > different splits would be different. > >>>> > >>>> However, I got the same best_params_ and best_scores_ for all 20 > repeats. It looks like the training and test sets are separated in > identical folds in each run? Just to clarify, e.g. I have the following > data: 0,1,2,3,4. Class 1 = [0,1,2] and Class 2 = [3,4]. Suppose I call cv = > 2. The split is always for instance [0,3] [1,2,4] in each repeat, and I > couldn't get [1,3] [0,2,4] or other combinations. > >>>> > >>>> If I understand correctly, GridSearchCV uses StratifiedKFold when I > enter cv = integer. The StratifiedKFold command has random state; I wonder > if there is anyway I can make the the training and test sets randomly > separated each time I call the GridSearchCV? 
> >>>> > >>>> Just a note, I used the following classifiers: Logistic Regression, > KNN, SVC, Kernel SVC, Random Forest, and had the same observation > regardless of the classifiers. > >>>> > >>>> Thank you very much! > >>>> Raga > >>>> > >>>> _______________________________________________ > >>>> scikit-learn mailing list > >>>> scikit-learn at python.org > >>>> https://mail.python.org/mailman/listinfo/scikit-learn > >>> > >>> _______________________________________________ > >>> scikit-learn mailing list > >>> scikit-learn at python.org > >>> https://mail.python.org/mailman/listinfo/scikit-learn > >>> > >>> _______________________________________________ > >>> scikit-learn mailing list > >>> scikit-learn at python.org > >>> https://mail.python.org/mailman/listinfo/scikit-learn > >> > >> _______________________________________________ > >> scikit-learn mailing list > >> scikit-learn at python.org > >> https://mail.python.org/mailman/listinfo/scikit-learn > >> > >> _______________________________________________ > >> scikit-learn mailing list > >> scikit-learn at python.org > >> https://mail.python.org/mailman/listinfo/scikit-learn > > > > _______________________________________________ > > scikit-learn mailing list > > scikit-learn at python.org > > https://mail.python.org/mailman/listinfo/scikit-learn > > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn > -------------- next part -------------- An HTML attachment was scrubbed... URL: From alexeyum at yandex.ru Sun Jan 29 11:06:51 2017 From: alexeyum at yandex.ru (=?utf-8?B?0KPQvNC90L7QsiDQkNC70LXQutGB0LXQuSAoQWxleGV5IFVtbm92KQ==?=) Date: Sun, 29 Jan 2017 19:06:51 +0300 Subject: [scikit-learn] K-SVD implementation PR (needs 2nd review) Message-ID: <7195131485706011@web23o.yandex.ru> An HTML attachment was scrubbed... URL: From t3kcit at gmail.com Mon Jan 30 08:38:54 2017 From: t3kcit at gmail.com (Andreas Mueller) Date: Mon, 30 Jan 2017 08:38:54 -0500 Subject: [scikit-learn] GSOC call for mentors In-Reply-To: References: Message-ID: Hey all. It's that time of the year again. Are we planning on participating in GSOC? If so, we need mentors and projects. It's unlikely that I'll have time to help with either in any substantial way. If we want to participate, I think we should try to be a bit more organized than last year ;) Andy Sent from phone. Please excuse spelling and brevity. -------------- next part -------------- An HTML attachment was scrubbed... URL: From jmschreiber91 at gmail.com Mon Jan 30 13:09:13 2017 From: jmschreiber91 at gmail.com (Jacob Schreiber) Date: Mon, 30 Jan 2017 10:09:13 -0800 Subject: [scikit-learn] GSOC call for mentors In-Reply-To: References: Message-ID: I discussed this briefly with Gael and Joel. The consensus was that unless we already know excellent students who will fit well that it is unlikely we will participate in GSoC. That being said, if someone (other than me) is willing to step up and organize it, I'd volunteer to be a mentor again. I think an important project would be adding multithreading to individual tree building so we can do gradient boosting in parallel. On Mon, Jan 30, 2017 at 5:38 AM, Andreas Mueller wrote: > Hey all. > It's that time of the year again. > Are we planning on participating in GSOC? > If so, we need mentors and projects. > It's unlikely that I'll have time to help with either in any substantial > way. 
> If we want to participate, I think we should try to be a bit more > organized than last year ;) > > Andy > > Sent from phone. Please excuse spelling and brevity. > > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From raga.markely at gmail.com Mon Jan 30 14:48:32 2017 From: raga.markely at gmail.com (Raga Markely) Date: Mon, 30 Jan 2017 14:48:32 -0500 Subject: [scikit-learn] Random StratifiedKFold Grid Search CV In-Reply-To: References: <91E91A60-8B9C-44E1-85E4-5DB1CB86EBDC@gmail.com> <43515838-969C-495F-8C22-BEB30C04D1DD@sebastianraschka.com> <5EF61074-E96F-4EA8-BA5E-7C4B07505D7B@gmail.com> Message-ID: Hi Sebastian, Following up on the original question on repeated Grid Search CV, I tried to do repeated nested loop using the followings: N_outer=10 N_inner=10 scores=[] for i in range(N_outer): k_fold_outer = StratifiedKFold(n_splits=10,shuffle=True,random_state=i) for j in range(N_inner): k_fold_inner = StratifiedKFold(n_splits=10,shuffle=True,random_state=j) gs = GridSearchCV(estimator=pipe_svc, param_grid=param_grid,cv=k_fold_inner) score=cross_val_score(estimator=gs,X=X,y=y,cv=k_fold_outer) scores.append(score) np.mean(scores) np.std(scores) But, I get the following error: TypeError: 'StratifiedKFold' object is not iterable I did some trials, and the error is gone when I remove cv=k_fold_inner from gs = ... Could you give me some tips on what I can do? Thank you! Raga On Fri, Jan 27, 2017 at 1:16 PM, Raga Markely wrote: > Hi Sebastian, > > Sorry, I used the wrong terms (I was referring to algo as model).. great > then, i think what i have is aligned with your workflow.. > > Thank you very much for your help! > > Have a good weekend, > Raga > > On Fri, Jan 27, 2017 at 1:01 PM, Sebastian Raschka > wrote: > >> Hi, Raga, >> >> sounds good, but I am wondering a bit about the order. 2) should come >> before 1), right? Because model selection is basically done via hyperparam >> optimization. >> >> Not saying that this is the optimal/right approach, but I usually do it >> like this: >> >> 1.) algo selection via nested cv >> 2.) model selection based on best algo via k-fold on whole training set >> 3.) fit best algo w. best hyperparams (from 2.) to whole training set >> 4.) evaluate on test set >> 5.) fit classifier to whole dataset, done >> >> Best, >> Sebastian >> >> > On Jan 27, 2017, at 12:49 PM, Sebastian Raschka < >> mail at sebastianraschka.com> wrote: >> > >> > Hi, Raga, >> > >> > sounds good, but I am wondering a bit about the order. 2) should come >> before 1), right? Because model selection is basically done via hyperparam >> optimization. >> > >> > Not saying that this is the optimal/right approach, but I usually do it >> like this: >> > >> > 1.) algo selection via nested cv >> > 2.) model selection based on best algo via k-fold on whole training set >> > 3.) fit best algo w. best hyperparams (from 2.) to whole training set >> > 4.) evaluate on test set >> > 5.) fit classifier to whole dataset, done >> > >> > Best, >> > Sebastian >> > >> >> On Jan 27, 2017, at 10:23 AM, Raga Markely >> wrote: >> >> >> >> Sounds good, Sebastian.. thanks for the suggestions.. >> >> >> >> My dataset is relatively small (only ~35 samples), and this is the >> workflow I have set up so far.. >> >> 1. Model selection: use nested loop using >> cross_val_score(GridSearchCV(...),...) 
same as shown in the scikit-learn >> page that you provided - the results show no statistically significant >> difference in accuracy mean +/- SD among classifiers.. this is expected as >> the pattern is pretty obvious and simple to separate by eyes after >> dimensionality reduction (I use pipeline of stdscaler, LDA, and >> classifier)... so i take all of them and use voting classifier in step #3.. >> >> 2. Hyperparameter optimization: use GridSearchCV to optimize >> hyperparameters of each classifiers >> >> 3. Decision Region: use the hyperparameters from step #2, fit each >> classifier separately to the whole dataset, and use voting classifier to >> get decision region >> >> >> >> This sounds reasonable? >> >> >> >> Thank you very much! >> >> Raga >> >> >> >> On Thu, Jan 26, 2017 at 8:31 PM, Sebastian Raschka < >> se.raschka at gmail.com> wrote: >> >> You are welcome! And in addition, if you select among different >> algorithms, here are some more suggestions >> >> >> >> a) don?t do it based on your independent test set if this is going to >> your final model performance estimate, or be aware that it would be overly >> optimistic >> >> b) also, it?s not the best idea to select algorithms using >> cross-validation on the same training set that you used for model >> selection; a more robust way would be nested CV (e.g,. >> http://scikit-learn.org/stable/auto_examples/model_selection >> /plot_nested_cross_validation_iris.html) >> >> >> >> But yeah, it all depends on your dataset and size. If you have a >> neural net that takes week to train, and if you have a large dataset anyway >> so that you can set aside large sets for testing, I?d train on >> train/validation splits and evaluate on the test set. And to compare e.g., >> two networks against each other on large test sets, you could do a McNemar >> test. >> >> >> >> Best, >> >> Sebastian >> >> >> >>> On Jan 26, 2017, at 8:09 PM, Raga Markely >> wrote: >> >>> >> >>> Ahh.. nice.. I will use that.. thanks a lot, Sebastian! >> >>> >> >>> Best, >> >>> Raga >> >>> >> >>> On Thu, Jan 26, 2017 at 6:34 PM, Sebastian Raschka < >> se.raschka at gmail.com> wrote: >> >>> Hi, Raga, >> >>> >> >>> I think that if GridSearchCV is used for classification, the >> stratified k-fold doesn?t do shuffling by default. >> >>> >> >>> Say you do 20 grid search repetitions, you could then do sth like: >> >>> >> >>> >> >>> from sklearn.model_selection import StratifiedKFold >> >>> >> >>> for i in range(n_reps): >> >>> k_fold = StratifiedKFold(n_splits=5, shuffle=True, random_state=i) >> >>> gs = GridSearchCV(..., cv=k_fold) >> >>> ... >> >>> >> >>> Best, >> >>> Sebastian >> >>> >> >>>> On Jan 26, 2017, at 5:39 PM, Raga Markely >> wrote: >> >>>> >> >>>> Hello, >> >>>> >> >>>> I was trying to do repeated Grid Search CV (20 repeats). I thought >> that each time I call GridSearchCV, the training and test sets separated in >> different splits would be different. >> >>>> >> >>>> However, I got the same best_params_ and best_scores_ for all 20 >> repeats. It looks like the training and test sets are separated in >> identical folds in each run? Just to clarify, e.g. I have the following >> data: 0,1,2,3,4. Class 1 = [0,1,2] and Class 2 = [3,4]. Suppose I call cv = >> 2. The split is always for instance [0,3] [1,2,4] in each repeat, and I >> couldn't get [1,3] [0,2,4] or other combinations. >> >>>> >> >>>> If I understand correctly, GridSearchCV uses StratifiedKFold when I >> enter cv = integer. 
The StratifiedKFold command has random state; I wonder >> if there is anyway I can make the the training and test sets randomly >> separated each time I call the GridSearchCV? >> >>>> >> >>>> Just a note, I used the following classifiers: Logistic Regression, >> KNN, SVC, Kernel SVC, Random Forest, and had the same observation >> regardless of the classifiers. >> >>>> >> >>>> Thank you very much! >> >>>> Raga >> >>>> >> >>>> _______________________________________________ >> >>>> scikit-learn mailing list >> >>>> scikit-learn at python.org >> >>>> https://mail.python.org/mailman/listinfo/scikit-learn >> >>> >> >>> _______________________________________________ >> >>> scikit-learn mailing list >> >>> scikit-learn at python.org >> >>> https://mail.python.org/mailman/listinfo/scikit-learn >> >>> >> >>> _______________________________________________ >> >>> scikit-learn mailing list >> >>> scikit-learn at python.org >> >>> https://mail.python.org/mailman/listinfo/scikit-learn >> >> >> >> _______________________________________________ >> >> scikit-learn mailing list >> >> scikit-learn at python.org >> >> https://mail.python.org/mailman/listinfo/scikit-learn >> >> >> >> _______________________________________________ >> >> scikit-learn mailing list >> >> scikit-learn at python.org >> >> https://mail.python.org/mailman/listinfo/scikit-learn >> > >> > _______________________________________________ >> > scikit-learn mailing list >> > scikit-learn at python.org >> > https://mail.python.org/mailman/listinfo/scikit-learn >> >> _______________________________________________ >> scikit-learn mailing list >> scikit-learn at python.org >> https://mail.python.org/mailman/listinfo/scikit-learn >> > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From nfliu at uw.edu Mon Jan 30 15:25:46 2017 From: nfliu at uw.edu (Nelson Liu) Date: Mon, 30 Jan 2017 20:25:46 +0000 Subject: [scikit-learn] GSOC call for mentors In-Reply-To: References: Message-ID: Hey all, I'd be willing to help out with mentoring a project as well, hopefully in tandem with someone else. Nelson Liu On Mon, Jan 30, 2017 at 10:10 AM Jacob Schreiber wrote: > I discussed this briefly with Gael and Joel. The consensus was that unless > we already know excellent students who will fit well that it is unlikely we > will participate in GSoC. That being said, if someone (other than me) is > willing to step up and organize it, I'd volunteer to be a mentor again. I > think an important project would be adding multithreading to individual > tree building so we can do gradient boosting in parallel. > > On Mon, Jan 30, 2017 at 5:38 AM, Andreas Mueller wrote: > > Hey all. > It's that time of the year again. > Are we planning on participating in GSOC? > If so, we need mentors and projects. > It's unlikely that I'll have time to help with either in any substantial > way. > If we want to participate, I think we should try to be a bit more > organized than last year ;) > > Andy > > Sent from phone. Please excuse spelling and brevity. > > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn > > > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From se.raschka at gmail.com Mon Jan 30 15:37:57 2017 From: se.raschka at gmail.com (Sebastian Raschka) Date: Mon, 30 Jan 2017 15:37:57 -0500 Subject: [scikit-learn] Random StratifiedKFold Grid Search CV In-Reply-To: References: <91E91A60-8B9C-44E1-85E4-5DB1CB86EBDC@gmail.com> <43515838-969C-495F-8C22-BEB30C04D1DD@sebastianraschka.com> <5EF61074-E96F-4EA8-BA5E-7C4B07505D7B@gmail.com> Message-ID: Hm, which version of scikit-learn are you using? Are you running this on sklearn 0.18? Best, Sebastian > On Jan 30, 2017, at 2:48 PM, Raga Markely wrote: > > Hi Sebastian, > > Following up on the original question on repeated Grid Search CV, I tried to do repeated nested loop using the followings: > N_outer=10 > N_inner=10 > scores=[] > for i in range(N_outer): > k_fold_outer = StratifiedKFold(n_splits=10,shuffle=True,random_state=i) > for j in range(N_inner): > k_fold_inner = StratifiedKFold(n_splits=10,shuffle=True,random_state=j) > gs = GridSearchCV(estimator=pipe_svc, param_grid=param_grid,cv=k_fold_inner) > score=cross_val_score(estimator=gs,X=X,y=y,cv=k_fold_outer) > scores.append(score) > np.mean(scores) > np.std(scores) > > But, I get the following error: TypeError: 'StratifiedKFold' object is not iterable > > I did some trials, and the error is gone when I remove cv=k_fold_inner from gs = ... > Could you give me some tips on what I can do? > > Thank you! > Raga > > > > On Fri, Jan 27, 2017 at 1:16 PM, Raga Markely wrote: > Hi Sebastian, > > Sorry, I used the wrong terms (I was referring to algo as model).. great then, i think what i have is aligned with your workflow.. > > Thank you very much for your help! > > Have a good weekend, > Raga > > On Fri, Jan 27, 2017 at 1:01 PM, Sebastian Raschka wrote: > Hi, Raga, > > sounds good, but I am wondering a bit about the order. 2) should come before 1), right? Because model selection is basically done via hyperparam optimization. > > Not saying that this is the optimal/right approach, but I usually do it like this: > > 1.) algo selection via nested cv > 2.) model selection based on best algo via k-fold on whole training set > 3.) fit best algo w. best hyperparams (from 2.) to whole training set > 4.) evaluate on test set > 5.) fit classifier to whole dataset, done > > Best, > Sebastian > > > On Jan 27, 2017, at 12:49 PM, Sebastian Raschka wrote: > > > > Hi, Raga, > > > > sounds good, but I am wondering a bit about the order. 2) should come before 1), right? Because model selection is basically done via hyperparam optimization. > > > > Not saying that this is the optimal/right approach, but I usually do it like this: > > > > 1.) algo selection via nested cv > > 2.) model selection based on best algo via k-fold on whole training set > > 3.) fit best algo w. best hyperparams (from 2.) to whole training set > > 4.) evaluate on test set > > 5.) fit classifier to whole dataset, done > > > > Best, > > Sebastian > > > >> On Jan 27, 2017, at 10:23 AM, Raga Markely wrote: > >> > >> Sounds good, Sebastian.. thanks for the suggestions.. > >> > >> My dataset is relatively small (only ~35 samples), and this is the workflow I have set up so far.. > >> 1. Model selection: use nested loop using cross_val_score(GridSearchCV(...),...) same as shown in the scikit-learn page that you provided - the results show no statistically significant difference in accuracy mean +/- SD among classifiers.. this is expected as the pattern is pretty obvious and simple to separate by eyes after dimensionality reduction (I use pipeline of stdscaler, LDA, and classifier)... 
so i take all of them and use voting classifier in step #3.. > >> 2. Hyperparameter optimization: use GridSearchCV to optimize hyperparameters of each classifiers > >> 3. Decision Region: use the hyperparameters from step #2, fit each classifier separately to the whole dataset, and use voting classifier to get decision region > >> > >> This sounds reasonable? > >> > >> Thank you very much! > >> Raga > >> > >> On Thu, Jan 26, 2017 at 8:31 PM, Sebastian Raschka wrote: > >> You are welcome! And in addition, if you select among different algorithms, here are some more suggestions > >> > >> a) don?t do it based on your independent test set if this is going to your final model performance estimate, or be aware that it would be overly optimistic > >> b) also, it?s not the best idea to select algorithms using cross-validation on the same training set that you used for model selection; a more robust way would be nested CV (e.g,. http://scikit-learn.org/stable/auto_examples/model_selection/plot_nested_cross_validation_iris.html) > >> > >> But yeah, it all depends on your dataset and size. If you have a neural net that takes week to train, and if you have a large dataset anyway so that you can set aside large sets for testing, I?d train on train/validation splits and evaluate on the test set. And to compare e.g., two networks against each other on large test sets, you could do a McNemar test. > >> > >> Best, > >> Sebastian > >> > >>> On Jan 26, 2017, at 8:09 PM, Raga Markely wrote: > >>> > >>> Ahh.. nice.. I will use that.. thanks a lot, Sebastian! > >>> > >>> Best, > >>> Raga > >>> > >>> On Thu, Jan 26, 2017 at 6:34 PM, Sebastian Raschka wrote: > >>> Hi, Raga, > >>> > >>> I think that if GridSearchCV is used for classification, the stratified k-fold doesn?t do shuffling by default. > >>> > >>> Say you do 20 grid search repetitions, you could then do sth like: > >>> > >>> > >>> from sklearn.model_selection import StratifiedKFold > >>> > >>> for i in range(n_reps): > >>> k_fold = StratifiedKFold(n_splits=5, shuffle=True, random_state=i) > >>> gs = GridSearchCV(..., cv=k_fold) > >>> ... > >>> > >>> Best, > >>> Sebastian > >>> > >>>> On Jan 26, 2017, at 5:39 PM, Raga Markely wrote: > >>>> > >>>> Hello, > >>>> > >>>> I was trying to do repeated Grid Search CV (20 repeats). I thought that each time I call GridSearchCV, the training and test sets separated in different splits would be different. > >>>> > >>>> However, I got the same best_params_ and best_scores_ for all 20 repeats. It looks like the training and test sets are separated in identical folds in each run? Just to clarify, e.g. I have the following data: 0,1,2,3,4. Class 1 = [0,1,2] and Class 2 = [3,4]. Suppose I call cv = 2. The split is always for instance [0,3] [1,2,4] in each repeat, and I couldn't get [1,3] [0,2,4] or other combinations. > >>>> > >>>> If I understand correctly, GridSearchCV uses StratifiedKFold when I enter cv = integer. The StratifiedKFold command has random state; I wonder if there is anyway I can make the the training and test sets randomly separated each time I call the GridSearchCV? > >>>> > >>>> Just a note, I used the following classifiers: Logistic Regression, KNN, SVC, Kernel SVC, Random Forest, and had the same observation regardless of the classifiers. > >>>> > >>>> Thank you very much! 
> >>>> Raga > >>>> > >>>> _______________________________________________ > >>>> scikit-learn mailing list > >>>> scikit-learn at python.org > >>>> https://mail.python.org/mailman/listinfo/scikit-learn > >>> > >>> _______________________________________________ > >>> scikit-learn mailing list > >>> scikit-learn at python.org > >>> https://mail.python.org/mailman/listinfo/scikit-learn > >>> > >>> _______________________________________________ > >>> scikit-learn mailing list > >>> scikit-learn at python.org > >>> https://mail.python.org/mailman/listinfo/scikit-learn > >> > >> _______________________________________________ > >> scikit-learn mailing list > >> scikit-learn at python.org > >> https://mail.python.org/mailman/listinfo/scikit-learn > >> > >> _______________________________________________ > >> scikit-learn mailing list > >> scikit-learn at python.org > >> https://mail.python.org/mailman/listinfo/scikit-learn > > > > _______________________________________________ > > scikit-learn mailing list > > scikit-learn at python.org > > https://mail.python.org/mailman/listinfo/scikit-learn > > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn > > > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn From raga.markely at gmail.com Mon Jan 30 15:49:26 2017 From: raga.markely at gmail.com (Raga Markely) Date: Mon, 30 Jan 2017 15:49:26 -0500 Subject: [scikit-learn] Random StratifiedKFold Grid Search CV In-Reply-To: References: <91E91A60-8B9C-44E1-85E4-5DB1CB86EBDC@gmail.com> <43515838-969C-495F-8C22-BEB30C04D1DD@sebastianraschka.com> <5EF61074-E96F-4EA8-BA5E-7C4B07505D7B@gmail.com> Message-ID: Nice catch!! The sklearn was 0.18, but I used sklearn.grid_search instead of sklearn.model_selection. Error is gone now. Thank you, Sebastian! Raga On Mon, Jan 30, 2017 at 3:37 PM, Sebastian Raschka wrote: > Hm, which version of scikit-learn are you using? Are you running this on > sklearn 0.18? > > Best, > Sebastian > > > On Jan 30, 2017, at 2:48 PM, Raga Markely > wrote: > > > > Hi Sebastian, > > > > Following up on the original question on repeated Grid Search CV, I > tried to do repeated nested loop using the followings: > > N_outer=10 > > N_inner=10 > > scores=[] > > for i in range(N_outer): > > k_fold_outer = StratifiedKFold(n_splits=10, > shuffle=True,random_state=i) > > for j in range(N_inner): > > k_fold_inner = StratifiedKFold(n_splits=10, > shuffle=True,random_state=j) > > gs = GridSearchCV(estimator=pipe_svc, > param_grid=param_grid,cv=k_fold_inner) > > score=cross_val_score(estimator=gs,X=X,y=y,cv=k_fold_outer) > > scores.append(score) > > np.mean(scores) > > np.std(scores) > > > > But, I get the following error: TypeError: 'StratifiedKFold' object is > not iterable > > > > I did some trials, and the error is gone when I remove cv=k_fold_inner > from gs = ... > > Could you give me some tips on what I can do? > > > > Thank you! > > Raga > > > > > > > > On Fri, Jan 27, 2017 at 1:16 PM, Raga Markely > wrote: > > Hi Sebastian, > > > > Sorry, I used the wrong terms (I was referring to algo as model).. great > then, i think what i have is aligned with your workflow.. > > > > Thank you very much for your help! 
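For anyone hitting the same TypeError: the deprecated sklearn.grid_search and sklearn.cross_validation modules predate the splitter objects, so handing a new-style StratifiedKFold to the old GridSearchCV or cross_val_score fails with the "not iterable" message. Importing everything from sklearn.model_selection, as in the sketch below, avoids it; pipe_svc, param_grid, and the data are placeholders.

import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV, StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)                  # placeholder data
pipe_svc = make_pipeline(StandardScaler(), SVC())  # placeholder pipeline
param_grid = {'svc__C': [0.1, 1.0, 10.0]}          # placeholder grid

N_outer = 10
N_inner = 10
scores = []
for i in range(N_outer):
    k_fold_outer = StratifiedKFold(n_splits=10, shuffle=True, random_state=i)
    for j in range(N_inner):
        k_fold_inner = StratifiedKFold(n_splits=10, shuffle=True, random_state=j)
        gs = GridSearchCV(estimator=pipe_svc, param_grid=param_grid, cv=k_fold_inner)
        score = cross_val_score(estimator=gs, X=X, y=y, cv=k_fold_outer)
        scores.append(score)

print(np.mean(scores), np.std(scores))  # note: 100 repetitions of nested CV is slow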
> > > > Have a good weekend, > > Raga > > > > On Fri, Jan 27, 2017 at 1:01 PM, Sebastian Raschka > wrote: > > Hi, Raga, > > > > sounds good, but I am wondering a bit about the order. 2) should come > before 1), right? Because model selection is basically done via hyperparam > optimization. > > > > Not saying that this is the optimal/right approach, but I usually do it > like this: > > > > 1.) algo selection via nested cv > > 2.) model selection based on best algo via k-fold on whole training set > > 3.) fit best algo w. best hyperparams (from 2.) to whole training set > > 4.) evaluate on test set > > 5.) fit classifier to whole dataset, done > > > > Best, > > Sebastian > > > > > On Jan 27, 2017, at 12:49 PM, Sebastian Raschka < > mail at sebastianraschka.com> wrote: > > > > > > Hi, Raga, > > > > > > sounds good, but I am wondering a bit about the order. 2) should come > before 1), right? Because model selection is basically done via hyperparam > optimization. > > > > > > Not saying that this is the optimal/right approach, but I usually do > it like this: > > > > > > 1.) algo selection via nested cv > > > 2.) model selection based on best algo via k-fold on whole training set > > > 3.) fit best algo w. best hyperparams (from 2.) to whole training set > > > 4.) evaluate on test set > > > 5.) fit classifier to whole dataset, done > > > > > > Best, > > > Sebastian > > > > > >> On Jan 27, 2017, at 10:23 AM, Raga Markely > wrote: > > >> > > >> Sounds good, Sebastian.. thanks for the suggestions.. > > >> > > >> My dataset is relatively small (only ~35 samples), and this is the > workflow I have set up so far.. > > >> 1. Model selection: use nested loop using > cross_val_score(GridSearchCV(...),...) same as shown in the scikit-learn > page that you provided - the results show no statistically significant > difference in accuracy mean +/- SD among classifiers.. this is expected as > the pattern is pretty obvious and simple to separate by eyes after > dimensionality reduction (I use pipeline of stdscaler, LDA, and > classifier)... so i take all of them and use voting classifier in step #3.. > > >> 2. Hyperparameter optimization: use GridSearchCV to optimize > hyperparameters of each classifiers > > >> 3. Decision Region: use the hyperparameters from step #2, fit each > classifier separately to the whole dataset, and use voting classifier to > get decision region > > >> > > >> This sounds reasonable? > > >> > > >> Thank you very much! > > >> Raga > > >> > > >> On Thu, Jan 26, 2017 at 8:31 PM, Sebastian Raschka < > se.raschka at gmail.com> wrote: > > >> You are welcome! And in addition, if you select among different > algorithms, here are some more suggestions > > >> > > >> a) don?t do it based on your independent test set if this is going to > your final model performance estimate, or be aware that it would be overly > optimistic > > >> b) also, it?s not the best idea to select algorithms using > cross-validation on the same training set that you used for model > selection; a more robust way would be nested CV (e.g,. > http://scikit-learn.org/stable/auto_examples/model_ > selection/plot_nested_cross_validation_iris.html) > > >> > > >> But yeah, it all depends on your dataset and size. If you have a > neural net that takes week to train, and if you have a large dataset anyway > so that you can set aside large sets for testing, I?d train on > train/validation splits and evaluate on the test set. And to compare e.g., > two networks against each other on large test sets, you could do a McNemar > test. 
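A bare-bones McNemar test for comparing two classifiers on the same held-out test set could be sketched as below (the continuity-corrected chi-square form with one degree of freedom); the labels and predictions are dummy arrays for illustration.

import numpy as np
from scipy.stats import chi2

def mcnemar_test(y_true, pred_a, pred_b):
    a_ok = (pred_a == y_true)
    b_ok = (pred_b == y_true)
    n01 = np.sum(a_ok & ~b_ok)   # A correct, B wrong
    n10 = np.sum(~a_ok & b_ok)   # A wrong, B correct
    # continuity-corrected McNemar statistic, compared to chi2 with 1 d.f.
    stat = (abs(n01 - n10) - 1.0) ** 2 / (n01 + n10)
    return stat, chi2.sf(stat, 1)

# dummy test-set labels and predictions, for illustration only
rng = np.random.RandomState(0)
y_true = rng.randint(0, 2, size=500)
pred_a = np.where(rng.rand(500) < 0.85, y_true, 1 - y_true)  # roughly 15% error
pred_b = np.where(rng.rand(500) < 0.80, y_true, 1 - y_true)  # roughly 20% error
stat, p = mcnemar_test(y_true, pred_a, pred_b)
print('chi2 = %.3f, p = %.3f' % (stat, p))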
> > >> > > >> Best, > > >> Sebastian > > >> > > >>> On Jan 26, 2017, at 8:09 PM, Raga Markely > wrote: > > >>> > > >>> Ahh.. nice.. I will use that.. thanks a lot, Sebastian! > > >>> > > >>> Best, > > >>> Raga > > >>> > > >>> On Thu, Jan 26, 2017 at 6:34 PM, Sebastian Raschka < > se.raschka at gmail.com> wrote: > > >>> Hi, Raga, > > >>> > > >>> I think that if GridSearchCV is used for classification, the > stratified k-fold doesn?t do shuffling by default. > > >>> > > >>> Say you do 20 grid search repetitions, you could then do sth like: > > >>> > > >>> > > >>> from sklearn.model_selection import StratifiedKFold > > >>> > > >>> for i in range(n_reps): > > >>> k_fold = StratifiedKFold(n_splits=5, shuffle=True, random_state=i) > > >>> gs = GridSearchCV(..., cv=k_fold) > > >>> ... > > >>> > > >>> Best, > > >>> Sebastian > > >>> > > >>>> On Jan 26, 2017, at 5:39 PM, Raga Markely > wrote: > > >>>> > > >>>> Hello, > > >>>> > > >>>> I was trying to do repeated Grid Search CV (20 repeats). I thought > that each time I call GridSearchCV, the training and test sets separated in > different splits would be different. > > >>>> > > >>>> However, I got the same best_params_ and best_scores_ for all 20 > repeats. It looks like the training and test sets are separated in > identical folds in each run? Just to clarify, e.g. I have the following > data: 0,1,2,3,4. Class 1 = [0,1,2] and Class 2 = [3,4]. Suppose I call cv = > 2. The split is always for instance [0,3] [1,2,4] in each repeat, and I > couldn't get [1,3] [0,2,4] or other combinations. > > >>>> > > >>>> If I understand correctly, GridSearchCV uses StratifiedKFold when I > enter cv = integer. The StratifiedKFold command has random state; I wonder > if there is anyway I can make the the training and test sets randomly > separated each time I call the GridSearchCV? > > >>>> > > >>>> Just a note, I used the following classifiers: Logistic Regression, > KNN, SVC, Kernel SVC, Random Forest, and had the same observation > regardless of the classifiers. > > >>>> > > >>>> Thank you very much! 
> > >>>> Raga > > >>>> > > >>>> _______________________________________________ > > >>>> scikit-learn mailing list > > >>>> scikit-learn at python.org > > >>>> https://mail.python.org/mailman/listinfo/scikit-learn > > >>> > > >>> _______________________________________________ > > >>> scikit-learn mailing list > > >>> scikit-learn at python.org > > >>> https://mail.python.org/mailman/listinfo/scikit-learn > > >>> > > >>> _______________________________________________ > > >>> scikit-learn mailing list > > >>> scikit-learn at python.org > > >>> https://mail.python.org/mailman/listinfo/scikit-learn > > >> > > >> _______________________________________________ > > >> scikit-learn mailing list > > >> scikit-learn at python.org > > >> https://mail.python.org/mailman/listinfo/scikit-learn > > >> > > >> _______________________________________________ > > >> scikit-learn mailing list > > >> scikit-learn at python.org > > >> https://mail.python.org/mailman/listinfo/scikit-learn > > > > > > _______________________________________________ > > > scikit-learn mailing list > > > scikit-learn at python.org > > > https://mail.python.org/mailman/listinfo/scikit-learn > > > > _______________________________________________ > > scikit-learn mailing list > > scikit-learn at python.org > > https://mail.python.org/mailman/listinfo/scikit-learn > > > > > > _______________________________________________ > > scikit-learn mailing list > > scikit-learn at python.org > > https://mail.python.org/mailman/listinfo/scikit-learn > > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn > -------------- next part -------------- An HTML attachment was scrubbed... URL: From se.raschka at gmail.com Mon Jan 30 16:04:49 2017 From: se.raschka at gmail.com (Sebastian Raschka) Date: Mon, 30 Jan 2017 16:04:49 -0500 Subject: [scikit-learn] Random StratifiedKFold Grid Search CV In-Reply-To: References: <91E91A60-8B9C-44E1-85E4-5DB1CB86EBDC@gmail.com> <43515838-969C-495F-8C22-BEB30C04D1DD@sebastianraschka.com> <5EF61074-E96F-4EA8-BA5E-7C4B07505D7B@gmail.com> Message-ID: Cool, glad to hear that it was such an easy fix :) > On Jan 30, 2017, at 3:49 PM, Raga Markely wrote: > > Nice catch!! The sklearn was 0.18, but I used sklearn.grid_search instead of sklearn.model_selection. > > Error is gone now. > > Thank you, Sebastian! > Raga > > On Mon, Jan 30, 2017 at 3:37 PM, Sebastian Raschka wrote: > Hm, which version of scikit-learn are you using? Are you running this on sklearn 0.18? > > Best, > Sebastian > > > On Jan 30, 2017, at 2:48 PM, Raga Markely wrote: > > > > Hi Sebastian, > > > > Following up on the original question on repeated Grid Search CV, I tried to do repeated nested loop using the followings: > > N_outer=10 > > N_inner=10 > > scores=[] > > for i in range(N_outer): > > k_fold_outer = StratifiedKFold(n_splits=10,shuffle=True,random_state=i) > > for j in range(N_inner): > > k_fold_inner = StratifiedKFold(n_splits=10,shuffle=True,random_state=j) > > gs = GridSearchCV(estimator=pipe_svc, param_grid=param_grid,cv=k_fold_inner) > > score=cross_val_score(estimator=gs,X=X,y=y,cv=k_fold_outer) > > scores.append(score) > > np.mean(scores) > > np.std(scores) > > > > But, I get the following error: TypeError: 'StratifiedKFold' object is not iterable > > > > I did some trials, and the error is gone when I remove cv=k_fold_inner from gs = ... > > Could you give me some tips on what I can do? > > > > Thank you! 
> > Raga > > > > > > > > On Fri, Jan 27, 2017 at 1:16 PM, Raga Markely wrote: > > Hi Sebastian, > > > > Sorry, I used the wrong terms (I was referring to algo as model).. great then, i think what i have is aligned with your workflow.. > > > > Thank you very much for your help! > > > > Have a good weekend, > > Raga > > > > On Fri, Jan 27, 2017 at 1:01 PM, Sebastian Raschka wrote: > > Hi, Raga, > > > > sounds good, but I am wondering a bit about the order. 2) should come before 1), right? Because model selection is basically done via hyperparam optimization. > > > > Not saying that this is the optimal/right approach, but I usually do it like this: > > > > 1.) algo selection via nested cv > > 2.) model selection based on best algo via k-fold on whole training set > > 3.) fit best algo w. best hyperparams (from 2.) to whole training set > > 4.) evaluate on test set > > 5.) fit classifier to whole dataset, done > > > > Best, > > Sebastian > > > > > On Jan 27, 2017, at 12:49 PM, Sebastian Raschka wrote: > > > > > > Hi, Raga, > > > > > > sounds good, but I am wondering a bit about the order. 2) should come before 1), right? Because model selection is basically done via hyperparam optimization. > > > > > > Not saying that this is the optimal/right approach, but I usually do it like this: > > > > > > 1.) algo selection via nested cv > > > 2.) model selection based on best algo via k-fold on whole training set > > > 3.) fit best algo w. best hyperparams (from 2.) to whole training set > > > 4.) evaluate on test set > > > 5.) fit classifier to whole dataset, done > > > > > > Best, > > > Sebastian > > > > > >> On Jan 27, 2017, at 10:23 AM, Raga Markely wrote: > > >> > > >> Sounds good, Sebastian.. thanks for the suggestions.. > > >> > > >> My dataset is relatively small (only ~35 samples), and this is the workflow I have set up so far.. > > >> 1. Model selection: use nested loop using cross_val_score(GridSearchCV(...),...) same as shown in the scikit-learn page that you provided - the results show no statistically significant difference in accuracy mean +/- SD among classifiers.. this is expected as the pattern is pretty obvious and simple to separate by eyes after dimensionality reduction (I use pipeline of stdscaler, LDA, and classifier)... so i take all of them and use voting classifier in step #3.. > > >> 2. Hyperparameter optimization: use GridSearchCV to optimize hyperparameters of each classifiers > > >> 3. Decision Region: use the hyperparameters from step #2, fit each classifier separately to the whole dataset, and use voting classifier to get decision region > > >> > > >> This sounds reasonable? > > >> > > >> Thank you very much! > > >> Raga > > >> > > >> On Thu, Jan 26, 2017 at 8:31 PM, Sebastian Raschka wrote: > > >> You are welcome! And in addition, if you select among different algorithms, here are some more suggestions > > >> > > >> a) don?t do it based on your independent test set if this is going to your final model performance estimate, or be aware that it would be overly optimistic > > >> b) also, it?s not the best idea to select algorithms using cross-validation on the same training set that you used for model selection; a more robust way would be nested CV (e.g,. http://scikit-learn.org/stable/auto_examples/model_selection/plot_nested_cross_validation_iris.html) > > >> > > >> But yeah, it all depends on your dataset and size. 
> > >> Best,
> > >> Sebastian
> > >>
> > >>> On Jan 26, 2017, at 8:09 PM, Raga Markely wrote:
> > >>>
> > >>> Ahh.. nice.. I will use that.. thanks a lot, Sebastian!
> > >>>
> > >>> Best,
> > >>> Raga
> > >>>
> > >>> On Thu, Jan 26, 2017 at 6:34 PM, Sebastian Raschka wrote:
> > >>> Hi, Raga,
> > >>>
> > >>> I think that if GridSearchCV is used for classification, the stratified k-fold doesn't do shuffling by default.
> > >>>
> > >>> Say you do 20 grid search repetitions, you could then do sth like:
> > >>>
> > >>> from sklearn.model_selection import StratifiedKFold
> > >>>
> > >>> for i in range(n_reps):
> > >>>     k_fold = StratifiedKFold(n_splits=5, shuffle=True, random_state=i)
> > >>>     gs = GridSearchCV(..., cv=k_fold)
> > >>>     ...
> > >>>
> > >>> Best,
> > >>> Sebastian
> > >>>
> > >>>> On Jan 26, 2017, at 5:39 PM, Raga Markely wrote:
> > >>>>
> > >>>> Hello,
> > >>>>
> > >>>> I was trying to do repeated Grid Search CV (20 repeats). I thought that each time I call GridSearchCV, the training and test sets would be separated into different splits.
> > >>>>
> > >>>> However, I got the same best_params_ and best_scores_ for all 20 repeats. It looks like the training and test sets are separated into identical folds in each run? Just to clarify, e.g. I have the following data: 0,1,2,3,4. Class 1 = [0,1,2] and Class 2 = [3,4]. Suppose I call cv = 2. The split is always, for instance, [0,3] [1,2,4] in each repeat, and I couldn't get [1,3] [0,2,4] or other combinations.
> > >>>>
> > >>>> If I understand correctly, GridSearchCV uses StratifiedKFold when I enter cv = integer. The StratifiedKFold command has a random state; I wonder if there is any way I can make the training and test sets randomly separated each time I call GridSearchCV?
> > >>>>
> > >>>> Just a note, I used the following classifiers: Logistic Regression, KNN, SVC, Kernel SVC, Random Forest, and had the same observation regardless of the classifier.
> > >>>>
> > >>>> Thank you very much!
> > >>>> Raga
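Expanding that snippet into something runnable, under the assumption of a plain SVC on iris as a stand-in for the classifiers and data in the question: with shuffle=True and a different random_state per repetition, each of the 20 grid searches sees a different stratified partition, so best_score_ (and possibly best_params_) can now vary across repeats instead of being identical.

    import numpy as np
    from sklearn.datasets import load_iris
    from sklearn.model_selection import GridSearchCV, StratifiedKFold
    from sklearn.svm import SVC

    X, y = load_iris(return_X_y=True)

    # Illustrative grid; the thread's actual classifiers and grids are not shown.
    param_grid = {'C': [0.1, 1.0, 10.0], 'gamma': [0.01, 0.1, 1.0]}

    n_reps = 20
    best_scores = []
    for i in range(n_reps):
        # shuffle=True with a different random_state per repetition gives a
        # different stratified partition each time; cv=<integer> would reuse
        # the same unshuffled folds in every repetition.
        k_fold = StratifiedKFold(n_splits=5, shuffle=True, random_state=i)
        gs = GridSearchCV(SVC(), param_grid=param_grid, cv=k_fold)
        gs.fit(X, y)
        best_scores.append(gs.best_score_)

    print(np.mean(best_scores), np.std(best_scores), gs.best_params_)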
> > >>>>
> > >>>> _______________________________________________
> > >>>> scikit-learn mailing list
> > >>>> scikit-learn at python.org
> > >>>> https://mail.python.org/mailman/listinfo/scikit-learn
>
> _______________________________________________
> scikit-learn mailing list
> scikit-learn at python.org
> https://mail.python.org/mailman/listinfo/scikit-learn

From g.lemaitre58 at gmail.com  Tue Jan 31 12:55:59 2017
From: g.lemaitre58 at gmail.com (Guillaume Lemaître)
Date: Tue, 31 Jan 2017 18:55:59 +0100
Subject: [scikit-learn] GSOC call for mentors
In-Reply-To: 
References: 
Message-ID: 

I would be interested in helping with mentoring, or with whatever else is needed for the project.

On 30 January 2017 at 21:25, Nelson Liu wrote:

> Hey all,
> I'd be willing to help out with mentoring a project as well, hopefully in tandem with someone else.
>
> Nelson Liu
>
> On Mon, Jan 30, 2017 at 10:10 AM Jacob Schreiber wrote:
>
>> I discussed this briefly with Gael and Joel. The consensus was that unless we already know excellent students who will fit well, it is unlikely we will participate in GSoC. That being said, if someone (other than me) is willing to step up and organize it, I'd volunteer to be a mentor again. I think an important project would be adding multithreading to individual tree building so we can do gradient boosting in parallel.
>>
>> On Mon, Jan 30, 2017 at 5:38 AM, Andreas Mueller wrote:
>>
>> Hey all.
>> It's that time of the year again.
>> Are we planning on participating in GSOC?
>> If so, we need mentors and projects.
>> It's unlikely that I'll have time to help with either in any substantial way.
>> If we want to participate, I think we should try to be a bit more organized than last year ;)
>>
>> Andy
>>
>> Sent from phone. Please excuse spelling and brevity.
>>
>> _______________________________________________
>> scikit-learn mailing list
>> scikit-learn at python.org
>> https://mail.python.org/mailman/listinfo/scikit-learn
>
> _______________________________________________
> scikit-learn mailing list
> scikit-learn at python.org
> https://mail.python.org/mailman/listinfo/scikit-learn
>

-- 
Guillaume Lemaitre
INRIA Saclay - Ile-de-France
Equipe PARIETAL
guillaume.lemaitre at inria.fr --- https://glemaitre.github.io/
-------------- next part --------------
An HTML attachment was scrubbed...
URL: