[scikit-learn] meta-estimator for multiple MLPRegressor

Jacob Schreiber jmschreiber91 at gmail.com
Sat Jan 7 19:40:41 EST 2017


This is an aside from your original question, but as someone who has
dealt with similar data in bioinformatics (gene expression, specifically) I
think you should tread -very- carefully if you have such a small sample set
and more features than samples. MLPs are already prone to overfitting, and
both of those factors would make me inherently suspicious of the results.
This sounds like an easy way to trick yourself into thinking you are making
good predictions. Perhaps consider LASSO?
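
To make the LASSO suggestion concrete, here is a minimal sketch of a
cross-validated baseline for the "few samples, many features" case. The data
is synthetic and the sizes (40 samples, 200 features) are my assumption, not
your actual data set; the point is only that scaling and the choice of alpha
are redone inside every outer training fold, so the reported score is not
flattered by the held-out samples.

```python
import numpy as np
from sklearn.linear_model import LassoCV
from sklearn.model_selection import KFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the real data (assumption): 40 samples, 200 features,
# of which only the first two actually drive the target.
rng = np.random.RandomState(0)
X = rng.normal(size=(40, 200))
y = X[:, 0] - 2.0 * X[:, 1] + rng.normal(scale=0.1, size=40)

# LassoCV picks alpha with an inner 5-fold CV; wrapping it in a pipeline and
# scoring with an outer CV gives a nested estimate of generalisation.
model = make_pipeline(StandardScaler(), LassoCV(cv=5, random_state=0))
outer_cv = KFold(n_splits=5, shuffle=True, random_state=0)
scores = cross_val_score(model, X, y, cv=outer_cv, scoring="r2")

# Refit on everything just to see how sparse the solution is.
model.fit(X, y)
n_selected = int(np.sum(model.named_steps["lassocv"].coef_ != 0))
print("outer-fold R^2:", scores.round(2))
print("features kept by LASSO:", n_selected, "of", X.shape[1])
```

If the MLP ensemble cannot clearly beat a baseline like this under the same
outer folds, that would be a reason to distrust it.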

Back to the original question: it is true that using an SVR in a stacking
technique would add more parameters to your model, but the increase is likely
insignificant compared to the parameters of the MLPs themselves.
Alternatively, you may consider fitting a LASSO on the predictions of all of
the MLPs (not just the top 10%), so that you learn which ones yield useful
features for a meta-estimator instead of just selecting the top 10%.
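
A rough sketch of that stacking idea, under assumptions of mine: synthetic
data, eight small MLPs that differ only in their random seed, and
out-of-fold predictions as the meta-features. Your real ensemble and
features will differ; this only shows the mechanics.

```python
import warnings

import numpy as np
from sklearn.exceptions import ConvergenceWarning
from sklearn.linear_model import LassoCV
from sklearn.model_selection import KFold, cross_val_predict
from sklearn.neural_network import MLPRegressor

# Small MLPs with few iterations may not converge; silence that for the demo.
warnings.filterwarnings("ignore", category=ConvergenceWarning)

# Synthetic stand-in for the real data (assumption): 40 samples, 100 features.
rng = np.random.RandomState(0)
X = rng.normal(size=(40, 100))
y = X[:, 0] - 2.0 * X[:, 1] + rng.normal(scale=0.1, size=40)

# Hypothetical ensemble: identical architecture, different random seeds.
mlps = [
    MLPRegressor(hidden_layer_sizes=(10,), max_iter=200, random_state=seed)
    for seed in range(8)
]

# Out-of-fold predictions: each column holds one MLP's predictions for
# samples it did not see during training.
cv = KFold(n_splits=5, shuffle=True, random_state=0)
Z = np.column_stack([cross_val_predict(m, X, y, cv=cv) for m in mlps])

# The LASSO meta-estimator learns a sparse weighting over all MLPs; a zero
# coefficient means that MLP adds nothing beyond the others.
meta = LassoCV(cv=5, random_state=0).fit(Z, y)
kept = np.flatnonzero(meta.coef_)
print("MLPs with non-zero weight:", kept.tolist())
print("meta-estimator weights:", meta.coef_.round(3))
```

The meta-estimator here adds one weight per MLP plus an intercept. To get an
honest error estimate for the whole stack you would still need to repeat all
of this inside an outer cross-validation loop.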

On Sat, Jan 7, 2017 at 4:01 PM, Thomas Evangelidis <tevang3 at gmail.com>
wrote:

>
>
> On 8 January 2017 at 00:04, Jacob Schreiber <jmschreiber91 at gmail.com>
> wrote:
>
>> If you have such a small number of observations (with a much higher
>> feature space) then why do you think you can accurately train not just a
>> single MLP, but an ensemble of them without overfitting dramatically?
>>
>>
>>
> Because the observations in the data set don't differ much from each other.
> To be more specific, the data set consists of a congeneric series of
> organic molecules and the observation is their binding strength to a target
> protein. The idea was to train predictors that can predict the binding
> strength of new molecules that belong to the same congeneric series.
> Therefore special care is taken to apply the predictors to the right domain
> of applicability. According to the literature, the same strategy has been
> followed in the past several times. The novelty of my approach stems from
> other factors that are irrelevant to this thread.
>
>
> --
>
> ======================================================================
>
> Thomas Evangelidis
>
> Research Specialist
> CEITEC - Central European Institute of Technology
> Masaryk University
> Kamenice 5/A35/1S081,
> 62500 Brno, Czech Republic
>
> email: tevang at pharm.uoa.gr
>
>           tevang3 at gmail.com
>
>
> website: https://sites.google.com/site/thomasevangelidishomepage/
>
>
> _______________________________________________
> scikit-learn mailing list
> scikit-learn at python.org
> https://mail.python.org/mailman/listinfo/scikit-learn
>
>