[scikit-learn] Any plans on generalizing Pipeline and transformers?

Manuel Castejón Limas manuel.castejon at gmail.com
Wed Jan 3 19:34:58 EST 2018


I've read about Dask, and it is a tool I want to have in my belt, especially
for using the SGE connection to run GridSearchCV at the supercomputer
center I have access to. Should it work as promised, it will be one of my
favs.

As for my toy example, I have more limited goals with this graph: I am
not currently interested in parallelizing each step, as I guess that
parallelizing each graph fit through GridSearchCV will be closer to
what I need.
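
A minimal sketch of that kind of parallelism, using a standard scikit-learn
Pipeline as a stand-in for the graph (the dataset, step names and parameter
grid are assumptions made up for illustration). Each candidate/fold fit is
dispatched as a separate job through n_jobs, while the steps inside each fit
still run sequentially:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic data, only for illustration.
X, y = make_classification(n_samples=200, n_features=10, random_state=0)

# A plain two-step pipeline standing in for the graph of steps.
pipe = Pipeline([
    ("scale", StandardScaler()),
    ("clf", LogisticRegression(max_iter=1000)),
])

# Parameters are addressed as <step name>__<parameter name>.
param_grid = {"clf__C": [0.1, 1.0, 10.0]}

# n_jobs=2 runs the individual (candidate, fold) fits in parallel;
# nothing inside a single pipeline fit is parallelized here.
search = GridSearchCV(pipe, param_grid, cv=3, n_jobs=2)
search.fit(X, y)

print(search.best_params_)
```

With 3 candidates and 3 folds this schedules 9 independent fits, plus a final
refit of the best candidate on the whole dataset.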

I keep working on a proof of concept. You can have a look at:

https://github.com/mcasl/PAELLA/blob/master/pipeGraph.py

along with a few unit tests:
https://github.com/mcasl/PAELLA/blob/master/tests/test_pipeGraph.py

As of today, I have an iterable graph of steps that can be fitted/run
depending on their role (some can be disabled during run while active during
fit, or vice versa). I still have to play a bit with injecting different
parameters to make it compatible with GridSearchCV, and learn a bit about
the memory options in order to cache results.
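
To illustrate the fit/run roles, here is a minimal sketch in plain Python. It
is not the code in pipeGraph.py: the names Step, execute, use_for_fit and
use_for_run are hypothetical, and a simple ordered list stands in for the
graph.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Step:
    """One node of the (here: linear) graph. All names are hypothetical."""
    name: str
    action: Callable[[list], list]
    use_for_fit: bool = True   # active while fitting
    use_for_run: bool = True   # active while running/predicting


def execute(steps: List[Step], data: list, mode: str) -> Tuple[list, List[str]]:
    """Apply, in order, only the steps that are active for the given mode."""
    if mode not in ("fit", "run"):
        raise ValueError("mode must be 'fit' or 'run'")
    executed = []
    for step in steps:
        active = step.use_for_fit if mode == "fit" else step.use_for_run
        if not active:
            continue  # skip steps disabled for this mode
        data = step.action(data)
        executed.append(step.name)
    return data, executed


steps = [
    # Outlier removal only makes sense while fitting, so it is disabled on run.
    Step("drop_outliers", lambda xs: [x for x in xs if x < 100], use_for_run=False),
    Step("double", lambda xs: [2 * x for x in xs]),
]

fit_out, fit_names = execute(steps, [1, 2, 500], mode="fit")
run_out, run_names = execute(steps, [1, 2, 500], mode="run")
print(fit_names, fit_out)
print(run_names, run_out)
```

During fit both steps are applied, while during run the outlier filter is
skipped and the value 500 passes through to the second step.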

Any comments highly appreciated, truly!
Manolo




2017-12-30 15:34 GMT+01:00 Frédéric Bastien <frederic.bastien at gmail.com>:

> This starts to look like the Dask project. Do you know it?
>