[scikit-learn] Scaling model selection on a cluster
Vlad Ionescu
ionescu.vlad1 at gmail.com
Sun Aug 7 06:51:32 EDT 2016
Thanks, that looks interesting. I've looked into dask-learn's grid search (
https://github.com/mrocklin/dask-learn/blob/master/grid_search.py) but it
seems not to make use of the n_jobs parameter. Will this work in a
distributed fashion? The link you gave seemed to focus more on optimizing
the grid search by eliminating duplicate work rather than by distributing
it on more machines (I am actually using a random search, so I'm not sure
those optimizations apply to my use case anyway).
Dask itself seems like it might work, although it seems to require being
started manually on each node. I will look into it some more.
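
To make the random search case concrete, here is a minimal sketch of why it
distributes easily: every sampled candidate is an independent task, so the
search can be written against any executor that offers a map() method in the
style of concurrent.futures. Everything named below (sample_params, evaluate,
random_search, the C/gamma search space and the toy score) is hypothetical and
only illustrates the pattern. A real evaluate() would fit and cross-validate a
scikit-learn estimator, and the thread pool stands in for whatever
cluster-backed executor ends up being used.

```python
import math
import random
from concurrent.futures import ThreadPoolExecutor


def sample_params(n_iter, seed=0):
    """Draw n_iter random candidates from a hypothetical log-uniform space."""
    rng = random.Random(seed)
    return [
        {"C": 10 ** rng.uniform(-3, 3), "gamma": 10 ** rng.uniform(-4, 1)}
        for _ in range(n_iter)
    ]


def evaluate(params):
    """Score one candidate and return (score, params).

    Placeholder only: a real version would fit and cross-validate an
    estimator here. The toy score just keeps the sketch runnable.
    """
    score = -abs(math.log10(params["C"])) - abs(math.log10(params["gamma"]) + 2)
    return score, params


def random_search(executor, n_iter=20, seed=0):
    """Evaluate all candidates through the executor and return the best one."""
    candidates = sample_params(n_iter, seed)
    # Candidates do not depend on each other, so they can run anywhere
    # the executor chooses to place them.
    results = list(executor.map(evaluate, candidates))
    return max(results, key=lambda r: r[0])


if __name__ == "__main__":
    # Local stand-in for a cluster-backed executor (assumption).
    with ThreadPoolExecutor(max_workers=4) as pool:
        best_score, best_params = random_search(pool)
    print(best_score, best_params)
```

Because random_search() only relies on executor.map(), the same function
should work unchanged with a distributed executor, provided it follows the
same interface and evaluate() can be shipped to the workers.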
On Sun, Aug 7, 2016 at 12:06 PM federico vaggi <vaggi.federico at gmail.com>
wrote:
> This might be interesting to you:
>
> http://blaze.pydata.org/blog/2015/10/19/dask-learn/
>
>
> On Sun, 7 Aug 2016 at 10:42 Vlad Ionescu <ionescu.vlad1 at gmail.com> wrote:
>
>> Hello,
>>
>> I am interested in scaling grid searches on an HPC LSF cluster with about
>> 60 nodes, each with 20 cores. I thought I could just set n_jobs=1000 then
>> submit a job with bsub -n 1000, but then I dug deeper and understood that
>> the underlying joblib used by scikit-learn will create all of those jobs on
>> a single node, resulting in no performance benefits. So I am stuck using a
>> single node.
>>
>> I've read a lengthy discussion some time ago about adding something like
>> this in scikit-learn:
>> https://sourceforge.net/p/scikit-learn/mailman/scikit-learn-general/thread/4F26C3CB.8070603@ais.uni-bonn.de/
>>
>>
>> However, it hasn't materialized in any way, as far as I can tell.
>>
>> Do you know of any way to do this, or any modern cluster computing
>> libraries for python that might help me write something myself (I found a
>> lot, but it's hard to tell what's considered good or even still under
>> development)?
>>
>> Also, are there still plans to implement this in scikit-learn? You seemed
>> to like the idea back then.
>>
> _______________________________________________
>> scikit-learn mailing list
>> scikit-learn at python.org
>> https://mail.python.org/mailman/listinfo/scikit-learn
>>