[IPython-dev] IPython.parallel slow push
Fernando Perez
fperez.net at gmail.com
Mon Aug 11 18:38:09 EDT 2014
On Mon, Aug 11, 2014 at 6:56 AM, Wes Turner <wes.turner at gmail.com> wrote:
> This [2] seems to suggest that anything that isn't a buffer,
> str/bytes, or numpy array is pickled and copied.
>
That is indeed correct.
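
To make the cost concrete, here is a minimal stdlib-only sketch of the difference between the two paths. It does not call IPython.parallel at all; it only assumes that "pickled and copied" means roughly what `pickle.dumps`/`pickle.loads` do, and that a buffer-like object can be exposed through a `memoryview` without duplicating it. The variable names are illustrative.

```python
import pickle

# A plain Python list is not a buffer, so a generic transport has to serialize it.
data = list(range(100_000))
payload = pickle.dumps(data, protocol=pickle.HIGHEST_PROTOCOL)  # full serialized copy
restored = pickle.loads(payload)  # another full copy on the receiving side

# A bytes object already supports the buffer protocol:
# a memoryview wraps the same memory instead of copying it.
raw = bytes(100_000)
view = memoryview(raw)

print(len(payload), "bytes produced by pickling the list")
print(view.nbytes, "bytes exposed through the view, with no copy made")
```

The list makes a round trip through a serialized copy, while the view only references the original bytes, which is why nested Python objects are the expensive case.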
> Would it be faster to ETL into something like HDF5 (e.g. w/
> Pandas/PyTables) and just synchronize the dataset URI?
>
Absolutely.
IPython.parallel is NOT the right tool to use to move large amounts of data
around between machines. It's an important problem in parallel/distributed
computing, but also a very challenging one that is beyond our scope and
resources.
When using IPython.parallel, you should think of it as a good way to:
- coordinate computation
- move code around
- move *small* data around
- have interactive control in parallel settings
But you should have a non-IPython strategy for moving big chunks of data
around. The right answer to that question will vary from one context to
another. In some cases a simple NFS mount may be enough; elsewhere
something like Hadoop FS or Disco FS may work, or a well-sharded database,
or some other storage layer that fits your setup.
But it's simply a problem that we consider orthogonal to what
IPython.parallel can do well.
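
As a rough sketch of the "store the data elsewhere, send only its location" idea: the example below writes a dataset once and hands the worker function nothing but a path. It is an illustration under stated assumptions, not a recommended implementation. A temporary directory stands in for a shared mount such as NFS, a flat binary file stands in for HDF5 so that it runs with the standard library only, and `write_dataset` and `load_and_sum` are hypothetical helper names.

```python
import os
import tempfile
from array import array


def write_dataset(path, values):
    """Producer side: store the data once where every worker can read it."""
    with open(path, "wb") as f:
        array("d", values).tofile(f)


def load_and_sum(path):
    """Worker side: receive only the path, then read the data locally."""
    # Imports live inside the function so it stays self-contained
    # if it is shipped to a remote process.
    import os
    from array import array

    values = array("d")
    with open(path, "rb") as f:
        values.fromfile(f, os.path.getsize(path) // values.itemsize)
    return sum(values)


# Assumption: this directory plays the role of a filesystem visible to all engines.
shared_dir = tempfile.mkdtemp()
path = os.path.join(shared_dir, "dataset.bin")
write_dataset(path, [float(i) for i in range(1000)])

# Run locally here. With engines running, the same call could be dispatched
# with something like rc[:].apply_sync(load_and_sum, path), where rc is an
# IPython.parallel Client; only the short path string would be pushed.
total = load_and_sum(path)
print(total)
```

The only thing that crosses the wire in this pattern is a short string, which fits the "move *small* data around" use case.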
Hope this helps,
f
--
Fernando Perez (@fperez_org; http://fperez.org)
fperez.net-at-gmail: mailing lists only (I ignore this when swamped!)
fernando.perez-at-berkeley: contact me here for any direct mail