[IPython-dev] IPython.parallel slow push
moritz.beber at gmail.com
Tue Aug 12 06:31:15 EDT 2014
On Tue, Aug 12, 2014 at 12:38 AM, Fernando Perez <fperez.net at gmail.com>
> On Mon, Aug 11, 2014 at 6:56 AM, Wes Turner <wes.turner at gmail.com> wrote:
>> This seems to suggest that anything that isn't a buffer,
>> str/bytes, or numpy array is pickled and copied.
> That is indeed correct.
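A minimal sketch of the distinction being confirmed here (standalone illustration, not IPython.parallel's actual wire code): an object exposing the buffer protocol, such as a numpy array, can be handed to a transport like ZMQ as a zero-copy view, while a generic Python object has to go through pickle, i.e. be serialized and copied in full.

```python
import pickle
import numpy as np

# A numpy array exposes its raw memory via the buffer protocol,
# so a transport can read those bytes without copying them.
arr = np.arange(1_000_000, dtype=np.float64)
view = memoryview(arr)            # zero-copy view over the array's memory
assert view.nbytes == arr.nbytes  # same 8 MB, nothing duplicated

# A generic Python object has no buffer interface; pickling it
# produces a full serialized copy on the sending side, and loading
# it produces another full copy on the receiving side.
payload = {"rows": list(range(1000))}
blob = pickle.dumps(payload)
restored = pickle.loads(blob)
assert restored == payload and restored is not payload
```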
>> Would it be faster to ETL into something like HDF5 (e.g. w/
>> Pandas/PyTables) and just synchronize the dataset URI?
> IPython.parallel is NOT the right tool to use to move large amounts of
> data around between machines. It's an important problem in
> parallel/distributed computing, but also a very challenging one that is
> beyond our scope and resources.
As I said, I didn't move anything between machines, only locally. Still, it
goes through ZMQ, and I understand that IPython is not meant to handle this
situation. Simply using a shelve (which relies on pickle) and loading the
contents in each kernel already reduced the time needed considerably.
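The shelve approach described here could look roughly like the following sketch (file path and data are made up for illustration): the client writes the large object to a shelf on disk once, and each kernel opens the shelf read-only and loads it locally instead of receiving the data over ZMQ.

```python
import os
import shelve
import tempfile

# Writer side (the client): persist the large object once.
path = os.path.join(tempfile.mkdtemp(), "shared_data")
with shelve.open(path) as db:
    db["dataset"] = {i: i * i for i in range(10_000)}

# Reader side (what each kernel would run): open read-only and
# load the contents locally, bypassing the ZMQ push entirely.
with shelve.open(path, flag="r") as db:
    dataset = db["dataset"]

assert dataset[100] == 10_000
```

This only works when client and engines share a filesystem, which matches the local-machine setup described above.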
> When using IPython.parallel, you should think of it as a good way to
> - coordinate computation
> - move code around
> - move *small* data around
> - have interactive control in parallel settings
> But you should have a non-IPython strategy for moving big chunks of data
> around. The right answer to that question will vary from one context to
> another. In some cases a simple NFS mount may be enough, elsewhere
> something like Hadoop FS or Disco FS may work, or a well-sharded database,
> or whatever.
> But it's simply a problem that we consider orthogonal to what
> IPython.parallel can do well.
> Hope this helps,
Thank you for your input.