[Numpy-discussion] Fast Access to Container of Numpy Arrays on Disk?

Stephan Hoyer shoyer at gmail.com
Thu Jan 14 18:37:07 EST 2016


On Thu, Jan 14, 2016 at 2:30 PM, Nathaniel Smith <njs at pobox.com> wrote:

> The reason I didn't suggest dask is that I had the impression that
> dask's model is better suited to bulk/streaming computations with
> vectorized semantics ("do the same thing to lots of data" kinds of
> problems, basically), whereas it sounded like the OP's algorithm
> needed lots of one-off unpredictable random access.
>
> Obviously even if this is true then it's useful to point out both
> because the OP's problem might turn out to be a better fit for dask's
> model than they indicated -- the post is somewhat vague :-).
>
> But, I just wanted to check, is the above a good characterization of
> dask's strengths/applicability?
>

Yes, dask is definitely designed around setting up a large streaming
computation and then executing it all at once.
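To make the "set up first, execute later" model concrete, here is a minimal sketch of a task graph written as a plain dict. Each key maps either to a literal value or to a tuple (callable, *args), whose args may name other keys; the example graph is written in that dict style. The `get` function below is a hypothetical toy evaluator, not dask's scheduler: it runs serially and only resolves string keys.

```python
from operator import add


def inc(x):
    return x + 1


# Each key maps to a literal value or to a task tuple (callable, *args).
# String args that name other keys are dependencies.
graph = {
    "x": 1,
    "y": (inc, "x"),
    "z": (add, "y", 10),
}


def get(dsk, key, cache=None):
    """Hypothetical toy evaluator: recursively compute `key`, memoizing results.

    Illustrates building the whole graph before running anything; it has no
    parallelism and ignores nested tasks or lists of keys.
    """
    if cache is None:
        cache = {}
    if key in cache:
        return cache[key]
    task = dsk[key]
    if isinstance(task, tuple) and task and callable(task[0]):
        func, *args = task
        # Replace any argument that names another key with its computed value.
        resolved = [
            get(dsk, a, cache) if isinstance(a, str) and a in dsk else a
            for a in args
        ]
        result = func(*resolved)
    else:
        # A literal value: nothing to run.
        result = task
    cache[key] = result
    return result


print(get(graph, "z"))  # 12
```

Nothing runs while `graph` is being built. All the work happens in the single `get` call at the end, which is the bulk execution pattern described above.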

But it is pretty flexible in terms of what those specific computations are,
and can also work for non-vectorized computation (especially via dask
imperative). It's worth taking a look at dask's collections for a sense of
what it can do here. The recently refreshed docs provide a nice overview:
http://dask.pydata.org/
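The imperative style mentioned above lets you write ordinary function calls, including irregular per-item logic, and have them recorded as a graph instead of executed immediately. In dask this was `dask.imperative`, which was later renamed `dask.delayed`. The sketch below is a hypothetical, dependency-free imitation of that idea: `lazy`, `Lazy`, `load`, `summarize` and `combine` are illustrative names, not dask's API, and `load` only stands in for reading an array from disk.

```python
class Lazy:
    """Hypothetical deferred call: records func and args, runs only on compute()."""

    def __init__(self, func, args):
        self.func = func
        self.args = args

    def compute(self, cache=None):
        # Memoize per node so shared dependencies run once per compute() call.
        if cache is None:
            cache = {}
        if self not in cache:
            values = [a.compute(cache) if isinstance(a, Lazy) else a for a in self.args]
            cache[self] = self.func(*values)
        return cache[self]


def lazy(func):
    """Hypothetical decorator: calling the wrapped function builds a Lazy node."""
    def wrapper(*args):
        return Lazy(func, args)
    return wrapper


calls = []  # records which loads actually ran


@lazy
def load(i):
    # Stand-in for reading the i-th array from disk (hypothetical).
    calls.append(i)
    return list(range(i))


@lazy
def summarize(values):
    # Non-vectorized, data-dependent branching per item.
    return sum(values) if len(values) % 2 else max(values, default=0)


@lazy
def combine(*parts):
    return sum(parts)


parts = [summarize(load(i)) for i in range(5)]
total = combine(*parts)

print(calls)            # [] (nothing has run yet)
print(total.compute())  # 7
print(calls)            # [0, 1, 2, 3, 4]
```

The per-item branching in `summarize` is not a "same operation on every element" computation. It still fits the build-then-execute model, because the whole graph is known before `compute()` runs it.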

Cheers,
Stephan