On Thu, Jan 14, 2016 at 2:30 PM, Nathaniel Smith <njs@pobox.com> wrote:
> The reason I didn't suggest dask is that I had the impression that
> dask's model is better suited to bulk/streaming computations with
> vectorized semantics ("do the same thing to lots of data" kinds of
> problems, basically), whereas it sounded like the OP's algorithm
> needed lots of one-off, unpredictable random access.
>
> Obviously, even if this is true, it's useful to point out both,
> because the OP's problem might turn out to be a better fit for dask's
> model than they indicated -- the post is somewhat vague :-).
>
> But I just wanted to check: is the above a good characterization of
> dask's strengths/applicability?

Yes, dask is definitely designed around setting up a large streaming computation and then executing it all at once.
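For instance, here's a minimal sketch of that model using dask.array (the array sizes and operations are just illustrative):

    import dask.array as da

    # A large random array split into 1000x1000 chunks. This only
    # records a task graph; no data is generated yet.
    x = da.random.random((10000, 10000), chunks=(1000, 1000))
    result = (x + x.T).mean()

    # The whole graph executes at once, chunk by chunk and in parallel.
    print(result.compute())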

But it is pretty flexible about what those specific computations are, and it can also handle non-vectorized computation (especially via dask.imperative). It's worth taking a look at dask's collections to get a sense of what it can do here. The recently refreshed docs provide a nice overview:
http://dask.pydata.org/
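For the non-vectorized case, a rough sketch of the imperative style looks like this (dask.imperative has since been renamed to dask.delayed, which is the spelling used below; inc and add are just hypothetical stand-ins for your own functions):

    from dask import delayed

    @delayed
    def inc(x):
        return x + 1

    @delayed
    def add(x, y):
        return x + y

    # Calling the wrapped functions builds a task graph instead of
    # running anything; a, b, and c are lazy placeholders.
    a = inc(1)
    b = inc(2)
    c = add(a, b)

    # Execute the whole graph at once.
    print(c.compute())  # 5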

Cheers,
Stephan