On Wed, Dec 8, 2021, 5:55 PM Rob Cliffe via Python-ideas wrote:
But AIUI (i.e. practically not at all) Dask is about
parallel computing, which is not the same thing as
deferred evaluation, though doubtless they overlap.
Again AIUI, parallel computing is mainly useful when you
have multiple cores or multiple computers.
Much of Dask is about parallelism. But Dask
Delayed really isn't. I mean, yes it's a good adjunct to
actual parallelism, but much of the benefit is independent.
In particular, in Dask Delayed (much as in a
thoroughly lazy language like Haskell) you can express a graph
of interrelated computations that you might POTENTIALLY
perform.
There are many times when expressing those
dependencies is useful, even before you know which, if any, of
them will actually need to be performed. The site I linked has
many more fleshed-out examples, but suppose I have this
dataflow relationship:
A -> B -> C -> D -> E
Each of those letters names some expensive
computation (or maybe expensive I/O, or both).
In a particular run of our program, we might
determine that we need the data created by B. But in that
particular run, we never wind up using C, D or E. Of course, a
different run, based on different conditions, will actually
need E.
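To make that concrete, here is a toy sketch of deferred evaluation in plain Python. The `Lazy` class is hypothetical (it is not Dask's actual API, just an illustration of the idea): each node records its function and inputs, and nothing runs until `.compute()` is called, so asking for B never touches C, D, or E.

```python
ran = []  # track which expensive steps actually executed

class Lazy:
    """Hypothetical deferred-computation node (toy sketch, not Dask)."""
    def __init__(self, name, func, *deps):
        self.name, self.func, self.deps = name, func, deps

    def compute(self):
        # Recursively evaluate only this node's own dependencies.
        args = [d.compute() for d in self.deps]
        ran.append(self.name)
        return self.func(*args)

# The linear DAG A -> B -> C -> D -> E from above, with stand-in math
# in place of genuinely expensive work.
A = Lazy("A", lambda: 1)
B = Lazy("B", lambda a: a + 1, A)
C = Lazy("C", lambda b: b * 2, B)
D = Lazy("D", lambda c: c - 3, C)
E = Lazy("E", lambda d: d ** 2, D)

# This particular run only needs B; C, D, and E never execute.
print(B.compute())   # 2
print(ran)           # ['A', 'B']
```

A different run that asked for `E.compute()` would walk the whole chain; the graph itself is the same either way, which is the point.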
In this simplest possible DAG, I've deliberately
avoided any possible parallelism: every step depends entirely
on the one before it. But delayed computation can still be
useful. Of course, when the DAG has branches, operating on
those branches can often be usefully parallelized (but that's
still not required for laziness to remain useful).
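For completeness, here is what that branching looks like. This is a hypothetical branched DAG (A feeds both B and C, which D combines), sketched with the standard library's `ThreadPoolExecutor` rather than Dask itself: once A is done, B and C have no dependency on each other, so they can run concurrently.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical branched DAG:  A -> B, A -> C, then (B, C) -> D.
def a():       return 10
def b(x):      return x + 1
def c(x):      return x * 2
def d(x, y):   return x + y

with ThreadPoolExecutor() as pool:
    av = a()
    fb = pool.submit(b, av)   # B and C are independent, so they
    fc = pool.submit(c, av)   # can execute in parallel here
    result = d(fb.result(), fc.result())

print(result)  # 31
```

Dask Delayed builds and schedules graphs like this for you, but as above, the laziness (deciding late which branches to evaluate at all) is valuable even on a single core.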