On Wed, Dec 8, 2021, 5:55 PM Rob Cliffe via Python-ideas wrote:

> But AIUI (i.e. practically not at all) Dask is about parallel
> computing, which is not the same thing as deferred evaluation,
> though doubtless they overlap. Again AIUI, parallel computing is
> mainly useful when you have multiple cores or multiple computers.
Much of Dask is about parallelism. But Dask Delayed really isn't. I mean, yes it's a good adjunct to actual parallelism, but much of the benefit is independent.
In particular, Dask Delayed, much as in a thoroughly lazy language like Haskell, lets you express a graph of interrelated computations that you might POTENTIALLY perform.
There are many times when expressing those dependencies is useful, even before you know which, if any, of them will actually need to be performed. The site I linked has many more fleshed-out examples, but suppose I have this dataflow relationship:
A -> B -> C -> D -> E
Each of those letters names some expensive computation (or maybe expensive I/O, or both).
In a particular run of our program, we might determine that we need the data created by B. But in that particular run, we never wind up using C, D, or E. Of course, a different run, based on different conditions, might actually need E.
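To make the idea concrete, here is a minimal sketch of deferred evaluation in plain Python (this is an illustration of the concept, not Dask's actual implementation): each step records its upstream dependencies, and nothing runs until .compute() is called.

```python
class Delayed:
    """Wraps a function call; records dependencies, computes lazily."""
    def __init__(self, func, *deps):
        self.func = func
        self.deps = deps  # upstream Delayed nodes

    def compute(self):
        # Evaluate dependencies first, then this node.
        return self.func(*(d.compute() for d in self.deps))

log = []

def step(name):
    def run(*inputs):
        log.append(name)  # record that the "expensive" work actually ran
        return name
    return run

# Build the whole A -> B -> C -> D -> E graph up front...
A = Delayed(step("A"))
B = Delayed(step("B"), A)
C = Delayed(step("C"), B)
D = Delayed(step("D"), C)
E = Delayed(step("E"), D)

# ...but only ask for B: C, D, and E never run.
result = B.compute()
print(result)  # "B"
print(log)     # ["A", "B"]
```

Building the graph is cheap; only the nodes you actually ask for (and their ancestors) ever execute.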
In this simplest possible DAG, I've deliberately avoided any possible parallelism: every step depends entirely on the one before it. But delayed computation can still be useful. Of course, when the DAG has branches, operating on those branches can often be usefully parallelized (but even that's not required for laziness to remain useful).
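As a sketch of the branched case (a hypothetical graph, with made-up step functions): once B is done, C and D each depend only on B, not on each other, so they can run concurrently; E needs both.

```python
from concurrent.futures import ThreadPoolExecutor

# A hypothetical branched dataflow:
#     A -> B -> C --\
#           \        -> E
#            -> D --/
def a():      return 1
def b(x):     return x + 1
def c(x):     return x * 10
def d(x):     return x * 100
def e(x, y):  return x + y

vb = b(a())  # shared prefix runs once

# The independent branches C and D can be dispatched in parallel.
with ThreadPoolExecutor() as pool:
    fc = pool.submit(c, vb)
    fd = pool.submit(d, vb)
    result = e(fc.result(), fd.result())

print(result)  # 2*10 + 2*100 = 220
```

A scheduler like Dask's can discover that independence automatically from the graph, but the laziness itself is valuable even when everything ultimately runs serially.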