There are dozens of concrete examples at the link I gave, and hundreds more you can find easily by searching on Dask Delayed. This feels more like trying to believe the contrary than seeking understanding.

Here's a concrete example that I wrote last summer. I wanted to write a similar program in a bunch of programming languages to learn those languages. From long ago, I had a Python implementation (which I improved quite a lot through the exercise, as well).

https://github.com/DavidMertz/LanguagePractice

What the programs do is identify any duplicate files in a filesystem tree (i.e. perhaps among millions of files, often with different names but the same content).

The basic idea is that a hash like SHA1 serves as a fingerprint of contents. However, the main speedup potential is in NOT computing the hash when files are either hard links or soft links to the same underlying inode. I/O nowadays is more of a hit than CPU cycles, but the concept applies either way.
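
As a rough illustration of that idea (not the code in the repository above), here is a minimal standard-library sketch. The names find_duplicates and sha1_of are hypothetical, and it assumes a POSIX-style filesystem where (st_dev, st_ino) identifies the underlying file. It also skips hashing for files whose size is unique, which is an extra assumption on top of the inode check described above.

```python
import hashlib
import os
import stat
from collections import defaultdict


def sha1_of(path, chunk_size=1 << 20):
    """Hash a file's contents in chunks so large files are not read at once."""
    h = hashlib.sha1()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def find_duplicates(root):
    """Return sorted lists of paths under `root` that share the same content."""
    # Group by size first: files of different sizes cannot be duplicates.
    by_size = defaultdict(list)
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            path = os.path.join(dirpath, name)
            try:
                st = os.stat(path)  # follows soft links to the target inode
            except OSError:
                continue  # broken link or unreadable entry
            if stat.S_ISREG(st.st_mode):
                by_size[st.st_size].append((path, (st.st_dev, st.st_ino)))

    duplicates = []
    for entries in by_size.values():
        if len(entries) < 2:
            continue  # unique size: the hash is never computed
        digest_of_inode = {}  # hash each underlying inode at most once
        by_digest = defaultdict(list)
        for path, inode in entries:
            if inode not in digest_of_inode:
                digest_of_inode[inode] = sha1_of(path)
            by_digest[digest_of_inode[inode]].append(path)
        duplicates.extend(
            sorted(paths) for paths in by_digest.values() if len(paths) > 1
        )
    return duplicates
```

Calling find_duplicates(".") on a directory tree returns one list per group of identical files. Hard links and soft links to the same inode land in the same group without being read a second time.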

Essentially the same technique is used in all the languages. But in the Haskell case, it is NECESSARY to express this as deferred computation. I don't want Python to be like Haskell, which was in most ways the most difficult to work with.

However, it would be interesting and expressive to write a Python version based around Dask Delayed... Or around a generalized "deferred" construct in Python 3.13, maybe. I'm pretty sure it could be shorter and more readable thereby.

On Wed, Dec 8, 2021, 6:28 PM Rob Cliffe via Python-ideas <python-ideas@python.org> wrote:


On 08/12/2021 23:09, David Mertz, Ph.D. wrote:
On Wed, Dec 8, 2021, 5:55 PM Rob Cliffe via Python-ideas wrote:
But AIUI (i.e. practically not at all) Dask is about parallel computing, which is not the same thing as deferred evaluation, though doubtless they overlap.  Again AIUI, parallel computing is mainly useful when you have multiple cores or multiple computers.

Much of Dask is about parallelism. But Dask Delayed really isn't. I mean, yes it's a good adjunct to actual parallelism, but much of the benefit is independent.

In particular, in Dask Delayed, much as in a thoroughly lazy language like Haskell, you can express a graph of interrelated computations that you might POTENTIALLY perform.

There are many times when expressing those dependencies is useful, even before you know which, if any, of them will actually need to be performed. The site I linked has many more fleshed-out examples, but suppose I have this dataflow relationship:

  A -> B -> C -> D -> E

Each of those letters names some expensive computation (or maybe expensive I/O, or both).

In a particular run of our program, we might determine that we need the data created by B. But in that particular run, we never wind up using C, D or E. Of course, a different run, based on different conditions, will actually need E.

In this simplest possible DAG, I've deliberately avoided any possible parallelism. Every step entirely depends on the one before it. But delayed computation can still be useful. Of course, when the DAG has branches, operating on those branches can often be usefully parallelized (but that's still not required for laziness to remain useful).
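
The A -> B -> C -> D -> E chain can be sketched in plain Python without any library. This is only an illustration of the concept, not Dask's API: the Deferred class, the step helper, and the ran list are all hypothetical names made up for the example.

```python
class Deferred:
    """A computation that runs at most once, and only when asked for."""

    _UNSET = object()  # sentinel, so that None remains a valid result

    def __init__(self, func, *deps):
        self.func = func
        self.deps = deps  # other Deferred objects this one depends on
        self._value = Deferred._UNSET

    def compute(self):
        if self._value is Deferred._UNSET:
            # Pull in dependencies first; each one caches its own result.
            args = [dep.compute() for dep in self.deps]
            self._value = self.func(*args)
        return self._value


ran = []  # records which steps actually executed, in order


def step(name):
    """Build a stand-in for an expensive computation called `name`."""
    def run(*inputs):
        ran.append(name)
        return name if not inputs else f"{inputs[0]}->{name}"
    return run


# Declaring the whole graph costs nothing; no step has run yet.
a = Deferred(step("A"))
b = Deferred(step("B"), a)
c = Deferred(step("C"), b)
d = Deferred(step("D"), c)
e = Deferred(step("E"), d)

result_b = b.compute()  # runs A and B only; C, D and E are never touched
```

After b.compute(), the ran list holds only "A" and "B". A different run that later calls e.compute() would execute C, D and E while reusing the cached result of B, and nothing here involves parallelism.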

This is all abstract.  You give no clue to what your application is or what it is meant to do.  Please, may I refer you to my previous post:

    "Can anyone give examples (in Python pseudo-code perhaps) showing how *deferred evaluation* would be useful for a concrete task?  (Solving an equation.  Drawing a graph.  Analysing a document.  Manufacturing a widget.  Planning a journey.  Firing a missile.  Anything!  You name it.)"

David?  Anybody??

Best wishes
Rob Cliffe
_______________________________________________
Python-ideas mailing list -- python-ideas@python.org
Message archived at https://mail.python.org/archives/list/python-ideas@python.org/message/BKLACJG4ELMSWP73T76IHN2J4RTK66W4/