On Thu, Dec 9, 2021 at 11:47 AM David Mertz, Ph.D. <david.mertz@gmail.com> wrote:
There are dozens of concrete examples at the link I gave, and hundreds more you can find easily by searching for Dask Delayed. This feels more like trying to believe the contrary than like seeking understanding.
Here's a concrete example that I wrote last summer. I wanted to write similar programs in a bunch of programming languages in order to learn those languages. I already had a Python implementation from long ago (which I also improved quite a lot through the exercise).
https://github.com/DavidMertz/LanguagePractice
What the programs do is identify duplicate files in a filesystem tree (e.g. among millions of files, often with different names but the same content).
The basic idea is that a hash like SHA1 serves as a fingerprint of the contents. However, the main speedup potential is in NOT computing the hash when files are hardlinks or symlinks to the same underlying inode. Nowadays I/O is more of a hit than CPU cycles, but the concept applies either way.
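A minimal sketch of the inode-caching idea in plain Python (not the code from the repo, just an illustration): cache digests by (device, inode) so that hardlinked or symlinked copies never trigger a second read and hash.

    import hashlib
    import os
    from collections import defaultdict

    def find_duplicates(root):
        by_inode = {}                   # (st_dev, st_ino) -> SHA1 hex digest
        by_digest = defaultdict(list)   # SHA1 hex digest -> list of paths
        for dirpath, _dirs, files in os.walk(root):
            for name in files:
                path = os.path.join(dirpath, name)
                st = os.stat(path)      # follows symlinks to the underlying inode
                key = (st.st_dev, st.st_ino)
                if key not in by_inode:     # hash each underlying file only once
                    with open(path, 'rb') as fh:
                        by_inode[key] = hashlib.sha1(fh.read()).hexdigest()
                by_digest[by_inode[key]].append(path)
        return {d: paths for d, paths in by_digest.items() if len(paths) > 1}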
Essentially the same technique is used in all the languages. But in the Haskell case, it is NECESSARY to express this as deferred computation. I don't want Python to be like Haskell, which was in most ways the most difficult to work with.
However, it would be interesting and expressive to write a Python version based around Dask Delayed... Or around a generalized "deferred" construct in Python 3.13, maybe. I'm pretty sure it could be shorter and more readable thereby.
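As a rough sketch of what I have in mind (hedged: this assumes dask is installed, and it is not code from the repo), the hashing could be declared lazily with dask.delayed, with paths on the same inode sharing a single delayed node, so each hash runs at most once when the graph is finally computed:

    import hashlib
    import os

    import dask
    from dask import delayed

    @delayed
    def sha1_of(path):
        with open(path, 'rb') as fh:
            return hashlib.sha1(fh.read()).hexdigest()

    def delayed_digests(paths):
        by_inode = {}   # (st_dev, st_ino) -> shared Delayed node
        tasks = {}
        for path in paths:
            st = os.stat(path)
            key = (st.st_dev, st.st_ino)
            # Paths on the same inode reuse one node, so that hash runs at most once.
            tasks[path] = by_inode.setdefault(key, sha1_of(path))
        # Nothing has been opened or hashed yet; the graph is evaluated here.
        return dask.compute(tasks)[0]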
The basic and obvious way to write that is a simple dictionary lookup. It's not particularly hard to recognize inode numbers without a deferred/delayed construct. And this is still arguing for the benefit of deferreds in the wider language, with no indication of how they would be better for default arguments. This is a MASSIVE amount of overhead for simple cases of "x=>[]" or similar.
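For context, "x=>[]" here is the late-bound default syntax from PEP 671, where the default expression is evaluated fresh on each call. Roughly, it is meant as a shorter spelling of the current None-sentinel idiom (the "=>" form is proposed syntax, not valid Python today):

    # Proposed PEP 671 spelling (NOT valid in current Python):
    #
    #     def append_to(item, target=>[]):
    #         target.append(item)
    #         return target
    #
    # Roughly equivalent to the existing None-sentinel idiom:
    def append_to(item, target=None):
        if target is None:
            target = []     # a fresh list on every call, not shared between calls
        target.append(item)
        return target

ChrisA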