On Sun, Apr 26, 2020 at 9:21 PM David Mertz <mertz@gnosis.cx> wrote:
On Sun, Apr 26, 2020 at 11:56 PM Christopher Barker <pythonchb@gmail.com> wrote:
If I have two or more "sequences", there are basically two cases of that.
So you need to write different code, depending on which case? That seems not very "there's only one way to do it" to me.
This difference is built into the problem itself. There CANNOT be only one way to do these fundamentally different things.
Isn't there? There are many cases where you CANNOT (or don't want to, for performance reasons) "consume" the entirety of the input iterators, and many cases where it would be fine to do that. But are there many (any?) cases where you couldn't use the "sentinel approach"?

To me, a zip_equal that iterates through the inputs on demand, and checks when one is exhausted rather than pre-determining the lengths ahead of time, would solve almost all (or all? I can't think of an example where it wouldn't) use cases. And it is completely consistent with all the other things that are iterators in py3 that were sequences in py2: zip, map, dict.items() (and friends), and so on. There is a pretty consistent philosophy in py3 that anything that can be an iterator and be lazily evaluated is done that way, and for the times when you need an actual sequence, you can wrap list() around it. So I see no downside to having a zip_equal that doesn't pre-compute the lengths, when it could.
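To make the lazy version concrete, here is a minimal sketch of such a zip_equal: it pulls one item from each iterator on demand and raises as soon as the lengths diverge, never calling len() on anything. The name zip_equal and the ValueError message are illustrative choices only; no such function exists in itertools today.

```python
def zip_equal(*iterables):
    """Like zip(), but raise ValueError if the inputs have unequal lengths."""
    sentinel = object()  # unique marker: distinguishes "exhausted" from any real item
    iterators = [iter(it) for it in iterables]
    while True:
        items = [next(it, sentinel) for it in iterators]
        if all(item is sentinel for item in items):
            return  # every input exhausted at the same time: lengths matched
        if any(item is sentinel for item in items):
            raise ValueError("zip_equal: input iterables have unequal lengths")
        yield tuple(items)
```

Note that on a mismatch this leaves the longer inputs partially consumed, with one extra item pulled from each survivor; that is exactly the "undetermined state" caveat raised later in the thread.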
With iterators, there is at heart a difference between "sequences that one can (reasonably) concretize" and "sequences that must be lazy." And that difference means that for some versions of a seemingly similar problem it is possible to ask len() before looping through them while for others that is not possible (and hence we may have done some work that we want to "roll-back" in some sense).
Sure: but that is a distinction that, as far as I know, is never made in the standard library in all the "iterator related" code. There are some things that require proper sequences, but as far as I know, nothing that expects a "concretizable" iterator -- and frankly, I don't think there is a clear definition of that anyway: some things clearly aren't, but for others it would depend on how big they are, the memory available on the machine, etc. In fact, the reason we have so many iterator-related tools is exactly so that programmers DON'T have to make that decision.

Can you think of a single case where a zip_equal() (either pre-existing or roll-your-own) would not work, but the concretizing version would?

There is one "downside" to this, in that it potentially leaves the iterators passed in in an undetermined state -- partially exhausted, and with a longer one having had one more item consumed than was used. But that exists with "zip_shortest" behavior anyway. So it would be only a minor reason to prefer the concretizing approach -- at least then you'd know your iterators were fully exhausted.

SIDE NOTE: this is reminding me that there have been calls in the past for an optional __len__ protocol for iterators that are not proper sequences but DO know their length -- maybe one more place to use that, if it existed.
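For contrast, a sketch of the "concretizing" style under discussion, assuming every input supports len() (the name zip_equal_eager is illustrative): mismatches are caught up front, before a single item is consumed, which is exactly what a lazy iterator cannot offer.

```python
def zip_equal_eager(*sequences):
    """Zip sequences, raising up front if their lengths differ.

    len() requires real sequences (or anything with __len__); a plain
    generator passed here raises TypeError -- which is the limitation
    the lazy approach avoids.
    """
    lengths = {len(seq) for seq in sequences}
    if len(lengths) > 1:
        raise ValueError(f"unequal lengths: {sorted(lengths)}")
    return zip(*sequences)
```

Incidentally, the stdlib already exposes the optional-length idea mentioned in the side note: operator.length_hint() consults the __length_hint__ protocol, though a hint is not guaranteed to be exact, so it couldn't be relied on for a strict equal-length check.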
However, the mismatched length feels like such a small concern among all the things that can go wrong.
Agreed -- but I think everyone agrees -- this is not a huge deal (or it would have been done years ago), but it's a nice convenience, and minimally disruptive.
Sure. That's fine. I'm +0 or even higher on adding itertools.zip_strict(). My taste prefers the other style I showed, but as I say, this version is perfectly fine.
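For reference, the zip_longest-and-check-the-sentinel style being discussed would look roughly like this -- a sketch, not David's actual code; the names are illustrative:

```python
import itertools

_SENTINEL = object()  # fill value that cannot collide with any real item

def zip_equal_via_longest(*iterables):
    """Equal-length zip built on itertools.zip_longest plus a sentinel."""
    for combo in itertools.zip_longest(*iterables, fillvalue=_SENTINEL):
        # If any slot was filled in, one input ran out before the others.
        if any(item is _SENTINEL for item in combo):
            raise ValueError("input iterables have unequal lengths")
        yield combo
```

The identity check (`is _SENTINEL`) rather than equality matters here: elements with a permissive __eq__ could otherwise trigger a false mismatch.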
If there were a zip_equal() in itertools, would you ever write the code to use zip_longest and check the sentinel? For my part, I wouldn't, and indeed once I had a second need for it, I'd write zip_equal for my own toolbox anyway :-)

-CHB

--
Christopher Barker, PhD

Python Language Consulting
  - Teaching
  - Scientific Software Development
  - Desktop GUI and Web Development
  - wxPython, numpy, scipy, Cython