On Sun, Apr 26, 2020 at 9:21 PM David Mertz <mertz@gnosis.cx> wrote:
On Sun, Apr 26, 2020 at 11:56 PM Christopher Barker <pythonchb@gmail.com> wrote:
If I have two or more "sequences", there are basically two cases of that.
So you need to write different code, depending on which case? That seems not very "there's only one way to do it" to me.
This difference is built into the problem itself. There CANNOT be only one way to do these fundamentally different things.
Isn't there? There are many cases where you CANNOT (or don't want to, for performance reasons) "consume" the entirety of the input iterators, and many cases where it would be fine to do that. But are there many (any?) cases where you couldn't use the "sentinel approach"?

To me, a zip_equal that iterates through the inputs on demand, and checks when one is exhausted rather than pre-determining the lengths ahead of time, would solve almost all (or all? I can't think of an example where it wouldn't) use cases. And it is completely consistent with all the other things that are iterators in py3 that were sequences in py2: zip, map, dict.items() (and friends), and so on. There is a pretty consistent philosophy in py3 that anything that can be an iterator and be lazily evaluated is done that way, and for the times when you need an actual sequence, you can wrap list() around it. So I see no downside to having a zip_equal that doesn't pre-compute the lengths, when it could.
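To make the lazy version concrete, here is a minimal sketch of such a zip_equal: it pulls one item from each iterator on demand and raises as soon as the lengths diverge, never calling len() on anything. The name zip_equal and the ValueError message are illustrative choices only; no such function exists in itertools today.

```python
def zip_equal(*iterables):
    """Like zip(), but raise ValueError if the inputs have unequal lengths."""
    sentinel = object()  # unique marker: distinguishes "exhausted" from any real item
    iterators = [iter(it) for it in iterables]
    while True:
        items = [next(it, sentinel) for it in iterators]
        if all(item is sentinel for item in items):
            return  # every input exhausted at the same time: lengths matched
        if any(item is sentinel for item in items):
            raise ValueError("zip_equal: input iterables have unequal lengths")
        yield tuple(items)
```

Note that on a mismatch this leaves the longer inputs partially consumed, with one extra item pulled from each survivor; that is exactly the "undetermined state" caveat raised later in the thread.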
With iterators, there is at heart a difference between "sequences that one can (reasonably) concretize" and "sequences that must be lazy." And that difference means that for some versions of a seemingly similar problem it is possible to ask len() before looping through them while for others that is not possible (and hence we may have done some work that we want to "roll-back" in some sense).
Sure: but that is a distinction that, as far as I know, is never made in the standard library in all the "iterator related" code. There are some things that require proper sequences, but as far as I know, nothing that expects a "concretizable" iterator -- and frankly, I don't think there is a clear definition of that anyway: some things clearly aren't, but for others it would depend on how big they are, the memory available on the machine, etc. In fact, the reason we have so many iterator-related tools is exactly so that programmers DON'T have to make that decision.

Can you think of a single case where a zip_equal() (either pre-existing or roll-your-own) would not work, but the concretizing version would?

There is one "downside" to this, in that it potentially leaves the iterators passed in in an undetermined state -- partially exhausted, and with a longer one having had one more item consumed than was used. But that exists with "zip_shortest" behavior anyway. So it would be only a minor reason to prefer the concretizing approach -- at least then you'd know your iterators were fully exhausted.

SIDE NOTE: this is reminding me that there have been calls in the past for an optional __len__ protocol for iterators that are not proper sequences but DO know their length -- maybe one more place to use that, if it existed.
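For contrast, a sketch of the "concretizing" style under discussion, assuming every input supports len() (the name zip_equal_eager is illustrative): mismatches are caught up front, before a single item is consumed, which is exactly what a lazy iterator cannot offer.

```python
def zip_equal_eager(*sequences):
    """Zip sequences, raising up front if their lengths differ.

    len() requires real sequences (or anything with __len__); a plain
    generator passed here raises TypeError -- which is the limitation
    the lazy approach avoids.
    """
    lengths = {len(seq) for seq in sequences}
    if len(lengths) > 1:
        raise ValueError(f"unequal lengths: {sorted(lengths)}")
    return zip(*sequences)
```

Incidentally, the stdlib already exposes the optional-length idea mentioned in the side note: operator.length_hint() consults the __length_hint__ protocol, though a hint is not guaranteed to be exact, so it couldn't be relied on for a strict equal-length check.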
However, the mismatched length feels like such a small concern among all the things that can go wrong.
Agreed -- but I think everyone agrees -- this is not a huge deal (or it would have been done years ago), but it's a nice convenience, and minimally disruptive.
Sure. That's fine. I'm +0 or even higher on adding itertools.zip_strict(). My taste prefers the other style I showed, but as I say, this version is perfectly fine.
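For reference, the zip_longest-and-check-the-sentinel style being discussed would look roughly like this -- a sketch, not David's actual code; the names are illustrative:

```python
import itertools

_SENTINEL = object()  # fill value that cannot collide with any real item

def zip_equal_via_longest(*iterables):
    """Equal-length zip built on itertools.zip_longest plus a sentinel."""
    for combo in itertools.zip_longest(*iterables, fillvalue=_SENTINEL):
        # If any slot was filled in, one input ran out before the others.
        if any(item is _SENTINEL for item in combo):
            raise ValueError("input iterables have unequal lengths")
        yield combo
```

The identity check (`is _SENTINEL`) rather than equality matters here: elements with a permissive __eq__ could otherwise trigger a false mismatch.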
If there were a zip_equal() in itertools, would you ever write the code to use zip_longest and check the sentinel? For my part, I wouldn't, and indeed once I had a second need for it, I'd write zip_equal for my own toolbox anyway :-)

-CHB

--
Christopher Barker, PhD

Python Language Consulting
  - Teaching
  - Scientific Software Development
  - Desktop GUI and Web Development
  - wxPython, numpy, scipy, Cython