Let me try to explain why I believe that people who think they want zip_strict() actually want zip_longest().  I've already mentioned that I myself usually want what zip() does now (i.e. zip_shortest()) ... but indeed not always.

If I have two or more "sequences" there are basically two cases of that.

(1) The sequences are something like "options" where I expect a small number of them (say 5, or 50).  If that is the case, code like this is perfectly fine:

stuff1, stuff2 = map(list, (stuff1, stuff2))  # concretize iterators
if len(stuff1) == len(stuff2):
    for pair in zip(stuff1, stuff2):
        process(pair)
else:
    raise UnequalLengthError("uh oh")

(2) The sequences are either infinite or very large.  I.e. they are "data", perhaps even streaming data that only arrives over time into the iterator from some external source.  If this is the case, obviously we cannot concretize them.  So here we either use the current tool:

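from itertools import zip_longest

_sentinel = object()  # unique marker that cannot appear in real data

for pair in zip_longest(stuff1, stuff2, fillvalue=_sentinel):
    if _sentinel in pair:
        raise UnequalLengthError("uh oh")
    process(pair)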

Or alternately, we have a new function/mode that instead formulates this as:

try:
    for pair in zip_strict(stuff1, stuff2):
        process(pair)
except ZipLengthError:
    raise UnequalLengthError("uh oh")

The hypothetical new style is fine.  To me it looks slightly less good, but the difference is minimal.  It almost feels like the proponents of the new mode/function are hoping to avoid the processing that might need to be "rolled back" in some manner if there is a synchronization problem.  But that simply is not an option.  If we have a billion events, or indefinitely many events that arrive over time, we simply cannot know before we get to the end that synchronization messed up.  I mean, sure, if some characteristic of the intermediate data can indicate the mismatch, that's great... but it's not affected by which style is used, it's a separate test.
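To make the point about partial processing concrete, here is a minimal sketch. The names `zip_strict_sketch` and `ZipLengthError` are hypothetical (taken from the discussion, not from any existing API), and the sketch simply wraps `itertools.zip_longest` with a sentinel. Whichever style is used, the pairs before the mismatch have already been handled by the time the error surfaces.

```python
from itertools import zip_longest


class ZipLengthError(ValueError):
    """Hypothetical exception for this sketch; not a real stdlib class."""


_sentinel = object()  # unique marker that cannot appear in real data


def zip_strict_sketch(*iterables):
    """Yield tuples like zip(), but raise if the inputs differ in length."""
    for items in zip_longest(*iterables, fillvalue=_sentinel):
        if any(item is _sentinel for item in items):
            raise ZipLengthError("iterables have different lengths")
        yield items


processed = []
try:
    for pair in zip_strict_sketch(iter([1, 2, 3]), iter("ab")):
        processed.append(pair)  # stands in for process(pair)
except ZipLengthError:
    pass

# `processed` now holds every pair seen before the mismatch was detected;
# nothing in either style lets us avoid having done that work.
```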

Approach (1) is nice where available because it avoids processing altogether.  But it is only possible for "small data" (and "ready data") no matter what.



On Sun, Apr 26, 2020 at 12:34 PM Christopher Barker <pythonchb@gmail.com> wrote:
On Sat, Apr 25, 2020 at 10:50 AM Kirill Balunov <kirillbalunov@gmail.com> wrote:
  ...the mode switching approach in the current situation looks reasonable, because the question is how boundary conditions should be treated. I still prefer three cases switch like `zip(..., mode=('equal' | 'shortest' | 'longest'))`

I like this -- it certainly makes a lot more sense than having zip(), zip(...,strict=True), and zip_longest()

So I'm proposing that we have three options on the table:

zip(..., mode=('equal' | 'shortest' | 'longest'))

or

zip()
zip_longest()
zip_equal()

or, of course, keep it as it is.





... but also ok with `strict=True` variant.

Chris Angelico wrote:
Separate functions mean you can easily and simply make a per-module decision:

from itertools import zip_strict as zip

Tada! Now this module treats zip as strict mode.

This is a nifty feature of multiple functions in general, but I'm having a really hard time coming up with a use case for these particular functions: you're using zip() multiple times in one module, and you want them all to be the same "version", but you want to be able to easily change that version on a module-wide basis?

As for the string methods examples: one big difference is that the string methods are all in the same namespace. This is different because zip() is a built in, but zip_longest() and zip_equal() would not be. I don't think anyone is suggesting adding both of those to builtins. So adding a mode switch is the only way to "unify" them -- maybe not a huge deal, but I think a big enough deal that zip_equal wouldn't see much use.

>and changing map and friends to iterators is a big part of why you can write all kinds of things naturally in Python 3.9 that were clumsy, complicated, or even impossible.

Sure, and I think we're all happy about that, but I also find that we've lost a bit of the nice "sequence-oriented" behavior too. Not sure that's relevant to this discussion though. But it is in one way: back in the 1.5 days, you would always use zip() on sequences, so checking their length was trivial, if you really wanted to do that -- but while checking that your iterators are in fact the same length is possible, it's pretty klunky, and certainly not newbie-friendly.

I've lost track of who said it, but I think someone proposed that most people really want zip_longest most often. (sorry if I'm misinterpreting that). I think this is kinda true, in the sense that if you only had one, then zip_longest would be able to cover the most use-cases (you can build a zip_shortest out of zip_longest, but not the other way around) but particularly as zip_longest() is going to fail with infinite iterators, it can't be the only option after all.
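As a rough illustration of building a zip_shortest out of zip_longest, here is a minimal sketch. The name `zip_shortest_sketch` is made up for this example. It stops as soon as any input has run out, which is detected by a private fill value. One caveat: it may pull one more item from the longer iterators than the builtin zip() would.

```python
from itertools import count, zip_longest

_missing = object()  # private fill value, never equal to real data


def zip_shortest_sketch(*iterables):
    """Behave like the builtin zip(), but implemented on top of zip_longest()."""
    for items in zip_longest(*iterables, fillvalue=_missing):
        if any(item is _missing for item in items):
            return  # at least one input is exhausted: stop, as zip() does
        yield items


# Because the generator is lazy, pairing a finite input with an
# infinite one (count()) still terminates.
pairs = list(zip_shortest_sketch("abc", count()))
```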

One other comment about the modes vs multiple functions:

It makes a difference with implementation -- with multiple functions, you have to re-implement the core functionality three times (a DRY violation) -- or have a hidden common function underneath -- that seems like a code smell to me.
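For what it's worth, a single core with a mode switch might look roughly like the sketch below. This is pure Python for illustration only; the name `zip_mode`, its `mode` and `fillvalue` parameters, and the use of ValueError are all assumptions, not a proposed or existing signature.

```python
from itertools import zip_longest

_missing = object()  # private fill value used to detect exhausted inputs


def zip_mode(*iterables, mode="shortest", fillvalue=None):
    """One loop serving all three boundary behaviours (hypothetical API)."""
    if mode not in ("equal", "shortest", "longest"):
        raise ValueError(f"unknown mode: {mode!r}")
    for items in zip_longest(*iterables, fillvalue=_missing):
        if any(item is _missing for item in items):
            if mode == "equal":
                raise ValueError("iterables have different lengths")
            if mode == "shortest":
                return
            # mode == "longest": substitute the caller's fill value
            items = tuple(fillvalue if item is _missing else item
                          for item in items)
        yield items


shortest = list(zip_mode("ab", [1, 2, 3], mode="shortest"))
longest = list(zip_mode("ab", [1, 2, 3], mode="longest"))
```

Since this is a generator, the mode check (like the length check) only fires once iteration starts.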


-CHB

--
Christopher Barker, PhD

Python Language Consulting
  - Teaching
  - Scientific Software Development
  - Desktop GUI and Web Development
  - wxPython, numpy, scipy, Cython


--
Keeping medicines from the bloodstreams of the sick; food
from the bellies of the hungry; books from the hands of the
uneducated; technology from the underdeveloped; and putting
advocates of freedom in prisons.  Intellectual property is
to the 21st century what the slave trade was to the 16th.