I have no objection to adding a zip_strict() or zip_exact() to itertools. I am used to the current behavior, and am apparently in the minority in not usually assuming iterators of common length. Call it +0 on a new function.

But I'm definitely -1 on adding a mode switch to the built-in. That is not how things are usually done in Python. zip_longest() is a clear example, but so is the recent cut_suffix (or whatever final spelling was chosen). Some folks wanted a mode switch on .rstrip(), and that was appropriately rejected.

If zip_strict() is genuinely what you want to do, an import from stdlib is not much effort to get it. My belief is that usually people who think they want this actually want zip_longest(), but that's up to them.
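
As a rough sketch of what such a function could look like (the name zip_strict, the ValueError, and the sentinel approach are all assumptions for illustration, not an agreed design), it can be built on zip_longest():

```python
from itertools import zip_longest

_MISSING = object()  # private sentinel; it can never appear in caller data


def zip_strict(*iterables):
    """Hypothetical helper: like zip(), but raise if the lengths differ."""
    for items in zip_longest(*iterables, fillvalue=_MISSING):
        # A sentinel in the tuple means at least one iterable ran out early.
        if any(item is _MISSING for item in items):
            raise ValueError("zip_strict() arguments have different lengths")
        yield items
```

Because it is a generator, the error only surfaces when the mismatch is reached, after the matching prefix has already been yielded.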

On Sat, Apr 25, 2020, 12:43 PM Christopher Barker <pythonchb@gmail.com> wrote:
On Sat, Apr 25, 2020 at 7:43 AM Steven D'Aprano <steve@pearwood.info> wrote:
I think that the "correct" (simplest, easiest, most obvious, most
flexible) way is:

    with open('a.txt') as a, open('b.txt') as b, open('c.txt') as c:
        for lineA, lineB, lineC in zip_longest(a, b, c, fillvalue=''):
            do_something_with(lineA, lineB, lineC)

...
Especially if the files differ in how many newlines they end with. E.g.
file a.txt and c.txt end with a newline, but b.txt ends without one, or
ends with an extra blank line at the end.

File handling code ought to be resilient in the face of such meaningless
differences,

sure. But what difference is "meaningless" depends on the use case. For instance, comments or blank lines in the middle of a file may be a meaningless difference. And you'd want to handle that before zipping anyway. The way I've solved these types of issues in the past is to filter the files first, maybe something like:

    with open('a.txt') as a, open('b.txt') as b, open('c.txt') as c:
        for lineA, lineB, lineC in zip(filtered(a),
                                       filtered(b),
                                       filtered(c), strict=True):
            do_something_with(lineA, lineB, lineC)
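
filtered() is not defined above. One possible sketch, assuming the "meaningless" differences are blank lines, '#' comments, and trailing newlines (those choices are assumptions for illustration only):

```python
def filtered(lines, comment_prefix="#"):
    """Hypothetical helper: yield only meaningful lines, without the newline."""
    for line in lines:
        stripped = line.strip()
        # Skip blank lines and comment lines.
        if not stripped or stripped.startswith(comment_prefix):
            continue
        yield line.rstrip("\n")
```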
 
> So my argument is that anything you want zip_strict for is better
> handled with zip_longest -- including the case of just raising.

That is quite the leap! You make a decent case about handling empty lines in files, but extending that to "anything" is unwarranted.

I honestly do not understand the resistance here. Yes, any change to the standard library should be carefully considered, and any change IS a disruption, and this proposed change may not be worth it. But arguing that it wouldn't ever be useful, I just don't get.

Entirely anecdotal evidence here, but I think this is borne out by the comments in this thread.

* Many people are surprised when they first discover that zip() stops at the shortest, and silently ignores the rest -- I know I was.
* Many uses (most?) do expect the iterators to be of equal length.
  - The main exception to this may be when one of them is infinite, but how common is that, really? Remember that when zip was first created (py2) it was a list builder, not an iterator, and Python itself was much less iterable-focused.
* However, many uses work fine without any length-checking -- that is often taken care of elsewhere in the code -- this is kinda-sorta analogous to a lack of type checking: sure, you COULD get errors, but you usually don't.
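
A small illustration of the first bullet and of the infinite-iterator exception (the data is made up for the example):

```python
from itertools import count

names = ["a", "b", "c"]
values = [1, 2]  # one item short, perhaps because of a bug upstream

# zip() stops at the shortest input; "c" is dropped with no error or warning.
pairs = list(zip(names, values))
dropped = len(names) - len(pairs)

# The legitimate unequal-length case: pairing with an infinite iterator.
numbered = list(zip(count(), names))
```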

We've done fine for years with zip's current behavior, but that doesn't mean it couldn't be a little better and safer for a lot of use cases, and a number of folks on this thread have said that they would use it.

So: if this were added, it would get some use. How much? Hard to know. Is it critically important? Absolutely not. But since it's fully backward compatible and not a language change, the barrier to entry is not all that high.

However, I agree with (I think) Brandt that the lack of a critical need means that a zip_strict() in itertools would get a LOT less use than a flag on zip itself -- so I advocate for that. If folks think extending zip() is not worth it, then I don't think it would be worth bothering with adding a zip_strict() to itertools at all.

-CHB

--
Christopher Barker, PhD

Python Language Consulting
  - Teaching
  - Scientific Software Development
  - Desktop GUI and Web Development
  - wxPython, numpy, scipy, Cython
Message archived at https://mail.python.org/archives/list/python-ideas@python.org/message/2X74JUYM3OF5LGEIWRMS4HTWPTKHX53D/