On Sat, Apr 25, 2020 at 7:43 AM Steven D'Aprano <steve@pearwood.info> wrote:
> I think that the "correct" (simplest, easiest, most obvious, most
> flexible) way is:
>
>     with open('a.txt') as a, open('b.txt') as b, open('c.txt') as c:
>         for lineA, lineB, lineC in zip_longest(a, b, c, fillvalue=''):
>             do_something_with(lineA, lineB, lineC)

...
> Especially if the files differ in how many newlines they end with. E.g.
> file a.txt and c.txt end with a newline, but b.txt ends without one, or
> ends with an extra blank line at the end.
>
> File handling code ought to be resilient in the face of such meaningless
> differences,

Sure. But which differences are "meaningless" depends on the use case. For instance, comments or blank lines in the middle of a file may be meaningless differences, and you'd want to handle those before zipping anyway. The way I've solved these kinds of issues in the past is to filter the files first, maybe something like:

    with open('a.txt') as a, open('b.txt') as b, open('c.txt') as c:
        for lineA, lineB, lineC in zip(filtered(a),
                                       filtered(b),
                                       filtered(c), strict=True):
            do_something_with(lineA, lineB, lineC)
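
Here filtered() stands for whatever cleanup the use case calls for. A minimal sketch, assuming that blank lines and '#' comments are the meaningless parts (the name and the rules are only an illustration):

```python
def filtered(lines):
    """Yield only the meaningful lines of a file-like iterable.

    Hypothetical helper: what counts as "meaningless" depends on the
    use case. Here we drop blank lines and '#' comments, and strip
    surrounding whitespace (including the trailing newline).
    """
    for line in lines:
        stripped = line.strip()
        if not stripped or stripped.startswith('#'):
            continue  # skip blank lines and comment lines
        yield stripped
```

With the files normalized like this, a length mismatch between them is a real difference in content, not a stray newline.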
 
> So my argument is that anything you want zip_strict for is better
> handled with zip_longest -- including the case of just raising.

That is quite the leap! You make a decent case about handling empty lines in files, but extending that to "anything" is unwarranted.
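
For reference, the "just raising" case done with zip_longest looks something like this. It is a sketch only: zip_equal is a made-up name, not anything in itertools, and the error message is arbitrary.

```python
from itertools import zip_longest


def zip_equal(*iterables):
    """Zip iterables, raising ValueError if their lengths differ.

    Hypothetical helper built on zip_longest with a private sentinel,
    so that legitimate values such as None or '' are never mistaken
    for padding.
    """
    sentinel = object()
    for items in zip_longest(*iterables, fillvalue=sentinel):
        if any(item is sentinel for item in items):
            raise ValueError("iterables have different lengths")
        yield items
```

That works, but it is a handful of lines that each caller has to write and get right.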

I honestly do not understand the resistance here. Yes, any change to the standard library should be carefully considered, and any change IS a disruption, and this proposed change may not be worth it. But arguing that it wouldn't ever be useful, I just don't get.

Entirely anecdotal evidence here, but I think this is borne out by the comments in this thread.

* Many people are surprised when they first discover that zip() stops at the shortest, and silently ignores the rest -- I know I was.
* Many uses (most?) do expect the iterators to be of equal length.
  - The main exception to this may be when one of them is infinite, but how common is that, really? Remember that when zip was first created (py2) it was a list builder, not an iterator, and Python itself was much less iterable-focused.
* However, many uses work fine without any length-checking -- that is often taken care of elsewhere in the code -- this is kinda-sorta analogous to a lack of type checking: sure, you COULD get errors, but you usually don't.
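
The surprise in the first point is easy to reproduce. A tiny example with made-up data:

```python
names = ["alice", "bob", "carol"]
scores = [90, 85]  # one score is missing

# zip() stops at the shortest input: "carol" is dropped, with no error.
pairs = list(zip(names, scores))
print(pairs)  # [('alice', 90), ('bob', 85)]
```

Nothing in the result hints that an item was left behind.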

We've done fine for years with zip's current behavior, but that doesn't mean it couldn't be a little better and safer for a lot of use cases, and a number of folks on this thread have said that they would use it.

So: if this were added, it would get some use. How much? Hard to know. Is it critically important? Absolutely not. But it's fully backward compatible and not a language change, so the barrier to entry is not all that high.

However, I agree with (I think) Brandt that the lack of a critical need means that a zip_strict() in itertools would get a LOT less use than a flag on zip itself -- so I advocate for that. If folks think extending zip() is not worth it, then I don't think it would be worth bothering with adding a zip_strict to itertools at all.

-CHB

--
Christopher Barker, PhD

Python Language Consulting
  - Teaching
  - Scientific Software Development
  - Desktop GUI and Web Development
  - wxPython, numpy, scipy, Cython