On Wed, Apr 22, 2020, 4:24 AM Antoine Pitrou wrote:
But, as far as I'm concerned, the number of times where I took advantage of zip()'s current acceptance of heterogeneously-sized inputs is extremely small. In most of my uses of zip(), a size difference would have been a logic error that deserves noticing and fixing.
Your experience is very different from mine.
I'm in Antoine's camp on this one. A lot of our work is data analysis, where we get for example simulation results as X, Y, Z components then zip them up into coordinate triples, so any mismatch is a bug. Having zip_equal as a first-class function would replace zip in easily 90% of our use cases, but it needs to be fast as we often do this sort of thing in an inner loop...
+1
I write a lot of standalone data-munging scripts, and expecting zipped inputs to have equal length is a common pattern.
How, for example, to collate lines from 3 potentially large files while ensuring they match in length (without an external dependency)? The best I can think of is rather ugly:
with open('a.txt') as a, open('b.txt') as b, open('c.txt') as c:
    for lineA, lineB, lineC in zip(a, b, c):
        do_something_with(lineA, lineB, lineC)
    assert next(a, None) is None
    assert next(b, None) is None
    assert next(c, None) is None
Changing the zip() call to zip(a, b, c, strict=True) would remove the need for the asserts. Moreover, the concept of strict zip or zip_equal should be intuitive to beginners, whereas my solution of next() with a sentinel is not. (Oh, an alternative would be checking if a.readline(), b.readline(), and c.readline() are nonempty, but that's not much better and wouldn't generalize to non-file iterators.)
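One caveat with the asserts above: zip() pulls from its arguments left to right, so if a has exactly one more line than b, that extra line is consumed and discarded before zip() notices that b is exhausted, and all three asserts then pass. For comparison, here is a minimal pure-Python sketch of what a zip_equal helper could look like, built on itertools.zip_longest with a private sentinel. The name zip_equal, the _MISSING sentinel, and the error message are assumptions made for illustration, not an existing stdlib API.

```python
from itertools import zip_longest

_MISSING = object()  # private sentinel; cannot collide with real data


def zip_equal(*iterables):
    """Yield tuples like zip(), but raise ValueError on a length mismatch.

    Hypothetical helper for illustration; not a stdlib function.
    """
    for combo in zip_longest(*iterables, fillvalue=_MISSING):
        # zip_longest pads exhausted inputs with the sentinel, so seeing it
        # in any position means at least one input ended before another.
        if any(item is _MISSING for item in combo):
            raise ValueError("zip_equal: iterables have different lengths")
        yield combo
```

In the file example, zip(a, b, c) would become zip_equal(a, b, c) and the asserts could go. Being pure Python, this adds per-item overhead, so it probably would not address the inner-loop speed concern raised earlier in the thread.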
Nathan