[Python-ideas] Re: zip(x, y, z, strict=True)

9 May 2020

On May 5, 2020, at 12:50, Christopher Barker wrote:
...
Another key point is that if you want zip_longest() functionality, you simply can not get it with the builtin zip() -- you are forced to look elsewhere. Whereas most code that might want "strict" behavior will still work, albeit less safely, with the builtin.

I think this is a key point, but I think you’ve got it backward.

You _can_ build zip_longest with zip, and before 2.6, people _did_. (Well, they built izip_longest with izip.) I’ve still got my version in an old toolbox. You chain a repeat(None) onto each iterable, izip the results, and you get an infinite iterator that you have to read until you hit a tuple whose elements are all None. You can just takewhile that into exactly the same thing as izip_longest, but unfortunately that’s a lot slower than doing the check yourself as you iterate, so I had both _longest and _infinite variants, and I think I used the latter more even though it was usually less convenient.

That sounds like a silly way to do it, and it’s certainly easier to get subtly wrong than just writing a generator function like the “as if” code in the (i)zip_longest docs, but a comment in my code assures me that it is almost 4x as fast as the generator function, and half the speed of a custom C implementation, so I’m pretty sure that’s why I did it. And I doubt I’m the only person who solved it that way. In fact, I’ll bet I copied it from an ActiveState recipe or a colleague or an open source project.
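
For concreteness, here is a minimal Python 3 sketch of the pad-and-zip approach described above. It is not the toolbox code the post refers to: the names zip_infinite and zip_longest_via_zip are hypothetical, it pads with a private sentinel object rather than None (an assumption, made so that real None values in the data cannot end the iteration early), and it makes no claim about the speed figures quoted above.

```python
from itertools import chain, repeat, takewhile

# Private padding marker (an assumption of this sketch; the post pads with None).
_PAD = object()


def zip_infinite(*iterables):
    """Zip the iterables after chaining endless padding onto each one.

    The result never ends on its own (unless no iterables are given);
    the caller must stop once a row consists entirely of padding.
    """
    return zip(*(chain(it, repeat(_PAD)) for it in iterables))


def zip_longest_via_zip(*iterables, fillvalue=None):
    """Behave like itertools.zip_longest, but built on top of zip."""
    # Keep rows while at least one position still holds real data.
    live_rows = takewhile(
        lambda row: any(x is not _PAD for x in row),
        zip_infinite(*iterables),
    )
    for row in live_rows:
        # Swap the private padding for the caller's fill value.
        yield tuple(fillvalue if x is _PAD else x for x in row)
```

For example, list(zip_longest_via_zip([1, 2, 3], "ab")) gives [(1, 'a'), (2, 'b'), (3, None)], the same as itertools.zip_longest. The easy-to-miss details (the sentinel, the all-padding stop condition) are the kind of thing that makes a hand-rolled version easy to get subtly wrong.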

So, most likely, izip_longest wasn’t added because you can’t build it on top of izip, but because building it on top of izip is easy to get subtly wrong (especially if you need it to be fast—or don’t need it to be fast but micro optimize it anyway, for that matter), and often people punt and do something clunkier (use _infinite instead of _longest and make the final for loop more complicated).

Which is actually a pretty good parallel for the current proposal. You can write your own zip_strict on top of zip, and at least a few people do—but, as people have shown in this thread, the obvious solution is too slow, the obvious fast solution is very easy to get subtly wrong, and often people punt and do something clunkier (listify and compare len).
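
As a rough illustration of the options mentioned above, here are two hedged sketches of a hand-written strict zip. The names zip_strict_listify and zip_strict_lazy are hypothetical. The first is the clunky "listify and compare len" approach; the second is one common lazy alternative that happens to lean on itertools.zip_longest with a private sentinel rather than on zip itself. Neither is taken from the thread, and no performance claim is intended.

```python
from itertools import zip_longest

# Private marker for "this iterable ran out" (an assumption of this sketch).
_MISSING = object()


def zip_strict_listify(*iterables):
    """Clunky version: materialize everything, then compare lengths."""
    lists = [list(it) for it in iterables]
    if len({len(items) for items in lists}) > 1:
        raise ValueError("iterables have different lengths")
    return zip(*lists)


def zip_strict_lazy(*iterables):
    """Lazy version: raise as soon as any iterable ends before the others."""
    for row in zip_longest(*iterables, fillvalue=_MISSING):
        if any(x is _MISSING for x in row):
            raise ValueError("iterables have different lengths")
        yield row
```

The listify version reports the mismatch up front but consumes every input fully and cannot handle infinite iterators. The lazy version only raises when iteration reaches the mismatch, after the earlier rows have already been yielded.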

That’s why I’m +1 on this proposal in some form. Assuming zip_strict would be useful at least as often as zip_longest (and I’ve been sold on that part, and I think most people on all sides of this discussion agree?), it calls out for a good official solution. The fact that the ecosystem is different nowadays (pip install more-itertools or copying off Stack Overflow is a lot simpler, and more common, than finding a recipe on ActiveState) does make it a little less compelling, but at most that means the official solution should be a docs link to more-itertools, not that we should do nothing.

But that’s also part of the reason I’m -1 on it being a flag. Just like zip_longest, it’s a different function, one you shouldn’t think of as being built on zip even if it could be. Maybe strict really is needed so much more often than longest that “import itertools” is too onerous, but if that’s really true, that different function should be another builtin. I think nobody is arguing for that, because it’s just obvious that it isn’t needed enough to reach the high bar of adding another function to builtins. But that means it belongs in itertools.

Trying to make it a flag (which will always be passed a constant value) is a clever way to try to get the best of both worlds—and so is the chain.from_iterable style. But if either of those really did get the best of both worlds and the problems of neither, it would be used all over the place, rather than as sparingly as possible. And of course it doesn’t get the best of both worlds. A flag is hiding code as data, and it looks misleadingly like the much more common uses of flags where you actually do often set the flag with a runtime value. It’s harder to type (and autocomplete makes the difference worse, not better). It’s a tiny bit harder to read, because you’re adding as much meaningless boilerplate (True) as important information (strict). It’s increasing the amount of stuff to learn in builtins just as much as another function would. And so on. So it’s only worth doing for really special cases, like open.

Andrew Barnert