On Apr 22, 2020, at 01:52, Serhiy Storchaka <storchaka@gmail.com> wrote:
22.04.20 11:20, Antoine Pitrou wrote:
Ideally, that's what it would do. Whether it's desirable to transition to that behaviour is an open question. But, as far as I'm concerned, the number of times I have taken advantage of zip()'s current acceptance of heterogeneously sized inputs is extremely small. In most of my uses of zip(), a size difference would have been a logic error that deserves noticing and fixing.
I concur with Antoine. Ideally we should have several functions: zip_shortest(), zip_equal(), zip_longest(). In most cases (80% or 90% or more) they are equivalent, because the input iterators have the same length, but it is safer to use zip_equal() to catch bugs. In other cases you would use zip_shortest() or zip_longest(). And it would be natural to rename the most popular variant to just zip().
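To make the three behaviours concrete, here is a minimal sketch. The built-in zip() and itertools.zip_longest() exist today. zip_shortest and zip_equal are only the names proposed in this message, so in the sketch zip_shortest is an alias for zip(), and zip_equal is a hypothetical helper built on zip_longest() with a private sentinel. It is not a standard-library function.

```python
from itertools import zip_longest

# Hypothetical name for today's behaviour: zip() stops at the shortest input.
zip_shortest = zip

_SENTINEL = object()  # private marker that no caller's data can contain


def zip_equal(*iterables):
    """Hypothetical helper: like zip(), but raise ValueError if lengths differ."""
    for combo in zip_longest(*iterables, fillvalue=_SENTINEL):
        # A sentinel means at least one input ran out before the others.
        if any(item is _SENTINEL for item in combo):
            raise ValueError("zip_equal: iterables have different lengths")
        yield combo


names = ["a", "b", "c"]
values = [1, 2]

shortest = list(zip_shortest(names, values))  # silently drops "c"
longest = list(zip_longest(names, values))    # pads the missing value with None

error = None
try:
    list(zip_equal(names, values))            # the mismatch becomes an error
except ValueError as exc:
    error = str(exc)
```

This zip_equal only notices the mismatch when it reaches the end of the shorter input, so the pairs before that point have already been yielded by the time the error is raised.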
Now it would be a breaking change. We had a chance to do it in 3.0, when another breaking change was made to zip(). I do not know whether it is worth doing now. But when we plan any changes to zip(), we should take possible future changes into account and make them simpler, not harder.
If that is your long-term goal, I think you could do it in three steps.

First, just add itertools.zip_equal. Ideally the docs should replace the usual “Added in 3.9” with something like “Added in 3.9; if you need the same function in earlier versions, see more-itertools” (linked to the more-itertools blurb at the top of the page). There seems to be a lot of support for this step even from people who don’t share your bigger goal.

Second, add itertools.zip_shortest. Change zip’s docs to say that it’s the same as zip_shortest, mention the other two choices, and maybe even nudge people to decide explicitly which of the three they want. Also find some places in the tutorial that use zip and change them to use zip_equal or zip_shortest as appropriate. I think that gets you about as much as you can get without backward-compatibility issues, and it also brings you closer to being able to deprecate zip or change it into an alias for zip_equal, rather than making that harder.

Third, do the deprecation. By that point, everyone maintaining existing code will have an easy way to prepare for it defensively: as long as they can require 3.10+ or more-itertools, they can just change all uses of zip to zip_shortest and they’re done. Still not painless, but about as painless as a backward-compatibility break could ever be.

And of course, after the first two steps you can proselytize for the next one. If you can convince lots of people that they should care about the choice more often and get them using the explicit functions, it’ll be a lot harder to argue that everyone is happy with today’s behavior.
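The defensive preparation described in the third step might look roughly like this. It is only a sketch under an assumption: itertools.zip_shortest is a name proposed in this thread, not a function the standard library provides, so the import is guarded and falls back to the built-in zip(), which already has the truncating behaviour. pair_up is a hypothetical example function.

```python
# Hypothetical: itertools.zip_shortest is a name proposed in this thread,
# not a function that exists in the standard library.
try:
    from itertools import zip_shortest
except ImportError:
    # Fallback: the built-in zip() already stops at the shortest input,
    # so it is a drop-in equivalent.
    zip_shortest = zip


def pair_up(keys, values):
    # Before the migration this read: dict(zip(keys, values)).
    # Spelling it zip_shortest keeps today's truncating behaviour even if
    # zip() itself is later deprecated or changed to check lengths.
    return dict(zip_shortest(keys, values))


print(pair_up(["a", "b", "c"], [1, 2]))  # prints {'a': 1, 'b': 2}
```

The change is purely mechanical (rename each call site), which is what makes this step cheap for maintainers once the explicit name exists somewhere they can import it from.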