On Sat, May 2, 2020 at 2:58 PM Alex Hall <alex.mojaki@gmail.com> wrote:
On Sat, May 2, 2020 at 1:19 PM Steven D'Aprano <steve@pearwood.info> wrote:
Rolling your own on top of
zip_longest is easy. It's half a dozen lines. It could be a recipe in
itertools, or a function.
It has taken years for it to be added to more-itertools, suggesting that the real world need for this is small.
"Not every two line function needs to be a builtin" -- this is six lines, not two, which is in the proposal's favour, but the principle still applies. Before this becomes a builtin, there are a number of hurdles to pass:
- Is there a need for it? Granted.
- Is it complicated to get right? No.
- Is performance critical enough that it has to be written in C?
Probably not.
No, probably not
I take it back, performance is a problem worth considering. Here is the more-itertools implementation:

https://github.com/more-itertools/more-itertools/blob/master/more_itertools/...

```
def zip_equal(*iterables):
    """``zip`` the input *iterables* together, but throw an
    ``UnequalIterablesError`` if any of the *iterables* terminate before
    the others.
    """
    for combo in zip_longest(*iterables, fillvalue=_marker):
        for val in combo:
            if val is _marker:
                raise UnequalIterablesError(
                    "Iterables have different lengths."
                )
        yield combo
```

I didn't think carefully about this implementation and thought that there was only a performance cost in the error case. That's obviously not true - there's an `if` statement executed in Python for every item in every iterable. The overhead is O(len(iterables) * len(iterables[0])). Given that zip is used a lot and most uses of zip should probably be strict, this is a significant problem.

Therefore:

- Rolling your own on top of zip_longest in six lines is not a solution.
- Using more-itertools is not a solution.
- It's complicated to get right.
- Performance is critical enough to do it in C.
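
To make the per-item cost concrete, here is a minimal sketch that uses the same loop as the zip_equal snippet quoted above but counts how many sentinel comparisons run in Python. The `_marker` object and the `UnequalIterablesError` class defined here are stand-ins I am assuming for illustration (the quoted snippet does not show their definitions), and `zip_equal_counting` is a hypothetical name, not part of more-itertools.

```python
from itertools import zip_longest

# Assumed stand-ins: the quoted snippet uses these names without
# showing how they are defined.
_marker = object()


class UnequalIterablesError(ValueError):
    pass


def zip_equal_counting(*iterables):
    """Same check as the quoted zip_equal, but eager and instrumented.

    Returns (list of tuples, number of `is _marker` comparisons made).
    """
    checks = 0
    result = []
    for combo in zip_longest(*iterables, fillvalue=_marker):
        for val in combo:
            checks += 1  # one Python-level comparison per item
            if val is _marker:
                raise UnequalIterablesError(
                    "Iterables have different lengths."
                )
        result.append(combo)
    return result, checks


pairs, checks = zip_equal_counting(range(1000), range(1000), range(1000))
print(len(pairs), checks)  # 1000 tuples, 3000 comparisons
```

Three iterables of 1000 items each cost 3000 interpreted comparisons even though nothing goes wrong, which is the O(len(iterables) * len(iterables[0])) overhead in the success path.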