On Thu., 21 May 2020, 4:09 am Jim J. Jewett, <jimjjewett@gmail.com> wrote:
> David Mertz wrote:
>
>> Fwiw, I don't think it changes my order, but 'strict' is a better word than
>> 'equal' in all those places. I'd subtract 0.1 from each of those votes if
>> they used "equal".
>
> I would say that 'equal' is worse than 'strict', but 'strict' is also wrong.
>
> Zipping to a potentially infinite sequence -- like a manual enumerate --
> isn't wrong.  It may be the less common case, but it isn't wrong.  Using
> 'strict' implies that there is something sloppy about the data in, for
> example, cases like Stephen J. Turnbull's lagged time series.
>
> Unfortunately, the best I can come up with is 'same_length', or possibly
> 'equal_len' or 'equal_length'.  While those are better semantically, they
> are also slightly too long or awkward.  I would personally still consider
> 'same_length' the least bad option.
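The two truncation cases mentioned above (a manual enumerate against an infinite counter, and a lagged series) are ordinary, intentional uses of zip(). A minimal sketch follows; the lag-1 pairing is a generic stand-in for a lagged time series and is an assumption, not Stephen J. Turnbull's actual example.

```python
import itertools

# A "manual enumerate": pair a finite list with an infinite counter.
# Stopping at the shortest input is exactly the desired behaviour here.
labels = ["a", "b", "c"]
numbered = list(zip(itertools.count(1), labels))
print(numbered)  # [(1, 'a'), (2, 'b'), (3, 'c')]

# A lag-1 pairing of a series (illustrative data): the shifted copy is one
# item shorter, so the final unmatched value is dropped by design.
series = [10, 12, 15, 11]
changes = [later - earlier for earlier, later in zip(series, series[1:])]
print(changes)  # [2, 3, -4]
```

In both cases the inputs differ in length on purpose, which is the point being made about why "strict" carries the wrong connotation.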

Reading this thread and the current PEP, the main question I had was whether it might be better to flip the sense of the flag and call it "truncate".

So the status quo would be "truncate=True", while the ValueError could be requested by passing an explicit "truncate=False".
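A pure-Python sketch of the inverted flag may make the proposed semantics concrete. The function name `zip_truncate` and its error messages are hypothetical; only the `truncate` keyword comes from this proposal. When an input other than the first runs out, the earlier inputs have already given up an item for the incomplete round. When the first input runs out, this sketch probes the remaining inputs in order, so at most one item is consumed from the first later input that still has data.

```python
def zip_truncate(*iterables, truncate=True):
    """Hypothetical sketch of zip() with the proposed *truncate* flag.

    truncate=True (the default) behaves like the existing zip(): stop
    silently as soon as any input is exhausted.
    truncate=False raises ValueError if the inputs turn out to differ in length.
    """
    iterators = [iter(it) for it in iterables]
    if not iterators:
        return  # like zip(), no inputs means no output
    while True:
        items = []
        for position, iterator in enumerate(iterators, start=1):
            try:
                items.append(next(iterator))
            except StopIteration:
                if truncate:
                    return  # status quo: silently drop the partial round
                if position > 1:
                    # Earlier inputs have each already yielded an item this round.
                    raise ValueError(
                        f"zip_truncate() argument {position} is shorter than argument 1"
                    ) from None
                # The first input is exhausted: probe the others in order.
                for later, other in enumerate(iterators[1:], start=2):
                    try:
                        next(other)  # consumes one item if any is left
                    except StopIteration:
                        continue
                    raise ValueError(
                        f"zip_truncate() argument {later} is longer than argument 1"
                    ) from None
                return  # every input ended together
        yield tuple(items)
```

Because the sketch is a generator, the ValueError surfaces lazily, only when iteration reaches the mismatch, which matches how the existing zip() only discovers exhaustion while it is being consumed.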

Draft documentation paragraph:

zip() can be used to combine iterables of different lengths, including combining finite iterables with infinite iterators. By default, the output iterator is implicitly truncated to produce the same number of items as the shortest input iterable. Setting *truncate* to false disables this implicit truncation and raises ValueError if the input lengths differ. Note that if this ValueError is raised, an additional item will have been consumed from any iterators listed before the shortest iterator (or, if the first iterator is the shortest one, from the first later iterator that still has items left).

To pad shorter input iterables rather than truncating the output or raising ValueError, see itertools.zip_longest.
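For comparison, the default truncation and the padding behaviour the draft points to can be shown side by side using only the existing zip() and itertools.zip_longest(); the sample lists are arbitrary.

```python
from itertools import zip_longest

numbers = [1, 2, 3]
letters = ["a", "b"]

truncated = list(zip(numbers, letters))                       # stops at the shortest input
padded = list(zip_longest(numbers, letters, fillvalue=None))  # pads the shorter input
print(truncated)  # [(1, 'a'), (2, 'b')]
print(padded)     # [(1, 'a'), (2, 'b'), (3, None)]
```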

The conceptual idea here is that the "truncate" flag name would technically be a shorter mnemonic for "truncate_silently", so clearing it gives you an exception rather than enabling padding behaviour.

Flipping the sense of the flag also means that "truncate=True" will appear in IDE tooltips as part of the function signature, providing significantly more information than "strict=False" would.

That improved self-documentation then becomes what I would consider the strongest argument in favour of the flag-based approach: providing more information up-front to users regarding the actual behaviour of the builtin, rather than having them incorrectly assume that mismatched input iterator lengths will raise an exception.

Side note: this idea pairs nicely with the "zip(itr, itr, itr)" idiom for non-overlapping data windows, as it makes it straightforward to request an exception if the last data tuple has values missing (without the flag, the idiom silently discards incomplete trailing data).


P.S. I had the opportunity to read the thread from beginning to end after belatedly catching some of the messages out of context, and FWIW, I started out assuming I would strongly favour the itertools function option, and surprised myself by favouring the flag option (albeit inverted) by the time I reached the end.