[Python-Dev] Re: PEP 618: Add Optional Length-Checking To zip

June 1, 2020

On Thu., 21 May 2020, 4:09 am Jim J. Jewett, <jimjjewett@gmail.com> wrote:
...
David Mertz wrote:
...
Fwiw, I don't think it changes my order, but 'strict' is a better word than
'equal' in all those places. I'd subtract 0.1 from each of those votes if
they used "equal".

I would say that 'equal' is worse than 'strict', but 'strict' is also wrong.
Zipping to a potentially infinite sequence -- like a manual enumerate --
isn't wrong.  It may be the less common case, but it isn't wrong.  Using
'strict' implies that there is something sloppy about the data in, for
example, cases like Stephen J. Turnbull's lagged time series.
Unfortunately, the best I can come up with is 'same_length', or possibly
'equal_len' or 'equal_length'.  While those are better semantically, they
are also slightly too long or awkward.  I would personally still consider
'same_length' the least bad option.

Reading this thread and the current PEP, the main question I had was
whether it might be better to flip the sense of the flag and call it
"truncate".

So the status quo would be "truncate=True", while the ValueError could be
requested by passing an explicit "truncate=False".

Draft documentation paragraph:

======
zip() can be used to combine iterables of different lengths, including
combining finite iterables with infinite iterators. By default, the output
iterator is implicitly truncated to produce the same number of items as the
shortest input iterable. Setting *truncate* to false disables this implicit
truncation and raises ValueError instead. Note that if this ValueError is
raised an additional item will have been consumed from any iterators listed
before the shortest iterator (or from the second listed iterator if the
first iterator is the shortest one).

To pad shorter input iterables rather than truncating the output or raising
ValueError, see itertools.zip_longest.
======
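
To make the draft paragraph concrete, here is a minimal pure-Python sketch of the behaviour it describes. The helper name zip_truncate and the error message are assumptions for illustration only, and "truncate" is the flag proposed in this message, not an existing zip() argument.

```python
def zip_truncate(*iterables, truncate=True):
    """Hypothetical helper mimicking zip() with the proposed 'truncate' flag."""
    if truncate:
        # Status quo: stop silently at the shortest input.
        yield from zip(*iterables)
        return
    iterators = [iter(iterable) for iterable in iterables]
    if not iterators:
        return
    missing = object()  # sentinel marking an exhausted iterator
    while True:
        row = []
        for position, iterator in enumerate(iterators):
            item = next(iterator, missing)
            if item is missing:
                if position == 0:
                    # The first input ran out; the lengths match only if
                    # every remaining input is exhausted as well.
                    if all(next(other, missing) is missing
                           for other in iterators[1:]):
                        return
                # Assumed message; the draft only specifies ValueError.
                raise ValueError("zip() arguments have different lengths")
            row.append(item)
        yield tuple(row)


names = ["a", "b", "c"]
scores = [1, 2]
print(list(zip_truncate(names, scores)))  # [('a', 1), ('b', 2)]
try:
    list(zip_truncate(names, scores, truncate=False))
except ValueError as exc:
    print("mismatch:", exc)
```

As in the draft wording, when the error is raised one extra item has already been taken from the inputs listed before the shortest one (here, "c" from names).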

The conceptual idea here is that the "truncate" flag name would technically
be a shorter mnemonic for "truncate_silently", so clearing it gives you an
exception rather than enabling padding behaviour.

Flipping the sense of the flag also means that "truncate=True" will appear
in IDE tooltips as part of the function signature, providing significantly
more information than "strict=False" would.

That improved self-documentation then becomes what I would consider the
strongest argument in favour of the flag-based approach: providing more
information up-front to users regarding the actual behaviour of the
builtin, rather than having them incorrectly assume that mismatched input
iterator lengths will raise an exception.

Side note: this idea pairs nicely with the "zip(itr, itr, itr)" idiom for
non-overlapping data windows, as it makes it straightforward to request an
exception if the last data tuple has values missing (without the flag, the
idiom silently discards incomplete trailing data).
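
A small sketch of that idiom, assuming a hypothetical helper named windows (not part of the PEP or the standard library). With truncate=True it is just the existing zip(itr, itr, itr) pattern; with truncate=False it approximates what the proposed flag would report for an incomplete final window.

```python
from itertools import islice


def windows(iterable, size, truncate=True):
    """Hypothetical helper: non-overlapping windows of 'size' items."""
    itr = iter(iterable)
    if truncate:
        # The existing idiom: the same iterator repeated 'size' times,
        # so incomplete trailing data is silently dropped.
        yield from zip(*[itr] * size)
        return
    while True:
        window = tuple(islice(itr, size))
        if not window:
            return  # input ended exactly on a window boundary
        if len(window) < size:
            # Assumed message; stands in for the ValueError the flag would raise.
            raise ValueError("last window is incomplete")
        yield window


data = [1, 2, 3, 4, 5, 6, 7, 8]
print(list(windows(data, 3)))  # [(1, 2, 3), (4, 5, 6)]; 7 and 8 are discarded
try:
    list(windows(data, 3, truncate=False))
except ValueError as exc:
    print("incomplete data:", exc)
```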

Cheers,
Nick.

P.S. I had the opportunity to read the thread from beginning to end after
belatedly catching some of the messages out of context, and FWIW, I started
out assuming I would strongly favour the itertools function option, and
surprised myself by favouring the flag option (albeit inverted) by the time
I reached the end.
...

Nick Coghlan