[Python-Dev] Re: PEP 618: Add Optional Length-Checking To zip

May 17, 2020

      Gregory P. Smith writes:
...
Agreed.  The best way to reduce accidental incorrect use of the
builtin is to make the builtin capable of doing what people want
directly without having to go discover something in a module
somewhere.

Executive summary:

My argument (and one of Steven d'Aprano's) against a "strict" mode to
zip is precisely that it's *extremely* likely that if I use a facility
that zips together things I provide, the last thing I want is for
it to choose "strict" for me, because that *would likely be
incorrect*.  I do not want people using strict *for any facility I
might use* "because it's there."  I'm not saying strict mode is
useless.  I am saying the "encourage use by making it easier to use"
argument cuts both ways: it can create problems as well as solve them.

A couple of concrete examples:

1. In activities like constructing data arrays, which we expect to be
rectangular, I'm still likely to use sequences of unequal length,
including infinite sequences.  As an economist, I often use lagged
data, which can easily be constructed for an equation like
y[t] = a + b x[t] + c x[t-1] with

    zip(y[1:], const(), x[1:], x[0:])

where

    def const():
        while True:
            yield 1

(Here I'm using zip() as a proxy for somebody's generic facility such
as a function to compute OLS estimates given a sequence of data
series.  Obviously for zip itself, I would just not use strict mode.)
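
A minimal sketch of the construction above, assuming small made-up
series for y and x (the numbers are illustrative only).  It shows that
plain zip() stops at the shortest input, so the infinite const() and
the one-element-longer x[0:] need no trimming:

```python
def const():
    # Infinite stream of 1s for the intercept column.
    while True:
        yield 1

# Illustrative data only; not taken from any real data set.
y = [10.0, 11.0, 12.5, 13.0]
x = [1.0, 2.0, 3.5, 4.0]

# Columns: y[t], constant, x[t], x[t-1] for t = 1, 2, 3.
rows = list(zip(y[1:], const(), x[1:], x[0:]))

for row in rows:
    print(row)
```

Each row pairs y[t] with x[t] and x[t-1]; the surplus last element of
x[0:] is simply never consumed.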

Note that y[0], not y[-1], needs to be left out.  This is the critical
point that I need to concentrate on when constructing this data frame.
If I have to "even out" the columns, though, I need *also* to think
about the lengths, a distraction which for me makes this more
bug-prone.  I.e., I might accidentally write

    zip(y[:-1], const(len(x) - 1), x[:-1], x[1:])

where

    def const(n):
        return (1 for _ in range(n))

which is not only asymmetric but wrong, as the regressor x[1:] is
"future x"!  More opportunities for bugs arise in the replacement for
const().

Even if you don't agree about the bugs (and there is a weak argument
that some fraction of the potential bugs will be caught by strict-mode
zip, such as a wrong argument to const()), it's pretty clear which
style is more readable.
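
To make the misalignment visible, here is a small sketch that uses
labelled strings in place of numbers (an assumption made purely for
illustration) and runs the "evened out" version.  All four columns come
out the same length, yet the last column holds x[t+1] rather than
x[t-1]:

```python
def const(n):
    # Finite stream of n 1s, as in the evened-out version.
    return (1 for _ in range(n))

# Labels stand in for data so the time index of each entry is visible.
y = ["y0", "y1", "y2", "y3"]
x = ["x0", "x1", "x2", "x3"]

columns = [y[:-1], list(const(len(x) - 1)), x[:-1], x[1:]]
evened = list(zip(*columns))

for row in evened:
    print(row)

# Every column has the same length, so a length check alone
# would not flag the wrong lag in the last column.
print({len(c) for c in columns})
```

The first row printed is ('y0', 1, 'x0', 'x1'): y at time 0 is paired
with x at time 1, which is the "future x" problem described above.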

2. My programming style is such that if I want couples that are related
to each other, I will almost certainly generate those couples, not
generate them separately in the right orders and then zip as needed.

For example, in one of the test suites two lists are generated
something like this:

    c_int_types = [...]                # list display
    c_int_type_ranges = [construct_range(t) for t in c_int_types]

and in many tests the two lists are zipped to produce appropriately
matched couples.  But I would certainly do

    c_int_types = [...]                # as above
    c_int_types_with_ranges = [(t, construct_range(t)) for t in c_int_types]

Of course I understand that sometimes you might very well care about
the space cost of doing this, but I suspect that if I cared about the
2X cost of c_int_types_with_ranges, I wouldn't pregenerate a list of
ranges at all.  My point is that given my style, this particular use
case will *almost never* occur, so is unlikely to provide an excuse
for strict mode if I'm providing the data.  I suspect this applies to
a lot of claimed use cases.
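
A sketch of the two styles side by side.  The contents of c_int_types
and the body of construct_range() are assumptions for illustration
(the test suite's real definitions are not shown here); the helper
below just computes the signed range from the type's size:

```python
import ctypes

def construct_range(t):
    # Hypothetical helper: (min, max) of a signed C integer type.
    bits = 8 * ctypes.sizeof(t)
    return (-(2 ** (bits - 1)), 2 ** (bits - 1) - 1)

# Assumed contents; the original list display is elided.
c_int_types = [ctypes.c_int8, ctypes.c_int16, ctypes.c_int32]

# Style 1: two parallel lists, zipped wherever couples are needed.
c_int_type_ranges = [construct_range(t) for t in c_int_types]
zipped = list(zip(c_int_types, c_int_type_ranges))

# Style 2: generate the couples together, so there is nothing to zip.
c_int_types_with_ranges = [(t, construct_range(t)) for t in c_int_types]

print(zipped == c_int_types_with_ranges)
```

With the second style the lengths cannot disagree, because each couple
is built from a single element of c_int_types.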

Of course if I only provide c_int_types, and your function constructs
c_int_type_ranges and zips them, it's fine if you use strict mode --
that doesn't impact me at all.  You probably *should* use strict mode.

But if you claim to be providing a general facility, I think it's on
you to think about whether I might want to feed sequences of unequal
length to the function, even though you never would.  That's quite a
burden to assume, though, unless you simply provide a strict mode flag
in your functions (which you can default to strict!) and let me choose.
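
One way such a facility might expose the choice, as a rough sketch.
The function name zip_columns and its strict parameter are
hypothetical, and the length check is done by hand with
itertools.zip_longest rather than relying on any particular zip()
signature:

```python
from itertools import zip_longest

_MISSING = object()  # sentinel marking an exhausted column

def zip_columns(*columns, strict=True):
    """Hypothetical facility: yield rows, strict by default."""
    if not strict:
        # Caller opted out: stop quietly at the shortest column.
        yield from zip(*columns)
        return
    for row in zip_longest(*columns, fillvalue=_MISSING):
        if any(value is _MISSING for value in row):
            raise ValueError("columns have unequal lengths")
        yield row

def const():
    # Infinite stream of 1s.
    while True:
        yield 1

x = [1.0, 2.0, 3.5, 4.0]

# Unequal and infinite inputs are fine once the caller says so.
lagged = list(zip_columns(x[1:], const(), x[0:], strict=False))

# The default still rejects the same inputs.
try:
    list(zip_columns(x[1:], x[0:]))
    raised = False
except ValueError:
    raised = True
print(lagged, raised)
```

The default protects callers who expect equal lengths, while the flag
leaves the decision with whoever supplies the data.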

Steve

Stephen J. Turnbull