
Ethan Furman wrote:
So half of your examples are actually counter-examples.
I claimed to have found "dozens of other call sites in Python's standard library and tooling where it would be appropriate to enable this new feature". You asked for references, and I provided two dozen cases of zipping what must be equal length iterables. I said they were "appropriate", not "needed" or even "recommended". These are call sites where unequal-length iterables, if encountered, would be an error that I would hope wouldn't pass silently. Besides, I don't think it's beyond the realm of imagination for a future refactoring of several of the "Mismatch cannot happen." cases to introduce a bug of this kind.
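As a concrete illustration of the failure mode in question (the variable names here are made up for the example, not taken from any of the cited call sites), plain `zip` simply truncates at the shortest input:

```python
# Plain zip() stops at the shortest input, so a length mismatch,
# which at these call sites would indicate a bug, passes silently.
headers = ["name", "size", "mtime"]
row = ["setup.py", "1204"]  # oops: one field is missing
record = dict(zip(headers, row))
print(record)  # {'name': 'setup.py', 'size': '1204'}, 'mtime' silently dropped
```

Nothing raises, and the missing `mtime` field only surfaces later, if at all.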
Did you vet them, or just pick matches against `zip(`?
Of course. I spent hours vetting them, to the point of researching the GNU tar extended sparse header and Apple property list formats (and trying to figure out what the hell was happening in `os._fwalk`) just to make sure my understanding was correct.

Ethan Furman wrote:
Not the call itself, but the running of zip. Absent some clever programming it seems to me that there are two choices if we have a flag:
I wouldn't call my implementation "clever", but it differs from both of these options. We only need to check whether we're strict when an error occurs in one of our iterators, a situation the C code for `zip` already has to handle explicitly with a branch. So this condition is only hit on the "last" `__next__` call, not on every single iteration.

As a reminder, the actual C implementation is linked in the PEP (there's no PR yet, but branch reviews are welcome), though I'd prefer that the PEP discussion not get bogged down in those specifics. The pure-Python implementation in the PEP is *very* close to it, but it uses different abstractions for some of the details regarding error handling and argument parsing.[0]

However, for those who are interested: there is no measurable performance regression (and no additional parsing overhead for calls without keyword arguments). Parsing the keyword argument (if present) adds <0.2us of overhead at creation time on my machine. I went ahead and ran some rough PGO/LTO benchmarks.

Creation time:

```
$ ./python-master -m pyperf timeit 'zip()'
Mean +- std dev: 79.4 ns +- 4.3 ns
$ ./python-zip-strict -m pyperf timeit 'zip()'
Mean +- std dev: 79.0 ns +- 1.9 ns
$ ./python-zip-strict -m pyperf timeit 'zip(strict=True)'
Mean +- std dev: 240 ns +- 8 ns
```

Creation time + iteration time:

```
$ ./python-master -m pyperf timeit -s 'r = range(10)' '[*zip(r, r)]'
Mean +- std dev: 577 ns +- 35 ns
$ ./python-zip-strict -m pyperf timeit -s 'r = range(10)' '[*zip(r, r)]'
Mean +- std dev: 565 ns +- 16 ns
$ ./python-zip-strict -m pyperf timeit -s 'r = range(10)' '[*zip(r, r, strict=True)]'
Mean +- std dev: 756 ns +- 27 ns
$ ./python-master -m pyperf timeit -s 'r = range(100)' '[*zip(r, r)]'
Mean +- std dev: 3.54 us +- 0.14 us
$ ./python-zip-strict -m pyperf timeit -s 'r = range(100)' '[*zip(r, r)]'
Mean +- std dev: 3.49 us +- 0.07 us
$ ./python-zip-strict -m pyperf timeit -s 'r = range(100)' '[*zip(r, r, strict=True)]'
Mean +- std dev: 3.73 us +- 0.13 us
$ ./python-master -m pyperf timeit -s 'r = range(1000)' '[*zip(r, r)]'
Mean +- std dev: 44.1 us +- 2.0 us
$ ./python-zip-strict -m pyperf timeit -s 'r = range(1000)' '[*zip(r, r)]'
Mean +- std dev: 45.2 us +- 2.0 us
$ ./python-zip-strict -m pyperf timeit -s 'r = range(1000)' '[*zip(r, r, strict=True)]'
Mean +- std dev: 45.2 us +- 1.4 us
```

Additionally, the size of a `zip` instance has not changed, and pickles of non-strict `zip` instances are unchanged as well.

Brandt

[0] And zip's current tuple caching, which is *very* clever.
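To illustrate the strategy of only checking strictness on the exhaustion branch, here is a hedged pure-Python sketch (the name `zip_strict` and the error messages are my own, and this is not the PEP's reference implementation or the actual C code, which also does tuple caching):

```python
def zip_strict(*iterables):
    # Hypothetical sketch of the proposed strict behavior. The length
    # check runs only when an iterator raises StopIteration, mirroring
    # the branch the C implementation of zip already takes on exhaustion;
    # the common path per iteration does no extra work.
    iterators = [iter(it) for it in iterables]
    if not iterators:
        return  # zip() with no arguments yields nothing
    while True:
        items = []
        for i, iterator in enumerate(iterators):
            try:
                items.append(next(iterator))
            except StopIteration:
                if items:
                    # An earlier argument still produced a value: mismatch.
                    raise ValueError(f"zip() argument {i + 1} is shorter")
                # The first iterator is exhausted; all others must be too.
                for j, rest in enumerate(iterators[1:], 2):
                    try:
                        next(rest)
                    except StopIteration:
                        pass
                    else:
                        raise ValueError(f"zip() argument {j} is longer")
                return
        yield tuple(items)
```

Note how the `except StopIteration` branch is the only place strictness matters, which is why the check adds nothing to the per-item cost in the benchmarks above.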