Accepting PEP 618: zip(strict=True)

After taking a break to recuperate from the vigorous debate, Brandt Bucher has revised PEP 618 <https://www.python.org/dev/peps/pep-0618/> and submitted it for review <https://github.com/python/peps/pull/1429>. I volunteered to be PEP-Delegate (the new term for the outdated BDFL-Delegate) and the SC has approved <https://github.com/python/steering-council/issues/28#issuecomment-644280869> me for this role. (Note that Antoine, the PEP's sponsor, declined to be the lightning rod, er, PEP-Delegate.)

I have now reviewed the PEP and skimmed some of the more recent discussion about the topic. It is clear that no solution will win everyone over. But most seem to agree that offering *some* solution for the stated problem is better than none.

To spare us more heartache, I am hereby accepting PEP 618. I expect that the implementation <https://github.com/python/cpython/pull/20921> will land soon.

I have two very minor editorial remarks, which Brandt may address at his leisure:

- The "Backward Compatibility" section could be beefed up slightly, e.g. by pointing out that the default remains strict=False and that zip previously did not take keyword arguments at all.

- The error messages are somewhat odd: why is the error sometimes that one iterator is too long, and other times that one iterator is too short? All we really know is that not all iterators have the same length, but the current phrasing seems to be assuming that the first iterator is never too short or too long.

Congratulations, Brandt!

--
--Guido van Rossum (python.org/~guido)
*Pronouns: he/him **(why is my pronoun here?)* <http://feministing.com/2015/02/03/how-using-they-as-a-singular-pronoun-can-c...>

Woo! Many thanks to Ram for the idea, Antoine for sponsoring, Guido for PEP-Delegating, and everyone on -Ideas and -Dev for the spirited discussion and review.

Brandt

Well done Brandt!

Even if only a few people had issues with zip(), I think that it was a long-awaited feature. It's great to have it in Python 3.10! It wasn't trivial or convenient to emulate the feature (manually checking the lengths) on Python 3.9 and older.

zip(strict=True) should help to write more reliable code. Maybe it's time to review stdlib code to check whether some functions would deserve the addition of strict=True?

I had a look and found a few suspicious uses of zip(). But I'm not sure if we want to make these functions stricter.

(*) For example, ast._Unparse.visit_Compare() uses "for o, e in zip(node.ops, node.comparators):" which expects the AST to be correct. But many projects modify the AST and even generate an AST from scratch. On the other hand, tolerating minor inconsistencies can also be seen as a feature of ast.unparse().

(*) Another example: dis.findlinestarts() expects co_lnotab to have a length which is a multiple of 2, but PyCode_New() doesn't provide such a guarantee:

    byte_increments = code.co_lnotab[0::2]
    line_increments = code.co_lnotab[1::2]
    for byte_incr, line_incr in zip(byte_increments, line_increments):
        ...

Hum, maybe it's a bug in codeobject.c, which should be stricter. The file uses "Py_ssize_t size = PyBytes_Size(co->co_lnotab) / 2;". Well, that's a minor issue. I don't expect a bug if co_lnotab has an odd length: the last byte is simply ignored.

(*) Another example in inspect.getclosurevars():

    nonlocal_vars = {
        var : cell.cell_contents
        for var, cell in zip(code.co_freevars, func.__closure__)
    }

I'm not sure that func.__closure__ and func.__code__.co_freevars are always consistent. For example, PyFunction_SetClosure() doesn't enforce this.

Victor

On Wed, Jun 17, 2020 at 01:14, Guido van Rossum <guido@python.org> wrote:
-- Night gathers, and now my watch begins. It shall not end until my death.
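To illustrate the manual emulation mentioned above for Python 3.9 and older, here is a minimal sketch built on itertools.zip_longest with a private sentinel. The helper name zip_equal and the sample lnotab bytes are assumptions made for illustration. This is not the PEP 618 implementation, and its error message is invented.

```python
from itertools import zip_longest

_MISSING = object()  # private sentinel that cannot appear in caller data


def zip_equal(*iterables):
    """Hypothetical helper: like zip(), but raise ValueError on a length mismatch."""
    for combo in zip_longest(*iterables, fillvalue=_MISSING):
        # A sentinel in the tuple means at least one iterable ran out early.
        if any(item is _MISSING for item in combo):
            raise ValueError("zip_equal() arguments have different lengths")
        yield combo


# Hypothetical odd-length table, mirroring the co_lnotab slicing pattern.
lnotab = bytes([6, 1, 8, 2, 4])

# Plain zip() silently ignores the trailing byte.
pairs = list(zip(lnotab[0::2], lnotab[1::2]))

# The strict variant surfaces the inconsistency instead.
try:
    list(zip_equal(lnotab[0::2], lnotab[1::2]))
    odd_length_detected = False
except ValueError:
    odd_length_detected = True
```

The sentinel approach works for arbitrary iterators, where calling len() up front is not possible.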

On 17.06.20 11:42, Victor Stinner wrote:
I did have such a plan:

1. Add the zip_equal() builtin and replace all calls of zip() with zip_equal().
2. Run tests and revert zip_equal() back to zip() until tests pass.
3. Manually review all remaining zip_equal() calls and leave only those which are absolutely correct.
4. Replace zip_equal() with zip(strict=True).

It would be easier if we added a new function instead of a new keyword argument to the existing function.

On Thu, Jun 18, 2020 at 8:06 AM Serhiy Storchaka <storchaka@gmail.com> wrote:
> It would be easier if we added a new function instead of a new keyword argument to the existing function.

We've implemented the new zip in our sitecustomize.py, and think the keyword makes it easier. I've instructed our development staff to examine all uses of zip as they come across them and add either "strict=True" or "strict=False" once they've determined which is appropriate. Any zip call without an explicit "strict=" will be deemed "unknown" and in need of further investigation.
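One way such an audit could be automated is sketched below with the standard ast module. It flags zip() calls that carry no explicit strict= keyword. The function name find_unreviewed_zip_calls and the sample source are hypothetical, and nothing here reflects the tooling actually used in that code base.

```python
import ast


def find_unreviewed_zip_calls(source):
    """Return line numbers of zip(...) calls that lack an explicit strict= keyword."""
    unreviewed = []
    for node in ast.walk(ast.parse(source)):
        if (
            isinstance(node, ast.Call)
            and isinstance(node.func, ast.Name)
            and node.func.id == "zip"
            and not any(kw.arg == "strict" for kw in node.keywords)
        ):
            unreviewed.append(node.lineno)
    return sorted(unreviewed)


# Hypothetical sample: one unreviewed call, two reviewed ones.
sample = """\
for a, b in zip(xs, ys):
    pass
pairs = list(zip(xs, ys, strict=True))
loose = list(zip(xs, ys, strict=False))
"""
unreviewed_lines = find_unreviewed_zip_calls(sample)
```

This sketch only matches the bare name zip; calls through an alias or through builtins.zip would need extra handling.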

On Thu, Jun 18, 2020 at 2:36 PM Eric Fahlgren <ericfahlgren@gmail.com> wrote:
That's actually a really nice validation of the choice to use a keyword -- none of the other options debated (which were all variations on "give the alternate behavior a different name") would offer the opportunity to state "I've thought about it and it's definitely okay that the iterables have different lengths at this call site." Sure, in most places this would just look redundant, but in large corporate code bases that's exactly the kind of thing that people like to do.

--
--Guido van Rossum (python.org/~guido)
*Pronouns: he/him **(why is my pronoun here?)* <http://feministing.com/2015/02/03/how-using-they-as-a-singular-pronoun-can-c...>

On 19.06.20 02:57, Guido van Rossum wrote:
I did not participate in the recent debates, but in the initial discussion it was proposed to add zip_equal(), and zip_shortest() as an alias of zip() (with the possibility of changing the behavior of zip() in the distant future).

There are several advantages of a different name over a boolean keyword argument:

1. It is easier to search for in the code.
2. It is easier to replace.
3. It looks more distinguishable.
4. It is faster to call a function with positional arguments than with keyword arguments (at the level of bytecode and at the level of parsing arguments).
5. And the implementation is simpler.

Of course that ship has already sailed, but I was surprised that the keyword variant won, given that it was not popular initially.

participants (8)
- Antoine Pitrou
- Brandt Bucher
- Eric Fahlgren
- Ethan Furman
- Guido van Rossum
- Serhiy Storchaka
- Stephen J. Turnbull
- Victor Stinner