Re: [Python-Dev] Re: PEP 618: Add Optional Length-Checking To zip

On Fri, May 15, 2020 at 05:46:43PM +0200, Antoine Pitrou wrote:
I spent a significant amount of time and mental energy explaining in detail why a boolean flag is a poor API and is objectively the wrong interface here. This is not just a matter of personal taste: there are reasons why a flag is wrong here. Flags are for combining independent and orthogonal settings which can be combined. This is not such a feature. I think it is objectively the worst API on the table here, for reasons already discussed. - Your preferred option makes the strict zip version a second-class citizen of the language; - your preferred option is the least open to future enhancements; - your most hated option is the one which follows the Zen of Python the most closely (namely, the koan about having more namespaces); - and is the most object-oriented solution (it's effectively a method); - and most importantly you explicitly *oppose* every alternative API, giving them negative preferences; you would rather not have this zip variant at all than have an interface other than a boolean flag. If I needed this function[1], I'd accept it even if it were spelled `xyghasx.peyahc.flihaj()` and required me to set a global variable to make it work. So I can't help but interpret your total opposition to every other interface as a strong sign that you don't really need this zip variant at all. If your opinion is typical, perhaps we should just reject the PEP. Could you explain why you believe a bool flag is the only suitable interface? Objective reasons preferred please. The same goes for everyone else who gave a vote. If you have an objective reason for wanting this strict zip function to be a second class citizen that cannot be easily extended in the future, then please explain why. [1] I don't think I do. If I had my preference, thinking only of my own needs, I'd probably reject the PEP. But I acknowledge that others seem to need it. -- Steven

On 05/16/2020 01:16 AM, Steven D'Aprano wrote:
On Fri, May 15, 2020 at 05:46:43PM +0200, Antoine Pitrou wrote:
Did you mean for this to go to python-dev where the current discussion is at? -- ~Ethan~

On Sat, May 16, 2020 at 1:26 AM Steven D'Aprano <steve@pearwood.info> wrote:
Clearly it is not the objective truth, otherwise you would have an easier time convincing everyone. :-)
Sorry, I have to object to your use of the word "objectively". Clearly what's worst depends upon one's perspective.
- Your preferred option makes the strict zip version a second-class citizen of the language;
With any other option it will still be that -- "zip" is the dominant name here, (a) because it's so short, (b) because it's somehow a memorable word, (c) because it's been around for 20 years.
- your preferred option is the least open to future enhancements;
Given zip's stability, I doubt there will be a lot of other future enhancements any time soon. In Python's culture, a boolean flag is the most common way to modify the behavior of a function. The reasons have to do with tradition (lots of existing APIs use this pattern) as well as ease of implementation (*also* in the Zen!), and also with how people *think* about certain APIs. Zip-strict is "like zip, but strict(er) in its requirements".
- your most hated option is the one which follows the Zen of Python the most closely (namely, the koan about having more namespaces);
I'm sorry, I'm not buying this. While for *classes*, alternate constructors are a well-known pattern (dict.fromkeys(), datetime.fromtimestamp(), etc.), for *functions* (and almost everyone thinks of zip as a function -- including the docs <https://docs.python.org/3/library/functions.html#zip>!) this pattern is uncommon, and awkward to implement. (You have to write it as a separate function and then make that function a function attribute of the first function.) It is also quite uncommonly found -- no other builtin *function* uses it, and only one function in itertools uses it. I know there are 3rd party frameworks that use this convention, and as a general convention for a framework it's fine. But for an existing builtin function I think a lot of people will do a double-take when they read it in someone else's code (thinking it may be a typo). Whereas nobody will lift an eyebrow when they see zip(a, b, strict=True) -- even if they've never heard of it, they immediately know that it's a modification to the zip() behavior they know.
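The "function attribute" construction described in that parenthesis can be sketched as follows. This is only an illustration: `my_zip` and `_my_zip_strict` are hypothetical names, the length check assumes sized inputs (lists, strings, tuples), and `itertools.chain.from_iterable` is presumably the one itertools precedent being referred to.

```python
from itertools import chain


def my_zip(*iterables):
    """Hypothetical stand-in for the builtin zip (stops at the shortest input)."""
    return zip(*iterables)


def _my_zip_strict(*sequences):
    """Strict variant; for brevity this sketch only accepts sized sequences."""
    lengths = {len(seq) for seq in sequences}
    if len(lengths) > 1:
        raise ValueError("my_zip.strict() arguments have different lengths")
    return zip(*sequences)


# The variant has to be written separately and then attached by hand.
my_zip.strict = _my_zip_strict

pairs = list(my_zip.strict("ab", [1, 2]))

# The class-based precedent in itertools: an alternate constructor on chain.
flattened = list(chain.from_iterable([[1, 2], [3]]))
```

A call such as `my_zip.strict("abc", [1])` raises `ValueError`, while `my_zip("abc", [1])` silently truncates.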
- and is the most object-oriented solution (it's effectively a method);
That's not even an argument. (It's abuse <https://www.youtube.com/watch?v=ohDB5gbtaEQ>. :-)
I think that's a fallacy. Opposing a proposal is not the same as saying you would rather have nothing than that. It simply expresses a strong negative reaction. If I needed this function[1], I'd accept it even if it were spelled
Another straw-man. My personal vote, for example, is +1 on zip(strict=True) and -1 on zip.strict(). But if the SC chooses the latter, I'll happily use it.
Could you explain why you believe a bool flag is the only suitable interface? Objective reasons preferred please.
An unreasonable request. I've tried to explain above why in the context of where we are (Python 3.9) I find it the best option.
It will *always* be a second class citizen, no matter how you name it, unless you change the behavior of zip(), which is off the table.
Then I really have to wonder why you are so invested in convincing everyone that your proposal is the only one that's objectively acceptable. Finally, I have to clarify something. In the past I've often said that if you're thinking to introduce a boolean flag to an API that's always going to be passed as a constant (if at all), you're probably better off with a separate function. This would seem to be such a case. Yet I am not following my advice. Why? (a) It's a rule of thumb, and in this case I find zip_strict() just a bit less clean than zip(strict=True); and relegating it to itertools.zip_strict() makes it a lot less attractive. And (b) That rule is most important when the flag affects the *return type* of a function. This is because static checkers have a hard time with such APIs. (Almost-example: open(..., "rb") returns an IO[bytes] while open(..., "r") returns an IO[str].) PS. Why wasn't a new builtin zip_strict() on the menu? I think I would have given it at least +0.5, because of this rule of thumb. -- --Guido van Rossum (python.org/~guido) *Pronouns: he/him **(why is my pronoun here?)* <http://feministing.com/2015/02/03/how-using-they-as-a-singular-pronoun-can-c...>
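To illustrate point (b), here is a minimal sketch of what a flag that changes the return type costs in annotations. The `load` helper is hypothetical and only stands in for APIs like open(); the point is that each flag value needs its own `typing.overload` so a static checker can tell the results apart.

```python
from typing import Literal, Union, overload


@overload
def load(data: bytes, text: Literal[True]) -> str: ...
@overload
def load(data: bytes, text: Literal[False]) -> bytes: ...


def load(data: bytes, text: bool) -> Union[str, bytes]:
    """Hypothetical helper whose return type depends on a boolean flag."""
    return data.decode("utf-8") if text else data


as_text = load(b"hi", True)    # a checker infers str
as_bytes = load(b"hi", False)  # a checker infers bytes
```

A strict flag on zip would not need this treatment, since the result is the same kind of iterator of tuples either way.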

PS. Why wasn't a new builtin zip_strict() on the menu? I think I would have given it at least +0.5, because of this rule of thumb.
I would think that if zip_strict() were added as a builtin, then zip_longest() should be too. And the fact that zip_longest was not added as a builtin made me think that it was a non-starter. Which kinda brings up a point: in the example of string methods (formerly the string module) there are a number of separate functions that could have been one function with flags. And that works well. But partly because it's a well-defined namespace. We really don't want to clutter up builtins too much, and having such closely related functions in different namespaces really reduces the usability. -CHB -- Christopher Barker, PhD Python Language Consulting - Teaching - Scientific Software Development - Desktop GUI and Web Development - wxPython, numpy, scipy, Cython

On Sun, 17 May 2020 at 07:10, Christopher Barker <pythonchb@gmail.com> wrote:
I think that's just because zip_longest isn't a very compelling alternative to zip. I've known of it for a long time and I don't remember *ever* using it. If builtins had zip_shortest (i.e. current zip), zip_strict and zip_longest then I think I would use zip_strict 95% of the time, zip_shortest 5% of the time and zip_longest 0% of the time. -- Oscar

On 16/05/2020 17:14, Guido van Rossum wrote:
OK, let's put some numbers on this. We only have 9 votes in, but aside from Brandt (whose position is fairly obvious from the PEP) that includes most of the people who have expressed strong opinions one way or another. Ignoring the nuances of +/-0, we end up with: itertools.zip_strict(): +5.5 zip(strict=True): +1 zip.strict(): -1.9 zip(mode='strict'): -4 (I bet Steven is wishing he hadn't been so generous to the "strict=True" option now :-) Now I'm not fool enough to mistake a public vote for an objective truth (::looks pointedly around world politics::) but there are some interesting conclusions here. 1) Nobody (aside from Steven) wants a "mode" keyword. I suspect this is because both of the two main camps (zip(strict=True) and zip_strict()) have solid reasons against it; for the first, it's an unnecessary distraction, and for the second the existence of zip_longest() argues against it. 2) People don't hate zip.strict() as much as I had expected. 3) The PEP needs to come up with an actual argument against itertools.zip_strict(). The current dismissal ain't going to cut it. -- Rhodri James *-* Kynesim Ltd

On Sun, May 17, 2020 at 6:14 AM Rhodri James <rhodri@kynesim.co.uk> wrote:
Let me attempt a metaphor, which won't be perfect but may help: The safety one gets from strictness is a bit like driving a car wearing a seat belt. It is not fundamentally different from driving a car without a seat belt, and in most cases you hope the seat belt will not come into play. But it is a precaution that is worth taking in *most* circumstances (not all, e.g. for infants a standard seat belt won't work). The built-in approaches (whether spelled as zip(strict=True), or zip.strict(), or a new zip_strict() builtin) make it really easy for everybody to take the safety precaution where it is appropriate. By contrast, putting it in itertools where it has to be imported is like requiring someone to rent seat belts in order to use them in their car. Some safety-conscious folks will go to the trouble; others won't. itertools.zip_longest() is sort of a power tool for specialized situations—in our analogy, say, a pickup truck that has to be rented. Yes, with some work it *can* be used to emulate strict-zip, but most people won't think of it that way; they will think of it as something you only need in special situations. Is there some logic to the objection that it is weird to have two forms of zip (or one form with two variants) that are built in, and a third that is in itertools? Sure. But this seems to me a clear case of practicality beats purity. As an extremely common use case of zip (for many people), it will be most useful if it is built in. Nathan
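The emulation mentioned above (building a strict zip on top of itertools.zip_longest()) might look roughly like this. It is a sketch under assumptions, not the PEP's implementation: the name `zip_strict` and the error message are illustrative only.

```python
from itertools import zip_longest

_SENTINEL = object()  # private fill value that no caller can pass in


def zip_strict(*iterables):
    """Yield tuples like zip(), but raise ValueError on a length mismatch."""
    for row in zip_longest(*iterables, fillvalue=_SENTINEL):
        # A sentinel in the row means at least one input ran out early.
        if any(item is _SENTINEL for item in row):
            raise ValueError("zip_strict() arguments have different lengths")
        yield row


matched = list(zip_strict("ab", [1, 2]))
```

Because it is a generator, the error surfaces only when iteration reaches the mismatch, not when `zip_strict()` is called.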

On Sun, May 17, 2020 at 6:22 PM Nathan Schneider <neatnate@gmail.com> wrote:
Thanks Nathan, I think this is the right idea. To make it a bit less metaphorical, strict zip is essentially an assertion. The language doesn't need assert statements - instead of
```
assert cond, message
```
we could just write [1]:
```
if __debug__ and not cond:
    raise AssertionError(message)
```
or import an `assert` function and write:
```
__debug__ and assert(cond, message)
```
But it's good that we have the assert statement, because it makes it easy to write safe code and so people are encouraged to do so. Similarly, you can write code without writing tests. Often that's really tempting and writing tests feels a bit pointless. It may be obvious that the code works, and the tests won't reveal anything at the time. But even then it's helpful when someone later makes a breaking change and is alerted immediately. So we need frameworks to make testing as easy as possible so we can fight the temptation to not write tests. The Python community has taken this as far as pytest's AST magic just to be able to write `assert x == y` instead of `self.assertEqual(x, y)`. A strict zip often won't provide any benefit when it's written, as it's 'obvious' that the lengths involved are equal, but just like tests it can prevent regressions.
------
[1] Assuming that the compiler optimises away the statement entirely when `__debug__` is False. Right now it seems that CPython can optimise away `if __debug__:` and `if False:` but not `if __debug__ and True:` even though it collapses the condition to a constant False which it immediately tests for:
```
from dis import dis

def foo():
    if __debug__ or True:
        print(3)

dis(foo)

  5           0 LOAD_CONST               1 (False)
              2 POP_JUMP_IF_TRUE         4

  6     >>    4 LOAD_GLOBAL              0 (print)
              6 LOAD_CONST               3 (3)
              8 CALL_FUNCTION            1
             10 POP_TOP
             12 LOAD_CONST               0 (None)
             14 RETURN_VALUE
```

On 05/17/2020 10:18 AM, Alex Hall wrote:
But it's good that we have the assert statement, because it makes it easy to write safe code and so people are encouraged to do so.
I could not disagree more strongly with this. Every time I have seen assert used it was in such a way that the program could easily silently produce erroneous results or fail far from the error if asserts were turned off. -- ~Ethan~

On Sun, May 17, 2020 at 10:12 PM Ethan Furman <ethan@stoneleaf.us> wrote:
I assume you mean that you'd like the condition in the assert to always be checked (-O or not), not that the asserts actually change behaviour. But then, isn't that how all asserts are? The only way turning off an assert can't be a problem is if you're sure the condition is always true, and then it's not needed at all. Anyway, I don't think anyone is arguing that strict zip should be turned off by -O, so for a closer analogy, let's similarly imagine that asserts can't be turned off, and they're just a convenient way to check correctness. In that case assert would probably just be a function `assert(condition, message)` since there wouldn't be a need for a special syntax. We'd be faced with the same choice - builtin or standard library import? Again, I think it'd be best as a builtin to make checking for correctness as frictionless as possible. Actually in that situation many would argue not to include such a feature in the language at all, saying it's easy enough to use an if statement or define your own function. But that would again discourage people when they're feeling lazy and they'd just leave out the check entirely. Back to the real world. Consider the problem I think you're talking about: someone has used assert when you think they should have used if+raise to make sure the check is always there. Sometimes this is because they don't know asserts might be turned off, but there are other times when they know and just don't care enough. I know that's been me sometimes. That's evidence that programmers are lazy and will often choose the *slightly* more convenient option over safety. Also note that no one (AFAIK) solves this problem by writing their own function assert_(condition, message). It would be trivial, but writing it and importing it doesn't feel like it's worth the effort.
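For reference, the "trivial" helper mentioned at the end could be sketched like this; `assert_` and its default message are hypothetical, and unlike the assert statement it is never stripped by -O and always evaluates its arguments eagerly.

```python
def assert_(condition, message="assertion failed"):
    """Always-on check: raise AssertionError when the condition is falsy."""
    if not condition:
        raise AssertionError(message)


xs, ys = [1, 2, 3], ["a", "b", "c"]
assert_(len(xs) == len(ys), "xs and ys must have the same length")
pairs = list(zip(xs, ys))
```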

On Sun, May 17, 2020 at 12:22 PM Nathan Schneider <neatnate@gmail.com> wrote:
That's a really terrible analogy. :-( I never drive without a seat belt. And I never want the seat belt to actually matter, of course. Everyone who wants a zip_strict behavior (including me) wants to be able either to catch the exception explicitly or to have the program fail-fast/fail-hard because of it. In contrast, as I've said, more than half of the times that *I* use zip() it would be BROKEN by using zip_strict() instead (or zip(..., strict=True), or whichever spelling). Raising an exception for something I want to succeed, and I want to work exactly as it does (e.g. leave some iterators unconsumed) is not a "harmless safety precaution". If you want a better metaphor: Some door handles include locks, others do not. "Strict" ones have locks. So yes, it's possible to leave the lock in the unlocked position, and then it functions pretty much the same as one without a lock. But likewise, it's possible to leave the door in the locked position when you don't have the key on you, and you face a significant inconvenience that serves no purpose. I have some doors with locks, and some other doors without locks. I have both for a good reason, but the reasons are different, and depend on various things like whether a particular room is private or contains valuables. In truth though, I don't lock my outside doors because I live in a community of "consenting adults" (occasionally I do lock the bathroom door for privacy, for a short while... no-locks is definitely strongly my default mode, as is no-strict when I zip).
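Two small examples of the deliberately non-strict uses described above, where truncation is the intended behaviour. The variable names are illustrative only.

```python
from itertools import count

names = ["ada", "bob", "cy"]

# Pairing with an infinite iterator: stopping at the shorter input is the point.
numbered = list(zip(count(1), names))

# Taking a prefix of a stream and leaving the remainder unconsumed.
stream = iter(range(10))
head = list(zip("abc", stream))  # "abc" is exhausted first, so nothing extra is pulled from stream
rest = list(stream)              # the untouched tail of the stream
```

A strict zip would raise on the prefix case and can never succeed on the `count(1)` case, since an infinite iterator is always longer than `names`.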
--
The dead increasingly dominate and strangle both the living and the not-yet born. Vampiric capital and undead corporate persons abuse the lives and control the thoughts of homo faber. Ideas, once born, become abortifacients against new conceptions.

On Sun, May 17, 2020 at 1:32 PM David Mertz <mertz@gnosis.cx> wrote:
Good, I think we're getting to the crux of the usability debate. For some of us, strictness is a property that users often want when they use zip(), whether they are properly enforcing it or not—so giving them a really easy way to opt into it would help avoid bugs. (Personally, I cannot remember the last time I actually wanted non-strict zip.) I think David is saying that he more often wants non-strict zip, and it would be tempting and dangerous to make strict-zip too easy to use for those who aren't thinking carefully about their code, so it would be better to bury strict-zip in itertools for the power users who really know they need it. (Is that a fair summary?) As long as we are not changing the default behavior of zip(), I don't anticipate a ton of users using strict-zip unthinkingly—I would guess the risk of uncaught bugs with the status quo is much, much higher. Is there a precedent where a new non-default option was introduced and incorrect use of it became widespread? Nathan

On Sun, May 17, 2020 at 2:09 PM Nathan Schneider <neatnate@gmail.com> wrote:
The API matter is really orthogonal to this. My point here is that Nathan and some other discussants are operating under the assumption that: "Everyone really wants strict-zip but they just haven't had a way to express it conveniently. They all assume their iterables are the same length already, so this just adds a check." I disagree strongly with that assumption. I think that the actual majority of my uses of zip are non-strict. Admittedly, I have not scanned my code to count that; for that matter, most of the code I have written is no longer accessible to me, being written for companies I no longer work for (and not open source). But whatever the actual percentages might be, I COMMONLY want a non-strict zip by actual specific intention, not because I've made a wrong assumption about the nature of the iterables I use. Of course, I also commonly use zip with the implicit assumption that my iterables are the same length... I have most certainly written many lines where I would appropriately choose strict-zip if it existed (whichever API). To me, itertools is not some hidden vault only accessible after performing Herculean labors. I believe boolean mode switches are usually a bad design for Python. Not always, there are exceptions like open(). And I think Guido made the good point that one of the things that makes mode switches bad is when they change the return type, which the `strict=True` API would not do (but it would create a new kind of exception to worry about). In fact, itertools is pretty much the only module where I occasionally write `from itertools import *`. There are many good things in that module that are general purpose. But namespaces, after all, are a honkin' good idea. I think if `zip()` were proposed today as a brand new function that hadn't existed, I would advocate strongly for putting it inside itertools. Probably `map()` and `filter()` similarly.
So I don't want zip_strict() to join built-ins, but it's not because I think it is a niche case, but rather because I think we already have more in built-ins than we should, and some of it could very reasonably live in a more descriptive namespace. I would certainly not mind if we added `zip_shortest()` as a synonym for the current zip to itertools. -- The dead increasingly dominate and strangle both the living and the not-yet born. Vampiric capital and undead corporate persons abuse the lives and control the thoughts of homo faber. Ideas, once born, become abortifacients against new conceptions.

On 17/05/2020 19:43, David Mertz wrote:
I believe boolean mode switches are usually a bad design for Python. Not always, there are exceptions like open().
Actually, open() is a really bad example. It does have a flag, "closefd" which if False and a file descriptor was passed in rather than a filename leaves the file descriptor open when the file object is closed. The mode parameter that most people will be thinking about really is a mode parameter, not a flag; it folds together four basic opening modes (read, write, exclusive, append), an update flag and a text/binary flag. The former universal newlines flag got separated out to be a mode parameter all its own when it turned out not to be a simple flag after all. I seem to remember that separation being somewhat painful... -- Rhodri James *-* Kynesim Ltd

On 05/17/2020 12:02 PM, Rob Cliffe via Python-ideas wrote:
On 17/05/2020 19:43, David Mertz wrote:
Coefficient of friction. Importing from a stdlib module is not a hardship, so going from the status quo to "from itertools import zip_strict" is already making it "easy for them to do so". Adding it as a flag makes it too easy (temptingly easy according to the PEP) to use zip_strict, so now the pendulum will swing the other direction and we'll get spurious errors. Nothing dissuades from proper safety consciousness like spurious errors and excessive warnings. (YMMV) -- ~Ethan~

On Sun, May 17, 2020 at 02:43:52PM -0400, David Mertz wrote:
To me, itertools is not some hidden vault only accessible after performing Herculean labors.
+1 to David's observation here. I find it remarkable how people on this list who often argue "just put it on PyPI" as if that didn't condemn the proposal to die are now arguing that importing from itertools is an undue burden.
I believe boolean mode switches are usually a bad design for Python.
I don't think that Python is unique in that regard.
Not always, there are exceptions like open().
As Rhodri points out in another message, `open` is not an exception. In fact the opposite: `open` is an excellent example of *not* using bool flags. We have this: open(file, mode='rt', ...) not this: open(file, read=True, write=False, exclusive=False, append=False, binary=False, text=True, update=False, universal=True, ...) There are cases where bool flags are acceptable, even preferred. For example, the reverse=False parameter to sorted() seems to be okay because it is independent of, and orthogonal to, any other sorting parameters such as the key function, or the cmp function in Python 2. If you can compose the behaviour of the flag with the other parameters, it might be okay. For example, back to open: open(..., closefd=True, ...) seems to be fine, since it is independent of the values of the other parameters. You can pass closefd=False, or True, regardless of everything else. (There is a technical limitation that it must be True if the file is given by name, but it is independent of everything else.) This proposed mode is not a composable flag like reverse or closefd. If we treat it as a flag instead of a mode, then we either rule out future enhancements to zip (they *must* go into new functions), or commit to piling bool flag upon bool flag even though most of the combinations will be an error. E.g. there were proposals to make `shortest` an explicit parameter. You can't have `zip(shortest=True, strict=True, longest=True, ...)`.
I think that the return type is a red herring here. Here's a strawman function to demonstrate that the return type is not very important in and of itself:
    def encode(obj, text=True):
        if text:
            return json.dumps(obj)
        else:
            return pickle.dumps(obj).decode('latin1')
Note: json.dumps is another case where the bool flags are acceptable, because they all control orthogonal, independent features of the call. Passing skipkeys=True doesn't prevent you from also passing ensure_ascii=False. In contrast, you can't compose strict=True with other zip modes. -- Steven
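The composability argument can be made concrete with a short sketch. The sorted() and json.dumps() calls use their real, documented flags; the `MODES` tuple and the flag enumeration are a hypothetical model of "one bool per zip mode", not any proposed signature.

```python
import json
from itertools import product

words = ["pear", "Fig", "apple"]

# Orthogonal flags compose: any key can be combined with any reverse value.
plain = sorted(words)
by_lower_desc = sorted(words, key=str.lower, reverse=True)

# json.dumps flags are likewise independent of each other.
encoded = json.dumps({("a", "b"): 1, "k": "é"}, skipkeys=True, ensure_ascii=False)

# Mutually exclusive modes spelled as bool flags: count the usable combinations.
MODES = ("shortest", "strict", "longest")
combos = list(product((False, True), repeat=len(MODES)))
valid = [combo for combo in combos if sum(combo) == 1]  # exactly one mode chosen
```

With three such flags there are eight combinations and only three select exactly one mode, which is the "most of the combinations will be an error" situation.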

On Wed, May 20, 2020 at 6:46 AM Steven D'Aprano <steve@pearwood.info> wrote:
Are you sure it's the SAME people? I don't think I ever indicated that "just put it on PyPI" was not a MAJOR impediment to adoption. And I wouldn't say "undue burden" either, but a burden -- so not always the best place to put things ...
I believe boolean mode switches are usually a bad design
I'm still confused why the ternary flag (mode) idea never comes up in these arguments -- I know I like that the best. But yeah, I can accept that it's dead. -CHB -- Christopher Barker, PhD Python Language Consulting - Teaching - Scientific Software Development - Desktop GUI and Web Development - wxPython, numpy, scipy, Cython

On Wed, May 20, 2020 at 06:31:51PM -0700, Christopher Barker wrote:
Because the standard spelling of flags in ternary logic are True, False, and either Unknown or Maybe, and both `zip(*args, strict=Unknown)` and `strict=Maybe` are awful.
Because Python doesn't have builtin ternary flags.
Because the usual work-around for that, forcing the falsey value None to mean Unknown or Maybe, is rather odd; but even if it wasn't, `strict=None` is hardly a self-explanatory API.
Because a three-state flag is not extensible past three behaviours. (And I'm not even sure I know what you want as the third behaviour.)
If you want a *mode* parameter, the modes can be mnemonics for the behaviour:
    mode='strict'    # not tolerant of length mismatches
    mode='shortest'  # stops at the shortest argument;
    # possible future enhancements
    mode='longest'   # pads to the longest argument
    mode='skip'      # skips arguments as they become empty
    mode='that thing with StopIteration that Soni wants'
The modes wouldn't have to be strings, they could be enums, although we might not want them to be builtins. -- Steven
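A rough sketch of such a mode parameter, accepting either strings or enum members. Everything here is hypothetical (`ZipMode`, `zip_mode`, the error text); only the three modes with an obvious implementation are included.

```python
from enum import Enum
from itertools import zip_longest


class ZipMode(Enum):
    SHORTEST = "shortest"
    STRICT = "strict"
    LONGEST = "longest"


_MISSING = object()  # private fill value used to detect exhausted inputs


def _strict(*iterables):
    for row in zip_longest(*iterables, fillvalue=_MISSING):
        if any(item is _MISSING for item in row):
            raise ValueError("arguments have different lengths")
        yield row


def zip_mode(*iterables, mode="shortest"):
    """Dispatch on a single mode value instead of several bool flags."""
    mode = ZipMode(mode)  # accepts a string or a member; unknown values raise ValueError
    if mode is ZipMode.SHORTEST:
        return zip(*iterables)
    if mode is ZipMode.LONGEST:
        return zip_longest(*iterables)
    return _strict(*iterables)


padded = list(zip_mode("ab", [1], mode="longest"))
truncated = list(zip_mode("ab", [1], mode=ZipMode.SHORTEST))
```

Adding a further behaviour means adding one enum member and one branch; conflicting combinations cannot be expressed at all.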

On Thu, May 21, 2020 at 4:02 AM Steven D'Aprano <steve@pearwood.info> wrote:
OK, sorry for the imprecise language... I don't think anyone has suggested this type of ternary flag; I was using the term incorrectly, as "a flag that can take three (or maybe more) values", as there are only three on the table at the moment. Let me rephrase ... oh wait, you already did that for me: I'm still confused why the ternary flag mode idea never comes up in these arguments -- I know I like that the best. But yeah, I can accept that it's dead. a *mode* parameter, the modes can be mnemonics for the
(I seriously doubt anyone would go for them being a builtin enum) I personally far prefer strings in this kind of situation -- so much easier than having to find a namespace for the name. -CHB -- Christopher Barker, PhD Python Language Consulting - Teaching - Scientific Software Development - Desktop GUI and Web Development - wxPython, numpy, scipy, Cython

On 16/05/2020 17:14, Guido van Rossum wrote:
But Steven has a point. zip(mode='strict') would allow zip_longest behaviour to be incorporated into zip at some future time, should that be considered desirable. (And conceivably other behaviours that might crop up. I don't have any use cases, but one possibility might be "stop zipping after N items".) Whereas doing that with a boolean `strict` would lead, as others have pointed out, to an ugly API (2 booleans that can't both be True). Rob Cliffe

On 05/16/2020 01:16 AM, Steven D'Aprano wrote:
On Fri, May 15, 2020 at 05:46:43PM +0200, Antoine Pitrou wrote:
Did you mean for this to go to python-dev where the current discussion is at? -- ~Ethan~

On Sat, May 16, 2020 at 1:26 AM Steven D'Aprano <steve@pearwood.info> wrote:
Clearly it is not the objective truth, otherwise you would have an easier way convincing everyone. :-)
Sorry, Ihave to object to your use of the word "objectively". Clearly what's worst depends upon one's perspective.
- Your preferred option makes the strict zip version a second-class citizen of the language;
With any other option it will still be that -- "zip" is the dominant name here, (a) because it's so short, (b) because it's somehow a memorable word, (c) because it's been around for 20 years.
- your preferred option is the least open to future enhancements;
Given zip's stability, I doubt there will be a lot of other future enhancements any time soon. In Python's culture, boolean flag is the most common way to modify the behavior of a function. The reasons have to do with tradition (lots of existing APIs use this pattern) as well as ease of implementation (*also* in the Zen!), and also with how people *think* about certain APIs. Zip-strict is "like zip, but strict(er) in its requirements".
- your most hated option is the one which follows the Zen of Python the most closely (namely, the koan about having more namespaces);
I'm sorry, I'm not buying this. While for *classes* , alternate constructors are a well-known pattern (dict.from_keys(), datetime.fromtimestamp(), etc.), for *functions* (and almost everyone thinks of zip as a function -- including the docs <https://docs.python.org/3/library/functions.html#zip>!) this pattern is uncommon, and awkward to implement. (You have to write it as a separate function and then make that function a function attribute of the first function.) It is also quite uncommonly found -- no other builtin *function* uses it, and only one function in itertools uses it. I know there are 3rd party frameworks that use this convention, and as a general convention for a framework it's fine. But for an existing builtin function I think a lot of people will do a double-take when they read it in someone else's code (thinking it may be a typo). Whereas nobody will lift an eyebrow when they see zip(a, b, strict=True) -- even if they've never heard of it, they immediately know that it's a modification to the zip() behavior they know.
- and is the most object-oriented solution (it's effectively a method);
That's not even an argument. (It's abuse <https://www.youtube.com/watch?v=ohDB5gbtaEQ>. :-)
I think that's a fallacy. Opposing a proposal is not the same as saying you would rather have nothing than that. It simply expresses a strong negative reaction. If I needed this function[1], I'd accept it even if it were spelled
Another straw-man. My personal vote, for example, is +1 on zip(strict=True) and -1 on zip.strict(). But if the SC chooses the latter, I'll happily use it.
Could you explain why you believe a bool flag is the only suitable interface? Objective reasons preferred please.
An unreasonable request. I've tried to explain above why in the context of where we are (Python 3.9) I find it the best option.
It will *always* be a second class citizen, no matter how you name it, unless you change the behavior of zip(), which is off the table.
Then I really have to wonder why you are so invested in convincing everyone that your proposal is the only one that's objectively acceptable. Finally, I have to clarify something. In the past I've often said that if you're thinking to introduce a boolean flag to an API that's always going to be passed as a constant (if at all), you're probably better off with a separate function. This would seem to be such a case. Yet I am not following my advice. Why? (a) It's a rule of thumb, and in this case I find zip_strict() just a bit less clean than zip(strict=True); an relegating it to itertools.zip_strict() makes it a lot less attractive. And (b) That rule is most important when the flag affects the *return type* of a function. This is because static checkers have a hard time with such APIs. (Almost-example: open(..., "rb") returns an IO[bytes] while open(..., "r") returns an IO[str].) PS. Why wasn't a new builtin zip_strict() on the menu? I think I would have given it at least +0.5, because of this rule of thumb. -- --Guido van Rossum (python.org/~guido) *Pronouns: he/him **(why is my pronoun here?)* <http://feministing.com/2015/02/03/how-using-they-as-a-singular-pronoun-can-c...>

PS. Why wasn't a new builtin zip_strict() on the menu? I think I would have given it at least +0.5, because of this rule of thumb.
I would think that if zip_strict() were added as a builtin, then zip_longest() should be too. And the fact that zip_longest was not added as a builtin made me think that it was a non-starter. Which kinda brings up a point: in the example of string methods (formerly the string module) there are a number of separate functions that could have been one function with flags. And that works well. But partly because it’s a well-defined namespace. We really don’t want to clutter up builtins too much, and having such closely related functions in different namespaces really reduces the usability. -CHB -- Christopher Barker, PhD Python Language Consulting - Teaching - Scientific Software Development - Desktop GUI and Web Development - wxPython, numpy, scipy, Cython

On Sun, 17 May 2020 at 07:10, Christopher Barker <pythonchb@gmail.com> wrote:
I think that's just because zip_longest isn't a very compelling alternative to zip. I've known of it for a long time and I don't remember *ever* using it. If builtins had zip_shortest (i.e. current zip), zip_strict and zip_longest then I think I would use zip_strict 95% of the time, zip_shortest 5% of the time and zip_longest 0% of the time. -- Oscar

On 16/05/2020 17:14, Guido van Rossum wrote:
OK, let's put some numbers on this. We only have 9 votes in, but aside from Brandt (whose position is fairly obvious from the PEP) that includes most of the people who have expressed strong opinions one way or another. Ignoring the nuances of +/-0, we end up with: itertools.zip_strict(): +5.5 zip(strict=True): +1 zip.strict(): -1.9 zip(mode='strict'): -4 (I bet Steven is wishing he hadn't been so generous to the "strict=True" option now :-) Now I'm not fool enough to mistake a public vote for an objective truth (::looks pointedly around world politics::) but there are some interesting conclusions here. 1) Nobody (aside from Steven) wants a "mode" keyword. I suspect this is because both of the two main camps (zip(strict=True) and zip_strict()) have solid reasons against it; for the first, it's an unnecessary distraction, and for the second the existence of zip_longest() argues against it. 2) People don't hate zip.strict() as much as I had expected. 3) The PEP needs to come up with an actual argument against itertools.zip_strict(). The current dismissal ain't going to cut it. -- Rhodri James *-* Kynesim Ltd

On Sun, May 17, 2020 at 6:14 AM Rhodri James <rhodri@kynesim.co.uk> wrote:
Let me attempt a metaphor, which won't be perfect but may help: The safety one gets from strictness is a bit like driving a car wearing a seat belt. It is not fundamentally different from driving a car without a seat belt, and in most cases you hope the seat belt will not come into play. But it is a precaution that is worth taking in *most* circumstances (not all, e.g. for infants a standard seat belt won't work). The built-in approaches (whether spelled as zip(strict=True), or zip.strict(), or a new zip_strict() builtin) make it really easy for everybody to take the safety precaution where it is appropriate. By contrast, putting it in itertools where it has to be imported is like requiring someone to rent seat belts in order to use them in their car. Some safety-conscious folks will go to the trouble; others won't. itertools.zip_longest() is sort of a power tool for specialized situations—in our analogy, say, a pickup truck that has to be rented. Yes, with some work it *can* be used to emulate strict-zip, but most people won't think of it that way; they will think of it as something you only need in special situations. Is there some logic to the objection that it is weird to have two forms of zip (or one form with two variants) that are built in, and a third that is in itertools? Sure. But this seems to me a clear case of practicality beats purity. As an extremely common use case of zip (for many people), it will be most useful if it is built in. Nathan
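For readers wondering what "emulating strict-zip with zip_longest()" might look like, here is a minimal sketch. The name `zip_strict` is the one used in this thread, but the implementation, the `_MISSING` sentinel and the error message are assumptions made for illustration, not the PEP's reference implementation.

```python
from itertools import zip_longest

_MISSING = object()  # sentinel that cannot appear in user data


def zip_strict(*iterables):
    """Yield tuples like zip(), but raise ValueError on a length mismatch."""
    for combo in zip_longest(*iterables, fillvalue=_MISSING):
        # A sentinel in the tuple means at least one argument ran out early.
        if any(item is _MISSING for item in combo):
            raise ValueError("zip_strict() arguments have different lengths")
        yield combo


pairs = list(zip_strict("abc", [1, 2, 3]))

try:
    list(zip_strict("abc", [1, 2]))
    mismatch_detected = False
except ValueError:
    mismatch_detected = True
```

The sentinel is what makes this "some work": a plain `fillvalue=None` could not distinguish padding from a legitimate `None` in the data.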

On Sun, May 17, 2020 at 6:22 PM Nathan Schneider <neatnate@gmail.com> wrote:
Thanks Nathan, I think this is the right idea. To make it a bit less metaphorical, strict zip is essentially an assertion. The language doesn't need assert statements - instead of

```
assert cond, message
```

we could just write [1]:

```
if __debug__ and not cond:
    raise AssertionError(message)
```

or import an `assert` function and write:

```
__debug__ and assert(cond, message)
```

But it's good that we have the assert statement, because it makes it easy to write safe code and so people are encouraged to do so.

Similarly, you can write code without writing tests. Often that's really tempting and writing tests feels a bit pointless. It may be obvious that the code works, and the tests won't reveal anything at the time. But even then it's helpful when someone later makes a breaking change and is alerted immediately. So we need frameworks to make testing as easy as possible so we can fight the temptation to not write tests. The Python community has taken this as far as pytest's AST magic just to be able to write `assert x == y` instead of `self.assertEqual(x, y)`.

A strict zip often won't provide any benefit when it's written, as it's 'obvious' that the lengths involved are equal, but just like tests it can prevent regressions.

------

[1] Assuming that the compiler optimises away the statement entirely when `__debug__` is False. Right now it seems that CPython can optimise away `if __debug__:` and `if False:` but not `if __debug__ and True:` even though it collapses the condition to a constant False which it immediately tests for:

```
from dis import dis

def foo():
    if __debug__ or True:
        print(3)

dis(foo)

  5           0 LOAD_CONST               1 (False)
              2 POP_JUMP_IF_TRUE         4

  6     >>    4 LOAD_GLOBAL              0 (print)
              6 LOAD_CONST               3 (3)
              8 CALL_FUNCTION            1
             10 POP_TOP
             12 LOAD_CONST               0 (None)
             14 RETURN_VALUE
```

On 05/17/2020 10:18 AM, Alex Hall wrote:
But it's good that we have the assert statement, because it makes it easy to write safe code and so people are encouraged to do so.
I could not disagree more strongly with this. Every time I have seen assert used it was in such a way that the program could easily silently produce erroneous results or fail far from the error if asserts were turned off. -- ~Ethan~

On Sun, May 17, 2020 at 10:12 PM Ethan Furman <ethan@stoneleaf.us> wrote:
I assume you mean that you'd like the condition in the assert to always be checked (-O or not), not that the asserts actually change behaviour. But then, isn't that how all asserts are? The only way turning off an assert can't be a problem is if you're sure the condition is always true, and then it's not needed at all. Anyway, I don't think anyone is arguing that strict zip should be turned off by -O, so for a closer analogy, let's similarly imagine that asserts can't be turned off, and they're just a convenient way to check correctness. In that case assert would probably just be a function `assert(condition, message)` since there wouldn't be a need for a special syntax. We'd be faced with the same choice - builtin or standard library import? Again, I think it'd be best as a builtin to make checking for correctness as frictionless as possible. Actually in that situation many would argue not to include such a feature in the language at all, saying it's easy enough to use an if statement or define your own function. But that would again discourage people when they're feeling lazy and they'd just leave out the check entirely. Back to the real world. Consider the problem I think you're talking about: someone has used assert when you think they should have used if+raise to make sure the check is always there. Sometimes this is because they don't know asserts might be turned off, but there are other times when they know and just don't care enough. I know that's been me sometimes. That's evidence that programmers are lazy and will often choose the *slightly* more convenient option over safety. Also note that no one (AFAIK) solves this problem by writing their own function assert_(condition, message). It would be trivial, but writing it and importing it doesn't feel like it's worth the effort.
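To show how small the hypothetical `assert_(condition, message)` helper mentioned above would be, here is a minimal sketch. The default message and the `mean()` caller are assumptions added for illustration; unlike the `assert` statement, this check still runs under `-O`.

```python
def assert_(condition, message="assertion failed"):
    """Raise AssertionError when condition is false, even under -O."""
    if not condition:
        raise AssertionError(message)


def mean(values):
    # The check is an ordinary function call, so -O does not strip it.
    assert_(len(values) > 0, "mean() needs at least one value")
    return sum(values) / len(values)
```

That it fits in four lines and is still rarely written supports the point that even a small amount of friction is enough to discourage a check.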

On Sun, May 17, 2020 at 12:22 PM Nathan Schneider <neatnate@gmail.com> wrote:
That's a really terrible analogy. :-( I never drive without a seat belt. And I never want the seat belt to actually matter, of course. Everyone who wants a zip_strict behavior (including me) wants to be able either to catch the exception explicitly or to have the program fail-fast/fail-hard because of it. In contrast, as I've said, more than half of the times that *I* use zip() it would be BROKEN by using zip_strict() instead (or zip(..., strict=True), or whichever spelling). Raising an exception for something I want to succeed, and I want to work exactly as it does (e.g. leave some iterators unconsumed) is not a "harmless safety precaution". If you want a better metaphor: Some door handles include locks, others do not. "Strict" ones have locks. So yes, it's possible to leave the lock in the unlocked position, and then it functions pretty much the same as one without a lock. But likewise, it's possible to leave the door in the locked position when you don't have the key on you, and you face a significant inconvenience that serves no purpose. I have some doors with locks, and some other doors without locks. I have both for a good reason, but the reasons are different, and depend on various things like whether a particular room is private or contains valuables. In truth though, I don't lock my outside doors because I live in a community of "consenting adults" (occasionally I do lock the bathroom door for privacy, for a short while... no-locks is definitely strongly my default mode, as is no-strict when I zip).
--
The dead increasingly dominate and strangle both the living and the not-yet born. Vampiric capital and undead corporate persons abuse the lives and control the thoughts of homo faber. Ideas, once born, become abortifacients against new conceptions.
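A few cases where truncating at the shortest argument is the intended behaviour, and where a strict variant would raise instead. These examples were invented for illustration and are not taken from the message above.

```python
from itertools import count

names = ["ann", "bob", "cy"]

# Pair each name with a running index; count() never ends on its own.
numbered = list(zip(count(1), names))

# Pair each item with its successor; the second argument is one shorter.
steps = list(zip(names, names[1:]))

# Take a fixed-size prefix from an iterator and keep using the rest.
# range(3) comes first so that zip() stops before pulling a fourth
# value out of the stream.
stream = iter(range(10))
head = [value for _, value in zip(range(3), stream)]
rest = list(stream)
```

The last case is the "leave some iterators unconsumed" pattern: the argument order decides whether an extra item is pulled from `stream` and lost.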

On Sun, May 17, 2020 at 1:32 PM David Mertz <mertz@gnosis.cx> wrote:
Good, I think we're getting to the crux of the usability debate. For some of us, strictness is a property that users often want when they use zip(), whether they are properly enforcing it or not—so giving them a really easy way to opt into it would help avoid bugs. (Personally, I cannot remember the last time I actually wanted non-strict zip.) I think David is saying that he more often wants non-strict zip, and it would be tempting and dangerous to make strict-zip too easy to use for those who aren't thinking carefully about their code, so it would be better to bury strict-zip in itertools for the power users who really know they need it. (Is that a fair summary?) As long as we are not changing the default behavior of zip(), I don't anticipate a ton of users using strict-zip unthinkingly—I would guess the risk of uncaught bugs with the status quo is much, much higher. Is there a precedent where a new non-default option was introduced and incorrect use of it became widespread? Nathan

On Sun, May 17, 2020 at 2:09 PM Nathan Schneider <neatnate@gmail.com> wrote:
The API matter is really orthogonal to this. My point here is that Nathan and some other discussants are operating under the assumption that: "Everyone really wants strict-zip but they just haven't had a way to express it conveniently. They all assume their iterables are the same length already, so this just adds a check." I disagree strongly with that assumption. I think that the actual majority of my uses of zip are non-strict. Admittedly, I have not scanned my code to count that; for that matter, most of the code I have written is no longer accessible to me, being written for companies I no longer work for (and not open source). But whatever the actual percentages might be, I COMMONLY want a non-strict zip by actual specific intention, not because I've made a wrong assumption about the nature of the iterables I use. Of course, I also commonly use zip with the implicit assumption that my iterables are the same length... I have most certainly written many lines where I would appropriately choose strict-zip if it existed (whichever API). To me, itertools is not some hidden vault only accessible after performing Herculean labors. I believe boolean mode switches are usually a bad design for Python. Not always, there are exceptions like open(). And I think Guido made the good point that one of the things that makes mode switches bad is when they change the return type, which the `strict=True` API would not do (but it would create a new kind of exception to worry about). In fact, itertools is pretty much the only module where I occasionally write `from itertools import *`. There are many good things in that module that are general purpose. But namespaces, after all, are a honkin' good idea. I think if `zip()` were proposed today as a brand new function that hadn't existed, I would advocate strongly for putting it inside itertools. Probably `map()` and `filter()` similarly.
So I don't want zip_strict() to join built-ins, but it's not because I think it is a niche case, but rather because I think we already have more in built-ins than we should, and some of it could very reasonably live in a more descriptive namespace. I would certainly not mind if we added `zip_shortest()` as a synonym for the current zip to itertools. -- The dead increasingly dominate and strangle both the living and the not-yet born. Vampiric capital and undead corporate persons abuse the lives and control the thoughts of homo faber. Ideas, once born, become abortifacients against new conceptions.

On 17/05/2020 19:43, David Mertz wrote:
I believe boolean mode switches are usually a bad design for Python. Not always, there are exceptions like open().
Actually, open() is a really bad example. It does have a flag, "closefd" which if False and a file descriptor was passed in rather than a filename leaves the file descriptor open when the file object is closed. The mode parameter that most people will be thinking about really is a mode parameter, not a flag; it folds together four basic opening modes (read, write, exclusive, append), an update flag and a text/binary flag. The former universal newlines flag got separated out to be a mode parameter all its own when it turned out not to be a simple flag after all. I seem to remember that separation being somewhat painful... -- Rhodri James *-* Kynesim Ltd
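The difference between the `closefd` flag and the `mode` string can be seen in a short sketch. The temporary file and the variable names are assumptions made for illustration; the open() behaviour shown is the documented one.

```python
import os
import tempfile

fd, path = tempfile.mkstemp()
try:
    # closefd=False: closing the file object leaves the descriptor open.
    with open(fd, "w", closefd=False) as handle:
        handle.write("hello")
    os.fstat(fd)  # would raise OSError if fd had been closed
    os.close(fd)

    # The mode string folds several settings into one argument.
    with open(path, "rb") as handle:
        raw = handle.read()
    with open(path, "r") as handle:
        text = handle.read()

    # closefd=False is rejected when a file name is given.
    try:
        open(path, closefd=False)
        rejected = False
    except ValueError:
        rejected = True
finally:
    os.remove(path)
```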

On 05/17/2020 12:02 PM, Rob Cliffe via Python-ideas wrote:
On 17/05/2020 19:43, David Mertz wrote:
Coefficient of friction. Importing from a stdlib module is not a hardship, so going from the status quo to "from itertools import zip_strict" is already making it "easy for them to do so". Adding it as a flag makes it too easy (temptingly easy according to the PEP) to use zip_strict, so now the pendulum will swing the other direction and we'll get spurious errors. Nothing dissuades from proper safety consciousness like spurious errors and excessive warnings. (YMMV) -- ~Ethan~

On Sun, May 17, 2020 at 02:43:52PM -0400, David Mertz wrote:
To me, itertools is not some hidden vault only accessible after performing Herculean labors.
+1 to David's observation here. I find it remarkable how people on this list who often argue "just put it on PyPI" as if that didn't condemn the proposal to die are now arguing that importing from itertools is an undue burden.
I believe boolean mode switches are usually a bad design for Python.
I don't think that Python is unique in that regard.
Not always, there are exceptions like open().
As Rhodri points out in another message, `open` is not an exception. In fact the opposite: `open` is an excellent example of *not* using bool flags. We have this: open(file, mode='rt', ...) not this: open(file, read=True, write=False, exclusive=False, append=False, binary=False, text=True, update=False, universal=True, ...) There are cases where bool flags are acceptable, even preferred. For example, the reverse=False parameter to sorted() seems to be okay because it is independent of, and orthogonal to, any other sorting parameters such as the key function, or the cmp function in Python 2. If you can compose the behaviour of the flag with the other parameters, it might be okay. For example, back to open: open(..., closefd=True, ...) seems to be fine, since it is independent of the values of the other parameters. You can pass closefd=False, or True, regardless of everything else. (There is a technical limitation that it must be True if the file is given by name, but it is independent of everything else.) This proposed mode is not a composable flag like reverse or closefd. If we treat it as a flag instead of a mode, then we either rule out future enhancements to zip (they *must* go into new functions), or commit to piling bool flag upon bool flag even though most of the combinations will be an error. E.g. there were proposals to make `shortest` an explicit parameter. You can't have `zip(shortest=True, strict=True, longest=True, ...)`.
I think that the return type is a red herring here. Here's a strawman function to demonstrate that the return type is not very important in and of itself:

    def encode(obj, text=True):
        if text:
            return json.dumps(obj)
        else:
            return pickle.dumps(obj).decode('latin1')

Note: json.dumps is another case where the bool flags are acceptable, because they all control orthogonal, independent features of the call. Passing skipkeys=True doesn't prevent you from also passing ensure_ascii=False. In contrast, you can't compose strict=True with other zip modes. -- Steven
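A small sketch of what "composable" means for the flags named above. The sample data is invented for illustration; `sorted()` and `json.dumps()` are used with their documented parameters only.

```python
import json

words = ["pear", "Fig", "apple"]

# key and reverse are independent: every combination is meaningful.
by_length_desc = sorted(words, key=len, reverse=True)
by_lower_asc = sorted(words, key=str.lower, reverse=False)

# json.dumps flags also combine freely with one another.
data = {"b": "\u00e9", "a": 1}
compact = json.dumps(data, sort_keys=True, ensure_ascii=False)
escaped = json.dumps(data, sort_keys=True, ensure_ascii=True)
```

No pair of these flags needs a check that rejects the combination, which is the property the argument says a `strict` flag would lack once other zip behaviours are added.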

On Wed, May 20, 2020 at 6:46 AM Steven D'Aprano <steve@pearwood.info> wrote:
Are you sure it's the SAME people? I don't think I ever indicated that "just put it on PyPI" was not a MAJOR impediment to adoption. And I wouldn't say "undue burden" either, but a burden -- so not always the best place to put things ...
I believe boolean mode switches are usually a bad design
I'm still confused why the ternary flag (mode) idea never comes up in these arguments -- I know I like that the best. But yeah, I can accept that it's dead. -CHB -- Christopher Barker, PhD Python Language Consulting - Teaching - Scientific Software Development - Desktop GUI and Web Development - wxPython, numpy, scipy, Cython

On Wed, May 20, 2020 at 06:31:51PM -0700, Christopher Barker wrote:
Because the standard spelling of flags in ternary logic are True, False, and either Unknown or Maybe, and both `zip(*args, strict=Unknown)` and `strict=Maybe` are awful.

Because Python doesn't have builtin ternary flags.

Because the usual work-around for that, forcing the falsey value None to mean Unknown or Maybe, is rather odd; but even if it wasn't, `strict=None` is hardly a self-explanatory API.

Because a three-state flag is not extensible past three behaviours. (And I'm not even sure I know what you want as the third behaviour.)

If you want a *mode* parameter, the modes can be mnemonics for the behaviour:

    mode='strict'    # not tolerant of length mismatches
    mode='shortest'  # stops at the shortest argument;

    # possible future enhancements
    mode='longest'   # pads to the longest argument
    mode='skip'      # skips arguments as they become empty
    mode='that thing with StopIteration that Soni wants'

The modes wouldn't have to be strings, they could be enums, although we might not want them to be builtins. -- Steven
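To make the mode idea concrete, here is a minimal sketch of a dispatcher over the first three mode names listed above. The function `zip_mode`, its `fillvalue` parameter and the error messages are hypothetical; nothing like this was proposed as an implementation in the thread.

```python
from itertools import zip_longest

_MISSING = object()  # sentinel for detecting exhausted arguments


def zip_mode(*iterables, mode="shortest", fillvalue=None):
    """Hypothetical zip variant that selects its behaviour by name."""
    if mode == "shortest":
        return zip(*iterables)
    if mode == "longest":
        return zip_longest(*iterables, fillvalue=fillvalue)
    if mode == "strict":
        return _zip_strict(iterables)
    raise ValueError(f"unknown mode: {mode!r}")


def _zip_strict(iterables):
    # Pad with a private sentinel; seeing it means the lengths differ.
    for combo in zip_longest(*iterables, fillvalue=_MISSING):
        if any(item is _MISSING for item in combo):
            raise ValueError("arguments have different lengths")
        yield combo
```

Adding a further behaviour would mean one more branch and one more string, whereas with boolean flags it would mean another keyword plus a check that rejects conflicting combinations.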

On Thu, May 21, 2020 at 4:02 AM Steven D'Aprano <steve@pearwood.info> wrote:
OK, sorry for the imprecise language... I don't think anyone has suggested this type of ternary flag; I was using the term incorrectly, as "a flag that can take three (or maybe more) values", as there are only three on the table at the moment. Let me rephrase ... oh wait, you already did that for me: I'm still confused why the ternary flag mode idea never comes up in these arguments -- I know I like that the best. But yeah, I can accept that it's dead. a *mode* parameter, the modes can be mnemonics for the
(I seriously doubt anyone would go for them being a builtin enum) I personally far prefer strings in this kind of situation -- so much easier than having to find a namespace for the name. -CHB -- Christopher Barker, PhD Python Language Consulting - Teaching - Scientific Software Development - Desktop GUI and Web Development - wxPython, numpy, scipy, Cython

On 16/05/2020 17:14, Guido van Rossum wrote:
But Steven has a point. zip(mode='strict') would allow zip_longest behaviour to be incorporated into zip at some future time, should that be considered desirable. (And conceivably other behaviours that might crop up. I don't have any use cases, but one possibility might be "stop zipping after N items".) Whereas doing that with a boolean `strict` would lead, as others have pointed out, to an ugly API (2 booleans that can't both be True). Rob Cliffe
participants (10)
- Alex Hall
- Christopher Barker
- David Mertz
- Ethan Furman
- Guido van Rossum
- Nathan Schneider
- Oscar Benjamin
- Rhodri James
- Rob Cliffe
- Steven D'Aprano