On Sat, Apr 25, 2020 at 09:39:14AM -0700, Christopher Barker wrote: [...]
File handling code ought to be resilient in the face of such meaningless differences,
sure. But what difference is "meaningless" depends on the use case. For instance, comments or blank lines in the middle of a file may be a meaningless difference. And you'd want to handle that before zipping anyway.
Probably. But once you get past that simple zip(*files) pattern, the whole processing loop becomes more complex and it isn't so obvious that you'll end up using zip *at all* let alone the proposed strict version. [...]
So my argument is that anything you want zip_strict for is better handled with zip_longest -- including the case of just raising.
That is quite the leap! You make a decent case about handling empty lines in files, but extending that to "anything" is unwarranted.
Okay, fair point -- I should have said "just as well or better" rather than just better. And it's not an unwarranted leap, because you can easily implement zip_strict from zip_longest. zip_longest provides all the functionality of zip_strict, plus more:

* zip_strict can *only* raise if there is a mismatch;

* zip_longest can raise, or truncate, or pad the data with a default; you can transform the short data in any way you want.

* A few days ago, I needed a version of zip that simply ignored missing values:

      zip_shrinking(['ab', '1234', 'xyz']) --> a1x b2y 3z 4

  and I knocked up one using zip_longest in about thirty seconds.

If we could only have one version of zip, it would be a no-brainer to choose zip_longest.
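A minimal sketch of how both helpers might be built on top of zip_longest. The `_MISSING` sentinel, the choice of ValueError, and the single-argument signature of zip_shrinking (inferred from the example call) are assumptions for illustration, not the code referred to above:

```python
from itertools import zip_longest

# Hypothetical private sentinel: a unique object that cannot collide with real data.
_MISSING = object()

def zip_strict(*iterables):
    """Yield tuples like zip(), but raise if the inputs differ in length."""
    for combo in zip_longest(*iterables, fillvalue=_MISSING):
        if any(item is _MISSING for item in combo):
            # ValueError is an assumed choice; the thread leaves this open.
            raise ValueError("iterables have different lengths")
        yield combo

def zip_shrinking(iterables):
    """Yield tuples that simply omit values from exhausted inputs."""
    for combo in zip_longest(*iterables, fillvalue=_MISSING):
        yield tuple(item for item in combo if item is not _MISSING)

print(' '.join(''.join(t) for t in zip_shrinking(['ab', '1234', 'xyz'])))
# a1x b2y 3z 4
```

Both functions differ only in what they do when the sentinel shows up, which is the sense in which zip_longest is the more general building block.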
I honestly do not understand the resistance here. Yes, any change to the standard library should be carefully considered, and any change IS a disruption, and this proposed change may not be worth it. But arguing that it wouldn't ever be useful, I just don't get.
I am sorry if I have given you the impression that I believe that there is *never* any reason to validate equal lengths. That is not my position, and I apologise if I said anything that gave you that idea.

Of course I don't believe that there is no code anywhere in the world that could make use of a zip_strict, that would be silly, but I do have serious doubts that the use-cases are all three of sufficiently important, common, and performance sensitive to justify making it a builtin.

We have five options here:

- Status quo wins a draw: do nothing.
- Add a new builtin.
- Add a flag to zip().
- Add a zip_strict to itertools.
- Add a recipe to itertools showing how to do it.

If Raymond agrees, I wouldn't oppose a version in itertools, even though I have my doubts about its usefulness and I think that it will more often be an attractive nuisance than an actual help. But the barrier to entry for the stdlib is lower than for builtins.

I also dislike the proposed builtin API: bool flag arguments are, in my opinion, a code-smell.

Now I could be convinced to change my mind by a sufficiently compelling set of use-cases, but so far they've been disappointingly weak to my mind.

I also think that we don't yet have a good design for what it should do. Is the intent to make an assertion about program logic, as Brandt says? In this case, it should raise AssertionError and it should be disabled when other assertions are disabled. Or is the intent to have an exception we intend to catch and (somehow?) recover from? In this case, the most likely exception is ValueError.

I know some people don't care what exception code raises. I've seen lots of people raise AssertionError for bad user data, or missing files. I've seen people raise TypeError for things that have nothing to do with types. But for me, choosing the right exception type and behaviour is important. It's about communicating intent. [...]
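To make the two designs concrete, here is a rough sketch of each. The names zip_asserting and zip_validating, the `_MISSING` sentinel, and the decision to fall back to plain zip() when assertions are disabled are all hypothetical; nothing here is a proposed API:

```python
from itertools import zip_longest

_MISSING = object()  # hypothetical sentinel marking an exhausted input

def _has_gap(combo):
    return any(item is _MISSING for item in combo)

def zip_asserting(*iterables):
    """Assertion flavour: a claim about program logic, not meant to be caught."""
    if not __debug__:
        # Assertions are disabled (python -O): skip the check, behave like zip().
        yield from zip(*iterables)
        return
    for combo in zip_longest(*iterables, fillvalue=_MISSING):
        assert not _has_gap(combo), "iterables have different lengths"
        yield combo

def zip_validating(*iterables):
    """Validation flavour: an exception the caller may catch and recover from."""
    for combo in zip_longest(*iterables, fillvalue=_MISSING):
        if _has_gap(combo):
            raise ValueError("iterables have different lengths")
        yield combo

try:
    pairs = list(zip_validating("abc", [1, 2]))
except ValueError:
    pairs = []  # recover however the application sees fit
```

The two look nearly identical, but they promise different things to the caller: the first is silent under -O, the second always checks.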
* Many uses (most?) do expect the iterators to be of equal length.
  - The main exception to this may be when one of them is infinite, but how common is that, really?
Common and useful! Really. But plain old zip isn't going to go away, so let's leave this. [...]
So: if this were added, it would get some use. How much? Hard to know. Is it critically important? Absolutely not. But it's fully backward compatible and not a language change, so the barrier to entry is not all that high.
Of course it's a language change. If we add this to zip, other Python interpreters will have to follow once they catch up to version 3.9 or 3.10.
However, I agree with (I think) Brandt that the lack of a critical need means that a zip_strict() in itertools would get a LOT less use than a flag on zip itself.
So you and Brandt are arguing that the *less* useful this is, the more we should prefer to make it a builtin? For everything else, it goes the other way: aside from maybe the odd builtin left over from Python 1.0, things become builtin only if they are *more* useful, not less.

--
Steven