[Python-ideas] Re: zip(x, y, z, strict=True)

26 Apr 2020

On Sat, Apr 25, 2020 at 09:39:14AM -0700, Christopher Barker wrote:
[...]
...
...
File handling code ought to be resilient in the face of such meaningless
differences,
sure. But what difference is "meaningless" depends on the use case. For
instance, comments or blank lines in the middle of a file may be a
meaningless difference. And you'd want to handle that before zipping
anyway.
Probably. But once you get past that simple zip(*files) pattern, the 
whole processing loop becomes more complex and it isn't so obvious that 
you'll end up using zip *at all* let alone the proposed strict version.
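A minimal sketch of "handle that before zipping", assuming (hypothetically) that blank lines and lines starting with "#" are the meaningless differences. The helper name meaningful_lines is illustrative, not an existing API, and io.StringIO stands in for real files. Whether the cleaned streams should also be length-checked is a separate question.

```python
import io


def meaningful_lines(f):
    # Hypothetical helper: skip blank lines and '#' comment lines,
    # yielding the remaining lines with surrounding whitespace removed.
    for line in f:
        stripped = line.strip()
        if stripped and not stripped.startswith("#"):
            yield stripped


# Two in-memory "files" that differ only in a blank line and a comment.
a = io.StringIO("1\n\n# header comment\n2\n")
b = io.StringIO("x\ny\n")

# Clean each stream first, then zip the cleaned streams.
pairs = list(zip(meaningful_lines(a), meaningful_lines(b)))
```

Here pairs is [('1', 'x'), ('2', 'y')]: the blank line and the comment are removed before zip ever sees them.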

[...]
...
...
So my argument is that anything you want zip_strict for is better
handled with zip_longest -- including the case of just raising.
That is quite the leap! You make a decent case about handling empty lines
in files, but extending that to "anything" is unwarranted.
Okay, fair point -- I should have said "just as well or better" rather 
than just better. And it's not an unwarranted leap, because you can 
easily implement zip_strict from zip_longest.
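For illustration, here is one way such a zip_strict could be built on itertools.zip_longest. This is a sketch under assumptions: the name zip_strict and the choice of ValueError are not an agreed API. A private sentinel is used as the fill value so it cannot be confused with real data, including None.

```python
from itertools import zip_longest

_MISSING = object()  # private sentinel; never equal to (or identical with) real data


def zip_strict(*iterables):
    """Like zip(), but raise ValueError if the iterables have unequal lengths.

    Hypothetical sketch built on zip_longest, not a standard-library function.
    """
    for row in zip_longest(*iterables, fillvalue=_MISSING):
        # Identity test, so objects with unusual __eq__ methods cannot fool it.
        if any(item is _MISSING for item in row):
            raise ValueError("zip_strict: iterables have different lengths")
        yield row
```

Note that the check is lazy: the ValueError is raised only when iteration reaches the point where one input runs out, so any earlier tuples have already been produced.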

zip_longest provides all the functionality of zip_strict, plus more:

* zip_strict can *only* raise if there is a mismatch;

* zip_longest can raise, or truncate, or pad the data with a default; 
  you can transform the short data in any way you want.

* A few days ago, I needed a version of zip that simply ignored missing
  values:

      zip_shrinking(['ab', '1234', 'xyz'])
      --> a1x b2y 3z 4

  and I knocked up one using zip_longest in about thirty seconds.
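A zip_shrinking like the one described might look like this. It is a sketch, and the exact code used is not shown in the post. It takes a single iterable of iterables, matching the call above, and uses a private sentinel as the zip_longest fill value, then drops it from each row.

```python
from itertools import zip_longest

_MISSING = object()  # private sentinel marking "this input has run out"


def zip_shrinking(iterables):
    # Hypothetical helper: like zip_longest, but omit missing values
    # instead of padding, so later tuples shrink as inputs run out.
    for row in zip_longest(*iterables, fillvalue=_MISSING):
        yield tuple(item for item in row if item is not _MISSING)


result = " ".join("".join(t) for t in zip_shrinking(["ab", "1234", "xyz"]))
```

result is "a1x b2y 3z 4", the output shown above. The underlying tuples are ('a', '1', 'x'), ('b', '2', 'y'), ('3', 'z') and ('4',).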

If we could only have one version of zip, it would be a no-brainer to 
choose zip_longest.
...
I honestly do not understand the resistance here. Yes, any change to the
standard library should be carefully considered, and any change IS a
disruption, and this proposed change may not be worth it. But arguing that
it wouldn't ever be useful, I just don't get.
I am sorry if I have given you the impression that I believe that there 
is *never* any reason to validate equal lengths. That is not my 
position, and I apologise if I said anything that gave you that idea.

Of course I don't believe that there is no code anywhere in the world 
that could make use of a zip_strict, that would be silly, but I do have 
serious doubts that the use-cases are sufficiently important, common, 
and performance-sensitive (all three) to justify making it a builtin.

We have five options here:

- Status quo wins a draw: do nothing.
- Add a new builtin.
- Add a flag to zip().
- Add a zip_strict to itertools.
- Add a recipe to itertools showing how to do it.

If Raymond agrees, I wouldn't oppose a version in itertools, even though 
I have my doubts about its usefulness and I think that it will more 
often be an attractive nuisance than an actual help. But the barrier to 
entry for the stdlib is lower than for builtins.

I also dislike the proposed builtin API: bool flag arguments are, in 
my opinion, a code-smell.

Now I could be convinced to change my mind by a sufficiently compelling 
set of use-cases, but so far they've been disappointingly weak to my 
mind.

I also think that we don't yet have a good design for what it should do. 
Is the intent to make an assertion about program logic, as Brandt says? 
In this case, it should raise AssertionError and it should be disabled 
when other assertions are disabled.

Or is the intent to have an exception we intend to catch and (somehow?) 
recover from? In this case, the most likely exception is ValueError.

I know some people don't care what exception code raises. I've seen lots 
of people raise AssertionError for bad user data, or missing files. I've 
seen people raise TypeError for things that have nothing to do with 
types. But for me, choosing the right exception type and behaviour is
important. It's about communicating intent.
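To make the two intents concrete, here is a sketch of both designs side by side. Both function names are hypothetical and neither reflects a decided API. The first treats a mismatch as a logic error, so its check disappears when Python runs with -O. The second treats it as a recoverable error that callers can catch.

```python
from itertools import zip_longest

_MISSING = object()  # private sentinel marking an exhausted input


def zip_asserting(*iterables):
    # Assertion flavour: checks program logic. Under `python -O` the assert
    # is stripped and this simply stops early, much like plain zip().
    for row in zip_longest(*iterables, fillvalue=_MISSING):
        short = any(item is _MISSING for item in row)
        assert not short, "iterables have different lengths"
        if short:
            return  # only reached when assertions are disabled
        yield row


def zip_validating(*iterables):
    # Validation flavour: always checked, raises an exception meant to be caught.
    for row in zip_longest(*iterables, fillvalue=_MISSING):
        if any(item is _MISSING for item in row):
            raise ValueError("iterables have different lengths")
        yield row
```

The loops are nearly identical. What differs is what the exception communicates, and whether the check survives optimised runs.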

[...]
...
* Many uses (most?) do expect the iterators to be of equal length.
  - The main exception to this may be when one of them is infinite, but how
    common is that, really?
Common and useful! Really.
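Two common patterns of this kind are sketched below, using itertools.count and itertools.repeat (the data is made up for illustration). In both, the infinite side never runs out, so a strict length check would always fail and zip_longest would never terminate. Plain zip's truncating behaviour is exactly what is wanted.

```python
from itertools import count, repeat

names = ["alpha", "beta", "gamma"]

# Number items starting at 1: count() is infinite, so zip() stops when names ends.
numbered = list(zip(count(1), names))

# Pair every item with the same constant value.
tagged = list(zip(repeat("draft"), names))
```

numbered is [(1, 'alpha'), (2, 'beta'), (3, 'gamma')], and tagged pairs 'draft' with each name.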

But plain old zip isn't going to go away, so let's leave this.

[...]
...
So: if this were added, it would get some use. How much? Hard to know. Is
it critically important? Absolutely not. But it's fully backward compatible
and not a language change, so the barrier to entry is not all that high.
Of course it's a language change. If we add this to zip, other Python 
interpreters will have to follow once they catch up to version 3.9 or 
3.10.
...
However, I agree with Brandt (I think it was Brandt) that the lack of a
critical need means that a zip_strict() in itertools would get a LOT less
use than a flag on zip itself.
So you and Brandt are arguing that the *less* useful this is, the more 
we should prefer to make it a builtin?

For everything else, it goes the other way: aside from maybe the odd 
builtin left over from Python 1.0, things become builtin only if they 
are *more* useful, not less.

-- 
Steven

Steven D'Aprano