On Fri, Apr 24, 2020 at 06:07:06PM -0000, Brandt Bucher wrote:
1. Likely the most common case, for me, is when I have some data and want to iterate over both it and a calculated pairing:
    x = ["a", "b", "c", "d"]
    y = iter_apply_some_transformation(x)
    for a, b in zip(x, y):
        ...
        # Do something.
        ...
This can be extrapolated to many more cases, where x and/or y are constants, iterators, calculated from each other, calculated individually, passed as arguments, etc.
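Here is a concrete, runnable version of that pattern. str.upper stands in for the hypothetical iter_apply_some_transformation(). Because x is a list, map() gets its own fresh iterator over it, so the two sides stay in step:

```python
x = ["a", "b", "c", "d"]
# str.upper is a stand-in for the hypothetical iter_apply_some_transformation().
y = map(str.upper, x)

pairs = []
for a, b in zip(x, y):
    pairs.append((a, b))  # "Do something" with each pair.
print(pairs)  # [('a', 'A'), ('b', 'B'), ('c', 'C'), ('d', 'D')]
```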
It won't work with iterators unless you use tee():

    py> x = iter('abc')
    py> y = map(str.upper, x)
    py> for t in zip(x, y):
    ...     print(*t)
    ...
    a B
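If both sides really must come from a single iterator, itertools.tee() splits it into independent iterators, so the transformation no longer steals items from zip(). Here is a minimal sketch, again using str.upper as the stand-in transformation:

```python
from itertools import tee

# tee() returns two independent iterators over the same source.
x, x_for_map = tee(iter('abc'))
y = map(str.upper, x_for_map)

pairs = list(zip(x, y))
print(pairs)  # [('a', 'A'), ('b', 'B'), ('c', 'C')]
```

The cost is that tee() buffers any items one side has consumed and the other has not yet seen.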
I've written most of them in production code, and in every case, mismatched lengths are logic errors that should "never" happen.
Sounds like this ought to be an assertion that can be disabled. And ValueError is, semantically, the wrong exception: it's a *logic error* in your code, as you say, so it ought to be AssertionError.

People who know me, or read my code, know that I love `assert`. I use it a lot. According to some people, too much. But even I would struggle to justify using an assert for the code snippet you have above:

    x = ["a", "b", "c", "d"]
    y = iter_apply_some_transformation(x)
    # Figurative, not literal:
    assert len(x) == len(y)

I suppose it might be justified to put a post-condition into `iter_apply_some_transformation` to check that it returns the same number of items it was fed, and if it were a complex transformation that might be justified. But for a straightforward map-like transformation:

    def iter_apply_some_transformation(x):
        for item in x:
            yield something(item)

it is *obviously true* that the length of the output is the length of the input, and you don't need an assertion to check it. This is especially obvious when written as a comprehension:

    x = ["a", "b", "c", "d"]
    y = (transform(z) for z in x)

So I think this is a weak example. It might justify a function in itertools, which could be a simple wrapper around zip_longest, but not a flag argument on builtin zip.
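Here is a minimal sketch of what such a "simple wrapper around zip_longest" might look like. The name zip_exact() is hypothetical: there is no such function in itertools. Raising ValueError mirrors the proposal under discussion rather than the AssertionError suggested above.

```python
from itertools import zip_longest

# Private marker used to pad short inputs. Comparing with `is` means it can
# never be confused with real data, including None.
_MISSING = object()


def zip_exact(*iterables):
    """Hypothetical helper: like zip(), but raise if the inputs differ in length.

    Tuples are yielded lazily, so any tuples before the mismatch are still
    produced before the error is raised.
    """
    for combo in zip_longest(*iterables, fillvalue=_MISSING):
        if any(item is _MISSING for item in combo):
            raise ValueError("zip_exact() arguments have different lengths")
        yield combo


print(list(zip_exact("abc", [1, 2, 3])))  # [('a', 1), ('b', 2), ('c', 3)]
```

The only extra work over plain zip() is one any() check per tuple, and nothing has to be materialised up front.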
2. This is less well known, but you can lazily unzip/"transpose" nested iterables by unpacking into `zip`. I've seen it suggested many times on Stack Overflow:
    >>> x = iter((iter((0, 1, 2)), iter((3, 4, 5)), iter((6, 7, 8))))
    >>> y = zip(*x)
    >>> tuple(y)
    ((0, 3, 6), (1, 4, 7), (2, 5, 8))
Yes, that's a moderately common functional idiom, unzip(). It's also sometimes called demuxing.
It's clearly a logic error if one of the tuples in `x` is longer/shorter than the others, but this move would silently toss the data instead.
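The following sketch makes the truncation concrete. It uses plain tuples rather than nested iterators only so the results are easy to print. The third row is one item short, so zip() stops early and the 2 and the 5 disappear without any error. zip_longest() from itertools keeps them and pads the gap with None instead.

```python
from itertools import zip_longest

# The third row is one item short of the others.
rows = [(0, 1, 2), (3, 4, 5), (6, 7)]

# zip() stops at the shortest row, so 2 and 5 are silently dropped.
columns = list(zip(*rows))
print(columns)  # [(0, 3, 6), (1, 4, 7)]

# zip_longest() keeps every item and fills the gap with None.
padded = list(zip_longest(*rows))
print(padded)  # [(0, 3, 6), (1, 4, 7), (2, 5, None)]
```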
It's not clear to me that it is a logic error rather than bad data (the caller's responsibility, not zip's). And if it's bad data, then truncating the data is at least as reasonable an approach as killing the application. (In my opinion, a much better approach.)
3. Just to show that this has visible effects in the stdlib: below is the AST equivalent of `eval("{'KEY WITH NO VALUE': }")`. The use of `zip` to implement `ast.literal_eval` silently throws away the bad key, instead of complaining with a `ValueError` (as it typically does for malformed or invalid input).
    >>> from ast import Constant, Dict, literal_eval
    >>> malformed = Dict(keys=[Constant("KEY WITH NO VALUE")], values=[])
    >>> literal_eval(malformed)
    {}
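The same failure mode is easy to reproduce outside the ast module. The sketch below uses a hypothetical build_dict() helper, which is not how literal_eval is actually written. It contrasts two ways of pairing keys with values. Bare zip() quietly returns an empty dict for one key and no values. An explicit length check raises ValueError instead.

```python
def build_dict(keys, values):
    """Hypothetical helper: pair keys with values, refusing mismatched input."""
    if len(keys) != len(values):
        raise ValueError(
            f"malformed dict: {len(keys)} key(s) but {len(values)} value(s)"
        )
    return dict(zip(keys, values))


keys, values = ["KEY WITH NO VALUE"], []

# Bare zip() truncates to the shorter list, so the key vanishes.
print(dict(zip(keys, values)))  # {}

# The checked version complains instead.
try:
    build_dict(keys, values)
except ValueError as exc:
    print(exc)  # malformed dict: 1 key(s) but 0 value(s)
```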
Okay, that's a good example. I too expect that it ought to complain rather than silently drop malformed code.

--
Steven