On 2020-04-28 12:28 a.m., Andrew Barnert wrote:
On Apr 27, 2020, at 17:01, Soni L. <fakedme+py@gmail.com> wrote:
On 2020-04-27 8:37 p.m., Andrew Barnert wrote:

On Apr 27, 2020, at 14:38, Soni L. <fakedme+py@gmail.com> wrote:

[snipping a long unanswered reply]

The explicit case for zip is if you *don't* want it to consume anything after the stop.

Sure, but *when do you want that*? What's an example of code you want to write that would be more readable, or easier to write, or whatever, if you could work around consuming anything after the stop?
so here's one example, let's say you want to iterate multiple things (like with zip), get a count out of it, as well as partially consume an external iterator without swallowing any extra values from it.
What do you want to do that for? This still isn’t a concrete use case, so it’s still not much more of a rationale than “let’s say you want to intermingle the bits of two 16-bit integers into a 32-bit integer”. Sure, that’s something that’s easy to do in some other languages (it’s the builtin $ operator in INTERCAL) but very hard to do readably or efficiently in Python. If we added a $ operator with a __bigmoney__ protocol and made int.__bigmoney__ implement this operation in C, that would definitely solve the problem. But it’s only worth proposing that solution if anyone actually needs a solution to the problem in the first place. When’s the last time anyone ever needed to efficiently intermingle bits? (Except in INTERCAL, where the language intentionally leaves out useful operators like +, |, and << and even 32-bit literals to force you to write things in clever ways around $ and ~ instead).
(OT: Z-order curves. they're amazing.)
On top of that, this abstract example you want can already be written today.
it'd look something like this:
    def foo(self, other_things):
        for x in zip(range(sys.maxsize), self.my_things, other_things):
            do_stuff
        else as y:
            return y[0]  # count
using extended for-else + partial-zip. it stops as soon as self.my_things stops. and then the caller can do whatever else it needs with other_things. (altho maybe it's considered unpythonic to reuse iterators like this? I like it tho.)
Here are four ways of doing this today:
    def foo(self, other_things):
        for x in zip(count(1), self.my_things, other_things):
            do_stuff
        return x[0]

    def foo(self, other_things):
        c = count(-1)
        for x in zip(c, self.my_things, other_things):
            do_stuff
        return next(c)

    def foo(self, other_things):
        c = count()
        for x in zip(self.my_things, other_things, c):
            do_stuff
        return next(c)

    def foo(self, other_things):
        c = lastable(count())
        for x in zip(c, self.my_things, other_things):
            do_stuff
        return c.last
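(lastable here is just a tiny wrapper that remembers the last value it yielded; a minimal sketch, assuming only that behavior:)

    class lastable:
        # Wrap an iterator and remember the most recent value it yielded,
        # exposed as .last after the loop finishes.
        def __init__(self, iterable):
            self._it = iter(iterable)
        def __iter__(self):
            return self
        def __next__(self):
            self.last = next(self._it)
            return self.last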
So, why do we need another way to do something that’s probably pretty uncommon and can already be done pretty easily? Especially if that new way isn’t more readable or more powerful?
the only one with equivalent semantics is the last one.
if anything my motivating example is because I wanna do some very unpythonic things.
Then you should have given that example in the first place.
Sure, the fact that it’s unpythonic might mean it’s not very convincing, but it doesn’t become more convincing after multiple people have to go back and forth to drag it out of you. All that means is that everyone else has already tuned out and won’t even see your example, so your proposal has basically zero chance instead of whatever chance it should have had.
And sometimes unpythonic things really do get into the language—sometimes because they’re just so useful, but more often, because they point to a reason for changing what everyone’s definition of “pythonic” is. Think of the abc module. Or, better, if you can dig up the 3.1-era vs. 3.3-era threads on the original coroutine PEP 3152, you can see how the consensus changed from “wtf, that doesn’t look like Python at all and nobody will ever understand it” to “this is obviously the pythonic way to write reactors (modulo a bunch of bikeshedding)”. That wouldn’t have happened if Greg Ewing had refused to tell anyone that he wanted coroutines to provide a better, if unfamiliar, way to write things like reactors, and instead tried to come up with less-unpythonic-looking but completely useless examples.
tbh my particular case doesn't make a ton of practical sense. I have config files and there may be errors opening or deserializing them, and I have a system to manage configs and overrides. which means you can have multiple config files, and you may wanna log errors. you can also use a config manager as a config file in another config manager, which is where the error logging gets a bit weirder. I'm currently returning lists of errors, but another option would be to yield the errors instead. but, I'm actually not sure what the best approach here is. so yeah. I can't *use* that motivating example, because if I did, everyone would dismiss me as crazy. (which I am, but please don't dismiss me based on that :/)
That grouping idiom is useful for all kinds of things that _aren't_ about optimization. Maybe the zip docs aren't the best place for it (but it's also in the itertools recipes, which probably is the best place for it), but it's definitely useful. In fact, I used it less than a week ago. We've got this tool that writes a bunch of 4-line files, and someone concatenated a bunch of them together and wrote this horrible code to pull them back apart in another language I won't mention here, and rather than debug their code, I just rewrote it in Python like this:

    with open(path) as f:
        for entry in chunkify(f, 4):
            process(entry)

I used a function called chunkify because I think that's a lot easier to understand (especially for colleagues who don't use Python very often), and we already had it lying around in a utils module, but it's just implemented as zip(*[iter(it)]*n).
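For anyone following along, a minimal sketch of that idiom as a standalone function (not the actual utils code, just the zip-based version):

    def chunkify(iterable, n):
        # The grouper idiom: reuse one iterator n times so zip() pulls
        # n consecutive items per output tuple; a trailing partial group
        # is silently dropped, just like zip() drops leftovers.
        it = iter(iterable)
        return zip(*[it] * n)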
see: why are we perfectly happy with ignoring extra lines at the end?
Because there aren’t any. The file was made by catting together 2022 4-line files, so it’s 8088 lines long. It will always be 8088 lines long. If I really thought that was important to check, surely I’d want to check 8088 rather than just divisible by 4. But I didn’t think it was worth checking either of those—or that the text is pure ASCII, or that the newlines are \n, etc. For a more general purpose script (especially if it had to accept input from potentially stupid or malicious end users and produce useful error responses instead of just punting), I would have checked many of those things and more, but for this script, it wasn’t worth it.
that's what assert is for - making assumptions that you know are correct now, but might not remain so in the future!
an "else" would serve you well, even if it's just to "assert len(remaining) == 0". but we can't do that, can we? because zip swallows the extras. :/
Sure we can, because the exact same grouper idiom works just as well with zip_equal (which is available in more-itertools, and really easy to write yourself around zip_longest, even if the other thread attempting to add it to the stdlib fails) or zip_longest as with zip. If you understand the grouper idiom, the question “how do I check for a leftover partial group” is just obviously the same question as it is with every other use of zip. So if I wanted to check for exact multiples of 4, I would just use the same code but with zip_equal instead of zip (or wrap that up in a chunkify_equal and use that instead of chunkify), and it would raise a ValueError if there were leftover extras instead of swallowing them.
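If you do want to write it yourself, here's a minimal sketch built around zip_longest (not the more-itertools implementation, just the obvious sentinel approach):

    from itertools import zip_longest

    _MISSING = object()

    def zip_equal(*iterables):
        # Like zip(), but raise instead of silently dropping leftovers:
        # the sentinel fill value marks whichever iterable ran out early.
        for combo in zip_longest(*iterables, fillvalue=_MISSING):
            if any(value is _MISSING for value in combo):
                raise ValueError("iterables have different lengths")
            yield combo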
So there’s no need for a new language feature here. An explicit test and assert that adds 2 lines of boilerplate to a 3-line function and obscured the main point of the function would be a worse solution than the one I can already write today.
Even if you think Python should be doing more to encourage such checks, your proposal doesn’t help that at all—what you want is something like Serhiy’s proposal in the other thread (to eventually rename zip to zip_shortest and either get rid of plain zip or make it an alias for zip_equal).
... why not? I know assert is discouraged by many, but I wouldn't say enabling ppl to do these checks doesn't help ppl do these checks...? unless I misunderstand what you mean by this?
Also, compare this other example for processing a different file format:

    with open(path) as f:
        for entry in split(f, '\n'):
            process(entry)

It's pretty obvious what the difference is here: one is reading entries that are groups of 4 lines; the other is reading entries that are groups of arbitrary numbers of lines but separated by blank lines. At most you might need to look at the help for chunkify and split to be absolutely sure they mean what you think they mean. (Although maybe I should have used functions from more-itertools rather than our own custom functions that do effectively the same thing but are kind of weird and probably not so well tested and whose names don't come up in a web search.)
and... well I'm assuming this one just yields the extras at the end of the file/iterator?
No, because in this case it’s not even theoretically possible for there to be extras. It’s like asking what happens to the extra characters in str.split. By definition, there aren’t any—the last element is everything after the last separator, and there can never be anything left over after everything.
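A minimal sketch of that kind of split, in case it helps (not our actual utils code, just the obvious generator version):

    def split(iterable, sep):
        # Accumulate items into the current group, yield the group at
        # each separator, and yield whatever is left when the input ends,
        # so the last element is everything after the last separator.
        group = []
        for item in iterable:
            if item == sep:
                yield group
                group = []
            else:
                group.append(item)
        yield group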
that's what I was asking :p a naive implementation would just collect things and yield on separator. if yours also yields the extras on StopIteration then it's fine.