Fwd: Loop manager syntax

On Jul 29, 2015, at 13:16, Todd <toddrjen@gmail.com> wrote:
I thought it was better to first determine whether the idea had any merit at all before getting into details. If the idea turned out to be stupid then there isn't much point getting into details.
I can't speak for anyone else, but I don't think the idea itself is stupid, or I wouldn't have responded at all. After all, it's not far from the way you do parallel loops via OMP in C, or in Cython.
Meanwhile, if mutating the dicts (that is, the inner function assigning to variables) doesn't affect the outer namespace, what about mutating the values in the dict? Do you deep-copy everything into the dicts, then "deep-update" back out later? If not, how does a line like "d[k] += 1" inside the loop end up working at all (where d is a global or nonlocal)?
The contents of the dict are deep-copied when passed to the loop manager. Whether the loop manager deep copies those or not is up to it.
Well, if it's already deep-copied once, then it doesn't really matter if you deep-copy it again; it's still separate objects. That means you can't use any variables in a loop whose values aren't deep-copyable (=picklable), or you'll get some kind of exception. It also means, as I pointed out above, that you need some kind of deep-update mechanism to get all the values back into the parent scope, not just the ones directly in variables. Since Python doesn't have any such mechanism, you have to invent one. And I don't know if there is any sensible algorithm for such a thing in general. (Maybe something where you version or timestamp all the values, but even that fails the i+=1 case, which is why it's not sufficient for STM without also adding rollback and retryable commits.) And meanwhile, it means you don't have the option of delayed vs. live updates as you suggested; you only have the option of delayed vs. no updates at all.
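To make that concrete, here's a minimal sketch (using copy.deepcopy as a stand-in for whatever the loop manager actually does) of how a mutation like d[k] += 1 gets lost:

    import copy

    d = {"k": 0}
    scope = {"d": d}                # the enclosing namespace as a dict

    inner = copy.deepcopy(scope)    # what the loop body would receive
    inner["d"]["k"] += 1            # the body runs: d["k"] += 1

    print(d["k"])                   # 0, the outer d never saw the change

Copying the variables back out would have to somehow map inner["d"] back onto the original d object, which is exactly the deep-update mechanism Python doesn't have.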
Inside the loop, for example. That presumably means you've modified the deep copy. But, assuming closures can be deep-copied at all, they'll presumably have independent copies, not copies still linked to the variables you have (otherwise they're not deep copies). Which would be very different from normal semantics. For example:

    def spam(x):
        def adder(y):
            return x + y
        for i in range(2):
            x = 10
            print(adder(i))

    spam(0)

This is obviously silly code, but it's the shortest example I could think of on the spur of the moment that demonstrates most of the problems. This prints 10 then 11. But if you change it to "eggs for i in range(2):", there is no way that any eggs could be implemented (with your design) that gives you the same 10 and 11, because adder is going to be deep-copied with x=0, not x as a closure cell. Or, even if x _is_ a closure cell somehow, it's not the same thing as the x in the enclosing dict passed into the function, so assigning x=10 still doesn't affect it. Again, this is a silly toy example, but it demonstrates real problems that treating scopes as just dictionaries, and deep-copying them, create.
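For what it's worth, you can watch the cell linkage directly; the closed-over x lives in a shared cell object, not as a snapshotted value:

    def spam(x):
        def adder(y):
            return x + y
        print(adder.__closure__[0].cell_contents)  # 0: x is a shared cell
        x = 10
        print(adder.__closure__[0].cell_contents)  # 10: adder sees the rebind

    spam(0)

A dict snapshot of the scope captures 0, the cell's value at copy time, with no link back to the cell, which is exactly what the example above trips over.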
And finally, how does this copy-and-merge work in parallel? Let's say you do x=0 outside the loop, then inside the loop you do x+=i. So, the first iteration starts with x=0, changes it to x=1, and you merge back x=1. The second iteration starts with x=0, changes it to x=2, and you merge back x=2. This seems like a guaranteed race, with no way either the interpreter or the Pool.__for__ mechanism could even have a possibility of ending up with 3, much less guarantee it. I think what you actually need here is not copied and merged scopes, but something more like STM, which is a lot more complicated.
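Here's a rough simulation of those copy-and-merge semantics using multiprocessing.Pool directly (the body function stands in for the compiled loop suite; none of this is proposed API):

    from multiprocessing import Pool

    def body(args):
        x, i = args      # every task starts from the same snapshot: x = 0
        x += i
        return x         # merged back by overwriting x, not accumulating

    if __name__ == "__main__":
        x = 0
        with Pool(2) as pool:
            results = pool.map(body, [(x, i) for i in range(1, 3)])
        x = results[-1]  # whichever merge lands last wins
        print(x)         # 2 in this sketch; under any merge order you
                         # can get 1 or 2, but never 3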
Again, this would be up to the loop handler. In the case of multiprocessing, each repetition will likely get x=0, and the last value (whatever that is) will be returned at the end. The documentation would need to say that the behavior in such cases is non-deterministic and shouldn't be counted on.
But that makes parallel for basically useless unless you're doing purely-immutable code. If that's fine, you might as well not try to make mutability work in the first place, in which case you might as well just do comprehensions, in which case you might as well just call the map method. A design that's much more complicated, but still doesn't add any functionality in a usable way for the paradigmatic use case, doesn't seem worth it. I think you _could_ come up with something that actually works by exposing a bunch of new features for dealing with cells and frames from Python, which could lead to other cool functionality (again, like pure-Python greenlets), which I think may be worth exploring.
Can you sketch out the API of what creating and calling these function-like objects looks like? It can't be just like defining and calling a function. Extracting the code object from a function (or compiling one) and calling exec with it are closer, but still not sufficient, since function objects, and exec, only have locals and globals from the Python side (to deal with nonlocals you have to drop down to C), and since the mechanism to update the outer scope doesn't exist (without dropping down to C). But if you can look at what's already there and show what you'd need to add, even if you can't work out how to implement it as a CPython patch or anything, that could still be a very useful idea.
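To illustrate "closer, but still not sufficient": this is roughly as far as the existing machinery gets you (the strings and names here are just illustrative):

    # Compile a "loop body" and run it against explicit namespaces.
    body = compile("x = x + 1", "<suite>", "exec")

    globs = {"__builtins__": {}}
    locs = {"x": 0}          # plays the role of the captured scope
    exec(body, globs, locs)  # assignments land in the locals dict
    print(locs["x"])         # 1

    # exec takes only globals and locals; there's no parameter for the
    # enclosing function's closure cells, so nonlocal is out of reach.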
OK, that makes sense.
Generator objects have not only __next__ and send, but also throw.
More importantly, if you think about this, you're proposing to have something like explicit generator states and/or continuations that can be manually passed around and called (and even constructed?). Together with your manual scopes (which can at least be merged into real scopes, if not allowing real scopes to be constructed), this gives you all kinds of cool things--I think you could write Stackless or greenlets in pure Python if you had this. But again, this is something we definitely don't have today, that has to be designed before you can just assume it for a new feature.
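For reference, here's what being transparent to the generator protocol has to cover; all three entry points can arrive from the outside at any yield point:

    def gen():
        try:
            received = yield 1
            yield received
        except ValueError:
            yield "caught"

    g = gen()
    print(next(g))               # 1
    print(g.send(42))            # 42: send resumes the paused yield
    g2 = gen()
    next(g2)
    print(g2.throw(ValueError))  # "caught": throw raises at the yield

A __for__ method sitting between the caller and the loop suite would have to forward all three faithfully, which is what an explicit generator-state design is implicitly signing up for.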
I considered that possibility, but it seemed overly complicated. So at least in my proposal right now there is no way to get at the generator state.
But if there's no way to get at the generator state, I don't see how you can implement what you want. What is the method holding? How does it know whether it called a function or a generator, and whether it needs to act as a function or a generator?
Sure, but the point is that compiling any function that has a custom loop has to switch to a new mode that has to do various things differently than just compiling a suite, or an explicit function, or a comprehension.
Yielding from inside a loop is a very common thing to do, so I think any proposal had better be able to handle it. The problem is that whether a function is a generator function or not is determined at compile time, not run time. A __for__ method can't know at compile time whether the loop suites (function-like objects) that will be passed to it will yield or not, or whether the functions that will call it will yield or not. (Especially since the same method may get called once with a suite that yields, and once with a suite that doesn't.) So, how can you write something that works? Either your method isn't a generator function, and therefore it can't do anything useful if the function-like object gives it a yield value, or it is, in which case you have the opposite problem. Or you need some new way to implement the generator protocol in a way that's explicit (i.e., doesn't actually use generators) in the middle layers but transparent at the compiler level and then again at the level of calling the function with the loop in it. Providing explicit generator state objects isn't the only way to solve the problem, but it is _a_ way to solve it, and it could be useful elsewhere. If you have something simpler that works, it may be worth doing that something simpler instead, but I can't think of anything.
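The compile-time decision is easy to demonstrate; one yield anywhere in the body, even unreachable, flips the whole function:

    import inspect

    def f():
        if False:
            yield  # never executes, but the compiler still sees it
        return 1

    print(inspect.isgeneratorfunction(f))  # True
    print(f())  # a generator object, not 1, no matter what runs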
And, again, _how_ can it handle yielding if it decides to do so? You can't write a method that's sometimes a generator function and sometimes not (depending on whether some function-like object returns a special value or not).
There are a couple of ways I thought of to deal with this. One is that there are two versions of each loop handler, one for generators and one for regular loops. The other is that the loop handler can be passed an object to which it would push values to yield.
The latter would be very weird, because generators are driven from the outside, pull-based, and the inner function-like object is also going to be pull-based; putting a push-based interface in between will be misleading or confusing at best.
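To spell that out, here's a minimal sketch of the push-based version (all names hypothetical); the handler runs eagerly and buffers, so the consumer no longer drives the producer:

    def handler(suite, iterable, out):
        for item in iterable:
            out.append(suite(item))    # "push" a value to be yielded later

    def driven_loop(suite, iterable):
        buf = []
        handler(suite, iterable, buf)  # runs to completion up front
        yield from buf                 # values reach the caller only now

    for v in driven_loop(lambda i: i * 2, range(3)):
        print(v)  # 0, 2, 4; but the whole loop already ran eagerly

An infinite loop never yields anything this way, and side effects happen before the first value is consumed, so it only superficially resembles a generator.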
But it's not two or three lines of code. I'm not even sure it's doable at all, e.g., because, again, assignments are handled mostly at compile time, not run time.
Also, you realize that, even if it uses exactly the same rules as the normal interpreter, the effect will be different, because the interpreter applies the rule partly at compile time, not completely dynamically, right?
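Concretely, the compiler classifies every name once per scope, so there's nothing left for a loop manager to hook at run time:

    x = "global"

    def f():
        print(x)     # raises UnboundLocalError: the assignment below
        x = "local"  # already made x local to f at compile time

    f()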
Yes, if someone is using this feature they would need to be aware of the effects it has.
Having loop modifiers that break all kinds of standard loop features and change the basic semantics of variable lookup doesn't seem like a good addition to the language. If it were _possible_ to do things right, but also possible to do them weirdly, you could just file that under "consenting adults", but if any loop manager that can be written is guaranteed to break a large chunk of Python and there's no way around that, I don't think it's a feature anyone would want.
What use are you envisioning for taking a dict and returning a different dict, other than a deep copy, and a subsequent deep update (and who does that deep update if not the loop manager)?
It's going to walk the whole chain of everything reachable from the dict? I'm not sure how you even do that in general (you could make assumptions about how the copy protocol is implemented for each type, but if those assumptions were good enough, we wouldn't have the whole copy protocol in the first place...). And what is it comparing them to? If it's got deep copies--which it will--everything is guaranteed to not be the same (except maybe for ints and other simple immutable objects that the interpreter is allowed to collapse, but in that case it's just a coincidence).
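The identity problem in miniature (again with copy.deepcopy standing in for whatever the loop manager does):

    import copy

    scope = {"d": {"k": 0}, "n": 1}
    snap = copy.deepcopy(scope)

    print(snap["d"] is scope["d"])  # False, though nothing was mutated
    print(snap["n"] is scope["n"])  # True, but only because deepcopy
                                    # treats immutable ints as atomic

An is-based diff would report every container as changed and every int as unchanged, which is exactly backwards from what a merge needs.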
But it's not really similar to decorators either. A decorator is just like any normal higher-order function. It takes a function, it returns a function, that's it. No monkeying with scopes or capturing and resuming yields or anything of the sort you're proposing. And that's my point: you seem to think your proposal is just a simple thing that will fit on top of Python 3.5, but it actually relies on a whole slew of new things that still need to be invented before it makes sense. Or, to put it another way, if you intentionally designed a proposal to demonstrate why Scheme continuations and environments are cooler than anything Python has, I think it would look a lot like this... (And the usual counter is that Python doesn't have everything to let you implement custom control flow at the language level because allowing that is actually a _bad_ thing for readability, not a feature.)
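For contrast, here is everything a decorator involves; plain function-in, function-out, with no scope manipulation anywhere:

    import functools

    def logged(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            print("calling", fn.__name__)
            return fn(*args, **kwargs)
        return wrapper

    @logged          # pure sugar for: greet = logged(greet)
    def greet(name):
        return "hello, " + name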