Fwd: Loop manager syntax
On Jul 29, 2015, at 13:16, Todd <toddrjen@gmail.com> wrote:
I thought it was better to first determine whether the idea had any merit at all before getting into details. If the idea turned out to be stupid then there isn't much point getting into details.
I can't speak for anyone else, but I don't think the idea itself is stupid, or I wouldn't have responded at all. After all, it's not far from the way you do parallel loops via OMP in C, or in Cython.
Meanwhile, if mutating the dicts (that is, the inner function assigning to variables) doesn't affect the outer namespace, what about mutating the values in the dict? Do you deep-copy everything into the dicts, then "deep-update" back out later? If not, how does a line like "d[k] += 1" inside the loop end up working at all (where d is a global or nonlocal)?
The contents of the dict are deep-copied when passed to the loop manager. Whether the loop manager deep copies those or not is up to it.
Well, if it's already deep-copied once, then it doesn't really matter if you deep-copy it again; it's still separate objects. That means you can't use any variables in a loop whose values aren't deep-copyable (=picklable), or you'll get some kind of exception. It also means, as I pointed out above, that you need some kind of deep-update mechanism to get all the values back into the parent scope, not just the ones directly in variables. Since Python doesn't have any such mechanism, you have to invent one. And I don't know if there is any sensible algorithm for such a thing in general. (Maybe something where you version or timestamp all the values, but even that fails the i+=1 case, which is why it's not sufficient for STM without also adding rollback and retryable commits.) And meanwhile, it means you don't have the option of delayed vs. live updates as you suggested, it means you have the option of delayed vs. no updates at all.
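Here is a minimal sketch of the deep-copy problem. It assumes the loop body receives a `copy.deepcopy` of the enclosing namespace; the helper names `run_body_on_copy` and `can_deepcopy` are hypothetical. The mutation lands only on the copy. A namespace that holds something uncopyable, such as a generator, cannot be copied at all.

```python
import copy

def run_body_on_copy(scope):
    """Hypothetical: run the loop body `d["k"] += 1` against a deep copy of the scope."""
    private = copy.deepcopy(scope)   # the proposal's isolation step
    private["d"]["k"] += 1           # the body mutates a value, not a variable binding
    return private

def can_deepcopy(scope):
    try:
        copy.deepcopy(scope)
    except TypeError:                # e.g. "cannot pickle 'generator' object"
        return False
    return True

outer_scope = {"d": {"k": 0}}
private = run_body_on_copy(outer_scope)
print(outer_scope["d"]["k"], private["d"]["k"])    # 0 1: the outer dict never sees it

print(can_deepcopy({"g": (n for n in range(3))}))  # False
```

Without some deep-update step, the outer `d` never changes. With such a step, it is unclear how to reconcile several copies.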
It may be possible to make the dicts directly represent the namespaces, and thus have changes immediately reflected in the namespace. I thought keeping things well-isolated was more important, but it may not be worth the trouble.
Meanwhile, this scope semantic is very weird if it interacts with the rest of Python. For example, if you call a closure that was built outside, but you've modified one of the variables that closure uses, it'll be called with the old value, right? That would be more than a little confusing.
Where and when did you modify it?
Inside the loop, for example. That presumably means you've modified the deep copy. But, assuming closures can be deep-copied at all, they'll presumably have independent copies, not copies still linked to the variables you have (otherwise they're not deep copies). Which would be very different from normal semantics. For example:

    def spam(x):
        def adder(y):
            return x+y
        for i in range(2):
            x=10; print(adder(i))
    spam(0)

This is obviously silly code, but it's the shortest example I could think of on the spur of the moment that demonstrates most of the problems. This prints 10 then 11. But if you change it to "eggs for i in range(2):", there is no way that any eggs could be implemented (with your design) that gives you the same 10 and 11, because adder is going to be deep-copied with x=0, not x as a closure cell. Or, even if x _is_ a closure cell somehow, it's not the same thing as the x in the enclosing dict passed into the function, so assigning x=10 still doesn't affect it. Again, this is a silly toy example, but it demonstrates real problems that treating scopes as just dictionaries, and deep-copying them, create.
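The spam example can be checked by simulating the dict-scope design. This is a rough sketch, assuming the loop body is compiled separately and run with `exec` against a fresh dict copy of the scope on each iteration. The function names are hypothetical. Even with only a shallow copy, the assignment `x = 10` lands in the dict, while `adder` still reads its own closure cell.

```python
def spam_normal(x):
    def adder(y):
        return x + y
    results = []
    for i in range(2):
        x = 10                      # rebinds the cell that adder closes over
        results.append(adder(i))
    return results

def spam_dict_scope(x):
    """Hypothetical simulation: the loop body runs against a dict snapshot of the scope."""
    def adder(y):
        return x + y
    results = []
    body = compile("x = 10\nresults.append(adder(i))", "<loop body>", "exec")
    for i in range(2):
        scope = {"adder": adder, "results": results, "x": x, "i": i}
        exec(body, {}, scope)       # x = 10 goes into the dict, not into adder's cell
    return results

print(spam_normal(0))      # [10, 11]
print(spam_dict_scope(0))  # [0, 1]
```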
And finally, how does this copy-and-merge work in parallel? Let's say you do x=0 outside the loop, then inside the loop you do x+=i. So, the first iteration starts with x=0, changes it to x=1, and you merge back x=1. The second iteration starts with x=0, changes it to x=2, and you merge back x=2. This seems like a guaranteed race, with no way either the interpreter or the Pool.__for__ mechanism could even have a possibility of ending up with 3, much less guarantee it. I think what you actually need here is not copied and merged scopes, but something more like STM, which is a lot more complicated.
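Here is a toy model of that race. It assumes every parallel iteration snapshots `x` before any merge happens, and that merging is a plain last-writer-wins update. `copy_and_merge` is a hypothetical name, not part of any proposal. Trying every merge order shows that the sequential answer never comes out.

```python
from itertools import permutations

def copy_and_merge(increments, merge_order):
    """Hypothetical copy-and-merge for `x += i` run in parallel from x = 0."""
    outer = {"x": 0}
    snapshot = dict(outer)                     # every worker starts from the same copy
    finished = [{"x": snapshot["x"] + i} for i in increments]
    for index in merge_order:                  # merge in completion order: last writer wins
        outer.update(finished[index])
    return outer["x"]

increments = [1, 2]
outcomes = {copy_and_merge(increments, order)
            for order in permutations(range(len(increments)))}
print(sorted(outcomes))  # [1, 2]: never the sequential answer, 3
```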
Again, this would be up to the loop handler. In the case of multiprocessing, each repetition will likely get x=0, and the last value (whatever that is) will be returned at the end. The documentation would need to say that the behavior in such cases is non-deterministic and shouldn't be counted on.
But that makes parallel for basically useless unless you're doing purely-immutable code. If that's fine, you might as well not try to make mutability work in the first place, in which case you might as well just do comprehensions, in which case you might as well just call the map method. A design that's much more complicated, but still doesn't add any functionality in a usable way for the paradigm use case, doesn't seem worth it. I think you _could_ come up with something that actually works by exposing a bunch of new features for dealing with cells and frames from Python, which could lead to other cool functionality (again, like pure-Python greenlets), which I think may be worth exploring.
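For purely immutable loop bodies, the existing tools already cover the case, as the paragraph above says. In this sketch, `concurrent.futures.ThreadPoolExecutor.map` stands in for `multiprocessing.Pool.map`. The process-pool call has the same shape but needs picklable, importable functions.

```python
from concurrent.futures import ThreadPoolExecutor

def square(i):
    return i ** 2

# Comprehension: no shared mutable state, so nothing needs to be merged back.
squares = [square(i) for i in range(8)]

# The same immutable computation in parallel via an executor's map method.
with ThreadPoolExecutor(max_workers=4) as pool:
    parallel_squares = list(pool.map(square, range(8)))

print(squares == parallel_squares)  # True
```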
In cases where one or more of the namespaces doesn't make sense the corresponding dict would be empty. The method would return three dicts containing the local, nonlocal, and global namespaces, any or all of which could be empty. Returning a non-empty dict in a case where the corresponding namespace doesn't make sense would raise an exception. The interpreter would merge these dicts back into the corresponding namespaces.
The function-like object would be passed four dicts corresponding to the same namespaces, and would return a tuple of three dicts corresponding to the same namespaces. The interpreter would again be responsible for initializing the function-like object's namespaces with the contents of the dicts and pulling out those namespaces at the end.
What does this "function-like object" look like? How do you call it with these dicts in such a way that their locals, nonlocals, and globals get set up as the variables the function was compiled to expect? (For just locals and globals, you can get the code object out, and exec that instead of calling the function, as I suggested--but you clearly don't want that, so what do you want instead?) Again, this is a new feature that I think would be more complicated, and more broadly useful, than your suggested feature, so it ought to be specified.
I am not sure I am understanding the question. Can you please explain in more detail?
Can you sketch out the API of what creating and calling these function-like objects looks like? It can't be just like defining and calling a function. Extracting the code object from a function (or compiling one) and calling exec with it are closer, but still not sufficient, since function objects, and exec, only have locals and globals from the Python side (to deal with nonlocals you have to drop down to C), and since the mechanism to update the outer scope doesn't exist (without dropping down to C). But if you can look at what's already there and show what you'd need to add, even if you can't work out how to implement it as a CPython patch or anything, that could still be a very useful idea.
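The closest existing building block is to compile a suite and run it with `exec`, which accepts exactly two namespaces. Below is a minimal sketch; the helper `run_suite` is hypothetical. The suite's assignments land in the locals dict that is passed in. There is no slot for nonlocals, though, and nothing writes the result back into a real function scope.

```python
def run_suite(source, global_ns, local_ns):
    """Hypothetical helper: execute a loop suite against explicit namespace dicts."""
    code = compile(source, "<loop suite>", "exec")
    exec(code, global_ns, local_ns)  # only globals and locals: no nonlocals argument
    return local_ns

def outer():
    n = 0
    local_ns = run_suite("n = n + 5", {}, {"n": n})
    # The suite rebound n in the dict, but the function's own variable is untouched.
    return n, local_ns["n"]

print(outer())  # (0, 5)
```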
In the case of yield, the returned tuple will have one additional element for the yielded value.
How you do distinguish between "exited normally" and "yielded None"?
The first would return a length-3 tuple and the second would return a length-4 tuple with the last element being "None".
OK, that makes sense.
The interpreter would be in charge of remembering which yield it is at, but the function-like object would still be initialized with the namespaces provided by the method. So any loop handler that allows yielding will need to be able to get the correct values in the namespace; failing to do so will raise an exception. The function-like object always has an optional argument for injecting values into the yield, but passing anything to it when the function-like object is not at a yield that accepts a value would raise an exception.
Well, you also need to handle throwing exceptions into the function-like object.
I am not sure what you mean by this.
Generator objects have not only __next__ and send, but also throw.
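For reference, here is the three-method generator protocol shown on an ordinary generator. Any function-like object that can yield would presumably have to mirror all three methods: `__next__` (via `next`), `send`, and `throw`.

```python
def counter():
    try:
        received = yield 1        # pauses here; send() resumes with a value
        yield received * 2
    except ValueError:
        yield "caught"            # throw() raises the exception at the paused yield

gen = counter()
print(next(gen))                       # 1
print(gen.send(5))                     # 10
print(gen.throw(ValueError("stop")))   # caught
```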
More importantly, if you think about this, you're proposing to have something like explicit generator states and/or continuations that can be manually passed around and called (and even constructed?). Together with your manual scopes (which can at least be merged into real scopes, if not allowing real scopes to be constructed), this gives you all kinds of cool things--I think you could write Stackless or greenlets in pure Python if you had this. But again, this is something we definitely don't have today, that has to be designed before you can just assume it for a new feature.
I considered that possibility, but it seemed overly complicated. So at least in my proposal right now there is no way to get at the generator state.
But if there's no way to get at the generator state, I don't see how you can implement what you want. What is the method holding? How does it know whether it called a function or a generator, and whether it needs to act as a function or a generator?
Returns and breaks will be exceptions, which contain the namespaces as extra data.
Which, as I said, obviously means that you've created a new compiler mode that, among other things, compiles returns and breaks into raises.
Correct, which is why I call this a "function-like object" rather than a "function".
Sure, but the point is that compiling any function that has a custom loop has to switch to a new mode that has to do various things differently than just compiling a suite, or an explicit function, or a comprehension.
How to deal with yields, breaks, and returns is up to the class designer. There is no reason all loop handlers would need to handle all possible loop behaviour. It would be possible to catch and re-raise break or return exceptions, or simply not handle them at all, in cases where they shouldn't be used. Similarly, a class could simply raise an exception if the function-like object tries to yield anything if yielding didn't make sense.
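Here is a minimal sketch of the break-as-exception idea. It assumes the compiler turned `break` in the suite into a raise that carries the suite's namespace. The names `LoopBreak`, `suite`, and `for_manager` are hypothetical, not an existing API. A manager that supports break catches the exception; one that doesn't can simply let it propagate.

```python
class LoopBreak(Exception):
    """Hypothetical exception a compiled `break` would raise, carrying the namespace."""
    def __init__(self, namespace):
        super().__init__("break")
        self.namespace = namespace

def suite(ns):
    # Stand-in for the compiled loop body: total += i; if total > 5: break
    ns = dict(ns)
    ns["total"] += ns["i"]
    if ns["total"] > 5:
        raise LoopBreak(ns)
    return ns

def for_manager(iterable, body, ns):
    for item in iterable:
        try:
            ns = body(dict(ns, i=item))
        except LoopBreak as exc:
            return exc.namespace  # honour the break and hand back its namespace
    return ns

print(for_manager(range(10), suite, {"total": 0}))  # {'total': 6, 'i': 3}
```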
But how does the class know whether, e.g., yielding from the calling function makes sense? It doesn't know what function(s) it's going to be called from.
I am not understanding the question. Either there is a sane way to deal with yields, in which case it would do so, or there isn't, in which case it wouldn't. Can you name a situation where allowing a yield would be sane in some cases but not in others?
Yielding from inside a loop is a very common thing to do, so I think any proposal had better be able to handle it. The problem is that whether a function is a generator function or not is determined at compile time, not run time. A __for__ method can't know at compile time whether the loop suites (function-like objects) that will be passed to it will yield or not, or whether the functions that will call it will yield or not. (Especially since the same method may get called once with a suite that yields, and once with a suite that doesn't.) So, how can you write something that works? Either your method isn't a generator function, and therefore it can't do anything useful if the function-like object gives it a yield value, or it is, in which case you have the opposite problem. Or you need some new way to implement the generator protocol in a way that's explicit (i.e., doesn't actually use generators) in the middle layers but transparent at the compiler level and then again at the level of calling the function with the loop in it. Providing explicit generator state objects isn't the only way to solve the problem, but it is _a_ way to solve it, and it could be useful elsewhere. If you have something simpler that works, it may be worth doing that something simpler instead, but I can't think of anything.
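The compile-time nature of "is this a generator?" is easy to see in current Python. A `yield` that can never execute still turns the whole function into a generator function, so a single method cannot decide at run time which kind it is.

```python
import inspect

def plain():
    return 1

def generator_by_accident():
    return 1
    yield  # never reached, but its presence makes this a generator function

print(inspect.isgeneratorfunction(plain))                  # False
print(inspect.isgeneratorfunction(generator_by_accident))  # True

gen = generator_by_accident()  # calling it runs nothing; it just builds a generator
try:
    next(gen)
except StopIteration as stop:
    print(stop.value)          # 1: the return value travels inside StopIteration
```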
And, again, _how_ can it handle yielding if it decides to do so? You can't write a method that's sometimes a generator function and sometimes not (depending on whether some function-like object returns a special value or not).
There are a couple of ways I thought of to deal with this. One is that there are two versions of each loop handler, one for generators and one for regular loops. The other is that loop handler can be passed an object that it would push values to yield.
The latter would be very weird, because generators are driven from the outside, pull-based, and the inner function-like object is also going to be pull-based; putting a push-based interface in between will be misleading or confusing at best.
While loop managers would be similar, except instead of a variable name and iterator it would be passed a second function-like object for the conditional and a tuple of variable names used in the conditional.
And it has to manually do the LEGB rule to dynamically figure out where to look up those names?
If it doesn't want to alter them, then no it just passes along the namespaces to the function-like object. If it wants to alter them, then yes. That is, what, two or three lines of code?
But it's not two or three lines of code. I'm not even sure it's doable at all, e.g., because, again, assignments are handled mostly at compile time, not run time.
Also, you realize that, even if it uses exactly the same rules as the normal interpreter, the effect will be different, because the interpreter applies the rule partly at compile time, not completely dynamically, right?
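The difference between applying LEGB at compile time and applying it dynamically fits in a few lines. The sketch below contrasts the two; `dynamic_lookup` is a hypothetical manual lookup built on `collections.ChainMap`. The compiler marks `x` as local for the whole function because it is assigned somewhere in it. A run-time chain of dicts just finds the global.

```python
import builtins
from collections import ChainMap

x = "global"

def compiled_lookup():
    try:
        value = x      # compiled as a local lookup because x is assigned below
    except UnboundLocalError:
        return "UnboundLocalError"
    x = "local"
    return value

def dynamic_lookup(name, local_ns, enclosing_ns, global_ns):
    """Hypothetical manual LEGB lookup, done entirely at run time."""
    return ChainMap(local_ns, enclosing_ns, global_ns, vars(builtins))[name]

print(compiled_lookup())                             # UnboundLocalError
print(dynamic_lookup("x", {}, {}, {"x": "global"}))  # global
```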
Yes, if someone is using this feature they would need to be aware of the effects it has.
Having loop modifiers that break all kinds of standard loop features and change the basic semantics of variable lookup doesn't seem like a good addition to the language. If it were _possible_ to do things right, but also possible to do them weirdly, you could just file that under "consenting adults", but if any loop manager that can be written is guaranteed to break a large chunk of Python and there's no way around that, I don't think it's a feature anyone would want.
This function-like object would return a namespace dict for the local namespace and a boolean for the result of the conditional. Ideally this namespace dict would be empty or None if it is identical to the input namespace.
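Here is a minimal sketch of the while-manager protocol as described. It assumes the condition object returns a pair of (namespace changes or None, bool) and that the body returns the updated namespace. `while_manager`, `condition`, and `body` are hypothetical names.

```python
def while_manager(condition, body, ns):
    """Hypothetical while loop manager driving a condition and a body over a namespace dict."""
    while True:
        changes, keep_going = condition(ns)
        if changes:                # None or {} means the condition left the namespace alone
            ns = {**ns, **changes}
        if not keep_going:
            return ns
        ns = body(ns)

def condition(ns):
    return None, ns["n"] < 3       # an expression: nothing to report back

def body(ns):
    return {**ns, "n": ns["n"] + 1}

print(while_manager(condition, body, {"n": 0}))  # {'n': 3}
```

In this sketch, the condition's changes dict is always None, which is the redundancy questioned in the next message.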
Why is it returning a namespace dict? An expression can't assign to a variable, so the namespace will always be identical.
An expression can, of course, call a mutating method, but unless you're suggesting another deep-copy/deep-update (or STM transaction) here, the value is already going to be mutated, and the dict won't help anyway.
It is up to the loop manager whether to deep copy or not. It would, of course, be possible for the loop manager to determine whether anything has mutated, but I felt this approach would be more consistent.
What use are you envisioning for taking a dict and returning a different dict, other than a deep copy, and a subsequent deep update (and who does that deep update if not the loop manager)?
And, beyond the problems with concurrency, you have cross-process problems. For example, how do you pickle a live scope from one interpreter, pass it to another interpreter, and make it work on the first interpreter's scope?
That is the whole point of passing namespace dicts around.
OK, think about this:
I have a list a=[0]*8.
Now, for i in range(8), I set a[i] = i**2.
If I do that with a pool for, each instance gets its own copy of the scope with its own copy of a, which it modifies. How does the interpreter or the Pool.__for__ merge those 8 separate copies of a, in 8 separate namespaces, back into the original namespace?
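To make the problem concrete, here is a sequential simulation of the hypothetical `pool for i in range(8): a[i] = i**2`. It assumes each iteration gets a deep copy of the scope and that merging happens at the level of whole variables. The function name is hypothetical. Only the last worker's copy of `a` survives.

```python
import copy

def pool_for_with_naive_merge(n=8):
    """Simulate `pool for i in range(n): a[i] = i**2` with per-iteration deep copies."""
    outer = {"a": [0] * n}
    worker_scopes = []
    for i in range(n):                    # each "worker" gets its own copy of the scope
        scope = copy.deepcopy(outer)
        scope["a"][i] = i ** 2
        worker_scopes.append(scope)
    for scope in worker_scopes:           # merging whole variables: the last writer wins
        outer.update(scope)
    return outer["a"]

print(pool_for_with_naive_merge())  # [0, 0, 0, 0, 0, 0, 0, 49]
```

The intended result, `[0, 1, 4, 9, 16, 25, 36, 49]`, would require diffing inside each list. That is the element-by-element comparison discussed next.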
It would need to check if any of the elements are not the same as before (using "is").
It's going to walk the whole chain of everything reachable from the dict? I'm not sure how you even do that in general (you could make assumptions about how the copy protocol is implemented for each type, but if those assumptions were good enough, we wouldn't have the whole copy protocol in the first place...). And what is it comparing them to? If it's got deep copies--which it will--everything is guaranteed not to be the same (except maybe for ints and other simple immutable objects that the interpreter is allowed to collapse, but in that case it's just a coincidence).
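A quick check of why an `is` comparison against deep copies is useless. Nothing below is modified, yet every nested container comes back as a new object. Ints pass the identity check only because `copy.deepcopy` treats them as atomic and returns the same object.

```python
import copy

original = {"a": [[0], [0]], "n": 7}
snapshot = copy.deepcopy(original)   # nothing below modifies either dict

# Nested containers are new objects, so an identity check flags them all as "changed".
changed = [i for i, (old, new) in enumerate(zip(original["a"], snapshot["a"]))
           if old is not new]
print(changed)                         # [0, 1]

# Ints are copied atomically, so they happen to compare identical.
print(snapshot["n"] is original["n"])  # True
```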
However, there are other uses as well. For example, Python has no "do...while" structure, because nobody has come up with a clean way to do it (and probably nobody ever will). However, under this proposal it would be possible for a third-party package to implement a "while" loop manager that can provide this functionality:
If someone can come up with a clean way to write this do object (even ignoring the fact that it appears to be a weird singleton global object--unless, contrary to other protocols, this one allows you to define the magic methods as @classmethods and then knows how to call them appropriately), why hasn't anyone come up with a clean way of writing a do...while structure? How would it be easier this way?
It would be easier because it can be uglier. The bar for new statements is necessarily much, much, much higher than for third-party packages. I certainly wouldn't propose loop handlers solely or even primarily to allow do...while loops, this is more of a side benefit and an example of the sorts of variations on existing loop behaviour that would be possible.
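For comparison, here is the idiom current Python uses in place of do...while, next to a sketch of what a third-party do...while manager might reduce to. `do_while_manager` is hypothetical. Unlike the earlier protocol description, its condition returns a plain bool for brevity.

```python
def do_while_idiom(n):
    # The usual current-Python spelling: run the body, then test at the bottom.
    steps = []
    while True:
        steps.append(n)
        n += 1
        if not n < 3:
            break
    return steps

def do_while_manager(condition, body, ns):
    """Hypothetical third-party do...while manager: the body runs before the first test."""
    ns = body(ns)
    while condition(ns):
        ns = body(ns)
    return ns

print(do_while_idiom(5))  # [5]: the body runs once even though the test fails
print(do_while_manager(lambda ns: ns["n"] < 3,
                       lambda ns: {**ns, "n": ns["n"] + 1},
                       {"n": 10}))  # {'n': 11}
```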
I think, like context managers, this would provide a great deal of flexibility to the language and allow a lot of useful behaviors. Of course the syntax and details are just strawmen examples at this point, there may be much better syntaxes. But I think the basic idea of being able to control a loop in a manner like this is important.
The major difference between this proposal and context managers is that you want to be able to have the loop manager drive the execution of its suite, while a context manager can't do that; it just has __enter__ and __exit__ methods that get called before and after the suite is executed normally. That's how it avoids all of the problems here. Of course it still uses a change to the interpreter to allow the __exit__ method to get called as part of exception handling, but it's easy to see how you could have implemented it as a source transformation into a try/finally, in which case it wouldn't have needed any new interpreter functionality at all.
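To illustrate the source-transformation point, here is roughly the try/finally expansion of `with` from PEP 343, written as a helper. `run_with` and `Recorder` are illustrative names, and the suite is passed as a callable only so the example is self-contained. The manager is called around the suite but never drives it.

```python
import sys

def run_with(mgr, block):
    """Roughly the try/finally expansion of `with mgr as value: block(value)` (PEP 343)."""
    exit_method = type(mgr).__exit__
    value = type(mgr).__enter__(mgr)
    exc = True
    try:
        try:
            block(value)                 # the suite runs normally; mgr never drives it
        except BaseException:
            exc = False
            if not exit_method(mgr, *sys.exc_info()):
                raise
    finally:
        if exc:
            exit_method(mgr, None, None, None)

class Recorder:
    def __init__(self):
        self.events = []
    def __enter__(self):
        self.events.append("enter")
        return self
    def __exit__(self, exc_type, exc, tb):
        self.events.append("exit")
        return False                     # do not swallow exceptions

rec = Recorder()
run_with(rec, lambda r: r.events.append("suite"))
print(rec.events)                        # ['enter', 'suite', 'exit']
```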
Yes, that is why I said it was similar "in principle". The implementation is different, but I think the concepts have a lot in common.
But my point is that they're very different even in principle. One just supplies functions to get called around the suite, the other tries to control the way the suite is executed, using some kind of novel mechanism that still hasn't been designed. Perhaps it is better to say this is more similar to decorators, then.
But it's not really similar to decorators either. A decorator is just like any normal higher-order function. It takes a function, it returns a function, that's it. No monkeying with scopes or capturing and resuming yields or anything of the sort you're proposing. And that's my point: you seem to think your proposal is just a simple thing that will fit on top of Python 3.5, but it actually relies on a whole slew of new things that still need to be invented before it makes sense. Or, to put it another way, if you intentionally designed a proposal to demonstrate why Scheme continuations and environments are cooler than anything Python has, I think it would look a lot like this... (And the usual counter is that Python doesn't have everything to let you implement custom control flow at the language level because allowing that is actually a _bad_ thing for readability, not a feature.)
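The point about decorators, in code: a decorator is just a higher-order function that takes a function and returns a function. There are no scopes to hand around and no yields to capture. `logged` and its `calls` attribute are illustrative names.

```python
import functools

def logged(func):
    """A decorator: takes a function and returns a function, nothing more."""
    calls = []

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        calls.append(args)
        return func(*args, **kwargs)

    wrapper.calls = calls
    return wrapper

@logged
def add(a, b):
    return a + b

print(add(2, 3), add.calls)  # 5 [(2, 3)]
```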
Andrew Barnert