Re: [Python-ideas] x=(yield from) confusion [was:Yet another alternative name for yield-from]
Jacob Holm wrote:
I am saying that there are examples where it is desirable to move one of the arguments that this form of refactoring forces you to put in the constructor so it instead becomes the argument of the first send.
I'm having trouble seeing circumstances in which you would need to do that. Can you provide an example in the form of

(a) a piece of unfactored code
(b) a desired refactoring
(c) an explanation of why the desired refactoring can't conveniently be done using an unprimed generator and plain yield-from.

-- Greg
Greg Ewing wrote:
Jacob Holm wrote:
I am saying that there are examples where it is desirable to move one of the arguments that this form of refactoring forces you to put in the constructor so it instead becomes the argument of the first send.
I'm having trouble seeing circumstances in which you would need to do that. Can you provide an example in the form of
(a) a piece of unfactored code
Ok, once again based on your own parser example. The parse_items generator could have been written as:

    def parse_items(closing_tag=None):
        elems = []
        token = yield
        while token != closing_tag:
            if is_opening_tag(token):
                name = token[1:-1]
                items = yield from parse_items("</%s>" % name)
                elems.append((name, items))
            else:
                elems.append(token)
            token = yield
        return elems
(b) a desired refactoring
I would like to split off a function for parsing a single element. And I would like it to look like this:

    def parse_elem():
        opening_tag = yield
        name = opening_tag[1:-1]
        items = yield from parse_items("</%s>" % name)
        return (name, items)

This differs from the version in your example by taking all the tags as arguments to send() instead of having the opening tag as an argument to the constructor. Unfortunately, there is no way to actually use this version in the implementation of parse_items.
(c) an explanation of why the desired refactoring can't conveniently be done using an unprimed generator and plain yield-from.
The suggested subroutine cannot be used, because parse_items already has the value that should go as the argument to the first send(). It is easy to rewrite it to the version you used in the example, but that requires you to make the opening_tag an argument to the constructor, whereas I want it as an argument to the first send.

You can of course make that argument optional and adjust the function to only do the first yield if the argument is not given. That is essentially what my "cr_init()" pattern does. Using that pattern, the refactoring looks like this:

    def cr_init(start):
        if start is None:
            return (yield)
        if 'send' in start:
            return start['send']
        if 'throw' in start:
            raise start['throw']
        return (yield start.get('yield'))

    def parse_elem(start=None):
        opening_tag = yield from cr_init(start)
        name = opening_tag[1:-1]
        items = yield from parse_items("</%s>" % name)
        return (name, items)

    def parse_items(closing_tag=None, start=None):
        elems = []
        token = yield from cr_init(start)
        while token != closing_tag:
            if is_opening_tag(token):
                elems.append((yield from parse_elem(start={'send': token})))
            else:
                elems.append(token)
            token = yield
        return elems

As you see it *can* be done, but I would hardly call it convenient. The main problem is that the coroutine you want to call must be written with this in mind or you are out of luck. While it *is* possible to write a wrapper that lets you call the unmodified parse_elem, that wrapper cannot use yield-from to call it, so you get a rather large overhead that way.

A convention like Nick suggested where all coroutines take an optional "start" argument with the first value to yield doesn't help, because it is not the value to yield that is the problem.

I hope this helps to explain why the cr_init pattern is needed even for relatively simple refactoring now that it seems we are not fixing the "initial next()" issue.

- Jacob
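[For concreteness, here is a small driver sketch showing how the cr_init-based coroutines above could be exercised. It is not from the original mail: is_opening_tag and the token stream are invented, and PEP 380 semantics (yield from, return with a value in a generator) are assumed.]

    # Hypothetical helper and driver, for illustration only.
    def is_opening_tag(token):
        return token.startswith("<") and not token.startswith("</")

    # None matches the default closing_tag, so it ends the outer item list.
    tokens = ["<a>", "<b>", "text", "</b>", "</a>", None]

    parser = parse_items()
    next(parser)                    # the initial next() that cannot be suppressed
    try:
        for token in tokens:
            parser.send(token)      # every tag and text node arrives via send()
    except StopIteration as e:
        print(e.value)              # [('a', [('b', ['text'])])]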
Jacob Holm wrote:
I would like to split off a function for parsing a single element. And I would like it to look like this:
    def parse_elem():
        opening_tag = yield
        name = opening_tag[1:-1]
        items = yield from parse_items("</%s>" % name)
        return (name, items)
I don't see what you gain by writing it like that, though. You don't even know whether you want to call this function until you've seen the first token and realized that it's a tag. In other words, you need a one-token lookahead.

A more conventional parser would use a scanner that lets you peek at the next token without absorbing it, but that's not an option when you're receiving the tokens via yield, so another solution must be found. The solution I chose was to keep the lookahead token as state in the parsing functions, and pass it to wherever it's needed. Your parse_elem() function clearly needs it, so it should take it as a parameter.

If there's some circumstance in which you know for certain that there's an elem coming up, you can always write another parsing function for dealing with that, e.g.

    def expect_elem():
        first = yield
        return (yield from parse_elem(opening_tag=first))

I don't think there's anything inconvenient about that.
A convention like Nick suggested where all coroutines take an optional "start" argument with the first value to yield doesn't help, because it is not the value to yield that is the problem.
I think you've confused the issue a bit yourself, because you started out by asking for a way of specifying the first value to yield in the yield-from expression. But it seems that what you really want is to specify the first value to *send* into the subiterator. I haven't seen anything so far that convinces me it would be a serious inconvenience not to have such a feature. Also, it doesn't seem to generalize. What if your parser needs a two-token lookahead? Then you'll be asking for a way to specify the first *two* values to send in. Where does it end? -- Greg
Greg Ewing wrote:
Jacob Holm wrote:
I would like to split off a function for parsing a single element. And I would like it to look like this:
    def parse_elem():
        opening_tag = yield
        name = opening_tag[1:-1]
        items = yield from parse_items("</%s>" % name)
        return (name, items)
I don't see what you gain by writing it like that, though.
A more consistent API for calling it. Instead of special-casing the first input, all input is provided the same way. That makes it more similar to parse_items, which is already called that way.
You don't even know whether you want to call this function until you've seen the first token and realized that it's a tag.
Not when used from parse_items, but remember that my intention was to make parse_elem independently useful. If you are just starting to parse a presumed valid XML stream (without self-closed tags), you know that it consists of a single element, but you won't know which one it is.
In other words, you need a one-token lookahead. A more conventional parser would use a scanner that lets you peek at the next token without absorbing it, but that's not an option when you're receiving the tokens via yield, so another solution must be found.
The solution I chose was to keep the lookahead token as state in the parsing functions, and pass it to wherever it's needed. Your parse_elem() function clearly needs it, so it should take it as a parameter.
If there's some circumstance in which you know for certain that there's an elem coming up, you can always write another parsing function for dealing with that, e.g.
    def expect_elem():
        first = yield
        return (yield from parse_elem(opening_tag=first))
I don't think there's anything inconvenient about that.
Except you now have one extra function for exactly the same task, just with a different calling convention. And this doesn't handle an initial throw() correctly. Not that I see any reason to use throw in the parser example. I'm just saying an extra function wouldn't work in that case.
A convention like Nick suggested where all coroutines take an optional "start" argument with the first value to yield doesn't help, because it is not the value to yield that is the problem.
I think you've confused the issue a bit yourself, because you started out by asking for a way of specifying the first value to yield in the yield-from expression. But it seems that what you really want is to specify the first value to *send* into the subiterator.
In this case, yes. In other cases it really is the first value to yield from the subiterator, or the first value to throw into the subiterator. At least part of the confusion comes from the fact that if yield-from could somehow suppress the initial next and yield a different value instead (either an extra expression in yield-from or the last value yielded by a primed generator), there would be a simple way to write wrappers that could be used at the call site to handle all those cases. So a feature that allowed specifying the first value to yield in the yield-from expression *would* be enough, but a start argument to the coroutine constructor isn't.
I haven't seen anything so far that convinces me it would be a serious inconvenience not to have such a feature.
Also, it doesn't seem to generalize. What if your parser needs a two-token lookahead? Then you'll be asking for a way to specify the first *two* values to send in. Where does it end?
The "suppress initial next()" feature *would* have helped, by enabling you to write a generic wrapper to use at the call site that could do exactly that. The wrapper could use send() as many times as needed on the wrapped generator, then use yield-from to call it when done. Without that feature, the wrapper can't use yield-from to call the wrapped generator. Of course there are (slower) ways to write such a wrapper without using yield-from. The alternative to using a call wrapper is to rewrite the subiterator to take the full lookahead as arguments, but how would you write functions like parse_elem and parse_items if the lookahead is variable? (You can safely assume that the lookahead is no more than is needed to exhaust the generator) I think I can probably generalize the cr_init() pattern to handle a variable lookahead, but I think even a slow call wrapper might be faster in that case (depending on the nesting level). - Jacob
Jacob Holm wrote:
At least part of the confusion comes from the fact that if yield-from could somehow suppress the initial next and yield a different value instead (either an extra expression in yield-from or the last value yielded by a primed generator), there would be a simple way to write wrappers that could be used at the call site to handle all those cases. So a feature that allowed specifying the first value to yield in the yield-from expression *would* be enough, but a start argument to the coroutine constructor isn't.
I think leaving this task to wrapper classes in the initial version of the PEP is the right way to go at this point. Adding a "skip the initial next and yield <expr> instead" clause later will be much easier than trying to undo something added now if it turns out to be a mistake.

Greg's basic proposal makes the easy things easy and the difficult things possible, so it is a very good place to start. The main change I would like from the original version of the PEP is for caching the bound methods to be explicitly disallowed in order to match the behaviour of normal for loops.

Cheers, Nick.

-- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
Nick Coghlan wrote:
Jacob Holm wrote:
At least part of the confusion comes from the fact that if yield-from could somehow suppress the initial next and yield a different value instead (either an extra expression in yield-from or the last value yielded by a primed generator), there would be a simple way to write wrappers that could be used at the call site to handle all those cases. So a feature that allowed specifying the first value to yield in the yield-from expression *would* be enough, but a start argument to the coroutine constructor isn't.
I think leaving this task to wrapper classes in the initial version of the PEP is the right way to go at this point.
I have already given up on getting this feature in at this point. The above paragraph was just meant to clear up some misunderstandings.
Adding a "skip the initial next and yield <expr> instead" clause later will be much easier than trying to undo something added now if it turns out to be a mistake.
Note that if we decide that it is OK to use yield-from with an already-started generator in this version, we can't later change yield-from to use the latest value yielded in place of the initial next(). That makes new syntax the only possibility for that future extension. Not that this is necessarily a bad thing.
Greg's basic proposal makes the easy things easy and the difficult things possible, so it is a very good place to start.
Yes. You can even write the slow version of the call-wrappers I am talking about, and then replace them with the faster versions later if the feature becomes available.
The main change I would like from the original version of the PEP is for caching the bound methods to be explicitly disallowed in order to match the behaviour of normal for loops.
I am not really sure about this. It looks very much like an implementation detail to me. On the other hand, the ability to replace the methods mid-flight might give us a way to implement the call-wrappers with minimal overhead. Since the current patches don't actually do any caching, this is something I should actually be able to try. - Jacob
Jacob Holm wrote:
I am not really sure about this. It looks very much like an implementation detail to me. On the other hand, the ability to replace the methods mid-flight might give us a way to implement the call-wrappers with minimal overhead. Since the current patches don't actually do any caching, this is something I should actually be able to try.
The part that makes me nervous is the fact that the PEP as it stands gives the green light to an implementation having different bound method caching behaviour between for loops and the yield-from expression. That goes against Guido's request that the degenerate case of yield-from have the same semantics as:

    for x in subiter:
        yield x

Since the language reference is actually silent on the topic of caching the bound method when iterating over an object, I would phrase it along the following lines:

- if for loops in a Python implementation cache next(), then yield-from in that implementation should also cache next()
- if yield-from caches next(), it should also cache send() and throw()
- since CPython for loops don't cache the bound method for next(), it won't cache the methods used by yield-from either

Who knows, maybe Guido will actually clarify the matter for us when he gets back from his vacation :)

Cheers, Nick.

P.S. Speaking of vacations, I'll also be offline for the next week or so (starting tomorrow), and then my internet access for Python activities will be sketchy for another couple of weeks as I move house. So I won't be able to contribute much more to this discussion.

-- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
On Sat, Apr 11, 2009 at 5:32 AM, Nick Coghlan <ncoghlan@gmail.com> wrote:
Jacob Holm wrote:
I am not really sure about this. It looks very much like an implementation detail to me. On the other hand, the ability to replace the methods mid-flight might give us a way to implement the call-wrappers with minimal overhead. Since the current patches don't actually do any caching, this is something I should actually be able to try.
The part that makes me nervous is the fact that the PEP as it stands gives the green light to an implementation having different bound method caching behaviour between for loops and the yield-from expression.
That goes against Guido's request that the degenerate case of yield-from have the same semantics as:
    for x in subiter:
        yield x
Since the language reference is actually silent on the topic of caching the bound method when iterating over an object, I would phrase it along the following lines:
- if for loops in a Python implementation cache next(), then yield-from in that implementation should also cache next()
- if yield-from caches next(), it should also cache send() and throw()
- since CPython for loops don't cache the bound method for next(), it won't cache the methods used by yield-from either
In ceval.c, the FOR_ITER opcode expects the iterator on top of the stack and calls (v->ob_type->tp_iternext)(v). You tell me whether that is caching or not. :-)
Who knows, maybe Guido will actually clarify the matter for us when he gets back from his vacation :)
Or sooner. :-)
P.S. Speaking of vacations, I'll also be offline for the next week or so (starting tomorrow), and then my internet access for Python activities will be sketchy for another couple of weeks as I move house. So I won't be able to contribute much more to this discussion.
Good, let this be a general trend. :-) -- --Guido van Rossum (home page: http://www.python.org/~guido/)
Nick Coghlan wrote:
Since the language reference is actually silent on the topic of caching the bound method when iterating over an object,
Since it's silent about that, if you write a for-loop that relies on presence or absence of cacheing behaviour, the result is undefined. The behaviour of yield-from on the same iterator would also be undefined. It's meaningless to talk about whether one undefined construct has the same semantics as another. -- Greg
On Sat, Apr 11, 2009 at 4:44 PM, Greg Ewing <greg.ewing@canterbury.ac.nz> wrote:
Nick Coghlan wrote:
Since the language reference is actually silent on the topic of caching the bound method when iterating over an object,
Since it's silent about that, if you write a for-loop that relies on presence or absence of cacheing behaviour, the result is undefined. The behaviour of yield-from on the same iterator would also be undefined.
It's meaningless to talk about whether one undefined construct has the same semantics as another.
But I wouldn't claim that the language reference being silent means that it's undefined. If I were asked for a clarification I would say that caching shouldn't be allowed if it changes the meaning of the program. Python in general favors *defined* semantics over leaving things in the gray. -- --Guido van Rossum (home page: http://www.python.org/~guido/)
Greg Ewing wrote:
Nick Coghlan wrote:
Since the language reference is actually silent on the topic of caching the bound method when iterating over an object,
Since it's silent about that, if you write a for-loop that relies on presence or absence of cacheing behaviour, the result is undefined. The behaviour of yield-from on the same iterator would also be undefined.
It's meaningless to talk about whether one undefined construct has the same semantics as another.
I agree that would be true in the absence of an accepted reference implementation (i.e. CPython) that doesn't cache the bound methods (hence allowing one to play games with the next() method definition while looping over an iterator).

If I understand Guido's last message correctly, this is one of the cases where he would like the existing behaviour of the CPython implementation to be the defined behaviour for the language as well.

Cheers, Nick.

P.S. I created http://bugs.python.org/issue5739 as a documentation bug pointing back to this email thread in relation to whether it is OK for a Python implementation to cache the next() method lookup in a for loop.

P.P.S. OK, stepping away from the computer and going on vacation now... :)

-- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
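[As an illustration of the "games" referred to above, here is a sketch under the assumption of CPython's current behaviour; the Counter class and names are invented, and the Python 3 __next__ spelling is used. Because the for loop looks the iteration method up through the type on every step rather than caching a bound method, rebinding it mid-loop takes effect immediately.]

    class Counter:
        def __init__(self):
            self.i = 0
        def __iter__(self):
            return self
        def __next__(self):              # Python 3 spelling of next()
            self.i += 1
            if self.i > 5:
                raise StopIteration
            return self.i

    def negated_next(self):
        self.i += 1
        if self.i > 5:
            raise StopIteration
        return -self.i

    seen = []
    for x in Counter():
        seen.append(x)
        if x == 2:
            Counter.__next__ = negated_next   # picked up on the next step, not cached
    print(seen)                               # [1, 2, -3, -4, -5]

Under the phrasing Nick proposes, a yield-from over the same iterator would see the rebinding as well.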
Nick Coghlan wrote:
The main change I would like from the original version of the PEP is for caching the bound methods to be explicitly disallowed in order to match the behaviour of normal for loops.
But if I let the expansion serve as a literal specification, it won't match the behaviour of for-loops either, because although it doesn't cache methods, PyIter_Next isn't exactly the same as looking up next() on the instance either.

I definitely don't want to preclude the implementation from using PyIter_Next, as that would be a major performance hit in the most common case. I also don't want to preclude caching a send() method, because in the absence of a __send__ typeslot it's the only way we have of improving performance.

I don't care much about throw() or close(), because they will rarely be called anyway. But by the same token, little would be gained by a wrapper using fancy tricks to redirect them.

-- Greg
Greg Ewing wrote:
I definitely don't want to preclude the implementation from using PyIter_Next, as that would be a major performance hit in the most common case.
We already have a general caveat in the docs saying that an implementation is allowed (or sometimes even required) to bypass normal attribute lookup for special methods defined by the language. You may want to point to that caveat from the PEP:

http://docs.python.org/reference/datamodel.html#special-method-lookup-for-ne...
http://docs.python.org/3.0/reference/datamodel.html#special-method-lookup

Being able to use PyIter_Next and the various other typeslots is exactly what that caveat is about.

One way you can make the expansion more explicit about bypassing the instance is to write things like "type(itr).send(itr, val)" instead of "itr.send(val)".
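[A small sketch, not from the mail, of what writing the expansion as type(itr).send(itr, val) expresses; the Chatty class is invented. The point is only that type-based lookup ignores anything stashed on the instance, which is how special method lookup behaves.]

    class Chatty:
        def send(self, value):
            return "type send: %r" % (value,)

    c = Chatty()
    c.send = lambda value: "instance send: %r" % (value,)   # shadow on the instance

    print(c.send(1))             # instance send: 1  (ordinary attribute lookup)
    print(type(c).send(c, 1))    # type send: 1      (special-method style lookup)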
I also don't want to preclude caching a send() method, because in the absence of a __send__ typeslot it's the only way we have of improving performance.
Actually, we do have another way of improving performance - add a typeslot for it :)

That can be left until we find out whether or not the lookup of send() becomes a performance bottleneck for yield-from usage (which I doubt will be the case).

Cheers, Nick.

-- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
Nick Coghlan wrote:
We already have a general caveat in the docs saying that an implementation is allowed (or sometimes even required) to bypass normal attribute lookup for special methods defined by the language.
However, in 2.x it's not obvious that next() is a special method, because it doesn't have an __xxx__ name. I think what I'll do is this:

* Use Python 3 syntax for the expansion, and write next(_i) instead of _i.next().
* Not say anything one way or the other about cacheing methods.

-- Greg
Jacob Holm wrote:
Except you now have one extra function for exactly the same task, just with a different calling convention.
I don't see anything wrong with that. If you look in the stdlib, there are plenty of places where alternative APIs are provided for the same functionality, e.g. in the re module you have the module-level functions as well as the match object methods. I would rather have a couple of functions written in a straightforward way than rely on a magic wrapper to artificially munge them into one. Transparent is better than opaque.
The "suppress initial next()" feature *would* have helped, by enabling you to write a generic wrapper to use at the call site that could do exactly that.
Now you're just moving the wrappers from one place to another. I can write a wrapper to convert any lookahead-taking parsing function into a non-lookahead one:

    def expect(f):
        first = yield
        return (yield from f(first))

So at the cost of just one extra function, I can call any of my parsing functions using either style.

-- Greg
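[For what it's worth, a self-contained sketch of the expect() adapter in use, not from the mail: parse_foo is an invented stand-in for a parsing coroutine that takes its first token as an argument, and PEP 380 semantics are assumed.]

    def parse_foo(first):
        # Invented example: consume one more token and return both.
        second = yield
        return (first, second)

    def use_via_expect():
        # The first token arrives via send(); expect() turns it into the
        # argument parse_foo wants, then delegates with yield from.
        return (yield from expect(parse_foo))

    g = use_via_expect()
    next(g)
    g.send("a")
    try:
        g.send("b")
    except StopIteration as e:
        print(e.value)             # ('a', 'b')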
participants (4)
- Greg Ewing
- Guido van Rossum
- Jacob Holm
- Nick Coghlan