[Python-ideas] x=(yield from) confusion [was:Yet another alternative name for yield-from]

Sat Apr 11 12:40:34 CEST 2009

Greg Ewing wrote:
> Jacob Holm wrote:
>
>> I would like to split off a function for parsing a single element.  
>> And I would like it to look like this:
>>
>> def parse_elem():
>>    opening_tag = yield
>>    name = opening_tag[1:-1]
>>    items = yield from parse_items("</%s>" % name)
>>    return (name, items)
>
> I don't see what you gain by writing it like that, though.

A more consistent API for calling it.  Instead of special-casing the 
first input, all input is provided the same way.  That makes it more 
similar to parse_items, which is already called that way.
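
To make the calling convention concrete, here is a minimal sketch of
parse_elem used on its own.  The parse_items shown is a simplified
stand-in (no nested elements), and run() is a hypothetical driver, not
something from the proposal.  It also assumes the generator's return
value is available as StopIteration.value, as PEP 380 proposes.

```python
def parse_items(closing_tag):
    # Simplified stand-in: collect tokens until the closing tag arrives.
    items = []
    while True:
        token = yield
        if token == closing_tag:
            return items
        items.append(token)


def parse_elem():
    # Every token, including the opening tag, arrives via send().
    opening_tag = yield
    name = opening_tag[1:-1]
    items = yield from parse_items("</%s>" % name)
    return (name, items)


def run(parser, tokens):
    # Hypothetical driver: prime the generator, then send each token.
    next(parser)
    try:
        for token in tokens:
            parser.send(token)
    except StopIteration as stop:
        return stop.value
    raise ValueError("input ended before the parser finished")


result = run(parse_elem(), ["<a>", "x", "y", "</a>"])
print(result)  # ('a', ['x', 'y'])
```

The driver never needs to know which token is the opening tag; it just
sends everything in order.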

> You don't even know whether you want to call this function
> until you've seen the first token and realized that it's
> a tag.

Not when used from parse_items, but remember that my intention was to 
make parse_elem independently useful.  If you are just starting to parse 
a presumed valid XML stream (without self-closed tags), you know that it 
consists of a single element but won't know what it is.

>
> In other words, you need a one-token lookahead. A more
> conventional parser would use a scanner that lets you
> peek at the next token without absorbing it, but that's
> not an option when you're receiving the tokens via
> yield, so another solution must be found.
>
> The solution I chose was to keep the lookahead token
> as state in the parsing functions, and pass it to
> wherever it's needed. Your parse_elem() function clearly
> needs it, so it should take it as a parameter.
>
> If there's some circumstance in which you know for
> certain that there's an elem coming up, you can always
> write another parsing function for dealing with that,
> e.g.
>
>   def expect_elem():
>     first = yield
>     return (yield from parse_elem(opening_tag=first))
>
> I don't think there's anything inconvenient about that.
>

Except you now have one extra function for exactly the same task, just 
with a different calling convention.  And this doesn't handle an initial 
throw() correctly.  Not that I see any reason to use throw in the parser 
example.  I'm just saying an extra function wouldn't work in that case.
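
A minimal sketch of the throw() problem, under some assumptions: Abort
is a hypothetical exception, both parse_elem variants are cut down to
their first yield, and throw_first() is a hypothetical driver.  With the
extra function, an initial throw() is raised at the wrapper's own yield,
so the handler inside the parsing function never sees it.

```python
class Abort(Exception):
    """Hypothetical signal that the driver throws into the parser."""


def parse_elem_sent():
    # The opening tag arrives through send(), so the first yield is
    # inside this function and its handler sees an initial throw().
    try:
        opening_tag = yield
    except Abort:
        return "handled by parse_elem"
    return opening_tag[1:-1]


def parse_elem_arg(opening_tag):
    # The opening tag is a parameter; the handler only covers tokens
    # that arrive after it.
    try:
        yield
    except Abort:
        return "handled by parse_elem"
    return opening_tag[1:-1]


def expect_elem():
    # An initial throw() is raised here, before parse_elem_arg exists.
    first = yield
    return (yield from parse_elem_arg(opening_tag=first))


def throw_first(gen):
    # Hypothetical driver: prime the generator, then throw immediately.
    next(gen)
    try:
        gen.throw(Abort())
    except StopIteration as stop:
        return stop.value
    except Abort:
        return "escaped unhandled"


print(throw_first(parse_elem_sent()))  # handled by parse_elem
print(throw_first(expect_elem()))      # escaped unhandled
```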

>> A convention like Nick suggested where all coroutines take an
>> optional "start" argument with the first value to yield doesn't
>> help, because it is not the value to yield that is the problem.
>
> I think you've confused the issue a bit yourself, because
> you started out by asking for a way of specifying the first
> value to yield in the yield-from expression. But it seems
> that what you really want is to specify the first value
> to *send* into the subiterator.

In this case, yes.  In other cases it really is the first value to yield 
from the subiterator, or the first value to throw into the subiterator.

At least part of the confusion comes from the fact that one feature 
would cover all of those cases.  If yield-from could somehow suppress 
the initial next() and yield a different value instead (either an extra 
expression in yield-from or the last value yielded by a primed 
generator), there would be a simple way to write wrappers that could be 
used at the call site to handle them.  So a feature that allowed 
specifying the first value to yield in the yield-from expression *would* 
be enough, but a start argument to the coroutine constructor isn't.

>
> I haven't seen anything so far that convinces me it would
> be a serious inconvenience not to have such a feature.
>
> Also, it doesn't seem to generalize. What if your parser
> needs a two-token lookahead? Then you'll be asking for a
> way to specify the first *two* values to send in. Where
> does it end?
>

The "suppress initial next()" feature *would* have helped, by enabling 
you to write a generic wrapper to use at the call site that could do 
exactly that.  The wrapper could use send() as many times as needed on 
the wrapped generator, then use yield-from to call it when done.  
Without that feature, the wrapper can't use yield-from to call the 
wrapped generator.   Of course there are (slower) ways to write such a 
wrapper without using yield-from.
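
As a rough sketch of such a wrapper without yield-from inside it: the
name call_with_lookahead is hypothetical, the manual delegation loop is
only an approximation of what yield-from does, and parse_items and run()
are simplified stand-ins.  It again assumes the return value is carried
in StopIteration.value, as PEP 380 proposes.

```python
def call_with_lookahead(gen, *lookahead):
    # Hypothetical call wrapper: prime gen, send it the lookahead
    # values, then forward everything else by hand.
    try:
        value = next(gen)
        for item in lookahead:
            value = gen.send(item)
    except StopIteration as stop:
        return stop.value
    while True:
        try:
            sent = yield value
        except GeneratorExit:
            gen.close()
            raise
        except BaseException as exc:
            try:
                value = gen.throw(exc)
            except StopIteration as stop:
                return stop.value
        else:
            try:
                value = gen.send(sent)
            except StopIteration as stop:
                return stop.value


def parse_elem():
    opening_tag = yield
    name = opening_tag[1:-1]
    items = yield from parse_items("</%s>" % name)
    return (name, items)


def parse_items(closing_tag=None):
    items = []
    while True:
        token = yield
        if token == closing_tag:
            return items
        if token.startswith("<"):
            # The tag is already consumed, so hand it to parse_elem
            # as lookahead instead of special-casing its signature.
            elem = yield from call_with_lookahead(parse_elem(), token)
            items.append(elem)
        else:
            items.append(token)


def run(parser, tokens):
    # Hypothetical driver: prime the generator, then send each token.
    next(parser)
    try:
        for token in tokens:
            parser.send(token)
    except StopIteration as stop:
        return stop.value
    raise ValueError("input ended before the parser finished")


tokens = ["<a>", "x", "<b>", "y", "</b>", "</a>"]
print(run(parse_elem(), tokens))  # ('a', ['x', ('b', ['y'])])
```

The wrapper takes any number of lookahead values, so the same call site
works for a one-token or a two-token lookahead.  The cost is that every
value passes through an extra Python-level loop at each nesting level.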

The alternative to using a call wrapper is to rewrite the subiterator to 
take the full lookahead as arguments, but how would you write functions 
like parse_elem and parse_items if the lookahead is variable?  (You can 
safely assume that the lookahead is no more than is needed to exhaust 
the generator.)

I think I can probably generalize the cr_init() pattern to handle a 
variable lookahead, but I think even a slow call wrapper might be faster 
in that case (depending on the nesting level).

- Jacob