[Python-ideas] x=(yield from) confusion [was:Yet another alternative name for yield-from]
Jacob Holm
jh at improva.dk
Sat Apr 11 12:40:34 CEST 2009
Greg Ewing wrote:
> Jacob Holm wrote:
>
>> I would like to split off a function for parsing a single element.
>> And I would like it to look like this:
>>
>> def parse_elem():
>>     opening_tag = yield
>>     name = opening_tag[1:-1]
>>     items = yield from parse_items("</%s>" % name)
>>     return (name, items)
>
> I don't see what you gain by writing it like that, though.
A more consistent API for calling it. Instead of special-casing the
first input, all input is provided the same way. That makes it more
similar to parse_items, which is already called that way.
> You don't even know whether you want to call this function
> until you've seen the first token and realized that it's
> a tag.
Not when used from parse_items, but remember that my intention was to
make parse_elem independently useful. If you are just starting to parse
a presumed valid xml stream (without self-closed tags), you know that it
consists of a single element but won't know what it is.
>
> In other words, you need a one-token lookahead. A more
> conventional parser would use a scanner that lets you
> peek at the next token without absorbing it, but that's
> not an option when you're receiving the tokens via
> yield, so another solution must be found.
>
> The solution I chose was to keep the lookahead token
> as state in the parsing functions, and pass it to
> wherever it's needed. Your parse_elem() function clearly
> needs it, so it should take it as a parameter.
>
> If there's some circumstance in which you know for
> certain that there's an elem coming up, you can always
> write another parsing function for dealing with that,
> e.g.
>
> def expect_elem():
>     first = yield
>     return yield from parse_elem(opening_tag = first)
>
> I don't think there's anything inconvenient about that.
>
Except you now have one extra function for exactly the same task, just
with a different calling convention. And this doesn't handle an initial
throw() correctly. Not that I see any reason to use throw in the parser
example. I'm just saying an extra function wouldn't work in that case.
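
For concreteness, the two calling conventions under discussion can be
sketched in runnable form. This is a minimal sketch and not code from the
thread: the body of parse_items is a hypothetical reconstruction (it is not
shown in this message), drive() is a helper invented here for feeding
tokens, and the parentheses around yield-from in expect_elem are there
because released Python rejects a bare "return yield from ...".

```python
def parse_items(closing_tag):
    # Hypothetical reconstruction: collect tokens until closing_tag arrives,
    # recursing into parse_elem whenever an opening tag is seen.
    items = []
    while True:
        token = yield
        if token == closing_tag:
            return items
        if token.startswith("<"):
            # The lookahead token is passed on as an ordinary argument.
            items.append((yield from parse_elem(token)))
        else:
            items.append(token)


def parse_elem(opening_tag):
    # Variant that takes the already-seen opening tag as a parameter.
    name = opening_tag[1:-1]
    items = yield from parse_items("</%s>" % name)
    return (name, items)


def expect_elem():
    # Extra entry point for callers that have not seen any token yet.
    first = yield
    return (yield from parse_elem(opening_tag=first))


def drive(gen, tokens):
    # Hypothetical helper: prime the generator, send every token,
    # and hand back the generator's return value.
    try:
        next(gen)
        for token in tokens:
            gen.send(token)
    except StopIteration as stop:
        return stop.value
    raise ValueError("token stream ended before the parser finished")


result = drive(expect_elem(), ["<a>", "x", "<b>", "y", "</b>", "</a>"])
```

In this sketch expect_elem exists only to adapt parse_elem to the
"everything arrives via send()" convention. An exception thrown into a
freshly primed expect_elem is raised at the wrapper's own yield, before
parse_elem has been created, which is presumably the throw() problem
mentioned above.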
>> A convention like Nick suggested where all coroutines take an
>> optional "start" argument with the first value to yield doesn't
>> help, because it is not the value to yield that is the problem.
>
> I think you've confused the issue a bit yourself, because
> you started out by asking for a way of specifying the first
> value to yield in the yield-from expression. But it seems
> that what you really want is to specify the first value
> to *send* into the subiterator.
In this case, yes. In other cases it really is the first value to yield
from the subiterator, or the first value to throw into the subiterator.
At least part of the confusion comes from the fact that if yield-from
could somehow suppress the initial next and yield a different value
instead (either an extra expression in yield-from or the last value
yielded by a primed generator), there would be a simple way to write
wrappers that could be used at the call site to handle all those cases.
So a feature that allowed specifying the first value to yield in the
yield-from expression *would* be enough, but a start argument to the
coroutine constructor isn't.
>
> I haven't seen anything so far that convinces me it would
> be a serious inconvenience not to have such a feature.
>
> Also, it doesn't seem to generalize. What if your parser
> needs a two-token lookahead? Then you'll be asking for a
> way to specify the first *two* values to send in. Where
> does it end?
>
The "suppress initial next()" feature *would* have helped, by enabling
you to write a generic wrapper to use at the call site that could do
exactly that. The wrapper could use send() as many times as needed on
the wrapped generator, then use yield-from to call it when done.
Without that feature, the wrapper can't use yield-from to call the
wrapped generator. Of course there are (slower) ways to write such a
wrapper without using yield-from.
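
One such slower wrapper might look like the sketch below. It is an
illustration under stated assumptions, not code from the thread: the name
call_with_lookahead, the drive() helper and the body of parse_items are
all hypothetical. The wrapper primes the generator, replays the lookahead
with send(), and then forwards send() and throw() by hand, because a
yield-from at that point would start with another next().

```python
def call_with_lookahead(gen, lookahead):
    # Hypothetical call-site wrapper: feed `lookahead` into `gen`, then
    # delegate to it manually instead of with yield-from.
    try:
        pending = next(gen)  # run to the first yield
        for value in lookahead:
            # Values yielded while replaying the lookahead are dropped,
            # except the last one, which is yielded to our caller below.
            pending = gen.send(value)
        while True:
            try:
                received = yield pending
            except GeneratorExit:
                gen.close()
                raise
            except BaseException as exc:
                pending = gen.throw(exc)
            else:
                pending = gen.send(received)
    except StopIteration as stop:
        # The wrapped generator finished; pass its return value along.
        return stop.value


def parse_items(closing_tag):
    # Hypothetical reconstruction of the thread's parse_items.
    items = []
    while True:
        token = yield
        if token == closing_tag:
            return items
        if token.startswith("<"):
            # The tag was already consumed here, so replay it as lookahead.
            elem = yield from call_with_lookahead(parse_elem(), [token])
            items.append(elem)
        else:
            items.append(token)


def parse_elem():
    # The form asked for at the top of this message: all input via send().
    opening_tag = yield
    name = opening_tag[1:-1]
    items = yield from parse_items("</%s>" % name)
    return (name, items)


def drive(gen, tokens):
    # Hypothetical helper: prime, send all tokens, return the result.
    try:
        next(gen)
        for token in tokens:
            gen.send(token)
    except StopIteration as stop:
        return stop.value
    raise ValueError("token stream ended before the parser finished")


result = drive(parse_elem(), ["<a>", "x", "<b>", "y", "</b>", "</a>"])
```

Every value sent to a nested element passes through one extra Python-level
generator frame per wrapper, which is where the slowdown relative to a
plain yield-from comes from. The lookahead list can hold any number of
values, so the same wrapper covers the two-token case as well.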
The alternative to using a call wrapper is to rewrite the subiterator to
take the full lookahead as arguments, but how would you write functions
like parse_elem and parse_items if the lookahead is variable? (You can
safely assume that the lookahead is no more than is needed to exhaust
the generator)
I think I can probably generalize the cr_init() pattern to handle a
variable lookahead, but I think even a slow call wrapper might be faster
in that case (depending on the nesting level).
- Jacob