Greg Ewing wrote:
Jacob Holm wrote:
I would like to split off a function for parsing a single element. And I would like it to look like this:
    def parse_elem():
        opening_tag = yield
        name = opening_tag[1:-1]
        items = yield from parse_items("</%s>" % name)
        return (name, items)
I don't see what you gain by writing it like that, though.
A more consistent API for calling it. Instead of special-casing the first input, all input is provided the same way. That makes it more similar to parse_items, which is already called that way.
You don't even know whether you want to call this function until you've seen the first token and realized that it's a tag.
Not when used from parse_items, but remember that my intention was to make parse_elem independently useful. If you are just starting to parse a presumed-valid XML stream (without self-closed tags), you know that it consists of a single element but won't know what it is.
In other words, you need a one-token lookahead. A more conventional parser would use a scanner that lets you peek at the next token without absorbing it, but that's not an option when you're receiving the tokens via yield, so another solution must be found.
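For comparison, here is a minimal sketch of the conventional pull-based arrangement mentioned above, where the parser asks a scanner for tokens and can peek without consuming. The Scanner class, its peek()/next() methods, and the use of None as an end-of-input marker are all assumptions made for illustration; none of them come from the thread.

```python
class Scanner:
    """Hypothetical pull-based token source with one-token lookahead."""

    _EMPTY = object()  # sentinel: nothing has been peeked yet

    def __init__(self, tokens):
        self._it = iter(tokens)
        self._peeked = self._EMPTY

    def peek(self):
        # Return the next token without consuming it (None at end of input).
        if self._peeked is self._EMPTY:
            self._peeked = next(self._it, None)
        return self._peeked

    def next(self):
        # Return the next token and consume it.
        tok = self.peek()
        self._peeked = self._EMPTY
        return tok


def parse_elem(scanner):
    name = scanner.next()[1:-1]              # "<a>" -> "a"
    items = parse_items(scanner, "</%s>" % name)
    scanner.next()                           # consume the closing tag
    return (name, items)


def parse_items(scanner, closing_tag=None):
    items = []
    # Peeking lets us decide what to do before committing to a token.
    while scanner.peek() not in (None, closing_tag):
        if scanner.peek().startswith("<"):
            items.append(parse_elem(scanner))
        else:
            items.append(scanner.next())
    return items


result = parse_items(Scanner(["<a>", "x", "<b>", "y", "</b>", "</a>"]))
```

Because parse_elem pulls its own opening tag here, no special handling of the first token is needed. That is exactly what is lost when tokens are pushed in via yield.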
The solution I chose was to keep the lookahead token as state in the parsing functions, and pass it to wherever it's needed. Your parse_elem() function clearly needs it, so it should take it as a parameter.
If there's some circumstance in which you know for certain that there's an elem coming up, you can always write another parsing function for dealing with that, e.g.
    def expect_elem():
        first = yield
        return (yield from parse_elem(opening_tag=first))
I don't think there's anything inconvenient about that.
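To make the design concrete, here is a runnable sketch of the approach described above: the lookahead token is passed as a parameter, and expect_elem() covers the case where an element is known to be coming. The body of parse_items, the run() driver, and the use of None as an end-of-input token are assumptions added for illustration, not code from the thread.

```python
def parse_elem(opening_tag):
    # The caller has already consumed the opening tag and passes it in.
    name = opening_tag[1:-1]
    items = yield from parse_items("</%s>" % name)
    return (name, items)


def parse_items(closing_tag=None):
    # Assumed implementation: collect items until the closing tag
    # (or the None end-of-input token) arrives.
    items = []
    while True:
        token = yield
        if token is None or token == closing_tag:
            return items
        if token.startswith("<"):
            # One-token lookahead: we only know it is an element after
            # seeing the tag, so hand the tag over as an argument.
            items.append((yield from parse_elem(token)))
        else:
            items.append(token)


def expect_elem():
    first = yield
    return (yield from parse_elem(opening_tag=first))


def run(gen, tokens):
    # Hypothetical driver: prime the generator, push tokens in with send(),
    # and return the parser's return value.
    next(gen)
    try:
        for tok in tokens:
            gen.send(tok)
        gen.send(None)
    except StopIteration as stop:
        return stop.value


nested = run(parse_items(), ["<a>", "x", "<b>", "y", "</b>", "</a>"])
single = run(expect_elem(), ["<a>", "x", "</a>"])
```

Note that the parentheses around yield from are required in a return statement or a function argument.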
Except you now have one extra function for exactly the same task, just with a different calling convention. And this doesn't handle an initial throw() correctly. Not that I see any reason to use throw in the parser example. I'm just saying an extra function wouldn't work in that case.
A convention like Nick suggested where all coroutines take an optional "start" argument with the first value to yield doesn't help, because it is not the value to yield that is the problem.
I think you've confused the issue a bit yourself, because you started out by asking for a way of specifying the first value to yield in the yield-from expression. But it seems that what you really want is to specify the first value to *send* into the subiterator.
In this case, yes. In other cases it really is the first value to yield from the subiterator, or the first value to throw into the subiterator. At least part of the confusion comes from the fact that if yield-from could somehow suppress the initial next and yield a different value instead (either an extra expression in yield-from or the last value yielded by a primed generator), there would be a simple way to write wrappers that could be used at the call site to handle all those cases. So a feature that allowed specifying the first value to yield in the yield-from expression *would* be enough, but a start argument to the coroutine constructor isn't.
I haven't seen anything so far that convinces me it would be a serious inconvenience not to have such a feature.
Also, it doesn't seem to generalize. What if your parser needs a two-token lookahead? Then you'll be asking for a way to specify the first *two* values to send in. Where does it end?
The "suppress initial next()" feature *would* have helped, by enabling you to write a generic wrapper to use at the call site that could do exactly that. The wrapper could use send() as many times as needed on the wrapped generator, then use yield-from to call it when done. Without that feature, the wrapper can't use yield-from to call the wrapped generator. Of course there are (slower) ways to write such a wrapper without using yield-from.
The alternative to using a call wrapper is to rewrite the subiterator to take the full lookahead as arguments, but how would you write functions like parse_elem and parse_items if the lookahead is variable? (You can safely assume that the lookahead is no more than is needed to exhaust the generator.)
I think I can probably generalize the cr_init() pattern to handle a variable lookahead, but I think even a slow call wrapper might be faster in that case (depending on the nesting level).
- Jacob
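As an illustration of the slower call wrapper mentioned above, here is a rough sketch that primes the wrapped generator, replays any number of lookahead tokens with send(), and then relays values by hand instead of using yield-from. The name call_with_lookahead, the body of parse_items, the run() driver, and the None end-of-input token are all assumptions; this is one possible shape for such a wrapper, not the cr_init() pattern and not anything proposed for the PEP.

```python
def call_with_lookahead(gen, *lookahead):
    # Hypothetical wrapper: feed `lookahead` into `gen`, then act as a
    # hand-written stand-in for `yield from gen`.
    try:
        out = next(gen)                  # run gen to its first yield
        for tok in lookahead:
            out = gen.send(tok)          # replay tokens already consumed
        while True:
            try:
                value = yield out        # relay gen's yielded value outward
            except GeneratorExit:
                gen.close()
                raise
            except BaseException as exc:
                out = gen.throw(exc)     # forward throw() to gen
            else:
                out = gen.send(value)    # forward send() to gen
    except StopIteration as stop:
        return stop.value                # gen finished: pass on its result


def parse_elem():
    # All input, including the opening tag, arrives via yield.
    opening_tag = yield
    name = opening_tag[1:-1]
    items = yield from parse_items("</%s>" % name)
    return (name, items)


def parse_items(closing_tag=None):
    # Assumed implementation, for illustration only.
    items = []
    while True:
        token = yield
        if token is None or token == closing_tag:
            return items
        if token.startswith("<"):
            # The tag was already received here, so it is re-sent.
            elem = yield from call_with_lookahead(parse_elem(), token)
            items.append(elem)
        else:
            items.append(token)


def run(gen, tokens):
    # Hypothetical driver: prime, push tokens, return the parser's result.
    next(gen)
    try:
        for tok in tokens:
            gen.send(tok)
        gen.send(None)
    except StopIteration as stop:
        return stop.value


nested = run(parse_items(), ["<a>", "x", "<b>", "y", "</b>", "</a>"])
single = run(parse_elem(), ["<a>", "x", "</a>"])
```

Every token that passes through the wrapper costs an extra Python-level send() per nesting level, which is the slowdown referred to above. In exchange, parse_elem keeps a single calling convention and the lookahead can be any length.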