[Python-ideas] PEP on yield-from: throw example
Bruce Frederiksen
dangyogi at gmail.com
Thu Feb 19 19:17:52 CET 2009
Greg Ewing wrote:
> Bruce Frederiksen wrote:
>
>> 1. The double use of send/throw and the yield expression for
>> simultaneous input and output to/from the generator; rather than
>> separating input and output as two different constructs. Sending
>> one value in does not always correspond to getting one value out.
>
> You might not be interested in sending or receiving
> a value every time, but you do have to suspend the
> generator each time you want to send and/or receive
> a value.
>
> Currently, there is only one way to suspend a
> generator, which for historical reasons is called
> 'yield'. Each time you use it, you have the opportunity
> to send a value, and an opportunity to receive a
> value, but you don't have to use both of these (or
> either of them) if you don't want to.
>
> What you seem to be proposing is having two aliases
> for 'yield', one of which only sends and the other
> only receives. Is that right? If so, I don't see
> much point in it other than making code read
> slightly better.
I'm thinking that yield goes away (both the statement and the expression
form), to be replaced by builtin functions. I would propose that the
builtins take optional pipe arguments that default to the current
thread's pipein/pipeout. I would also propose that each thread be
allowed multiple input and/or output pipes, with the selection of which
one to use made by passing an integer value for the pipe argument. For
example:
send(obj, pipeout = None)
send_from(iterable, pipeout = None)  # does what "yield from" is supposed to do
next(iterator = None)
num_input_pipes()
num_output_pipes()

You may need a few more functions to round this out:

pipein(index = 0)    # returns the current thread's pipein[index] object;
                     # could also use iter() for this
pipeout(index = 0)   # returns the current thread's pipeout[index] object
throwforward(exc_type, exc_value = None, traceback = None, pipeout = None)
throwback(exc_type, exc_value = None, traceback = None, pipein = None)
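To pin down the pipe semantics I have in mind, here is a rough sketch
(not part of the proposal itself) of an unbuffered, rendezvous-style
pipe, using real OS threads to stand in for the proposed
interpreter-level threads; the Pipe class and its method names are
purely illustrative:

import threading

class Pipe:
    def __init__(self):
        self._write_lock = threading.Lock()       # serializes writers
        self._item_ready = threading.Semaphore(0)
        self._item_taken = threading.Semaphore(0)
        self._item = None

    def write(self, obj):
        # Suspends until a matching read takes the object.
        with self._write_lock:
            self._item = obj
            self._item_ready.release()
            self._item_taken.acquire()

    def read(self):
        # Suspends until a matching write supplies an object.
        self._item_ready.acquire()
        obj = self._item
        self._item_taken.release()
        return obj

Because neither side proceeds until the other arrives, writer and
reader stay in lock-step, just as a generator and its caller do today.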
Thus:

yield expr

becomes:

send(expr)

which doesn't mean "this is a generator" or that control will
*necessarily* be transferred to another thread here. It depends on
whether the other thread has already done a next on the corresponding
pipein.
I'm thinking that the C code (the bytecode interpreter) that manages
Python stack frame objects becomes detached from the C stack, so that a
Python to Python call does not grow the C stack. This would allow the C
code to fork the Python stack and switch between branches quite easily.
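As a rough illustration of the idea (only a sketch, not the proposed
implementation), here is a minimal trampoline that drives nested
generator "calls" from a single loop, so the C stack stays flat and the
loop would be free to switch between several such Python stacks:

import types

def run(main):
    # Convention for this sketch: yielding a generator means "call
    # it"; yielding a plain value means "return it to the caller".
    stack, value = [main], None
    while stack:
        try:
            result = stack[-1].send(value)
        except StopIteration:
            stack.pop()              # callee finished with no result
            value = None
            continue
        if isinstance(result, types.GeneratorType):
            stack.append(result)     # a Python-to-Python "call"
            value = None
        else:
            stack.pop()              # a "return"
            value = result
    return value

def add(a, b):
    yield a + b                      # "return" a + b

def main():
    total = yield add(1, 2)          # "call" add; no C recursion
    yield total * 10

print run(main())                    # prints 30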
This separation of input and output would clean up most generator examples.
Guido's tree flattener has special code to yield SKIPPED in response to
a SKIP sent in, because he doesn't really want a value returned from
sending a SKIP. This would no longer be necessary.
def __iter__(self):
    skip = yield self.label
    if skip == SKIP:
        yield SKIPPED
    else:
        skip = yield ENTER
        if skip == SKIP:
            yield SKIPPED
        else:
            for child in self.children:
                yield from child
            yield LEAVE
            # I guess a SKIP can't be returned here?
becomes:
def __iter__(self):
    return generate(self.flatten)

def flatten(self):
    send(self.label)
    if next() != SKIP:
        send(ENTER)
        if next() != SKIP:
            for child in self.children:
                child.flatten()
        send(LEAVE)
Also, the caller could then simply look like:
for token in tree():
    if too_deep:
        send(SKIP)
    else:
        send(None)
    <process token>
rather than:
response = None
gen = tree()
try:
    while True:
        token = gen.send(response)
        if too_deep:
            response = SKIP
        else:
            response = None
        <process token>
except StopIteration:
    pass
The reason for this extra complexity is that send returns a value.
Separating send from yielding values lets you call send from within for
statements without having another value land in your lap that you would
really rather have sent to the for statement.
The same thing applies to throw: if throw didn't return a value, then
it could easily be called within for statements.
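For reference, a minimal illustration of the current behavior (the
generator here is made up for the example):

def resilient():
    for i in range(3):
        try:
            yield i
        except ValueError:
            pass                     # recover and keep going

g = resilient()
print g.next()                       # prints 0
print g.throw(ValueError)            # raises ValueError inside the
                                     # generator, which recovers and
                                     # yields again; prints 1

The caller of throw gets handed a value it may have no use for, which
is exactly what makes throw awkward inside a for statement over the
same generator.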
The parsing example goes from:
def scanner(text):
    for m in pat.finditer(text):
        token = m.group(0)
        print "Feeding:", repr(token)
        yield token
    yield None  # to signal EOF

def parse_items(closing_tag = None):
    elems = []
    while 1:
        token = token_stream.next()
        if not token:
            break  # EOF
        if is_opening_tag(token):
            elems.append(parse_elem(token))
        elif token == closing_tag:
            break
        else:
            elems.append(token)
    return elems

def parse_elem(opening_tag):
    name = opening_tag[1:-1]
    closing_tag = "</%s>" % name
    items = parse_items(closing_tag)
    return (name, items)
to:
def scanner(text):
    for m in pat.finditer(text):
        token = m.group(0)
        print "Feeding:", repr(token)
        send(token)

def parse_items(closing_tag = None):
    for token in next():
        if is_opening_tag(token):
            send(parse_elem(token))
        elif token == closing_tag:
            break
        else:
            send(token)

def parse_elem(opening_tag):
    name = opening_tag[1:-1]
    closing_tag = "</%s>" % name
    items = list(generate(parse_items(closing_tag), pipein = pipein()))
    return (name, items)
and perhaps called as:
tree = list(scanner(text) | parse_items())
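I haven't spelled out what the generate objects and their "|" operator
would do; purely as a sketch of the intent, the composition can be
approximated with today's generators (the stage class and the
convention that each downstream stage takes its input iterator as its
first argument are inventions for this example):

class stage:
    # Wraps a generator function so stages compose with "|".
    def __init__(self, genfunc, *args):
        self.genfunc = genfunc
        self.args = args
    def __or__(self, downstream):
        # Feed this stage's output iterator to the next stage.
        return stage(downstream.genfunc, iter(self), *downstream.args)
    def __iter__(self):
        return self.genfunc(*self.args)

tree = list(stage(scanner, text) | stage(parse_items))

where scanner and parse_items would here be ordinary generator
functions, parse_items taking the token iterator as its first argument.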
This also obviates the need to do an initial next call when pushing
(sending) to generators that are acting as consumers, a need which is
difficult to explain and to understand.
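For anyone who hasn't run into it, that priming step looks like this
today (a minimal made-up consumer):

def consumer():
    while True:
        token = yield
        print "Got:", token

c = consumer()
c.next()             # the hard-to-explain priming step: advance to
                     # the first yield before anything can be sent
c.send("hello")      # works now; without the priming next(), send()
                     # raises TypeError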
>
>> * I'm thinking here of a pair of cooperating pipe objects,
>> read and write,
>
> Pipes are different in an important way -- they
> have queueing. Writes to one end don't have to
> interleave perfectly with reads at the other.
> But generators aren't like that -- there is no
> buffer to hold sent/yielded values until the
> other end is ready for them.
>
> Or are you suggesting that there should be such
> buffering? I would say that's a higher-level facility
> that should be provided by library code using
> yield, or something like it, as a primitive.
I didn't mean to imply that buffering was required, or even desired.
With no buffering, the sender and receiver stay in sync, just like
generators. A write would suspend until a matching read, and vice
versa. Only when the pipe sees both a write and a read would the object
be transferred from the writer to the reader. Thus, write/read replaces
yield as the way to suspend the current "thread".
This avoids the confusion about whether we're "pushing" or "pulling"
to/from a generator.
For example, itertools.tee is currently designed as a generator that
"pulls" values from its iterable parameter. But then it can't switch
roles to "push" values to its consumers, and so must be prepared to
store values in case the consumers aren't synchronized with each other.
With this new approach, the consumer waiting for the sent value would
be activated by the pipe connecting it to tee. And if that consumer
wasn't ready for a value yet, tee would be suspended until it was. So
tee would not have to store any values:
def tee():
    num_outputs = num_output_pipes()
    for input in next():
        for i in range(num_outputs):
            send(input, i)
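For contrast, here is roughly what a pull-based tee has to do today (a
simplified sketch along the lines of the documented pure-Python
equivalent of itertools.tee): because its consumers pull independently,
every value has to be buffered until the slowest consumer has seen it.

import collections

def pull_tee(iterable, n = 2):
    it = iter(iterable)
    queues = [collections.deque() for _ in range(n)]
    def gen(q):
        while True:
            if not q:                    # my queue is empty
                try:
                    value = it.next()
                except StopIteration:
                    return
                for other in queues:     # buffer for every consumer
                    other.append(value)
            yield q.popleft()
    return tuple(gen(q) for q in queues)

The deques are exactly the storage that the pipe-based tee above avoids.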
Does this help?
-bruce frederiksen