[Python-ideas] Revised**12 PEP on Yield-From

Erik Groeneveld erik at cq2.org
Thu Apr 23 13:35:06 CEST 2009


2009/4/23 Greg Ewing <greg.ewing at canterbury.ac.nz>:
> Erik Groeneveld wrote:
>
>> the readRe generator must be able to
>> indicate that it has superfluous data, and this data must be processed
>> by other generators.
>>
>> Have you thought about this?  How would you solve it?
>
> I think you're expecting a bit much from yield-from.

Well, you asked for practical applications, and here is one.  I hope
to be able to use yield-from in Weightless instead of its compose
function.  However, I do not see how a yield-from without support for
splitting boundaries could be combined with my own code to do the
latter.  If this combination is not possible, I would be forced to
keep using compose instead of yield-from, which I would greatly regret.

So I am expecting at least a yield-from that can be combined
orthogonally with my boundary-splitting code (and other things, see
below).  At present it can't, because there is no way to detect or
intercept a yield-from.

> In a situation like this, I wouldn't use yield to
> receive the values. I'd read them from some kind of
> buffering object that allows peeking ahead however
> far is needed.

Well, the whole point of using coroutines is to avoid buffering.  I'll
try to elaborate on this point a bit, and I hope I can convince you
and others to investigate what the consequences of this type of
application could be for the usage or implementation of yield-from.


When generalized generators were introduced, many people immediately
saw the advantage of using them for thread-less I/O: tie a generator
to a socket.  I took up the challenge and found it to be extraordinarily
complicated.  Back to that later; first a little background.

I started with Michael Jackson's now more than 30-year-old JSP theory
about structuring programs based on the input and output streams they
process, all based on coroutines.  His assumptions about the memory and
storage latency of mainframes are valid today for web servers.  The
idea basically boils down to decomposing a data-processing program into
coroutines as easily as you are used to doing with functions.  A
programmer would be able to 'call' sub-coroutines as if they were
functions, without needing to dive into subtle and hard-to-understand
differences or inconsistencies between the two.

It took me two years to get it right.  Every time I switched to the
role of 'a programmer', I got stuck with code not working as expected,
incomprehensible stack traces, etc.  Others were even more puzzled.  It
was not transparent in its usage, and I had to go back to the
workbench.

But what a reward when it finally worked!  I have never seen such
simple, easy-to-read code for, for example, an HTTP server.  Notoriously
difficult bugs in my callback-based HTTP server that I had not been
able to solve just vanished.  I am still impressed by the cleanness of
the code and I keep wondering: 'can it really be that simple?'.  Was
this really conceived more than 30 years ago?  Jackson must have been
a genius!

Since then I have presented this at the British SPA conference and to
two Dutch Pythonist groups.  I assembled a callback-vs-coroutine test
case which clearly demonstrates the differences in amount of code,
readability, and locality of change when adding features.  People
explicitly appreciated the intuitive behaviour for a programmer.
(All documented at http://weightless.io, with code fragments in svn.)

Back to why it was so complicated.

First of all, as you already know, it is not possible to use just a
straightforward for-loop to delegate to another coroutine.  The
yield-from proposal covers this completely, I believe.
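To make the limitation concrete, here is a minimal sketch (the names are made up for illustration; this is not Weightless code) contrasting a for-loop with hand-written delegation:

```python
def accumulate():
    # sub-coroutine: receives numbers via send(), yields the running total
    total = 0
    while True:
        value = yield total
        total += value

def broken_delegate():
    # naive attempt: the for-loop forwards yielded values outward,
    # but silently discards anything sent *into* broken_delegate()
    for value in accumulate():
        yield value

def delegate():
    # hand-written delegation, roughly what yield-from automates:
    # sent values are forwarded into the sub-generator as well
    sub = accumulate()
    result = next(sub)
    while True:
        sent = yield result
        result = sub.send(sent)
```

With delegate(), g.send(5) reaches accumulate() and yields 5; with broken_delegate(), the for-loop drops the 5 and resumes the sub-generator with None, which blows up inside accumulate().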

Secondly, if something goes wrong and a stack trace is printed, this
stack trace does not reflect the proper sequence in which the
coroutines were called (this really makes a programmer go mad!), at
least not without additional effort to maintain an explicit call stack
with each generator on it, and to use this to adjust the stack trace
when needed.  (This is why I asked whether the coroutine will be on
the call stack and hence be visible in a stack trace.)

Thirdly, there seems to be some sort of unspoken 'protocol' with
generators.  A next() is actually send(None) and vaguely means 'I want
data'.  In the same vein, 'x = yield' actually is 'x = yield None' and
also vaguely means 'I want data'.  So None seems to play a special
role.  I hesitated a lot, but I had to apply this 'protocol' to
coroutines; otherwise it was next to impossible to work with them as
'the programmer', because it required constant checking of what had
happened.  Funnily enough, it turned out to be a major breakthrough in
making it transparent to a programmer.
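The implicit protocol can be shown in a few lines (a minimal illustration; the function name is made up):

```python
def wants_data():
    # 'x = yield' is really 'x = yield None'; yielding None is the
    # vague 'I want data' signal of the unspoken protocol
    x = yield
    yield x * 2

g = wants_data()
assert g.send(None) is None   # identical to next(g): run to the bare yield
assert g.send(21) == 42       # now actually supply the data
```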

Fourthly, there is the issue of boundary clashes.  These are common in
any data-processing problem: the input data is simply not structured
or tokenized according to the boundaries at a certain level of
abstraction.  This is the *very reason* to use coroutines, and Jackson
describes elegant ways to solve the problem.  JSP requires a lookahead,
and the coroutines must have some way to support this.  (Introducing a
stream or buffer would put us back where we started, of course.)
After several tries I settled on a push-back mechanism, as this was
the most elegant way (IMHO) to solve it.  (This is why I suggested
'return value, pushbackdata').
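To illustrate what such a push-back could look like (a toy sketch with made-up names, not Weightless's actual compose): a sub-coroutine that reads past its boundary hands the surplus back to the driver instead of buffering it in a stream.

```python
def read_until(marker):
    # consume chunks until the marker appears, then yield the part
    # before it together with the superfluous data read past the boundary
    data = ''
    while marker not in data:
        data += yield
    head, _, surplus = data.partition(marker)
    yield (head, surplus)       # surplus is the push-back

def drive(consumer, chunks):
    # minimal driver honouring the push-back: whatever the consumer
    # read past its boundary is handed back for the next coroutine
    next(consumer)              # prime: run to the first bare yield
    for chunk in chunks:
        out = consumer.send(chunk)
        if out is not None:
            return out          # (result, pushed-back surplus)
```

For example, driving read_until('\r\n') with the chunks 'GET / HT' and 'TP/1.1\r\nHost: x' yields the request line 'GET / HTTP/1.1' plus the surplus 'Host: x' for the next consumer.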


At this point I hope I have gained your interest in this kind of
data-processing application, and I hope that we can have a fruitful
discussion about it.

Also, I would like to hear what other kinds of applications you have in mind.

Best regards
Erik
