[Python-Dev] Twisted Isn't Specific (was Re: Trial balloon: microthreads library in stdlib)

Jean-Paul Calderone exarkun at divmod.com
Thu Feb 15 15:19:30 CET 2007


On Thu, 15 Feb 2007 02:36:22 -0700, Andrew Dalke <dalke at dalkescientific.com> wrote:
>I was the one on the Stackless list who last September or so
>proposed the idea of monkeypatching and I'm including that
>idea in my presentation for PyCon.  See my early rough draft
>at http://www.stackless.com/pipermail/stackless/2007-February/002212.html
>which contains many details about using Stackless, though
>none on the Stackless implementation. (A lot on how to tie things together.)
>
>So people know, I am an applications programmer and not a
>systems programmer.  Things like OS-specific event mechanisms
>annoy and frustrate me.  If I could do away with hardware and
>still write useful programs I would.

What a wonderful world it would be. :)

>
> [snip]
>
>In all three cases I've found it hard to use Twisted because
>the code didn't do as I expected it to do and when something
>went wrong I got results which were hard to interpret.  I
>believe others have similar problems, and that this is one reason Twisted
>is considered to be "a big, complicated, inseparable hairy mess."
>
>I find the Stackless code also hard to understand.  Eg,
>I don't know where the watchdog code is for the "run()"
>command.  It uses several layers of macros and I haven't
>been able to get it straight in my head.  However, so far
>I've not run into strange errors in Stackless that I
>have in Twisted.
>

As you point out below, however, Twisted and stackless achieve different goals.

>I find the normal Python code relatively easy to understand.
>
>Stackless only provides threadlets.  It does no I/O.
>Richard Tew developed a "stacklesssocket" module which emulates
>the API for the stdlib "socket" module.  I tweaked it a
>bit and showed that by doing the monkeypatch
>
>  import stacklesssocket
>  import sys
>  sys.modules["socket"] = stacklesssocket
>
>then code like "urllib.urlopen" became Stackless compatible.
>Eg, in my PyCon talk draft I show something like
>

It may interest you to learn that a Twisted developer implemented this model
several years ago.  It has not been developed further for a handful of
reasons, the core one being that it is very similar to pre-emptive threading
in terms of application-level complexity.

You gave several examples of the use of existing code which expects a blocking
socket interface and which "just works" when the socket module is changed in
this way.

However, this is a slight simplification.  Code written without expecting a
context switch (exactly what happens when a socket operation is performed in
this model) is not necessarily correct when context switches are suddenly
introduced.

Consider this extremely trivial example:

  x = 0
  def foo(conn):
      global x
      a = x + 1
      b = ord(conn.recv(1))
      x = a + b
      return x

Clearly, foo is not threadsafe.  Global mutable state is a terrible, terrible
thing.  The point to note is that by introducing a context switch at the
conn.recv(1) call, the same effect is achieved as by any other context switch:
it becomes possible for foo to return an inconsistent result or otherwise
corrupt its own state if another piece of code violates its assumptions and
changes x while it is waiting for the recv call to complete.
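
To make the failure concrete, here is a minimal sketch that runs two copies of
foo with and without a switch at the recv point.  It is only an illustration
under stated assumptions: generators stand in for tasklets, a bare `yield`
stands in for the switch inside conn.recv(1), and the `run` helper is a
made-up scheduler, not anything from Stackless or Twisted.

```python
x = 0

def foo(received):
    """Generator version of foo; `received` is the byte recv would return."""
    global x
    a = x + 1           # x is read before the "context switch"
    yield               # stand-in for the switch inside conn.recv(1)
    b = ord(received)
    x = a + b           # the write uses the possibly stale value of a

def run(tasks, interleave):
    """Hypothetical scheduler: run tasks one by one, or round-robin."""
    global x
    x = 0
    if interleave:
        pending = list(tasks)
        while pending:
            for task in list(pending):
                try:
                    next(task)          # advance to the next switch point
                except StopIteration:
                    pending.remove(task)
    else:
        for task in tasks:
            for _ in task:              # run each task to completion
                pass
    return x

sequential = run([foo(b"\x01"), foo(b"\x01")], interleave=False)
interleaved = run([foo(b"\x01"), foo(b"\x01")], interleave=True)
```

Run sequentially, the two calls leave x at 4.  Interleaved, both read x as 0
before either writes, so one update is lost and x ends up at 2.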

Is urllib2 threadsafe?  I have heard complaints that it is not.  I have looked
at the code, and at least in its support for caching, it appears not to be.
Perhaps it can be made threadsafe, but in requiring that, the advantage of
having a whole suite of modules which will "just work" with a transparently
context switching socket module is mostly lost.

>
> [snip - urllib2/tasklet example]
>
>The choice of asyncore is, I think, done because 1) it
>prevents needing an external dependency,

But if some new event loop is introduced into the standard library, then using
it also will not require an external dependency. ;)

>2) asyncore is
>smaller and easier to understand than Twisted,

While I hear this a lot, applications written with Twisted _are_ shorter and
contain less irrelevant noise in the form of boilerplate than the equivalent
asyncore programs.  This may not mean that Twisted programs are easier to
understand, but it is at least an objectively measurable metric.

>and
>3) it was for demo/proof of concept purposes.
>While
>tempting to improve that module I know that Twisted
>has already gone through all the platform-specific crap
>and I don't want to go through it again myself.  I don't
>want to write a reactor to deal with GTK, and one for
>OS X, and one for ...
>

Now if we can only figure out a way for everyone to benefit from this without
tying too many brains up in knots. :)

>
>Another reason I think Twisted is considered "tangled-up
>Deep Magic, only for Wizards Of The Highest Order" is because
>it's infused with event-based processing.  I've done a lot
>of SAX processing and I can say that few people think that
>way or want to go through the process of learning how.
>
>Compare, for example, the following
>
>  f = urllib2.urlopen("http://example.com/")
>  for i, line in enumerate(f):
>    print ("%06d" % i), repr(line)
>
>with the normal equivalent in Twisted or other
>async-based system.
>

Several years ago, Christopher Armstrong (hopefully he won't get too upset at
me for mentioning him here) wrote a Twisted/Stackless integration library.
When greenlets came out, he wrote a similar library for integrating Twisted
with those.  He also wrote a utility generally referred to as "defgen", and
James Knight updated it to take advantage of the Python 2.5 changes to
generators.

Through all of that, however, Twisted is still taking care of all of the nitty
gritty platform details.  Whether one uses stackless or greenlets or generators
or any other mechanism, it is important to realize that the lexical structure
of the code is not inherently tied to the networking library in use.

If you want to write code in the style of the above for loop, you can do so
with Twisted and stackless.  The problems involved are much the same as those
involving urllib2/stackless, but at least Twisted is prepared to deal with
context switching around network events, so any bugs you encounter are likely
to be due to mistaken assumptions in your own application code. :)

For what it's worth, I prefer context switches to be explicit in the style of
continuation passing so that I am less likely to introduce such bugs into my
own code.  This is, however, entirely at my discretion, and I am not about to
force anyone else to develop their applications this way.
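
As a rough illustration of what I mean by explicit, here is the earlier global
counter written in continuation-passing style.  This is a sketch under
assumptions: `FakeConnection`, `recv_then` and `deliver` are made-up names for
this example, not Twisted APIs.  The point is only that the wait is visible in
the code, and that the read and the write of x happen together in one callback
with no switch between them.

```python
x = 0

class FakeConnection:
    """Hypothetical stand-in for a non-blocking connection."""

    def __init__(self):
        self._waiting = []

    def recv_then(self, callback):
        # Register a continuation instead of blocking for data.
        self._waiting.append(callback)

    def deliver(self, data):
        # Simulate the event loop handing received data to the oldest waiter.
        callback = self._waiting.pop(0)
        callback(data)

def foo(conn, done):
    def got_byte(data):
        global x
        # x is read and written inside a single callback, so no other
        # code can run between the read and the write.
        x = x + 1 + ord(data)
        done(x)
    conn.recv_then(got_byte)    # the only place control is given up

results = []
conn = FakeConnection()
foo(conn, results.append)
foo(conn, results.append)
conn.deliver(b"\x01")
conn.deliver(b"\x01")
```

Both calls are outstanding at the same time, yet results ends up as [2, 4],
the same as running them one after the other.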

>Yet by using the Stackless socket monkeypatch, this
>same code works in an async framework.  And the underlying
>libraries have a much larger developer base than Twisted.
>Want NNTP?  "import nntplib"  Want POP3?  "import poplib"
>Plenty of documentation about them too.

This is going to come out pretty harshly, for which I can only apologize in
advance, but it bears mention.  The quality of protocol implementations in the
standard library is bad.  As in "not good".  Twisted's NNTP support is better
(even if I do say so myself - despite having been worked on only by me, at a
time when I knew almost nothing about Twisted, and having essentially never
been touched since).  Twisted's POP3 support is fantastically awesome.  Next to
imaplib, twisted.mail.imap4 is a sparkling diamond.  And each of these
implements the server end of the protocol as well: you won't find that in the
standard library for almost any protocol.

As for the documentation, please compare these two pages:

http://python.org/doc/lib/pop3-objects.html
http://twistedmatrix.com/documents/current/api/twisted.mail.pop3.AdvancedPOP3Client.html

I think it is fair to call them comparable.  They could both stand some
improvement, really. :) And if someone wants to argue that, if the POP3 client
from Twisted is going to be added to the standard library, its documentation
should be improved first, I'm not going to argue against that.  Docs are great,
more docs are greater.

But let's bear in mind that at present, no one has suggested adding anything
but the core Twisted event loop.

>
>All the earlier quotes were lifted from glyph.  Here's another:
>>  When you boil it down, Twisted's event loop is just a
>>  notification for "a connection was made", "some data was
>>  received on a connection", "a connection was closed", and
>>  a few APIs to listen or initiate different kinds of
>>  connections, start timed calls, and communicate with threads.
>>  All of the platform details of how data is delivered to the
>>  connections are abstracted away.  How do you propose we
>>  would make a less "specific" event mechanism?
>
>What would I need to do to extract this Twisted core so
>I could replace asyncore?  I know at minimum I need
>"twisted.internet" and "twisted.python" (the latter for
>logging) and "twisted.persisted" for "styles.Ephemeral".

None of those dependencies is a very hard one.  I suspect that there would
be resistance to adding a new logging system to the standard library, just for
Twisted.  You have the right idea though: some portion of twisted.internet,
and whatever utility code it depends on.

>
>But I say this hesitantly recalling the frustrations
>I had in dealing with a connection error in Twisted,
>described in the aforementioned link
> http://www.dalkescientific.com/writings/diary/archive/2006/08/28/levels_of_abstraction.html
>
>I feel that using the phrase "just a" in the previously quoted
>text is an understatement.

I think you're right.  We throw around "just" a lot in our line of work, don't
we? :) Twisted does also account for a raft of platform-specific quirks and
inconsistencies.  I take this to be a good thing.

>While the mechanics might be
>simple, there are many, many layers, as you can see in this
>stack trace.
>
>  File "async_blast.py", line 55, in ?
>    reactor.run()
>  File "/Library/Frameworks/Python.framework/Versions/2.4/lib/python2.4/site-packages/twisted/internet/posixbase.py", line 218, in run
>    self.mainLoop()
>  File "/Library/Frameworks/Python.framework/Versions/2.4/lib/python2.4/site-packages/twisted/internet/posixbase.py", line 229, in mainLoop
>    self.doIteration(t)
>  File "/Library/Frameworks/Python.framework/Versions/2.4/lib/python2.4/site-packages/twisted/internet/selectreactor.py", line 133, in doSelect
>    _logrun(selectable, _drdw, selectable, method, dict)
>  File "/Library/Frameworks/Python.framework/Versions/2.4/lib/python2.4/site-packages/twisted/python/log.py", line 53, in callWithLogger
>    return callWithContext({"system": lp}, func, *args, **kw)
>  File "/Library/Frameworks/Python.framework/Versions/2.4/lib/python2.4/site-packages/twisted/python/log.py", line 38, in callWithContext
>    return context.call({ILogContext: newCtx}, func, *args, **kw)
>  File "/Library/Frameworks/Python.framework/Versions/2.4/lib/python2.4/site-packages/twisted/python/context.py", line 59, in callWithContext
>    return self.currentContext().callWithContext(ctx, func, *args, **kw)
>  File "/Library/Frameworks/Python.framework/Versions/2.4/lib/python2.4/site-packages/twisted/python/context.py", line 37, in callWithContext
>    return func(*args,**kw)
>  File "/Library/Frameworks/Python.framework/Versions/2.4/lib/python2.4/site-packages/twisted/internet/selectreactor.py", line 139, in _doReadOrWrite
>    why = getattr(selectable, method)()
>  File "/Library/Frameworks/Python.framework/Versions/2.4/lib/python2.4/site-packages/twisted/internet/tcp.py", line 535, in doConnect
>    self.failIfNotConnected(error.getConnectError((connectResult, os.strerror(connectResult))))
>  File "/Library/Frameworks/Python.framework/Versions/2.4/lib/python2.4/site-packages/twisted/internet/error.py", line 160, in getConnectError
>    return klass(number, string)
>  File "/Library/Frameworks/Python.framework/Versions/2.4/lib/python2.4/site-packages/twisted/internet/error.py", line 105, in __init__
>    traceback.print_stack()
>
>That feels like 6 layers too many, given that
>  _logrun(selectable, _drdw, selectable, method, dict)
>  return context.call({ILogContext: newCtx}, func, *args, **kw)
>  return self.currentContext().callWithContext(ctx, func, *args, **kw)
>  return func(*args, **kw)
>  getattr(selectable, method)()
>  klass(number, string)
>
>are all generic calls.

I know function calls are expensive in Python, and method calls even more
so... but I still don't understand this issue.  Twisted's call stack is too
deep?  It is fair to say it is deep, I guess, but I don't see how that is a
problem.  If it is, I don't see how it is specific to this discussion.

>(Note that I argued against the
>twisted.internet.error way of doing things as it changed my
>error number on me and gave me a non-system-standard, non-i18n
>error message.)

Note that we ended up /not/ changing the error number in the case you
encountered.  We changed the connection setup code to handle the unexpected
behavior on OS X. :) Twisted is faithfully reporting the same errno as the
underlying platform is producing.  Since most applications don't know or care
about such things though, it is also putting it into an exception instance
which indicates the category into which the error falls.  These all seem like
good things to me.
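
The general idea, keeping the platform's errno while also giving the error a
category, can be sketched roughly as follows.  This is not Twisted's actual
code: the class names and `connect_error_for` are hypothetical, and only the
`errno` and `os.strerror` calls come from the standard library.

```python
import errno
import os

class ConnectError(Exception):
    """Hypothetical base class: any failure to establish a connection."""

    def __init__(self, os_errno, message):
        Exception.__init__(self, os_errno, message)
        self.os_errno = os_errno    # the platform's number, unchanged
        self.message = message      # the platform's own message text

class ConnectionRefused(ConnectError):
    """Hypothetical category: the peer actively refused the connection."""

class ConnectTimedOut(ConnectError):
    """Hypothetical category: the connection attempt timed out."""

# Map the errno values we recognise to a category; anything else falls
# back to the generic ConnectError.
_CATEGORIES = {
    errno.ECONNREFUSED: ConnectionRefused,
    errno.ETIMEDOUT: ConnectTimedOut,
}

def connect_error_for(number):
    klass = _CATEGORIES.get(number, ConnectError)
    return klass(number, os.strerror(number))

refused = connect_error_for(errno.ECONNREFUSED)
other = connect_error_for(errno.EPERM)
```

An application that cares only about the category can catch ConnectionRefused,
while one that cares about the number can still read os_errno.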

>
>I do not think Twisted can be changed to be an async
>kernel of the sort I would like without making enough
>changes as to be incompatible with the existing code.
>
>Also, and I say this to stress the difficulties of an outsider
>in using Twisted, I don't understand what's meant by "IProtocol" in
>>  At the very least, standardizing on something very much like
>>  IProtocol would go a long way towards making it possible to
>>  write async clients and servers

It is exactly these interfaces which make it possible to make changes to
Twisted without breaking things.  The behaviour of the APIs exposed by Twisted
is defined.  Even if IProtocol is not adopted verbatim, the existence of
IProtocol and another interface means one can be adapted to the other in some
manner, providing compatibility for existing applications.
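
A minimal sketch of that kind of adaptation, assuming some other interface
whose callbacks are named on_open, on_data and on_close (names invented for
this example).  The adapter's method names follow IProtocol's connectionMade,
dataReceived and connectionLost; everything else here is hypothetical.

```python
class RecordingHandler:
    """Hypothetical application object written against a different,
    made-up callback interface."""

    def __init__(self):
        self.events = []

    def on_open(self):
        self.events.append("open")

    def on_data(self, data):
        self.events.append(("data", data))

    def on_close(self):
        self.events.append("close")

class ProtocolAdapter:
    """Presents IProtocol-style method names and forwards each call
    to the wrapped handler."""

    def __init__(self, handler):
        self.handler = handler

    def connectionMade(self):
        self.handler.on_open()

    def dataReceived(self, data):
        self.handler.on_data(data)

    def connectionLost(self, reason):
        # The made-up interface has no notion of a reason, so it is dropped.
        self.handler.on_close()

handler = RecordingHandler()
adapted = ProtocolAdapter(handler)
adapted.connectionMade()
adapted.dataReceived(b"hello")
adapted.connectionLost(None)
```

Because both sides have a defined set of methods, the adapter is a few lines
of forwarding and the existing application object does not change.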

>
>There are 37 pages (according to Google) in the twistedmatrix domain
>which talk about IProtocol and are not "API docs" or part of a ticket.
>
>  IProtocol site:twistedmatrix.com -"API docs" -"twisted-commits"
>
>None provided insight.  The API doc is at
>  http://twistedmatrix.com/documents/current/api/twisted.internet.interfaces.IProtocol.html
>

Since the standard library lacks interfaces, it may be the case that IProtocol
is instead adopted as an ABC or even that it won't appear in code at all, but
instead be translated into non-source documentation.  I'd prefer if z.i were in
the stdlib, but that's a separate issue.  What Glyph is saying when he talks
about standardizing IProtocol is the standardization of an API, nothing more.
Which is what this whole thread is about, if I am not mistaken.
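
For illustration only, here is roughly what an ABC along those lines could
look like.  It is a sketch that assumes Python's abc module and includes only
three of IProtocol's methods; `StreamProtocol`, `Collector` and `replay` are
names made up for this example and are not a proposal for the actual API.

```python
from abc import ABC, abstractmethod

class StreamProtocol(ABC):
    """Hypothetical ABC modelled on part of IProtocol."""

    @abstractmethod
    def connectionMade(self):
        """Called once the connection is established."""

    @abstractmethod
    def dataReceived(self, data):
        """Called with each chunk of bytes as it arrives."""

    @abstractmethod
    def connectionLost(self, reason):
        """Called once when the connection goes away."""

class Collector(StreamProtocol):
    """Example protocol: remembers every chunk it is given."""

    def __init__(self):
        self.chunks = []
        self.closed = False

    def connectionMade(self):
        self.chunks = []

    def dataReceived(self, data):
        self.chunks.append(data)

    def connectionLost(self, reason):
        self.closed = True

def replay(protocol, chunks):
    """Hypothetical driver: plays a fixed list of chunks into a protocol,
    standing in for whatever event loop would normally deliver them."""
    protocol.connectionMade()
    for chunk in chunks:
        protocol.dataReceived(chunk)
    protocol.connectionLost(None)

collector = Collector()
replay(collector, [b"GET / ", b"HTTP/1.0\r\n"])
received = b"".join(collector.chunks)
```

The protocol knows nothing about sockets or about which event loop is driving
it, which is the whole of what the API is meant to pin down.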

I apologize for writing such a long message, but I didn't have time to write a
shorter one.

Jean-Paul

