[Python-Dev] Twisted Isn't Specific (was Re: Trial balloon: microthreads library in stdlib)

Thu Feb 15 23:22:28 CET 2007

> On Thu, 15 Feb 2007 10:46:05 -0500, "A.M. Kuchling" <amk at amk.ca> wrote:
> >It's hard to debug the resulting problem.  Which level of the *12*
> >levels in the stack trace is responsible for a bug?  Which of the *6*
> >generic calls is calling the wrong thing because a handler was set up
> >incorrectly or the wrong object provided?  The code is so 'meta' that
> >it becomes effectively undebuggable.

On 2/15/07, Jean-Paul Calderone <exarkun at divmod.com> wrote:
> I've debugged plenty of Twisted applications.  So it's not undebuggable. :)

Hence the word "effectively".  Or are you offering to be on-call
within 5 minutes for anyone wanting to debug code?  Not very
scalable that.

The code I was talking about took me an hour to track down
and I could only find the location by inserting a "print traceback"
call to figure out where I was.

> Application code tends to reside at the bottom of the call stack, so Python's
> traceback order puts it right where you're looking, which makes it easy to
> find.

As I also documented, Twisted tosses a lot of the call stack.  Here
is the complete and full error message I got:

Error: [Failure instance: Traceback (failure with no frames):
twisted.internet.error.ConnectionRefusedError: Connection was refused
by other side: 22: Invalid argument.
]

I wrote the essay at
  http://www.dalkescientific.com/writings/diary/archive/2006/08/28/levels_of_abstraction.html

to, among other things, show just how hard it is to figure things
out in Twisted.

>  For any bug which causes something to be set up incorrectly and only
> later manifests as a traceback, I would posit that whether there is 1 frame or
> 12, you aren't going to get anything useful out of the traceback.

I posit that tracebacks are useful.

Consider:

def blah():
  make_network_request("A")
  make_network_request("B")

where "A" and "B" are encoded as part of an HTTP POST payload
to the same URI.

If there's an error in the network connection - e.g., the implementation
for "B" on the server dies so the connection closes w/o a response -
then knowing that the call for "B" failed and not "A" is helpful
during debugging.
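
A minimal sketch of that point, assuming a hypothetical
make_network_request() whose handling of "B" fails.  The function body
and the failing_offset() helper are illustrative only.  The traceback
records which line of blah() was executing, which is exactly the
information the low-level error text lacks:

```python
import traceback

def make_network_request(payload):
    # Hypothetical stand-in: the server-side handler for "B" dies,
    # so the connection closes without a response.
    if payload == "B":
        raise ConnectionError("connection closed without a response")
    return "ok"

def blah():
    make_network_request("A")
    make_network_request("B")

def failing_offset():
    """Return the line offset inside blah() that was running at failure."""
    try:
        blah()
    except ConnectionError as exc:
        for frame in traceback.extract_tb(exc.__traceback__):
            if frame.name == "blah":
                # Offset 1 is the "A" call, offset 2 is the "B" call.
                return frame.lineno - blah.__code__.co_firstlineno
    return None

offset = failing_offset()
```

The exception text is the same whichever call fails; only the frame
for blah() distinguishes them.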

The low level error message cannot report that.  Yes, I could put
my own try blocks around everything and contextualize all of the
error messages so they are semantically correct for the given
level of code.  But that would be a lot of code, hard to test, and
not cost effective.

> Standard practice here is just to make exception text informative,
> I think,

If you want to think of it as "exception text" then consider that
the stack trace is "just" more text for the message.

>but this is another general problem with Python programs
> and event loops, not one specific to either Twisted itself or the
> particular APIs Twisted exposes.

The thread is "Twisted Isn't Specific", as a branch of a discussion
on microthreads in the standard library.  As someone experimenting
with Stackless and how it can be used on top of an async library
I feel competent enough to comment on the latter topic.

As someone who has seen the reverse Bozo bit set by Twisted
people on everyone who makes the slightest comment towards
using any other async library, and has documented evidence as
to just why one might do so, I also feel competent enough to
comment on the branch topic.

My belief is that there are too many levels of genericity in
Twisted.  This makes it hard for an outsider to come in and
use the system.  By "use" I include 1) understanding how the
parts go together, 2) diagnosing problems and 3) adding new
features that Twisted doesn't support.

Life is trade offs.  A Twisted trade off is genericity at the
cost of understandability.  Remember, this is all my belief,
backed by examples where I did try to understand.  My
experience with other networking packages has been much
easier, including with asyncore and Allegra.  They are not
as general purpose, but it's hard for me to believe the extra
layers in Twisted are needed to get that extra whatever
functionality.

My other belief is that async programming is hard for most
people, who would rather do "normal" programming instead
of "inside-out" programming.  Because of this 2nd belief I
am interested in something like Stackless on top of an
async library.
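
To illustrate what "normal" looking code on top of a scheduler might
look like, here is a toy sketch.  It uses plain Python generators as a
stand-in for Stackless tasklets, and the run() and fetch() names are
made up for this example; no real I/O or async library is involved.
Each task is written top to bottom, and every yield marks a point
where a real event loop would wait for the network:

```python
from collections import deque

def run(tasks):
    """Round-robin over generator-based tasks until all have finished."""
    queue = deque(tasks)
    while queue:
        task = queue.popleft()
        try:
            next(task)          # resume the task until its next yield
        except StopIteration:
            continue            # task finished; do not reschedule it
        queue.append(task)

def fetch(name, steps, log):
    # Reads like sequential code; no callbacks are registered anywhere.
    for i in range(steps):
        log.append((name, i))
        yield                   # simulated "wait for I/O"
    log.append((name, "done"))

log = []
run([fetch("A", 2, log), fetch("B", 1, log)])
```

The two tasks interleave at the yield points, but each one keeps its
own ordinary control flow and local variables.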

> As a personal anecdote, I've never once had to chase a bug through any of the
> 6 "generic calls" singled out.  I can't think of a case where I've helped any
> one else who had to do this, either.  That part of Twisted is very old, it is
> _very_ close to bug-free, and application code doesn't have very much control
> over it at all.  Perhaps in order to avoid scaring people, there should be a
> way to elide frames from a traceback (I don't much like this myself, I worry
> about it going wrong and chopping out too much information, but I have heard
> other people ask for it)?

Even though I said some of this earlier I'll elaborate for clarification.

The specific bug I was tracking down had *no* traceback.  There was
nothing to elide.  Because there was no traceback I couldn't figure
out where the error came from.  I had to use the error message text
to find the error class, from there modify the source code to generate
a traceback, then work up the stack to find the code which had the
actual error.
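
The "modify the source code to generate a traceback" step can be
sketched roughly as follows.  This is an illustration, not Twisted's
code: ConnectError, get_connect_error() and do_connect() are
hypothetical names.  The point is that an error object which is built
and passed along, but never raised, carries no traceback of its own,
so the stack has to be captured when the object is constructed:

```python
import traceback

class ConnectError(Exception):
    """Hypothetical error class that remembers where it was created."""
    def __init__(self, number, string):
        Exception.__init__(self, number, string)
        # Capture the call stack at construction time, dropping the
        # last entry, which is this __init__ frame itself.
        self.creation_stack = traceback.extract_stack()[:-1]

def get_connect_error(number, string):
    # The error is returned, not raised, so no traceback is attached.
    return ConnectError(number, string)

def do_connect():
    return get_connect_error(22, "Invalid argument")

err = do_connect()
names = [frame.name for frame in err.creation_stack]
```

Here err.__traceback__ is None because nothing was raised, while
names ends with do_connect and get_connect_error, which is enough to
start working back up the stack.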

Here is the tail end of the traceback.

 File "/Library/Frameworks/Python.framework/Versions/2.4/lib/python2.4/
site-packages/twisted/internet/tcp.py", line 535, in doConnect
   self.failIfNotConnected(error.getConnectError((connectResult,
os.strerror(connectResult))))
 File "/Library/Frameworks/Python.framework/Versions/2.4/lib/python2.4/
site-packages/twisted/internet/error.py", line 160, in getConnectError
   return klass(number, string)
 File "/Library/Frameworks/Python.framework/Versions/2.4/lib/python2.4/
site-packages/twisted/internet/error.py", line 105, in __init__
   traceback.print_stack()

You can see the actual error occurred at [-3] in the stack where
the os.strerror() was.  One layer of genericity is mapping OS-level
error codes as integers into error classes, one class per integer.
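
A rough sketch of that kind of layer, under the assumption that it is
a lookup table from errno values to exception classes.  The class
names and get_connect_error() below are hypothetical and are not
Twisted's definitions:

```python
import errno
import os

class ConnectError(Exception):
    """Hypothetical generic connection error (number, message)."""

class NoRoute(ConnectError):
    pass

class Refused(ConnectError):
    pass

class TimedOut(ConnectError):
    pass

# One class per OS-level error code; anything unlisted stays generic.
ERRNO_TO_CLASS = {
    errno.ENETUNREACH: NoRoute,
    errno.ECONNREFUSED: Refused,
    errno.ETIMEDOUT: TimedOut,
}

def get_connect_error(number):
    klass = ERRNO_TO_CLASS.get(number, ConnectError)
    return klass(number, os.strerror(number))
```

With a table like this, the class a caller sees depends entirely on
which integer reaches the lookup, so passing the wrong integer
silently produces the wrong class.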

You previously said that my problem was resolved thusly:

   Note that we ended up /not/ changing the error number in the case you
   encountered.  We changed the connection setup code to handle the
   unexpected behavior on OS X. :)

This means that at least someone did help someone track a bug
which was affected by those levels of abstraction.  BTW, the ticket
is at  http://twistedmatrix.com/trac/ticket/2022
and the fix was r18064.  The final solution was to

        # doConnect gets called twice.  The first time we actually need to
        # start the connection attempt.  The second time we don't really
        # want to (SO_ERROR above will have taken care of any errors, and if
        # it reported none, the mere fact that doConnect was called again is
        # sufficient to indicate that the connection has succeeded), but it
        # is not /particularly/ detrimental to do so.  This should get
        # cleaned up some day, though.

and has nothing to do with changing the error number.  Twisted was
using the 2nd error code when it should have used the 1st.  That
was the reason for my getting the "wrong" number.  It was the
right number for a different check for an error.  Note that last
comment -- the double call to doConnect was the problem, and a
source of my confusion.  It remains, just neutered.

Also note that that patch included removing code from error.py

         errno.ENETUNREACH: NoRouteError,
         errno.ECONNREFUSED: ConnectionRefusedError,
         errno.ETIMEDOUT: TCPTimedOutError,
-        # for FreeBSD - might make other unices in certain cases
-        # return wrong exception, alas
-        errno.EINVAL: ConnectionRefusedError,

which was part of the mixup that gave me problems.  This definitely
was an error in one of those levels of abstraction.  It was a bad
earlier fix, one that "fixed" a problem by incorrectly mapping an
error code, probably on the justification of there being an OS error
rather than a Twisted implementation problem.  But that's just a wild
guess based solely on seeing other fixes of that type.
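
The effect of that extra EINVAL entry can be shown with a small
sketch.  The table and describe() are hypothetical, and apart from
"Connection was refused by other side" the label strings are invented
for the example:

```python
import errno
import os

LABELS = {
    errno.ENETUNREACH: "No route to host",                    # invented text
    errno.ECONNREFUSED: "Connection was refused by other side",
    errno.ETIMEDOUT: "TCP connection timed out",              # invented text
}

def describe(number, labels):
    """Build an error message from a label table and the OS error text."""
    label = labels.get(number, "An error occurred while connecting")
    return "%s: %d: %s." % (label, number, os.strerror(number))

# The table as it would look with the removed mapping still present.
with_einval = dict(LABELS)
with_einval[errno.EINVAL] = LABELS[errno.ECONNREFUSED]

misleading = describe(errno.EINVAL, with_einval)
generic = describe(errno.EINVAL, LABELS)
```

With the extra entry, the message claims a refused connection while
the OS text appended to it describes EINVAL, which is the same
mismatch as in the error message I quoted.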

To bring this back into python-dev, .... none of this is a topic
for python-dev.  I'm reacting to what I perceive as an overly
territorial response that occurs nearly every time the words
"Twisted", "asynchronous I/O", "reactor" or "main event loop"
are uttered.  I think using microthreads/stackless/... makes
an interesting and useful alternative to the Twisted approach,
including different ways to structure the main event loop.
I think anyone who's been involved with Python and on this
list knows the work Twisted has done to understand platform
problems, and needs at most a hint to look at Twisted for
insight.  Though I feel that such insight is obscured.

That said, I resign from this thread and I'll send any further
responses in private mail.

                Andrew
                dalke at dalkescientific.com