Debugging "broken pipe" (in telnetlib)

Jean-Paul Calderone exarkun at divmod.com
Tue Jul 3 12:18:31 EDT 2007


On Tue, 03 Jul 2007 08:54:25 -0700, Samuel <knipknap at gmail.com> wrote:
>On Jul 3, 3:03 pm, Jean-Paul Calderone <exar... at divmod.com> wrote:
>> EPIPE results when writing to a socket for which writing has been shut down.
>> This most commonly occurs when the socket has closed.  You need to handle
>> this exception, since you can't absolutely prevent the socket from being
>> closed.
>
>The exception is already caught and logged, but this is really not
>good enough. By "handling this exception", do you mean that there is a
>way to handle it such that the connection still works? I found some
>code that attempts to retry when SIGPIPE was received, but this only
>results in the same error all over again.

No, the exception indicates the connection is gone.  There is no way to
continue to transfer data using it.
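
A minimal sketch of what "handling" can look like, using plain sockets
rather than telnetlib.  The `connect` callable is hypothetical (anything
that returns a new connected socket), and the sketch assumes it is
acceptable to resend the data on a new connection; for a telnet session
you would also have to redo the login and any other session setup.

```python
import errno
import socket


def send_or_reconnect(sock, data, connect):
    """Send data; if the connection is gone, replace it with a new one.

    `connect` is a hypothetical zero-argument callable that returns a
    new connected socket.  Returns the socket that is usable afterwards.
    """
    try:
        sock.sendall(data)
        return sock
    except OSError as e:
        # Only treat "the peer is gone" errors as recoverable here.
        if e.errno not in (errno.EPIPE, errno.ECONNRESET):
            raise
        # The old socket is dead; retrying on it would fail the same way.
        sock.close()
        new_sock = connect()
        new_sock.sendall(data)
        return new_sock
```

The point is that the retry happens on a new connection, never on the
socket that raised the error.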

>Why can this not be prevented (in the general case)? Unless something
>fancy happened, what can cause the socket to close? Looking at the raw
>data received by the connected host, the connection gets lost in
>mid-stream; I cannot see anything that might cause the remote side to
>close the connection (in which case I'd expect a "connection reset by
>peer" or something).

It is the nature of TCP/IP that the connection might disappear at any
moment.  The reasons range from someone explicitly calling the
close or shutdown API, to a wire being unplugged somewhere between the
two communicating hosts, to a traffic spike that leaves insufficient
physical resources to sustain your particular connection, and so on.

It may be the case that whatever is causing your connection to be dropped
is entirely avoidable (I can't say, since I don't know what is causing your
connection to be dropped), but all of these other causes are not avoidable,
and your program might encounter one of them someday.

>
>> There might be some other change which would be appropriate, though,
>> if it is the case that something your application is doing is causing the
>> socket to be closed (for example, sending a message which the remote side
>> decides is invalid and causing it to close the socket explicitly from its
>> end).
>
>The program is doing the same thing repeatedly and it works 95% of the
>time, so I am fairly sure that nothing special is sent.
>
>> It's difficult to make any specific suggestions in that area without
>> knowing exactly what your program does.
>
>Unfortunately the application is rather complex and a simple test case
>is not possible.

I used to spend days or weeks trying to track down a subtle bug
in a complex system, but I don't anymore. ;)  It's much better to spend
that time simplifying the software which exhibits the problem until it is
simple enough to understand and make the bug obvious.  Unit testing and
test-driven development have the advantage of pressuring you to write
code which is already split into simple enough pieces that this is usually
a relatively painless process.  For systems not written with this in mind,
it can be quite unpleasant to produce a simple example, but it's ultimately
still worthwhile.

>
>Basically, it creates a number of daemon threads, each of which
>creates a (thread local, non-shared) instance of telnetlib and
>connects to a remote host. Are there any special conditions that must
>be taken care of when opening a number of sockets in threads? (The
>code runs on AIX 4.1, where Python supports native OS threads.)

Oops.  Threads.  So there's a million possible bugs.  Oops AIX, heh,
that probably introduces another million.  I don't know of anything
specifically broken in Python tied to telnetlib and threading on AIX,
no, but that just leaves you with the usual suspects.

Since I don't have any further specific advice to give you in tracking
down this problem, maybe I'll just recommend that you take a look at
Twisted, which has a better (although probably somewhat harder to grasp)
telnet library, and will let you manage numerous connections without
threads.  (Of course, if you have a large existing system then switching
to something as drastically different as Twisted might not be an option,
but it doesn't hurt to suggest it.)
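
To illustrate the general idea of handling several connections in one
thread, here is a rough sketch.  Note that it uses the standard library's
selectors module, not Twisted, and the `read_all` helper is a made-up
name for this example; it only shows the shape of the approach.

```python
import selectors
import socket


def read_all(socks, timeout=1.0):
    """Read from several sockets in one thread until each peer closes.

    Returns a dict mapping each socket to the bytes received from it.
    """
    sel = selectors.DefaultSelector()
    received = {}
    for s in socks:
        s.setblocking(False)
        sel.register(s, selectors.EVENT_READ)
        received[s] = b""
    while sel.get_map():
        events = sel.select(timeout)
        if not events:
            break  # nothing arrived within the timeout; give up
        for key, _ in events:
            s = key.fileobj
            try:
                chunk = s.recv(4096)
            except BlockingIOError:
                continue  # not actually readable yet; wait for the next event
            except ConnectionError:
                chunk = b""  # treat a reset like a close
            if chunk:
                received[s] += chunk
            else:
                sel.unregister(s)  # peer closed; stop watching this socket
    sel.close()
    return received
```

Each connection is serviced only when it has data ready, so no locking
or per-connection thread is needed.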

Jean-Paul


