[Python-Dev] Socket timeout patch

Fri, 7 Jun 2002 13:40:36 -0400

On Fri, Jun 07 @ 12:29, Guido van Rossum wrote:
> First, a few new issues in this thread:
> 
> - On Windows, setting a timeout on a socket and then using s.makefile()
>   works as expected (the I/O operations on the file will time out
>   according to the timeout set on the socket).  This is because
>   makefile() returns a pseudo-file that calls s.recv() etc. on the
>   socket object.  But on Unix, s.makefile() on a socket with a timeout
>   is a disaster: because the socket is internally set to nonblocking
>   mode, all I/O operations will fail if they cannot be completed
>   immediately (effectively setting a timeout of 0 on the file).  I have
>   currently documented around this, but maybe it would be better if
>   makefile() used a pseudo-file on all platforms, for uniform behavior.
>   Thoughts?  I'm also thinking of implementing the socket wrapper (which
>   is currently a Python class that has a "real" socket object as an
>   instance variable) as a subclass of the real socket class instead.

  Glad to hear that _fileobject works well. Is there any benefit to
having the file code in C? I bet the python code isn't that much
slower. It does seem a shame to have to switch between the two. Maybe
one solution is that a makefile should set blocking back on if a
timeout exists? That would solve the problem and is a consistent
change since it only checks timeout behavior (= 2 lines of code). I'd
vote for using the python fileobject for both. Some profiling of the
two would be nice if you have the time.

  Er, what's the difference between that and _socketobject in
socket.py? Why not just use the python bit consistently?

> - The original timeout socket code (in Python) by Tim O'Malley had a
>   global timeout which you could set so that *all* sockets
>   *automatically* had their timeout set.  This is nice if you want it
>   to affect library modules like urllib or ftplib.  That feature is
>   currently missing.  Should we add it?  (I have some concerns about
>   it, in that it might break other code -- and it doesn't seem wise to
>   do this for server-end sockets in general.  But it's a nice hack for
>   smaller programs.)

  Is it really so painful for apps to keep track of all their sockets
and then do something like:

     for sock in sock_list:
        sock.settimeout (blah)

  Why keep track of them in the socket module, unless there's already code
for this.

> >   Good stuff. The module needed a little work as I discovered as well
> > :)
> 
> ...and it still needs more.  There are still way too many #ifdefs in
> the code.

  Well, we agreed to do some clean-up in a separate patch but you seem
anxious to get it in there :)

> >   I think the tests should follow the same pattern as the
> > test_socket.py.  While adding my regression tests, I noted that the
> > general socket test suite could use some re-writing but I didn't feel
> > it appropriate to tackle it at that point. Perhaps a next patch?
> 
> Yes, please!

  Alrighty. I'll re-write the test_socket.py and do the test_timeout.py
as well.

> >   This was a concern from the beginning but we had some chat on the
> > dev list and concluded that any system supporting sockets has to
> > support select or some equivalent (hence the initial reason for using
> > the select module, although I agree it was expensive).
> 
> But that doesn't mean there aren't platform-specific tweaks necessary
> to import the definition of select() and the FD_* macros.  We'll find
> out soon enough, this is what alpha releases are for. :-)

  Well, this was the initial reason to use the selectmodule.c code.
There's got to be a way to share code between the two for bare access
to select, since someone else might want to use such functionality one
day (and this has set the precendent). Why not make a small change to
selectmodule.c that opens up the code in a C API or some sort? And
then have select_select use that function internally.

> > > - I'm not sure that the handling of timeout errors in accept(),
> > >   connect() and connect_ex() is 100% correct (this code sniffs the
> > >   error code and decides whether to retry or not).

  This is how the original timeoutsocket.py did it and it seems
to be the way to do blocking connects. You try to do the connect, check
if it happened instantaneously and then if not, do the select, and then
try again. Errno is the way to check it. That's why if we're doing timeout
stuff, there's a second call to accept/connect. Says the linux man pages:

 EAGAIN or EWOULDBLOCK
   The socket is marked non-blocking and no connections are present to
   be accepted.

> >   I've tested these on linux (manually) and they seem to work just
> > fine. I didn't do as much testing with connect_ex but the code is
> > very similar to connect, so confidence is high-er. The reason for the
> > two-pass is because the initial connect needs to be made to start the
> > process and then try again, based on the error codes, for non-blocking
> > connects. It's weird like that.
> 
> I'll wait for the unit tests.  These should test all three modes
> (blocking, non-blocking, and timeout).

  Ok.. Should I merge the test_timeout.py and test_socket.py as well
then? A little off-topic, while I was thinking of restructuring these
tests, I was wondering what might be the best way to structure a unit
test where things have to work in seperate processes/threads. What
I'd really like to do is:

  * Have the setup function set up server and client sockets
  * Have the tear-down function close them
  * Have some synchronization function or simple message (this is
    socket specific)
  * Then have a test that has access to both threads and can insert
    call-backs to run in each thread.

  This seems tricky with the current unit-test frame work. The way I'd
do it is using threading.Event () and have the thing block until the
server-side and client-side respectively submit their test callbacks.
But it might be nice to have a general class that can be added to the
testing framework. Thoughts?

> Can you explain why on Windows you say that the socket is connected
> when connect() returns a WSAEINVAL error?

  This is what timeoutsocket.py used as the unix equivalent error
codes, and since I'm not set up to test windows and since it was
working code, I took their word for it.

> Also, your code enters this block even in non-blocking mode, if a
> timeout was also set.  (Fortunately I fixed this for you by setting
> the timeout to -1.0 in setblocking(). Unfortunately there's still a
> later test for !sock_blocking in the same block that cannot succeed
> any more because of that.)

  This confusion is arising because of the restructuring of the code.
Erm, this check applies if we have a timeout but are in non-blocking
mode. Perhaps you changed this? To make it clearer, originally before
the v2 of the patch, the socket was always in non-blocking mode, so
it was necessary to check whether we were examining error code with
non-blocking in mind, or whether we were checking for possible timeout
behavior. Since we've changed this, it now checks if non-blocking has
been set while a timeout has been set. Seems valid to me...

> The same concerns apply to connect_ex() and accept(), which have very
> similar logic.
> 
> I believe it is possible on some Unix variants (maybe on Linux) that
> when select indicates that a socket is ready, if the socket is in
> nonblocking mode, the call will return an error because some kernel
> resource is unavailable.  This suggests that you may have to keep the
> socket in blocking mode except when you have to do a connect() or
> accept() (for which you can't do a select without setting the socket
> in nonblocking mode first).

  Not sure about this. Checking the man pages, the error codes seem
to be the thing to check to determine what the behavior is. Perhaps
you could clarify?

> Looking at connect_ex, it seems to be missing the "res = errno" bit
> in the case where it says "we're already connected".  It used to
> return errno here, now it will return -1.  Maybe the conex_finally
> label should be moved up to before the "if (res != 0) {" line?

  Ah yes. I didn't look closely enough at the windows bit. On linux
it isn't necessary. Let's move it up.

> >   I thought about this and whether or not I wanted to address this.  I
> > kinda decided to leave them separate though. I don't think setting a
> > timeout means anything equivalent to setblocking(0). In fact, I can't
> > see why anyone should ever set a timeout of zero and the immediate
> > throwing of the exception is a good alert as to what's going on. I
> > vote, leave them separate and as they are now...
> 
> OTOH, a timeout of 0 behaves very similar to nonblocking mode --
> similar enough that a program that uses setblocking(0) would probably
> also work when using settimeout(0).  I kind of like the idea of having
> only a single internal flag value, sock_timeout, rather than two
> (sock_timeout and sock_blocking).

  But one throws an exception and one doesn't. It seems to me that
setting a timeout of 0 is sort of an error, if anything. It'll be an
easy way to do a superficial test of the functionality in the regr
test.

> > > - The socket.py module has been changed too, changing the way
> > >   buffering is done on Windows.  I haven't reviewed or tested this
> > >   code thoroughly.
> > 
> >   I added a regression test to test_socket.py to test this, that works
> > on both the old code (I used 2.1.3) and the new code. Hopefully, this
> > will be instrumental for those testing it and it reflects my manual
> > tests.
> 
> The tests don't look very systematic.  There are many cases (default
> bufsize, unbuffered, bufsize=1, large bufsize; combine with read(),
> readline(), read a line larger than the buffer size, etc.).  We need a
> more systematic approach to unit testing here.

  Ok, so to recap which tests we want:

     * Default read()
     * Read with size given
     * unbuffered read
     * large buffer size
     * Mix read and realine
     * Do a realine
     * Do a readline larger than buffer size.

  Any others in the list?

                              -- Mike

`-> (guido)

-- 
Michael Gilfix
mgilfix@eecs.tufts.edu

For my gpg public key:
http://www.eecs.tufts.edu/~mgilfix/contact.html