[Python-Dev] Socket timeout patch

Guido van Rossum guido@python.org
Fri, 07 Jun 2002 16:03:10 -0400

[Jeremy, please skip forward to where it says "Stevens" or "Jeremy"
and comment.]

>   Glad to hear that _fileobject works well.

I didn't say that.  I was hoping it does though. :-)

> Is there any benefit to having the file code in C?

Some operations (notably pickle.dump() and .load()) only work with
real files.  Other operations (e.g. printing a large list or dict) can
be faster with real files because they don't have to build the str() or
repr() of the whole thing as a string first.

> I bet the python code isn't that much slower.

You're on.  Write a benchmark.  I notice that httplib uses makefile(),
and often reads very small chunks.
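For what it's worth, a micro-benchmark along these lines is easy to
sketch over a local socketpair (the 16-byte chunk size mimics httplib's
small reads; the helper names are mine):

```python
import socket
import time

def _drain(read, chunk=16):
    """Read to EOF in small chunks, returning (bytes_read, seconds)."""
    t0 = time.perf_counter()
    total = 0
    while True:
        data = read(chunk)
        if not data:
            break
        total += len(data)
    return total, time.perf_counter() - t0

def bench_makefile_reads(payload=b"x" * 4096):
    """Compare makefile() reads with raw recv() calls, many small reads."""
    # Buffered file-object path (the one httplib exercises).
    a, b = socket.socketpair()
    a.sendall(payload)
    a.shutdown(socket.SHUT_WR)   # so the reader sees EOF
    f = b.makefile("rb")
    n_file, t_file = _drain(f.read)
    f.close(); a.close(); b.close()

    # Raw recv() path for comparison.
    a, b = socket.socketpair()
    a.sendall(payload)
    a.shutdown(socket.SHUT_WR)
    n_raw, t_raw = _drain(b.recv)
    a.close(); b.close()
    return n_file, n_raw, t_file, t_raw
```

Timing two short runs like this is only a rough indicator, but it would
settle whether the pure-Python file object is "that much slower" on the
small-read pattern that matters here.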

> It does seem a shame to have to switch
> between the two. Maybe one solution is that a makefile should set
> blocking back on if a timeout exists?

That's not very nice.  It could raise an exception.  But you could set
the timeout on the socket *after* calling makefile(), and then you'd
be hosed.  But if we always use the Python makefile(), the problem
goes away.

> That would solve the problem
> and is a consistent change since it only checks timeout behavior (=
> 2 lines of code). I'd vote for using the python fileobject for
> both. Some profiling of the two would be nice if you have the time.

I don't, maybe you do? :-)

> >   I'm also thinking of implementing the socket wrapper (which
> >   is currently a Python class that has a "real" socket object as an
> >   instance variable) as a subclass of the real socket class instead.

>   Er, what's the difference between that and _socketobject in
> socket.py? Why not just use the python bit consistently?

Making it a subclass should be faster.  But I don't know yet if it can
work -- I probably have to change the constructor at the C level to be
able to support dup() (which is the other reason for the Python
wrapper).

>   Is it really so painful for apps to keep track of all their sockets
> and then do something like:
>      for sock in sock_list:
>         sock.settimeout (blah)
>   Why keep track of them in the socket module, unless there's already code
> for this.

Steve Holden already answered this one.  Also, you don't have to keep
track of all sockets -- you just have to apply the timeout if one is
set in a global variable.
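This is essentially what later shipped as socket.setdefaulttimeout():
a module-level default that each newly created socket picks up, so no
per-socket bookkeeping is needed. A minimal illustration:

```python
import socket

# Module-level default: applied to every socket created afterwards.
socket.setdefaulttimeout(5.0)
s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
default_applied = s.gettimeout()   # inherited from the global default
s.close()

# Reset so other code is unaffected; None means plain blocking mode.
socket.setdefaulttimeout(None)
t = socket.socket()
no_default = t.gettimeout()
t.close()
```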

>   Well, we agreed to do some clean-up in a separate patch but you seem
> anxious to get it in there :)

I am relaxing now, waiting for you to pick up again.

>   Well, this was the initial reason to use the selectmodule.c code.
> There's got to be a way to share code between the two for bare access
> to select, since someone else might want to use such functionality one
> day (and this has set the precedent). Why not make a small change to
> selectmodule.c that exposes the code through a C API of some sort? And
> then have select_select use that function internally.

If two modules are both shared libraries, it's really painful to share
entry points (see the interface between _ssl.c and socketmodule.c for
an example -- it's written down in a large comment block in
socketmodule.h).  I really think one should be able to use select() in
more than one file.  At least it works on Windows. ;-)

> > > > - I'm not sure that the handling of timeout errors in accept(),
> > > >   connect() and connect_ex() is 100% correct (this code sniffs the
> > > >   error code and decides whether to retry or not).
>   This is how the original timeoutsocket.py did it and it seems
> to be the way to do blocking connects.

I understand all that.  My comment is that your patch changed the
control flow in non-blocking mode too.  I think I accidentally fixed
it by setting sock_timeout to -1.0 in setblocking().  But I'm not 100%
sure so I'd like this aspect to be unit-tested thoroughly.

>   Ok.. Should I merge the test_timeout.py and test_socket.py as well
> then?

No, you can create as many (or as few) unit test files as you need.

> A little off-topic, while I was thinking of restructuring these
> tests, I was wondering what might be the best way to structure a unit
> test where things have to work in separate processes/threads. What
> I'd really like to do is:
>   * Have the setup function set up server and client sockets
>   * Have the tear-down function close them
>   * Have some synchronization function or simple message (this is
>     socket specific)
>   * Then have a test that has access to both threads and can insert
>     call-backs to run in each thread.
>   This seems tricky with the current unit-test framework. The way I'd
> do it is using threading.Event () and have the thing block until the
> server-side and client-side respectively submit their test callbacks.
> But it might be nice to have a general class that can be added to the
> testing framework. Thoughts?

If I were you I'd worry about getting it right once first.  Then we
can see if there's room for generalization.  (You might want to try to
convert test_socketserver.py to your proposed framework to see how
well it works.)
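The Event-based scheme being proposed can be sketched quite compactly
(the class and names below are mine, purely illustrative; test_socket.py
later grew a ThreadableTest mixin in much this spirit):

```python
import socket
import threading

class SocketPairTest:
    """Server and client halves run in separate threads, synchronized
    with a threading.Event as described above."""

    def __init__(self):
        self.ready = threading.Event()

    def server(self, results):
        srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        srv.bind(("127.0.0.1", 0))      # any free port
        srv.listen(1)
        self.port = srv.getsockname()[1]
        self.ready.set()                # tell the client we're listening
        conn, _ = srv.accept()
        results["server"] = conn.recv(1024)
        conn.close(); srv.close()

    def client(self, results):
        self.ready.wait()               # block until the server is ready
        c = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        c.connect(("127.0.0.1", self.port))
        c.sendall(b"ping")
        results["client"] = "sent"
        c.close()

    def run(self):
        results = {}
        t = threading.Thread(target=self.server, args=(results,))
        t.start()
        self.client(results)            # client runs in the main thread
        t.join()
        return results
```

The setup/teardown and per-test callbacks would hang off this skeleton.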

> > Can you explain why on Windows you say that the socket is connected
> > when connect() returns a WSAEINVAL error?
>   This is what timeoutsocket.py used as the unix equivalent error
> codes, and since I'm not set up to test windows and since it was
> working code, I took their word for it.

Well, but WSAEINVAL can also be returned for other conditions.  See


it seems that the *second* time you call connect() WSAEINVAL can only
mean that you're already connected.  But if this socket is in a
different state, and the connect() is simply not appropriate, I don't
like the fact that connect() would simply return "success" rather than
reporting an error.  E.g. I could do this:

  s = socket()
  s.connect((host, port))
  # By mistake:
  s.connect((otherhost, otherport))

I want the latter connect() to fail, but I think your code will make
it succeed.
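On Unix the second connect() does fail, with EISCONN; a sketch against
a throwaway local listener (purely illustrative) shows the behavior
being asked for:

```python
import errno
import socket

# A local listener so the first connect() succeeds.
srv = socket.socket()
srv.bind(("127.0.0.1", 0))
srv.listen(1)
host, port = srv.getsockname()

s = socket.socket()
s.connect((host, port))
try:
    s.connect((host, port))          # the mistaken second connect()
    second_connect_errno = None
except OSError as e:
    second_connect_errno = e.errno   # EISCONN: already connected

s.close(); srv.close()
```

The concern above is that the Windows WSAEINVAL sniffing would turn
this error into a silent "success".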

> > Also, your code enters this block even in non-blocking mode, if a
> > timeout was also set.  (Fortunately I fixed this for you by setting
> > the timeout to -1.0 in setblocking(). Unfortunately there's still a
> > later test for !sock_blocking in the same block that cannot succeed
> > any more because of that.)
>   This confusion is arising because of the restructuring of the code.
> Erm, this check applies if we have a timeout but are in non-blocking
> mode. Perhaps you changed this? To make it clearer, originally before
> the v2 of the patch, the socket was always in non-blocking mode, so
> it was necessary to check whether we were examining error code with
> non-blocking in mind, or whether we were checking for possible timeout
> behavior. Since we've changed this, it now checks if non-blocking has
> been set while a timeout has been set. Seems valid to me...

But I changed that again: setblocking() now always disables the
timeout.  Read the new source in CVS.

> > The same concerns apply to connect_ex() and accept(), which have very
> > similar logic.
> > 
> > I believe it is possible on some Unix variants (maybe on Linux) that
> > when select indicates that a socket is ready, if the socket is in
> > nonblocking mode, the call will return an error because some kernel
> > resource is unavailable.  This suggests that you may have to keep the
> > socket in blocking mode except when you have to do a connect() or
> > accept() (for which you can't do a select without setting the socket
> > in nonblocking mode first).
>   Not sure about this. Checking the man pages, the error codes seem
> to be the thing to check to determine what the behavior is. Perhaps
> you could clarify?

When a timeout is set, the socket file descriptor is always in
nonblocking mode.  Take sock_recv() for example.  It calls
internal_select(), and if that returns >= 1, it calls recv().  But
according to the Stevens books, it is still possible (under heavy
load) that the recv() returns an EWOULDBLOCK error.

(We ran into this while debugging a high-performance application based
on Spread.  The select() succeeded, but the recv() failed, because the
socket was in nonblocking mode.  Well, I'm *almost* sure that this was
the case -- Jeremy Hylton should know the details.)
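The defensive pattern Stevens suggests is to treat EWOULDBLOCK after a
successful select() as a spurious wakeup and simply retry. A sketch
(recv_with_timeout is my name, not part of the patch):

```python
import select
import socket

def recv_with_timeout(sock, nbytes, timeout):
    """recv() with a timeout, tolerating the Stevens caveat: even after
    select() reports the socket readable, a nonblocking recv() can
    still fail with EWOULDBLOCK under heavy load, so go around again."""
    sock.setblocking(False)
    while True:
        ready, _, _ = select.select([sock], [], [], timeout)
        if not ready:
            raise TimeoutError("recv timed out")
        try:
            return sock.recv(nbytes)
        except BlockingIOError:
            # select() lied (rare, but possible): retry the select()
            continue
```

This is the control flow sock_recv() needs internally when a timeout
is set.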

> > Looking at connect_ex, it seems to be missing the "res = errno" bit
> > in the case where it says "we're already connected".  It used to
> > return errno here, now it will return -1.  Maybe the conex_finally
> > label should be moved up to before the "if (res != 0) {" line?
>   Ah yes. I didn't look closely enough at the windows bit. On linux
> it isn't necessary. Let's move it up.

OK, done.

> > OTOH, a timeout of 0 behaves very similar to nonblocking mode --
> > similar enough that a program that uses setblocking(0) would probably
> > also work when using settimeout(0).  I kind of like the idea of having
> > only a single internal flag value, sock_timeout, rather than two
> > (sock_timeout and sock_blocking).
>   But one throws an exception and one doesn't.

What do you mean?  In nonblocking mode you get an exception when the
socket isn't ready too.
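In fact, in the socket module as it stands today this view won out:
settimeout(0.0) puts the descriptor in nonblocking mode, and a
would-block recv() raises the same exception either way:

```python
import socket

a, b = socket.socketpair()   # nothing has been sent yet

b.setblocking(False)
try:
    b.recv(16)
except OSError as e1:
    nonblocking_exc = type(e1)     # BlockingIOError

b.settimeout(0.0)                  # "timeout zero"
try:
    b.recv(16)
except OSError as e2:
    timeout_zero_exc = type(e2)    # the very same exception type

a.close(); b.close()
```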

> It seems to me that
> setting a timeout of 0 is sort of an error, if anything. It'll be an
> easy way to do a superficial test of the functionality in the regr
> test.

OK, we don't seem to be able to agree on this.  I'll let your wisdom
prevail.

> > The tests don't look very systematic.  There are many cases (default
> > bufsize, unbuffered, bufsize=1, large bufsize; combine with read(),
> > readline(), read a line larger than the buffer size, etc.).  We need a
> > more systematic approach to unit testing here.
>   Ok, so to recap which tests we want:
>      * Default read()
>      * Read with size given
>      * unbuffered read
>      * large buffer size
>      * Mix read() and readline()
>      * Do a readline()
>      * Do a readline() larger than the buffer size.
>   Any others in the list?

Check the socket.py source code and make sure that every code path
through every method is taken at least once.
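The matrix quoted above is easy to drive from one parametrized helper;
a sketch over a socketpair (the buffer sizes and labels are mine):

```python
import socket

def run_file_read_cases():
    """Cross each buffering mode with readline()/read() over a socket
    file object, including a line longer than the 1-byte buffer."""
    payload = b"line-one\nline-two\n" + b"z" * 100 + b"\n"
    results = {}
    for label, bufsize in [("default", -1), ("unbuffered", 0),
                           ("bufsize-1", 1), ("large", 65536)]:
        a, b = socket.socketpair()
        a.sendall(payload)
        a.shutdown(socket.SHUT_WR)          # reader sees EOF
        f = b.makefile("rb", buffering=bufsize)
        first = f.readline()                # plain readline()
        rest = f.read()                     # mixed with read()
        results[label] = (first, len(first) + len(rest))
        f.close(); a.close(); b.close()
    return results
```

Each case should hand back the same bytes regardless of buffering; add
one such helper per code path in socket.py and the coverage goal above
falls out naturally.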

--Guido van Rossum (home page: http://www.python.org/~guido/)