select() call and filedescriptor out of range in select error

k3xji sumerc at gmail.com
Thu Sep 16 08:51:34 CEST 2010


> Please show the *exact* error message, including the traceback, by
> copying and pasting it. Do not retype it by hand, or summarize it, or put
> it into your own words.

Unfortunately this is not possible. The logging system I designed only
gives the following information: since we log millions of custom
exceptions per day, I did not include the full traceback. Here is all
I have:

1448) 15/09/10 20:02:08 - [*] ERROR: Physical max client limit
reached. Please contact maintenance.filedescriptor out of range in
select()[scSocketServer.py:215:][Port:515]

The code generating the error is:

    try:
        self.__ReadersInCycle, self.__WritersInCycle, e = \
            select(self.__Sockets, self.__WritersInCycle, [],
                   base.scOptions.scOPT_SELECT_TIMEOUT)

    except ValueError, e:
        LogError('Physical max client limit reached.'
                 ' Please contact maintenance.' + str(e))
        self.scSvr_OnClientPhysicalLimitReached()  # define a policy here
        continue
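
Since the handler above catches every ValueError and logs only str(e), it cannot tell where the exception was raised. A minimal sketch of logging the full traceback for just this one call could look like the following. The function name select_with_traceback and the log_error parameter (a stand-in for LogError) are made up for illustration; traceback.format_exc() is in the standard library and should already be available in Python 2.4.

```python
import select
import traceback


def select_with_traceback(readers, writers, timeout, log_error):
    """Call select() and log the full traceback if it raises ValueError.

    log_error is a hypothetical stand-in for the LogError function.
    Returns (readable, writable), or None if select() raised ValueError.
    """
    try:
        # The third result goes to "_" so it cannot be confused with
        # an exception object.
        readable, writable, _ = select.select(readers, writers, [], timeout)
    except ValueError:
        # format_exc() returns the current traceback as one string.
        log_error('select() raised ValueError:\n' + traceback.format_exc())
        return None
    return readable, writable
```

Because this is the rare error path, the extra text per log entry should not add much to the daily log volume.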

> > First of all, in our entire application there is no line of code like
> > remove(x), meaning there is no x variable.
>
> Look at this example:
>
> >>> sockets = []
> >>> sockets.remove("Hello world")
>
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
> ValueError: list.remove(x): x not in list
>

Ok. Thanks.

> Anything is possible, but it's not likely. What's far more likely is that
> you have a bug in your code, and that somehow, under rare circumstances,
> it tries to remove something from a list that was never inserted into the
> list. Or it tries to remove it twice.
>
> My guess is something like this:
>
> try:
>     socket = get_socket()
>     self._sockets.append(socket)
> except SomeError:
>     pass
> # later on
> self._sockets.remove(socket)
>

Hmm, it might be, but the self.__Sockets list also contains the
ListenSocket, which is the real listening socket. Naturally, I pass
it in the read list of select() on every server cycle. The weird
thing is that the ListenSocket itself is throwing the "not in list"
exception, too! And one thing I am sure of is that I have not written
any code that removes the listening socket from the list; that is
just impossible. Additionally, for optimization reasons there are very
few places where I traverse the __Sockets list. The only places where
I delete something from the __Sockets list are:

1) a user disconnects (normal disconnect, authentication or ping
timeout)
3) server is being stopped or restarted

Other than that, there is no access to that variable from outside
objects; as can be seen, it is also private. And please keep in mind
that this bug has been there for about a year, so many code reviews
have passed without anyone noticing the type of error you are
suggesting.
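
One way to find out which code path is doing the unexpected removal would be to route every removal through a single guarded helper. This is only a sketch: remove_socket, listen_sock and log_error are hypothetical names, not part of the server code shown above.

```python
import traceback


def remove_socket(sockets, sock, listen_sock, log_error):
    """Remove sock from the sockets list, with two safety checks.

    Refuses to remove the listening socket, and logs the call stack
    instead of raising when sock is not in the list.
    Returns True if something was removed, False otherwise.
    """
    if sock is listen_sock:
        # format_stack() shows who asked for the removal.
        log_error('refusing to remove the listening socket\n'
                  + ''.join(traceback.format_stack()))
        return False
    try:
        sockets.remove(sock)
    except ValueError:
        log_error('socket not in list\n'
                  + ''.join(traceback.format_stack()))
        return False
    return True
```

With this in place, the log would show the caller the next time the listening socket is about to be dropped, instead of only the "not in list" message afterwards.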

And more information on system: I am running Python 2.4 on CentOS.

By the way, after digging through the logs and the system, it turns
out select() is hitting a per-process FD limit. Although the
system-wide ulimit is unlimited, I think Python's "selectmodule.c"
enforces a limit of 1024 (it rejects any descriptor at or above
FD_SETSIZE). I get the error after hitting that limit and, somehow,
as I just explained, the __ListenSocket is removed from the read
list, so it is lost and the server instance is lost for good. Putting
a try..except around that code and re-initializing the server port is
a solution, but I guess a bad one, because I will not have found the
root cause.
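
To keep select() from ever seeing a descriptor it will reject, the socket list could be filtered by descriptor number before the call. The sketch below assumes the limit is 1024, which is the usual FD_SETSIZE on Linux but is not exposed by Python, so treat the constant as an assumption. The names SELECT_FD_LIMIT and split_by_select_range are made up for illustration.

```python
# Assumption: 1024 is the typical FD_SETSIZE on Linux builds of Python.
SELECT_FD_LIMIT = 1024


def split_by_select_range(sockets, limit=SELECT_FD_LIMIT):
    """Split sockets into (usable, out_of_range) by descriptor number.

    Only objects whose fileno() is in [0, limit) are safe to pass
    to select(); the rest would make it raise ValueError.
    """
    usable = []
    out_of_range = []
    for sock in sockets:
        fd = sock.fileno()
        if 0 <= fd < limit:
            usable.append(sock)
        else:
            out_of_range.append(sock)
    return usable, out_of_range
```

The same check could be applied right after accept(), closing a new connection whose fileno() is at or above the limit before it is ever added to the list. Another option might be select.poll(), which is not bound by FD_SETSIZE, though that would mean restructuring the server loop.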

Thanks in advance,


