[Python-Dev] funky buildbot problems again...

exarkun at twistedmatrix.com
Fri Apr 8 03:01:17 CEST 2011


On 12:07 am, janssen at parc.com wrote:
>exarkun at twistedmatrix.com wrote:
>>On 08:31 pm, janssen at parc.com wrote:
>> >My Intel Snow Leopard 2 build slave has gone into outer-space again.
>> >
>> >When I look at it, I see buildslave taking up most of a CPU (80%), and
>> >nothing much else going on.  The twistd log says:
>> >
>> >[... much omitted ...]
>> >2011-04-04 08:35:47-0700 [-] sending app-level keepalive
>> >2011-04-04 08:45:47-0700 [-] sending app-level keepalive
>> >2011-04-04 08:55:47-0700 [-] sending app-level keepalive
>> >2011-04-04 09:03:15-0700 [Broker,client] lost remote
>> >2011-04-04 09:03:15-0700 [Broker,client] lost remote
>> >2011-04-04 09:03:15-0700 [Broker,client] lost remote
>> >2011-04-04 09:03:15-0700 [Broker,client] lost remote
>> >2011-04-04 09:03:15-0700 [Broker,client] lost remote
>> >2011-04-04 09:03:15-0700 [Broker,client] Lost connection to dinsdale.python.org:9020
>> >2011-04-04 09:03:15-0700 [Broker,client] <twisted.internet.tcp.Connector instance at 0x101629ab8> will retry in 3 seconds
>> >2011-04-04 09:03:15-0700 [Broker,client] Stopping factory <buildslave.bot.BotFactory instance at 0x1016299e0>
>> >2011-04-04 09:03:18-0700 [-] Starting factory <buildslave.bot.BotFactory instance at 0x1016299e0>
>> >2011-04-04 09:03:18-0700 [-] Connecting to dinsdale.python.org:9020
>> >2011-04-04 09:03:18-0700 [Uninitialized] Connection to dinsdale.python.org:9020 failed: Connection Refused
>> >2011-04-04 09:03:18-0700 [Uninitialized] <twisted.internet.tcp.Connector instance at 0x101629ab8> will retry in 8 seconds
>> >2011-04-04 09:03:18-0700 [Uninitialized] Stopping factory <buildslave.bot.BotFactory instance at 0x1016299e0>
>> >2011-04-04 09:03:27-0700 [-] Starting factory <buildslave.bot.BotFactory instance at 0x1016299e0>
>> >2011-04-04 09:03:27-0700 [-] Connecting to dinsdale.python.org:9020
>> >
>> >So it's been spinning its wheels for 3 days.
>>
>>Does this mean that the "2011-04-04 09:03:27-0700 [-] Connecting to
>>dinsdale.python.org:9020" message in the logs is the last one you see
>>until you restart the slave?
>
>Yes, that's the last line in the file.
>
>>Or does it mean that the logs go on and on for three days with these
>>"Connecting to dinsdale...." / "Connection Refused" / "... will retry
>>in N seconds" cycles, thousands and thousands of times?
>
>Well, it's doing something, chewing up cycles, but there's only one
>"Connecting" line at the end of the log file.

That's very interesting.  It may be worth doing some gdb or dtrace
investigation next time it gets into this state.

>>What does the buildmaster's info page for this slave say when the
>>slave is in this state?  In particular, what does it say about
>>"connects/hour"?
>
>Ah, good question.  Too bad I restarted the slave after I sent out my
>info.  Is there some way to recover that from earlier?  If not, it will
>undoubtedly fail again in a few days.

If the master logs are available, that would provide some information. 
Otherwise, I think waiting for it to happen again is the thing to do.

Since there were no other messages in the log file, I expect the 
connects/hour value will be low - perhaps 0.
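
To cross-check that expectation from the slave side, the connection
attempts in the twistd log can be tallied per hour.  This is only a rough
sketch: the function name connects_per_hour is made up for illustration,
the regular expression assumes the timestamp layout seen in the quoted
log, and counting "Connecting to" lines is an approximation that need not
match how the buildmaster computes its own connects/hour figure.

```python
import re
from collections import Counter

# Assumed line layout, taken from the quoted log:
# "2011-04-04 09:03:18-0700 [-] Connecting to dinsdale.python.org:9020"
LINE_RE = re.compile(
    r"^\s*(\d{4}-\d{2}-\d{2} \d{2}):\d{2}:\d{2}[-+]\d{4} \[[^\]]*\] (.*)$"
)


def connects_per_hour(lines):
    """Count 'Connecting to' messages per clock hour of log time.

    Hypothetical helper: keys are 'YYYY-MM-DD HH' strings, values are the
    number of connection attempts logged in that hour.
    """
    counts = Counter()
    for line in lines:
        match = LINE_RE.match(line)
        # Wrapped continuation lines have no timestamp and are skipped.
        if match and match.group(2).startswith("Connecting to"):
            counts[match.group(1)] += 1
    return counts
```

It could be run over the slave's log with something like
"with open(path) as f: print(connects_per_hour(f))", where path points at
the twistd log file.  An hour with thousands of attempts would point to a
reconnect loop; one or two attempts followed by silence matches what the
log above shows.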

Jean-Paul

