[py-dev] execnet: shutdown issues in a heavily threaded environment

Gordon Wrigley py-dev at tolomea.com
Tue Jul 14 11:56:37 CEST 2009


Hi Holger

>> Exception AttributeError: "'NoneType' object has no attribute
>> 'CHANNEL_CLOSE'" in <bound method Channel.__del__ of <Channel id=7
>> open>> ignored

> Yes, some cycle is probably there.  Your data point hints that
> there is some cycle in the execnet code itself.

A bit of debugging revealed that I didn't really understand how my own
teardown setup was supposed to work, and subsequently that it didn't
actually work. I have since improved my teardown process and this has
greatly improved the situation: where I was previously getting nearly
two dozen of these per test run, I now see only one. I haven't dug into
that one yet, but I'd be willing to bet that it is also my problem.

>> and particularly it has daemon threads waiting on channel receive calls.

> Hum, that might be related -

It was.

One thing I did notice is that under the ipython shell, having a daemon
thread waiting on a channel receive is enough to keep both the thread
and execnet alive and subsequently prevent the VM from exiting.
Strangely, this isn't the case under the normal python shell, but I
still think there might be something worth investigating there.

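import threading

import py

# Open a gateway and start a remote loop that never sends anything back.
g = py.execnet.SshGateway("localhost")
c = g.remote_exec("while True:\n pass")

def bob():
    # Blocks forever: the remote side never sends.
    c.receive()

# The only application thread is a daemon, so exit() should not wait for it.
t = threading.Thread(target=bob)
t.daemon = True
t.start()
exit()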

Obviously this is a contrived test case, but the only "application"
thread in the system is a daemon, so I would naively expect it to shut
down.
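
For comparison, the expected behaviour can be checked without execnet at
all. The following is only a sketch under assumptions: a plain
queue.Queue stands in for the channel, and the helper name
daemon_blocked_child_exits is made up for illustration. It starts a child
interpreter whose only extra thread is a daemon blocked on a receive-like
call and reports whether that interpreter exits on its own.

```python
import subprocess
import sys

# Child script: the only extra thread is a daemon blocked forever on a queue.
# queue.Queue is an assumed stand-in for an execnet channel.
CHILD = """
import queue, threading
q = queue.Queue()
t = threading.Thread(target=q.get)  # blocks forever, nothing is ever put
t.daemon = True
t.start()
"""


def daemon_blocked_child_exits(timeout=30):
    """Return True if the child interpreter exits cleanly within `timeout` seconds."""
    try:
        proc = subprocess.run([sys.executable, "-c", CHILD], timeout=timeout)
    except subprocess.TimeoutExpired:
        # The blocked daemon thread kept the interpreter alive.
        return False
    return proc.returncode == 0
```

If the same check hangs when the blocking call is a channel receive, that
would point at something other than the daemon thread itself keeping the
process alive.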

> GC-finalization issues in MT-environments are not easy to debug :(

From recent experience I can confirm that this is very true.

> Might also be python-version dependent.  What are
> the differences between your test and production system?

I will check this in the morning.

> I am a bit at a loss on how to best proceed at the moment.
> I think that adding more debugging information
> to execnet and systematically implementing and
> checking scenarios is due - but quite a bit of effort.
> Could you maybe try coming up with a self-contained
> example test-script leading to these messages?

For now I'm happy to consider this my problem and keep picking at it
on my end, although since the VM does come down on both ends of the
ssh link (albeit not always cleanly) this isn't a high priority for
me, just a nuisance. If I do find anything that is definitely execnet
misbehaving I will post it here.

> On a sidenote, probably around 60% of the execnet core
> programming effort revolves around teardown/finalization issues
> - I wonder if it would be better to avoid usage of __del__
> altogether, only have a process-wide atexit handler and
> otherwise recommend explicit calling of
> gateway.exit()/channel.close methods for proper resource
> handling.

I don't mind either way as long as there is some relatively
straightforward way of getting the whole thing shut down, so that a
longer-running system that opens and closes multiple execnet connections
can be written in a way that doesn't accumulate cruft over time.
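
To make the "explicit close plus one process-wide atexit handler" idea
concrete, here is a minimal sketch. It is not execnet code:
TeardownRegistry and its methods are hypothetical names, and the closer
callables are assumed to be things like the gateway.exit or channel.close
methods mentioned above. Resources are closed explicitly when the
application is done with them, and anything still open is closed once at
interpreter exit, so nothing relies on __del__.

```python
import atexit


class TeardownRegistry:
    """Hypothetical helper: tracks open resources and closes them explicitly."""

    def __init__(self):
        self._open = []  # list of (resource, closer) pairs, oldest first

    def register(self, resource, closer):
        """Remember `resource` together with the callable that shuts it down."""
        self._open.append((resource, closer))
        return resource

    def close(self, resource):
        """Close one resource now and forget it. Returns False if unknown."""
        for index, (candidate, closer) in enumerate(self._open):
            if candidate is resource:
                del self._open[index]
                closer()
                return True
        return False

    def close_all(self):
        """Close everything still open, most recently registered first."""
        while self._open:
            _, closer = self._open.pop()
            closer()

    def __len__(self):
        return len(self._open)


# One process-wide handler instead of per-object __del__ methods.
registry = TeardownRegistry()
atexit.register(registry.close_all)
```

A long-running system would call registry.close(...) each time it is done
with a connection, so the list of open resources does not grow over time,
and the atexit handler only catches what was left over.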

Regards Gordon


