Solving the import-deadlock case

Hello everyone,

I'd like to bring your attention to this issue, since it touches the fundamentals of Python's import workflow: http://bugs.python.org/issue17716

(I've tried to post it on the python-import mailing list for weeks, but it must still be stuck somewhere in a moderation queue, so here I come ^^)

TL;DR version: because of the way import currently works, if importing a package "temporarily" fails while importing one of its children succeeded, we reach an unusable state: all subsequent attempts at importing that package will fail if a "from ... import" is used somewhere. Typically, it leaves a web worker broken, even though the normal behaviour of such a process would be to retry loading the failing view again and again.

I agree that module loading should be as "side-effect free" as possible, and thus shouldn't hit temporary errors. But in practice, module loading is typically when process-wide initialization is done (modifying sys.path and os.environ, instantiating connection or thread pools, registering atexit handlers, starting maintenance threads...), so this case is bound to happen at one point or another, especially if the filesystem or the network (SQL...) is accessed at module loading time, due to the lack of an initialization system at higher levels.

That's why I propose modifying the behaviour of module import so that submodules are deleted as well when a parent module import fails. True, it means they will be reloaded as well when importing the parent starts again, but we already have a "double execution" problem with the reloading of the parent module anyway, so it shouldn't make a big difference.

The only other solution I see would be to SYSTEMATICALLY perform name (re)binding when processing a "from ... import" statement, to recover from the previously failed initialization. Dunno if it's a good idea.
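A minimal sketch of the state described above, assuming a throwaway package written to a temporary directory; the package name `flaky_pkg_demo` and the helpers `make_flaky_package` and `reproduce_orphaned_child` are hypothetical. The package's `__init__` imports a child and then raises, and afterwards only the cache is inspected: the parent is gone from `sys.modules` but the child is still there. This matches the import system documentation, which says a failing loader must remove only the failing module(s). What a retried "from ... import" does next may depend on the Python version, so the sketch deliberately stops before that step.

```python
from __future__ import annotations

import importlib
import sys
import tempfile
from pathlib import Path


def make_flaky_package(root: Path, name: str) -> None:
    """Write a hypothetical package whose __init__ imports a child, then fails."""
    pkg = root / name
    pkg.mkdir()
    (pkg / "child.py").write_text("VALUE = 42\n")
    (pkg / "__init__.py").write_text(
        "from . import child\n"  # the child import succeeds...
        "raise RuntimeError('simulated transient failure')\n"  # ...then the parent fails
    )


def reproduce_orphaned_child(name: str = "flaky_pkg_demo") -> tuple[bool, bool]:
    """Return (parent cached?, child cached?) after one failed import of `name`."""
    root = Path(tempfile.mkdtemp())
    make_flaky_package(root, name)
    sys.path.insert(0, str(root))
    importlib.invalidate_caches()
    try:
        try:
            importlib.import_module(name)
        except RuntimeError:
            pass  # the "temporary" failure
        return name in sys.modules, f"{name}.child" in sys.modules
    finally:
        sys.path.remove(str(root))


if __name__ == "__main__":
    print(reproduce_orphaned_child())  # (False, True): the child outlives its parent
```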
On a separate but related topic: to be safer on module re-imports or reloads, it could be interesting to add some "idempotency" to common initialization tasks. For example, for the atexit registration system, wouldn't it be worth adding a boolean flag to explicitly skip registration if a callable with the same fully qualified name is already registered?

Do you have opinions on these subjects?

Thanks, regards,
Pascal
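The standard `atexit` module has no such flag (it provides `atexit.register()` and `atexit.unregister()`), so here is only a user-level sketch of the idea, with a hypothetical `register_once()` helper and a hypothetical `close_pools()` handler. It keys handlers on `__module__` plus `__qualname__`, which is roughly what "same fully qualified name" would mean. One assumption matters: the `_registered` set has to live in a module that is not itself re-executed, otherwise a reload would reset it along with everything else.

```python
from __future__ import annotations

import atexit
from typing import Any, Callable

# Fully qualified names of handlers registered through register_once().
# Assumption: this lives in a helper module that is never reloaded or re-imported.
_registered: set[str] = set()


def register_once(func: Callable[..., Any], *args: Any, **kwargs: Any) -> bool:
    """Hypothetical helper: register `func` with atexit unless a callable with
    the same fully qualified name was already registered. True if registered."""
    key = f"{func.__module__}.{func.__qualname__}"
    if key in _registered:
        return False
    atexit.register(func, *args, **kwargs)
    _registered.add(key)
    return True


def close_pools() -> None:
    """Hypothetical cleanup handler."""
    print("closing pools")


print(register_once(close_pools))  # True
print(register_once(close_pools))  # False: same qualified name, skipped
```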

On Tue, 02 Jul 2013 20:31:48 +0200, Pascal Chambon <pythoniks@gmail.com> wrote:
There may well be a bug that could/should be fixed here, but... it seems to me that, other than the sys.path modifications, doing any of that at module import time has a strong code smell. --David

On 3 Jul 2013 05:44, "R. David Murray" <rdmurray@bitdance.com> wrote:
On Tue, 02 Jul 2013 20:31:48 +0200, Pascal Chambon <pythoniks@gmail.com> wrote:
Unfortunately it's one of those "Your code is dubious, but so many people do it anyway we should handle it better than we do" cases.

We could also be a lot more emphatic about "import side effects are what marks the boundary between a library and a framework. To stay on the library side of that fence provide a 'start' or 'configure' function instead of doing things implicitly on import". Heck, even *defining* library, framework and application would be a good thing.

OTOH, it's hard to find motivation to work on improving the handling of things you think people shouldn't be doing in the first place (that's one of the reasons circular import handling has never been made more consistent). (That's not to dismiss the work Pascal's already done - just pointing out why it may sometimes feel like it's difficult to get interest and feedback on things like this).

Cheers,
Nick.
http://mail.python.org/mailman/options/python-dev/ncoghlan%40gmail.com
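A minimal sketch of the "library side of the fence" Nick describes, assuming a module that would otherwise create a pool at import time. Importing it only defines names; the work happens in an explicit, idempotent `configure()`. The names `configure` and `shutdown`, and the use of `concurrent.futures.ThreadPoolExecutor` as a stand-in for a connection or thread pool, are illustrative choices, not an existing API.

```python
from __future__ import annotations

import threading
from concurrent.futures import ThreadPoolExecutor

# Importing this module only defines names: no pools, threads or I/O here.
_lock = threading.Lock()
_pool: ThreadPoolExecutor | None = None


def configure(max_workers: int = 4) -> ThreadPoolExecutor:
    """Hypothetical entry point: create the process-wide pool on first call;
    later calls reuse it. If creation raises, nothing is cached, so retry works."""
    global _pool
    with _lock:
        if _pool is None:
            _pool = ThreadPoolExecutor(max_workers=max_workers)
        return _pool


def shutdown() -> None:
    """Hypothetical counterpart to configure(), e.g. for tests or a clean exit."""
    global _pool
    with _lock:
        if _pool is not None:
            _pool.shutdown(wait=True)
            _pool = None


if __name__ == "__main__":
    pool = configure()
    assert configure() is pool  # idempotent: the second call is a no-op
    print(pool.submit(sum, [1, 2, 3]).result())  # 6
    shutdown()
```

Because nothing is cached until setup succeeds, a failure in `configure()` can simply be retried, without leaving a half-initialized package behind in `sys.modules`.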

On 3 Jul 2013 04:34, "Pascal Chambon" <pythoniks@gmail.com> wrote:
Back on topic... As I stated on the issue, I think purging the whole subtree when a package implicitly imports child modules is the least bad of the available options, and better than leaving the child modules in place in violation of the "all parent packages can be assumed to be in sys.modules" invariant (which is what we do now). Cheers, Nick.
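A user-level sketch of the subtree purge Nick favours, assuming it is applied around an import call rather than inside importlib itself; `purge_subtree`, `import_or_purge` and the throwaway package built by `make_retryable_package` are hypothetical names. On failure it removes the package and every `package.*` entry from `sys.modules`, so a retry re-executes the children against a fresh parent instead of reusing orphans, which restores the "all parent packages are in sys.modules" invariant.

```python
from __future__ import annotations

import importlib
import os
import sys
import tempfile
from pathlib import Path
from types import ModuleType


def purge_subtree(name: str) -> list[str]:
    """Remove `name` and every `name.*` submodule from sys.modules."""
    prefix = name + "."
    doomed = [key for key in list(sys.modules) if key == name or key.startswith(prefix)]
    for key in doomed:
        del sys.modules[key]
    return doomed


def import_or_purge(name: str) -> ModuleType:
    """Hypothetical helper: import `name`; if that fails, purge its whole
    subtree before re-raising, so a retry starts from a clean slate."""
    try:
        return importlib.import_module(name)
    except BaseException:
        purge_subtree(name)
        raise


def make_retryable_package(root: Path, name: str) -> None:
    """Hypothetical test package: __init__ imports a child, then fails while
    the FLAKY_PKG_FAIL environment variable is "1"."""
    pkg = root / name
    pkg.mkdir()
    (pkg / "child.py").write_text("VALUE = 42\n")
    (pkg / "__init__.py").write_text(
        "import os\n"
        "from . import child\n"
        "if os.environ.get('FLAKY_PKG_FAIL') == '1':\n"
        "    raise RuntimeError('simulated transient failure')\n"
    )


if __name__ == "__main__":
    root = Path(tempfile.mkdtemp())
    make_retryable_package(root, "purge_demo_pkg")
    sys.path.insert(0, str(root))
    importlib.invalidate_caches()

    os.environ["FLAKY_PKG_FAIL"] = "1"
    try:
        import_or_purge("purge_demo_pkg")
    except RuntimeError:
        pass
    print("purge_demo_pkg.child" in sys.modules)  # False: no orphaned child left

    os.environ["FLAKY_PKG_FAIL"] = "0"
    print(import_or_purge("purge_demo_pkg").child.VALUE)  # 42
```

The trade-off Pascal already pointed out still applies: the children are executed again on every retry, so their own import-time side effects must tolerate running twice.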

Thanks for the comments. In my particular case we're actually working on a provisioning /framework/, so we chose the easy (lazy?) way, i.e. initializing miscellaneous modules at load time (like Django and others do, I think), rather than building a proper initialization dispatcher to be called from e.g. a WSGI launcher. It works pretty well actually, except for that nasty (but fortunately very rare) import deadlock. ^^

Since module loading errors *might* occur for tons of reasons (e.g. searching the disk for .py files already IS a side effect...), and since the current behaviour (letting child modules survive disconnected from their parent) is more harmful than useful, I guess the cleanup Nick mentioned would be the path to follow, wouldn't it?

Thanks, regards,
Pascal

On 02/07/2013 23:32, Nick Coghlan wrote:
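For comparison, a minimal sketch of the kind of initialization dispatcher Pascal mentions above, meant to be called once from a launcher (e.g. a WSGI entry point) instead of doing the work at import time. `InitDispatcher`, `hook`, `run` and `pending` are hypothetical names, as are the example hooks. Hooks run in registration order, successful hooks are remembered, and a hook that raises stays pending, so calling `run()` again retries only what is left and `sys.modules` is never involved.

```python
from __future__ import annotations

from typing import Callable


class InitDispatcher:
    """Hypothetical dispatcher: run registered init hooks once each, in order."""

    def __init__(self) -> None:
        self._hooks: list[tuple[str, Callable[[], None]]] = []
        self._done: set[str] = set()

    def hook(self, func: Callable[[], None]) -> Callable[[], None]:
        """Decorator: register `func` under its fully qualified name."""
        self._hooks.append((f"{func.__module__}.{func.__qualname__}", func))
        return func

    def run(self) -> None:
        """Run pending hooks; stop at the first failure so a later call retries it."""
        for name, func in self._hooks:
            if name in self._done:
                continue
            func()  # an exception propagates and leaves `name` pending
            self._done.add(name)

    @property
    def pending(self) -> list[str]:
        return [name for name, _ in self._hooks if name not in self._done]


init = InitDispatcher()
attempts = {"db": 0}


@init.hook
def setup_paths() -> None:
    print("sys.path / os.environ tweaks")


@init.hook
def connect_db() -> None:
    attempts["db"] += 1
    if attempts["db"] == 1:
        raise ConnectionError("database not reachable yet")  # simulated transient failure
    print("database pool ready")


if __name__ == "__main__":
    try:
        init.run()
    except ConnectionError as exc:
        print("init failed:", exc, "| pending:", init.pending)
    init.run()  # retry: setup_paths is skipped, connect_db runs again
    print("pending:", init.pending)
```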

participants (4)
- Nick Coghlan
- Pascal Chambon
- Pascal Chambon
- R. David Murray