RE: [Python-Dev] Fix import errors to have data

Tim Peters wrote:
Good point. This is the same as not wanting modules in an insane state in sys.modules.
I've just gone back over Jim's original email, and it is a different case being talked about there: specifically, determining whether an ImportError was raised by the module that was being imported, rather than by something it imports. However, I don't see the utility of that, and I can't think of a use case for it.
OTOH, ensuring that if *any* exception is raised while importing a module, the module will not appear in sys.modules (and the exception is raised again each time you try to import it) is IMO a very useful property, and I think it would solve what I take to be the root of Jim's issue.
Tim Delaney
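A minimal sketch of the property described above, assuming a Python 3 interpreter (where this behaviour has long been standard). The module name `gr_demo_broken` and the helper `try_broken_import` are made up for illustration; the sketch writes a module whose body raises, tries to import it twice, and checks what is left in sys.modules.

```python
import importlib
import pathlib
import sys
import tempfile

# Hypothetical module name, chosen only for this sketch.
MODULE_NAME = "gr_demo_broken"


def try_broken_import(attempts=2):
    """Import a module whose body raises; count the failures and report
    whether its name is left behind in sys.modules."""
    failures = 0
    with tempfile.TemporaryDirectory() as tmp:
        source = "x = 1\nraise RuntimeError('boom')\n"
        pathlib.Path(tmp, MODULE_NAME + ".py").write_text(source)
        sys.path.insert(0, tmp)
        importlib.invalidate_caches()
        try:
            for _ in range(attempts):
                try:
                    importlib.import_module(MODULE_NAME)
                except RuntimeError:
                    failures += 1   # the body is re-run and fails again
        finally:
            sys.path.remove(tmp)
    return failures, MODULE_NAME in sys.modules


failures, still_registered = try_broken_import()
print(failures, still_registered)
```

Every attempt re-executes the module body, so the caller sees the original exception each time rather than a half-built module.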

Delaney, Timothy C (Timothy) wrote:
The simplest solution to this problem would be to put the module in sys.modules only after the module's code has finished executing. Unfortunately this breaks circular imports, because it would lead to infinite recursion. It looks like we can't have both: a module that becomes available only after its import succeeds, and circular imports in which one module uses another that is only half initialized.
Bye, Walter Dörwald
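The trade-off can be sketched with a toy importer. Everything here is hypothetical: `toy_import` stands in for the import statement, `registry` plays the role of sys.modules, and the `max_depth` guard stands in for the unbounded recursion a real importer would run into.

```python
import types

# Hypothetical two-module cycle: a imports b, b imports a.
SOURCES = {
    "a": "toy_import('b')\nvalue = 'a ready'\n",
    "b": "toy_import('a')\nvalue = 'b ready'\n",
}


class ImportLoop(Exception):
    """Raised when the toy importer keeps re-entering the same cycle."""


def make_importer(register_early, max_depth=20):
    registry = {}            # plays the role of sys.modules
    state = {"depth": 0}

    def toy_import(name):
        if name in registry:
            return registry[name]
        state["depth"] += 1
        try:
            if state["depth"] > max_depth:
                # Stand-in for the infinite recursion of a real importer.
                raise ImportLoop(name)
            module = types.ModuleType(name)
            module.toy_import = toy_import
            if register_early:
                registry[name] = module   # visible while half initialized
            exec(SOURCES[name], module.__dict__)
            registry[name] = module       # "register late" happens here
            return module
        finally:
            state["depth"] -= 1

    return toy_import, registry


early_import, early_registry = make_importer(register_early=True)
early_import("a")
print(sorted(early_registry))

late_import, late_registry = make_importer(register_early=False)
try:
    late_import("a")
except ImportLoop as exc:
    print("cycle never terminates; gave up while importing", exc)
```

Registering early lets the cycle complete at the cost of exposing a half-initialized module; registering late keeps the registry clean but the cycle never bottoms out.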

It's going to be difficult to ensure that nothing ever gets a reference to a broken module, because of circular imports. Suppose A imports B, which imports A. If A's initialisation subsequently fails, then even if A is removed from sys.modules, B still contains a reference to the broken A.
Greg Ewing, Computer Science Dept, University of Canterbury, Christchurch, New Zealand
greg@cosc.canterbury.ac.nz
"A citizen of NewZealandCorp, a wholly-owned subsidiary of USA Inc."
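A small sketch of that scenario, assuming Python 3 and two throwaway modules whose names (`gr_demo_a`, `gr_demo_b`) are invented for the example. A defines `early`, imports B, then fails before defining `late`.

```python
import importlib
import pathlib
import sys
import tempfile

# Hypothetical module names and sources, made up for this sketch.
SOURCES = {
    "gr_demo_a": (
        "early = 1\n"
        "import gr_demo_b\n"
        "raise RuntimeError('A fails after importing B')\n"
        "late = 2\n"
    ),
    "gr_demo_b": "import gr_demo_a\n",
}


def run_cycle():
    with tempfile.TemporaryDirectory() as tmp:
        for name, source in SOURCES.items():
            pathlib.Path(tmp, name + ".py").write_text(source)
        sys.path.insert(0, tmp)
        importlib.invalidate_caches()
        try:
            try:
                importlib.import_module("gr_demo_a")
            except RuntimeError:
                pass
        finally:
            sys.path.remove(tmp)
    a_registered = "gr_demo_a" in sys.modules
    b = sys.modules.pop("gr_demo_b", None)   # also tidies up after the demo
    broken_a = getattr(b, "gr_demo_a", None)
    return {
        "a_registered": a_registered,
        "b_registered": b is not None,
        "b_holds_a": broken_a is not None,
        "a_has_early": hasattr(broken_a, "early"),
        "a_has_late": hasattr(broken_a, "late"),
    }


result = run_cycle()
print(result)
```

B finished without error and stays registered, yet the A it holds is the half-initialized object: it has `early` but never got `late`.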

Yeah, but I don't mind; my patch is still a big improvement in other cases and doesn't really make things any worse in the above case (B has a reference to a broken A, just like without my patch). Did anybody look at my patch? Shall I check it in? --Guido van Rossum (home page: http://www.python.org/~guido/)

Guido van Rossum wrote:
Well, circular imports first appeared to me like the reverse problem of circular references. The latter was only solvable by a garbage collector. Well, it's simpler after all. Circular imports, to be made consistent, would need something like a final "commit" after all imports succeeded. I think of a simple, generator-like "undoable assignment" to sys.modules, like Icon's "<-" operator. When a top-level import starts, the module gets into sys.modules, but it is also recorded as not finished. Further imports can happen until this is all done. Then the pending undos are cancelled; this is the "commit". If any uncaught exception occurs, the latent undo is executed for all imports started in this context, and we end up with an exception and (hopefully) almost no side effects. Maybe this could be user-extensible, as on-failure-undo actions.
It is definitely an improvement, much closer to what people would expect, anyway.
> Did anybody look at my patch? Shall I check it in?
Yes I did. +1
--
Christian Tismer :^) <mailto:tismer@stackless.com>
Mission Impossible 5oftware : Have a break! Take a ride on Python's
Johannes-Niemeyer-Weg 9a : *Starship* http://starship.python.net/
14109 Berlin : PGP key -> http://wwwkeys.pgp.net/
work +49 30 89 09 53 34 home +49 30 802 86 56 mobile +49 173 24 18 776
PGP 0x57F3BF04 9064 F4E1 D754 C2FF 1619 305B C09C 5A3B 57F3 BF04
whom do you want to sponsor today? http://www.stackless.com/
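The "undoable assignment" with a final commit might look roughly like the following. This is only a sketch of the idea in the message above, not anything CPython provides; the class `ImportJournal`, its methods, and the `gr_demo_*` names are all hypothetical.

```python
import sys
import types

_MISSING = object()   # marks "name was not in sys.modules before"


class ImportJournal:
    """Record undoable assignments to sys.modules until commit or rollback."""

    def __init__(self):
        self._undo = []        # (name, previous entry) pairs, newest last
        self._callbacks = []   # user-supplied on-failure-undo actions

    def assign(self, name, module):
        self._undo.append((name, sys.modules.get(name, _MISSING)))
        sys.modules[name] = module

    def on_failure(self, callback):
        self._callbacks.append(callback)

    def commit(self):
        # All imports succeeded: cancel the pending undos.
        self._undo.clear()
        self._callbacks.clear()

    def rollback(self):
        # An exception escaped: undo in reverse order, then run callbacks.
        while self._undo:
            name, previous = self._undo.pop()
            if previous is _MISSING:
                sys.modules.pop(name, None)
            else:
                sys.modules[name] = previous
        while self._callbacks:
            self._callbacks.pop()()


def demo():
    log = []
    journal = ImportJournal()
    journal.assign("gr_demo_x", types.ModuleType("gr_demo_x"))
    journal.on_failure(lambda: log.append("cleanup ran"))
    journal.rollback()                        # simulated failure
    rolled_back = "gr_demo_x" not in sys.modules

    journal.assign("gr_demo_y", types.ModuleType("gr_demo_y"))
    journal.commit()                          # simulated success
    journal.rollback()                        # nothing left to undo
    committed = "gr_demo_y" in sys.modules
    del sys.modules["gr_demo_y"]              # tidy up after the demo
    return rolled_back, committed, log


print(demo())
```

The journal only covers sys.modules entries and whatever the callbacks undo; other side effects of the failed module body are out of its reach.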

> Circular imports, to be made consistent, would need something like a final "commit" after all imports succeeded.
Silly idea. Typically, circular imports aren't planned, they just happen, and (usually) just happen to work. --Guido van Rossum (home page: http://www.python.org/~guido/)

Guido van Rossum wrote:
Yes, you name it, if you are talking about allowing circular imports without precautions and not expecting problems. It is like circular references, which aren't planned either; they just happen, but they truly work fine since GC was introduced.
-- Christian Tismer
want to see a silly guy's work? http://www.stackless.com/

I thought Christian meant this was something that would be done automatically, not something the user would have to do explicitly. Maybe it could be as simple as saving a snapshot of sys.modules whenever importing of a module is begun, and if execution of its body doesn't complete, restoring the snapshot?
Greg Ewing
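The snapshot idea can be sketched as a context manager. The name `restore_sys_modules_on_error` and the `gr_demo_*` entries are hypothetical, and fake module objects are placed in sys.modules directly so the example stays self-contained; a real implementation would have to wrap the import machinery itself.

```python
import contextlib
import sys
import types


@contextlib.contextmanager
def restore_sys_modules_on_error():
    """Snapshot sys.modules on entry; put it back if the block raises."""
    snapshot = dict(sys.modules)
    try:
        yield
    except BaseException:
        for name in list(sys.modules):
            if name not in snapshot:
                del sys.modules[name]    # drop everything added meanwhile
        sys.modules.update(snapshot)     # restore replaced entries
        raise


def demo():
    # A module the "user" had already imported before the failing import.
    sys.modules["gr_demo_kept"] = types.ModuleType("gr_demo_kept")
    try:
        with restore_sys_modules_on_error():
            sys.modules["gr_demo_new"] = types.ModuleType("gr_demo_new")
            raise RuntimeError("module body failed")
    except RuntimeError:
        pass
    result = ("gr_demo_kept" in sys.modules, "gr_demo_new" in sys.modules)
    del sys.modules["gr_demo_kept"]      # tidy up after the demo
    return result


print(demo())
```

Because the managers nest, each nested import would get its own snapshot, which gives the stack-like undo behaviour without any explicit bookkeeping.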

> That might be over-cautious. It's probably safe to keep modules whose body code executed without error.
No, it's not. If A and B import each other, and A fails after importing B, then B ends up with a reference to a broken A even though it executed without error.
Greg Ewing

At 04:49 PM 8/2/04 +1200, Greg Ewing wrote:
Argh. I can't believe I missed that. Now that you mention it, I think that point was already brought up just a few days ago here. Maybe even by you. :) Ah well, please pardon my sudden case of obliviousness. :)

Greg Ewing wrote:
This is exactly what I thought of. It seems to be correct, also with nested imports, since this is a stack-like structure of undoable things.
Example:
    __main__:
        import A    #1 #6
    A:
        try:
            import B    #2 #5
        except (ImportError, maybe others):
            # provide surrogate for B
        # other action that fails
    B:
        import A    #3
        import C    #4
        # do some stuff that fails
    #1: snapshot of sys.modules, A added    [__main__, A]
    #2: snapshot of sys.modules, B added    [__main__, A, B]
    #3: snapshot of sys.modules, A exists   [__main__, A, B]
    #4: snapshot of sys.modules, C added    [__main__, A, B, C]
    after failing imports:
    #5: restore    [__main__, A]
    #6: restore    [__main__]
So one side effect is that completely valid imports made in the context of a failing import are undone. I guess this is intended.
Side effects are a problem, of course. It would be nice to have a cleanup construct for completely or partially imported modules, to undo any side effects. How do we remove extension modules which have been loaded as a side effect?
ciao - chris
-- Christian Tismer

What happens in the presence of threads, though?
Greg Ewing

Greg Ewing wrote:
To my memory, imports in the presence of threads have *always* been a nightmare. My advice always goes this way: if you must have threads, make sure to import everything through the main thread before starting new ones. If you then still have problems, well, I have a different solution, which is also more efficient... but this is another story. Off topic.
-- Christian Tismer
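The advice above amounts to something like this sketch. The module list and the helper names (`preload`, `worker`, `main`) are arbitrary choices for illustration; the point is only that every import happens in the main thread before any worker starts.

```python
import importlib
import threading

# Arbitrary standard-library modules, standing in for whatever the
# worker threads will need.
NEEDED = ["json", "decimal"]


def preload(names):
    """Import everything up front, in the calling (main) thread."""
    return {name: importlib.import_module(name) for name in names}


def worker(modules, results, index):
    # The worker only uses modules that are already fully imported.
    results[index] = modules["json"].dumps({"worker": index})


def main():
    modules = preload(NEEDED)
    results = {}
    threads = [
        threading.Thread(target=worker, args=(modules, results, i))
        for i in range(3)
    ]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return results


print(main())
```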

OK, I misunderstood then. Apologies to Christian.
That sounds like overkill -- any modules that don't participate in the cycle but happen to be imported by one of the modules in the cycle would also be deleted. I still expect that the solution that Tim just checked in (delete any module that failed to complete execution) will solve most of the problems encountered in real life. I'm aware of the case you describe here:
But I don't lose sleep over it; this would only be a problem if the main code attempts to import A but suppresses the import error *and* also uses B directly. This doesn't seem to be a likely scenario (modules that can be missed rarely participate in cycles). --Guido van Rossum (home page: http://www.python.org/~guido/)
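The difference between the two policies can be sketched as follows, assuming Python 3 and two throwaway modules with invented names: `gr_demo_top` imports a perfectly healthy `gr_demo_helper` and then fails. With `restore_snapshot=False` only the interpreter's own cleanup applies; with `restore_snapshot=True` the sketch additionally removes everything added to sys.modules since the import began.

```python
import importlib
import pathlib
import sys
import tempfile

# Hypothetical module names and sources, made up for this sketch.
SOURCES = {
    "gr_demo_top": "import gr_demo_helper\nraise RuntimeError('top fails')\n",
    "gr_demo_helper": "ok = True\n",
}


def import_with_policy(restore_snapshot):
    """Return (top_registered, helper_registered) after the failed import."""
    with tempfile.TemporaryDirectory() as tmp:
        for name, source in SOURCES.items():
            pathlib.Path(tmp, name + ".py").write_text(source)
        sys.path.insert(0, tmp)
        importlib.invalidate_caches()
        snapshot = dict(sys.modules)
        try:
            try:
                importlib.import_module("gr_demo_top")
            except RuntimeError:
                if restore_snapshot:
                    # Snapshot policy: drop every module added meanwhile.
                    for name in list(sys.modules):
                        if name not in snapshot:
                            del sys.modules[name]
        finally:
            sys.path.remove(tmp)
    state = ("gr_demo_top" in sys.modules, "gr_demo_helper" in sys.modules)
    sys.modules.pop("gr_demo_helper", None)   # tidy up after the demo
    return state


failed_only = import_with_policy(restore_snapshot=False)
with_snapshot = import_with_policy(restore_snapshot=True)
print(failed_only, with_snapshot)
```

Under the first policy the helper survives even though nothing the user asked for succeeded; under the second it is removed along with the failing module, which is the "overkill" being debated.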

Guido van Rossum wrote:
I love you for that one.
Yes, but that is ok, methinks. Any module that gets imported just on behalf of another module that *fails* to get imported was imported on behalf of that module, *not* by the user. And it is the user, finally, who has the say. He has to accept the extra imports that come with successful imports, but by no means is he responsible for the side effects of imported code that he didn't write. If the user had imported that module before, he will still have it. If he had not, he also will not expect to have it now. I think it is just fine to remove what he didn't expect. The user says "I import that module now, and all that it needs". Nothing more, and nothing less. I know what I'm talking about, having paid for foreign kids for half of my life :-)
Yes, but I think you are almost within an atom's distance of the final solution. Grab it!
ciao -- chris
-- Christian Tismer

[Christian Tismer] ...
> Yes, but I think you are almost within an atom's distance of the final solution. Grab it!
If you mean this:
then no, that's a long way off in CPython. There's no choke point for when importing begins, or for when importing ends, and even __builtin__.__import__ is routinely replaced.
Guido latched on to the only choke point there is: sooner or later, every import gimmick worthy of the name has to execute "the module's" code, whether it be direct import from Python, directly via C API calls, implicit package __init__.py imports, imports arranged via magical importers (like the .zip importer), etc. So that's what the patch targeted: there's one routine that executes a module's initialization code, all imports go thru it eventually, and that's the routine that now removes the module's name from sys.modules if the initialization code barfs.
"The rest" of import logic is sprayed all over creation. To hook "begin import" and "end import" first requires that all importers be changed to have a formal notion of those events. The easiest start would be to use bytecodehacks to inject "call begin_import" and "call end_import" opcode sequences around the code generated for import statements <0.8 wink>.
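A Python-level analogy of that choke point, for illustration only: the real routine lives in CPython's C code, and the function name `exec_module_body` and the `gr_demo_*` module names here are invented. The sketch registers the module, runs its body, and unregisters it if the body raises.

```python
import sys
import types


def exec_module_body(name, source):
    """Register a module, run its body, and unregister it if the body raises."""
    module = types.ModuleType(name)
    sys.modules[name] = module           # visible early, so cycles can find it
    try:
        code = compile(source, "<%s>" % name, "exec")
        exec(code, module.__dict__)
    except BaseException:
        sys.modules.pop(name, None)      # the cleanup step described above
        raise
    return module


good = exec_module_body("gr_demo_good", "x = 1\n")
try:
    exec_module_body("gr_demo_bad", "y = 1 / 0\n")
except ZeroDivisionError:
    pass
print("gr_demo_good" in sys.modules, "gr_demo_bad" in sys.modules)
del sys.modules["gr_demo_good"]          # tidy up after the demo
```

Only the failing module's own name is removed; anything it imported along the way is left alone, which matches the scope of the patch as described in this message.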

At 11:14 PM 8/2/04 -0400, Tim Peters wrote:
It's worse than that... 'ihooks' and 'imputil', for example, both do:
    exec code in module.__dict__
so they'd need to be changed to support this fix as well. (Not to mention any third-party import hooks...)
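To see why such hooks matter, here is a sketch of a hand-rolled loader that executes the module body itself. It is not the actual ihooks or imputil code; `naive_hook_load` and the module name are hypothetical, and `exec(code, module.__dict__)` is the Python 3 spelling of the statement quoted above.

```python
import sys
import types


def naive_hook_load(name, source):
    """Hand-rolled loader: registers the module, then runs its body directly."""
    module = types.ModuleType(name)
    sys.modules[name] = module
    code = compile(source, "<%s>" % name, "exec")
    exec(code, module.__dict__)   # Python 3 form of: exec code in module.__dict__
    return module


SOURCE = "first = 1\nraise RuntimeError('body failed')\nsecond = 2\n"

try:
    naive_hook_load("gr_demo_hooked", SOURCE)
except RuntimeError:
    pass

leftover = sys.modules.pop("gr_demo_hooked", None)   # also tidies up
print(leftover is not None, hasattr(leftover, "first"), hasattr(leftover, "second"))
```

Because the body never passes through the interpreter's own execution routine, nothing removes the entry, and the half-initialized module stays in sys.modules until the hook itself is changed.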

[Phillip J. Eby]
Ouch. For that matter, they also avoid the "choke point" Sunday night's patch relied on. So there are actually no choke points for any part of the import process now. Maybe after each bytecode we could make the eval loop look to see whether sys.modules had been changed <wink>.

Tim Peters wrote:
:-( Life is so cruel... I agree it is a little harder, ahem :-)
-- Christian Tismer

Tim Peters <tim.peters@gmail.com>:
Couldn't that routine also save sys.modules before starting to execute the module's code?
Greg Ewing

[Tim Peters]
> there's one routine that executes a module's initialization code, all imports go thru it eventually,
Which isn't true. I'm hacking imputil and ihooks now.
> and that's the routine that now removes the module's name from sys.modules if the initialization code barfs.
[Greg Ewing]
> Couldn't that routine also save sys.modules before starting to execute the module's code?
With enough bother, sure. Go for it <wink -- but explaining subtleties is time-consuming and futile; better to find them yourself>.

participants (7)
- Christian Tismer
- Delaney, Timothy C (Timothy)
- Greg Ewing
- Guido van Rossum
- Phillip J. Eby
- Tim Peters
- Walter Dörwald