Peculiar import code in pickle.py

When pickle.py needs to import a module by name, it goes through a peculiar dance of
    __import__(module, level=0)
    mod = sys.modules[module]
As far as I can tell, unless builtins.__import__ is overridden or sys.modules clobbered by user code, the above should be equivalent to
    mod = __import__(module, level=0)
Note that the optimized _pickle implementation does not do the sys.modules lookup and simply uses the module returned by __import__(..). This code goes back to 1999, so there was probably a good reason back then to write it this way. Presently, however, it seems to be just another obscure difference between the C and Python implementations of pickle.

On Tue, 13 Jul 2010 11:25:23 -0400 Alexander Belopolsky <alexander.belopolsky@gmail.com> wrote:
When pickle.py needs to import a module by name, it goes through a peculiar dance of
__import__(module, level=0) mod = sys.modules[module]
As far as I can tell, unless builtins.__import__ is overridden or sys.modules clobbered by user code, the above should be equivalent to
mod = __import__(module, level=0)
Only for top-level modules:
__import__("distutils.core", level=0)
<module 'distutils' from '/home/antoine/py3k/__svn__/Lib/distutils/__init__.py'>
sys.modules["distutils.core"]
<module 'distutils.core' from '/home/antoine/py3k/__svn__/Lib/distutils/core.py'>
Regards Antoine.
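The difference Antoine points out can be reproduced with any dotted module name. The following is a minimal sketch; the choice of "json.decoder" is an assumption made only because it ships with current Python versions (the thread itself uses distutils.core).

```python
import sys

# Arbitrary dotted stdlib module name, chosen for illustration only.
name = "json.decoder"

# With an empty fromlist, __import__ returns the top-level package ...
top = __import__(name, level=0)

# ... while sys.modules holds the submodule that was actually requested.
sub = sys.modules[name]

print(top.__name__)  # json
print(sub.__name__)  # json.decoder
```

For a top-level module such as "json" the two lookups return the same object, which is why the shortcut only works there.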

On Tue, Jul 13, 2010 at 11:34 AM, Antoine Pitrou <solipsis@pitrou.net> wrote:
On Tue, 13 Jul 2010 11:25:23 -0400 ..
Only for top-level modules:
__import__("distutils.core", level=0)
<module 'distutils' from '/home/antoine/py3k/__svn__/Lib/distutils/__init__.py'>
sys.modules["distutils.core"]
<module 'distutils.core' from '/home/antoine/py3k/__svn__/Lib/distutils/core.py'>
That's right, but I believe the recommended way to achieve that behavior is to supply a dummy fromlist:
__import__("distutils.core", fromlist=["dummy"], level=0)
<module 'distutils.core' from '/Users/sasha/Work/python-svn/py3k/Lib/distutils/core.py'>
That's what C implementation does AFAICT.

On 13/07/2010 16:46, Alexander Belopolsky wrote:
On Tue, Jul 13, 2010 at 11:34 AM, Antoine Pitrou <solipsis@pitrou.net> wrote:
On Tue, 13 Jul 2010 11:25:23 -0400
..
Only for top-level modules:
__import__("distutils.core", level=0)
<module 'distutils' from '/home/antoine/py3k/__svn__/Lib/distutils/__init__.py'>
sys.modules["distutils.core"]
<module 'distutils.core' from '/home/antoine/py3k/__svn__/Lib/distutils/core.py'>
That's right, but I believe the recommended way to achieve that behavior is to supply a dummy fromlist:
__import__("distutils.core", fromlist=["dummy"], level=0)
<module 'distutils.core' from '/Users/sasha/Work/python-svn/py3k/Lib/distutils/core.py'>
That's what C implementation does AFAICT.
I find the "little dance" much more readable. All the best, Michael

2010/7/13 Alexander Belopolsky <alexander.belopolsky@gmail.com>:
On Tue, Jul 13, 2010 at 11:34 AM, Antoine Pitrou <solipsis@pitrou.net> wrote:
On Tue, 13 Jul 2010 11:25:23 -0400 ..
Only for top-level modules:
__import__("distutils.core", level=0)
<module 'distutils' from '/home/antoine/py3k/__svn__/Lib/distutils/__init__.py'>
sys.modules["distutils.core"]
<module 'distutils.core' from '/home/antoine/py3k/__svn__/Lib/distutils/core.py'>
That's right, but I believe the recommended way to achieve that behavior is to supply a dummy fromlist:
__import__("distutils.core", fromlist=["dummy"], level=0)
<module 'distutils.core' from '/Users/sasha/Work/python-svn/py3k/Lib/distutils/core.py'>
No! That's not recommended and a complete hack. The "dance" or importlib.import_module is preferred. -- Regards, Benjamin
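The three spellings discussed so far can be compared side by side. This is a small sketch, again assuming "json.decoder" as a stand-in module name; in the ordinary case all three return the same module object, so the disagreement in the thread is about which spelling is robust, not about the common result.

```python
import importlib
import sys

name = "json.decoder"  # illustrative module name, not taken from the thread

# Preferred: importlib.import_module returns the named submodule directly.
via_importlib = importlib.import_module(name)

# The "dance" used by pickle.py: import, then look the module up by name.
__import__(name, level=0)
via_dance = sys.modules[name]

# The dummy-fromlist spelling that the thread calls a hack.
via_fromlist = __import__(name, fromlist=["dummy"], level=0)

print(via_importlib is via_dance)  # True
print(via_fromlist is via_dance)   # True
```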

On Tue, Jul 13, 2010 at 1:57 PM, Benjamin Peterson <benjamin@python.org> wrote:
2010/7/13 Alexander Belopolsky <alexander.belopolsky@gmail.com>:
On Tue, Jul 13, 2010 at 11:34 AM, Antoine Pitrou <solipsis@pitrou.net> wrote:
On Tue, 13 Jul 2010 11:25:23 -0400 ..
Only for top-level modules:
__import__("distutils.core", level=0)
<module 'distutils' from '/home/antoine/py3k/__svn__/Lib/distutils/__init__.py'>
sys.modules["distutils.core"]
<module 'distutils.core' from '/home/antoine/py3k/__svn__/Lib/distutils/core.py'>
That's right, but I believe the recommended way to achieve that behavior is to supply a dummy fromlist:
__import__("distutils.core", fromlist=["dummy"], level=0)
<module 'distutils.core' from '/Users/sasha/Work/python-svn/py3k/Lib/distutils/core.py'>
No! That's not recommended and a complete hack. The "dance" or importlib.import_module is preferred.
A complete hack with a long pedigree:
    module = __import__(modname, None, None, 'python2.4 is silly, revisit this line in 2.5')
I think that line in a code base of mine didn't get altered until 2.6.something. Hack-ily, -Jack

On Tue, Jul 13, 2010 at 1:57 PM, Benjamin Peterson <benjamin@python.org> wrote: ..
No! That's not recommended and a complete hack. The "dance" or importlib.import_module is preferred.
Nevertheless, "a complete hack" is what PyImport_Import does:
    PyObject *
    PyImport_Import(PyObject *module_name)
    {
        static PyObject *silly_list = NULL;
        ..
        /* Call the __import__ function with the proper argument list
         * Always use absolute import here. */
        r = PyObject_CallFunction(import, "OOOOi", module_name, globals,
                                  globals, silly_list, 0, NULL);
        ..
    }
and _pickle.c uses PyImport_Import() and thus is different from pickle.py, which uses the double-lookup dance. As a result, the two implementations are subtly different. They cannot both be right. It should be easy to "fix" _pickle.c to do the sys.modules lookup, but I am not sure this is right.

On Tue, Jul 13, 2010 at 11:34, Alexander Belopolsky <alexander.belopolsky@gmail.com> wrote:
On Tue, Jul 13, 2010 at 1:57 PM, Benjamin Peterson <benjamin@python.org> wrote: ..
No! That's not recommended and a complete hack. The "dance" or importlib.import_module is preferred.
Nevertheless, "a complete hack" is what PyImport_Import does:
    PyObject *
    PyImport_Import(PyObject *module_name)
    {
        static PyObject *silly_list = NULL;
        ..
        /* Call the __import__ function with the proper argument list
         * Always use absolute import here. */
        r = PyObject_CallFunction(import, "OOOOi", module_name, globals,
                                  globals, silly_list, 0, NULL);
        ..
    }
and _pickle.c uses PyImport_Import() and thus is different from pickle.py, which uses the double-lookup dance. As a result, the two implementations are subtly different. They cannot both be right. It should be easy to "fix" _pickle.c to do the sys.modules lookup, but I am not sure this is right.
Pulling from sys.modules is the correct way to do this. There are subtle issues when using a bunk fromlist argument (empty modules, double initialization, etc.). If one does not use importlib.import_module -- written *specifically* to prevent people from doing the nasty hack with the fromlist -- then you should use the sys.modules approach, period. If import.c is not doing this then it should get fixed. You can assign me the issue if you want. I say this every time I give an import talk and it has been brought up here before but obviously not everyone catches it (which is understandable as I think when it came up on python-dev it was at the tail end of a discussion), so I am just going to repeat myself: Do not put junk in fromlist if you call __import__ directly! Use importlib.import_module! Or if you have a *really* good reason to not use it, then use ``__import__(name); module = sys.modules[name]``. I have stopped fixing bugs related to this in import.c because of the annoying issues it causes and I expect the correct approach to gain traction at some point (plus get importlib bootstrapped in so I don't have to care about import.c anymore).
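Brett's fallback recommendation can be wrapped in a small helper. The sketch below is hypothetical: the names import_by_name and find_global are invented here for illustration and are not the actual pickle.py code, although find_global mirrors the kind of "module name plus attribute name" lookup an unpickler needs.

```python
import sys


def import_by_name(module_name):
    """Import a (possibly dotted) module and return that exact module.

    Hypothetical helper: uses the __import__ plus sys.modules approach
    instead of passing a junk fromlist.
    """
    __import__(module_name, level=0)
    return sys.modules[module_name]


def find_global(module_name, attr_name):
    """Resolve an attribute of a module given both names as strings."""
    return getattr(import_by_name(module_name), attr_name)


decoder_cls = find_global("json.decoder", "JSONDecoder")
print(decoder_cls.__name__)  # JSONDecoder
```

Where importlib is available, import_by_name could simply delegate to importlib.import_module, which is the first choice named in the message above.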

On Tue, Jul 13, 2010 at 4:52 PM, Brett Cannon <brett@python.org> wrote: ..
Pulling from sys.modules is the correct way to do this. There are subtle issues when using a bunk fromlist argument (empty modules, double initialization, etc.). If one does not use importlib.import_module -- written *specifically* to prevent people from doing the nasty hack with the fromlist -- then you should use the sys.modules approach, period. If import.c is not doing this then it should get fixed. You can assign me the issue if you want.
Please see http://bugs.python.org/issue9252 .

Brett Cannon wrote:
On Tue, Jul 13, 2010 at 11:34, Alexander Belopolsky <alexander.belopolsky@gmail.com> wrote:
On Tue, Jul 13, 2010 at 1:57 PM, Benjamin Peterson <benjamin@python.org> wrote: ..
No! That's not recommended and a complete hack. The "dance" or importlib.import_module is preferred.
Nevertheless, "a complete hack" is what PyImport_Import does:
    PyObject *
    PyImport_Import(PyObject *module_name)
    {
        static PyObject *silly_list = NULL;
        ..
        /* Call the __import__ function with the proper argument list
         * Always use absolute import here. */
        r = PyObject_CallFunction(import, "OOOOi", module_name, globals,
                                  globals, silly_list, 0, NULL);
        ..
    }
and _pickle.c uses PyImport_Import() and thus is different from pickle.py, which uses the double-lookup dance. As a result, the two implementations are subtly different. They cannot both be right. It should be easy to "fix" _pickle.c to do the sys.modules lookup, but I am not sure this is right.
Pulling from sys.modules is the correct way to do this. There are subtle issues when using a bunk fromlist argument (empty modules, double initialization, etc.). If one does not use importlib.import_module -- written *specifically* to prevent people from doing the nasty hack with the fromlist -- then you should use the sys.modules approach, period. If import.c is not doing this then it should get fixed. You can assign me the issue if you want.
I say this every time I give an import talk and it has been brought up here before but obviously not everyone catches it (which is understandable as I think when it came up on python-dev it was at the tail end of a discussion), so I am just going to repeat myself:
Do not put junk in fromlist if you call __import__ directly! Use importlib.import_module! Or if you have a *really* good reason to not use it, then use ``__import__(name); module = sys.modules[name]``.
I have stopped fixing bugs related to this in import.c because of the annoying issues it causes and I expect the correct approach to gain traction at some point (plus get importlib bootstrapped in so I don't have to care about import.c anymore).
It's only a matter of time until someone decides to provide a C implementation of importlib ;-) regards Steve -- Steve Holden +1 571 484 6266 +1 800 494 3119 DjangoCon US September 7-9, 2010 http://djangocon.us/ See Python Video! http://python.mirocommunity.org/ Holden Web LLC http://www.holdenweb.com/

On Wed, Jul 14, 2010 at 9:05 PM, Steve Holden <steve@holdenweb.com> wrote:
I have stopped fixing bugs related to this in import.c because of the annoying issues it causes and I expect the correct approach to gain traction at some point (plus get importlib bootstrapped in so I don't have to care about import.c anymore).
It's only a matter of time until someone decides to provide a C implementation of importlib ;-)
A C accelerated version of importlib would probably be an awful lot cleaner than the current import implementation. The import code isn't quite the mess that we sometimes make it out to be (it does basically work after all, and most of the "problems" lie in dim dark corners that 99% of developers will never get close to), but it has definitely suffered from an accumulation of features on top of a core approach that has been pushed far beyond what it was originally designed to support.
That said, I believe the limiting factor in import speed is likely to remain the number of stat calls and other filesystem operations, so it will be interesting to find out just how significant a slowdown there is between import.c and importlib. If I'm right about the real source of bottlenecks in import performance, the difference may be surprisingly small. Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

On Wed, Jul 14, 2010 at 05:15, Nick Coghlan <ncoghlan@gmail.com> wrote:
On Wed, Jul 14, 2010 at 9:05 PM, Steve Holden <steve@holdenweb.com> wrote:
I have stopped fixing bugs related to this in import.c because of the annoying issues it causes and I expect the correct approach to gain traction at some point (plus get importlib bootstrapped in so I don't have to care about import.c anymore).
It's only a matter of time until someone decides to provide a C implementation of importlib ;-)
A C accelerated version of importlib would probably be an awful lot cleaner than the current import implementation. The import code isn't quite the mess that we sometimes make it out to be (it does basically work after all, and most of the "problems" lie in dim dark corners that 99% of developers will never get close to), but it has definitely suffered from an accumulation of features on top of a core approach that has been pushed far beyond what it was originally designed to support.
So my dream is to finally get full compatibility for importlib in 3.3 (probably won't hit 3.2 as it requires me changing marshal.loads to take a file path argument) and then try to bootstrap it in. Now bootstrapping can actually be done with a minimal amount of C code, as I can simply make the bytecode for importlib a literal in C, get that loaded, and then import importlib as found on the file system to allow people to override things. But obviously I could also identify the true bottlenecks through profiling and provide acceleration where useful. The trick is being reasonable about this so as to not put other VMs at a disadvantage by making the bootstrap solution too difficult to implement.
That said, I believe the limiting factor in import speed is likely to remain the number of stat calls and other filesystem operations, so it will be interesting to find out just how significant a slowdown there is between import.c and importlib. If I'm right about the real source of bottlenecks in import performance, the difference may be surprisingly small.
So I started writing benchmark code in anticipation of needing to prove a minimal performance difference to justify bootstrapping importlib. Right now it only compares importing from sys.modules and built-in modules. You can run it with ``./python.exe -m importlib.test.benchmark``. If you add a `-b` option that will use the built-in __import__ implementation. I still need to benchmark loading source, bytecode, writing bytecode, and loading extensions. Maybe I will finish writing the benchmark code as the thing I do while at EuroPython (that and finally getting to reviewing http://bugs.python.org/issue2919 so that cProfile and profile can merge, unless someone beats me to it, in which case I would be grateful =).
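A rough idea of what such a measurement looks like, for the sys.modules case only, is sketched below. This is not the importlib.test.benchmark code; the function name imports_per_second, the 0.2 second window, and the use of "json" as the cached module are all assumptions made for illustration. It compares the builtin __import__ with importlib.__import__ for a module that is already cached.

```python
import importlib
import time


def imports_per_second(import_func, name, duration=0.2):
    """Count how many times import_func(name) completes per second.

    Hypothetical helper: the first call warms up sys.modules so that the
    timed loop only measures the cached-module path.
    """
    import_func(name)
    count = 0
    deadline = time.perf_counter() + duration
    while time.perf_counter() < deadline:
        import_func(name)
        count += 1
    return count / duration


builtin_rate = imports_per_second(__import__, "json")
importlib_rate = imports_per_second(importlib.__import__, "json")

print(f"builtin __import__   {builtin_rate:12.0f} imports/second")
print(f"importlib.__import__ {importlib_rate:12.0f} imports/second")
```

The absolute numbers depend heavily on the machine and the Python version, so only the ratio between the two lines is meaningful.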

On Wed, 14 Jul 2010 12:33:55 -0700 Brett Cannon <brett@python.org> wrote:
So I started writing benchmark code in anticipation of needing to prove a minimal performance difference to justify bootstrapping importlib. Right now it only compares importing from sys.modules and built-in modules. You can run it with ``./python.exe -m importlib.test.benchmark``. If you add a `-b` option that will use the built-in __import__ implementation.
In what unit are the numbers? In any case, here are my results under a Linux system:
$ ./python -m importlib.test.benchmark
sys.modules      [ 323782 326183 326667 ] best is 326667
Built-in module  [ 33600 33693 33610 ] best is 33693
$ ./python -m importlib.test.benchmark -b
sys.modules      [ 1297640 1315366 1292283 ] best is 1315366
Built-in module  [ 58180 57708 58057 ] best is 58180
Regards Antoine.

On Wed, Jul 14, 2010 at 13:01, Antoine Pitrou <solipsis@pitrou.net> wrote:
On Wed, 14 Jul 2010 12:33:55 -0700 Brett Cannon <brett@python.org> wrote:
So I started writing benchmark code in anticipation of needing to prove a minimal performance difference to justify bootstrapping importlib. Right now it only compares importing from sys.modules and built-in modules. You can run it with ``./python.exe -m importlib.test.benchmark``. If you add a `-b` option that will use the built-in __import__ implementation.
In what unit are the numbers?
Imports/second. I'll fix the code to state that.
In any case, here are my results under a Linux system:
$ ./python -m importlib.test.benchmark
sys.modules      [ 323782 326183 326667 ] best is 326667
Built-in module  [ 33600 33693 33610 ] best is 33693
$ ./python -m importlib.test.benchmark -b
sys.modules      [ 1297640 1315366 1292283 ] best is 1315366
Built-in module  [ 58180 57708 58057 ] best is 58180
And this is what might make evaluating importlib tough; while the performance is 25% of what it is for import.c, being able to import over 300,000 times/second is still damn fast.

On Thu, Jul 15, 2010 at 4:06 PM, Brett Cannon <brett@python.org> wrote:
In any case, here are my results under a Linux system:
$ ./python -m importlib.test.benchmark
sys.modules      [ 323782 326183 326667 ] best is 326667
Built-in module  [ 33600 33693 33610 ] best is 33693
$ ./python -m importlib.test.benchmark -b
sys.modules      [ 1297640 1315366 1292283 ] best is 1315366
Built-in module  [ 58180 57708 58057 ] best is 58180
And this is what might make evaluating importlib tough; while the performance is 25% of what it is for import.c, being able to import over 300,000 times/second is still damn fast.
Yeah, I think the numbers where the filesystem gets involved are going to be more relevant. Modules that have already been cached and those built in to the executable aren't likely to dominate interpreter and application startup times (which is the main thing I'm worried about seeing regress). Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

On Thu, Jul 15, 2010 at 2:55 PM, Nick Coghlan <ncoghlan@gmail.com> wrote:
In any case, here are my results under a Linux system:
$ ./python -m importlib.test.benchmark
sys.modules      [ 323782 326183 326667 ] best is 326667
Built-in module  [ 33600 33693 33610 ] best is 33693
$ ./python -m importlib.test.benchmark -b
sys.modules      [ 1297640 1315366 1292283 ] best is 1315366
Built-in module  [ 58180 57708 58057 ] best is 58180
And this is what might make evaluating importlib tough; while the performance is 25% of what it is for import.c, being able to import over 300,000 times/second is still damn fast.
Yeah, I think the numbers where the filesystem gets involved are going to be more relevant. Modules that have already been cached and those built in to the executable aren't likely to dominate interpreter and application startup times (which is the main thing I'm worried about seeing regress).
What about web applications? Is it true that under FastCGI or mod_wsgi daemon mode the interpreter and application are started only once per, say, 100 requests? -- anatoly t.

I have updated the benchmark to now measure importing source w/o writing bytecode, importing source & writing bytecode, and importing bytecode w/ source (as I don't care about sourceless import performance). Now, before you look at these numbers, realize that I have not once tried to profile importlib to see where its hot spots are (the only optimization I have done is cut down on the stat calls since Python 3.1 when I re-developed the ABCs). I'm sure if I profiled the code and wrote key bits in C these performance numbers would improve reasonably quickly. Anyway, on my 2.2 GHz MacBook, this leads to::
    import.c
    sys.modules              [ 223337 223036 223362 ] best is 223362
    Built-in module          [ 23347 23319 23331 ] best is 23347
    Bytecode w/ source       [ 6624 6607 6608 ] best is 6624
    Source w/o bytecode      [ 4643 4674 4655 ] best is 4674
    Source writing bytecode  [ 2063 2145 2204 ] best is 2204
    importlib
    sys.modules              [ 43423 43414 43426 ] best is 43426
    Built-in module          [ 9130 9115 9120 ] best is 9130
    Bytecode w/ source       [ 1554 1556 1556 ] best is 1556
    Source w/o bytecode      [ 1351 1351 1353 ] best is 1353
    Source writing bytecode  [ 786 843 810 ] best is 843
    importlib / import.c:
    sys.modules              19%
    Built-in module          39%
    Bytecode w/ source       23%
    Source w/o bytecode      29%
    Source writing bytecode  38%
What does this show? Stuff that requires a lot of I/O has the smallest performance difference (source writing bytecode), but where there is as little I/O as possible (bytecode w/ source) import.c wins as it has to do less. This is also why the sys.modules case is so damn fast under import.c; it's the smallest amount of C code you can run, while importlib has standard Python calling overhead. It should also be pointed out that importlib has fully implemented PEP 302 and intentionally has the loaders using their own exposed PEP 302 APIs. This means there are a lot more method calls than in the C version, along with far fewer corners cut in the name of performance.
So while importlib will be slower simply because it's implemented in Python rather than C, it will also be slower because the darn thing is actually written to follow the PEPs we have along with making it easier for people to subclass and benefit from the import code. Anyway, as I have said, I need to hit 100% compatibility when running the test suite -- run importlib.test.regrtest to see where it fails now; also read that file as it has notes as to why the known failures are happening -- before I start worrying about bootstrapping and performance, and that will all be no sooner than Python 3.3.
On Thu, Jul 15, 2010 at 04:55, Nick Coghlan <ncoghlan@gmail.com> wrote:
On Thu, Jul 15, 2010 at 4:06 PM, Brett Cannon <brett@python.org> wrote:
In any case, here are my results under a Linux system:
$ ./python -m importlib.test.benchmark
sys.modules      [ 323782 326183 326667 ] best is 326667
Built-in module  [ 33600 33693 33610 ] best is 33693
$ ./python -m importlib.test.benchmark -b
sys.modules      [ 1297640 1315366 1292283 ] best is 1315366
Built-in module  [ 58180 57708 58057 ] best is 58180
And this is what might make evaluating importlib tough; while the performance is 25% of what it is for import.c, being able to import over 300,000 times/second is still damn fast.
Yeah, I think the numbers where the filesystem gets involved are going to be more relevant. Modules that have already been cached and those built in to the executable aren't likely to dominate interpreter and application startup times (which is the main thing I'm worried about seeing regress).
Cheers, Nick.
-- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
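The "importlib / import.c" percentages in the MacBook results above can be re-derived from the "best is" figures. The sketch below only copies those reported numbers into two dictionaries (the variable names are invented here) and divides them.

```python
# "best is" figures (imports/second) as reported in the message above.
import_c_rates = {
    "sys.modules": 223362,
    "Built-in module": 23347,
    "Bytecode w/ source": 6624,
    "Source w/o bytecode": 4674,
    "Source writing bytecode": 2204,
}
importlib_rates = {
    "sys.modules": 43426,
    "Built-in module": 9130,
    "Bytecode w/ source": 1556,
    "Source w/o bytecode": 1353,
    "Source writing bytecode": 843,
}

# Throughput of importlib relative to import.c for each scenario.
ratios = {
    label: importlib_rates[label] / import_c_rates[label]
    for label in import_c_rates
}

for label, ratio in ratios.items():
    print(f"{label:<26}{ratio:.0%}")
```

Rounded to whole percentages this reproduces the 19%, 39%, 23%, 29% and 38% quoted in the message. The same division applied to the earlier Linux figures (326667 / 1315366) gives roughly 25%, matching the figure quoted for the sys.modules case.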

On Wed, 14 Jul 2010 23:06:58 -0700 Brett Cannon <brett@python.org> wrote:
In any case, here are my results under a Linux system:
$ ./python -m importlib.test.benchmark
sys.modules      [ 323782 326183 326667 ] best is 326667
Built-in module  [ 33600 33693 33610 ] best is 33693
$ ./python -m importlib.test.benchmark -b
sys.modules      [ 1297640 1315366 1292283 ] best is 1315366
Built-in module  [ 58180 57708 58057 ] best is 58180
And this is what might make evaluating importlib tough; while the performance is 25% of what it is for import.c, being able to import over 300,000 times/second is still damn fast.
Yes, that's very encouraging. I guess the final test would be to take something like Mercurial, and time e.g. "hg version" both with the builtin-import, and with importlib enabled as default import mechanism. Regards Antoine.
participants (9)
- Alexander Belopolsky
- anatoly techtonik
- Antoine Pitrou
- Benjamin Peterson
- Brett Cannon
- Jack Diederich
- Michael Foord
- Nick Coghlan
- Steve Holden