Issue #11051: system calls per import
Hi,

Antoine Pitrou noticed that Python 3.2 tries a lot of filenames to load a module: http://bugs.python.org/issue11051

Python 3.1 already tests many filenames, but with Python 3.2 it is even worse. For each directory in sys.path, it tries 9 suffixes: '', '.cpython-32m.so', 'module.cpython-32m.so', '.abi3.so', 'module.abi3.so', '.so', 'module.so', '.py', '.pyc'.

I don't understand why it tests so many .so suffixes. And why does it test with and without "module"?

Victor
On 30.01.2011 09:56, Victor Stinner wrote:
Hi,
Antoine Pitrou noticed that Python 3.2 tries a lot of filenames to load a module: http://bugs.python.org/issue11051
Python 3.1 already tests many filenames, but with Python 3.2 it is even worse.
For each directory in sys.path, it tries 9 suffixes: '', '.cpython-32m.so', 'module.cpython-32m.so', '.abi3.so', 'module.abi3.so', '.so', 'module.so', '.py', '.pyc'.
'' is not really a suffix, but a test for a package directory.
I don't understand why it tests so many .so suffixes.
Because of PEP 3149 and PEP 384.
And why does it test with and without "module"?
Because it always did (there's a thing called backwards compatibility.) This is of course probably the obvious one to start a deprecation process.

Georg
Python 3.1 already tests many filenames, but with Python 3.2 it is even worse.
For each directory in sys.path, it tries 9 suffixes: '', '.cpython-32m.so', 'module.cpython-32m.so', '.abi3.so', 'module.abi3.so', '.so', 'module.so', '.py', '.pyc'.
I don't understand why it tests so many .so suffixes. And why does it test with and without "module"?
The many extensions were specified in PEP 3149. The PEP also specifies:

    # This "tag" will appear between the module base name and the operating
    # system extension for shared libraries.

which apparently meant that the existing mechanism is extended to add the tag.

The support for both the "short extension" (i.e. ".so") and the "long extension" (i.e. "module.so") goes back to r4297 (Python 1.1), when the short extension was added as an alternative to the long extension. The original module suffix was defined in r3518, when dynamic extension modules were first supported, as either "module.so" (SUN_SHLIB) or "module.o" (dl_loadmod, apparently Irix).

Regards,
Martin
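For reference, this suffix table can be inspected at runtime. Below is a minimal sketch using the 3.2-era imp module (later deprecated; on Python 3.3+ importlib.machinery.all_suffixes() is the replacement). The exact output varies by interpreter build:

    # Sketch: print the suffixes the import system probes for each
    # sys.path entry, using imp.get_suffixes().  Output depends on the
    # build (e.g. the ABI tag in '.cpython-32m.so').
    import imp

    KINDS = {imp.PY_SOURCE: "source",
             imp.PY_COMPILED: "bytecode",
             imp.C_EXTENSION: "extension"}

    for suffix, mode, module_type in imp.get_suffixes():
        print("%-25s %s" % (suffix, KINDS.get(module_type, "other")))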
On Sun, Jan 30, 2011 at 7:25 PM, Georg Brandl <georg@python.org> wrote:
And why does it test with and without "module"?
Because it always did (there's a thing called backwards compatibility.)
This is of course probably the obvious one to start a deprecation process.
But why do we check the long suffix for the *new* extension module naming variants from PEP 3149 and PEP 384? Those are completely new, so there's no backwards compatibility argument there.

Cheers,
Nick.

--
Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
On Sunday, 30 January 2011 at 22:52 +1000, Nick Coghlan wrote:
On Sun, Jan 30, 2011 at 7:25 PM, Georg Brandl <georg@python.org> wrote:
And why does it test with and without "module"?
Because it always did (there's a thing called backwards compatibility.)
This is of course probably the obvious one to start a deprecation process.
But why do we check the long suffix for the *new* extension module naming variants from PEP 3149 and PEP 384? Those are completely new, so there's no backwards compatibility argument there.
My implicit question was: can we limit the number of tested suffixes? I see two candidates: remove 'module.cpython-32m.so' ('.cpython-32m.so' should be enough) and 'module.abi3.so' ('.abi3.so' should be enough).

And the real question is: should we change that before 3.2 final? If we don't change that in 3.2, it will be harder to change it later (but it is still possible).

Limiting the number of suffixes is maybe not the right solution to limit the number of system calls at startup. We can imagine alternatives:

* remember the last filename when loading a module and retry this filename first
* specify that a module is a Python system module and should only be loaded from "system directories"
* specify the module type (directory, .py file, dynamic library, ...) when loading a module
* (or at least remember the module type and retry this type first)
* etc.

We should find a compromise between speed (limit the number of system calls) and the usability of Python modules.

Victor
On 30.01.2011 17:35, Victor Stinner wrote:
On Sunday, 30 January 2011 at 22:52 +1000, Nick Coghlan wrote:
On Sun, Jan 30, 2011 at 7:25 PM, Georg Brandl <georg@python.org> wrote:
And why does it test with and without "module"?
Because it always did (there's a thing called backwards compatibility.)
This is of course probably the obvious one to start a deprecation process.
But why do we check the long suffix for the *new* extension module naming variants from PEP 3149 and PEP 384? Those are completely new, so there's no backwards compatibility argument there.
My implicit question was: can we limit the number of tested suffixes? I see two candidates: remove 'module.cpython-32m.so' ('.cpython-32m.so' should be enough) and 'module.abi3.so' ('.abi3.so' should be enough).
And the real question is: should we change that before 3.2 final?
We most definitely shouldn't.

Georg
On Sun, Jan 30, 2011 at 11:35 AM, Victor Stinner <victor.stinner@haypocalc.com> wrote: [...]
We should find a compromise between speed (limit the number of system calls) and the usability of Python modules.
Do you have measurements that show python spending significant time on failing open calls?
On 30.01.2011 17:54, Alexander Belopolsky wrote:
On Sun, Jan 30, 2011 at 11:35 AM, Victor Stinner <victor.stinner@haypocalc.com> wrote: [...]
We should find a compromise between speed (limit the number of system calls) and the usability of Python modules.
Do you have measurements that show python spending significant time on failing open calls?
No; past measurements always showed that this is insignificant, probably thanks to the operating system caching the relevant directory blocks (so it doesn't really matter whether you make one or ten lookups per directory; my guess is that it matters more if you look into ten directories instead of one).

Regards,
Martin
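For anyone who wants to check this on their own system, here is a rough sketch. It measures wall-clock import time only (per-syscall accounting needs an OS-level tool such as strace on Linux), and "decimal" is just an example of a reasonably large stdlib module:

    # Rough sketch: time a fresh import.  Wall clock only -- it cannot
    # separate failed open()/stat() calls from the rest of the work,
    # and submodules already imported stay cached.
    import sys, time, importlib

    def time_import(name):
        sys.modules.pop(name, None)        # force a fresh path search
        start = time.time()
        importlib.import_module(name)
        return time.time() - start

    print("decimal: %.1f ms" % (time_import("decimal") * 1000))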
Victor Stinner wrote:
Limiting the number of suffixes is maybe not the right solution to limit the number of system calls at startup. We can imagine alternatives:

* remember the last filename when loading a module and retry this filename first
* specify that a module is a Python system module and should only be loaded from "system directories"
* specify the module type (directory, .py file, dynamic library, ...) when loading a module
* (or at least remember the module type and retry this type first)
* etc.
Maybe also:

* Read and cache the directory contents and search it ourselves instead of making a system call for every possible name.

--
Greg
Maybe also
* Read and cache the directory contents and search it ourselves instead of making a system call for every possible name.
I wouldn't do that - I would expect that this is actually slower than making the system calls, because the system might get away with not reading the entire directory (whereas it will have to when we explicitly ask for that). Regards, Martin
On Sun, Jan 30, 2011 at 11:33 PM, "Martin v. Löwis" <martin@v.loewis.de> wrote:
Maybe also
* Read and cache the directory contents and search it ourselves instead of making a system call for every possible name.
I wouldn't do that - I would expect that this is actually slower than making the system calls, because the system might get away with not reading the entire directory (whereas it will have to when we explicitly ask for that).
Hm. Long (very long) ago I had to implement just that, and it was much faster. But this was over NFS. Still, I think the directory would have to be truly enormous before reading its contents (which doesn't access all the inodes) is slower than statting a few dozen of its entries. At least on most *nix filesystems.

Another thing to consider: on App Engine (which despite all its architectural weirdness uses a -- mostly -- standard Linux filesystem for the Python code of the app) someone measured that importing from a zipfile is much faster than importing from the filesystem. I would imagine this extends to other contexts too, and it makes sense because the zipfile directory gets cached in memory so no stat() calls are necessary.

(Basically I am biased to believe that stat() is a pretty slow system call -- this may just be old NFS lore though.)

--
--Guido van Rossum (python.org/~guido)
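The zip-import effect is easy to try, since the standard zipimport machinery kicks in automatically for any zip file placed on sys.path. A minimal sketch, where modules.zip and mymodule.py are made-up names:

    # Sketch: import pure-Python code from a zip archive.  The zip's
    # table of contents is read once and cached, so the per-suffix
    # stat()/open() probing disappears.  File names are illustrative;
    # mymodule.py must exist in the current directory.
    import sys, zipfile

    with zipfile.ZipFile("modules.zip", "w") as zf:
        zf.write("mymodule.py")

    sys.path.insert(0, "modules.zip")      # zipimport handles the rest
    import mymodule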
On 2011-01-30 21:43, "Martin v. Löwis" wrote:
On 30.01.2011 17:54, Alexander Belopolsky wrote:
On Sun, Jan 30, 2011 at 11:35 AM, Victor Stinner <victor.stinner@haypocalc.com> wrote: [...]
We should find a compromise between speed (limit the number of system calls) and the usability of Python modules.
Do you have measurements that show python spending significant time on failing open calls?
No; past measurements always showed that this is insignificant, probably thanks to the operating system caching the relevant directory blocks (so it doesn't really matter whether you make one or ten lookups per directory; my guess is that it matters more if you look into ten directories instead of one).
Dear Python developers,

I would like you to be aware of one particular problem related to system calls on massively parallel systems.

We are developing a Python-based simulation software, GPAW (https://wiki.fysik.dtu.dk/gpaw/), and have tested it with up to tens of thousands of CPU cores. The program uses MPI, so thousands of Python interpreters are launched at start-up time. As all these interpreters execute the same import statements, the huge number of (IO-related) system calls puts extreme pressure on the file system, and as a result just starting the Python interpreter(s) can take ~45 minutes with ~30 000 CPU cores!

Currently, we have tried to work around the problem either by installing Python and the required additional modules (NumPy and GPAW) to a ramdisk, or by modifying the CPython source (at the moment the 2.6 version) in such a way that only a single process performs the system calls and uses MPI to broadcast the results to the other processes (preliminary work in progress).

As a related problem, dynamic linking can also be quite expensive (or even not available on some systems), and in some cases we have made a small hack to CPython to enable statically linked packages (simple modules can of course be included relatively easily in a static Python build).

I am not expecting that these problems can be solved easily for the general CPython interpreter, especially as massively parallel supercomputers are quite a small niche of Python usage. However, I think it would be good to be aware of the problems that a large number of system calls causes in this more specialized Python usage.

Best regards,
Jussi

--
Jussi Enkovaara, Application Scientist, High Performance Computing, CSC
PO. BOX 405 02101 Espoo, Finland, Tel +358 9 457 2935, fax +358 9 457 2302
CSC - IT Center for Science, www.csc.fi, e-mail: jussi.enkovaara@csc.fi
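The single-reader workaround Jussi describes might look roughly like the sketch below, written with mpi4py purely for illustration -- this is not GPAW's actual code, and a real solution would have to hook the import machinery itself:

    # Sketch of "one rank reads, everyone receives": only rank 0
    # touches the file system; the bytes are broadcast and compiled
    # locally on every rank.  "mymodule.py" is a hypothetical file.
    from mpi4py import MPI

    comm = MPI.COMM_WORLD

    def read_source_once(path):
        data = None
        if comm.Get_rank() == 0:
            with open(path, "rb") as f:    # the only process doing I/O
                data = f.read()
        return comm.bcast(data, root=0)    # every rank gets the bytes

    source = read_source_once("mymodule.py")
    code = compile(source, "mymodule.py", "exec")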
On Mon, 31 Jan 2011 00:08:25 -0800 Guido van Rossum <guido@python.org> wrote:
(Basically I am biased to believe that stat() is a pretty slow system call -- this may just be old NFS lore though.)
I don't know about NFS, but starting a Python interpreter located on a Samba share from a Windows VM is quite slow too. I think Martin is right for the common case: on a local filesystem on a modern Unix, stat() is certainly very fast. Remote or distributed filesystems seem to be more of a problem. Regards Antoine.
On Jan 30, 2011, at 05:35 PM, Victor Stinner wrote:
And the real question is: should we change that before 3.2 final? If we don't change that in 3.2, it will be harder to change it later (but it is still possible).
I don't see how you possibly can without re-entering beta. Mucking with the import machinery *at all* does not seem prudent in the last RC. ;)

FWIW, I recall this being discussed at the time of the PEPs, and we decided not to narrow the search patterns down. I'd have to go through my archives for the details, but I think it would be better to officially deprecate the 'module' form so that it can be removed in a future version.

-Barry
On Mon, Jan 31, 2011 at 04:43, Antoine Pitrou <solipsis@pitrou.net> wrote:
On Mon, 31 Jan 2011 00:08:25 -0800 Guido van Rossum <guido@python.org> wrote:
(Basically I am biased to believe that stat() is a pretty slow system call -- this may just be old NFS lore though.)
I don't know about NFS, but starting a Python interpreter located on a Samba share from a Windows VM is quite slow too. I think Martin is right for the common case: on a local filesystem on a modern Unix, stat() is certainly very fast. Remote or distributed filesystems seem to be more of a problem.
I should mention that I have considered implementing a caching finder and loader for filesystems in importlib for people to optionally install to use for themselves. The real trick, though, is should it only cache hits, misses, or both? Regardless, though, it would be a very simple mixin or subclass to implement if there is demand for this sort of thing.

And as for the zipfile being faster, that's true (I have incomplete benchmarks in importlib that you can use if people want to measure this stuff themselves, although you will need to tweak them to run against a zipfile).
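A minimal sketch of the directory-listing idea behind such a cache (this is not importlib code, just an illustration that caches hits and misses alike and leaves invalidation out entirely):

    # Sketch: answer "does this file exist in this directory?" from a
    # cached os.listdir() instead of one stat() per candidate suffix.
    import os

    class DirCache:
        def __init__(self):
            self._entries = {}             # directory -> set of names

        def contains(self, directory, filename):
            try:
                names = self._entries[directory]
            except KeyError:
                try:
                    names = set(os.listdir(directory))
                except OSError:            # missing directory: cache the miss
                    names = set()
                self._entries[directory] = names
            return filename in names

    cache = DirCache()
    stdlib = os.path.dirname(os.__file__)  # some directory that exists
    print(cache.contains(stdlib, "os.py"))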
Another thing to consider: on App Engine (which despite all its architectural weirdness uses a -- mostly -- standard Linux filesystem for the Python code of the app) someone measured that importing from a zipfile is much faster than importing from the filesystem. I would imagine this extends to other contexts too, and it makes sense because the zipfile directory gets cached in memory so no stat() calls are necessary.
Of course, you can't know until you measure, and then you only know about the specific case. However, I think you can't really compare zip reading with directory reading - I'd expect that reading a zip directory is significantly faster than reading the directory contents of the zip file unpacked, just because there are so many fewer layers of indirection.

Regards,
Martin
On 1/31/2011 1:38 PM, Brett Cannon wrote:
I should mention that I have considered implementing a caching finder and loader for filesystems in importlib for people to optionally install to use for themselves. The real trick, though, is should it only cache hits, misses, or both? Regardless, though, it would be a very simple mixin or subclass to implement if there is demand for this sort of thing.
I have in the past implemented a PEP 302 finder/loader zipfile-based cache. On campus, I use a version of Python installed to my home directory, which is on an NFS share. I found such a cache often gave slower startup times for applications like bzr and hg.

My cache merely stores things as it finds them on sys.path, loading from the zipfile the names that it knows and storing those that it doesn't. I make no attempt to invalidate the cache contents once stored. So, I am already talking about a best-case scenario for caching. I'm not sure how you could invalidate the cache without paying the cost of all the normal syscalls that we are trying to avoid.

My finder/loader is not bug-free, but I'd be glad to make it available to someone if they want to play around with it.

--
Scott Dial
scott@scottdial.com
scodial@cs.indiana.edu
Hello,

Speaking from experience from my observations on millions of machines, the stat() call is *very slow* when compared to readdir(), FindNextFile(), getdirentriesattr(), etc. When we switched from a file system indexer that stat()ed every file to one that read directories, we noticed an average speedup of about 10x.

You can probably attribute this to the fact that in file system indexing the raw system call volume is much lower (not having to stat() each file, just read the directories), but also to the fact that there is much less HD seeking (stat() has to jump around the HD; usually all directory entries fit in one block).

If you only need to test for the existence of multiple files and don't need the extra information that stat() gives you, it might make sense to avoid the context switch/IO overhead.

Rian

On Jan 31, 2011, at 4:43 AM, Antoine Pitrou wrote:
On Mon, 31 Jan 2011 00:08:25 -0800 Guido van Rossum <guido@python.org> wrote:
(Basically I am biased to believe that stat() is a pretty slow system call -- this may just be old NFS lore though.)
I don't know about NFS, but starting a Python interpreter located on a Samba share from a Windows VM is quite slow too. I think Martin is right for the common case: on a local filesystem on a modern Unix, stat() is certainly very fast. Remote or distributed filesystems seem to be more of a problem.
Regards
Antoine.
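Rian's readdir-versus-stat point is easy to sketch in Python: one listing of the directory versus one existence check per candidate name. Results will vary enormously with the file system, and the candidate names below are made up:

    # Micro-benchmark sketch: N existence checks via stat() versus one
    # directory listing plus set lookups.
    import os, timeit

    directory = os.path.dirname(os.__file__)   # any existing directory
    candidates = ["foo" + s for s in ("", ".py", ".pyc", ".so", "module.so")]

    def with_stat():
        return [os.path.exists(os.path.join(directory, c)) for c in candidates]

    def with_listdir():
        names = set(os.listdir(directory))     # one directory read
        return [c in names for c in candidates]

    print("stat()  : %.3fs" % timeit.timeit(with_stat, number=1000))
    print("listdir : %.3fs" % timeit.timeit(with_listdir, number=1000))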
participants (13)

- "Martin v. Löwis"
- Alexander Belopolsky
- Antoine Pitrou
- Barry Warsaw
- Brett Cannon
- Georg Brandl
- Greg Ewing
- Guido van Rossum
- Jussi Enkovaara
- Nick Coghlan
- Rian Hunter
- Scott Dial
- Victor Stinner