PEP 471 (scandir): Poll to choose the implementation (full C or C+Python)
Hi, TL,DR: are you ok to add 800 lines of C code for os.scandir(), 4x faster than os.listdir() when the file type is checked? I accepted the PEP 471 (os.scandir) a few months ago, but it is not implement yet in Python 3.5, because I didn't make a choice on the implementation. Ben Hoyt wrote different implementations: - full C: os.scandir() and DirEntry are written in C (no change on os.py) - C+Python: os._scandir() (wrapper for opendir/readdir and FindFirstFileW/FindNextFileW) in C, DirEntry in Python - ctypes: os.scandir() and DirEntry fully implemented in Python I'm not interested by the ctypes implementation. It's useful for a third party project hosted at PyPI, but for CPython I prefer to wrap C functions using C code. In short, the C implementation is faster than the C+Python implementation. The issue #22524 (*) is full of benchmark numbers. IMO the most interesting benchmark is to compare os.listdir() + os.stat() versus os.scandir() + Direntry.is_dir(). Let me try to summarize results of this benchmark: * C implementation: scandir is at least 3.5x faster than listdir, up to 44.6x faster on Windows * C+Python implementation: scandir is not really faster than listdir, between 1.3x and 1.4x faster (*) http://bugs.python.org/issue22524 Ben Hoyt reminded me that os.scandir() (PEP 471) doesn't add any new feature: pathlib already provides a nice API on top of os and os.path modules. (You may even notice that DirEntry a much fewer methods ;-)) The main (only?) purpose of the PEP is performance. If os.scandir() is "only" 1.4x faster, I don't think that it is interesting to use os.scandir() in an application. I guess that all applications/libraries will want to keep compatibility with Python 3.4 and older and so will anyway have to duplicate the code to use os.listdir() + os.stat(). So is it worth to duplicate code for such small speedup? Now I see 3 choices: - take the full C implementation, because it's much faster (at least 3.4x faster!) - reject the whole PEP 471 (not nice), because it adds too much code for a minor speedup (not true on Windows: up to 44x faster!) - take the C+Python implementation, because maintenance matters more than performances (only 1.3x faster, sorry) => IMO the best option is to take the C implementation. What do you think? I'm concerned by the length of the C code: the full C implementations adds ~800 lines of C code to posixmodule.c. This file is already the longest C file in CPython. I don't want to make it longer, but I'm not motived to start to split it. Last time I proposed to split a file (unicodeobject.c), some developers complained that it makes search harder. I don't understand this, there are so many tools to navigate in C code. But it was enough for me to give up on this idea. A alternative is to add a new _scandir.c module to host the new C code, and share some code with posixmodule.c: remove "static" keyword from required C functions (functions to convert Windows attributes to a os.stat_result object). That's a reasonable choice. What do you think? FYI I ran the benchmark on different hardware (SSD, HDD, tmpfs), file systems (ext4, tmpfs, NFS/ext4), operating systems (Linux, Windows). Victor
On 13 February 2015 at 10:07, Victor Stinner <victor.stinner@gmail.com> wrote:
=> IMO the best option is to take the C implementation. What do you think?
FWIW (as I'm not a core dev) I agree. The Windows speedup is huge, and well worth adding the code. I'm assuming that the majority of the C code is cross-platform, so we're not adding a big chunk of code needing *both* Windows and C skills to maintain (any dev with C skills could handle it). Paul
2015-02-13 11:19 GMT+01:00 Paul Moore <p.f.moore@gmail.com>:
On 13 February 2015 at 10:07, Victor Stinner <victor.stinner@gmail.com> wrote:
=> IMO the best option is to take the C implementation. What do you think?
FWIW (as I'm not a core dev) I agree. The Windows speedup is huge, and well worth adding the code. I'm assuming that the majority of the C code is cross-platform, so we're not adding a big chunk of code needing *both* Windows and C skills to maintain (any dev with C skills could handle it).
Paul
The patch can be read here: http://bugs.python.org/file36963/scandir-2.patch Or using Rietveld: http://bugs.python.org/review/22524/#ps13104 The C code is quite simple. It takes 800 lines because C code is more "verbose" than Python code. Manipulate strings, manage memory, take care of the reference counter, etc. just takes more lines. Victor
On 13 February 2015 at 20:33, Victor Stinner <victor.stinner@gmail.com> wrote:
2015-02-13 11:19 GMT+01:00 Paul Moore <p.f.moore@gmail.com>:
On 13 February 2015 at 10:07, Victor Stinner <victor.stinner@gmail.com> wrote:
=> IMO the best option is to take the C implementation. What do you think?
FWIW (as I'm not a core dev) I agree. The Windows speedup is huge, and well worth adding the code. I'm assuming that the majority of the C code is cross-platform, so we're not adding a big chunk of code needing *both* Windows and C skills to maintain (any dev with C skills could handle it).
Paul
The patch can be read here: http://bugs.python.org/file36963/scandir-2.patch
Or using Rietveld: http://bugs.python.org/review/22524/#ps13104
The C code is quite simple. It takes 800 lines because C code is more "verbose" than Python code. Manipulate strings, manage memory, take care of the reference counter, etc. just takes more lines.
This isn't code I'd expect us to have to change very often, so the maintenance risks associated with the pure C implementation seem low. Having it in a separate file rather than making the main implementation file for os even larger does seem like an attractive structural option though. Regards, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
On 13.02.15 12:07, Victor Stinner wrote:
TL,DR: are you ok to add 800 lines of C code for os.scandir(), 4x faster than os.listdir() when the file type is checked?
You can try to make Python implementation faster if 1) Don't set attributes to None in constructor. 2) Implement scandir as: def scandir(path): return map(partial(DirEntry, path), _scandir(path)). 3) Or pass DirEntry to _scandir: def scandir(path): yield from _scandir(path, DirEntry)
2015-02-13 11:52 GMT+01:00 Serhiy Storchaka <storchaka@gmail.com>:
You can try to make Python implementation faster if
1) Don't set attributes to None in constructor.
The class uses __slots__. Setting attributes in the class body is forbidden when __slots__ is used.
3) Or pass DirEntry to _scandir:
def scandir(path): yield from _scandir(path, DirEntry)
I implemented that and there is no major change (1.3x faster => 1.5x, it's still far from 3.5x faster seen with the C implementation). I analyzed numbers (on my desktop PC, HDD, ext4): - readdir: 380 ns - os.stat: 1500 ns - DirEntry(C): 100 ns - DirEntry (Py): 530 ns (5.3x slower) - is_dir(C): 75 ns - is_dir (Py): 260 ns (3.5x slower) listdir+stat benchmarks takes (readdir + stat) nanoseconds scandir+is_dir takes (readdir + DirEntry + is_dir) nanoseconds => scandir+is_dir is faster than list+stat if (DirEntry+is_dir) is faster than (stat). Callig os.stat takes 1500 ns, while readdir() only provides informations required by the benchmark. So if DirEntry + DirEntry.is_dir is faster than 1500 ns, we won :-) The Python implementation takes 790 ns, but the C implementation takes only 175 ns! (4.5x faster) I don't think that any Python performance trick can reduce the Python overhead to make the C+Python implementation interesting compared to os.listdir+os.stat. We are talking about nanoseconds, Python cannot beat C at this resolution. Victor
On 13.02.15 12:07, Victor Stinner wrote:
* C implementation: scandir is at least 3.5x faster than listdir, up to 44.6x faster on Windows
Results on Windows was obtained in the becnhmark that doesn't drop disk caches and runs listdir before scandir.
2015-02-13 12:27 GMT+01:00 Serhiy Storchaka <storchaka@gmail.com>:
On 13.02.15 12:07, Victor Stinner wrote:
* C implementation: scandir is at least 3.5x faster than listdir, up to 44.6x faster on Windows
Results on Windows was obtained in the becnhmark that doesn't drop disk caches and runs listdir before scandir.
The benchmark code is here: http://bugs.python.org/file38120/bench_scandir2.py Eeach test is repeated 5 times, I compared the duration of the fastest call. A test calls the function 5 times in a loop. Anyway, it's not the first call to listdir() which fills the disk cache, but the call to count_entries() (which is implemented with os.scandir). So is there any issue in the benchmark script or not? On Linux, you can use "bench_nocache" (and "bench_nostat_nocache") which flushs the cache: it writes "3" into /proc/sys/vm/drop_caches. scandir is always faster when the disk cache is flushed. Victor
On Fri, 13 Feb 2015 13:27:04 +0200 Serhiy Storchaka <storchaka@gmail.com> wrote:
On 13.02.15 12:07, Victor Stinner wrote:
* C implementation: scandir is at least 3.5x faster than listdir, up to 44.6x faster on Windows
Results on Windows was obtained in the becnhmark that doesn't drop disk caches and runs listdir before scandir.
Well, that's the point. The objective here is to speed up Python, not the hard drive. Regards Antoine.
On Fri, 13 Feb 2015 11:07:03 +0100 Victor Stinner <victor.stinner@gmail.com> wrote:
* C implementation: scandir is at least 3.5x faster than listdir, up to 44.6x faster on Windows * C+Python implementation: scandir is not really faster than listdir, between 1.3x and 1.4x faster
So amusingly, the bottleneck is not so much the cost of system calls, but the cost of Python wrappers around system calls. Regards Antoine.
* C implementation: scandir is at least 3.5x faster than listdir, up to 44.6x faster on Windows * C+Python implementation: scandir is not really faster than listdir, between 1.3x and 1.4x faster
So amusingly, the bottleneck is not so much the cost of system calls, but the cost of Python wrappers around system calls.
Yes, that's basically right. Or put another way, the cost of the extra system calls is dwarfed by the cost of wrapping things in Python. Victor's given a great summary of the issues at the top of this thread, and I'm definitely for the all-C version -- otherwise we gain a bunch of speed by not calling stat(), but then lose most of it again with the Python wrapping. As Victor noted, the rationale for PEP 471 has always been about performance, and if we don't have much of that (especially on Linux), it's not nearly as worthwhile. Re maintenance of the C code: yes, the pure C version is about twice as many lines as the half Python version (~800 vs ~400), but I think Nick makes a good point here: "This isn't code I'd expect us to have to change very often, so the maintenance risks associated with the pure C implementation seem low." We have to vet this code thoroughly basically once, now. :-) If we go ahead with the all C approach, I'd be in favour of refactoring a little and putting the new scandir code into a separate C file. There are two ways to do this: a) sticking with a single Python module and just referencing the non-static functions in scandir.c from posixmodule.c, or b) sharing some functions but making _scandir.c its own importable module. Option (a) is somewhat simpler as there's not module setup stuff twice, but I don't know if there's a precedent for that way of doing things. -Ben
On Fri, 13 Feb 2015 08:35:00 -0500 Ben Hoyt <benhoyt@gmail.com> wrote:
If we go ahead with the all C approach, I'd be in favour of refactoring a little and putting the new scandir code into a separate C file. There are two ways to do this: a) sticking with a single Python module and just referencing the non-static functions in scandir.c from posixmodule.c, or b) sharing some functions but making _scandir.c its own importable module. Option (a) is somewhat simpler as there's not module setup stuff twice, but I don't know if there's a precedent for that way of doing things.
The _io module already does things that way (the (a) option, I mean). Regards Antoine.
I think posixmodule is a great candidate for splitting up by platform rather than function, as the whole file is packed with ifdef. It's really only lacking a volunteer to do it, but we could start here (ie. make posixmodule_nt.c for the Windows impl, etc.) and progressively move function implementations out over time? All the module setup and probably most of the Python layer can stay where it is. More likely we're going to get bogged down discussing it again though, so if that happens my vote is to just make posixmodule.c 800 lines longer. Cheers, Steve Top-posted from my Windows Phone ________________________________ From: Antoine Pitrou<mailto:solipsis@pitrou.net> Sent: 2/13/2015 5:44 To: python-dev@python.org<mailto:python-dev@python.org> Subject: Re: [Python-Dev] PEP 471 (scandir): Poll to choose the implementation (full C or C+Python) On Fri, 13 Feb 2015 08:35:00 -0500 Ben Hoyt <benhoyt@gmail.com> wrote:
If we go ahead with the all C approach, I'd be in favour of refactoring a little and putting the new scandir code into a separate C file. There are two ways to do this: a) sticking with a single Python module and just referencing the non-static functions in scandir.c from posixmodule.c, or b) sharing some functions but making _scandir.c its own importable module. Option (a) is somewhat simpler as there's not module setup stuff twice, but I don't know if there's a precedent for that way of doing things.
The _io module already does things that way (the (a) option, I mean). Regards Antoine. _______________________________________________ Python-Dev mailing list Python-Dev@python.org https://mail.python.org/mailman/listinfo/python-dev Unsubscribe: https://mail.python.org/mailman/options/python-dev/steve.dower%40microsoft.c...
2015-02-13 15:36 GMT+01:00 Steve Dower <Steve.Dower@microsoft.com>:
I think posixmodule is a great candidate for splitting up by platform rather than function, as the whole file is packed with ifdef. It's really only lacking a volunteer to do it, but we could start here (ie. make posixmodule_nt.c for the Windows impl, etc.) and progressively move function implementations out over time? All the module setup and probably most of the Python layer can stay where it is.
More likely we're going to get bogged down discussing it again though, so if that happens my vote is to just make posixmodule.c 800 lines longer.
Since there are many ways to split this huge file, I agree that it's just fine to add these 800 lines and *then* think how the huge file can be splitted. It's a different topic. Victor
Since there are many ways to split this huge file, I agree that it's just fine to add these 800 lines and *then* think how the huge file can be splitted. It's a different topic.
That's a good idea. Consider adding the new feature (scandir) and the larger issue of refactoring posixmodule as separate issues, separate commits, etc. -Ben
On 02/13/2015 02:07 AM, Victor Stinner wrote:
Hi,
TL,DR: are you ok to add 800 lines of C code for os.scandir(), 4x faster than os.listdir() when the file type is checked?
+1 for the all-C version. -- ~Ethan~
I vote for the C implementation. On Fri, Feb 13, 2015 at 2:07 AM, Victor Stinner <victor.stinner@gmail.com> wrote:
Hi,
TL,DR: are you ok to add 800 lines of C code for os.scandir(), 4x faster than os.listdir() when the file type is checked?
I accepted the PEP 471 (os.scandir) a few months ago, but it is not implement yet in Python 3.5, because I didn't make a choice on the implementation.
Ben Hoyt wrote different implementations: - full C: os.scandir() and DirEntry are written in C (no change on os.py) - C+Python: os._scandir() (wrapper for opendir/readdir and FindFirstFileW/FindNextFileW) in C, DirEntry in Python - ctypes: os.scandir() and DirEntry fully implemented in Python
I'm not interested by the ctypes implementation. It's useful for a third party project hosted at PyPI, but for CPython I prefer to wrap C functions using C code.
In short, the C implementation is faster than the C+Python implementation.
The issue #22524 (*) is full of benchmark numbers. IMO the most interesting benchmark is to compare os.listdir() + os.stat() versus os.scandir() + Direntry.is_dir(). Let me try to summarize results of this benchmark:
* C implementation: scandir is at least 3.5x faster than listdir, up to 44.6x faster on Windows * C+Python implementation: scandir is not really faster than listdir, between 1.3x and 1.4x faster
(*) http://bugs.python.org/issue22524
Ben Hoyt reminded me that os.scandir() (PEP 471) doesn't add any new feature: pathlib already provides a nice API on top of os and os.path modules. (You may even notice that DirEntry a much fewer methods ;-)) The main (only?) purpose of the PEP is performance.
If os.scandir() is "only" 1.4x faster, I don't think that it is interesting to use os.scandir() in an application. I guess that all applications/libraries will want to keep compatibility with Python 3.4 and older and so will anyway have to duplicate the code to use os.listdir() + os.stat(). So is it worth to duplicate code for such small speedup?
Now I see 3 choices:
- take the full C implementation, because it's much faster (at least 3.4x faster!) - reject the whole PEP 471 (not nice), because it adds too much code for a minor speedup (not true on Windows: up to 44x faster!) - take the C+Python implementation, because maintenance matters more than performances (only 1.3x faster, sorry)
=> IMO the best option is to take the C implementation. What do you think?
I'm concerned by the length of the C code: the full C implementations adds ~800 lines of C code to posixmodule.c. This file is already the longest C file in CPython. I don't want to make it longer, but I'm not motived to start to split it. Last time I proposed to split a file (unicodeobject.c), some developers complained that it makes search harder. I don't understand this, there are so many tools to navigate in C code. But it was enough for me to give up on this idea.
A alternative is to add a new _scandir.c module to host the new C code, and share some code with posixmodule.c: remove "static" keyword from required C functions (functions to convert Windows attributes to a os.stat_result object). That's a reasonable choice. What do you think?
FYI I ran the benchmark on different hardware (SSD, HDD, tmpfs), file systems (ext4, tmpfs, NFS/ext4), operating systems (Linux, Windows).
Victor _______________________________________________ Python-Dev mailing list Python-Dev@python.org https://mail.python.org/mailman/listinfo/python-dev Unsubscribe: https://mail.python.org/mailman/options/python-dev/guido%40python.org
-- --Guido van Rossum (python.org/~guido)
On 13 Feb 2015 02:09, "Victor Stinner" <victor.stinner@gmail.com> wrote:
A alternative is to add a new _scandir.c module to host the new C code, and share some code with posixmodule.c: remove "static" keyword from required C functions (functions to convert Windows attributes to a os.stat_result object).
Hopefully not too annoying question from an outsider: has cpython's build system added the necessary bits to do this on a safe, portable, non-symbol-namespace polluting way? E.g. using -fvisibility=hidden on Linux? (I'm partially wondering because until very recently numpy was built by concatenating all the different c files together and compiling that, because that was the only portable way to let different files share access to symbols without also exporting those symbols publicly from the resulting module shared objects. And numpy supports a lot fewer platforms than cpython proper...) -n
On 14 Feb 2015 03:43, "Nathaniel Smith" <njs@pobox.com> wrote:
On 13 Feb 2015 02:09, "Victor Stinner" <victor.stinner@gmail.com> wrote:
A alternative is to add a new _scandir.c module to host the new C code, and share some code with posixmodule.c: remove "static" keyword from required C functions (functions to convert Windows attributes to a os.stat_result object).
Hopefully not too annoying question from an outsider: has cpython's build
system added the necessary bits to do this on a safe, portable, non-symbol-namespace polluting way? E.g. using -fvisibility=hidden on Linux? We just add a "_Py" prefix on the things that we're making available to the linker solely for own use and don't provide any backwards compatibility guarantees for other people that decide to use them directly despite the leading underscore and lack of documentation. Cheers, Nick.
On Fri, Feb 13, 2015 at 5:07 AM, Victor Stinner <victor.stinner@gmail.com> wrote:
Now I see 3 choices:
- take the full C implementation, because it's much faster (at least 3.4x faster!) - reject the whole PEP 471 (not nice), because it adds too much code for a minor speedup (not true on Windows: up to 44x faster!) - take the C+Python implementation, because maintenance matters more than performances (only 1.3x faster, sorry)
=> IMO the best option is to take the C implementation. What do you think?
+1
participants (11)
-
Alexander Belopolsky -
Antoine Pitrou -
Ben Hoyt -
Ethan Furman -
Guido van Rossum -
Nathaniel Smith -
Nick Coghlan -
Paul Moore -
Serhiy Storchaka -
Steve Dower -
Victor Stinner