[Python-Dev] PEP 471 -- os.scandir() function -- a better and faster directory iterator

Paul Sokolovsky pmiscml at gmail.com
Fri Jun 27 02:07:46 CEST 2014


Hello,

On Thu, 26 Jun 2014 18:59:45 -0400
Ben Hoyt <benhoyt at gmail.com> wrote:

> Hi Python dev folks,
> 
> I've written a PEP proposing a specific os.scandir() API for a
> directory iterator that returns the stat-like info from the OS, the
> main advantage of which is to speed up os.walk() and similar
> operations between 4-20x, depending on your OS and file system. Full
> details, background info, and context links are in the PEP, which
> Victor Stinner has uploaded at the following URL, and I've also copied
> inline below.

I noticed obvious inefficiency of os.walk() implemented in terms of
os.listdir() when I worked on "os" module for MicroPython. I essentially
did what your PEP suggests - introduced internal generator function
(ilistdir_ex() in
https://github.com/micropython/micropython-lib/blob/master/os/os/__init__.py#L85
), in terms of which both os.listdir() and os.walk() are implemented.


With my MicroPython hat on, os.scandir() would make things only worse.
With current interface, one can either have inefficient implementation
(like CPython chose) or efficient implementation (like MicroPython
chose) - all transparently. os.scandir() supposedly opens up efficient
implementation for everyone, but at the price of bloating API and
introducing heavy-weight objects to wrap info. PEP calls it
"lightweight DirEntry objects", but that cannot be true, because all
Python objects are heavy-weight, especially those which have methods.

It would be better if os.scandir() was specified to return a struct
(named tuple) compatible with return value of os.stat() (with only
fields relevant to underlying readdir()-like system call). The grounds
for that are obvious: it's already existing data interface in module
"os", which is also based on open standard for operating systems -
POSIX, so if one is to expect something about file attributes, it's
what one can reasonably base expectations on.


But reusing os.stat struct is glaringly not what's proposed. And
it's clear where that comes from - "[DirEntry.]lstat(): like os.lstat(),
but requires no system calls on Windows". Nice, but OS "FooBar" can do
much more than Windows - it has a system call to send a file by email,
right when scanning a directory containing it. So, why not to have
DirEntry.send_by_email(recipient) method? I hear the answer - it's
because CPython strives to support Windows well, while doesn't care
about "FooBar" OS.

And then it again leads to the question I posed several times - where's
line between "CPython" and "Python"? Is it grounded for CPython to add
(or remove) to Python stdlib something which is useful for its users,
but useless or complicating for other Python implementations?
Especially taking into account that there's "win32api" module allowing
Windows users to use all wonders of its API? Especially that os.stat
struct is itself pretty extensible
(https://docs.python.org/3.4/library/os.html#os.stat : "On other Unix
systems (such as FreeBSD), the following attributes may be
available ...", "On Mac OS systems...", - so extra fields can be added
for Windows just the same, if really needed).


> 
> http://legacy.python.org/dev/peps/pep-0471/
> 
> Would love feedback on the PEP, but also of course on the proposal
> itself.
> 
> -Ben
> 

[]

-- 
Best regards,
 Paul                          mailto:pmiscml at gmail.com


More information about the Python-Dev mailing list