[Python-Dev] PEP 471 -- os.scandir() function -- a better and faster directory iterator

Ben Hoyt benhoyt at gmail.com
Fri Jun 27 03:52:43 CEST 2014


> os.listdir() when I worked on "os" module for MicroPython. I essentially
> did what your PEP suggests - introduced internal generator function
> (ilistdir_ex() in
> https://github.com/micropython/micropython-lib/blob/master/os/os/__init__.py#L85
> ), in terms of which both os.listdir() and os.walk() are implemented.

Nice (though I see the implementation is very *nix specific).
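
For anyone following along, the shape being described (one internal
generator, with listdir() and walk() layered on top) can be sketched
roughly as below. This is only an illustration, not MicroPython's
ilistdir_ex(): iter_entries() is a hypothetical name, and it uses
os.scandir() as it later shipped in CPython as a stand-in for the
platform-specific readdir loop.

```python
import os

def iter_entries(path):
    """Hypothetical shared generator: yield (name, is_dir) per entry."""
    with os.scandir(path) as entries:
        for entry in entries:
            # is_dir() can usually be answered from the directory listing
            # itself, without a separate stat() call per entry.
            yield entry.name, entry.is_dir(follow_symlinks=False)

def listdir(path):
    """listdir() is just the names from the shared generator."""
    return [name for name, _ in iter_entries(path)]

def walk(top):
    """Simplified top-down walk built on the same generator."""
    dirs, files = [], []
    for name, is_dir in iter_entries(top):
        (dirs if is_dir else files).append(name)
    yield top, dirs, files
    for name in dirs:
        yield from walk(os.path.join(top, name))
```

The real os.walk() also handles errors, bottom-up order and symlink
following, which this sketch leaves out.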

> With my MicroPython hat on, os.scandir() would make things only worse.
> With current interface, one can either have inefficient implementation
> (like CPython chose) or efficient implementation (like MicroPython
> chose) - all transparently. os.scandir() supposedly opens up efficient
> implementation for everyone, but at the price of bloating API and
> introducing heavy-weight objects to wrap info. PEP calls it
> "lightweight DirEntry objects", but that cannot be true, because all
> Python objects are heavy-weight, especially those which have methods.

It's a fair point that os.walk() can be implemented efficiently
without adding a new function and API. However, you'll often want more
info, such as the file size, which scandir() can give you via
DirEntry.lstat(). On Windows that needs no extra system call, because
the directory listing already returns the data. So opening up this
efficient API is beneficial.
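
As a rough sketch of the file-size case: the snippet below assumes the
API as it eventually shipped in Python 3.5+, where the draft's
DirEntry.lstat() became DirEntry.stat(follow_symlinks=False).
total_size() is a hypothetical helper name used only for illustration.

```python
import os

def total_size(path):
    """Sum the sizes of regular files directly inside path (non-recursive)."""
    total = 0
    with os.scandir(path) as entries:
        for entry in entries:
            # Skip directories and symlinks; count only regular files.
            if entry.is_file(follow_symlinks=False):
                # On Windows this is served from data cached during the
                # directory scan; on POSIX it costs one lstat() per file.
                total += entry.stat(follow_symlinks=False).st_size
    return total
```

With os.listdir() the same loop would need an explicit os.lstat() call
per name on every platform.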

In CPython, I think the DirEntry objects are as lightweight as
stat_result objects.

I'm an embedded developer by background, so I know the constraints
here, but I really don't think Python's development should be tailored
to fit MicroPython. If os.scandir() is not very efficient on
MicroPython, so be it -- 99% of all desktop/server users will gain
from it.

> It would be better if os.scandir() was specified to return a struct
> (named tuple) compatible with return value of os.stat() (with only
> fields relevant to underlying readdir()-like system call). The grounds
> for that are obvious: it's already existing data interface in module
> "os", which is also based on open standard for operating systems -
> POSIX, so if one is to expect something about file attributes, it's
> what one can reasonably base expectations on.

Yes, we considered this early on (see the python-ideas and python-dev
threads referenced in the PEP), but decided that overloading
stat_result further wasn't a great API, since most of its attributes
would be None or not present on Linux.

> Especially that os.stat struct is itself pretty extensible
> (https://docs.python.org/3.4/library/os.html#os.stat : "On other Unix
> systems (such as FreeBSD), the following attributes may be
> available ...", "On Mac OS systems...", - so extra fields can be added
> for Windows just the same, if really needed).

Yes. Incidentally, I just submitted an (accepted) patch for Python 3.5
that adds the full Win32 file attribute data to stat_result objects on
Windows (see https://docs.python.org/3.5/whatsnew/3.5.html#os).

However, for scandir() to be useful, you also need the name. My
original version of this directory iterator returned two-tuples of
(name, stat_result). But most people didn't like that API, and I don't
really like it either. You could overload stat_result with a .name
attribute in this case, but an API where most of the attributes are
None still isn't nice, because callers then have to test for that
everywhere.

So basically we tweaked the API to do what was best, and ended up with
it returning DirEntry objects with is_file() and similar methods.
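
To give a feel for how that reads in practice, here is a minimal
sketch, again assuming the API as it shipped in Python 3.5+.
split_entries() is a hypothetical helper, not part of the os module.

```python
import os

def split_entries(path):
    """Return (dirs, files) name lists for path, sorted for stable output."""
    dirs, files = [], []
    with os.scandir(path) as entries:
        for entry in entries:
            # The methods answer the common questions directly, so there
            # are no None-valued stat fields for the caller to check.
            if entry.is_dir(follow_symlinks=False):
                dirs.append(entry.name)
            elif entry.is_file(follow_symlinks=False):
                files.append(entry.name)
    return sorted(dirs), sorted(files)
```

Entries that are neither (symlinks, sockets and so on) are simply
skipped here.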

Hope that helps give a bit more context. If you haven't read the
relevant python-ideas and python-dev threads, those are interesting
too.

-Ben

