Re: [Python-Dev] PEP 471 -- os.scandir() function -- a better and faster directory iterator

June 27, 2014

Hello,

On Thu, 26 Jun 2014 21:52:43 -0400
Ben Hoyt <benhoyt@gmail.com> wrote:

[]
...
> It's a fair point that os.walk() can be implemented efficiently
> without adding a new function and API. However, often you'll want more
> info, like the file size, which scandir() can give you via
> DirEntry.lstat(), which is free on Windows. So opening up this
> efficient API is beneficial.
>
> In CPython, I think the DirEntry objects are as lightweight as
> stat_result objects.
>
> I'm an embedded developer by background, so I know the constraints
> here, but I really don't think Python's development should be tailored
> to fit MicroPython. If os.scandir() is not very efficient on
> MicroPython, so be it -- 99% of all desktop/server users will gain
> from it.

Surely, tailoring Python to MicroPython's needs is not at all what I
suggest. It was an example of an alternative implementation which
optimized os.walk() without the need for any additional public module
APIs. Conversely, the high-level nature of an API call like os.walk()
and the underspecification of low-level details (like which functions
are implemented in terms of which others) allow MicroPython to provide
an optimized implementation even with its resource constraints. So, the
power of high-level interfaces and underspecification should not be
underestimated ;-).
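
To make the point about os.walk() concrete, here is a minimal sketch of a walk-style generator layered on a directory iterator. It assumes the os.scandir() API as it was later released in the standard library, not the draft under discussion in this thread; the name walk_sketch is hypothetical, and error handling and the followlinks/onerror options of the real os.walk() are left out.

```python
import os


def walk_sketch(top):
    """Yield (dirpath, dirnames, filenames) top-down, like os.walk().

    Illustrative only: no error handling and no followlinks/onerror
    options. Unlike os.walk(), a symlink to a directory is reported
    among the file names here.
    """
    dirnames, filenames = [], []
    with os.scandir(top) as entries:
        for entry in entries:
            # The entry type usually comes from the directory listing
            # itself, so no separate stat() call per name is needed.
            if entry.is_dir(follow_symlinks=False):
                dirnames.append(entry.name)
            else:
                filenames.append(entry.name)
    yield top, dirnames, filenames
    for name in dirnames:
        yield from walk_sketch(os.path.join(top, name))
```

The caller sees only the os.walk()-shaped result; which primitive sits underneath is an implementation detail, which is the underspecification argument made above.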

But I don't want to argue that os.scandir() is "not needed", because
that's hardly productive. Something I'd like to prototype in uPy and
ideally take further up to PEP status is adding iterator-based string
methods, and I can pretty much expect a "we lived without it" response,
so I don't want to go the same way regarding the addition of other
iterator-based APIs - it's clear that more iterator/generator-based APIs
are a good direction for Python to evolve in.
...
...
>> It would be better if os.scandir() was specified to return a struct
>> (named tuple) compatible with return value of os.stat() (with only
>> fields relevant to underlying readdir()-like system call). The
>> grounds for that are obvious: it's already existing data interface
>> in module "os", which is also based on open standard for operating
>> systems - POSIX, so if one is to expect something about file
>> attributes, it's what one can reasonably base expectations on.
>
> Yes, we considered this early on (see the python-ideas and python-dev
> threads referenced in the PEP), but decided it wasn't a great API to
> overload stat_result further, and have most of the attributes None or
> not present on Linux.
[]
...
> However, for scandir() to be useful, you also need the name. My
> original version of this directory iterator returned two-tuples of
> (name, stat_result). But most people didn't like the API, and I don't
> really either. You could overload stat_result with a .name attribute
> in this case, but it still isn't a nice API to have most of the
> attributes None, and then you have to test for that, etc.

Yes, returning (name, stat_result) would be my first inclination too; I
don't see why someone wouldn't like a pair of 2 values, each of obvious
type and semantics within the "os" module. Regarding the stat result,
os.stat() provides full information about a file, and intuitively, one
may expect that os.scandir() would provide a subset of that info,
asymptotically approaching the volume of what os.stat() may provide,
depending on OS capabilities. So, if a truly OS-independent interface is
wanted to salvage more data from a dir scan, using the os.stat struct as
the data interface is hard to ignore.
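
For comparison, the (name, stat_result) shape can be approximated on top of the API that was later released. This is only a sketch: scandir_pairs is a hypothetical helper, not something proposed in the PEP. It returns a full stat_result for every entry, which on POSIX systems generally costs one extra system call per name; avoiding that cost, or filling unknown fields with None, is the tension discussed above.

```python
import os


def scandir_pairs(path):
    """Hypothetical helper: yield (name, stat_result) for each entry."""
    with os.scandir(path) as entries:
        for entry in entries:
            # follow_symlinks=False gives lstat()-style results, i.e.
            # information about a symlink itself, not its target.
            yield entry.name, entry.stat(follow_symlinks=False)


if __name__ == "__main__":
    # Usage: print each name in the current directory with its size.
    for name, st in scandir_pairs("."):
        print(name, st.st_size)
```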

But well, if it was rejected already, what can be said? Perhaps, at
least the PEP could be extended to explicitly mention the other
approaches which were discussed and rejected, not just link to a
discussion archive (from my experience reading other PEPs, they often
contain such subsections, so I hope this suggestion is not unfounded).
...
> So basically we tweaked the API to do what was best, and ended up with
> it returning DirEntry objects with is_file() and similar methods.
>
> Hope that helps give a bit more context. If you haven't read the
> relevant python-ideas and python-dev threads, those are interesting
> too.
>
> -Ben
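
As an illustration of the DirEntry style described in the quote above, the following sketch sums the sizes of regular files in one directory. It assumes the os module as released in Python 3.5 and later, where the draft's DirEntry.lstat() became DirEntry.stat(follow_symlinks=False); total_file_size is a hypothetical name.

```python
import os


def total_file_size(path):
    """Sum the sizes of regular files directly inside `path`."""
    total = 0
    with os.scandir(path) as entries:
        for entry in entries:
            # is_file() can usually be answered from the directory
            # listing; stat() is only called for entries that pass.
            if entry.is_file(follow_symlinks=False):
                total += entry.stat(follow_symlinks=False).st_size
    return total
```

Subdirectories and symlinks are skipped, so the result covers a single directory level only.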
-- 
Best regards,
 Paul                          mailto:pmiscml@gmail.com

Paul Sokolovsky