Re: [Python-Dev] PEP 471 -- os.scandir() function -- a better and faster directory iterator

June 30, 2014

      ...
So, here's my alternative proposal: add an "ensure_lstat" flag to
scandir() itself, and don't have *any* methods on DirEntry, only
attributes.
That would make the DirEntry attributes:
is_dir: boolean, always populated
    is_file: boolean, always populated
    is_symlink boolean, always populated
    lstat_result: stat result, may be None on POSIX systems if
ensure_lstat is False
(I'm not particularly sold on "lstat_result" as the name, but "lstat"
reads as a verb to me, so doesn't sound right as an attribute name)
What this would allow:
- by default, scanning is efficient everywhere, but lstat_result may
be None on POSIX systems
- if you always need the lstat result, setting "ensure_lstat" will
trigger the extra system call implicitly
- if you only sometimes need the stat result, you can call os.lstat()
explicitly when the DirEntry lstat attribute is None
Most importantly, *regardless of platform*, the cached stat result (if
not None) would reflect the state of the entry at the time the
directory was scanned, rather than at some arbitrary later point in
time when lstat() was first called on the DirEntry object.
There'd still be a slight window of discrepancy (since the filesystem
state may change between reading the directory entry and making the
lstat() call), but this could be effectively eliminated from the
perspective of the Python code by making the result of the lstat()
call authoritative for the whole DirEntry object.
Yeah, I quite like this. It does make the caching more explicit and
consistent. It's slightly annoying that it's less like pathlib.Path
now, but DirEntry was never pathlib.Path anyway, so maybe it doesn't
matter. The differences in naming may highlight the difference in
caching, so maybe it's a good thing.

Two further questions from me:

1) How does error handling work? Now os.stat() will/may be called
during iteration, so in __next__. But it hard to catch errors because
you don't call __next__ explicitly. Is this a problem? How do other
iterators that make system calls or raise errors handle this?

2) There's still the open question in the PEP of whether to include a
way to access the full path. This is cheap to build, it has to be
built anyway on POSIX systems, and it's quite useful for further
operations on the file. I think the best way to handle this is a
.fullname or .full_name attribute as suggested elsewhere. Thoughts?

-Ben

Re: [Python-Dev] PEP 471 -- os.scandir() function -- a better and faster directory iterator

Ben Hoyt