
So, here's my alternative proposal: add an "ensure_lstat" flag to scandir() itself, and don't have *any* methods on DirEntry, only attributes.
That would make the DirEntry attributes:
  is_dir: boolean, always populated
  is_file: boolean, always populated
  is_symlink: boolean, always populated
  lstat_result: stat result, may be None on POSIX systems if ensure_lstat is False
(I'm not particularly sold on "lstat_result" as the name, but "lstat" reads as a verb to me, so doesn't sound right as an attribute name)
What this would allow:
- by default, scanning is efficient everywhere, but lstat_result may be None on POSIX systems
- if you always need the lstat result, setting "ensure_lstat" will trigger the extra system call implicitly
- if you only sometimes need the stat result, you can call os.lstat() explicitly when the DirEntry lstat_result attribute is None
Most importantly, *regardless of platform*, the cached stat result (if not None) would reflect the state of the entry at the time the directory was scanned, rather than at some arbitrary later point in time when lstat() was first called on the DirEntry object.
There'd still be a slight window of discrepancy (since the filesystem state may change between reading the directory entry and making the lstat() call), but this could be effectively eliminated from the perspective of the Python code by making the result of the lstat() call authoritative for the whole DirEntry object.
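To make the shape of this proposal concrete, here is a rough sketch that emulates it on top of os.scandir() as it exists in Python 3.6+. Everything named here (scandir_sketch, SketchEntry, total_size) is hypothetical and for illustration only; there is no "ensure_lstat" flag in the real os module, and the real DirEntry exposes methods rather than attributes. The sketch assumes that on Windows the lstat data comes for free with the directory read, and that elsewhere it costs an extra system call.

```python
import os
from collections import namedtuple

# Hypothetical attribute-only entry, mirroring the attribute list above.
SketchEntry = namedtuple(
    "SketchEntry", "name is_dir is_file is_symlink lstat_result"
)


def scandir_sketch(path=".", ensure_lstat=False):
    """Yield attribute-only entries for path (illustration, not a real os API)."""
    with os.scandir(path) as it:
        for entry in it:
            st = None
            # Assumption: on Windows the stat data is already cached by the
            # directory read, so populating it is free; on POSIX we only pay
            # for the lstat() when the caller asks for it.
            if ensure_lstat or os.name == "nt":
                st = entry.stat(follow_symlinks=False)
            yield SketchEntry(
                name=entry.name,
                is_dir=entry.is_dir(follow_symlinks=False),
                is_file=entry.is_file(follow_symlinks=False),
                is_symlink=entry.is_symlink(),
                lstat_result=st,
            )


def total_size(path):
    """Sum file sizes, calling os.lstat() only when the entry has no cached result."""
    total = 0
    for e in scandir_sketch(path):
        if e.is_file:
            st = e.lstat_result
            if st is None:
                st = os.lstat(os.path.join(path, e.name))
            total += st.st_size
    return total
```

The total_size() helper shows the "only sometimes need the stat result" case: it filters on the always-populated flags first and falls back to an explicit os.lstat() only for the entries it actually cares about.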
Yeah, I quite like this. It does make the caching more explicit and consistent. It's slightly annoying that it's less like pathlib.Path now, but DirEntry was never pathlib.Path anyway, so maybe it doesn't matter. The differences in naming may highlight the difference in caching, so maybe it's a good thing.
Two further questions from me:
1) How does error handling work? Now os.stat() will/may be called during iteration, so in __next__. But it's hard to catch errors because you don't call __next__ explicitly. Is this a problem? How do other iterators that make system calls or raise errors handle this?
2) There's still the open question in the PEP of whether to include a way to access the full path. This is cheap to build, it has to be built anyway on POSIX systems, and it's quite useful for further operations on the file. I think the best way to handle this is a .fullname or .full_name attribute as suggested elsewhere.
Thoughts?
-Ben