On 29 June 2014 21:45, Paul Moore email@example.com wrote:
On 29 June 2014 12:08, Nick Coghlan firstname.lastname@example.org wrote:
This is what makes me wary of including lstat, even though Windows offers it without the extra stat call. Caching behaviour is *really* hard to make intuitive, especially when it *sometimes* returns data that looks fresh (as it does on first call on POSIX systems).
If it matters that much we *could* simply call it cached_lstat(). It's ugly, but I really don't like the idea of throwing the information away - after all, the fact that we currently throw data away is why there's even a need for scandir. Let's not make the same mistake again...
Future-proofing is the reason DirEntry is a full-fledged class in the first place, though.
Effectively communicating the behavioural difference between DirEntry and pathlib.Path is the main thing that makes me nervous about adhering too closely to the Path API.
To restate the problem and the alternative proposal, these are the DirEntry methods under discussion:
    is_dir(): like os.path.isdir(), but requires no system calls on at least POSIX and Windows
    is_file(): like os.path.isfile(), but requires no system calls on at least POSIX and Windows
    is_symlink(): like os.path.islink(), but requires no system calls on at least POSIX and Windows
    lstat(): like os.lstat(), but requires no system calls on Windows
For the almost-certain-to-be-cached items, the suggestion is to make them properties (or just ordinary attributes):
    is_dir
    is_file
    is_symlink
What to do with lstat() is currently less clear, since POSIX directory scanning doesn't provide that level of detail by default.
The PEP also doesn't currently state whether the is_dir(), is_file() and is_symlink() results would be updated if a call to lstat() produced different answers than the original directory scanning process, which further suggests to me that allowing the stat call to be delayed on POSIX systems is a potentially problematic and inherently confusing design. We would have two options:
- update them, meaning calling lstat() may change those results from being a snapshot of the setting at the time the directory was scanned
- leave them alone, meaning the DirEntry object and the DirEntry.lstat() result may give different answers
Those both sound ugly to me.
So, here's my alternative proposal: add an "ensure_lstat" flag to scandir() itself, and don't have *any* methods on DirEntry, only attributes.
That would make the DirEntry attributes:
    is_dir: boolean, always populated
    is_file: boolean, always populated
    is_symlink: boolean, always populated
    lstat_result: stat result, may be None on POSIX systems if ensure_lstat is False
(I'm not particularly sold on "lstat_result" as the name, but "lstat" reads as a verb to me, so doesn't sound right as an attribute name)
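A minimal sketch of the attribute-only design, for illustration only. SketchEntry and scandir_sketch are hypothetical names, and ensure_lstat is the flag proposed in this message, not a parameter of any existing function. The sketch leans on the os.scandir() that eventually shipped (used as a context manager, so Python 3.6+) to obtain the per-entry type information.

```python
import os


class SketchEntry:
    """Hypothetical attribute-only entry; not the real os.DirEntry."""

    # __slots__ avoids a per-instance __dict__, keeping each entry small.
    __slots__ = ("name", "path", "is_dir", "is_file", "is_symlink",
                 "lstat_result")

    def __init__(self, name, path, is_dir, is_file, is_symlink, lstat_result):
        self.name = name
        self.path = path
        self.is_dir = is_dir
        self.is_file = is_file
        self.is_symlink = is_symlink
        self.lstat_result = lstat_result


def scandir_sketch(path=".", ensure_lstat=False):
    """Yield SketchEntry objects for the entries in `path`.

    `ensure_lstat` is the flag proposed in this thread (an assumption of
    this sketch). When it is False, lstat_result is filled in only where
    the platform supplies it without an extra call (Windows).
    """
    with os.scandir(path) as entries:
        for entry in entries:
            if ensure_lstat or os.name == "nt":
                # On Windows this reuses data from the directory scan;
                # elsewhere it makes one lstat() call at scan time.
                st = entry.stat(follow_symlinks=False)
            else:
                st = None
            yield SketchEntry(
                entry.name,
                entry.path,
                entry.is_dir(follow_symlinks=False),
                entry.is_file(follow_symlinks=False),
                entry.is_symlink(),
                st,
            )
```

Everything is computed while the directory is being scanned, so nothing on the entry changes after it has been handed to the caller.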
What this would allow:
- by default, scanning is efficient everywhere, but lstat_result may be None on POSIX systems
- if you always need the lstat result, setting "ensure_lstat" will trigger the extra system call implicitly
- if you only sometimes need the stat result, you can call os.lstat() explicitly when the DirEntry lstat attribute is None
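The third case could look something like the following. Entry is a hypothetical stand-in for the proposed attribute-only DirEntry, and lstat_for is an illustrative helper name, not part of any proposal.

```python
import os
from collections import namedtuple

# Hypothetical stand-in for the proposed attribute-only DirEntry.
Entry = namedtuple("Entry", "path lstat_result")


def lstat_for(entry):
    """Return the cached lstat result, or make the system call if absent."""
    if entry.lstat_result is not None:
        return entry.lstat_result
    # No cached data (the default POSIX case): pay for the call explicitly.
    return os.lstat(entry.path)
```

The extra system call is visible in the caller's code, so it is clear which results are scan-time snapshots and which are fresh.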
Most importantly, *regardless of platform*, the cached stat result (if not None) would reflect the state of the entry at the time the directory was scanned, rather than at some arbitrary later point in time when lstat() was first called on the DirEntry object.
There'd still be a slight window of discrepancy (since the filesystem state may change between reading the directory entry and making the lstat() call), but this could be effectively eliminated from the perspective of the Python code by making the result of the lstat() call authoritative for the whole DirEntry object, i.e. deriving is_dir, is_file and is_symlink from it whenever it is available.
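One way to make the lstat() result authoritative is to derive the three booleans from its st_mode whenever it is present. A minimal sketch, assuming the flags describe the entry itself and do not follow symlinks; flags_from_lstat is a hypothetical helper name.

```python
import os
import stat


def flags_from_lstat(st):
    """Derive the entry-type flags from an os.lstat() result."""
    mode = st.st_mode
    return {
        "is_dir": stat.S_ISDIR(mode),
        "is_file": stat.S_ISREG(mode),
        "is_symlink": stat.S_ISLNK(mode),
    }
```

With ensure_lstat set, the scanner would use these values in place of whatever the directory read reported, so the flags and lstat_result can never disagree.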
P.S. We'd be generating quite a few of these, so we can use __slots__ to keep the memory overhead to a minimum (that's just a general comment - it's really irrelevant to the methods-or-attributes question).