On 29 June 2014 05:48, Ben Hoyt firstname.lastname@example.org wrote:
But the underlying system calls -- ``FindFirstFile`` / ``FindNextFile`` on Windows and ``readdir`` on Linux and OS X --
What about FreeBSD, OpenBSD, NetBSD, Solaris, etc.? Don't they provide readdir?
I guess it'd be better to say "Windows" and "Unix-based OSs" throughout the PEP? Because all of these (including Mac OS X) are Unix-based.
*nix and POSIX-based are the two conventions I use.
Crazy idea: would it be possible to "convert" a DirEntry object to a pathlib.Path object without losing the cache? I guess that pathlib.Path expects a full stat_result object.
The main problem is that pathlib.Path objects explicitly don't cache stat info (and Guido doesn't want them to, for good reason I think). There was a thread on python-dev about this earlier. I'll add it to a "Rejected ideas" section.
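To make the conversion question concrete, here is a minimal sketch. It assumes the os.scandir() API as it eventually shipped in Python 3.5+ (which may differ from the draft under discussion); the helper name entry_to_path_sizes and the file name are made up for illustration. Building a pathlib.Path from entry.path works, but the Path carries no cached stat data, so every stat() call goes back to the filesystem.

```python
import os
import pathlib
import tempfile


def entry_to_path_sizes():
    """Convert a DirEntry to a Path and stat it before and after a write."""
    with tempfile.TemporaryDirectory() as tmp:
        target = os.path.join(tmp, "example.txt")
        with open(target, "w") as f:
            f.write("12")

        # Scan the directory; it contains exactly one entry.
        with os.scandir(tmp) as entries:
            entry = next(entries)

        # The conversion keeps only the path string, not any stat data.
        path = pathlib.Path(entry.path)

        before = path.stat().st_size  # fresh system call
        with open(target, "a") as f:
            f.write("345")
        after = path.stat().st_size   # another fresh system call

        return path.name, before, after
```

The two sizes differ because the Path object re-queries the filesystem on each call, which is the behaviour the "no caching" decision is meant to guarantee.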
The key problem with caches on pathlib.Path objects is that you could end up with two separate path objects that referred to the same filesystem location but returned different answers about the filesystem state because their caches might be stale. DirEntry is different, as the content is generally *assumed* to be stale (referring to when the directory was scanned, rather than the current filesystem state). DirEntry.lstat() on POSIX systems will be an exception to that general rule (referring to the time of first lookup, rather than when the directory was scanned, so the answer from lstat() may be inconsistent with other data stored directly on the DirEntry object), but one we can probably live with.
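The staleness described above can be observed directly. This is a sketch only, assuming the os.scandir() API as shipped in Python 3.5+, where entry.stat(follow_symlinks=False) plays the role of the lstat() method discussed here; the function name demonstrate_stale_entry is hypothetical.

```python
import os
import tempfile


def demonstrate_stale_entry():
    """Show that a DirEntry keeps its first stat result after the file changes."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "data.txt")
        with open(path, "w") as f:
            f.write("abc")

        with os.scandir(tmp) as entries:
            entry = next(entries)

        # First lookup: the result is cached on the entry from here on.
        first = entry.stat(follow_symlinks=False)

        # Grow the file after the entry has been populated.
        with open(path, "a") as f:
            f.write("defgh")

        cached = entry.stat(follow_symlinks=False)  # served from the entry's cache
        fresh = os.lstat(path)                      # asks the filesystem again

        return first.st_size, cached.st_size, fresh.st_size
```

The entry keeps reporting the size it saw on first lookup, while os.lstat() reports the current size. That is acceptable for DirEntry precisely because callers are expected to treat it as a snapshot.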
More generally, as part of the pathlib PEP review, we figured out that a *per-object* cache of filesystem state would be an inherently bad idea, but a string-based *process-global* cache might make sense for modules like walkdir (not part of the stdlib - it's an iterator-pipeline-based approach to file tree scanning I wrote a while back, which currently suffers badly from the performance impact of repeated stat calls at different stages of the pipeline). We realised this was getting into a space where application- and library-specific concerns are likely to start affecting the caching design, though, so the current status of standard-library-level stat caching is "it's not clear if there's an available approach that would be sufficiently general purpose to be appropriate for inclusion in the standard library".