
On Mon, 25 Nov 2013 12:04:28 +1300 Ben Hoyt <benhoyt@gmail.com> wrote:
Right now, pathlib doesn't cache. Guido decided it was safer to start off like that, and perhaps later we can add some optional caching.
One reason caching didn't go in is that it's not clear which API is best. Working on plugging scandir() into pathlib would actually help in choosing a stat-caching API.
(or, rather, lstat-caching...)
The other related thing is that DirEntry only provides .lstat(), because it's providing stat-like info without following links.
Path.is_dir() and friends use stat(), i.e. they inform you about whether a symlink's target is a directory (not the symlink itself). Of course, if the DirEntry says the path is a symlink, Path.is_dir() could then run stat() to find out about the target.
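A minimal sketch of that fallback. It is written against os.scandir() as it later shipped in Python 3.5, where DirEntry exposes is_symlink() and is_dir(follow_symlinks=False) instead of the .lstat() method discussed here. The helper name is_dir_following_links is hypothetical. It trusts the link-level information from the directory scan and only pays for an extra stat() when the entry is a symlink.

```python
import os
import stat


def is_dir_following_links(entry: os.DirEntry) -> bool:
    """Answer a Path.is_dir()-style question (symlinks followed) from a DirEntry.

    Hypothetical helper: uses the lstat-like information gathered during the
    scan, and only calls os.stat() on the target when the entry is a symlink.
    """
    if not entry.is_symlink():
        # Not a link: the link-level answer is also the target-level answer.
        return entry.is_dir(follow_symlinks=False)
    try:
        # Symlink: ask about the target, as Path.is_dir() would.
        return stat.S_ISDIR(os.stat(entry.path).st_mode)
    except OSError:
        # Missing or unreachable target: treat as "not a directory"
        # (a simplification of Path.is_dir()'s error handling).
        return False
```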
Do you plan to propose scandir() for inclusion in the stdlib?
Yes, I was hoping to propose "os.scandir() -> yields DirEntry objects" for inclusion in the stdlib, and to speed up os.walk() as a result.
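To illustrate where the os.walk() speed-up comes from, here is a simplified top-down walk built on os.scandir() (again the form that later shipped in Python 3.5). The name scandir_walk is hypothetical, and it leaves out os.walk()'s onerror, bottom-up and in-place pruning of dirnames. The saving is that entry.is_dir() can usually be answered from the type information the directory listing already returned, rather than one os.stat() per name as with os.listdir() plus os.path.isdir().

```python
import os
from typing import Iterator, List, Tuple


def scandir_walk(top: str) -> Iterator[Tuple[str, List[str], List[str]]]:
    """Hypothetical, simplified os.walk(): top-down, no error handling, no pruning."""
    dirnames: List[str] = []
    filenames: List[str] = []
    subdirs: List[str] = []  # real (non-symlink) directories to descend into
    with os.scandir(top) as entries:
        for entry in entries:
            # is_dir() follows symlinks like os.path.isdir(), but for ordinary
            # entries it can usually be answered from the listing itself.
            if entry.is_dir():
                dirnames.append(entry.name)
                if not entry.is_symlink():
                    subdirs.append(entry.path)
            else:
                filenames.append(entry.name)
    yield top, dirnames, filenames
    # Like os.walk(followlinks=False): list symlinked dirs, but don't descend.
    for path in subdirs:
        yield from scandir_walk(path)
```

On a small tree it produces the same (dirpath, dirnames, filenames) triples as os.walk(), up to ordering.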
However, pathlib's API, with .is_dir(), .lstat() etc., is so close to DirEntry that I'd be much keener to roll up the scandir functionality into pathlib's iterdir(), as that's already going in the standard library, and iterdir() already returns Path objects.
We could still expose scandir() as a low-level API, *and* call it in pathlib for optimizations.
We could do Path.lstat(cached=True), but we'd also really want is_dir(cached=True), so that API kinda sucks. Alternatively, you could have iterdir(cached=True) return PathWithCachedStat-style objects; probably better, but kinda messy.
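A rough sketch of the "PathWithCachedStat-style objects" option, assuming a wrapper around pathlib.Path rather than a subclass. CachedStatPath and iterdir_cached are hypothetical names standing in for iterdir(cached=True). Each object carries the lstat() result captured during the scan, and is_dir() keeps Path.is_dir()'s link-following semantics by re-checking only symlinks.

```python
import os
import stat
from pathlib import Path
from typing import Iterator


class CachedStatPath:
    """Hypothetical PathWithCachedStat: a Path plus the lstat() result from a scan."""

    def __init__(self, path: Path, lstat_result: os.stat_result) -> None:
        self.path = path
        self._lstat = lstat_result

    def lstat(self) -> os.stat_result:
        # Served from the cache: no system call.
        return self._lstat

    def is_symlink(self) -> bool:
        return stat.S_ISLNK(self._lstat.st_mode)

    def is_dir(self) -> bool:
        # Same meaning as Path.is_dir() (links followed): only symlinks
        # need a fresh stat() of their target.
        if self.is_symlink():
            return self.path.is_dir()
        return stat.S_ISDIR(self._lstat.st_mode)


def iterdir_cached(directory: Path) -> Iterator[CachedStatPath]:
    """Hypothetical iterdir(cached=True), built on os.scandir()."""
    with os.scandir(directory) as entries:
        for entry in entries:
            # stat(follow_symlinks=False) is an lstat(). On Windows it comes from
            # the directory listing; on POSIX it costs one system call per entry.
            yield CachedStatPath(Path(entry.path), entry.stat(follow_symlinks=False))
```

The "messy" part is visible here: CachedStatPath is a second, parallel type that callers must know about, rather than an ordinary Path.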
Perhaps Path.enable_caching()? It would enable caching not only on this path object, but also on all objects constructed from it.

Regards

Antoine.
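One way the enable_caching() idea could behave, sketched with a hypothetical CachingPath wrapper (pathlib has no such method). The key point is that the caching flag propagates: paths derived through `/`, .parent or iterdir() inherit it. Only lstat() is memoized here, and the cache never refreshes, which illustrates the staleness trade-off that made starting without caching the safer choice.

```python
import os
from pathlib import Path
from typing import Iterator, Optional, Union


class CachingPath:
    """Hypothetical sketch of Path.enable_caching(): a flag inherited by derived paths."""

    def __init__(self, path: Union[str, Path], caching: bool = False) -> None:
        self._path = Path(path)
        self._caching = caching
        self._lstat_cache: Optional[os.stat_result] = None

    def enable_caching(self) -> "CachingPath":
        self._caching = True
        return self

    def _derive(self, path: Path) -> "CachingPath":
        # Every object constructed from this one inherits the caching flag.
        return CachingPath(path, caching=self._caching)

    def __truediv__(self, other: str) -> "CachingPath":
        return self._derive(self._path / other)

    @property
    def parent(self) -> "CachingPath":
        return self._derive(self._path.parent)

    def iterdir(self) -> Iterator["CachingPath"]:
        for child in self._path.iterdir():
            yield self._derive(child)

    def lstat(self) -> os.stat_result:
        if not self._caching:
            return self._path.lstat()
        if self._lstat_cache is None:
            # First call hits the filesystem; later calls reuse the result,
            # so changes on disk are not seen by this object afterwards.
            self._lstat_cache = self._path.lstat()
        return self._lstat_cache

    def __fspath__(self) -> str:
        return os.fspath(self._path)

    def __repr__(self) -> str:
        return f"CachingPath({str(self._path)!r}, caching={self._caching})"
```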