I couldn't help myself and coded up a prototype for the StatCache design I sketched. See http://bugs.python.org/issue26031. Feedback welcome! On my Mac it only seems to offer limited benefits though...

On Wed, Jan 6, 2016 at 8:48 AM, Guido van Rossum <guido@python.org> wrote:
On Wed, Jan 6, 2016 at 8:11 AM, Random832 <random832@fastmail.com> wrote:
On Tue, Jan 5, 2016, at 16:04, Guido van Rossum wrote:
> One problem with stat() caching is that Path objects are considered
> immutable, and two Path objects referring to the same path are completely
> interchangeable. For example, {pathlib.Path('/a'), pathlib.Path('/a')} is
> a set of length 1: {PosixPath('/a')}. But if we had e.g. Path('/a',
> cache_stat=True), the behavior of two instances of that object might be
> observably different (if they were instantiated at times when the
> contents of the filesystem were different). So maybe stat-caching Path instances
> should be considered unequal, or perhaps unhashable. Or perhaps they
> should only be considered equal if their stat() values are actually equal (i.e.
> if the file's stat() info didn't change).

What about a global cache?
 
It would have to use a weak dict so if the last reference goes away it discards the cached stats for a given path, otherwise you'd have trouble containing the cache size.

And caching Path objects should still not be comparable to non-caching Path objects (which we will need to keep, to preserve the semantics that repeatedly calling stat() on a Path object created the default way always redoes the syscall). The main advantage would be that caching Path objects could be compared safely.

It could still cause unexpected results. E.g. if you have just traversed some big tree using caching, and saved some results (so hanging on to some paths and hence their stat() results), and then you make some changes and traverse it again to look for something else, you might accidentally be seeing stale (i.e. cached) stat() results.
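
To make the staleness problem concrete, here is a small sketch using a plain module-level dict as the cache. cached_stat and _stat_cache are hypothetical names used only for this illustration:

```python
import os
import tempfile

_stat_cache = {}  # hypothetical global cache: path string -> os.stat_result


def cached_stat(path):
    """Return a cached os.stat() result, calling os.stat() only on first use."""
    key = os.fspath(path)
    if key not in _stat_cache:
        _stat_cache[key] = os.stat(key)
    return _stat_cache[key]


with tempfile.TemporaryDirectory() as tmp:
    target = os.path.join(tmp, "data.txt")
    with open(target, "w") as f:
        f.write("abc")
    first = cached_stat(target).st_size    # first traversal sees 3 bytes

    with open(target, "w") as f:
        f.write("abcdefgh")                # the file changes in between

    second = cached_stat(target).st_size   # second traversal gets the cached value
    fresh = os.stat(target).st_size        # an uncached call sees the new size
```

Here second still reports the old size while fresh reports the new one, which is exactly the kind of surprise a long-lived cache can produce.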

Maybe there's a middle ground, where the user can create a StatCache object and pass it into Path creation and traversal operations. Paths with the same StatCache object (or both None) compare equal if their path components are equal. Paths with different StatCache objects never compare equal (but otherwise are ordered by path as usual -- the StatCache object's identity is only used when the paths are equal).
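
A minimal sketch of those comparison rules, to show they are easy to express. This is not the prototype from the issue: the StatCache interface and the CachedPath wrapper below are assumptions made for illustration, and a real version would live inside pathlib rather than wrap PurePath.

```python
import os
from functools import total_ordering
from pathlib import PurePath


class StatCache:
    """Hypothetical cache object; only its identity matters for comparisons."""

    def __init__(self):
        self._results = {}

    def stat(self, path):
        key = os.fspath(path)
        if key not in self._results:
            self._results[key] = os.stat(key)
        return self._results[key]


@total_ordering
class CachedPath:
    """Hypothetical path wrapper carrying an optional StatCache."""

    def __init__(self, path, cache=None):
        self.pure = PurePath(path)
        self.cache = cache

    def stat(self):
        # Without a cache, every call redoes the syscall.
        if self.cache is None:
            return os.stat(self.pure)
        return self.cache.stat(self.pure)

    def __eq__(self, other):
        if not isinstance(other, CachedPath):
            return NotImplemented
        # Equal only if the components match and the cache is the same object
        # (or both are None).
        return self.pure == other.pure and self.cache is other.cache

    def __hash__(self):
        return hash((self.pure, id(self.cache)))

    def __lt__(self, other):
        if not isinstance(other, CachedPath):
            return NotImplemented
        # Order by path as usual; cache identity only breaks ties.
        if self.pure != other.pure:
            return self.pure < other.pure
        return id(self.cache) < id(other.cache)
```

With this, CachedPath('/a', c) == CachedPath('/a', c) holds for a single StatCache c, while the same path with two different StatCache objects gives two distinct set members.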

Are you (or anyone still reading this) interested in implementing this idea?

--
--Guido van Rossum (python.org/~guido)


