A bit OT, possibly, but this may be a long way around (to a cached *graph* of paths and metadata) with similar use cases:

path.py#walk(), NetworkX edge, node dicts


def walk_path_into_graph(g, path_, errors='warn'):
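The signature above arrives without a body. As a rough sketch only: the version below fills it in with os.walk and plain dicts standing in for path.py and the NetworkX node/edge dicts (both swaps are assumptions, not the original code). The 'nodes'/'edges' layout, the stat fields kept, and the meaning of errors ('warn', 'raise', anything else ignores) are also made up for illustration.

```python
import os
import warnings


def walk_path_into_graph(g, path_, errors='warn'):
    """Record every entry under path_ in g, with lstat() metadata.

    g is a plain dict used as a stand-in graph (an assumption):
      g['nodes'][path]            -> attribute dict for that path
      g['edges'][(parent, child)] -> attribute dict for that edge
    """
    nodes = g.setdefault('nodes', {})
    edges = g.setdefault('edges', {})

    def report(exc):
        # errors='raise' propagates, 'warn' warns, anything else ignores.
        if errors == 'raise':
            raise exc
        if errors == 'warn':
            warnings.warn(str(exc))

    def add_node(path):
        try:
            st = os.lstat(path)  # lstat: do not follow symlinks
        except OSError as exc:
            report(exc)
            return False
        nodes[path] = {'size': st.st_size,
                       'mtime': st.st_mtime,
                       'mode': st.st_mode}
        return True

    add_node(path_)
    for dirpath, dirnames, filenames in os.walk(path_, onerror=report):
        for name in dirnames + filenames:
            child = os.path.join(dirpath, name)
            if add_node(child):
                edges[(dirpath, child)] = {'relation': 'contains'}
    return g
```

With NetworkX the two dict writes would presumably become g.add_node(path, **attrs) and g.add_edge(dirpath, child); the cached stat data is only as fresh as the last walk.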

This stats and reads limited image format metadata as CSV, TSV, JSON: https://github.com/westurner/image_size/blob/ab46de73/get_image_size.py

I suppose that, because of race conditions, this metadata should actually be stored in a filesystem triplestore, together with extended attributes and secontext attributes.

(... gnome-tracker reads filesystem stat data into RDF, for SPARQL).

BSP vertex messaging can probably handle cascading cache invalidation (with supersteps).

On Jan 6, 2016 4:44 PM, "Guido van Rossum" <guido@python.org> wrote:
I couldn't help myself and coded up a prototype for the StatCache design I sketched. See http://bugs.python.org/issue26031. Feedback welcome! On my Mac it only seems to offer limited benefits though...

On Wed, Jan 6, 2016 at 8:48 AM, Guido van Rossum <guido@python.org> wrote:
On Wed, Jan 6, 2016 at 8:11 AM, Random832 <random832@fastmail.com> wrote:
On Tue, Jan 5, 2016, at 16:04, Guido van Rossum wrote:
> One problem with stat() caching is that Path objects are considered
> immutable, and two Path objects referring to the same path are completely
> interchangeable. For example, {pathlib.Path('/a'), pathlib.Path('/a')} is
> a set of length 1: {PosixPath('/a')}. But if we had e.g. Path('/a',
> cache_stat=True), the behavior of two instances of that object might be
> observably different (if they were instantiated at times when the
> contents of the filesystem were different). So maybe stat-caching Path instances
> should be considered unequal, or perhaps unhashable. Or perhaps they
> should only be considered equal if their stat() values are actually equal (i.e.
> if the file's stat() info didn't change).
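The interchangeability described in the quoted paragraph is easy to check. A minimal sketch, using PurePosixPath (my choice, so that it behaves the same on any platform and touches no files):

```python
import pathlib

# Two separately constructed objects naming the same path.
a = pathlib.PurePosixPath('/a')
b = pathlib.PurePosixPath('/a')

# Equal paths hash alike, so the set collapses to a single element.
same = {a, b}
print(a == b, len(same))
```

Any per-instance cached stat() result would have to coexist with this equality, which is the tension the paragraph points at.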

What about a global cache?
It would have to use a weak dict, so that when the last reference to a given path goes away the cached stats for it are discarded; otherwise you'd have trouble containing the cache size.

And caching Path objects should still not be comparable to non-caching Path objects (we need that in order to preserve the semantics that repeatedly calling stat() on a Path object created the default way will always redo the syscall). The main advantage would be that caching Path objects could be compared safely.

It could still cause unexpected results. E.g. if you have just traversed some big tree using caching, and saved some results (so hanging on to some paths and hence their stat() results), and then you make some changes and traverse it again to look for something else, you might accidentally be seeing stale (i.e. cached) stat() results.

Maybe there's a middle ground, where the user can create a StatCache object and pass it into Path creation and traversal operations. Paths with the same StatCache object (or both None) compare equal if their path components are equal. Paths with different StatCache objects never compare equal (but otherwise are ordered by path as usual -- the StatCache object's identity is only used when the paths are equal).
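To make the comparison rules in this paragraph concrete, here is a minimal sketch. It is not the prototype attached to issue26031: the CachingPath wrapper, the invalidate() method, and the use of id() as the tie-breaker are all assumptions made for illustration.

```python
import os
import pathlib


class StatCache:
    """Hypothetical cache: path string -> os.stat_result."""

    def __init__(self):
        self._results = {}

    def stat(self, path):
        key = os.fspath(path)
        if key not in self._results:
            self._results[key] = os.stat(key)  # only the first call hits the OS
        return self._results[key]

    def invalidate(self, path=None):
        """Forget one path, or everything when path is None."""
        if path is None:
            self._results.clear()
        else:
            self._results.pop(os.fspath(path), None)


class CachingPath:
    """Hypothetical wrapper pairing a path with an optional StatCache."""

    def __init__(self, path, cache=None):
        self.path = pathlib.Path(path)
        self.cache = cache

    def stat(self):
        if self.cache is None:
            return self.path.stat()  # default: always redo the syscall
        return self.cache.stat(self.path)

    def __eq__(self, other):
        if not isinstance(other, CachingPath):
            return NotImplemented
        # Equal only if the paths match AND the cache is the same object
        # (or both are None).
        return self.path == other.path and self.cache is other.cache

    def __hash__(self):
        return hash((self.path, id(self.cache)))

    def __lt__(self, other):
        if not isinstance(other, CachingPath):
            return NotImplemented
        # Ordered by path; cache identity only breaks ties between equal paths.
        return (self.path, id(self.cache)) < (other.path, id(other.cache))
```

Two CachingPath objects sharing one StatCache compare equal and share one stat() result; the same path under a different StatCache (or with none) compares unequal. The staleness problem from the previous paragraph is still there: until invalidate() is called, stat() keeps returning the first result even after the file changes.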

Are you (or anyone still reading this) interested in implementing this idea?

--Guido van Rossum (python.org/~guido)


Python-ideas mailing list
Code of Conduct: http://python.org/psf/codeofconduct/