[Python-ideas] find-like functionality in pathlib

Thu Jan 7 04:03:01 EST 2016

A bit OT, possibly, but this may be a long way around (to a cached *graph*
of paths and metadata) with similar use cases:

path.py#walk(), NetworkX edge, node dicts

https://github.com/westurner/pyleset/blob/249a0837/structp/structp.py

def walk_path_into_graph(g, path_, errors='warn'):
    """
    """
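For readers who don't want to open the link, here is a minimal sketch of the general idea (walk a tree, keep stat metadata per node, keep parent/child edges). It is not the code from structp.py: the name walk_into_graph, the plain dict/set standing in for NetworkX node and edge dicts, and the exact handling of errors= are assumptions made for illustration.

```python
import os
import warnings


def walk_into_graph(nodes, edges, root, errors='warn'):
    """Hypothetical sketch: fill `nodes` (path -> metadata dict) and
    `edges` (set of (parent, child) tuples) from the tree under `root`."""

    def on_error(exc):
        # errors='raise' propagates, 'warn' warns, anything else is ignored.
        if errors == 'raise':
            raise exc
        if errors == 'warn':
            warnings.warn(str(exc))

    def add_node(path):
        try:
            st = os.lstat(path)  # lstat: do not follow symlinks
        except OSError as exc:
            on_error(exc)
            return False
        nodes[path] = {'size': st.st_size, 'mtime': st.st_mtime,
                       'mode': st.st_mode}
        return True

    add_node(root)
    for dirpath, dirnames, filenames in os.walk(root, onerror=on_error):
        for name in dirnames + filenames:
            child = os.path.join(dirpath, name)
            if add_node(child):
                edges.add((dirpath, child))
    return nodes, edges
```

The metadata is only as fresh as the walk that produced it, which is the same staleness problem discussed in the quoted thread.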

This stats and reads limited image format metadata as CSV, TSV, JSON:
https://github.com/westurner/image_size/blob/ab46de73/get_image_size.py

I suppose that, because of race conditions, this metadata should actually be
stored in a filesystem triplestore, along with extended attributes and also
secontext attributes.

(... gnome-tracker reads filesystem stat data into RDF, for SPARQL).

BSP vertex messaging can probably handle cascading cache invalidation (with
supersteps).

On Jan 6, 2016 4:44 PM, "Guido van Rossum" <guido at python.org> wrote:

> I couldn't help myself and coded up a prototype for the StatCache design I
> sketched. See http://bugs.python.org/issue26031. Feedback welcome! On my
> Mac it only seems to offer limited benefits though...
>
> On Wed, Jan 6, 2016 at 8:48 AM, Guido van Rossum <guido at python.org> wrote:
>
>> On Wed, Jan 6, 2016 at 8:11 AM, Random832 <random832 at fastmail.com> wrote:
>>
>>> On Tue, Jan 5, 2016, at 16:04, Guido van Rossum wrote:
>>> > One problem with stat() caching is that Path objects are considered
>>> > immutable, and two Path objects referring to the same path are
>>> > completely interchangeable. For example, {pathlib.Path('/a'),
>>> > pathlib.Path('/a')} is a set of length 1: {PosixPath('/a')}. But if we
>>> > had e.g. Path('/a', cache_stat=True), the behavior of two instances of
>>> > that object might be observably different (if they were instantiated at
>>> > times when the contents of the filesystem were different). So maybe
>>> > stat-caching Path instances should be considered unequal, or perhaps
>>> > unhashable. Or perhaps they should only be considered equal if their
>>> > stat() values are actually equal (i.e. if the file's stat() info didn't
>>> > change).
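The interchangeability described in the paragraph above can be checked directly. This small example uses PurePosixPath rather than Path so that it behaves the same on any platform; that substitution is mine, and the equality and hashing rules are the ones Path inherits.

```python
from pathlib import PurePosixPath

a1 = PurePosixPath('/a')
a2 = PurePosixPath('/a')

# Two distinct objects that compare equal and hash equal ...
assert a1 is not a2
assert a1 == a2
assert hash(a1) == hash(a2)

# ... so a set keeps only one of them.
paths = {a1, a2}
print(paths)  # {PurePosixPath('/a')}
```

Any per-instance cached state would make a1 and a2 distinguishable even though they collapse to one set element, which is the concern raised in the quoted text.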
>>>
>>> What about a global cache?
>>
>>
>> It would have to use a weak dict so if the last reference goes away it
>> discards the cached stats for a given path, otherwise you'd have trouble
>> containing the cache size.
>>
>> And caching Path objects should still not be comparable to non-caching
>> Path objects (which we will need to preserve the semantics that repeatedly
>> calling stat() on a Path object created the default way will always redo
>> the syscall). The main advantage would be that caching Path objects could
>> be compared safely.
>>
>> It could still cause unexpected results. E.g. if you have just traversed
>> some big tree using caching, and saved some results (so hanging on to some
>> paths and hence their stat() results), and then you make some changes and
>> traverse it again to look for something else, you might accidentally be
>> seeing stale (i.e. cached) stat() results.
>>
>> Maybe there's a middle ground, where the user can create a StatCache
>> object and pass it into Path creation and traversal operations. Paths with
>> the same StatCache object (or both None) compare equal if their path
>> components are equal. Paths with different StatCache objects never compare
>> equal (but otherwise are ordered by path as usual -- the StatCache object's
>> identity is only used when the paths are equal).
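To make the comparison rules in the paragraph above concrete, here is a minimal sketch. It is not the prototype attached to issue26031; the StatCache and CachedPath classes below, their constructors and their methods are assumptions of mine, and a real design would live inside pathlib rather than in a wrapper.

```python
import os
from pathlib import PurePath


class StatCache:
    """Hypothetical cache of os.stat() results keyed by path string."""

    def __init__(self):
        self._stats = {}

    def stat(self, path):
        key = os.fspath(path)
        if key not in self._stats:
            self._stats[key] = os.stat(key)
        return self._stats[key]


class CachedPath:
    """Hypothetical path wrapper carrying an optional StatCache."""

    def __init__(self, path, cache=None):
        self.pure = PurePath(path)
        self.cache = cache

    def stat(self):
        if self.cache is None:
            return os.stat(self.pure)  # no cache: always redo the syscall
        return self.cache.stat(self.pure)

    def __eq__(self, other):
        if not isinstance(other, CachedPath):
            return NotImplemented
        # Equal only if the path components match AND the cache is the
        # very same object (or both are None).
        return self.pure == other.pure and self.cache is other.cache

    def __hash__(self):
        return hash((self.pure, id(self.cache)))

    def __lt__(self, other):
        if not isinstance(other, CachedPath):
            return NotImplemented
        if self.pure != other.pure:
            return self.pure < other.pure  # ordered by path as usual
        # Cache identity only breaks ties between equal paths.
        return id(self.cache) < id(other.cache)
```

With this, CachedPath('/a', c1) == CachedPath('/a', c1) holds, CachedPath('/a', c1) == CachedPath('/a', c2) does not, and sorting still groups entries by path first.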
>>
>> Are you (or anyone still reading this) interested in implementing this
>> idea?
>>
>> --
>> --Guido van Rossum (python.org/~guido)
>>
>
>
>
> --
> --Guido van Rossum (python.org/~guido)
>
> _______________________________________________
> Python-ideas mailing list
> Python-ideas at python.org
> https://mail.python.org/mailman/listinfo/python-ideas
> Code of Conduct: http://python.org/psf/codeofconduct/
>